One of the big data points missing from this article is the price. Unless you need features specific to the high-end cards, such as unlocked FP64/FP16 performance or anti-aliased lines, the consumer cards have much higher performance per dollar [1]. It would be really interesting if they compared their 8 K80s (~$4K each for ~8 TFLOPS) against a set of GTX 1080s (~$650 each for ~8 TFLOPS).
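A quick back-of-the-envelope comparison using the rough figures above (list prices and TFLOPS as quoted, not benchmarked):

```python
# Rough performance-per-dollar comparison using the figures from the comment above.
cards = {
    "Tesla K80": {"price_usd": 4000, "fp32_tflops": 8.0},
    "GTX 1080":  {"price_usd": 650,  "fp32_tflops": 8.0},
}

for name, c in cards.items():
    gflops_per_dollar = c["fp32_tflops"] * 1000 / c["price_usd"]
    print(f"{name}: ~{gflops_per_dollar:.1f} GFLOPS per dollar")

# Tesla K80: ~2.0 GFLOPS per dollar
# GTX 1080: ~12.3 GFLOPS per dollar  (roughly 6x in the consumer card's favour)
```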
Thanks for the info. The tl;dr is "It's fantastic to see that I've been able to use a machine that costs 1/10th of the one used in the 8 x Tesla K80s benchmark but still have queries running within 33% of the previous performances witnessed."
However, I'm suspicious of the numbers in those articles, since the author lists only 4 data points per trial and doesn't report the standard deviation of his measurements. One of his comparisons was .964s vs .891s, so it looks like the Titan Xs were roughly 90% as fast as the K80s, if the numbers can be trusted.
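To illustrate why the missing spread matters, here's a quick sketch; the timings below are made up for illustration, not taken from the article:

```python
# With only 4 runs and no reported stddev, the run-to-run noise can be the
# same order as the gap being measured. Hypothetical timings, in seconds.
import statistics

titan_x = [0.96, 1.02, 0.91, 0.99]   # hypothetical query times
k80     = [0.89, 0.93, 0.86, 0.95]   # hypothetical query times

for name, xs in [("Titan X", titan_x), ("K80", k80)]:
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)           # sample standard deviation (n-1)
    sem = sd / len(xs) ** 0.5           # standard error of the mean
    print(f"{name}: mean {mean:.3f}s +/- {sem:.3f}s (stdev {sd:.3f}s, n={len(xs)})")

# With n=4 the standard error is easily a few percent of the mean, which is
# the same order as the ~8% gap between .891s and .964s.
```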
It's the memory size here that's likely important: 8 GTX 1080s have a third of the total global memory of 8 K80s. And aggregate memory bandwidth matters far more than FLOPS, since this workload is almost certainly not bound by arithmetic throughput.
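For a rough sense of scale, here are the aggregate numbers from the published per-card specs (GTX 1080: 8 GB at ~320 GB/s; K80: 24 GB at ~480 GB/s across its two GK210 chips):

```python
# Aggregate memory and bandwidth for the two 8-GPU configurations discussed above.
configs = {
    "8x GTX 1080":  {"mem_gb": 8 * 8,  "bw_gbs": 8 * 320},
    "8x Tesla K80": {"mem_gb": 8 * 24, "bw_gbs": 8 * 480},
}

for name, c in configs.items():
    print(f"{name}: {c['mem_gb']} GB total, ~{c['bw_gbs']} GB/s aggregate bandwidth")

# 8x GTX 1080:  64 GB total,  ~2560 GB/s
# 8x Tesla K80: 192 GB total, ~3840 GB/s
# A working set that fits entirely in GPU memory on the K80s may spill to
# host memory on the 1080s, which dominates any FLOPS advantage.
```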
One more thing to note is that gamer cards aren't meant to be abused like workstation cards are. You can leave a K80 running at full blast for a week doing stuff, but the same workload will significantly reduce the lifespan of a high-end gamer video card. They're meant for gaming sessions that last a few hours with some outliers going for maybe a full day, but not much more than that.
If you really have a tight budget and need to use gamer cards as workstation cards (e.g. a two-person startup that needs to crunch things on 4 GPUs), find yourself some aftermarket coolers, preferably liquid cooling.
There's probably a good reason they are using server hardware. But sure, just like you could slap consumer CPUs into a server for a cheaper unit cost, you could use consumer GPUs.
For Graphistry's GPU platform, we suggest our users go with server-grade GPUs because they get (a) more memory and (b) great multitenancy. So using MapD as a personal system is an expensive use of resources, but when a system is architected and billed as an elastic, multitenant system, total cost of ownership for a team is less. Not all platforms are built for this (and I don't know enough about MapD vs. other GPU databases), but that's the engineering view.
And mini-disclaimer: Graphistry is a related platform focused on scaling & automating investigations. Part of that is a GPU compute stack that we started building around the same time as MapD, though we're not in the database business. E.g., our customers will generally use us to look across multiple other systems that already feature high availability, long-term storage, and scale-out querying for TB+ storage. Some examples: SQL engines, Spark, Splunk, Datastax, and various graph databases.
Why are anti-aliased lines specifically a high-end feature? I thought that anti-aliasing was done by over-sampling and then down-sampling, so all drawing primitives would work with it uniformly.
The question, though, is what are you oversampling? Depending on how the line gets rasterized, supersampling (or multisampling) may or may not help you at all.
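A minimal sketch of the distinction (not how any particular GPU actually rasterizes lines): coverage of a wide line can be approximated by averaging many binary samples inside a pixel (supersampling), or estimated analytically from the distance between the pixel centre and the line's centre:

```python
# Two ways to get anti-aliased coverage for a wide line segment.
import math

def point_in_line(px, py, x0, y0, x1, y1, width):
    """Binary test: is (px, py) inside a line segment of the given width?"""
    dx, dy = x1 - x0, y1 - y0
    t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / (dx * dx + dy * dy)))
    cx, cy = x0 + t * dx, y0 + t * dy              # closest point on the segment
    return math.hypot(px - cx, py - cy) <= width / 2

def coverage_supersample(x, y, line, n=4):
    """Average n*n binary samples spread over the 1x1 pixel at (x, y)."""
    hits = 0
    for i in range(n):
        for j in range(n):
            hits += point_in_line(x + (i + 0.5) / n, y + (j + 0.5) / n, *line)
    return hits / (n * n)

def coverage_analytic(x, y, line, filter_radius=0.5):
    """Smooth falloff based on distance from the pixel centre to the line edge."""
    x0, y0, x1, y1, width = line
    dx, dy = x1 - x0, y1 - y0
    px, py = x + 0.5, y + 0.5
    t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / (dx * dx + dy * dy)))
    dist = math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))
    # Linear ramp from fully covered to uncovered across one filter radius.
    return max(0.0, min(1.0, (width / 2 + filter_radius - dist) / (2 * filter_radius)))

line = (0.0, 0.0, 10.0, 3.0, 1.0)                  # x0, y0, x1, y1, width in pixels
print(coverage_supersample(4, 1, line), coverage_analytic(4, 1, line))
```

If the rasterizer only ever makes a hard in/out decision per sample, adding more samples helps; if it already computes coverage analytically, supersampling buys you little for lines specifically.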
High quality anti-aliased lines aren't a bottleneck in video games, but they're very important in CAD applications, so enterprise customers paid a large premium for Quadro cards with CAD-specific functionality. The specific list of premium vs consumer features has varied over time, but previous generations of consumer cards could have their professional features unlocked via modded drivers.
> we use the GPU to render the image, compress it to a .png (about 100KB) and send it to the browser as a tile. This allows for lightning fast rendering and the perception by the user that all of this data is actually in their browser.
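A rough sketch of the shape of that pipeline, using Pillow as a stand-in for the GPU renderer; `render_tile` and its arguments are illustrative, not MapD's API:

```python
# Rasterise the points that fall in a tile, compress the image to PNG, and
# hand the bytes to the web layer to serve to the browser.
import io
from PIL import Image, ImageDraw

def render_tile(points, bounds, size=256):
    """points: iterable of (lon, lat); bounds: (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bounds
    img = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    for lon, lat in points:
        x = (lon - min_lon) / (max_lon - min_lon) * size
        y = (max_lat - lat) / (max_lat - min_lat) * size   # flip y for screen space
        draw.point((x, y), fill=(0, 200, 255, 255))
    buf = io.BytesIO()
    img.save(buf, format="PNG")        # the quoted description mentions ~100 KB tiles
    return buf.getvalue()

png_bytes = render_tile([(-74.0, 40.7), (-73.98, 40.75)],
                        bounds=(-74.05, 40.65, -73.90, 40.85))
```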
With the enormous caveat that you need low latency to their server to get this illusion of client-side rendering. Considering that they only have this cluster of K80s close to them geographically, and not a number of clusters spread out globally, this isn't a usable example in much of the world.
Now, I don't expect them to roll out K80 clusters world-wide just for the sake of a demo, but it's still pretty important.
I'm in Eastern Europe and it loads up in like half a second. Much faster than I'd expect a browser to process queries on a 1.2B-row dataset, and without taking up untold gigabytes of memory.
This is 77M rows, not the full 1.2B dataset shown in the MapD demo (with 60 variables). It also looks like the map is pre-rendered, as opposed to being dynamically rendered with filters applied.
d3/dc.js/crossfilter, to be precise. Having been working on something similar, I found dc.js to be redundant if you already use Redux; it's much cleaner to use a lighter-weight charting library with crossfilter.
Look at the coloring around the rides near bridges. People take the subway down to the closest point and then take a cab home. The hybrid trip is both pocketbook friendly and probably faster.
That's an interesting slide, but without knowledge of the size of the dataset it could be misleading (especially considering communication costs between nodes in a cluster).
Hi infinite8s, to get additional information on how that chart was made, you can go to https://www.mapd.com/product/, scroll down to the bar chart, and click "See Details" under the chart. It shows the machines used, the queries, and the source data set and its size. Note that the machine configurations used to generate the chart were normalized for equivalent cost on AWS, i.e. the chart is hardware-dollar normalized.
Very impressive technology, but is there an open source version? Even a limited one? That one can try on something more modest than 100 grand's worth of pro GPUs?
Thanks, but at 5 bucks an hour for an entry-level instance (a single 12 GB GPU), I'm looking at 120 bucks a day if I don't want to constantly re-upload my dataset into MapD (a very slow operation, judging by Mark Litwintschik's posts linked by you). That's a very, very high price for such a modest hardware configuration, not to mention the more credible one, which goes for an eye-watering 30 bucks an hour, i.e. not much change from a grand a day. Not for us startup folk, clearly.
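The daily-cost arithmetic, using the hourly rates quoted above:

```python
# Daily cost of keeping an instance running around the clock so the dataset
# doesn't have to be re-uploaded (hourly rates as quoted in the comment).
entry_rate = 5    # USD/hour, single 12 GB GPU instance
large_rate = 30   # USD/hour, larger configuration

print(f"entry-level: ${entry_rate * 24}/day")   # $120/day
print(f"larger:      ${large_rate * 24}/day")   # $720/day
```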
I have to say your pricing, for such a new entrant that hasn't yet built market share, seems bound to attract very stiff competition from newcomers. "Interesting" business model.
MapD has a persistent store and normally customers would keep that on an EBS volume, so they don't have to reload their data every time they spin up an AWS instance.
[1] https://www.youtube.com/watch?v=LC_sx6A5Wko & http://www.videocardbenchmark.net/gpu.php?gpu=Tesla+C2050