// Fill a FloatColumn with a billion random floats, then sort it in place
FloatColumn fc = new FloatColumn("test", 1_000_000_000);
for (int i = 0; i < 1_000_000_000; i++) {
    fc.add((float) Math.random());
}
fc.sortAscending();
Neither as fast nor out-of-core, but, for fun, here's a parallel sort of 2^30 floats (i5-6260U, two cores running at 2.7GHz):
$ go run billion.go
1m32.471498831s
Where billion.go goes:
package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/twotwotwo/sorts/sortutil"
)

func main() {
	// Fill 2^30 random float64s.
	floats := make([]float64, 1<<30)
	for i := range floats {
		floats[i] = rand.Float64()
	}
	// Time the parallel sort from github.com/twotwotwo/sorts.
	t := time.Now()
	sortutil.Float64s(floats)
	fmt.Println(time.Now().Sub(t))
}
Out-of-core stuff's cool, and it sometimes seems a shame that it isn't available directly (rather than by exporting data to some other program) and more widely used across programming environments.
Glad the original poster is experimenting and got something about external sorting posted here.
As part of Spark 2.0, we are introducing some neat new optimizations to make a general engine as efficient as specialized code.
I just tried this on the Spark master branch (i.e. the work-in-progress code for Spark 2.0). It takes about 1.5 secs to sum up 1 billion 64-bit integers using a single thread, and about 1 sec using 2 threads. This was done on my laptop (Early 2015 MacBook Pro 13, 3.1GHz Intel Core i7).
We haven't optimized integer sorting yet, so that's probably not going to be super fast, but the aggregation performance has been pretty good.
scala> val start = System.nanoTime
start: Long = 56832659265590
scala> sqlContext.range(0, 1000L * 1000 * 1000, 1, 2).count()
res8: Long = 1000000000
scala> val end = System.nanoTime
end: Long = 56833605100948
scala> (end - start) / 1000 / 1000
res9: Long = 945
Part of the time is actually spent analyzing the query plan, optimizing it, and generating bytecode for it. If we run this on 10 billion integers, the time is about 5 secs.
I was initially skeptical of the 2.5s for the SUM base case, but after a bit of experimenting I concluded the following. Note this is for the SUM base case only, and the whole file is loaded into memory for my tests.
MMAP'd from HDD, 1st run: 47s
MMAP'd from HDD, 2nd run: 1.35s (OS caches pages in memory)
MMAP'd from NVMe, 1st run: 6.5s (OS caches dropped)
In a purely functional language like Haskell, you can sort a billion numbers in nanoseconds, all with no performance-crushing side effects. Yes, it's true, that's the kind of result you get with lazy evaluation!
When you start searching the resultant sorted list, it might be somewhat slower than if you sorted using other techniques, but--silver lining--search times will only improve after that!
I'd give you the actual stats but so far I've only lazy evaluated them.
No offence, but that's absurd. If anyone said their database can do things almost instantly because they can write a view which is defined super quick vs materialising or running an actual query they'd be, justifiably and hopefully, laughed out of the room.
But that's pretty much what you've done/contributed.
Of course lazy evaluation is faster if it's so lazy it doesn't actually evaluate anything/does something completely different to the original problem.
Finding sorted(array)[k] is possible in linear (even worst-case!) time. Using that n times would, as you note, take n^2 time. But we could employ a trick: once we have been asked for log n elements, and have thus already spent n log n time, we could then sort the array.
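For concreteness, here is a minimal quickselect sketch in Haskell (my own illustration, not from the thread); the worst-case linear bound needs a median-of-medians pivot, whereas the plain first-element pivot below is only expected-linear:

-- k-th smallest element (0-indexed) without sorting the whole list.
-- A median-of-medians pivot would make this worst-case O(n); the naive
-- first-element pivot used here is expected O(n) on random input.
select :: Ord a => Int -> [a] -> a
select _ []     = error "select: empty list"
select k (p:xs)
  | k < len     = select k smaller
  | k == len    = p
  | otherwise   = select (k - len - 1) larger
  where
    smaller = [x | x <- xs, x <  p]
    larger  = [x | x <- xs, x >= p]
    len     = length smaller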
It should stay n log n. A lazy quicksort can get the min element of a list super fast by only working on the partitions relevant to finding the min. As it requires more elements of the list, it can finish sorting the other partitions. It's the exact same algo as usual.
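That lazy quicksort is essentially the textbook two-liner; a sketch (list-based, not a production sort):

-- head (qsort xs) only forces the partitions on the path to the minimum
-- (expected O(n) work); demanding the whole sorted list still costs the
-- usual expected O(n log n).
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort [x | x <- xs, x < p] ++ [p] ++ qsort [x | x <- xs, x >= p]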
I pointed it out to be lighthearted, but it's true, not sarcasm, and not only true, it's also meaningfully instructive to think about the benefits of lazy evaluation.
For example, a priority queue (a list of tasks) does not need to be fully sorted, only sorted to the extent that the highest-priority item is quickly determinable, and there are algorithms and data structures designed for exactly this behavior (see the heap sketch after this comment).
Sorting (other than to print a sorted list) doesn't pay for itself until you do a number of searches, and associative memory hashes are frequently better if you simply wish to find exact matches again. Even the lowly bubble sort has the not-insignificant benefit of finding the first value in O(n) time, which may be the behavior you need: lazy evaluation can be exactly the right way to go if you are instructed to sort while you await more information about what the appropriate technique might be.
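To make the priority-queue point above concrete, here is a minimal pairing-heap sketch in Haskell (my own illustration, nothing from the thread): the smallest item is always available in O(1), and the heap is never more sorted than it needs to be.

-- Pairing heap: findMin O(1), insert O(1), deleteMin amortized O(log n).
data Heap a = Empty | Node a [Heap a]

merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x hs1) h2@(Node y hs2)
  | x <= y    = Node x (h2 : hs1)
  | otherwise = Node y (h1 : hs2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Node x [])

findMin :: Heap a -> Maybe a
findMin Empty      = Nothing
findMin (Node x _) = Just x

deleteMin :: Ord a => Heap a -> Heap a
deleteMin Empty       = Empty
deleteMin (Node _ hs) = mergePairs hs
  where
    mergePairs []            = Empty
    mergePairs [h]           = h
    mergePairs (a : b : hs') = merge (merge a b) (mergePairs hs')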
I think humor is only funny when it's based on truthiness.
What matters when sorting a billion numbers is wallclock runtime from the start of the sort to the end of materialization (see http://sortbenchmark.org/) or iteration. And it means, almost always, materializing or iterating the entire result set.
It's not clear to me that any Haskell programs have won any real awards in sorting modest amounts of data, since you have to visit all the data and do n log n comparisons.
Yes, they are being sarcastic. The list has to be fully materialized and iterated over. Further, the Haskell version probably doesn't have a concept of doing work in memory-sized batches with spill-to-disk, so the runtime would likely be much higher. Of course, I don't know enough Haskell, so it's entirely possible they somehow figured out how to iterate over the input data in sorted order efficiently without materializing it.
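For what it's worth, the batch-and-spill idea is expressible; here is a rough Haskell sketch of my own (assumed helper names, text-format runs, and a naive fold of two-way merges rather than anything a real library provides):

-- Sort memory-sized batches, spill each sorted run to a temp file, then
-- lazily merge the runs back. A real external sort would use a binary
-- format and a balanced k-way merge.
import Data.List (sort)
import System.IO (hClose, hPutStr, openTempFile)

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

spillRun :: [Int] -> IO FilePath
spillRun run = do
  (path, h) <- openTempFile "." "run"
  hPutStr h (unlines (map show (sort run)))
  hClose h
  return path

merge2 :: Ord a => [a] -> [a] -> [a]
merge2 [] ys = ys
merge2 xs [] = xs
merge2 (x:xs) (y:ys)
  | x <= y    = x : merge2 xs (y:ys)
  | otherwise = y : merge2 (x:xs) ys

externalSort :: Int -> [Int] -> IO [Int]
externalSort batchSize xs = do
  paths <- mapM spillRun (chunksOf batchSize xs)
  runs  <- mapM (fmap (map read . lines) . readFile) paths
  return (foldr merge2 [] runs)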
1 billion numbers (let's say DWORDs) don't even take a full 4GB of space. Load them into memory and sort. With QWORDs it grows to about 8GB. A modern laptop can still load twice that amount and sort everything in memory. That is not a large dataset.
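As a rough sketch of that "just load it and sort it in memory" approach in Haskell (using the vector and vector-algorithms packages; the index-hashing fill below stands in for real input data):

-- Sort 10^9 unboxed 32-bit words in RAM: ~4 GB for the data, roughly
-- double that at peak because thaw copies the generated vector.
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as Intro
import Data.Word (Word32)

main :: IO ()
main = do
  let n      = 1000 * 1000 * 1000 :: Int
      -- Knuth-style multiplicative hash of the index, standing in for data.
      fill i = fromIntegral i * 2654435761 :: Word32
  mv <- V.thaw (V.generate n fill)   -- mutable copy of the generated vector
  Intro.sort mv                      -- in-place introsort, O(n log n)
  sorted <- V.unsafeFreeze mv
  print (V.head sorted, V.last sorted)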
It's not ready for prime time, but:
Time to sum 1,000,000,000 floats: 1.5 seconds
Time to sort 1,000,000,000 floats: 30.5 seconds
The code to fill a column and sort it: