
I'm working on a dataframe for Java for large datasets: https://github.com/lwhite1/tablesaw.

It's not ready for prime time, but:

Time to sum 1,000,000,000 floats: 1.5 seconds

Time to sort 1,000,000,000 floats: 30.5 seconds

The code to fill a column and sort it:

    FloatColumn fc = new FloatColumn("test", 1_000_000_000);
    for (int i = 0; i < 1_000_000_000; i++) {
      fc.add((float) Math.random());
    }
    fc.sortAscending();



This is neither as fast nor out-of-core but, for fun, here is a parallel sort of 2^30 floats (i5-6260U, two cores running at 2.7 GHz):

    $ go run billion.go
    1m32.471498831s

Where billion.go goes:

    package main

    import (
        "fmt"
        "math/rand"
        "time"

        "github.com/twotwotwo/sorts/sortutil"
    )

    func main() {
        // Fill 2^30 float64 values with random data.
        floats := make([]float64, 1<<30)
        for i := range floats {
            floats[i] = rand.Float64()
        }

        // Time only the sort, not the fill.
        t := time.Now()
        sortutil.Float64s(floats)
        fmt.Println(time.Since(t))
    }

Out-of-core processing is cool, and it sometimes seems a shame that it's not available directly (rather than by exporting data to some other program) and more widely used across programming environments.

Glad the original poster's experimenting and got something about external sorting up here.
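
For anyone curious what the external sorting idea looks like, here is a rough sketch of its merge phase in Go, using only the standard library. It is an illustration under assumptions, not how Tablesaw or any particular tool does it: the names `cursor`, `sortedRuns`, and `mergeRuns` are made up for this example, and the sorted runs live in memory here, whereas a real out-of-core sort would write each run to disk and stream it back during the merge.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// cursor tracks the read position in one sorted run.
// In a real external sort this would wrap a file reader (assumption).
type cursor struct {
	run []float64 // one sorted chunk
	pos int       // index of the next unread value
}

// cursorHeap is a min-heap of cursors ordered by their next unread value.
type cursorHeap []*cursor

func (h cursorHeap) Len() int            { return len(h) }
func (h cursorHeap) Less(i, j int) bool  { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// sortedRuns copies data into chunks of at most chunkSize values
// (chunkSize must be positive) and sorts each chunk independently.
func sortedRuns(data []float64, chunkSize int) [][]float64 {
	var runs [][]float64
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		run := append([]float64(nil), data[start:end]...)
		sort.Float64s(run)
		runs = append(runs, run)
	}
	return runs
}

// mergeRuns does a k-way merge: repeatedly take the smallest head value
// among all runs and advance that run's cursor.
func mergeRuns(runs [][]float64) []float64 {
	h := &cursorHeap{}
	total := 0
	for _, r := range runs {
		total += len(r)
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
		}
	}
	heap.Init(h)

	out := make([]float64, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest next value
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // run exhausted, drop it
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	data := []float64{0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8}
	fmt.Println(mergeRuns(sortedRuns(data, 3)))
}
```

Only one value per run has to be in memory at merge time, which is why the approach scales past RAM once the runs are on disk.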


This is cool, I'm kinda fascinated by golang.

Tablesaw's not out-of-core either: too much of a pain (for me) to work with variable-width columns like text that way.


Impressive, what's the speed like when you add a few bytes of payload?
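
One way to try this, as a minimal sketch in Go with the standard library only: sort records that carry a key plus a small payload, instead of bare floats. Everything here is an assumption for illustration (the `record` type, the 8-byte payload size, the helper names, the record count), and it says nothing about how Tablesaw or sortutil would behave with payloads.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
	"sort"
	"time"
)

// record pairs a sort key with a small fixed-size payload.
// The 8-byte payload size is an arbitrary choice for this sketch.
type record struct {
	key     float32
	payload [8]byte
}

// makeRecords builds n records with random keys. The payload stores the
// record's original index so we can check that it travels with its key.
func makeRecords(n int, seed int64) []record {
	rng := rand.New(rand.NewSource(seed))
	recs := make([]record, n)
	for i := range recs {
		recs[i].key = rng.Float32()
		binary.LittleEndian.PutUint64(recs[i].payload[:], uint64(i))
	}
	return recs
}

// originalIndex decodes the index stored in a record's payload.
func originalIndex(r record) int {
	return int(binary.LittleEndian.Uint64(r.payload[:]))
}

// sortByKey sorts records in place by key; whole records (key and
// payload) are swapped, so each swap moves 12 bytes rather than 4.
func sortByKey(recs []record) {
	sort.Slice(recs, func(i, j int) bool { return recs[i].key < recs[j].key })
}

func main() {
	recs := makeRecords(1<<20, 1)

	start := time.Now()
	sortByKey(recs)
	fmt.Println(time.Since(start))
}
```

Comparing this timing against a plain sort of the same number of keys would give a rough feel for the payload's cost on a given machine.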


Good question. I'll have to test it later today if I have some time.



