
> > Common Lisp also lets you use mmap

> But not without allocating dynamic memory and copying data.

Sure it does. In SBCL you can force a stack allocation (though rarely does it improve performance), and very short-lived values do not leave registers in any case.

> > They clearly wanted automatic memory management

> Most likely because of some misconceptions.

There are both good and bad reasons to want automatic memory management. One good reason is that it would decrease the porting effort by keeping the code similar.

> > so the C++ implementation is reasonable.

> How so?

Using reference counting is a reasonable way to get automatic memory management in C++.
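To make the reference counting point concrete, here is a minimal sketch using std::shared_ptr. The Record type and both function names are made up for illustration; nothing here is taken from the implementation under discussion.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Record {
    int id;
    explicit Record(int i) : id(i) {}
};

// Returns the owner count while two containers and one local share a record.
long shared_owner_count() {
    auto rec = std::make_shared<Record>(42);
    assert(rec.use_count() == 1);

    std::vector<std::shared_ptr<Record>> a;
    std::vector<std::shared_ptr<Record>> b;
    a.push_back(rec);  // second owner
    b.push_back(rec);  // third owner
    return rec.use_count();
}

// Returns true if the record is destroyed once its last owner goes away.
bool freed_when_last_owner_dies() {
    std::weak_ptr<Record> observer;  // observes without owning
    {
        auto rec = std::make_shared<Record>(7);
        observer = rec;
    }  // last shared_ptr destroyed here, so the Record is freed
    return observer.expired();
}
```

The count lives in a control block next to the object, so every copy of the pointer pays for an atomic increment. That is part of why it is only "reasonable" and not free.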

> > but I suspect the results would have been similar

> Don't forget the data sets to be filtered, sorted an analyzed are up to 200 GB.

Which is going to be rough on any automatic memory management system, so a language with a more mature ecosystem of automatic memory management is likely to perform better.




> > Don't forget the data sets to be filtered, sorted an analyzed are up to 200 GB.

> Which is going to be rough on any automatic memory management system

You could equally say that it makes the case for actually designing a memory allocation strategy (which only C++ really supports) that much more important.

You always see this in Java programs for large data analysis. They pick Java because of memory management and the tooling. But it's just SO slow, and after optimisation the only thing that stubbornly stays at the top of the profiler data is memory and GC. And what do they do?

A global object of the following form:

  class DataStore {
    // preallocated once, sized by constants defined elsewhere
    float[] theFloatsWeNeed = new float[constHowMany];
    int[] theIntsWeNeed = new int[anotherConst];
  }

You get the idea. This avoids memory allocation in Java. You use the flyweight pattern to pass data around, or you fake pointer arithmetic: you create your own pointers as array indexes and (oh, the horror) do math on those indexes. Even then, the bounds checks on those indexes become a significant time sink (and then you disable them, which of course kills memory safety in Java, but you won't care).

The truth is you don't want memory management for large amounts of data. You don't want to allocate it, track it or deallocate it at all. You leave it in its on-disk data format and never serialize/deserialize it at all. You want to mmap it into your program, operate on it and then just close the mmap when you're done. C++ definitely has the best tools for this way of working.
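For anyone who hasn't done this, a rough sketch of the mmap approach on a POSIX system. It assumes the file is nothing but raw floats in native byte order; that layout and the names write_floats and sum_mapped_floats are assumptions for the example, not anything from the article.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes raw floats so the example has an input file.
void write_floats(const char* path, const std::vector<float>& values) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) throw std::runtime_error("fopen failed");
    std::fwrite(values.data(), sizeof(float), values.size(), f);
    std::fclose(f);
}

// Maps the file read-only and sums its floats in place: no read() into a
// user buffer, no deserialization step, no per-record allocation.
double sum_mapped_floats(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");

    struct stat st{};
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed");
    }
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);
    if (bytes == 0) {  // mmap rejects zero-length mappings
        close(fd);
        return 0.0;
    }
    // Assumed layout: the file is a whole number of floats.
    assert(bytes % sizeof(float) == 0);

    void* base = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) throw std::runtime_error("mmap failed");

    const float* data = static_cast<const float*>(base);
    const std::size_t count = bytes / sizeof(float);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += data[i];

    munmap(base, bytes);  // "closing the mmap" is the only cleanup
    return sum;
}
```

The kernel pages the data in on demand, so the program's own heap stays tiny no matter how big the file is.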


Yeah, right. The people who know this can apparently be counted on one hand when I look at the advertised publication and the discussion in this forum.


Common Lisp: It is permissible for an implementation to simply ignore such declarations. And you still have to copy.

Ref counting: only makes sense in a few special cases.

Avoiding dynamic memory management: have a look at mmap.


C++: It is permissible for an implementation to allocate every local variable on the heap.

Going back to my original point, you are suggesting a complete rearchitecture of their allocation system. That does not require switching languages to C++. If we are talking about working with 100s of GB of data, that's probably even the correct approach!

TFA does not, however, claim that they have a working set of 100s of GB of data. The data is 100s of GB at rest, but can be processed in chunks with a single pass. That, by itself, does not scream "mmap" to me. On top of that, the data is compressed at rest, so copying is inevitable.
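As a sketch of what single-pass chunked processing can look like without mmap: one fixed buffer, reused for every chunk, so memory use stays constant however large the input is. Counting newline-terminated records is just a stand-in for the real work, and the function names and chunk size are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Counts newline-terminated records in one pass over the stream,
// reusing a single buffer of chunk_size bytes.
std::uint64_t count_lines_chunked(std::istream& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);  // the only allocation
    std::uint64_t lines = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < got; ++i) {
            if (buf[i] == '\n') ++lines;
        }
    }
    return lines;
}

// Convenience wrapper for trying the function on an in-memory string.
std::uint64_t count_lines_in_string(const std::string& text,
                                    std::size_t chunk_size) {
    std::istringstream in(text);
    return count_lines_chunked(in, chunk_size);
}
```

A decompressor would sit between the file and this loop and fill the same buffer, which is the copy that can't be avoided when the data is compressed at rest.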


So we are glad that we can use the data of the memory-mapped file directly and do not have to allocate or copy anything. But of course you may solve the problem badly if you prefer; after all, there are even "scientific" publications that do it that way, as the example shows.


It's permissible for a C compiler to emit shell scripts.



