
Location: Hanover, Germany

Remote: Yes

Willing to relocate: No

Technologies: Python, C/C++, databases, Django, HPC (MPI, threads), web technologies, gRPC, Docker, scientific computing (numpy/scipy/HDF5/FFTW/...)

Résumé/CV: upon request

Email: hn@jomx.net

I am a theoretical physicist (PhD, semiconductor research) and software developer. I have broad experience in scientific programming, in web and application development with Python and the usual front-end stack, and in data storage and processing. At the moment I work in the R&D department of a large German car maker, but I'd like to do something more meaningful. My ideal job combines math/natural sciences with professional software development and leaves enough flexibility to spend time with my family.


I administer a large HPC infrastructure in my day-to-day work and often need to check something on many or all of the nodes: compare directories, files, system settings, and the like. Since we have around 7,000 nodes across several geographical sites, every other tool was frustratingly slow whenever I wanted to run a command on all of them.

cash (with warm caches) takes less than 20 seconds.
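
The pattern cash covers is essentially "run one command on every host over SSH, in parallel". A minimal Python sketch of that idea, just for illustration (host names and pool size are made up, and this is not how cash itself is implemented):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical host list; in reality this comes from the cluster inventory.
    NODES = ["node001", "node002", "node003"]

    def run_on(host, command):
        """Run `command` on `host` via ssh and return (host, exit code, stdout)."""
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=30,
        )
        return host, proc.returncode, proc.stdout.strip()

    # Fan the command out with a bounded thread pool.
    with ThreadPoolExecutor(max_workers=64) as pool:
        for host, rc, out in pool.map(lambda h: run_on(h, "uname -r"), NODES):
            print(f"{host}: rc={rc} {out}")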


I agree. I work part time (30 hours) at a company in Germany where "full time" is 35 hours a week, a model more and more companies here are adopting.

It's great. I'm done early in the afternoon and there is plenty of time left to spend with my family.


> The front-end varnish box for instance is doing the square root of zero most of the time.

Nicely written. :)


I have a PhD in physics and worked as a post-doc for a few years, until I left for industry a couple of weeks ago. The last project I worked on was a massively parallelized image simulation code for scanning transmission electron microscopy. It is open source: www.stemsalabim.de

My new job in industry is consulting on HPC systems in the context of computer-aided engineering.


Your quotation marks suggest that you disagree. There are other reasons not to do screening, for example not wanting to face the decision whether or not to abort. Here in Germany, at least in my circles, people who avoid such screenings tend to do so for reasons other than religion.


Very neat, and especially useful for peeking into (SQLite) databases and getting a quick overview of the contents. Thank you for your efforts!


This looks very interesting. We currently store our dense simulation (and experimental) data in NetCDF/HDF5. With correct chunking, this seems quite efficient in terms of both performance and compression. What would we gain by using TileDB? How does its performance compare with HDF5?
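
For reference, our current layout is roughly the following (a minimal h5py sketch; dataset name, chunk shape, and compression level are placeholders, not our production settings):

    import numpy as np
    import h5py

    # Toy stand-in for a dense simulation field.
    data = np.random.rand(256, 256, 256)

    with h5py.File("simulation.h5", "w") as f:
        f.create_dataset(
            "field",
            data=data,
            chunks=(64, 64, 64),    # chunk shape matched to the access pattern
            compression="gzip",     # per-chunk compression
            compression_opts=4,
        )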


Stavros from TileDB, Inc. here: HDF5 is great software and TileDB was heavily inspired by it. HDF5 probably works well for your use case. TileDB matches HDF5's performance in the dense case, but it also addresses some important limitations of HDF5, which may or may not be relevant to you:

- Sparse array support (not relevant to you).

- Multiple readers and multiple writers through thread- and process-safety. HDF5 does not offer full thread-safety, and it does not support parallel writes with compression. I am assuming you use MPI with a single writer, though, so HDF5 should still work well for you.

- Efficient writes in a log-structured manner, which enables multi-versioning and fault tolerance. HDF5 can suffer from file corruption on error and from file fragmentation. You are probably not updating in place, so again this is not very relevant to you.

Having said that, and echoing Jake's comment, we would love to hear from you how TileDB could be adapted to serve your case better.
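
To give a feel for the API, defining and writing a dense array in the Python bindings looks roughly like this (a minimal sketch; names and defaults may differ slightly between releases):

    import numpy as np
    import tiledb

    # A small 2D dense array with 2x2 tiles (the analogue of HDF5 chunks).
    dom = tiledb.Domain(
        tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
        tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
    )
    schema = tiledb.ArraySchema(
        domain=dom,
        sparse=False,
        attrs=[tiledb.Attr(name="a", dtype=np.float64)],
    )
    tiledb.DenseArray.create("my_dense_array", schema)

    # Write a full 4x4 block of values.
    with tiledb.DenseArray("my_dense_array", mode="w") as A:
        A[:] = np.random.rand(4, 4)

The tile extents play roughly the same role as HDF5 chunks: they are the unit of compression and partial I/O.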

A general comment: TileDB's vision goes beyond that of HDF5 (or any scientific format). Given the quantities of HDF5 data out there (and the fact that we like the software), we are thinking about building some integration with HDF5 (and NetCDF). For instance, you may be able to create a TileDB array by "pointing" it at an HDF5 dataset, without having to ingest the HDF5 files, while still enjoying the TileDB API and extra features.


Jake from TileDB, Inc. Performance-wise, I would look at the paper referenced in this thread, which provides benchmarks for various workloads. What advantages TileDB may offer you is problem dependent, especially for dense simulation output data, which is the use case HDF5 was designed for. If you have specific suggestions for how TileDB could improve on HDF5 for your use case, we would love to hear them.


I can't count how many times a day I type 'subl3 -n'. One of my all-time favorite tools. I'm happy to pay for the upgrade.

