gijzelaerr's comments

And MonetDB took some inspiration back from DuckDB to develop an embedded version also:

https://github.com/MonetDBSolutions/MonetDBe-Python/

Disclaimer: I'm working on this.


Could you elaborate on some of the similarities and differences between DuckDB and MonetDB?


Disclaimer: I am working on MonetDB and I established the CWI database architectures group.

DuckDB is designed as an experimental system after heavy exposure to the techniques deployed in MonetDB (open-source), Hyper, and Vectorwise.

The properties of the embedded version of MonetDB can be found here https://monetdbe.readthedocs.io/en/latest/introduction.html#

Some differences between MonetDB and DuckDB can be found here: https://monetdbe.readthedocs.io/en/latest/migrations.html

The blog post mentioned above is covered in https://twitter.com/MonetDB/status/1282412295235280901?s=20


Last year I worked with a start-up called Scentronix, which makes customized perfumes. The company made a device that can quickly blend fragrances on the spot from the 200 most commonly used ingredients in the business. Apparently, you can make most of the available perfumes with these base ingredients. The recipes include a set of known, well-received ones, and the device also blends new random recipes (under certain constraints, such as not adding too much of an overwhelming base scent). They sample feedback from people and train a collaborative filtering classifier. It is a hard problem, and the data is very noisy, but we did start to get better-than-random results at some point. https://scentronix.com/


Interesting. I've been working on something related: a non-intrusive Docker 'extension' called kliko. Kliko is a specification that formalizes file-based (no network yet) input/output flow for containers. It makes it possible to have a generic API for containers, so you can automatically create user interfaces for an application or chain containers together in a pipeline.

https://github.com/gijzelaerr/kliko

Here is an example kliko file for a container defining the input and output:

https://github.com/kernsuite-docker/lwimager/blob/master/kli...


So one process writes to a flat file, then another process reads the flat file? Wouldn't named pipes be more useful here?

Also, it feels like you are just making a serialization format on top of stdin/stdout.


There is documentation available on the website, but I'm not sure I explain the concept properly there.

The assumption is that you have containers that operate on input files and generate output files. The behavior of the container depends on the given parameters, which are defined in the kliko file. The user (or runner) supplies these parameters at runtime.

To illustrate, we use it for creating pipelines in radio astronomy, where we operate on datasets of gigabytes or bigger. Most of these tools are file based: they read files in and write files out. It is all quite complex and old software, so Docker is ideal for encapsulating this complexity. A scientist can easily recombine the various containers and play with the arguments. Because input and output are split, the containers effectively become 'functional': there are no side effects, and the results can be cached if the parameters are the same. The intermediate temporary volumes can be memory based to speed things up. We use stdout for logging.
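
To make the caching idea a bit more concrete, here is a minimal sketch in Python. It is not how kliko itself is implemented. The names `cache_key` and `StepCache`, the parameter names, and the idea of passing input file digests are all assumptions made up for illustration. The point is only that if a step has no side effects, a key derived from the image, the parameters, and the inputs is enough to decide whether a result can be reused.

```python
import hashlib
import json


def cache_key(image, parameters, input_digests):
    """Derive a deterministic key from everything that can affect the output.

    All three arguments are hypothetical: an image name, a dict of
    parameters, and a list of content hashes of the input files.
    """
    payload = json.dumps(
        {"image": image, "parameters": parameters, "inputs": sorted(input_digests)},
        sort_keys=True,  # parameter order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCache:
    """In-memory cache that runs a step only when its key is unseen."""

    def __init__(self):
        self._results = {}
        self.runs = 0  # counts how often a step was actually executed

    def run(self, image, parameters, input_digests, step):
        key = cache_key(image, parameters, input_digests)
        if key not in self._results:
            self.runs += 1
            # 'step' stands in for actually running the container.
            self._results[key] = step(parameters)
        return self._results[key]
```

In a real pipeline the callable would start the container and the cached value would be a path to the output volume rather than an in-memory object.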


If I'm understanding correctly that sounds similar to Pachyderm (http://www.pachyderm.io)


Pachyderm looks quite cool, but I think it's lacking a quick-start and a way of running things locally in a simple way.

I grabbed the repo, clicked through a few links in the docs, and hit a 404. I searched on Google and found a link describing a simple way of running it locally, but that doesn't work with the new version. Then I followed the instructions and hit a problem while installing something to do with k8s and mapped paths, and the fix printed in the console doesn't work.

I understand that this is a personal complaint, and others might not care at all about having it set up locally because it solves the big problems so well, but I just want to at least try it locally.


This would likely benefit other scientific fields -- bioinformatics for example has the same sort of software tooling.


SQLAlchemy dialect maintainer here. I'm happy to hear somebody is using it. I've been working on it for a while, but in the last 2 years I only received one bug report, and I don't think that is because the code is flawless.

