Show HN: Functions matter – an alternative to SQL and map-reduce data processing (github.com/asavinov)
6 points by asavinov on May 10, 2021 | 2 comments



The main motivation is that the conventional approaches to data processing are based on manipulating mathematical sets for all kinds of use cases: we produce a new set if we want to calculate a new attribute, we produce a new set if we want to match data from different tables, and we get a new set if we aggregate data. Yet in many cases we do not actually need to produce new sets (tables, collections, etc.) - it is enough to add a new column to an existing set. Here are more details about the motivation:

https://prosto.readthedocs.io/en/latest/text/why.html

A column is an implementation of a function (similarly to how a table is an implementation of a set). Theoretically, this approach leads to a data model based on two core elements: mathematical functions (new) and mathematical sets (old).
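To make the "column as a function" idea concrete, here is a minimal sketch in plain Python. It is not Prosto's API: the table layout (a dict of equally long lists), the `items` data and the `add_calculated_column` helper are all assumptions made up for illustration.

```python
# Hypothetical example data: a table is a dict of equally long columns.
items = {
    "name": ["pen", "ink", "pad"],
    "price": [2.0, 5.0, 3.0],
    "quantity": [10, 2, 4],
}


def add_calculated_column(table, name, func, inputs):
    """Define a new column as a function of existing columns.

    The table (set) itself is not replaced: the same rows stay where
    they are and only a new column is attached to them.
    """
    rows = zip(*(table[column] for column in inputs))
    table[name] = [func(*row) for row in rows]


# amount(row) = price(row) * quantity(row)
add_calculated_column(items, "amount", lambda p, q: p * q, ["price", "quantity"])

print(items["amount"])
```

The set-oriented equivalent would be a query such as SELECT *, price * quantity AS amount, which returns a new table rather than extending the existing one.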

This approach is implemented in Prosto, a data processing toolkit that changes how data is processed by relying heavily on functions and operations on functions - an alternative to map-reduce and join-groupby.
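As a rough illustration of what replacing join and group-by with functions can look like, here is a sketch in plain Python. It does not use Prosto itself; the tables, the column names and the way links are stored (as row indexes) are assumptions chosen for this example.

```python
# Hypothetical example data: each table is a dict of equally long columns.
products = {"name": ["pen", "ink"], "price": [2.0, 5.0]}
orders = {"product_name": ["pen", "ink", "pen"], "quantity": [10, 2, 3]}

# Link column (instead of a join): a function that maps each order row
# to the index of the matching product row. No joined table is produced.
index_by_name = {name: i for i, name in enumerate(products["name"])}
orders["product"] = [index_by_name[name] for name in orders["product_name"]]

# Calculated column that follows the link to read the product price.
orders["amount"] = [
    products["price"][p] * q
    for p, q in zip(orders["product"], orders["quantity"])
]

# Aggregate column (instead of group-by): a new column of the existing
# products table, computed from the order rows that link to each product.
products["total_amount"] = [0.0] * len(products["name"])
for p, amount in zip(orders["product"], orders["amount"]):
    products["total_amount"][p] += amount

print(products["total_amount"])
```

Both tables keep their original rows throughout; each step only attaches one more column to a table that already exists.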


Here is another project based on the same idea of processing data using functions:

https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last!

Here, however, the focus is on feature engineering and on rethinking how it can be combined with traditional ML. Essentially, the point is that there are no big differences between the two, and it is more natural and simpler to think of them as special cases of the same concept: features can be learned, and ML models are frequently used to produce intermediate results.



