
Summary:

- They created an entirely new dataset of ~5000 molecules, all hand-labeled by perfume experts.

- They held a competition (presumably Kaggle or a similar platform) to classify this dataset, and used the results as a strong baseline.

- Their GNNs get results comparable to (slightly better than, but not statistically significantly so) those of the winning random forest model from the competition.

The embeddings show promise, but I'm curious why they omitted a simple "fully connected layer" on the Morgan bit descriptors as a baseline classifier. Seems like that would outperform the random forest.
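For concreteness, here is a minimal sketch of what such a baseline could look like: a single fully connected layer with a sigmoid output, trained by SGD on binary fingerprint vectors. Everything in it is an assumption for illustration. The 16-bit vectors are synthetic stand-ins for Morgan fingerprints (in practice these would come from a cheminformatics toolkit such as RDKit), the label rule is made up, and the names `make_toy_data`, `train_dense_layer`, and `predict` are hypothetical. It says nothing about how such a layer would actually compare with the random forest.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n_samples=64, n_bits=16, seed=0):
    """Synthetic stand-in for fingerprint data (NOT real molecules).

    Each sample is a random bit vector; the made-up label is 1 when bit 0 is set.
    """
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_samples)]
    y = [row[0] for row in X]
    return X, y


def train_dense_layer(X, y, epochs=200, lr=0.1, seed=0):
    """Fit one fully connected layer (weights + bias) with log loss via SGD."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Inputs are bits, so the dot product is a sum over the set bits.
            z = b + sum(wj for wj, xj in zip(w, X[i]) if xj)
            grad = sigmoid(z) - y[i]  # d(log loss)/dz
            for j, xj in enumerate(X[i]):
                if xj:
                    w[j] -= lr * grad
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Predicted probability of the positive class for one bit vector."""
    return sigmoid(b + sum(wj for wj, xj in zip(w, x) if xj))


X, y = make_toy_data()
w, b = train_dense_layer(X, y)
train_acc = sum((predict(w, b, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(f"training accuracy on toy data: {train_acc:.2f}")
```

A real comparison would need the same train/test splits and metrics as the competition, plus at least one hidden layer if the point is to test a small neural network rather than logistic regression.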




The baseline isn't that strong tbh; it's from a very small competition held a long time ago: https://www.synapse.org/#!Synapse:syn2811262/wiki/78388


So how well does the network perform on unseen molecules? Does olfactory data allow extrapolation?



