- They created an entirely new dataset of ~5000 molecules, all hand-labeled by perfume experts.
- They held a competition (presumably Kaggle or a similar platform) to classify this dataset, and used the results as a strong baseline.
- Their GNNs achieve results comparable to (slightly better than, though not statistically significantly so) the winning random forest model from the competition.
The embeddings show promise, but I'm curious why they omitted a simple fully connected network on the Morgan fingerprint bits as a baseline classifier (roughly the kind of thing sketched below). It seems like that would outperform the random forest.
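For concreteness, here's a minimal sketch of that kind of baseline: Morgan fingerprint bits fed into a small fully connected network. The molecules, labels, and hyperparameters are placeholders, not the paper's actual data or setup.

```python
# Sketch of a fully-connected baseline on Morgan fingerprint bits.
# The SMILES strings and odor labels below are hypothetical placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.neural_network import MLPClassifier

def morgan_bits(smiles, radius=2, n_bits=2048):
    """Convert a SMILES string to a Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

# Placeholder data: SMILES strings and multi-label odor descriptors.
smiles_list = ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O"]
labels = np.array([[1, 0], [0, 1]])  # e.g. binary indicators per odor descriptor

X = np.stack([morgan_bits(s) for s in smiles_list])

# A single hidden layer on top of the fingerprint bits -- the "fully connected
# layer" baseline. MLPClassifier handles multi-label targets directly.
clf = MLPClassifier(hidden_layer_sizes=(512,), max_iter=500)
clf.fit(X, labels)
print(clf.predict(X))
```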