List of review articles on ML and AI that are on arXiv (freenode-machinelearning.github.io)
101 points by painful on Jan 26, 2019 | 20 comments



This is at once an awesome and overwhelming list. Major kudos to whoever took the time to put it together. I wonder if there’s a way to tag these or group these into categories so that they’d be easier to bite into.


"Arxiv Sanity Preserver: Because things are seriously getting out of hand" --Andrej Karpathy

:)


An experimental RSS feed for it is now available: https://us-east1-ml-feeds.cloudfunctions.net/arxiv-ml-review...

It's configured to be updated once a day.


I agree. Though auto-generated, that someone chose to do this and share their curation with the world is wonderful.


Categorization would definitely be good to have, but it requires the use of good quality ML to discern them accurately :) Meanwhile, please use Ctrl+F.

A higher priority is to serve an RSS feed for the results.


RSS feed would be great! Even better would be to pull the paper’s content out into the content body of the feed, so I could read it directly in an RSS reader. Probably no easy task, but I can dream.


An experimental RSS feed for it is now available. Please find the link to it at the new project repo: https://github.com/ml-feeds/arxiv-ml-reviews


The feed is in the works, but its content body will only contain the abstract, not the full paper.


It's auto-generated; it says so at the top. I don't see the point of this at all, since you could reproduce it by simply searching for "survey" or "introduction". At minimum, a citation count would've helped distinguish the well-written ones from the poorly written ones.


The list of terms to include/exclude looks like it has taken some trial and error to compile: https://github.com/impredicative/arxiv-ml-reviews/blob/maste...

For example, it excludes "aerial survey", "peer review" and similar false positives.
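A minimal sketch of how such an include/exclude filter could work on paper titles. The term lists and function names below are assumptions for illustration, not the project's actual lists or code; only "survey", "introduction", "aerial survey" and "peer review" are taken from this thread.

```python
import re

# Hypothetical term lists for illustration; the project's real lists are longer.
INCLUDE_TERMS = ["survey", "review", "introduction"]
EXCLUDE_TERMS = ["aerial survey", "peer review"]


def _has_term(title: str, terms: list[str]) -> bool:
    """Return True if any term occurs in the title as a whole word or phrase."""
    lowered = title.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in terms)


def looks_like_review(title: str) -> bool:
    """Keep titles that hit an include term and no exclude term."""
    return _has_term(title, INCLUDE_TERMS) and not _has_term(title, EXCLUDE_TERMS)
```

One limitation of this simple design: a genuine review whose title also contains an excluded phrase would be dropped, which is probably why the real lists needed trial and error.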


It'd be easy to split them by year or subject, as that's provided by the arXiv.
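A rough sketch of that grouping, assuming each entry already carries its arXiv ID and primary category (e.g. cs.LG or stat.ML). The record layout, the `primary_category` field name, and the sample entries are hypothetical placeholders, not real papers. The year is read from the `YYMM` prefix that new-style arXiv IDs use.

```python
from collections import defaultdict

# Hypothetical records; IDs and titles are placeholders, not real papers.
papers = [
    {"id": "1901.00001", "primary_category": "cs.LG", "title": "Placeholder survey A"},
    {"id": "1812.00002", "primary_category": "stat.ML", "title": "Placeholder review B"},
    {"id": "1811.00003", "primary_category": "cs.LG", "title": "Placeholder overview C"},
]


def year_from_id(arxiv_id: str) -> int:
    """New-style arXiv IDs (2007 onward) start with YYMM, e.g. '1901' is January 2019."""
    return 2000 + int(arxiv_id[:2])


def group_papers(records, key):
    """Group paper titles by the value that `key` returns for each record."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["title"])
    return dict(groups)


by_year = group_papers(papers, lambda p: year_from_id(p["id"]))
by_subject = group_papers(papers, lambda p: p["primary_category"])
```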


Just how is the subject provided by arXiv? By subject, do you mean the categorization such as stat.ML, cs.AI, etc.?


Yeah, sure, why not? Just a tiny bit more curation would've made this actually useful. As it stands now, it's just an unordered list.


It's currently ordered by date, with most recent on top. It's not unordered.


I recently wrote a service that predicts the future citation counts of articles from arXiv and IEEE and uses them as a rank, which saves you from having to read every article. It's still a work in progress, especially the keyword filtering part. Link: https://www.notify.institute/


As Panoramix asked, could you share more details about your project? From your website, I only gathered that you analyze papers based on author info and the citation history of the authors' previous work, and that you filter papers with techniques such as topic modeling. Though the project is still in progress, could you share what you have done so far and what you plan to do next?


My model is based on these papers [1, 2, 3]. I found that adding paper metadata such as table count and page count improves my model's performance (the R^2 score for the citation count two years later). For now, I am working on a better filtering method using word embeddings, so that the keyword "CNN" would also match papers about convolutional neural networks.

1. Xiao, Shuai et al. “On Modeling and Predicting Individual Paper Citation Count over Time.” IJCAI (2016).

2. Dong, Yuxiao et al. “Can Scientific Impact Be Predicted?” IEEE Transactions on Big Data 2 (2016): 18-30.

3. Yan, Rui et al. “Citation count prediction: learning to estimate future citations for literature.” CIKM (2011).
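To illustrate the embedding-based keyword filtering idea, here is a small sketch that expands a keyword to nearby phrases by cosine similarity. This is not the commenter's implementation: the vectors are made-up toy values, and the `expand_keyword` name and the 0.8 threshold are assumptions. A real system would load vectors from a trained embedding model.

```python
import math

# Toy 3-dimensional vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "cnn": (0.9, 0.1, 0.0),
    "convolutional neural network": (0.85, 0.15, 0.05),
    "citation analysis": (0.0, 0.2, 0.95),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_keyword(keyword, embeddings, threshold=0.8):
    """Return the keyword plus every known phrase whose vector is close to it."""
    query = embeddings[keyword]
    return sorted(
        phrase
        for phrase, vector in embeddings.items()
        if cosine_similarity(query, vector) >= threshold
    )
```

With these toy vectors, `expand_keyword("cnn", TOY_EMBEDDINGS)` keeps "cnn" and "convolutional neural network" and drops "citation analysis".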


How does it work? Do you take into account the authors' previous success? You have a typo at the top of your page btw.


1) It predicts the citation count 2 years later using a mix of features from the article, its authors, and the venue where it was published.

2) Yes, via the H-index and previous citation count stats (mean, max, min). I find the most influential factors are the author's H-index, the publication venue, and the author rank.

3) Thanks, I just fixed it.
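For readers unfamiliar with the author features mentioned in point 2, the sketch below computes an H-index and the mean/max/min citation stats from a list of per-paper citation counts. The function and key names are assumptions for illustration, and this is not the service's actual feature code.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def author_features(citations):
    """Summary features for one author's prior papers (assumes a non-empty list)."""
    return {
        "h_index": h_index(citations),
        "mean_citations": sum(citations) / len(citations),
        "max_citations": max(citations),
        "min_citations": min(citations),
    }
```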


Someone needs to work on the deep learning problem of automatically curating these things and surfacing the important ones.

I suspect it's harder than the self driving car problem.



