A Reboot of the Legendary Physics Site ArXiv Could Shape Open Science (wired.com)
174 points by tonybeltramelli on May 12, 2016 | 70 comments



I currently really like the site as-is. Useful links to useful PDFs :) The article says it's a physics site, but I use it more for Mathematics and Computer Science.

The HTML is small and loads fast and automatically works on mobile because it's just simple HTML.

I hope the reboot won't mean huge bloated HTML+JS, and then a "mobile" version that has fewer features than what it has right now.

P.S. the wired article talks about arxiv.org but doesn't have a single clickable link that goes to it??!!


As the W3C core styles from the 2000s and even earlier can attest to, good page styling is timeless, and ArXiv can very likely improve its design with no extra bloat. Some like ultramarine and "modernist" really don't hold up, but others I can easily imagine on a well-designed page in 2016.

https://www.w3.org/StyleSheets/Core/preview.html


Exactly, not a single link to the site. I had to google to find it. Here it is, just in case: https://arxiv.org/


Welcome to modern web articles, where every link points back to more articles of the same website. For example:

> As one of the first open access science sites, this redesign may influence the paths of its younger siblings, like biologists’ bioRxiv whether they follow the same route or rebel against it.

There's a link in the article.. which links to a Wired article on bioRxiv[0], but not to bioRxiv[1] itself. And before you ask: no, the article on bioRxiv does not link to bioRxiv.

[0] http://www.wired.com/2016/02/the-rainbow-unicorn-trying-real...

[1] http://biorxiv.org/


Can't let that precious link juice spill out to other websites! Don't you remember link juice? Bullshit SEO tactics are killing the web.


Agreed, the juice stinginess is lame.

They could've at least added a useful outbound link with rel="nofollow" to be helpful while avoiding leaking any of their precious juice.


My favorite is pop science reporting never linking to the press release for the publication that it's pulling from, much less to the publication itself. Often the title of the article isn't even mentioned.


Interesting to see their approach and the reasons why they are not building more features on top of ArXiv. Although comments on papers might be a dangerous area to venture into, there are definitely places on the web where the ability to annotate and comment on papers is helping science move forward. A good example is the Polymath project and Terry Tao's blog. Tao's recent solution to the Erdős discrepancy problem, an 80-year-old number theory problem, was actually triggered by a comment on his blog. Another example is www.fermatslibrary.com. Although the papers on the platform are more historical/foundational, they were able to get consistently good/constructive comments that help people understand papers better.


Would you elaborate on why you think comments on papers might be dangerous? For me it seems like the most natural step forward, given where we are today.

As I see it, it would solve many issues. Ideally, we would move away from a publication-count metric, and more onto a reputation-based metric. It would lower the bar for participation in scientific discussion, make reproducibility more important, and generally be a healthy thing for science as a whole, I think. But I want to hear opposing viewpoints!


I don't think comments would be useful. To make a proper comment on a paper requires a significant amount of work (as does refereeing). Lowering the barrier to entry would produce some first impressions, the usual moans about lack of citation and misunderstandings. If a paper deserves a serious comment, I think the proper format for that is another paper.

Although citations are horrible metrics, I think some sort of reputation ratings based on quick likes would be even worse and even easier to game.


>To make a proper comment on a paper requires a significant amount of work (as does refereeing).

Let's not fool ourselves: not all refereed papers are great, not all of them are significant, and not all of them are published in high-impact journals. Most of the reviewing is cursory, menial, and often delegated to postdocs/PhDs. Most scientific topics are highly specialized, and you're unlikely to see trolls bothering to comment on them. Writing another paper as a response is not a solution either: the pace of article publishing is months and years, not seconds.

An example: I have often found the discussion of scientific papers here on HN to be illuminating, clarifying or countering issues, and offering a wider perspective that is often not mentioned in the article itself. And it's not like everyone on HN is a luminary, just mostly inquisitive people. I also see articles discussed in a useful way on Twitter. Why can't we open up this discussion?

I don't think comments are to be taken as reputation metrics either. But I do think a simple, helpful open discussion section is missing from every paper that I have read. I believe the main barrier to it is that academics do not want to get off their high horse of untouchability.

To be precise, many journals do have a comment section, but nobody uses it.


>To be precise, many journals do have a comment section, but nobody uses it.

GitXiv[1] also has a comment section and voting features, and nobody uses it.

I know it has a small user base compared to arXiv, but I think that its format resembles what arXiv might look like with comments, votes and reproducible code.

[1]: http://gitxiv.com/


> GitXiv[1] also has a comment section and voting features, and nobody uses it.

Likewise SciRate is a reasonably popular arXiv overlay, but the comments are never used: https://scirate.com/


> If a paper deserves a serious comment, I think the proper format for that is another paper.

That's the thing, though - speaking as a former academic, the reaction when you see a paper that needs serious commenting is mostly "I'll just ignore this paper, since it has these issues", and you continue working on your own stuff - which again is because of the need for citations. If comments had larger effects on your career, they would be worthwhile making. I would guess the effect would be fewer papers and more 'collaborative science' where you start from a paper and either suggest improvements or improve it / replicate it yourself. I don't see the current proliferation of papers as a good thing.

As for the worry that there will be too many 'laymen' doing the commenting, there could (as suggested elsewhere) be a verification process - i.e. there could be a 'Verified Ph.D.' (or some other measure of merit) comment section in addition to the 'layman' section.

There could also be a 'replication' section, where replication efforts will be rewarded - possibly by giving the replications some fraction of the paper's total 'reputation'.

In short: If comments are made to really count for scientists, I think there will be a more healthy scientific process going on. Viewing a paper as an evolving thing is IMO more in line with how science does work anyway - a result should be replicated before it's accepted, which is not the case now.


We need only to look to journals to see counter-examples to your proposition that "If a paper deserves a serious comment, I think the proper format for that is another paper."

For example, a journal might have a "Letters to the Editor" section for people to voice a serious comment, even though such a letter is not "another paper".


The only thing you have to do to make comments worthwhile is restrict them to actual academics. Even if they went on to post shit, it would only destroy their own reputation.


This destroys one of the potential benefits of comments -- which is to open up what might be closed communities. Academic disciplines can at times turn into echo chambers.

But there is the rub: if you make comments open, then you will probably be flooded with low-quality comments. If you close them up, then you fail to get the benefits of openness.


There are plenty of professionals who would be able to contribute, but are not academics affiliated with a university.


As somebody in academia, I would see comments as potentially very dangerous. A comment system could be very easily abused by established researchers in some areas to prevent others from entering their niche sub-field. Do not get me wrong: the majority of scientists I know have very good professional integrity. However, I have also witnessed cases where established professors tried in all possible ways to prevent people from competing with them by discrediting or delaying the publication of very solid science. The arXiv as it is now is a great equalizer, because papers are presented without any strings attached and this gives a chance to everyone (or at least to experts) to judge the quality of the various works without preconceptions.

Here is a fictitious example of what could go wrong. I am a theoretical astrophysicist. I am not established (I am not a tenured professor). Suppose that I come up with a good model to explain some astronomical observations. What I would want to do is to write a paper directed to my astronomer colleagues explaining how my model works and why it is better than competing models. I want to convince them to work with me to analyze their data. As it is now, I would post it on the arXiv, and the astronomers would probably find it, read it and evaluate it. However, if the arXiv had comments, a single negative review by a more established theoretical astrophysicist would be enough to discourage any astronomer from even reading my paper. Remember that in the astro-ph section of the arXiv there are of the order of 100 new papers per day and we can realistically read only 1 or 2 papers per day on average. In this situation, the chances of my work being completely ignored because of that one comment would be very significant.

I think that the current channels for commenting on scientific work, private email and/or rebuttal papers, are perfectly adequate.


"A single negative review by a more established theoretical astrophysicist would be enough" to discredit his opinion completely if your methods and results prove useful to the community. And besides, he can do that anyway if your paper happens to fall in his hands for review, which is a much more vicious system, because all the backstabbing happens in private. I disagree with your premise here.


Just imagine having to scan through 100 articles every day looking for new ideas. You find something that sounds interesting from somebody you do not know personally (happens to me all the time). The topic interests you, but you are not necessarily an expert in that particular subfield (because you are an observer, a "user" of models, not a "developer"). Judging its quality would take away one day of your work. Clearly, if there is a comment from a famous established professor saying that the work is wrong, you will happily forget about it.

It is also true that a famous professor can backstab you when reviewing your paper. But, first of all, your work is already on the arXiv, so everybody already had a chance to form their own opinion. Secondly, in a peer review process the reviewer cannot arbitrarily reject papers. It does not work like that. He/she has to provide good motivations for his/her recommendations. Reviewers can also be challenged to the editors, who can ask for second or third opinions.

Finally, as others commented, to truly disseminate your work you need to go out and engage the community, giving seminars and talks. I couldn't agree more with this. My issue with that, and I speak as a privileged person because I work at one of the top universities in the USA, is that going to conferences, giving seminars and so on is way easier if you come from one of the top places. You have funds for traveling, you have had a lot of chances to network with the right people (when they visited your institution, for example) and so on. It is much more difficult if you come from lesser-known groups or from abroad. The system as it is now already strongly favors people working in the top research universities in the USA and Canada.

The arXiv is great because it puts everybody at the same level. It ensures that the best ideas have a chance to come out, independently from their origin. I wouldn't want the scientific discussion to be dominated by few loud voices.


> The arXiv is great because it puts everybody at the same level. It ensures that the best ideas have a chance to come out, independently from their origin.

With arXiv as a basis for comparison, what is your impression of what the other / older methods of dissemination overvalue and undervalue? For example, my impression is that the other methods favor top U.S. research institutions, but maybe that's realistic; maybe the older methods actually undervalue top institutions (despite my egalitarian fantasies). Maybe gender or experience or position or scope or novelty or other things are over/undervalued.

I do notice that in scientific research, institution is almost a surname in people's identities. It's always 'Jane Doe of Harvard'; it seems like it might as well be 'Jane Doe Harvard'.


I wouldn't say that traditional journals in my field are biased in favor of top U.S. research institutions. The bias comes from the readers. Someone reading articles that are not directly connected with their research has a hard time judging whether the results are solid or not. It is unfortunately natural that scientists, often without even realizing it consciously, end up using proxies, such as the author's affiliation, when evaluating papers. My worry is that adding comments to the arXiv might institutionalize these biases and contribute to the formation of a more closed and elitist scientific community.


I think his point is that, because of that negative comment, his methods and results will never actually be put to the test of whether they will be useful to the community; however, given the volume of new papers coming out, it is unlikely for that paper to be picked up by the community as is, with or without comments. The best way to get people interested is by establishing direct relationships (ideally in person: go present at a relevant conference, engage with people in the community).


> I disagree ...

With due respect, do you have experience inside the world of theoretical astrophysics research or something similar? Do you know how it actually works or are you speaking ... theoretically (ahem)?


It changes the nature of the site.

I fully think open peer review, in its various forms, is awesome and should be pursued, but the arxiv serves a more fundamental need as well: to simply make the papers available. Within the fields where arxiv is established this is not an issue, but we need to get adjacent fields, and really everyone, to agree to a publicly available database of research first. There is still considerable reluctance in some quarters to upload to the arxiv, as I found out after changing fields.

The arxiv should be a pure database, then we can think about various ways to build on top. That could be arxiv overlay journals or journals that try out more fancy, new peer reviewing systems that might include curated comments [0].

As long as there is considerable disagreement over what the right format for serious commenting on papers is, arxiv should not be taking sides.

[0] http://www.earth-system-dynamics.net/peer_review/interactive...


> might be a dangerous area to venture into.

The only danger is that the true obnoxious nature of some academics would be aired in public. There is a lot to gain from opening up a discussion in each paper. Asking questions, making clarifications, even suggesting improvements are things that are not possible to do now.


> The only danger is that the true obnoxious nature of some academics would be aired in public.

I'm not sure how you can say that. Generally, the comments on the vast majority of internet sites are pretty low quality, and even insightful and useful ones are difficult to find because of all the noise.


The vast majority may be useless, but a significant minority are valuable (e.g. maybe ~ 10% in HN). Why throw away that value?


Because right now we have a system that works. Email the author. I've ALWAYS gotten a response.


I think email (private communication) will always be useful.

However, having some general questions publicly available would arguably also have utility.


There are already established forums for discussions, like mathoverflow.


> Why throw away that value?

Because others can be extremely negative - to the point where they may put some people off publishing in open forums.

The very small amount of value that they may add is easily outweighed by a single person not publishing.


I do not believe that. On the other hand not all publishing is valuable: http://www.nature.com/news/the-pressure-to-publish-pushes-do...


> Laba says arXiv should focus on its core mission. It’s the job of the journals to moderate, she says, and better for third-party sites to handle the discussion.

I actually agree with this.

Though I do think comments and peer review would be great for arxiv papers, I think it should be done by someone as a web wrapper.


Hypothesis (OpenAnnotation) comments (and highlights!) work for any URL.

https://hypothes.is/

https://hypothes.is/embed.js

http://www.openannotation.org/spec/core/


One argument for conservatism: if physicists want a commenting system, why hasn't SciRate taken off?: https://scirate.com/ The interface is clean, it has a lot of nice features, etc. It's gotten some traction in quantum info (my field), but the comments are rarely used. You can chalk this up to network effects (chicken-and-egg), which direct arXiv hosting could solve, but that problem can probably be eliminated by taking the minimal step of allowing the authors to link to the SciRate page from the arXiv page.

No, I think the major feature about having the arXiv host comments directly is that it would force physicists to engage with folks who comment on their paper, because otherwise there would be unanswered criticism attached ("stapled") directly to their public-facing papers. Maybe that's what we want, but that's a huge step and there are a lot of ways for it to go badly.

Here is a mockup that Paul Ginsparg made for author-curated links on arXiv article pages. See the "Author suggestions" column on the right: http://www.cs.cornell.edu/~ginsparg/arxiv/1212.3061-mockup.h... Allowing links to places like SciRate plausibly solves most of the chicken-or-egg problem by allowing the author to designate "the" place for public discussion without picking a particular site to win or lose, or bundling release of a paper with forced public discussion.

Shameless plug: folks here may be interested in my blog post about the future of academic papers and how the arXiv influences that: http://blog.jessriedel.com/2015/04/16/beyond-papers-gitwikxi... There was good discussion on HN last year when it was posted: https://news.ycombinator.com/item?id=9415985


I am a non-academic who reads a lot of academic papers. I love arXiv, and I probably spend too much time on there.

Here's my 10-minute take on scirate (from viewing it on a mobile browser). Overall, I don't immediately see any added value. The layout and design isn't great - not on mobile at least. The main view isn't optimized for what I want to see. There's a huge navigation tree that takes up all the view space. Then, it shows me a list of ranked papers that are not of interest to me. It forces me to register to access any features. I don't see comments on any of the papers, only a Reddit-style rating. Do I have to register to see the comments? Overall, I don't think I'd ever go back. But, maybe it wasn't designed with my type of use cases in mind.


Thanks for the thoughts. (Hope it's clear I have no connection to SciRate.) The link to comments is directly under the Reddit-style rating, although it only appears if there actually are comments. Most papers don't have comments, and you have to be logged in to comment.

Personally, I don't expect it to be nice on mobile and I don't want it to be effortless to comment. The amount of work it takes to make a useful academic comment dwarfs those tiny frictions.


I hope they take inspiration from the Arxiv Sanity Preserver: http://www.arxiv-sanity.com/

Also, ArXiv has a lot more than physics. Here's a map over all the papers: http://paperscape.org/


Paperscape looks like a really useful tool for exploring a subject. Thanks for mentioning it.


That preserves sanity? The tabs on the homepage are inverted, it does everything but preserve my sanity.

It might be a bug though.


So arxiv served about 105000 different articles last year. I'm curious how much it cost and how that compares with what the big publishing houses tell us that should cost for their version of "open" publishing.


The arXiv 2016 budget is online: https://confluence.cornell.edu/display/culpublic/arXiv+Susta...

They estimate spending just over a million dollars this year, funded via members (university libraries contributing a few thousand dollars each) and grants.

However, as I always point out, we must remember that the costs of a published journal are higher for reasons other than profit: journal articles are typically converted to HTML and XML via semi-automated processes with manual proofing and tweaking, production staff and some editors (for the biggest journals) are paid salaries, peer review involves chasing down deadline-abusing academics over weeks or months and building some kind of review tracking system, articles have to be reliably archived (e.g. lockss.org), and so on.

Then add that most journals are still printed and distributed by mail for some reason, and you have extra full-time staff and equipment.

I wouldn't want to throw all of that away and just keep the arXiv. Ditching print makes sense, some things can be better automated (we need better tools than LaTeX or Word to prepare papers, so they can easily be produced in PDF, HTML, or semantic XML form), and some services are unnecessary (I've had journals copy-edit my articles and make no useful changes), but I don't want to just read crappily-formatted base LaTeX from the arXiv with no frills either.

(Incidentally, PLOS is non-profit and still has to charge over $1000 per article in PLOS ONE, to pay for the professional staff and all the manual manuscript conversion gunk. Biologists love Word, so converting must be hell.)
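For a rough sense of scale, here is the back-of-the-envelope arithmetic using only the figures quoted in this thread. The inputs are rounded assumptions (a budget of "just over a million dollars", "about 105000" articles, a fee of "over $1000"), not audited numbers, and the two costs do not cover the same services, so treat the ratio as an illustration only.

```python
# Back-of-the-envelope comparison; every input is a rounded figure from this thread.
ARXIV_BUDGET_USD = 1_000_000       # "just over a million dollars" (rounded down, assumption)
ARXIV_ARTICLES_PER_YEAR = 105_000  # "about 105000 different articles"
PLOS_ONE_FEE_USD = 1_000           # "over $1000 per article" (lower bound, assumption)


def cost_per_article(budget_usd: float, articles: int) -> float:
    """Average yearly budget spent per article."""
    return budget_usd / articles


arxiv_cost = cost_per_article(ARXIV_BUDGET_USD, ARXIV_ARTICLES_PER_YEAR)
ratio = PLOS_ONE_FEE_USD / arxiv_cost

print(f"arXiv: ~${arxiv_cost:.2f} per article")
print(f"PLOS ONE fee is roughly {ratio:.0f}x that")
```

That works out to a bit under $10 per article for the arXiv, about two orders of magnitude below the PLOS ONE fee, which is consistent with the point above that the difference pays for staff, conversion, peer-review logistics and archiving rather than hosting.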


What's wrong with LaTeX (or extensions like XeTeX or LuaTeX)? Cursory searching pops up tools such as LaTeXML (http://dlmf.nist.gov/LaTeXML) for converting from LaTeX to XML; not that I've used it, but it seems to have about a 60% success rate at converting arXiv papers [1]. That's still about 20,000 papers that don't build successfully, but it's a tool that could be improved.

What would you suggest as an alternative to LaTeX or Word?

I'm just curious; I use LaTeX regularly and think it works just fine.

[1] http://arxmliv.kwarc.info/


I'm a big LaTeX fan, but the core typesetting engine is pretty heavily focused on print layout, so you'd have to restrict your macro subset to produce clean XML reliably. (You'd also have to see if you can produce JATS, the standard XML dialect for journal articles.)

I don't know of any good alternatives. LaTeX succeeded because of its excellent extensibility, so the core can be used to produce almost anything. I'd like to find similarly extensible tools that can cleanly produce XML as well as PDF. Maybe Pollen (http://docs.racket-lang.org/pollen/) will be one.

There's also Scholarly Markdown (http://scholarlymarkdown.com/), but I suspect Markdown's lack of easy extensibility will doom it, since you'd have to write documents that fit narrowly into the features they provide. (What if you want a "remark" environment and they don't provide it?) reStructuredText is also an option, since it's built for extension, but it's not very well-known outside of Python circles.


It served that many articles, but none (repeat, none) of them are peer-reviewed just by being there (unless a journal has reviewed them after they were posted on the arxiv).


In most arXiv-supported fields (if not all), people do peer review for free. It's assumed to be part of your academic job.


Yes, I know, but they need to be contacted and that is part of their curriculum as well (referee for XXX, YYY, etc.). And the administrative nightmare of peer review is not to be taken lightly.


I find that these discussions of "to comment and like" or "not to comment and like" always miss the mark. Both sides of the discussion have solid arguments, and I find it hard to imagine the discussion ever resolving itself, thereby keeping a status quo no one is really satisfied with.

The problem is that most people's frame of reference when talking about comments and "likes" is the facebook, reddit or [insert some publisher] model. That is all fine and dandy as a starting point, but let's not forget that all of those models are specifically engineered to keep people posting and liking as much as possible rather than focusing on like and comment quality. There is no reason, or at least I don't see one, why a different model focusing on like and comment quality could not be engineered. Furthermore, if such a model could be engineered well enough, I don't see why a comment and like section of arxiv could not contribute to furthering science as a whole. With that in mind, a much more fruitful and positive discussion could be had about "how do we create a comment and like section where everyone's interests are protected?" rather than the old back-and-forth discussion of easing the barrier to entry vs quality.

As others have noted much better starting points for the above discussion could be the stack-overflow model or something similar.


If this happens, there's a lot to be learned from Stack Exchange on how to do this well. The stats.stackexchange.com site is actually pretty high quality IMHO.

Arxiv could use some of that reputation & moderation structure to regulate content. Cranks and time-wasters could be sidelined. It would certainly work much better than a typical newspaper comments section.

It would work by establishing expectations of how much substance should be in a response to a paper. Then getting the community to pitch in on the quality of responses. You still get echo-chamber effects with this, but the quality filter is generally worth it.
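To make that concrete, here is a minimal sketch of the idea: a substance threshold plus community votes weighted by the voter's reputation. Everything in it is hypothetical (the `Response` model, the `MIN_LENGTH` cutoff, the scoring rule); it is not how Stack Exchange actually computes anything, just one way the filter could look.

```python
from dataclasses import dataclass


@dataclass
class Response:
    author: str
    text: str
    votes: list[int]  # reputation of each upvoter (hypothetical data model)


MIN_LENGTH = 200  # hypothetical "substance" threshold, in characters


def score(response: Response) -> int:
    """Sum of upvoters' reputation; responses below the threshold score zero."""
    if len(response.text) < MIN_LENGTH:
        return 0
    return sum(response.votes)


def rank(responses: list[Response]) -> list[Response]:
    """Return responses with a positive score, highest score first."""
    scored = [(score(r), r) for r in responses]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]
```

A character count is obviously a crude stand-in for "substance", and weighting by reputation is exactly where the echo-chamber effect comes from, so the threshold and the weights would be the things to argue about.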


I really like the idea of using something similar to the stack overflow model to help with peer review. It could open up the peer review process to more people for a longer period of time, giving a higher chance that more people will contribute if there is interest and a higher overall quality of discussion. All contributions give you a clear increase in reputation points.

Groups of academics could band together to form online-only journals on a specific topic and then allocate medals to papers that are considered accepted. So while there may be many loose papers out there with few or no comments (like on arxiv currently), only a few will get the seal of approval.


I would fear fabricated likes and positive comments about "our" papers and fabricated un-likes and negative comments about "their" papers; academic cliques and individual professors can leverage their students to pollute any StackExchange-like mechanism, and if they have too few slaves there are plenty of PR firms to do the job.


FWIW I'd like to see this kind of model join the publishing landscape. It's galling that academics do peer review unpaid for the most part, only for publishers to get in the way of access. Open peer review definitely needs working on.


I've written up a few ideas about PDFs, edges, and reproducibility (in particular); with the Hashtags #LinkedReproducibility (and #MetaResearch)

https://twitter.com/search?q=%23LinkedReproducibility

https://twitter.com/search?q=%23MetaResearch

- schema.org/MedicalTrialDesign enumerations could/should be extended to all of science (and then added to all of these PDFs without structured edge types like e.g. {intendedToReproduce, seemsToReproduce} (which then have specific ensuing discussions))

- http://health-lifesci.schema.org/MedicalTrialDesign

- there should be a way to evaluate controls in a structured, blinded, meta-analytic way

- PDF is pretty, but does not support RDFa (because this is a graph)

... notes here: https://wrdrd.com/docs/consulting/data-science#linked-reprod...

(edit) please feel free to implement any of these ideas (e.g. CC0)


> a way to evaluate controls

a way to evaluate premises (assumptions, controls, data, data transformations) and conclusions (presented as e.g. JSONLD, RDFa in a standard form (potentially like IPython/Jupyter .ipynb; but with an OrderedMap of I/O sequences with fixed #urifragment IDs)
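As an illustration of the typed edges suggested upthread, here is a minimal sketch that emits a JSON-LD record linking a replication to the paper it targets. The `intendedToReproduce` and `seemsToReproduce` terms are taken from the comment above and are not part of schema.org; the `ex:` namespace and the example.org URLs are placeholders I made up for the sketch.

```python
import json

# Hypothetical vocabulary: the two edge names come from this thread, not from schema.org.
CONTEXT = {
    "schema": "http://schema.org/",
    "ex": "http://example.org/reproducibility#",  # placeholder namespace (assumption)
    "intendedToReproduce": {"@id": "ex:intendedToReproduce", "@type": "@id"},
    "seemsToReproduce": {"@id": "ex:seemsToReproduce", "@type": "@id"},
}


def reproduction_record(paper_url: str, target_url: str, succeeded: bool) -> dict:
    """Describe one paper's relation to the study it tries to reproduce."""
    record = {
        "@context": CONTEXT,
        "@id": paper_url,
        "@type": "schema:ScholarlyArticle",
        "intendedToReproduce": target_url,
    }
    # Only assert the stronger edge when the replication appears to have worked.
    if succeeded:
        record["seemsToReproduce"] = target_url
    return record


doc = reproduction_record(
    "http://example.org/papers/replication",  # placeholder URLs
    "http://example.org/papers/original",
    succeeded=True,
)
print(json.dumps(doc, indent=2))
```

Keeping the two edges separate means a failed replication still records what it was aiming at, which is the part that usually gets lost in a PDF.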


I just hope they don't wreck ArXiv in the process.


Aggregation and curation should remain separate. Perhaps the most useful thing would be for arxiv (or some other entity) to collect links from known curation sources to aid in the evaluation of papers.


Weird, but the article seems to be paywalled.


No, they just detect if you're using an adblocker and put up a block screen if you are.

Sorry Wired, but too many newspaper websites have abused their advertising privilege for me to ever read a newspaper without an adblocker again.


How is trying to detect what kind of software I'm running not a violation of the CFAA https://ilt.eff.org/index.php/Computer_Fraud_and_Abuse_Act_%... , especially the exceeding authorization part? IANAL and I'm not from the US, so I guess it doesn't really apply to me. Still, I won't be sad the day Wired and co. go down the drain.


I never browse arxiv, but access it a lot through other services.

I kind of like that model. Have the site being good at what it is best at, and help other services provide further services, such as search (google scholar) or stricter moderation and commenting and discussion.


I am not sure why it is called "Legendary". What's the legend here?


Like many words, "legendary" has multiple meanings, and the correct one needs to be inferred from context. A dictionary can help if something, like this, doesn't make sense. For example, http://www.merriam-webster.com/dictionary/legendary says:

> 1: of, relating to, or characteristic of legend or a legend

> 2: well-known, famous

It seems you only knew of the first definition, when the article uses the second.


posted behind a paywall on wired...


Clean Firefox (no addons) from Italy, no paywall or ad-wall.

I have the no-tracking enabled in the FF preferences and I'm on Ubuntu MATE 16.04 LTS.


Workaround: open the link in a private window.


no paywall for me - is it geotargeted?


I see an anti-ad wall, when I whitelist the page or turn off the adblocker it works


What's wrong with that? As far as I know, the point of open science is that research publications should be open, not everything else, too. For non-science publications like newspapers I actually prefer paid access over ad-supported access.



