Show HN: Arxiv.org on IPFS

onhn · on Sept 7, 2021

"You can upload your research and publish it on the open web. Members of the community will be able to vote on your research to raise its visibility."

Oh dear.

Taek · on Sept 7, 2021

How would you set it up? The decentralized world doesn't really have a great system for curation at this point (unless you can point to a counterexample!), and so I'm in favor of any sort of playing around with decentralized voting/curation until we find something that seems to be working well.

SkyMarshal · on Sept 7, 2021

Start from the objective of first do no harm. Voting systems may eventually be gamed to distort results, so eliminate the voting system. Instead rely on ad-hoc personal networks to disseminate signal about quality papers out-of-band. Don’t assume you have to systematize everything.

jamescampbell · on Sept 7, 2021

Voting (as was borne out in many examples, including digg.com and elsewhere) becomes a mob-rule situation and a variation of tyranny of the commons without a novelty algorithm in addition to total votes. If you just go by totals, it will be easily gamed and rendered useless as a metric.
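
A minimal sketch of what "a novelty algorithm in addition to total votes" could look like, assuming a simple time-decay score. The `Paper` type, the `gravity` exponent, and the `+ 2.0` offset are illustrative assumptions, not anything the project is known to use:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    title: str
    votes: int
    age_hours: float


def decayed_score(paper: Paper, gravity: float = 1.8) -> float:
    """Discount votes by age so old, heavily voted items do not dominate forever.

    The exponent and the +2 offset are assumed tuning constants.
    """
    return paper.votes / (paper.age_hours + 2.0) ** gravity


def rank(papers: List[Paper]) -> List[Paper]:
    """Order papers by decayed score, highest first."""
    return sorted(papers, key=decayed_score, reverse=True)


old = Paper("old", votes=500, age_hours=240)
new = Paper("new", votes=20, age_hours=1)
ranked_titles = [p.title for p in rank([old, new])]
```

With raw totals the ten-day-old paper always wins; with the decay applied, the fresh paper with far fewer votes ranks first. This only addresses staleness, not vote manipulation.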

onhn · on Sept 7, 2021

The standard and most effective form of curation in science is the reference list at the end of a paper.

But usually you just read everything that is relevant to your research interests from the daily arxiv posting.

f0e4c2f7 · on Sept 7, 2021

A perfect system? No, but think about how people must have felt about Wikipedia on launch.

Love this idea.

bluish29 · on Sept 7, 2021

Actually I don't think science has a democratic nature. Yes, we do that to some extent, as a theory still needs to be widely accepted. But in reality one person can have the correct idea while all others disagree. Still, this person is doing it right.

lmohseni · on Sept 7, 2021

I believe science is a democratic process. If someone has the correct idea but communicates it poorly, so poorly that others in the field disagree, then this person is doing it wrong. (Thinking specifically of https://en.m.wikipedia.org/wiki/Shinichi_Mochizuki )

nextaccountic · on Sept 7, 2021

The participation ought to be democratic in the sense of being open to everyone to participate. But, you can't do a vote and use it to decide who is right. Deep down we know that being right or wrong is independent from the scientific consensus. Mochizuki may be interacting with the scientific community in the wrong way, but it has no bearing on whether his theory is correct.

The consensus itself has some democratic features, but it's weighted by prestige and adherence to the current paradigm. I think Kuhn described its mechanism pretty well. It's far easier to convince people of a wrong result if you follow the established paradigm than to convince people of something right if you go against it. What really saves science from being pure dogma is that there are paradigm shifts, revolutions in which the scientific consensus changes.

hugoroussel · on Sept 7, 2021

All in all it is a non-trivial problem. At the very least, you have to attach some form of reputation system to the verification process. Even with that you will still have the "misunderstood genius" issue, or the "excellent reputation professor" that everyone trusts without (enough) verification.

elcritch · on Sept 7, 2021

But at least there’d be a system for other researchers to record “failed to replicate”, which could give a channel to critique reputable professors that’s not controlled by those same professors (as journals often are).

wongarsu · on Sept 7, 2021

Scientific consensus is democratic in nature (even though votes are not distributed evenly). The ideal is that through reproducible experiments and application of the scientific method the scientific consensus moves to increasingly accurate models of reality over time. But obviously the speed at which that happens varies, and some right ideas took annoyingly long to get accepted into scientific consensus.

kloch · on Sept 7, 2021

Sure the right answer will eventually prevail but the process is much worse than we like to admit. Many breakthrough advances were outright rejected by contemporary peers when first proposed.

"Fermi first submitted his "tentative" theory of beta decay to the prestigious science journal Nature, which rejected it "because it contained speculations too remote from reality to be of interest to the reader." Nature later admitted the rejection to be one of the great editorial blunders in its history. ... Fermi found the initial rejection of the paper so troubling that he decided to take some time off from theoretical physics, and do only experimental physics" https://en.wikipedia.org/wiki/Fermi%27s_interaction

gambler · on Sept 7, 2021

Using Wikipedia as an example of a seemingly naïve idea that was ultimately proven to work is a pretty bad argument that completely ignores how Wikipedia operates at the moment.

It's routinely used for propagating smears:

https://odysee.com/@AlisonMorrow:6/how-wikipedia-decides-if-...

Even one of its co-founders says it's failing as an accurate source of information:

https://odysee.com/@TimcastIRL:8/former-founder-of-wikipedia...

Just like Jaron Lanier predicted in 2006:

https://www.edge.org/conversation/jaron_lanier-digital-maois...

I never understood why so many technologists vehemently defend a website that was obviously prone to a form of "regulatory capture" and groupthink.

chalst · on Sept 8, 2021

Larry Sanger has made something of a career out of being "the cofounder of Wikipedia who thinks it's getting it all wrong". There's a point at which the latest iteration of his criticism ceases to be a stop-the-presses newsworthy event.

Sanger wrote a great set of essays, largely based on the lecture notes of courses he taught as an academic, that seeded Wikipedia with a load of freely licensed content that kickstarted the whole enterprise. It's quite possible that without this initial burst of momentum, Wikipedia would have failed. For that he has earned and will never lose recognition. But the negative part of his critique of Wikipedia is no more searching than the critique Wikipedia editors perform on themselves without his help, and his series of suggestions for positive alternatives has lost credibility because his ideas never work.

I still pay attention to what Sanger says, but not with a high expectation that what he says will be exceptionally insightful.

bigphishy · on Sept 7, 2021

In all my experience using wikipedia it has been successful at providing facts and accurate references.

I don't mean to attack the speaker here, but that former cofounder of wikipedia you just cited... isn't he an extremist neo-conservative? Why did he leave wikipedia in the first place? What are his proposed solutions?

rglover · on Sept 7, 2021

https://www.youtube.com/watch?v=GA0l1JXhLaI&t=19s

baby · on Sept 7, 2021

Sounds amazing to me

sebmellen · on Sept 7, 2021

Hey Hugo, do you have an email I could reach you at? I've been thinking/working on these problems for 3 years now and would love to find some smart people to partner with to further develop the ideas. I don't have much to show publicly right now, but https://intpub.org/ (soon to be scipub.app) is the start.

betwixthewires · on Sept 7, 2021

Oh wow I'm going to be watching your github repo, what you're doing looks interesting.

Do you have any plans to implement some search functionality to help find documents, or is it about permissionless publishing only?

sebmellen · on Sept 8, 2021

Great question! Search is (I believe) a second-order problem. Once permissionless publishing is solved, extensions can be built onto the core protocol.

In my opinion, indexing and ranking services work much better as ancillary services. The bundling of indexing and ranking with core protocol features is one of my main contentions with the Coinbase-backed "Research Hub" project/company (https://researchhub.org).

If you think about it, journals are important services, but they serve two functions right now:

1. Publication/information & data storage

2. Information indexing and ranking

Really, I think the whole contention about open/closed science is directed in the wrong place. Journals shouldn't be hosting information! They should be indexing and ranking information.

The information should be stored on an open, decentralized substrate where no one needs to go through authentication steps to view it. Then, journals can maintain their closed/open persuasions and offer bundled services and discovery and maintain their clout without really needing to change their core offering.

Also, the git repo link needs to be updated, and there's nothing there right now really, but https://github.com/scipubapp is the right link!

eecc · on Sept 7, 2021

Also keep me in the loop, plz :)

sebmellen · on Sept 7, 2021

For sure! Will do.

T-A · on Sept 7, 2021

From the repo [1]:

"research publishing platform that is community based, transparent and censorship resistant" (my emphasis)

"Community members moderate the platform and can increase or decrease the visibility of the uploaded files" (again, my emphasis)

[1] https://github.com/hugoroussel/xirva

prvc · on Sept 7, 2021

>censorship resistant

My emphasis.

hevalon · on Sept 7, 2021

Just a heads up comment, it seems some of the IPFS links are broken. For example I just visited https://www.xirva.org/categories/cs.AI and all links point to an "undefined" path example https://ipfs.io/ipfs/undefined/2107.00082.pdf

But overall, great idea!

hugoroussel · on Sept 7, 2021

Not all articles are uploaded yet, it is still WIP.

hevalon · on Sept 7, 2021

That's fine. Maybe then, for a better user experience, grey out or remove the IPFS download button for the articles that haven't been uploaded yet.
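
A minimal sketch of that guard, assuming the link is built from a per-article CID that may be missing. The function name `ipfs_pdf_url` and the placeholder CID are hypothetical; only the URL shape is taken from the broken links reported above:

```python
from typing import Optional

GATEWAY = "https://ipfs.io/ipfs"


def ipfs_pdf_url(cid: Optional[str], arxiv_id: str) -> Optional[str]:
    """Return a gateway URL, or None when the article has no CID yet.

    A None result tells the caller to grey out or hide the download button
    instead of linking to ".../ipfs/undefined/<id>.pdf".
    """
    if not cid or cid == "undefined":
        return None
    return f"{GATEWAY}/{cid}/{arxiv_id}.pdf"


# "bafyPLACEHOLDER" is a made-up stand-in, not a real CID.
uploaded = ipfs_pdf_url("bafyPLACEHOLDER", "2107.00082")
missing = ipfs_pdf_url(None, "2107.00082")
stringly_missing = ipfs_pdf_url("undefined", "2107.00082")
```

The explicit check for the string "undefined" covers the case where a missing value was already interpolated into text upstream.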

meroje · on Sept 7, 2021

links ? everything is a fake js button

imback · on Sept 7, 2021

Slightly OT, but how is IPFS these days? Every time I've tried using it the past few years it seemed very WIP. Cool sounding protocol though.

Taek · on Sept 7, 2021

It's still very slow - retrieval times >30s for files that aren't cached on cloudflare or ipfs.io. Also, those two providers each have multiple periods of downtime every year.

If the files are cached and the services are up, it's plenty fast for static data but dynamic data (IPNS) is still very slow.

We've built a competitor called Skynet that is much faster (less than 200ms for files that aren't in the cache) and scales better. It's currently hosting tens of millions of files across 200+ TB of data.

We really like the vision that IPFS had and we think decentralized data is the future of the Internet. We're proud to have put in the legwork to make it practical.

https://docs.siasky.net/

amelius · on Sept 7, 2021

Why do you need a blockchain to implement Skynet?

Taek · on Sept 7, 2021

The blockchain gets us a decentralized marketplace for decentralized storage providers. Anyone can join as a provider and get paid, and the blockchain can act as a decentralized escrow that holds the payment until proof is provided that the storage contract was properly fulfilled.

98% of our technology is off-chain. Only a little tiny sliver (the file contract open and close) is actually posted to the blockchain.

yawnxyz · on Sept 8, 2021

"Pin up to 100GB at no cost" (https://account.siasky.net/payments)

Wait, is that a typo?? That seems incredibly generous. I guess that's where decentralization comes in.

Does it also mean that if for some reason the network comes down, the files all come down as well?

pimterry · on Sept 7, 2021

Do you have any documentation that compares Skynet to IPFS anywhere? From an architectural standpoint and/or features.

Taek · on Sept 7, 2021

Closest we got at the moment is this: https://skynet.guide/discover/storage-chains-compared.html#f...

I'm writing a more direct comparison this week, we just recently (less than 30 days ago) hit full feature parity with ipfs, any webapp or file deployed on IPFS should work natively on Skynet now as well. The link/identifier will be different but you shouldn't need to change any code

menmob · on Sept 7, 2021

Are you able to “pin” files yourself like IPFS? Or are you required to pay hosts to store it?

Taek · on Sept 7, 2021

You can always run a host yourself and pin it to your own host.

The main reason we chose a host-based architecture instead of a pin based architecture is that we saw on IPFS that having people pin their own data resulted in really poor uptimes, a lot of file rot, and it also substantially reduced scalability and increased fetch times. And after all of those tradeoffs, the vast majority of accessible content on IPFS is hosted via a pinning service anyway.

menmob · on Sept 7, 2021

Makes sense. I’ve just been looking at creating a distributed YT archive on IPFS, but as you said, load times are absolutely terrible, especially for big files like video. I’ve been following sia since 2016-ish, and skynet looks awesome, just worried about maturity. I will try hosting my own files as you described, thanks!

a1369209993 · on Sept 7, 2021

> You can always run a host yourself and pin it to your own host.

(It sounds like it, but) to clarify, can you do this completely for free, with no cooperation from a third party (e.g., you don't need to pay an existing host to vouch for you)?

Taek · on Sept 7, 2021

No need to pay an existing host or get any sort of external party to enable you, it's a permissionless network

jstanley · on Sept 7, 2021

I got turned off it because things that I'd pinned, and knew I'd pinned, would eventually become mysteriously inaccessible unless accessed through the node that I pinned it on, and I couldn't work out why. It's a great idea, but I found it too flakey to satisfy me.

hugoroussel · on Sept 7, 2021

Built this during a hackathon, as I felt that arXiv was in great need of a small facelift. Currently not all articles are uploaded. The repository is here: http://github.com/hugoroussel/xirva

ithinkso · on Sept 7, 2021

Not to be confused with viXra, which is a sci-fi alternative to arXiv

hugoroussel · on Sept 7, 2021

I nearly called the project this name without knowing it existed. Thanks for the share.

kloch · on Sept 7, 2021

viXra is full of nonsense but then again so is arXiv (see: 750GeV debacle). viXra desperately needed a voting system and if it did it would likely have been much more useful and become a viable alternative to arXiv.

arXiv does not accept papers from authors with no institutional affiliation and viXra was the only (and ugly) alternative. There is an opportunity there to fix both sites.

detaro · on Sept 7, 2021

What was the 750GeV debacle? That thing where some people said LHC had maybe found a new particle?

kloch · on Sept 7, 2021

A spurious signal at LHC (that disappeared in later runs) that spurred a cottage industry of arXiv submissions trying to explain it. Authors, mostly grad students, quickly submitted hundreds of low quality articles trying to get in on the "discovery". Later articles would cite over 400 previous articles and it became one giant circle jerk. There were even blog posts complaining about the blatant "ambulance chasing".

https://motls.blogspot.com/2016/06/ambulance-chasing-is-just...

http://resonaances.blogspot.com/2016/06/game-of-thrones-750-...

eecc · on Sept 7, 2021

Ah cool… I also took a stab at something similar several years ago: https://github.com/ecausarano/heron

Also at the time I was considering IPFS.

But I guess the real trick is implementing a WOT to implement peer review and filter out the inevitable junk that will be published

westurner · on Sept 7, 2021

"Help compare Comment and Annotation services: moderation, spam, notifications, configurability" executablebooks/meta#102 https://github.com/executablebooks/meta/discussions/102 :

> jupyter-comment supports a number of commenting services [...]. In helping users decide which commenting and annotation services to include on their pages and commit to maintaining, could we discuss criteria for assessment and current features of services?

> Possible features for comparison:

> * Content author can delete / hide

> * Content author can report / block

> * Comments / annotations are screened by spam-fighting service

> * Content / author can label as e.g. toxic

> * Content author receives notification of new comments

> * Content author can require approval before user-contributed content is publicly-visible

> * Content author may allow comments for a limited amount of time (probably more relevant to BlogPostings)

> * Content author may simultaneously denounce censorship in all its forms while allowing previously-published works to languish

#ForScience

westurner · on Sept 7, 2021

FWIW, archiving repo2docker-compatible git repos with a DOI attached to a git tag, is possible with JupyterLite:

> JupyterLite is a JupyterLab distribution that runs entirely in the browser built from the ground-up using JupyterLab components and extensions

With JupyterLite, you can build a static archive of a repo2docker-like environment so that the ScholarlyArticle notebook or computer modern latex css, its SoftwareRelease dependencies, and possibly also the Datasets can be run in a browser tab with WASM. HTML + JS + WASM

hugoroussel · on Sept 7, 2021

Exactly the problem that I have. Many edge cases too.

kloch · on Sept 7, 2021

What do you mean by junk? Spam and Abuse, or scientific papers that the reviewer simply doesn't like?

eecc · on Sept 8, 2021

Well, as anyone can and will publish, it will be stuffed with junk, porn, trash of all sorts, so my plan was to implement a filter to ignore any data published outside of one’s WOT.

Furthermore, a user searching for “reviewed” data and papers would normally filter for items with enough “endorsement” metadata items signed by known WOT actors.

I haven’t figured a mechanism to prevent “review rings”, although being totally transparent it should be easy to spot them.
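
A rough sketch of the two filters described above, assuming the WOT is just a set of trusted identities and each item carries a set of endorser identities. Signature verification is left out entirely, and excluding self-endorsement is an added assumption; the `Item` type and the `min_endorsements` threshold are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    title: str
    author: str
    endorsers: Set[str] = field(default_factory=set)


def visible(items: List[Item], wot: Set[str]) -> List[Item]:
    """Ignore anything published by an identity outside the reader's WOT."""
    return [item for item in items if item.author in wot]


def reviewed(items: List[Item], wot: Set[str], min_endorsements: int = 2) -> List[Item]:
    """Keep visible items endorsed by enough WOT members other than the author."""
    result = []
    for item in visible(items, wot):
        trusted = (item.endorsers & wot) - {item.author}
        if len(trusted) >= min_endorsements:
            result.append(item)
    return result


wot = {"alice", "bob", "carol"}
items = [
    Item("A", "alice", {"bob", "carol"}),    # trusted author, two trusted endorsers
    Item("B", "mallory", {"bob", "carol"}),  # author outside the WOT
    Item("C", "bob", {"bob", "mallory"}),    # only a self-endorsement counts as trusted
]
visible_titles = [i.title for i in visible(items, wot)]
reviewed_titles = [i.title for i in reviewed(items, wot)]
```

This does nothing about review rings: a group of mutually trusting identities endorsing each other would pass both filters.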

MayeulC · on Sept 7, 2021

Doesn't seem to work (I don't have a local gateway on this machine):

https://www.xirva.org/list/eess.IV/2011/2011.00052

Brings me to https://ipfs.io/ipfs/undefined/2011.00052.pdf

which says: invalid ipfs path: invalid path "/ipfs/undefined/2011.00052.pdf": invalid CID: expected 1 as the cid version number, got: 31965309853

hugoroussel · on Sept 7, 2021

Not all articles are uploaded yet, it is still WIP.

fnord77 · on Sept 7, 2021

is search still a WIP?

hugoroussel · on Sept 7, 2021

Everything is still WIP, very difficult to do rich indexing on IPFS files. A new project idea ?

flexd · on Sept 7, 2021

Hmm, my Firefox here on Ubuntu does not trust that CA for some reason. And clicking "Accept the risk and continue" seems to just land me back at the "Warning: Potential Security Risk Ahead" page.

hugoroussel · on Sept 7, 2021

I think only www.xirva.org has valid certificate, maybe Firefox strips the www for some reason.

brylie · on Sept 7, 2021

I get an error on several articles to the effect of:

invalid ipfs path: invalid path "/ipfs/undefined/2107.00648.pdf": invalid CID: expected 1 as the cid version number, got: 31965309853

woile · on Sept 7, 2021

Wow looks super nice, feels so easy to navigate and read, I was thinking it would be nice to have a RSS feed for some topics to read as news. Great work!

Jhsto · on Sept 7, 2021

How does the NFT thing work? I do not have Polygon tokens, but I wonder what you have envisioned the functionality of this to be?

hugoroussel · on Sept 7, 2021

Since it was for an ETHGlobal hackathon I thought it would be fun to experiment with the feature where we mint an NFT where the metadata points to the IPFS link. You could then do whatever you do with an NFT.

Jhsto · on Sept 7, 2021

I wonder how much information you get from arxiv regarding the dependency graph via citations. Could there be a way for me, as I upload my own manuscript, to tip the authors I cited, and conversely someday have the possibility of generating revenue personally the same way? It would be much nicer, I think, if the fees that are currently paid to journals instead went to the authors who also contributed to my work.
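
A minimal sketch of the tipping arithmetic, assuming an equal split among the distinct cited authors and amounts held in integer cents. The function `split_tip` and the equal-split policy are illustrative assumptions; nothing in the thread specifies how a tip would be divided:

```python
from typing import Dict, List


def split_tip(total_cents: int, cited_authors: List[str]) -> Dict[str, int]:
    """Split a tip equally among distinct cited authors.

    Integer cents avoid float rounding; any remainder goes, one cent each,
    to the authors listed first.
    """
    authors = list(dict.fromkeys(cited_authors))  # dedupe, keep order
    if not authors:
        return {}
    share, remainder = divmod(total_cents, len(authors))
    return {
        author: share + (1 if index < remainder else 0)
        for index, author in enumerate(authors)
    }


tips = split_tip(100, ["a", "b", "c", "a"])
```

A real scheme would likely weight by citation count or author order, and would need the citation graph the comment asks about.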

hugoroussel · on Sept 7, 2021

There are for sure interesting ideas related to new forms of scientific funding. The issue I have and why the project is currently on standby is how to combat spam/hoax articles

aerique · on Sept 7, 2021

Cool.

Now the same for Sci-Hub?

leavenotracks · on Sept 7, 2021

Search bar doesn't seem to be working...

stevejpurves · on Sept 7, 2021

Does the upload here also submit to arxiv, or will uploading to xirva mean they diverge?

hugoroussel · on Sept 7, 2021

It will diverge.

_zh9y · on Sept 9, 2021

What if every paper was on IPFS? Whenever I try to read an obscure paper published before 1990, the paper is usually behind a paywall.