Show HN: Arxiv.org on IPFS (xirva.org)
238 points by hugoroussel on Sept 7, 2021 | 73 comments



"You can upload your research and publish it on the open web. Members of the community will be able to vote on your research to raise its visibility."

Oh dear.


How would you set it up? The decentralized world doesn't really have a great system for curation at this point (unless you can point to a counterexample!), and so I'm in favor of any sort of playing around with decentralized voting/curation until we find something that seems to be working well.


Start from the objective of first do no harm. Voting systems may eventually be gamed to distort results, so eliminate the voting system. Instead rely on ad-hoc personal networks to disseminate signal about quality papers out-of-band. Don’t assume you have to systematize everything.


Voting (as was borne out in many examples, including digg.com) becomes a mob-rule situation, a variation of tyranny of the commons, unless a novelty algorithm is used in addition to total votes. If you just go by totals, it will be easily gamed and rendered useless as a metric.
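
To make "novelty in addition to total votes" concrete, here is a minimal Python sketch. The decay formula and the `gravity` value are assumptions for illustration (a shape often attributed to news-aggregator rankings), not something any site mentioned here is confirmed to use.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    votes: int
    age_hours: float


def score(paper: Paper, gravity: float = 1.8) -> float:
    """Votes discounted by age, so old high-vote items cannot dominate forever."""
    return paper.votes / (paper.age_hours + 2.0) ** gravity


def rank(papers: list[Paper]) -> list[Paper]:
    """Sort by decayed score, highest first."""
    return sorted(papers, key=score, reverse=True)


papers = [
    Paper("old favourite", votes=500, age_hours=240.0),
    Paper("fresh result", votes=40, age_hours=3.0),
]
ranked_by_score = rank(papers)
ranked_by_total = sorted(papers, key=lambda p: p.votes, reverse=True)
```

With raw totals the ten-day-old paper stays on top; with the decayed score the fresh one leads. Decay alone does not stop coordinated voting, it only keeps the raw total from being the whole signal.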


The standard and most effective form of curation in science is the reference list at the end of a paper.

But usually you just read everything that is relevant to your research interests from the daily arxiv posting.


A perfect system? No, but think about how people must have felt about Wikipedia on launch.

Love this idea.


Actually, I don't think science is democratic in nature. Yes, it works that way to some extent, since a theory still needs to be widely accepted. But in reality one person can have the correct idea while everyone else disagrees. Still, this person is doing it right.


I believe science is a democratic process. If someone has the correct idea but communicates it poorly, so poorly that others in the field disagree, then this person is doing it wrong. (Thinking specifically of https://en.m.wikipedia.org/wiki/Shinichi_Mochizuki )


The participation ought to be democratic in the sense of being open to everyone to participate. But, you can't do a vote and use it to decide who is right. Deep down we know that being right or wrong is independent from the scientific consensus. Mochizuki may be interacting with the scientific community in the wrong way, but it has no bearing on whether his theory is correct.

The consensus itself has some democratic features, but it's weighted by prestige and adherence to the current paradigm. I think Kuhn described its mechanism pretty well. It's far easier to convince people of a wrong result if you follow the established paradigm than to convince people of something right if you go against it. What really saves science from being pure dogma is that there are paradigm shifts, revolutions in which the scientific consensus changes.


All in all, it is a non-trivial problem. At the very least you have to attach some form of reputation system to the verification process. Even with that you will still have the "misunderstood genius" issue, or the "excellent reputation professor" whom everyone trusts without (enough) verification.


But at least there’d be a system for other researchers to record “failed to replicate”, which could give a channel for critiquing reputable professors that isn’t controlled by those same professors (as journals often are).


Scientific consensus is democratic in nature (even though votes are not distributed evenly). The ideal is that through reproducible experiments and application of the scientific method the scientific consensus moves to increasingly accurate models of reality over time. But obviously the speed at which that happens varies, and some right ideas took annoyingly long to get accepted into scientific consensus.


Sure the right answer will eventually prevail but the process is much worse than we like to admit. Many breakthrough advances were outright rejected by contemporary peers when first proposed.

"Fermi first submitted his "tentative" theory of beta decay to the prestigious science journal Nature, which rejected it "because it contained speculations too remote from reality to be of interest to the reader." Nature later admitted the rejection to be one of the great editorial blunders in its history. ... Fermi found the initial rejection of the paper so troubling that he decided to take some time off from theoretical physics, and do only experimental physics" https://en.wikipedia.org/wiki/Fermi%27s_interaction


Using Wikipedia as an example of a seemingly naïve idea that was ultimately proven to work is a pretty bad argument that completely ignores how Wikipedia operates at the moment.

It's routinely used for propagating smears:

https://odysee.com/@AlisonMorrow:6/how-wikipedia-decides-if-...

Even one of its co-founders says it's failing as an accurate source of information:

https://odysee.com/@TimcastIRL:8/former-founder-of-wikipedia...

Just like Jaron Lanier predicted in 2006:

https://www.edge.org/conversation/jaron_lanier-digital-maois...

I never understood why so many technologists vehemently defend a website that was obviously prone to a form of "regulatory capture" and groupthink.


Larry Sanger has made something of a career out of being "the cofounder of Wikipedia who thinks it's getting it all wrong". There's a point at which the latest iteration of his criticism ceases to be a stop-the-presses newsworthy event.

Sanger wrote a great set of essays, largely based on the lecture notes of courses he taught as an academic, that seeded Wikipedia with a load of freely licensed content and kickstarted the whole enterprise. It's quite possible that without this initial burst of momentum, Wikipedia would have failed. For that he has earned recognition and will never lose it. But the negative part of his critique of Wikipedia is no more searching than the critique Wikipedia editors perform on themselves without his help, and his series of suggestions for positive alternatives has lost credibility because his ideas never work.

I still pay attention to what Sanger says, but not with a high expectation that what he says will be exceptionally insightful.


In all my experience using wikipedia it has been successful at providing facts and accurate references.

I don't mean to attack the speaker here, but that former cofounder of wikipedia you just cited... isn't he an extremist neo-conservative? Why did he leave wikipedia in the first place? What are his proposed solutions?



Sounds amazing to me


Hey Hugo, do you have an email I could reach you at? I've been thinking/working on these problems for 3 years now and would love to find some smart people to partner with to further develop the ideas. I don't have much to show publicly right now, but https://intpub.org/ (soon to be scipub.app) is the start.


Oh wow I'm going to be watching your github repo, what you're doing looks interesting.

Do you have any plans to implement some search functionality to help find documents, or is it about permissionless publishing only?


Great question! Search is (I believe) a second-order problem. Once permissionless publishing is solved, extensions can be built onto the core protocol.

In my opinion, indexing and ranking services work much better as ancillary services. The bundling of indexing and ranking with core protocol features is one of my main contentions with the Coinbase-backed "Research Hub" project/company (https://researchhub.org).

If you think about it, journals are important services, but they serve two functions right now:

1. Publication/information & data storage

2. Information indexing and ranking

Really, I think the whole contention about open/closed science is directed in the wrong place. Journals shouldn't be hosting information! They should be indexing and ranking information.

The information should be stored on an open, decentralized substrate where no one needs to go through authentication steps to view it. Then, journals can maintain their closed/open persuasions and offer bundled services and discovery and maintain their clout without really needing to change their core offering.
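
As a rough illustration of that split, the sketch below keeps storage and ranking in separate pieces. Everything in it is hypothetical: `ContentStore` stands in for an open, content-addressed substrate (a plain SHA-256 hex digest is used as the address, which is not a real IPFS CID), and `JournalIndex` stands in for a journal that holds only ranked pointers.

```python
import hashlib


class ContentStore:
    """Hypothetical open storage layer: content is addressed by its hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        # No authentication step: anyone holding the key can read.
        return self._blobs[key]


class JournalIndex:
    """Hypothetical journal: stores no papers, only ratings of their addresses."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._ratings: dict[str, int] = {}

    def rate(self, key: str, rating: int) -> None:
        self._ratings[key] = rating

    def top(self, n: int) -> list[str]:
        return sorted(self._ratings, key=self._ratings.get, reverse=True)[:n]


store = ContentStore()
paper_a = store.put(b"paper A")
paper_b = store.put(b"paper B")

journal = JournalIndex("Example Letters")
journal.rate(paper_a, 3)
journal.rate(paper_b, 9)
best = journal.top(1)[0]
```

Several journals could rate the same addresses differently without any of them hosting the files.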

Also, the git repo link needs to be updated, and there's nothing there right now really, but https://github.com/scipubapp is the right link!


Also keep me in the loop, plz :)


For sure! Will do.


From the repo [1]:

"research publishing platform that is community based, transparent and censorship resistant" (my emphasis)

"Community members moderate the platform and can increase or decrease the visibility of the uploaded files" (again, my emphasis)

[1] https://github.com/hugoroussel/xirva


>censorship resistant

My emphasis.


Just a heads up: it seems some of the IPFS links are broken. For example, I just visited https://www.xirva.org/categories/cs.AI and all links point to an "undefined" path, for example https://ipfs.io/ipfs/undefined/2107.00082.pdf

But overall, great idea!


Not all articles are uploaded yet, it is still WIP.


That's fine. Maybe, for a better user experience, grey out or remove the IPFS download button for the articles that haven't been uploaded yet.
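
A minimal sketch of that guard, in Python for brevity (the site itself appears to be JavaScript). The function names and the shape of the returned dict are hypothetical; the only thing taken from the thread is the `https://ipfs.io/ipfs/<cid>/<file>` link pattern.

```python
from typing import Optional

GATEWAY = "https://ipfs.io/ipfs"


def gateway_url(cid: Optional[str], filename: str) -> Optional[str]:
    """Return a gateway link, or None when the paper has no CID yet."""
    if not cid or cid == "undefined":
        return None
    return f"{GATEWAY}/{cid}/{filename}"


def download_button(cid: Optional[str], filename: str) -> dict:
    """Describe the button state the UI should render (hypothetical shape)."""
    url = gateway_url(cid, filename)
    return {"enabled": url is not None, "href": url}


# "examplecid" is a placeholder, not a real CID.
uploaded = download_button("examplecid", "2107.00082.pdf")
pending = download_button(None, "2107.00082.pdf")
```

The point is only that the link is never built from a missing value, so the button can be disabled instead of pointing at `/ipfs/undefined/...`.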


Links? Everything is a fake JS button.


Slightly OT, but how is IPFS these days? Every time I've tried using it the past few years it seemed very WIP. Cool sounding protocol though.


It's still very slow - retrieval times >30s for files that aren't cached on cloudflare or ipfs.io. Also, those two providers each have multiple periods of downtime every year.

If the files are cached and the services are up, it's plenty fast for static data but dynamic data (IPNS) is still very slow.

We've built a competitor called Skynet that is much faster (less than 200ms for files that aren't in the cache) and scales better. It's currently hosting tens of millions of files across 200+ TB of data.

We really like the vision that IPFS had and we think decentralized data is the future of the Internet. We're proud to have put in the legwork to make it practical.

https://docs.siasky.net/


Why do you need a blockchain to implement Skynet?


The blockchain gets us a decentralized marketplace for decentralized storage providers. Anyone can join as a provider and get paid, and the blockchain can act as a decentralized escrow that holds the payment until proof is provided that the storage contract was properly fulfilled.

98% of our technology is off-chain. Only a little tiny sliver (the file contract open and close) is actually posted to the blockchain.
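
For readers who want the escrow idea spelled out, here is a toy Python model. It is not Sia's actual contract format or proof scheme; the Merkle-proof check, the class name, and the state names are all assumptions chosen to show the shape of "hold the payment until the host proves it still has the data".

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; `level` must have even length."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(segments: list[bytes]) -> bytes:
    level = [h(s) for s in segments]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(segments: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool means 'sibling is on the left'."""
    level = [h(s) for s in segments]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(root: bytes, segment: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    node = h(segment)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


class Escrow:
    """Opened with the file's Merkle root; closed by checking one storage proof."""

    def __init__(self, root: bytes, payment: int) -> None:
        self.root, self.payment, self.state = root, payment, "open"

    def close(self, segment: bytes, proof: list[tuple[bytes, bool]]) -> str:
        if self.state != "open":
            raise RuntimeError("contract already closed")
        ok = verify(self.root, segment, proof)
        self.state = "paid_to_host" if ok else "refunded_to_renter"
        return self.state


segments = [b"seg0", b"seg1", b"seg2"]
root = merkle_root(segments)
honest = Escrow(root, payment=100).close(segments[2], merkle_proof(segments, 2))
cheating = Escrow(root, payment=100).close(b"forged", merkle_proof(segments, 2))
```

Only the open (the root and payment) and the close (one segment plus its proof) would need to be public in this model; the file itself never touches the escrow. A real scheme would also choose the challenged segment unpredictably.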


"Pin up to 100GB at no cost" (https://account.siasky.net/payments)

Wait, is that a typo?? That seems incredibly generous. I guess that's where decentralization comes in.

Does it also mean that if for some reason the network comes down, the files all come down as well?


Do you have any documentation that compares Skynet to IPFS anywhere? From an architectural standpoint and/or features.


Closest we got at the moment is this: https://skynet.guide/discover/storage-chains-compared.html#f...

I'm writing a more direct comparison this week. We just recently (less than 30 days ago) hit full feature parity with IPFS: any webapp or file deployed on IPFS should work natively on Skynet now as well. The link/identifier will be different, but you shouldn't need to change any code.


Are you able to “pin” files yourself like IPFS? Or are you required to pay hosts to store it?


You can always run a host yourself and pin it to your own host.

The main reason we chose a host-based architecture instead of a pin based architecture is that we saw on IPFS that having people pin their own data resulted in really poor uptimes, a lot of file rot, and it also substantially reduced scalability and increased fetch times. And after all of those tradeoffs, the vast majority of accessible content on IPFS is hosted via a pinning service anyway.


Makes sense. I’ve just been looking at creating a distributed YT archive on IPFS, but as you said, load times are absolutely terrible, especially for big files like video. I’ve been following sia since 2016-ish, and skynet looks awesome, just worried about maturity. I will try hosting my own files as you described, thanks!


> You can always run a host yourself and pin it to your own host.

(It sounds like it, but) to clarify, can you do this completely for free, with no cooperation from a third party (e.g., you don't need to pay an existing host to vouch for you)?


No need to pay an existing host or get any sort of external party to enable you, it's a permissionless network


I got turned off it because things that I'd pinned, and knew I'd pinned, would eventually become mysteriously inaccessible unless accessed through the node that I pinned it on, and I couldn't work out why. It's a great idea, but I found it too flakey to satisfy me.


Built this during a hackathon, as I felt that arXiv was in great need of a small facelift. Currently not all articles are uploaded. The repository is here: http://github.com/hugoroussel/xirva


Not to be confused with viXra, which is a sci-fi alternative to arXiv


I nearly called the project this name without knowing it existed. Thanks for the share.


viXra is full of nonsense, but then again so is arXiv (see: 750GeV debacle). viXra desperately needed a voting system; if it had one, it would likely have been much more useful and become a viable alternative to arXiv.

arXiv does not accept papers from authors with no institutional affiliation and viXra was the only (and ugly) alternative. There is an opportunity there to fix both sites.


What was the 750GeV debacle? That thing where some people said the LHC had maybe found a new particle?


A spurious signal at LHC (that disappeared in later runs) that spurred a cottage industry of arXiv submissions trying to explain it. Authors, mostly grad students, quickly submitted hundreds of low quality articles trying to get in on the "discovery". Later articles would cite over 400 previous articles and it became one giant circle jerk. There were even blog posts complaining about the blatant "ambulance chasing".

https://motls.blogspot.com/2016/06/ambulance-chasing-is-just...

http://resonaances.blogspot.com/2016/06/game-of-thrones-750-...


Ah cool… I also took a stab at something similar several years ago: https://github.com/ecausarano/heron

Also at the time I was considering IPFS.

But I guess the real trick is implementing a WOT to implement peer review and filter out the inevitable junk that will be published


"Help compare Comment and Annotation services: moderation, spam, notifications, configurability" executablebooks/meta#102 https://github.com/executablebooks/meta/discussions/102 :

> jupyter-comment supports a number of commenting services [...]. In helping users decide which commenting and annotation services to include on their pages and commit to maintaining, could we discuss criteria for assessment and current features of services?

> Possible features for comparison:

> * Content author can delete / hide

> * Content author can report / block

> * Comments / annotations are screened by spam-fighting service

> * Content / author can label as e.g. toxic

> * Content author receives notification of new comments

> * Content author can require approval before user-contributed content is publicly-visible

> * Content author may allow comments for a limited amount of time (probably more relevant to BlogPostings)

> * Content author may simultaneously denounce censorship in all its forms while allowing previously-published works to languish

#ForScience


FWIW, archiving repo2docker-compatible git repos with a DOI attached to a git tag is possible with JupyterLite:

> JupyterLite is a JupyterLab distribution that runs entirely in the browser built from the ground-up using JupyterLab components and extensions

With JupyterLite, you can build a static archive of a repo2docker-like environment so that the ScholarlyArticle notebook or computer modern latex css, its SoftwareRelease dependencies, and possibly also the Datasets can be run in a browser tab with WASM. HTML + JS + WASM


Exactly the problem that I have. Many edge cases too.


What do you mean by junk? Spam and Abuse, or scientific papers that the reviewer simply doesn't like?


Well, as anyone can and will publish, it will be stuffed with junk, porn, trash of all sorts, so my plan was to implement a filter to ignore any data published outside of one’s WOT.

Furthermore, a user searching for “reviewed” data and papers would normally filter for items with enough “endorsement” metadata items signed by known WOT actors.

I haven’t figured out a mechanism to prevent “review rings”, although since everything is totally transparent it should be easy to spot them.
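
A small Python sketch of the two filters described above, under stated assumptions: the trust graph, the `max_hops` cutoff, the item fields, and the endorsement threshold are all hypothetical, and signature verification is assumed to have happened before these functions see the data. It does nothing about review rings.

```python
from collections import deque


def web_of_trust(trust: dict[str, set[str]], me: str, max_hops: int = 2) -> set[str]:
    """Everyone reachable from `me` within `max_hops` trust edges."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue
        for peer in trust.get(user, set()):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, hops + 1))
    return seen


def visible_items(items: list[dict], wot: set[str]) -> list[dict]:
    """Ignore anything published by someone outside the web of trust."""
    return [item for item in items if item["author"] in wot]


def reviewed_items(items: list[dict], wot: set[str], min_endorsements: int = 2) -> list[dict]:
    """Keep visible items with enough endorsements from WOT members other than the author."""
    result = []
    for item in visible_items(items, wot):
        endorsers = (set(item["endorsed_by"]) & wot) - {item["author"]}
        if len(endorsers) >= min_endorsements:
            result.append(item)
    return result


trust = {"me": {"alice"}, "alice": {"bob"}, "bob": {"carol"}}
items = [
    {"id": "paper-1", "author": "alice", "endorsed_by": ["bob", "me"]},
    {"id": "paper-2", "author": "bob", "endorsed_by": ["carol", "mallory"]},
    {"id": "spam-1", "author": "mallory", "endorsed_by": ["mallory", "sybil"]},
]
wot = web_of_trust(trust, "me")
visible = [item["id"] for item in visible_items(items, wot)]
reviewed = [item["id"] for item in reviewed_items(items, wot)]
```

Here `carol` is three hops away, so her endorsement does not count, and anything from `mallory` is never shown at all.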


Doesn't seem to work (I don't have a local gateway on this machine):

https://www.xirva.org/list/eess.IV/2011/2011.00052

Brings me to https://ipfs.io/ipfs/undefined/2011.00052.pdf

which says: invalid ipfs path: invalid path "/ipfs/undefined/2011.00052.pdf": invalid CID: expected 1 as the cid version number, got: 31965309853
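
The odd number in that error looks like what you get when the literal string "undefined" is parsed as a CID. The sketch below is a guess at the mechanism, not taken from the gateway's source: it assumes the leading "u" is read as the multibase prefix for unpadded base64url and that the first field of the decoded bytes is an unsigned varint holding the CID version.

```python
import base64


def read_uvarint(data: bytes) -> int:
    """Decode one unsigned LEB128 varint from the start of `data`."""
    value = 0
    for shift, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value
    raise ValueError("truncated varint")


text = "undefined"
prefix, payload = text[0], text[1:]  # "u" = multibase code for unpadded base64url
raw = base64.urlsafe_b64decode(payload)  # 8 characters, so no padding is needed
version = read_uvarint(raw)
print(prefix, raw.hex(), version)
```

Under those assumptions the decoded "version" comes out as 31965309853, the same number the gateway reports, which fits the idea that the site is interpolating a missing CID into the link.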


Not all articles are uploaded yet, it is still WIP.


is search still a WIP?


Everything is still WIP; it is very difficult to do rich indexing on IPFS files. A new project idea?


Hmm, my Firefox here on Ubuntu does not trust that CA for some reason. And clicking "Accept the risk and continue" seems to just land me back at the "Warning: Potential Security Risk Ahead" page.


I think only www.xirva.org has a valid certificate; maybe Firefox strips the www for some reason.


I get an error on several articles to the effect of:

invalid ipfs path: invalid path "/ipfs/undefined/2107.00648.pdf": invalid CID: expected 1 as the cid version number, got: 31965309853


Wow looks super nice, feels so easy to navigate and read, I was thinking it would be nice to have a RSS feed for some topics to read as news. Great work!


How does the NFT thing work? I do not have Polygon tokens, but I wonder what functionality you have envisioned for it?


Since it was for an ETHGlobal hackathon, I thought it would be fun to experiment with a feature where we mint an NFT whose metadata points to the IPFS link. You could then do whatever you do with an NFT.
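
Roughly, the metadata for such a token could look like the output of the sketch below. The field layout is an assumption (`name` and `description` follow common token-metadata practice; `external_url` is an illustrative choice), and nothing here is taken from the xirva repository.

```python
import json


def paper_metadata(title: str, cid: str, filename: str) -> str:
    """Build token metadata whose link field points at the paper on IPFS."""
    uri = f"ipfs://{cid}/{filename}"
    metadata = {
        "name": title,
        "description": "Preprint mirrored on IPFS",
        "external_url": uri,  # assumed field name, not confirmed from the project
    }
    return json.dumps(metadata, indent=2)


# "examplecid" is a placeholder, not a real CID.
doc = paper_metadata("An example preprint", "examplecid", "2107.00082.pdf")
```

The token itself only carries a pointer; whoever pins the file keeps it retrievable.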


I wonder how much information you get from arXiv about the dependency graph formed by citations. Could there be a way for me, as I upload my own manuscript, to tip the authors of the papers I cited, and conversely to someday earn revenue personally the same way? It would be much nicer, I think, if the fees that are currently paid to journals went instead to the authors who also contributed to my work.
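
One way the arithmetic could work, as a hypothetical Python sketch: split the tip evenly across cited papers, then evenly across each paper's authors, in integer cents. The even split and the author names are assumptions; any real scheme would need the citation data and payout addresses from somewhere.

```python
def split_tip(tip_cents: int, citations: dict[str, list[str]]) -> dict[str, int]:
    """Split a tip evenly across cited papers, then across each paper's authors."""
    payouts: dict[str, int] = {}
    if not citations:
        return payouts
    per_paper = tip_cents // len(citations)
    for authors in citations.values():
        if not authors:
            continue
        per_author = per_paper // len(authors)
        for author in authors:
            payouts[author] = payouts.get(author, 0) + per_author
    return payouts


citations = {
    "paper-x": ["ana", "ben"],
    "paper-y": ["ana"],
    "paper-z": ["chen", "dee", "eli"],
}
payouts = split_tip(1200, citations)
leftover = 1200 - sum(payouts.values())
```

Integer division means a few cents can be left over; in this sketch they simply stay with the sender.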


There are for sure interesting ideas related to new forms of scientific funding. The issue I have, and the reason the project is currently on standby, is how to combat spam/hoax articles.


Cool.

Now the same for Sci-Hub?


Search bar doesn't seem to be working...


yes


Does the upload here also submit to arXiv, or will uploading to xirva mean they diverge?


It will diverge.


What if every paper was on IPFS? Whenever I try to read an obscure paper published before 1990, the paper is usually behind a paywall.



