Design and evaluation of IPFS: a storage layer for the decentralized web (acm.org)
193 points by dennis-tra on Sept 6, 2022 | 68 comments



Let's remind everyone:

1. IPFS attaches ALL network interfaces (internal and external) to your identity.

2. Tor is still "experimental" done by 3rd parties. https://flyingzumwalt.gitbooks.io/decentralized-web-primer/c...

3. Due to 1 and 2, any hosted content is EASILY trackable to a user's computer, even behind NATs. A machine cryptokey also helps cement that (but can be changed). This allows easy DDoS'ing of any and all endpoints hosting content you don't like.

4. It is trivial to ask the DHT for *who* has a certain content key, and get all (or the top 50?) computers hosting that content. (this matters with regards to "sensitive" content)

5. Running a node is still high cpu, ram, and network chattiness - so using a VPS to keep IPFS off your local network is still tenuous to run.


1. IPFS attaches ALL network interfaces (internal and external) to your identity.

I don't know what you mean by attaching a network interface to your identity or even just identity. IPFS identifies a node by PeerID which is mostly a public key, that's it.
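To illustrate (a toy sketch, not the exact multihash/multibase encoding libp2p uses): the PeerID is derived from the node's keypair and nothing else, so "identity" here has nothing to do with which interfaces the node happens to have.

  package main

  import (
    "crypto/ed25519"
    "crypto/rand"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
  )

  func main() {
    // A node keypair, like the one "ipfs init" generates.
    pub, _, err := ed25519.GenerateKey(rand.Reader)
    if err != nil {
      panic(err)
    }
    // Real PeerIDs wrap the encoded public key in a multihash and print it in
    // base58/multibase, but the essential point holds: the ID is a function of
    // the key alone, not of any network interface or IP address.
    digest := sha256.Sum256(pub)
    fmt.Println("toy peer id:", hex.EncodeToString(digest[:8]))
  }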

2. Tor is still "experimental" done by 3rd parties. https://flyingzumwalt.gitbooks.io/decentralized-web-primer/c...

Tor has nothing to do with IPFS. There have been some people that have worked on some integrations to use them together.

3. Due to 1 and 2, any hosted content is EASILY trackable to a user's computer, even behind NATs. A machine cryptokey also helps cement that (but can be changed). This allows easy DDoS'ing of any and all endpoints hosting content you don't like.

Yes, it is easily trackable. There is nothing about the design or goals of IPFS to be anonymous. In some ways the entire point is to be found and announce that you have content available. I don't know why people associate decentralized with anonymous. Not sure how you think it's cemented; just toss the key and make a new one. How does this make DDoS'ing endpoints any easier than anything else hosted on the internet, let alone DDoS'ing all of it? In fact, being distributed and content addressed can mitigate DDoS, whether it's malicious or just a hug of death.

4. It is trivial to ask the DHT for who has a certain content key, and get all (or the top 50?) computers hosting that content. (this matters with regards to "sensitive" content)

I hope so, that's the way IPFS works. Ask for who has what you're looking for and retrieve it. I have no idea what you mean by "sensitive" content. If it's sensitive you can encrypt it.
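For example, a common pattern is to encrypt client-side before adding, so the network only ever sees ciphertext. A minimal sketch with AES-GCM from the Go standard library (key distribution is out of scope, and this is not a built-in IPFS feature):

  package main

  import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
    "os"
  )

  func main() {
    plaintext, err := os.ReadFile("secret.pdf") // the file you intend to share
    if err != nil {
      panic(err)
    }
    key := make([]byte, 32) // 256-bit key; store and share it out of band
    if _, err := io.ReadFull(rand.Reader, key); err != nil {
      panic(err)
    }
    block, _ := aes.NewCipher(key)
    gcm, _ := cipher.NewGCM(block)
    nonce := make([]byte, gcm.NonceSize())
    if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
      panic(err)
    }
    // Prepend the nonce so the ciphertext file is self-contained.
    sealed := gcm.Seal(nonce, nonce, plaintext, nil)
    if err := os.WriteFile("secret.pdf.enc", sealed, 0o600); err != nil {
      panic(err)
    }
    fmt.Println("add secret.pdf.enc to IPFS, not the plaintext")
  }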

5. Running a node is still high cpu, ram, and network chattiness - so using a VPS to keep IPFS off your local network is still tenuous to run.

There are configurations to mitigate this but just making a blanket statement that it's high cpu, ram, and network is just FUD.


(Disclosure: I work for the Filecoin Foundation/Filecoin Foundation for the Decentralized Web).

I do actually agree that the privacy and anonymity aspects of IPFS are not well-conveyed. I think people get hooked on the "censorship-resistant" nature of decentralized systems, without understanding that even if you have multiple sources, for instance in a content-addressable network like IPFS, aggressive censorship systems have other strategies to dissuade dissemination or punish readers. You always have to be thinking a few steps ahead. Services like Tor and, I hope, the IPFS network both try to convey what threat models they are useful for, and which they are not, but it's really hard to stop overenthusiastic re-statements that give them super-powers they do not, in fact, possess.

That said, there's a bunch of careful thinking right now going on about how IPFS's privacy story could be improved: https://blog.ipfs.tech/ipfs-ping-2022-recap/ has a couple of sessions on this, and is a great summary of some other recent developments in the space.

One of those improvements is in the point about nodes being high CPU, RAM, etc. (I actually find this to be more of a challenge when running the full IPFS Go node locally on my desktop, rather than on a VPS; it requires some tweaking.)

The strategy right now is to encourage more implementations of IPFS to cover more use-cases; the original go-ipfs had to do everything, including maintaining some legacy decisions. Nowadays, there's a lot of effort on alternative IPFS implementations that can be slimmer, or optimised for particular scenarios, e.g. on an embedded device, serving a high-load web gateway, or providing millions of files. Protocol Labs recently renamed their canonical go-ipfs to kubo (https://github.com/ipfs/kubo) to make it more of a peer with other implementations.

Of course, I love all these new generation implementations EQUALLY, but if you pushed me, I've enjoyed playing around with https://github.com/n0-computer/iroh, a modular Rust implementation building off the increasingly robust rust-libp2p etc. libraries. There's some more to pick from here: https://docs.ipfs.tech/basics/ipfs-implementations/


First off, thank you for the comment!

I was an early adopter (0.3!) of go-ipfs. I was experimenting with shifting massive public scholarly data archives to it, and then referring to blocks to remix content in other archives. Naturally, I was thinking this could be applied to Internet2 for even faster transport of bulk data. (I was also playing around with private IPFS clouds for sensitive data.)

At the time, on a 1Gbps connection to the internet, I was consistently getting 115 MB/s, or 92% of line speed!

However, when I poked further at the protocols, there were definite things I wasn't happy with. Naturally, without providing metrics to point at (and refute... sigh "fud"), I did notice that go-ipfs was a HOG. We all know it. It's getting better, for sure, but yeah. Found that one out when I got TOS'd off of a VPS provider when I started using 80% of cpu and 90% ram. Then again, $5/mo dealer :D

Outside of an academic setting, my other concern for me-hosting was that when I joined the network, it was putting every network adapter into the DHT under my machine ID. I get why, so that local nodes could talk with each other to reshare highly requested content. But I'm definitely a "least surprise" kind of person, and having my internal IPv4 and IPv6 addresses put online was definitely a big surprise.

I also remember the old discussions of how to handle the /tor/(onionsite) network connection, and how it appeared to get tabled/scrapped, over issues on how to do so without violating anonymity beyond the onion-name. I remember back in the day on IRC helping 2 people who mostly figured that out, or at least got it to emit a sacrificial IP.

Again, I still disagree with not putting something like Tor or I2P in the limelight with "Offer these files via Tor/I2P". Doing this would give everyone the peer commands can probe (who's offering and who's downloading) a way to do so without outing their whole network. It would also have a nice side-effect of increasing the size of Tor/I2P and thus also strengthening those networks.

And quite frankly, given that FileCoin was meant to pay others to host your content, it's the cryptocoin I have the least issues with. It makes sense, and seems to be grounded in the reality of finite storage/bandwidth. And IPFS is completely usable on your machines without paying a cent in filecoin. Seems like a win-win, honestly.


No problem! I hear you about Tor/I2P integration -- I've been following that bug/issue since it was first raised, and speaking personally, I've been an advocate for building a privacy-protective "stack" that uses existing tooling for a long time (see https://www.eff.org/deeplinks/2013/08/tahoe-and-tor-building... , which was intended to be a discussion of composing existing tools together, though in retrospect seems much more Tahoe-LAFS than I meant it to be.)

It does seem harder to pull off than it first appears, though, which I think is why numerous people have bounced off it on both the IPFS and Tor sides.

I have a general theory that this is true of a lot of interoperability initiatives: they are by their nature tasks that sit on the "nice to have" periphery of an existing project. Plus they often require dev unicorns -- people who are able to understand the architecture and cultures of two different development spaces.

One thing I've been talking to a few people about informally as part of my work at FFDW is working out an institution or funding initiative that would be a wrapper around these kinds of interoperability ventures more generally. Dan Lynch's INTEROP was a vital part of the early Internet's success, and I think we're missing something occupying that space in our new decentralized world. We have so many amazing tools, but have so little time to make them work well together.


I really wish y'all had a grip on, and were measuring via TCP_INFO, the bufferbloat you are creating. Hundreds of flows in slow start are really tough on the network even with fq_codel or cake, and LEDBAT...


>> 5. Running a node is still high cpu, ram, and network chattiness - so using a VPS to keep IPFS off your local network is still tenuous to run.

> There are configurations to mitigate this but just making a blanket statement that it's high cpu, ram, and network is just FUD.

it’s not FUD. this is the out-of-the-box experience for many users, and has been basically since its inception. i know enough about IPFS to import, retrieve, and pin content, convert between different CID formats, use the IPNS and dnslink stuff and host a gateway, configure the set of resources that gateway will serve, and so on. what i still don’t know how to do after being an IPFS user since 2015 is how to make it not noticeably bog down the typical low-end/SBC home servers lots of people in this space use for all their other p2p stuff.

it’s not FUD: perf/resource use is a real problem that is legitimately hampering adoption. try running just basic IPFS workflows on the latest gen of raspberry pi before labeling perf complaints as FUD. if you’re close to the dev team and somehow don’t understand that this isn’t FUD then set up a conference call with me and i’ll demo perf of a typical home setup for you.


It is when you don’t include any context. You at least included that your standard of performance is a raspberry pi.


Yeah, doing crime on IPFS is a really bad idea. I talked with Juan Benet about this in maybe 2016 or so, and he said it's a feature, which is fair enough.

Privacy is very much not the goal of IPFS, and never has been. Not only is it obvious who is hosting any given file, but if you listen actively you can usually see who is downloading any given file as well.


Sounds like a potential privacy disaster if Google, Facebook and others can just listen in on erroneously assumed privacy. I suppose the only reason Google/Facebook aren't pushing for it is that it allows anyone to collect this data vs just the big players.


> Sounds like a potential privacy disaster if Google, Facebook and others can just listen in on erroneously assumed privacy

IPFS is an alternative to HTTP. Would you characterise Google's Web crawler as "listening in" and a "potential privacy disaster"? Are those paying for SEO "erroneously assum[ing] privacy"?


HTTP doesn't announce ALL the internal networks you're on. It only announces the public IP it responded from.


1) To be fair, private IP ranges should no longer be published to the public DHT (there is a LAN DHT for them).

And of course what interfaces are announced is and has been configurable for as long as I remember ("NoAnnounce" setting in the config).
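For reference, the relevant knob in the kubo JSON config looks roughly like this (an illustrative excerpt; the "server" profile, applied with "ipfs config profile apply server", sets a fuller list of private and link-local ranges):

  "Addresses": {
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip6/fc00::/ipcidr/7"
    ]
  }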


Yes on all counts.

4) Just keep watching the DHT entry. It isn't "Top", it is "most recent". If you really are in a rush to know, just join the DHT with that entry as your node's hash (or one close to it) and they will even advertise themselves to you.
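For anyone unfamiliar with the "close to it" part: the IPFS DHT is Kademlia-style, so closeness is just XOR distance between IDs, and provider records for a CID land on the peers whose IDs are nearest to it. A toy sketch of the metric (real IDs are multihashes, not 64-bit ints):

  package main

  import "fmt"

  // xorDistance is the Kademlia closeness metric: the smaller the value, the
  // "closer" a node ID is to a content key, and the more likely that node is
  // to store (and observe lookups for) the key's provider records.
  func xorDistance(a, b uint64) uint64 {
    return a ^ b
  }

  func main() {
    contentKey := uint64(0xDEADBEEF)
    nearNode := uint64(0xDEADBEE0) // an ID deliberately chosen near the key
    farNode := uint64(0x12345678)
    fmt.Println("near node distance:", xorDistance(contentKey, nearNode))
    fmt.Println("far node distance: ", xorDistance(contentKey, farNode))
  }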


These points also apply to HTTP, except (5). In fact, resource usage is my biggest problem with IPFS at the moment; and the reason I turned off my node :(


It is relatively new so not many people know about it, but libp2p (and Kubo/go-ipfs) can now be configured with a Resource Manager that can effectively keep a lot of the resource usage at bay: https://github.com/ipfs/kubo/blob/master/docs/config.md#swar...
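For anyone hunting for it in the config file, it lives under Swarm.ResourceMgr; a minimal, illustrative excerpt (the exact limit fields have changed between kubo releases, so treat the linked config.md for your version as the reference):

  "Swarm": {
    "ResourceMgr": {
      "Enabled": true
    }
  }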


Yeah, you really can't get away from the chattiness of IPFS because that's just how distributed hash tables work.

But the CPU and RAM costs... well, those are frankly exorbitant for what amounts to a glorified Apache web server bolted onto a Bittorrent tracker.


The CPU and RAM costs are all about figuring out how to pay people (and charge them) algorithmically. I remember private trackers that had seeding and uptime requirements (enforced by people) 20 years ago.


> The CPU and RAM costs are all about figuring out how to pay people (and charge them) algorithmically

This sounds more like Filecoin than a normal IPFS node. I've never used Filecoin, I just host my static Web sites and git mirrors on IPFS, but the de facto Go implementation of IPFS slows down my laptop noticeably and is consistently the highest consumer of RAM.


Interesting. Can you explain how that’s different than the normal internet/hosting solutions currently available?


The reasons I like IPFS:

- The address of a file has no relation to the machine hosting it (it's just a hash of the contents). Unlike HTTP and DNS, which rely on machine-specific IP addresses

- I can host files on multiple unreliable machines (e.g. a powerful desktop, a mostly-always-on raspberry pi and a couple of frequently off/disconnected laptops). As long as at least one machine's online, the files are still available. Doing this for HTTP requires some sort of load-balancer (a single point of failure), plus mechanisms for discovery, authentication, etc.

- Since IPFS uses a content hash, all copies of a file will get served from the same address; hence anyone with a copy of a file can bring its address back to life. With HTTP, people can host their own copies of files that get taken down or disappear; but their copy's address will be different from the original, and hence all hyperlinks to the original will remain broken. (Note that it's not perfect, since IPFS can perform hashing in a few ways, e.g. to better support streaming; and that results in different addresses)

This is really useful for stuff like git repos, software packages, etc.
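To make the content-addressing point above concrete: the address is a pure function of the bytes, so identical bytes give an identical address on every host, while a different chunking/hashing choice gives a different one. A toy sketch (real CIDs add multicodec and multibase wrapping on top of the digest):

  package main

  import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
  )

  func main() {
    data := []byte("the same file contents, whoever happens to host them")

    // Same bytes -> same digest, regardless of which machine computes it.
    whole := sha256.Sum256(data)
    fmt.Println("whole-file digest:", hex.EncodeToString(whole[:8]))

    // Hashing the same bytes as two chunks yields different digests, which is
    // why one file can legitimately end up with more than one IPFS address.
    half := len(data) / 2
    c1 := sha256.Sum256(data[:half])
    c2 := sha256.Sum256(data[half:])
    fmt.Println("chunk digests:", hex.EncodeToString(c1[:8]), hex.EncodeToString(c2[:8]))
  }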


You get way more VC washing, I mean funding.


For not "sensitive" content, but just as an alternative to other hosting methods, would you say that these shortcomings fall under "do things that don't scale"?


A lot of Web 3 stuff is about federating infrastructure in ways that still allow centralization of control. IPFS is just another one of those protocols. One other thing that turned me off about IPFS/Filecoin was the huge amount of compute power needed to run a node.


IPFS was around before people started talking about “web3”.

I see it as part of increasing resiliency and data locality. A key characteristic is that the cost for data storage is shifted to the people who care about preserving that data.

So a good example is distributing Linux distro packages.

Apparently, cpu/mem can be tuned.


Not really. The Ethereum whitepaper was 2013, and I'm pretty sure the first Web3 stuff talking about Ethereum(execution)+Swarm(storage)+Whisper(signalling) was 2014, and IPFS was launched in 2015. IPFS worked well enough that people lost motivation to work on Swarm, and IPFS pubsub would have been cool for the signalling layer, but it seemed to fall flat on its face. Ethereum went all in on being the execution layer of Web3 and mostly delegated other components to other projects which are competing to fill in the gaps, but all generally play nice with each other.

https://blog.ethereum.org/2014/08/18/building-decentralized-...


I suppose "long before" is a mischaracterization. According to Protocol Lab's history, this stuff was bandied about in 2013: https://web.archive.org/web/20210428163146/https://protocol....

To be fair, ideas like this can emerge all together, whether in parallel or because people were talking to each other about it.

Personally for me, I don't think web3 will get anywhere (and I could be wrong). I'm more interested in IPFS (and less so with FileCoin) because the use-case I am interested in is much narrower in scope. It's also interesting to me that "web3" got hijacked from semantic web ... though semantic web stalled out as well.


I think the point is not to force decentralization, but to allow it. When the protocol is open, there are more chances to break the monopolies through innovation, quality of service etc. and avoid lock-in.

In this case, IPFS breaks the monopoly of trust (only obtaining content from a single source because that source is the reputable source for it). Content addressed data makes the source irrelevant. Many possibilities open from there (i.e. breaking the monopoly of storage).


I've been under the impression that IPFS has way lighter computational requirements than Filecoin, which is basically impossible to run without a server.


Filecoin needs a server. IPFS needs a high-end laptop (that you don't take anywhere and keep 100% connected to reasonably fast internet) at least.


3 of the authors work for Protocol Labs (the company that develops IPFS), which is likely why the paper is able to analyze data from the ipfs.io gateways


I only know IPFS from it being one of the enablers for downloading from Library Genesis, and man am I glad that it works so well for them.


It actually doesn't work so well, unfortunately.

Picture this: 3 million books, at least one CID each (in practice it's often multiple, since the libgen collection uses a chunk size of 256 KB). Section 3.1 of the paper talks about content publication - for each CID, a provider record is published on up to 20 different peers. Because the CIDs are derived from a high-quality hash function, they are evenly distributed. So this means that a node with a sufficient number of items ends up connecting to every single node on the network. For 3 million CIDs * 20 publication records, this means sending out 60 million publication records every 12 hours, i.e. an average of 1388 publication records per second (assuming one CID per file, which is conservative). This is just to announce to the network "hi... just wanted to let you know I still have the same content I did yesterday". And every full replica of libgen is doing this.
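The back-of-the-envelope above, spelled out (chunks-per-book here is the same conservative assumption; with 256 KB chunks the real number is higher):

  package main

  import "fmt"

  func main() {
    const (
      books            = 3_000_000
      chunksPerBook    = 1         // conservative; most books span several 256 KB chunks
      replicationK     = 20        // provider records published per CID
      republishSeconds = 12 * 3600 // records are re-published every 12 hours
    )
    records := books * chunksPerBook * replicationK
    fmt.Printf("%d provider records per cycle, about %.0f per second\n",
      records, float64(records)/republishSeconds)
    // Prints: 60000000 provider records per cycle, about 1389 per second
  }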

Another major flaw derives from the way bitswap works as discussed in Section 3.2, which states "before entering the DHT lookup, the requesting peer asks all peers it is already connected to for the desired CID". I'm not sure if that actually means all the machines the node has any type of connection to, or only those connections over which it is using bitswap. Even so, asking every peer you're connected to, even if it's only the subset for which you have a bitswap connection established, is inefficient.

Compare this to bittorrent:

- First, there is a much coarser level of granularity; in the libgen case the torrents contain 1,000 books each, so there are far fewer announcements to the bittorrent DHT and trackers. Of course the tradeoff with this is that you can't look up the identifier of an individual book. Instead you need to know the magnet link (which can be constructed from the torrent hash) and the name of the file within that torrent.

- Secondly, a node hosting a large number of torrents (and a large number of active connections) will only send out want lists to the peers that it knows are also hosting that torrent. Peers also exchange have lists, and I think one or both can be represented as a bitfield for efficiency (rather than a list of CIDs/hashes). With bitswap you can end up asking every connected peer, just in case they have it.

On a practical note, hosting the torrents is quite practical with adequate hardware, but even with fairly powerful machines IPFS (at least in its current main implementation, go-ipfs aka Kubo) really struggles and can bring a machine to its knees, even when hosting only a portion of the full collection. In terms of scalability, bittorrent and IPFS are in completely different leagues. Scalability is the main reason the archive of papers from sci-hub (over 80 million) isn't available via IPFS yet, because it's just not going to be able to handle that at all in its current state.

Having said all this, I should state my knowledge of the protocol and go-ipfs is incomplete, as I've only used it but not done any development work on it or dived deeply into the code. I'm happy to be corrected if I've misunderstood anything mentioned above. Also, bittorrent has more than 20 years of implementation experience and I'm sure with further work IPFS can be made to scale better. I don't have the answers as to how to achieve the granularity you get with IPFS vs bittorrent (which is a major point of difference, and something that sets IPFS apart in a significant way). But it's something that definitely has to be fixed for IPFS to be truly capable of achieving its stated goals.


_Disclaimer: I'm the founder of a project behind a new IPFS implementation_

I have a fair amount of experience with the kubo (go-ipfs) codebase, and can confirm the broad strokes of what you've posted here, including the part where bittorrent is straight-up better at scaling, both in terms of protocol design choices & having robust implementations.

The chattiness of the protocol is a very real problem. It used to be _much_ worse. Further order-of-magnitude drops will require rethinking numerous aspects of the protocol. The implied-start topology of the network needs more thought. What remains to be understood is whether that pile of changes can bring IPFS into the same league as bittorrent in terms of network efficiency, while also having the "single-swarm property" that provides fine-grained content routing.

A bunch of us are committed to building this. Hopefully an HN post in a few years' time will point to this one as a reference for just how far we've come.


Bittorrent's main advantage (aside from not hosting every single chunk of a file as a separate entity) is that with a tracker it effectively creates its own small network of hosts sharing a given torrent, and even in the case of DHT you kinda just use it to find other peers for the one big torrent and that's about it.

Even in massive torrents with thousands of peers you just pick a bunch to talk with and that's it.

And publishing one is essentially the same thing; each peer's peer network just meshes enough to propagate data quickly.

> Also, bittorrent has more than 20 years of implementation experience and i'm sure with further work IPFS can be made to scale better.

It honestly looks like a fundamental design issue. Bittorrent was blazing fast from the beginning and it only got better.


Perhaps but Library Genesis worked long before it started using IPFS as well.


Is IPFS usable without the Cloudflare bridge?


Yeah, there are something like four major public IPFS gateways, one run by Cloudflare, one run by Protocol Labs, or you can just run your own node, which is something that many thousands of people are doing.


Does Protocol Labs use Cloudflare?


no, the ipfs.io gateway predates the cloudflare one


no, ipfs.io seems to originate from protocol labs own network: https://bgp.he.net/AS40680


> The content retrieval process across all regions takes 2.90 s, 4.34 s, and 4.74 s in the 50th, 90th, and 95th percentiles

Good improvement over the years, but still a long way to go for it to feel even soft real-time. Not sure whether these servers are using on-the-fly gzip compression before sending over the network, but they should consider adding a compression feature at the file or block level natively in the "ipfs add" command.

There was an interesting paper, "Hadoop on IPFS" (around 2016-17). I hope these continuous improvements will play a good role in making big data and analytics decentralised before it hits v1.0.

https://s3-ap-southeast-2.amazonaws.com/scott-brisbane-thesi...


I'd like to use it for analytics. One issue I hit when testing things out for that use: I added a file to several pinning services and the result wasn't available via my node or the main IPFS gateway for hours.


IPFS is just too slow to be usable at mass scale. It's a neat idea but unfortunately p2p file storage is tough; you absolutely need a central model to scale up. Offering coins to cash out at casinos, where the other side of the order book is people with an unlimited supply of them, doesn't work.


The fundamental problem with any of these p2p-hosting solutions is that most devices folks use these days are battery-powered. Any solution that requires a persistent connection or intermittent network wake-ups to achieve even semi-decent performance won't make it, and if you overcome that hurdle you still won't be able to count on most of the devices downloading content to also serve content. No-one's gonna sacrifice battery life to make the distributed Web dream happen.

They seem a lot more useful for internal infra of hosting providers, though for IPFS in particular I expect the performance isn't consistent enough to be a suitable solution for most of those use cases.


Having used ipfs casually the big draw for me has been the gateways. Running a full ipfs node on client devices is mostly not practical, but I can choose a trusted (or not so trusted since I can check the hash) gateway and that can handle the heavy lifting for me. And even though the ipfs network is slow, the distributed nature lends itself to very heavy caching, so the gateway doesn't have to be any slower than if it was serving me its own static files.

Plus the gateways provide compatibility to www.

Given, things are far from ideal now. But looking over the water it's good to see Mastodon taking off and I think that's largely because you have the option of just choosing a single trusted provider ("instances") from which you can access the rest of the network. The trusted provider does the heavy lifting for you.


IPFS is built to work reasonably well with ephemeral connections. Sure, not everyone is interested in serving content but for those who are it seems to be a reasonable choice.


Disclaimer: I am head of product for Pinata

I think of IPFS as being an open data platform first. You can connect to it and disconnect as frequently as you like. The underlying p2p capabilities don't have to necessarily be fast. It just has to do the job it was designed to do—get content in a permissionless fashion.

Speed, convenience, reliability, and more are not problems the protocol needs to solve for. Providers can solve for these problems without infringing on your ability to "take your ball and go home." Take Pinata for example. We provide dedicated IPFS gateways that provide essentially the same experience you would expect from traditional cloud providers. But if you ever want to leave Pinata or back your data up or just inspect your data, you don't need Pinata's permission. IPFS media is public and open. Convenience is a layer on top of that.

IPFS also doesn't need or have tokens. Filecoin is a separate entity. IPFS is especially powerful because it is not linked to a specific blockchain or currency.


> Speed, convenience, reliability, and more are not problems the protocol needs to solve for. Providers can solve for these problems

They can't if the protocol doesn't allow it.

Otherwise it will be regular centralized storage providers with IPFS bolted on top for one or two geeks who care about it.


That’s kinda the whole point. A couple of centralized gateways can be the main hosts, like Protocol Labs and Cloudflare, and provide the best performance, and also compete with each other on the same content hash. If one or both of them goes down, any user in the world can re-host the data by pinning it, if they still happen to have the file on disk or if they’ve backed it up.


I don't understand the point, really. Especially not from the user's point of view. IPFS doesn't add anything in this equation except additional complexity and "well, if this already centralized service goes down, you're still screwed because the chances of someone storing your files are asymptotically approaching zero"


If your centralized X.png URL is changed, or the website goes down, or the host goes down, then the URL is dead, and so is everything that points to it. Even if somebody has the asset backed up locally or in their cache, they will have to re-upload it to another URL.

Because IPFS links are not URLs, it works on a different paradigm.

The chance of somebody storing your HTTPS files and IPFS files is the same. Users pay hosts to keep hosting them. With IPFS, if the user stops paying the file hosting service, another user can pick up the slack without the link becoming dead.


> if the user stops paying the file hosting service, another user can pick up the slack

Too much faith in someone picking up your files when a centralised host goes down. This is an important detail that for some reason is always dismissed by IPFS proponents.


If a user stops paying for hosting, and they are not hosting and pinning it themselves locally, they can't expect the file to persist indefinitely on the web. Hosting is not free.

But if another party is interested in the file, they have the option to persist the file regardless of what the original party and centralized services choose to do.


So. Let's go back to my original statement: "Otherwise it will be regular centralized storage providers with IPFS bolted on top for one or two geeks who care about it."

What does IPFS bring into the picture apart from "oh yeh, there's a near-zero chance that someone will keep hosting your file"?


There's two specific attributes that are unique to IPFS.

1 - A content addressable URI protocol that allows you to locate a file without linking to a singly-owned and named server or host. This is not the case with HTTPS URL protocol.

2 - Open source clients where multiple parties, including competing parties and individual users, can all simultaneously and permissionlessly peer-to-peer host an asset behind the content addressed URI.


I understand that. This doesn't address https://news.ycombinator.com/item?id=32743041 or any of the subsequent discussion.


We are entering a recursive loop, because I’ve already responded to that. :)


And that's the problem, really. As pure tech IPFS is interesting (probably). But if it doesn't have concrete answers to user questions/problems apart from vague "yeah, maybe perhaps probably something will maybe happen", it's useless beyond a small following of hardcore geeks.


Bittorrent has clearly shown that p2p can scale perfectly with very little centralization, which is now becoming unneeded. There is an issue with IPFS, not with p2p.

My uneducated guess: content in IPFS is split in too small blocks, making the data-to-control ratio way too low.


When I worked for a National Supercomputing Center, I discovered that they had a distributed filesystem called the Andrew File System (AFS), which they used for archival purposes. With every new generation of distributed filesystem I always wondered why the older ones failed.

It wasn't until maybe 10 years ago that I finally got my answer: It turns out that Amdahl's Law kills AFS. There's a total throughput wall that becomes very painfully visible once you move to gigabit networking, and any one client can pretty much saturate the network.


That is also what CERN was using in early 2000 before the Grid efforts.


I am told this is very specifically a default setting in the app that makes the minimum connections something like 100 or so. If you crank that down to 5 to 20 it goes much faster. YMMV.
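If anyone wants the concrete knob: it's the connection manager water marks in the kubo config, e.g. (values here are illustrative, not recommended defaults):

  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 20,
      "HighWater": 40,
      "GracePeriod": "30s"
    }
  }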


You don't need central control, but the more you distribute, the more you lose on performance.

It could be an interesting idea on a smaller scale: say you start a "virtual hosting provider", where each of 10, 100, 1000 people connects into a mesh and stores each other's data, so in the event of a failure it just keeps working.


Yeah, but Filecoin raised $250M two years ago. They have enough capital to keep writing these articles until the thermodynamic death of the Universe.


In 2017, not in 2020.


I'd like to imagine a world with globally addressable devices, maybe through NAT hole punching or just IPv6, where we can share everything. But the way things are now, I really struggle to determine the legitimate use case for this technology if we ignore the interest in decentralization for its own sake and just consider engineering tradeoffs like cost for performance and end-user experience/ergonomics.

As a storage layer, there are major challenges to adopting IPFS that have persisted almost a decade into the project. This level of partition tolerance comes at an incredible cost to availability, and from everything I read, the best practices for hosting user-generated content still involve paying a service to "pin" your content to ensure it doesn't get dropped, so you still pay someone to host your data!

So what I'd like to know, is why would I want to use IPFS to host anything, when better, more performant and cost effective alternatives exist, and IPFS doesn't guarantee a file is actually hosted? Like, are there words you can say to your boss to argue for IPFS as a rational choice in systems architecture? What is the use case here?


I keep confusing IPFS with the (much older) I2P. Just in case someone else makes that mistake.



