
My concern with the IPFS claims of permanence, like "Where you can know for sure your pictures will be available forever" is that, AFAICT, files are only uploaded to the network once another party requests them.

An example:

  $ ipfs init
  $ echo 'agdkjagka' > test
  $ ipfs add test
  added QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h test
  $ rm -R ~/.ipfs
  $ ipfs init
  $ ipfs cat QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h
  Error: merkledag: not found

In English, `ipfs add` does not upload anything to the network. The only way a hash becomes distributed on the network is for another party to request that hash. Even then, I believe that the data only exists on that party's system. That is, if both you and the requester leave the network, the data is gone.
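
To spell that out (a sketch, assuming a second machine running its own default IPFS daemon while the first node is still online): the data only sticks around if that other party actually fetches the block and then pins it:

  $ ipfs cat QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h   # fetches the block into the local cache
  agdkjagka
  $ ipfs pin add QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h   # protects it from garbage collection

Without the pin, even that cached copy is eligible for deletion whenever the second node runs `ipfs repo gc`.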



Yeah, this is a common source of confusion, and the ipfs.pics tagline doesn't really help the situation. Personally I'd change it to "where you can ensure your pictures will be available forever".

You shouldn't read "permanent" as "eternal", but as "not temporary" --- in the same sense as "permanent address" or "permanent employment". HTTP is temporary in the sense that you're relying on a single server to actively continue serving a file for it to remain available. IPFS is permanent in the sense that, so long as the file exists somewhere in the network, it will remain available to everyone.

Edit: If you want to ensure your content remains available, you still need to host it somewhere (your own server, convincing/paying someone else to, etc). IPFS isn't so much about storage as it is about distribution.


I like that distinction. As a non-native speaker I wonder – is there an English word that's better suited than "permanent" to describe "not temporary"?


Lasting, until further notice, long-term


Lasting seems appropriate – thanks!


I would have said "available"... just not the forever part.


Durable or immutable?


IPFS is just one piece of the puzzle. Filecoin, a Bitcoin-like technology (but for files instead of money), is meant to combat this issue. Similar to how you pay Dropbox or AWS to host your files, you'll have IPFS hosts that you pay to rehost your content. Or you'll run your own public daemon on which you can "pin" your own content. In the end, people are not willing to give out disk space for free, therefore Filecoin will exist.

http://filecoin.io/ -- built by the same guys behind IPFS and meant to be used together


> In the end, people are not willing to give out disk space for free, therefore Filecoin will exist.

Disk space is one parameter of the equation though. Bandwidth and uptime should be considered too in order to estimate the effective amount of resources a node adds to the network.


Given enough peers over time, neither bandwidth nor uptime should be an issue. If an incentive-based system like Filecoin were to take off, and there's a market to exchange that currency, then it may actually be a viable business just to set up farms that host content. Kind of like a hosting service, but with indirect customers.


What if you pay the network to host your blocks by hosting some other blocks from the cloud, at some m:n ratio? The cloud (some random peer) could periodically check your block inventory and make sure you were still holding up your end.

edit: surely that's been done before?


Really, this whole concept is similar to torrents. Either ratio-based (upload at least X% of what you download), or HnR-based (don't download something and then refuse to seed it).


Storj (http://storj.io/) is an interesting decentralized approach to this problem too.


also Maidsafe (http://maidsafe.net/)


This is true. However, things disappear from the web all the time. Imagine if everything on the internet that was accessed at least once a week stayed available forever. While not perfect, it would be much better than the web is today.

IPFS doesn't download anything to new peers, unless the new peers ask for it. That way each node owner can control what's on their node.

But say if popular browsers by default used IPFS as a cache, that way if the original publisher goes away the content could live on, as long as the content is popular.


> But say if popular browsers by default used IPFS as a cache, that way if the original publisher goes away the content could live on, as long as the content is popular.

That is my main issue with the way IPFS is being marketed, as it were.

It is not a "Permanent Web" if the only content that is permanent is popular content. Old links will still break, because old links are generally less popular. Old websites will simply vanish, even ones that are otherwise maintained on the current web.

In particular, applications like this post itself, that are part backup part publishing, aren't great applications of IPFS because your images are just hosted off your home internet connection. Power outage? No data. ISP issue? No data. Hardware failure? Hope you had a real backup. Basically, why would I choose IPFS, which is in this case equivalent to self hosting, over flickr, instagram, etc?

Edit: I'd be remiss to not refer you to toomuchtodo's comment below. Were a service like the Internet Archive to take part in IPFS then it would help with some of my above concerns. However, it's not really IPFS that is making the permanence possible so much as the Internet Archive in that circumstance.


The permanence of a "site" in IPFS is intrinsically bound to the active participation of those social entities propelling the site's content.

So, were we to have IPFS support directly in the browser, every time you or I take a look at the pics in a thread, for example, we'd be contributing to the effort to keep things on the IPFS network, to our nearest local trusted peers, for as long as the subject is relevant.

So, your typical net forum, whose members are dedicated to the subject, would communicate about that subject as such, and in so doing perpetuate the network.

Yes, the IPFS web has to be tended. But so do your servers. Your servers will die in an instant if the ten or so people who keep them running die (an extreme case), or for any one of a thousand different social reasons. In this case, though, the technology is aligned: the load of supporting an IPFS web is distributed among the people whose interest supports the subject, instead of resting on a centralized sysadmin with keys of godlike power. This decentralization should be considered an attack on devops: with IPFS, establishing a distributed content delivery system capable of scaling to load no longer requires an admin. The user is the admin.


> Basically, why would I choose IPFS, which is in this case equivalent to self hosting, over flickr, instagram, etc?

Personally, I'd like to have some data backed up in several places and have links that don't break. IPFS allows for that.

Flickr sells up and goes down? All the links to images break.

> However, it's not really IPFS that is making the permanence possible so much as the Internet Archive in that circumstance.

Both, surely. The major thing that IPFS also allows is backups of the IA without any single person needing to be able to host absolutely everything.

You are right though, there is a big difference between allowing permanent backups and guaranteeing them.


IPFS also allows IA to not just be a backup, but to help distribute the original content itself. There's no longer a distinction between origin hosts and backups.


Yes, that's a very good point.


>Old websites will simply vanish, even ones that are otherwise maintained on the current web.

Not really. Today, a website needs to be maintained by the original host or it goes away. If IPFS were used, the same site would need to be hosted by the original host or any other interested party.

If absolutely nobody else is interested enough to host it, the original host can continue to do so, and the site would be in the same situation as today's sites: hosted by one node.

>In particular, applications like this post itself, that are part backup part publishing, aren't great applications of IPFS because your images are just hosted off your home internet connection. Power outage? No data. ISP issue? No data. Hardware failure? Hope you had a real backup. Basically, why would I choose IPFS, which is in this case equivalent to self hosting, over flickr, instagram, etc?

While I haven't looked at the source code, I'm fairly certain ipfs.pics is uploading the photos to their servers as well. It's effectively a Flickr-type site using IPFS as the backend, with the added benefit that the photos may still be available somewhere else if their servers disappear.


It already does. The Wayback Machine: https://archive.org/web/


Right. At this time, you'd need a forced persistent host for content (S3?). Hopefully that need would drop away as more hosts joined the network.

Note to self: Build an S3 backend/gateway/persistent-store-of-last-resort for ipfs.


Well, it doesn't matter so much how many hosts join the network. You still need to convince some members of the network to view your content at least once in order to distribute the data.

I suppose you could argue that nonvaluable content would justifiably vanish over time, but then it's not really a "Permanent Web."

Edit: Apparently I can't reply to your reply to this comment, but thanks for the link. I hadn't seen that.


I believe IPFS was partially intended to help the Internet Archive in that regard. They'll be the consumer of last resort for all objects, thereby bringing about the Permanent Web.

https://ipfs.io/ipfs/QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5...


It'd be interesting to see a browser implement caching using something like IPFS. When a regular HTTP GET (or whatever, really) request is sent, the IPFS-enabled browser could look for a `Link` header with the `ipfs:` scheme and `rel="alternate"` in the response, and use that as an alternate place to look for the content. The Etag header could carry the hash, so the browser could tell on subsequent requests which hash it associates with the mutable URI. In the event of a 304 it'd look up the data in the IPFS cache – which may or may not actually be on disk. If not, it might still be a more efficient fetch than HTTP since there may be many peers; worst case, the only peer is the HTTP server you made the request to in the first place.
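
For illustration, a response carrying these hints might look roughly like this (the URL and headers are hypothetical -- no server emits them today -- and the hash is just the one from the example upthread):

  $ curl -sI https://example.com/photos/cat.jpg
  HTTP/1.1 200 OK
  Content-Type: image/jpeg
  Etag: "QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h"
  Link: <ipfs:QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h>; rel="alternate"

An IPFS-aware client could resolve the `Link` target over IPFS and fall back to plain HTTP otherwise.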

I suppose `Content-Location` could be used as well, but I don't know how well browsers that don't understand the `ipfs:` scheme would react to that, although the spec doesn't preclude non-http schemes used in the URI.

It'd be an interesting experiment anyway, and could be a boon to adoption of a distributed system like IPFS.


Come to think of it, `Content-Location` is much more semantically appropriate than `Link <ipfs:hash>; rel="alternate"`; the latter is just a link, but the `Content-Location` header would tell you the canonical location of the requested content. For an IPFS enabled client, this would mean that if they want that specific content, they'd never even hit the HTTP server on subsequent requests, but dive straight into IPFS. That said, existing clients may get very confused by an unsupported scheme in that header. Presumably, that client should go `IDK lol` and go use the not-so-canonical URL instead, but I'd be surprised if they'd actually work like that.


Can't you just make the initial request for your document yourself? (I know this works to seed Freenet; not as sure about IPFS.)


It caches things locally (in ~/.ipfs/blocks), so requesting your own document just hits your own cache; you'd have to request it from a second system for the data to reach another node at all. Even then, my understanding is that if both that second system and you left the network, the data would be lost.

You need a third party to request the data and not leave the network to keep the data around.

Provided that either the third party reliably remains in the network (e.g. the Internet Archive) or you can consistently get new third parties to request and cache the data, it will remain in the network. The latter does not seem particularly reliable to me, however.


I think it's more of an issue with "marketing" or how IPFS is (was?) presented: it's not a magic web-in-the-sky for all the things -- but it does make it really easy to a) host stuff redundantly, and b) scale out distribution. So you could edit a static web page on your laptop, and have a post-commit hook (or other automagic system) that pushes published posts to two or three permanent servers -- these could be backed up as normal, or you could just spin up some VMs and have them "restore" from some "master list" of your own content (hashes).
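
A minimal sketch of such a hook (the hook path, the `public/` directory, and the mirror hostname are all made up; it assumes `ipfs` is installed and a daemon is running on both ends):

  #!/bin/sh
  # hypothetical .git/hooks/post-commit: publish the built site and pin it on a mirror
  HASH=$(ipfs add -r -q public/ | tail -n 1)     # -q prints only hashes; the last one is the root
  echo "$HASH" >> ~/published-hashes.txt         # keep a local "master list" of content hashes
  ssh mirror.example.com "ipfs pin add $HASH"    # have a permanent server fetch and pin it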

Now as long as at least one device is up (and has the content), you can bring backups on-line easily. And as long as at least one server is connected to IPFS other nodes can get the content, and in theory, any spike in popularity will get distributed and cached "suitably".

An added bonus is that if you publish something like a controversial, but popular, political blog post/exposé, and some government throws you in a hole that officially doesn't exist -- your readers, if they're on IPFS, will maintain an active backup of your content by virtue of reading it.

This is a lot more convenient than someone having to explicitly spider it etc. (although a combination would probably work/be a good idea -- e.g. an IPFS "dmoz.org" where authors could register content index-pointers for others to spider/download into their IPFS nodes -- and index for search).


I don't disagree on any particular points. When I first read about it and started playing with it I definitely felt like my expectations were set to something other than what IPFS actually provides.

That said, I think systems of this nature are worth pursuing, and perhaps IPFS itself can be improved for more general-purpose use cases. For my part, I think it'd be awesome to be able to write some HTML and CSS, make some images, `ipfs add ~/website`, and then be able to link anyone to my content and have reasonable guarantees of its existence for the rest of my life. I can host my own websites, but it's not a particularly enjoyable experience.

> This is a lot more convenient than someone having to explicitly spider it etc. (although a combination would probably work/be a good idea -- e.g. an IPFS "dmoz.org" where authors could register content index-pointers for others to spider/download into their IPFS nodes -- and index for search).

IIRC it's possible to follow announcements of new hashes on the network and retrieve them automatically. I picked this up from #ipfs on FN, I believe, so I'm not 100% sure about it. Doing that would make an IPFS search engine fairly robust (and interesting to build, actually).


ipfs dev here! This is indeed possible: you will be able to listen for announcements (provider messages) of hashes that are near your node's peerID within the Kademlia metric space. To get a complete picture of all hashes on the network, you would need to ensure your nodes had reasonable coverage over a good portion of the keyspace (enough that the K-closest-peers calls for any hash would return at least one of your nodes).

I really want to build something like this, just haven't had the time to do so.


You don't need to do this with Freenet. When you insert data it is pushed to other nodes - a completed upload means other nodes have the data. You can turn off your node and the data is still available.


Out of curiosity (I haven't got an IPFS installation on hand, so I can't test) -- what'd be the effect of adding a:

  curl https://ipfs.io/ipfs/QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h
before the "rm -R"-line? I'm guessing it might not be something the gateway servers are set up for doing ATM -- but maybe they should, or a cluster of IPFS node should be set up for that?

I'm not sure if forcing a full download of the whole file is a waste of bandwidth, or a clever way to force the person adding a file to the cache to at least perform some effort by "using" the same amount of incoming bandwidth as the cache nodes would have to on the back-end.

My initial thought was that such a system should allow "seeding" the cache by simply sending a HEAD or similar request...
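
Something along these lines, presumably -- though whether a HEAD alone would make the gateway pull the full object into its cache is exactly the open question:

  curl -I https://ipfs.io/ipfs/QmTtmsSSTH5fMQ9fn9NRXWtcZXnBBazxR8fidXcE5KB76h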


Yeah, you can definitely force the gateways to cache content for you. Just make sure not to rely on that for keeping things available; they run a garbage collection every hour or so to clear up disk space.


>The only way a hash becomes distributed on the network is for another party to request that hash.

I consider this a feature, not a bug.



