I am a novice, but I'll do my best to answer. IPFS isn't a replacement for the existing web like many of its predecessors; for the purposes of your question, it works more like a drop-in shared caching system. The site host doesn't have to participate in IPFS for this to work.
As an example: you host a blog. As an IPFS user, I surf to the blog and store your content in my cache. When another IPFS user attempts to access your blog, they may pull directly from my cached version or from the original host, depending on which is fastest. Merkle DAGs are used to hash content so it can be located quickly, to ensure content is up to date, and to build a linked chain of content over time.
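Here's a rough sketch of that caching idea in Python. It's an illustration only: real IPFS splits content into a Merkle DAG of blocks and addresses each node by a multihash CID, whereas this toy uses a bare SHA-256 over the whole file.

```python
import hashlib

class PeerCache:
    """Toy content-addressed cache, illustrating the idea above.

    Real IPFS chunks content into a Merkle DAG and uses multihash CIDs;
    a bare sha256 over the whole file is used here only to keep the sketch short.
    """
    def __init__(self):
        self.blocks = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blocks[key] = content   # cache what I just browsed
        return key                   # the address another peer can ask me for

    def get(self, key: str):
        return self.blocks.get(key)  # serve it back if I still have it

# After I read your blog post, my node can hand it to the next visitor by its hash.
cache = PeerCache()
post_hash = cache.put(b"<html>your latest blog post</html>")
assert cache.get(post_hash) == b"<html>your latest blog post</html>"
```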
This gets more interesting if there's a widespread service outage. IPFS nodes will continue to serve the most up-to-date version of the web even if the web is fragmented. As new information becomes available, it is integrated into the existing cache and then propagated to the rest of the fragments.
I still struggle to understand how this works with database-backed content, but I believe IPFS addresses that case as well.
Yeah - there are a ton of immutable URLs on the web: all of CDNJS/JSDelivr/Google Hosted Libraries, most of raw.github.com, all Imgur & Instagram images, all YouTube video streams (excluding annotations & subtitles), and all torrent file caching sites (like itorrents.org). There are probably some large ones I'm forgetting about, but just mapping immutable URLs to IPFS could probably cover a third of Internet traffic.
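A hedged sketch of what such a mapping could look like; the URLs and IPFS paths below are made-up placeholders, not real published hashes.

```python
# Hypothetical lookup table from known-immutable URLs to IPFS paths.
# Everything below is a placeholder for illustration, not a real mapping.
IMMUTABLE_URLS = {
    "https://cdnjs.cloudflare.com/ajax/libs/jquery/3.7.1/jquery.min.js": "/ipfs/<cid-of-jquery>",
    "https://i.imgur.com/<image-id>.jpg": "/ipfs/<cid-of-image>",
}

def resolve(url: str) -> str:
    # Serve from IPFS when the URL is known to be immutable; otherwise fall back to the origin.
    return IMMUTABLE_URLS.get(url, url)

print(resolve("https://example.com/index.html"))  # not immutable -> origin URL unchanged
```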
IPFS Archives is the effort going on right now to archive sites. Eventually there will be a system for automatically scraping and re-publishing content on IPFS.
Right now storage space and inefficiencies in the reference IPFS implementation are the biggest problems I've hit. Downloading sites is easy enough with grab-site, but my 24TB storage server is getting pretty full :( ... Gotta get more disks.
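For what it's worth, the add-to-IPFS half of that workflow is roughly the following. The directory path is just an example, and I'm assuming the site was mirrored into a plain directory rather than left in grab-site's WARC output.

```python
import subprocess

# Example path only; assumes the grabbed site already sits in a plain directory.
site_dir = "/archive/example.com"

# `ipfs add -r` recursively adds the directory and prints a hash per entry,
# with the root directory's hash on the last line.
result = subprocess.run(
    ["ipfs", "add", "-r", site_dir],
    capture_output=True, text=True, check=True,
)
root_line = result.stdout.strip().splitlines()[-1]
print(root_line)  # e.g. "added <root-hash> example.com"
```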
Say you grab a site. How do you announce that fact, verify that it is an unmodified copy, sync/merge/update copies and deduplicate assets between different snapshots?
Say I have a site that sells Awesome Products. L337Hacker mirrors my site, launches a DDoS against the original, and lets IPFS take over, redirecting the shopping cart to his own site.
Is this a potential scenario? If so, is there any way to prevent it?
If I'm reading this correctly, there are a few ways IPFS could be used in support of distributed, fraud-resistant commerce.
First: publishing the catalog isn't the same as processing the shopping request. Online commerce is largely an update to catalog + mail-order shopping as it existed from roughly 1880 to 1990. If someone else wants to print and deliver your (PKI-authenticated, tamper-resistant) catalog, that's fine.
Second: the catalog isn't the transaction interface; it's the communication about product availability. The present e-commerce world is hugely hamstrung on numerous points, and one of them is the failure to separate the catalog and product presentation from ordering itself. So long as you're controlling the order-request interface, you're good. A payment-processing system which authorised payments via the bank rather than through the vendor would be helpful, as would a move away from sharing account info.
The key is in knowing who the valid merchant is, and in establishing that the fraudulent merchant has misrepresented themselves as the valid merchant. Perhaps authentication within the payment system would help.
Taking the shopping cart's payment mechanism out of the shopping cart would help.
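To illustrate the "PKI-authenticated catalog" point above, here's a minimal sketch using the third-party `cryptography` package (an assumption on my part; any signature scheme would do): the merchant signs the catalog, and anyone re-hosting it can't alter it without the signature check failing.

```python
# Illustration of the "PKI-authenticated catalog" idea; `cryptography` is assumed installed.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

merchant_key = Ed25519PrivateKey.generate()   # held only by the real merchant
merchant_pub = merchant_key.public_key()      # published out of band (site, app, DNS, ...)

catalog = b"Awesome Product, $10, order at https://example.com/order"
signature = merchant_key.sign(catalog)        # shipped alongside the catalog

# Anyone re-hosting the catalog (IPFS, print, whatever) can't alter it undetected:
try:
    merchant_pub.verify(signature, b"Awesome Product, $10, order at https://l337hacker.example")
except InvalidSignature:
    print("tampered catalog rejected")
```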
All IPFS URLs contain the hash of the content, so the content behind a given URL can't be changed. There's a mechanism that allows a URL to point to varying bits of content over time, but I'm not aware of a paper which analyses its security properties.
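To make the first point concrete, here's a minimal illustration of why a content-addressed URL can't be silently repointed. It's simplified to a bare SHA-256; real IPFS paths use multihash CIDs, and the mutable-pointer mechanism mentioned above is, as far as I know, IPNS.

```python
import hashlib

def address_for(content: bytes) -> str:
    # Simplified stand-in for a content address; not the real CID algorithm.
    return "/ipfs/" + hashlib.sha256(content).hexdigest()

original = address_for(b"catalog page: Awesome Product, $10")
tampered = address_for(b"catalog page: Awesome Product, $10, ship payment to L337Hacker")
assert original != tampered  # any edit to the content yields a different address
```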
Upon further reading, it appears that it may be impossible to verify the authenticity of an IPFS-cached page, simply because the hash is calculated after the fetch, on the client. That allows any sort of shenanigans to be performed on the original content before it's hashed and stored.
If content is created specifically for IPFS-caching (similar to Freenet or Onion), then it may be possible to be authoritative, but content cached from the web should never be considered so.
> Upon further reading, it appears that it may be impossible to verify the authenticity of an IPFS-cached page
Not at all; rather the opposite. It's very easy to verify a page, since the hash is based on the content.
You have a file "ABC" that you want to download. So you fetch it, and once you have it locally, you hash it yourself and compare that hash against the one you asked for. If they are the same, you know you have the right thing. If they are different, someone is sending you bad content.
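A minimal sketch of that check, assuming a simplified hash scheme; real IPFS verifies each Merkle-DAG block against its CID as it arrives rather than hashing one whole file.

```python
import hashlib

def fetch_and_verify(requested_hash: str, fetch) -> bytes:
    """Fetch bytes for a requested hash from an untrusted peer, then verify locally.

    `fetch` is a stand-in for whatever transport returns the bytes; the scheme here
    is a bare sha256 of the whole file, which is a simplification of IPFS's CIDs.
    """
    data = fetch(requested_hash)
    if hashlib.sha256(data).hexdigest() != requested_hash:
        raise ValueError("hash mismatch: the peer sent bad content")
    return data

# Example: a dishonest peer returns tampered bytes, and the check catches it.
good_hash = hashlib.sha256(b"file ABC contents").hexdigest()
try:
    fetch_and_verify(good_hash, lambda h: b"tampered contents")
except ValueError as e:
    print(e)
```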
Re-read the original question. If someone is preventing access to the original page and the alternate is being served through IPFS, there is no way to compare against the original. The IPFS-cached page becomes the authoritative page and could contain altered content, and its hash will simply match that altered content.
If the original site computed the hash and embedded it, that would somewhat alleviate the issue during the fetch, but it would do nothing to prove whether the IPFS-served page was trustworthy, unless some third party knew the original hash as well.
If the page were published to the IPFS network, to be cached, by a neutral, trusted third party, that would somewhat alleviate the problem, although that just raises the problem of trust again.
The only way to minimize the trust issue is if the page originates from inside the IPFS network and is not a cached version of a page originally served outside the network.
You're right. I misread the parent comment, which suggested that web pages could magically be cached securely into IPFS from the public web without any involvement from the website's owner; that is nonsense.