Keep Your Stuff, for Life (perkeep.org)
355 points by vincent_s on June 30, 2020 | 109 comments



Wow, it was really hard to find out what this software does. I finally found a demo at the end of an hour-long talk from 2 years ago: https://youtu.be/PlAU_da_U4s?t=2687

So it seems to have a bunch of scripts to import data into its database from a variety of sources (including cloud services), and provides a search interface to navigate that historical data. And it has a lot of under-the-hood stuff about replication, and it's entirely self-hosted.


Your initial confusion and nice summary made me think of Camlistore, which I looked at 4 or 5 years ago. And here it is, renamed!

This is sad because the project looks interesting; it is not the usual quick kludge or vaporware, but the bad or nonexistent documentation is a terrible disservice to the project.


Yeah, I've seen this project multiple times over the years, was always interested in it but never managed to figure out what it's for, let alone how to use it.


Perkeep is camlistore, FYI.


They know.

> And here it is, renamed!


Does the first sentence not cover that fairly well?

> Perkeep (née Camlistore) is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.

I checked archive.org and that text has been there for a couple of months at least. Looks interesting, I've been in the market for this kind of self hosted backup/replication/tagging/search thing.


> Does the first sentence not cover that fairly well?

Not for me. Almost all of what you copied there seems like intrinsically meaningless fluff. "a set of formats"? "a set of protocols"? "software"! Of course it's software! "for modeling"? "in the post-PC era"?

I would have preferred GP's ~"imports data/files from cloud services into a local database and provides a search interface for it". Because THAT tells me what it does, not any of the other words you posted.

Someone made a really well done cartoon mascot for it, but the copy isn't _helpful_.


> Does the first sentence not cover that fairly well?

It does not.

The first sentence on the site ("Perkeep [...] is a set of open source formats...") describes literally what the thing is, but not at all what it _does_.

Not to slam on these cats, because marketing copy is _hard_. For project collaborators, or open-source dorks who live in this kind of world anyway, the sentence on the homepage is probably perfectly descriptive.

But I agree with my GP post. Reading the homepage I had no idea what Perkeep actually did.


What it _does_ is:

> modeling, storing, searching, sharing and synchronizing [...] files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem

I mean maybe it could have been more explicit or they could have added more detail, but having this as the first sentence is WAY better than most of the 'professional' landing pages for startups that get posted here. 'Harmonizes synergy and increases your ability to wow your target space with your aspirations', now that's meaningless.


Them: “The project description isn’t clear to me.”

You: “Well I’m sorry it wasn’t clear to you but it was clear to me and better than these other things and here’s why it should have been clear to you.”

If someone tells you something is unclear to them, arguing about it doesn’t change the fact that it wasn’t clear to them.


As an engineer

> files or objects, tweets or 5TB videos

means rsync, curl, and Twitter API integration. It obviously does more than this, since I can throw something together that does that in a few hours. Where is the list of everything it supports?!?

That should be front and center.


I managed to parse it, but there's a great grabber sentence down below that could go right at the top:

"Your data should be alive in 80 years, especially if you are." To which you might add, "We're here to help you make sure that's what happens". Then follow that with the "Things Perkeep believes ..." section.

After that, the mission is clear, how it works is clear (though many people might have -no idea- what 'open source' is good for). Only then (IMO) can you get away with going all technical on them!


[flagged]


"Not for muggles" is a silly cop-out. It's very nondescriptive. That "set of protocols and file formats" etc. could do literally anything.

Is it for a nas? Is it distributed? Is it some backup yoke for a cloud service? Do you run it on your own hardware? Is it all of the above? What does your data look like? Where and how are you searching through it? Is that part not for muggles either or is there a frontend? And so on basically forever.


For me it didn’t. That sentence is too vague as to what I as a user will be able to do with this project.


It reads more like a patent description than telling me what it's for.


Yeah, like a lot of OSS backup products, the docs are grandiose without focusing on ease of operation / feature parity with Dropbox.


Perkeep is really straightforward to operate, though. One can just install the dmg, mount the root and put files there. Also there's a web interface. For me the most interesting feature is how easy it is to set up a new node and to sync to/from it. Like rsync, but no need to worry about the parameters. Also, as the name suggests, it's far more difficult to lose important data than with Dropbox (and similar products like iCloud), where that can quickly happen through wrong usage.


No wonder, looking at the funding, is it?


You don't need funding for focusing on ease of operation, it's a mindset. Either you're user-centered, or you're not.


>Either you're user-centered, or you're not.

It really depends on the audience for the product; targeting regular users does require more thought on the UX, etc., which almost invariably means more funding.


I don't think the website targets the wrong audience - it's just that the copywriter was so involved in the project they forgot to make what they're offering clear.

When someone says "permanently keep your stuff, for life" do they mean some sort of pay-once eternal backup, like permanent.org? Something censorship-resistant, like Freenet? Something peer-to-peer and distributed-ledger based like Filecoin? Backing up data locked up in cloud services? Converting obscure file formats into ones with more longevity? Bypassing defunct DRM? Activism against civil forfeiture?


Just from the front page it looks like a software stack for running your own archive server?


Why do you think command-line tools are popular? Because they are easier to build than a GUI, not because all geeks think the terminal is the only UX that matters.


That seems... offtopic? You can have command line tools that are easy to use, with documentation focused on user-friendliness, and you can have command line tools where figuring out what they even do in the first place takes a lot of effort. And the same is true for GUIs, for what it's worth.


That takes time. Or funding. I have often thought "it would be great if I could suck all my emails and Facebook posts into a local search index." It sounds like this is that. If I wrote it I would have exactly enough time to make sure that all my emails and Facebook posts are in a local index and then I would have no more time for a personal project. Making a nice UI on top of it would easily increase the amount of time fivefold.


Agreed. It's really about benefits vs features. Features are what. Benefits are why; why you should care. Addressing why is essential.


About $30k - https://opencollective.com/perkeep/expenses - which is more than most OSS projects would ever see in their lifetime, so it's not such a good excuse for lacking documentation and UX.


$30K over 13 years, and 98% of that funding was the lead developer and his wife (plus 10% from another developer) giving their own money to another developer.

It looks more like "wealthy Googlers helping out a friend with a short term gig" than "financing a professional product".

Perhaps you could donate your time as a UX specialist to help these folks who are more versed in backend systems and libraries.


I'm more of a backend person myself, and Perkeep disappoints on that side too. See my other comment about how much time it took to ingest just one .iso. I don't think this system would handle a 5 TB video file as the authors claim; that's some false advertising!

I considered it for my use case (archiving and deduping dozens of terabytes / millions of files on several personal NAS boxes), really wanted to use it - as I find some of its ideas pretty cool, but in the end decided it would be simpler to just write something from scratch instead. It took less time to write it than it would have taken perkeep to ingest my data.


Same video is posted at the bottom of their website.


I felt the same but if you go to "Docs" the very first link is then:

"Overview: The original motivation and background for why Perkeep exists and what one might use it for."

And there I found a great description.


it’s a distributed personal content addressed blob object store, with a variety of configurable storage back ends, plus a web interface to make that kind of usable to a human for human things.

it’s like git but for all your stuff.

why would you want to use it? you probably wouldn’t, quite yet. but it’s an interesting attempt at doing something a little more sophisticated than a plain file system.


This comment represents the value of HackerNews comments very well :) Thank you.


Perkeep is super interesting. I have been looking at it as a media database (photos, movies and such, powerful tag-based search, all downloaded on demand), but some things regarding that really hurt:

- The data store messes with my files (yes, there's a FUSE mount, but eh, having to adopt a special data store always makes me feel weird, since it usually comes with performance and compatibility implications; there are also many other block-chopping data stores, for example in IPFS).

- The last time I checked there was no way to delete something. This is okay for tweets, I guess, but if I commit a 3h video that I later realize is just too large, or a photo I end up not really wanting around - well, oops.

I have huge respect for Brad Fitzpatrick in general, of course, and especially for creating this. Recent velocity has seemingly been relatively low, however: https://news.ycombinator.com/item?id=22161812


> ... my time is tempered by 2.5 and 0.5 year old kids. ... I'll pick up my involvement again as kids get a bit older.

> We have no plans to abandon it.


It sounds like this preserves your data from lots of different services. This is something I need! But I couldn't figure out what it actually supports. Suggestion: the very first paragraph should describe the specific inputs it can handle.


For the most part, it's just an object store (think Amazon S3). Content addressable (think Git): you put an object (file bytes) in, and you can get it out by its hash, that's it.

There are some bits (permanodes and claims) for adding metadata to objects (filename, timestamp, geolocation and other attributes, I think even arbitrary JSON) and for authentication/sharing. There are a few really cool bits around modularity: blob servers can be composed over the network - you can transparently split your blob storage over multiple machines, databases, and cloud services, and set up replication, maybe encryption (unclear to me if it works or not).

Importing data from different services is not really its core competency, at least not yet. It can ingest anything you can put on your file system and there are importers for a few third-party services (see https://github.com/perkeep/perkeep/tree/master/pkg/importer ), but that's about it.
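
To make the content-addressable idea concrete, here's a minimal sketch in Go (Perkeep's implementation language). It is illustrative only, not Perkeep's actual API; the type and method names (BlobRef, BlobStore, Put, Get) are made up for the example, and it uses SHA-256 where Perkeep's real blobrefs are SHA-224.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// BlobRef is the address of a blob: a hash-name prefix plus hex digest.
type BlobRef string

// BlobStore is a toy in-memory, content-addressed blob store.
type BlobStore struct {
	blobs map[BlobRef][]byte
}

func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: make(map[BlobRef][]byte)}
}

// Put stores a blob and returns its content-derived address.
// Putting the same bytes twice yields the same address (dedup for free).
func (s *BlobStore) Put(data []byte) BlobRef {
	ref := BlobRef(fmt.Sprintf("sha256-%x", sha256.Sum256(data)))
	s.blobs[ref] = data
	return ref
}

// Get fetches a blob by its address, if present.
func (s *BlobStore) Get(ref BlobRef) ([]byte, bool) {
	b, ok := s.blobs[ref]
	return b, ok
}

func main() {
	store := NewBlobStore()
	ref := store.Put([]byte("hello, archive"))
	if data, ok := store.Get(ref); ok {
		fmt.Println(ref, string(data))
	}
}
```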


Thank you so much for a description of what it actually does which the website seems to struggle so much to convey.

One thing that I'm still trying to figure out is, if you do happen to know: how does it handle data deduplication (if at all)? How about redundancy and backups? I've been glancing over the docs and I do see mention of replication to another Perkeep instance but that's not quite what I'm looking for.


Deduplication is naturally handled by the content-addressable property of this object store: the address of each object is its cryptographic hash, SHA-224 in Perkeep. So if you try to put a duplicate copy, you'll find that the address you try to put it at is already occupied by the first copy. Perkeep assumes that you never delete anything (deletion simply isn't implemented, not even for garbage collection/compaction purposes), so if you see that one copy of an object was already put, you can discard any further puts as no-ops.

Then there is also some logic to chunk large objects into small pieces, or "blobs". These small chunks are what the storage layer actually works with, rather than the original unlimited-length blobs that the user uploaded. Chunking helps to space-efficiently store multiple versions of the same large file (say, a large VM image) - the system only needs to store the set of unique chunks, which can be much smaller than N full but slightly-different copies of the same file. But personally I find that it degrades performance to the point of making it unusable for my use case of multi-TB, multi-million-file storage of immutable media files. If chunking/snapshotting/versioning is important for your use case, I'd look more towards backup-flavored tools like restic, which share many of these storage ideas with Perkeep.
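
As a rough illustration of the chunk-and-dedup idea (not Perkeep's actual code; Perkeep picks chunk boundaries with a rolling checksum rather than at fixed offsets, and all names here are invented for the sketch):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const chunkSize = 64 << 10 // 64 KiB chunks, fixed-size for simplicity

// store maps chunk hash -> chunk bytes; putting a duplicate chunk is a no-op.
var store = map[[32]byte][]byte{}

// putFile splits data into chunks, stores each previously-unseen chunk,
// and returns the ordered list of chunk hashes (the file's "recipe").
func putFile(data []byte) [][32]byte {
	var refs [][32]byte
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		ref := sha256.Sum256(chunk)
		if _, exists := store[ref]; !exists {
			store[ref] = append([]byte(nil), chunk...)
		}
		refs = append(refs, ref)
	}
	return refs
}

func main() {
	v1 := make([]byte, 3*chunkSize)            // e.g. a VM image
	v2 := append(append([]byte{}, v1...), 'x') // a slightly modified copy
	fmt.Println(len(putFile(v1)), "chunks in v1")
	fmt.Println(len(putFile(v2)), "chunks in v2")
	fmt.Println(len(store), "unique chunks stored") // identical chunks are stored once
}
```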

Redundancy and backup are handled by configuring the storage layer ("blobserver") to do it. Perkeep's blobservers are composable - you can have leaf servers storing your blobs, say, directly in a local filesystem directory, on a remote server over sftp, or in an S3 bucket, and you can compose them using special virtual blobserver implementations into bigger and more powerful systems. One such virtual blobserver is https://github.com/perkeep/perkeep/blob/master/pkg/blobserve... - which takes addresses of 2+ other blobservers and replicates your reads and writes to them.
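
In that spirit, a hedged sketch of the composition idea (the interface and type names here are invented for the example; the real replica blobserver and its configuration format differ):

```go
package main

import "fmt"

type Ref string

// Store is the minimal interface a leaf or composite blob store implements.
type Store interface {
	Put(ref Ref, data []byte) error
	Get(ref Ref) ([]byte, error)
}

// Memory is a leaf store (stand-in for "local dir", "sftp", "S3 bucket"...).
type Memory struct{ m map[Ref][]byte }

func NewMemory() *Memory { return &Memory{m: map[Ref][]byte{}} }

func (s *Memory) Put(ref Ref, data []byte) error { s.m[ref] = data; return nil }
func (s *Memory) Get(ref Ref) ([]byte, error) {
	if b, ok := s.m[ref]; ok {
		return b, nil
	}
	return nil, fmt.Errorf("blob %s not found", ref)
}

// Replica fans writes out to every backend and reads from the first
// backend that has the blob.
type Replica struct{ backends []Store }

func (r *Replica) Put(ref Ref, data []byte) error {
	for _, b := range r.backends {
		if err := b.Put(ref, data); err != nil {
			return err
		}
	}
	return nil
}

func (r *Replica) Get(ref Ref) ([]byte, error) {
	var lastErr error
	for _, b := range r.backends {
		data, err := b.Get(ref)
		if err == nil {
			return data, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	local, remote := NewMemory(), NewMemory()
	store := &Replica{backends: []Store{local, remote}}
	store.Put("sha224-abc", []byte("replicated to both backends"))
	data, _ := remote.Get("sha224-abc")
	fmt.Println(string(data))
}
```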


Backup as in backing up one perkeep instance to another is the "pk sync" command (https://github.com/perkeep/perkeep/blob/master/cmd/pk/sync.g...).

You give it the addresses of source and destination blobservers, it enumerates blobs in both, and copies the source blobs missing from destination into the destination server.
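
The underlying idea is simple enough to sketch (illustrative Go, not the actual pk sync code; content addressing makes the operation safe to re-run, since already-copied blobs are skipped):

```go
package main

import "fmt"

type Ref string
type Blobs map[Ref][]byte

// syncMissing copies every blob present in src but absent from dst,
// and reports how many blobs were copied.
func syncMissing(src, dst Blobs) int {
	copied := 0
	for ref, data := range src {
		if _, ok := dst[ref]; !ok {
			dst[ref] = data
			copied++
		}
	}
	return copied
}

func main() {
	src := Blobs{"sha224-aaa": []byte("a"), "sha224-bbb": []byte("b")}
	dst := Blobs{"sha224-aaa": []byte("a")}
	fmt.Println("copied:", syncMissing(src, dst)) // copied: 1
}
```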


Spent a minute looking and couldn't find the list...


I admire the goal of reliable long term data storage. I don't need to store tweets or 5TB videos, but my current solution is duplicate DVDs with a bunch of par2 files to hopefully ward off bit rot.

I feel like there's some room for improvement.


Hook up RAID-Z SSDs on a low-powered machine (like a Raspberry Pi). Set up regular ZFS scrubbing, and connect that to a monitoring service.

You're not going to prevent silent bitrot no matter what modern technology you use, so take a proactive approach to detecting and repairing it instead, to prevent data loss.


Agreed. I used to back up onto DVDs, creating a set of (say) 12 DVDs with an extra couple generated by par2, to take care of the case where some of the DVDs just straight aren't readable.

However, I found that I had a lot of data to back up, and it was actually cheaper and less tedious to get 4TB USB hard drives for ~£100 each, and plug them into an old defunct EeePC901 (with the added advantage that if the power goes out it has a battery).

My main PC has an SSH private key that lets it access a restricted shell on the EeePC that only allows it to give it files to store. That way, if a hacker breaks into my internet-facing machine, all they can do to my backups is fill the disc up, not delete or access anything. I have a process on the EeePC that regularly scrubs the par2 files, and the hard drives (I have two so far) are formatted with BTRFS, so given all the data is regularly read by the scrub process, that should notice any drive failures. My main PC uses ZFS, so I have safety in variety.

I also have an off-site backup stored on an encrypted USB hard drive in my locked locker at work, which isn't updated as regularly. My internet connection is slow, so I use the rsync --only-write-batch trick, and then carry the large update file to the backup on my laptop.

What could possibly go wrong?


Re: 4TB drives, I do the dollars-per-MB calculation before buying hard drives. The most recent time I included the enclosure cost and found that it was actually cheaper to go huge. Granted the enclosure was a Synology, but buying 16TB drives is the closest I've been to 'solving' storage in a long time. Formatting them and adding them to the array was brutal, and they are noisy, but it has been worth it.


Yeah, I'm using 4TB laptop-sized (2.5") external USB drives, because they are small, cheap, quiet, and they don't need a secondary power supply. At the time of purchase, they were the sweet spot, with 5TB drives considerably more expensive. That may have changed now.

If you're going for 3.5" drives, then yes I can well believe that the sweet spot is with slightly larger drives, especially if you take enclosures into account. I did the calculations for work a while back for shoving hard drives into something from https://www.45drives.com/ and it seemed that getting the largest drives possible was the best price/capacity option.


Buying enough 16TB drives for an efficient RAID array is an expensive way to save money.

Something that's easy to overlook with larger drives is that their rebuild times are worse.

"Shucking" drives throws the economics way off even if it means having to do some hacks and losing warranty... Usually the drives that come in enclosures are smaller.


> Buying enough 16tb drives for an efficient raid array is an expensive way to save money.

A lot of ways to be efficient with money start by having or using a lot of it:)


You can buy a 2TB drive for $100. $200 and you've got 430 DVDs' worth of data redundantly stored. $300 and you've got local redundancy AND offsite backup.


Even if magnetically they don't have 'bit rot', they use bearings where the lubrication can dry up and wear out when they're not turning for long periods of time.

You need to keep them spinning on a regular basis, and replace them as they begin to fail.


HDDs are also prone to silent bitrot, where a drive will simply return incorrect bytes for a sector, even without any SMART errors. (Optical disks also bitrot, but so do HDDs.)

This is usually a precursor to SMART errors in the near future, but unfortunately it can still result in corrupted replication and corrupted backups, as your backups would be backing up the rotten (corrupt) data.

I've witnessed this happen on both Seagate and WD drives, on systems with ECC memory. I can only suspect this is due to HDD manufacturers wanting to reduce their error rates and RMA rates: it may happen when the ECC bits in a sector are corrupt, making bitrot undetectable. Instead of giving an error (and being grounds for an RMA replacement), the HDD firmware may choose to return non-integrity-checked data, which would usually be correct but could also be corrupt.

It's why filesystems like ZFS and btrfs are so important.

My rough estimate, based on my own experiences and those on r/DataHoarder, is that 1 hardware sector (4KB for most drives post-2011) will silently corrupt per 10 TB-year. Such corruption can be detected by checksumming filesystems like ZFS.

Usually, the whole sector is garbage, which is not indicative of cosmic ray bitflips.

External flash memory storage like USB sticks and SD cards fares far worse. In my own experience, silent corruption occurs more like 1 file per device per 2-3 years, irrespective of the size of the memory. I've had USB sticks and SD cards return bogus data without errors many times. I only know because I checksum everything; otherwise I would have thought the artefacts in my videos or photos came with the source.

If, in 2020, you are not using ZFS or btrfs for long term archival, you are doing something wrong.

ext4, NTFS, APFS, etc. may be tried and tested, but they have no checksumming, and that is a problem.
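
If some of your copies can't live on a checksumming filesystem, a periodically re-run checksum manifest catches the same class of silent corruption (it only detects, it doesn't repair). A minimal sketch in Go; the output format and use of SHA-256 are just for illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// hashFile returns the SHA-256 of a file's contents as a hex string.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	root := os.Args[1]
	// Print "hash  path" for every regular file. Save the output once,
	// re-run later, and diff the two manifests to spot silent corruption.
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		sum, herr := hashFile(path)
		if herr != nil {
			return herr
		}
		fmt.Printf("%s  %s\n", sum, path)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```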


Interestingly, on my home ZFS raidz with three 4TB hard drives, I have had to replace a drive a couple of times because a ZFS scrub was reporting silent corruption. They were consumer-grade SATA drives.

However, at work, I have backed up ~200TB of data to a large server with RAID-6 and ext4, storing the backups as large .tar files with par2 checksums and recovery data, and regularly scrubbing the par2 data. I have yet to see any corruption whatsoever. These are enterprise-grade hard drives. This is the strongest evidence I have yet seen that the enterprise-grade drives are actually better than the consumer-grade ones, rather than just being re-badged.


Enterprise drives have different firmware, especially from an ECC and integrity perspective. From a price/perf standpoint, though, shucking consumer-grade drives with ECC wins.


Thanks. What are the drives at your workplace?


I actually have no idea. I didn't have any part in purchasing that particular system, I don't have root, and all the drives are hidden behind a RAID controller. Sorry.


How do you know they are enterprise drives then?


I have a home "NAS" (opensuse server) where my main /data partition is xfs, but it mounts a btrfs backup partition, rsyncs, and takes a snapshot.

I should really get around to converting the main drive to btrfs, but this works well.


Does proper use of ZFS also require ECC memory?


ZFS protects you from disk errors. ECC protects you from memory errors. Using one or the other is safer than using neither. Using both together is even safer.


100% yes. With non-ECC you will always have bad RAM bits eventually. With ZFS this is especially bad because it can corrupt your checksums or your ZFS metadata, which means either silently corrupting your data, or corrupting ZFS itself and losing your entire zpool (akin to losing a RAID array).


Maybe not: that ZFS needs ECC is "common wisdom", but the disaster scenario appears not so likely. See

https://news.ycombinator.com/item?id=14207520

https://news.ycombinator.com/item?id=8293025


This is FUD perpetuated by a certain individual on the FreeNAS forum.


Ideally you would temporally separate the purchasing of the drives and make sure to hook them up and check them every so often (once a year?).

Much like other commenters I'm no expert on the topic, but I think you'd have to be pretty incredibly unlucky to have a mechanical failure on 3 drives at once from lack of use, especially if they were from different manufacturing batches.


I'm no expert on the issue so correct me if I'm wrong, but I've heard modern HDDs use fluid bearings and aren't susceptible to drying up.


My setup (laptop, Windows 8.1 Pro): external 4TB disk, assigned to letter L (for Library). I have Acronis running once per month, dumping a 70-80GB .tib file on L. L also has a backup folder with everything I've got (setup files, books, photos, every audiobook/video I need such as trainings, etc). The whole backup is ~2TB.

Now get Carbonite (not affiliated, I just like the infinite-space backup), and get it to back up your key laptop folders (Docs, Images, Desktop, etc) and your L drive. I don't remember how much it costs ($6-10?/mo), but I have stopped worrying since then. I've got a monthly .tib file for my system and an "instant" backup for everything else. So even if my laptop is stolen I can set up a new laptop (the .tib may be useless, but I can open it and see what software I had, and I can take the config files/folders to move to my new system).

I don't remember how much the disk was, but it didn't hurt my wallet, and the ~$100 (?) per year on Carbonite (I had CrashPlan before) definitely doesn't hurt my wallet.


If you continue your DVD method you may also want to look into "dvdisaster" https://en.wikipedia.org/wiki/Dvdisaster


Do you store your par2 files on different discs from the data they refer to? Do you have multiple copies? Do you store the discs in a cool, dark place when you're not using them? Do you have another form of backup, preferably offsite?

If you do all these things, I think that's about the best you can do with optical media.


Haha, I keep the par2s on the same disc, and then I make multiple copies of the same disc and they're in a cardboard box in my basement somewhere.


The ink on writable optical disks fades over time... So if all the copies were written at the same time, they might all stop working at around the same time and cause problems :)


Yeah if you're really taking this seriously it's worth investing in an LTO-4 drive and some tapes.

There might be a better medium available nowadays but if I seriously wanted to have a piece of data fifty years from now that's where I'd start.


Really, any second form of backup would be a good idea. Preferably, it should be stored offsite and separate from any other backups.


For anyone who uses par2 to protect a filetree: I wrote a small utility to help you maintain that filetree (e.g. bulk verify/create/status) [1].

[1] https://github.com/brenthuisman/par2deep


You can always try an affordable online service like Backblaze B2 as one of your options. I haven't checked lately but I think they cost around $5 per terabyte per month. Plus usage, but that would be minimal for an archive.


Your credit card may expire or get blocked, and you might be in hospital, or billing alert emails may simply go to your spam folder for 45 days. Your data will be irreversibly deleted by Backblaze.

There are LOTS of failure cases with any cloud provider, especially one with a crazy policy of deleting data in just 45 days.

There is at least 1 reddit post a month about how someone lost data with Backblaze. Their reddit support rep is never able to do anything about it, other than "sorry, we will take on board your feedback".

For comparison, if your Google Drive subscription lapses, Google stops you from uploading but will not delete your data.


This. I was backing up an external drive on a very, very slow connection. I plugged it into the machine late in the month, but forgot to turn the power on at the wall. The upload didn't start, and I lost the full backup and had to re-upload the whole drive.

A good lesson was learned but it hurt. The upload took weeks to complete.


Most BD-R discs are physically higher quality than DVD-Rs. They can be comparable to archival-quality M-Disc media.


I suggest trying Syncthing. It is a cross-platform P2P file synchronization tool, super easy to set up.


How is this better than a plain old filesystem with bitrot compensation? What does this have that my btrfs or ZFS filesystem (with parity) doesn't?


Does anyone know how mature Perkeep is? Is anyone using it regularly? Would love to hear if there is anyone who has experience with it.


Very immature; just have a look at its extensively absent documentation. The bit that best describes the state of things: "If you're a programmer or fairly technical, you can probably get it up and running and get some utility out of it".

Not much to show off for 7 years of development, so I'm pretty skeptical of its future. But some of the ideas are pretty cool, like composable blob servers.


I'm guessing not very: "The latest release is 0.10 ("Bellingham"), released 2018-05-02"



There's a blockchain startup called arweave which is trying to do exactly this.


I tried perkeep a while ago. While the ideas are cool, the implementation is meh:

I added a single 2.7G ubuntu iso - it took 5 minutes to ingest it (on a tmpfs!), and turned it into 45k(!) little chunks, wtf is up with that? At this rate indexing my multiple terabytes of data is going to take it days and I don't even want to think how much seek time it's going to need to store its repo on a spinning HDD.


> I added a single 2.7G ubuntu iso - it took 5 minutes to ingest it (on a tmpfs!), and turned it into 45k(!) little chunks, wtf is up with that?

Ingest times correlate linearly with file sizes because it needs to compute the blobref (which is a configurable hash) for all the blobs (chunks, as you call them). Splitting into blobs/chunks is necessary because a stated goal of the project is to have snapshots by default when modifications are made. Doing snapshots/versioning without chunking would be very inefficient.


Reading the docs, snapshotting/versioning doesn't strike me as a major feature of Perkeep. It's more important and appropriate in the domain of backup software (e.g. restic/attic/borg), where you'd want it together with delete functionality to reclaim space.

But Perkeep's focus, as I understand it, is more on managing an unstructured collection of immutable things (e.g. a photo archive), rather than being a tool to back up your mutable filesystem. So I'm not sure they made a good design decision in chunking the sh*t out of my files, which really kills performance on large files and especially on spinning disks.


Ah, nostalgia :).

It seems like with the recent wave of news about social media migrations (reddit, facebook, twitter, twitch, tiktok), people are hopefully starting to get more and more warmed up to the idea of protocolization of their social data.

But most of the projects doing it are still just too immature. Solid, Perkeep, Blockstack, etc. just seem like vaporware.

Seems like the only serious projects in use are Matrix, Urbit, and ActivityPub/Mastodon. But I haven't checked in with the decentralization scene in a while.


I want that protocolization too, although I don't hold out much hope that the monopolies in place can be broken, outside of fairly radical regulation.

To add to your list, there is also Secure Scuttlebutt [1] which has had a decent userbase over the past few years, and Planetary [2] which is a funded iOS client for it.

I think in general they all suffer from the chicken-egg problem and will need some reason for enough people to switch to be able to build a userbase. There isn't really any "novel hook" like tiktok, twitter, whatsapp, instagram, snapchat, etc have had in the past.

[1] https://scuttlebutt.nz/

[2] https://planetary.social/


Man, I love the idea of Scuttlebutt but I hate the developer UX. I'm writing some apps that I wanted to put on SSB but have all but given up on the idea. Something about SSB, as a dev, leaves me with a lot of questions and no idea where to even get answers from.

So I'll write my app outside of SSB, hopefully in a way that's mostly compatible, and possibly with future integration.

I may also toy with an SSB-like protocol myself, as the fundamentals of SSB are a work of art imo. I really enjoy what Gossip brings to the table, and how SSB focuses on human-to-human relationships in its approach to P2P.


Ah, I forgot about SSB. Very unique project and interesting people working on it.


I'm bullish on the Solid model, but they have a chicken-egg problem. Nobody is going to develop apps until a lot of people have their own pod, and nobody is going to use a pod until there are great apps for accessing their data.

The same thing happened with remoteStorage. There's initially a flurry of proof-of-concept apps, but no commercial quality killer apps to attract daily users.

AFAIK the only cloud storage protocol really used for app development is Google Drive. GDrive got successful by making a great cloud storage solution first; then, once everyone had one, app developers started making apps for it.


How can you call a pile of open source code with no marketing "vaporware"?


Last released version 0.10 is from May 2018. Is this project still alive? Last commit is from March 11 2020 so maybe they are just "slow" at releasing.


The last commit was just a simple bugfix in documentation. The last juicy commits were in December 2019, and https://github.com/perkeep/perkeep/graphs/code-frequency gives the impression that activity slowed down significantly in 2019. Maybe they reached the point of good enough, but the high number of open issues and merge requests is still problematic.


They reached the point of having kids :)

https://news.ycombinator.com/item?id=22161812


Development paused when Brad had a kid in 2018.

It probably needs someone new to adopt it.


For those curious "what does this do" and "how does this work" the presentation on the home page (Video + Slides) helps a lot.


It looks like this hasn't had any updates since May 2018.


The GitHub page has lots of fairly recent updates: https://github.com/perkeep/perkeep


Last commit was in March, mind you. It looks like an interesting project though


I imagine the author is pretty busy with Tailscale right now.


And kids ;-)


Why change it?


It would be nicer, in our era, to be able to "delete all your stuff, for life" than to keep it.


Watching the LinuxFest talk now, this looks really neat!


Exporting/mirroring content from clearnet social networks to ZeroNet is also interesting.

For popular content, you'll see it has many seeds, instead of likes.


How is this different from Upspin?


It's like a poor man's version of AWS Glacier



