Show HN: We built an end-to-end encrypted alternative to Google Photos
1180 points by vishnumohandas on Aug 29, 2021 | 405 comments
Hello HN,

Over the last year we've been building ente[1], a privacy-friendly, easy-to-use alternative to Google Photos. We've so far built Android[2][3], iOS[4], web[5] apps that encrypt your files and back them up in the background. You can access these across your devices, and share them with other ente users, end-to-end encrypted. You can also use our electron app[6] to maintain a local copy of your backed up files.

We've built a fault-tolerant data replication layer that replicates your data to two different storage providers in the EU. We will be providing additional replicas as an addon in the future.

We're relying on libsodium[7] for performing all cryptographic operations. Under the hood it uses XChaCha20 and XSalsa20 for encryption and Argon2 for key derivation.
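To give a rough idea of what that looks like, here is an illustrative sketch using libsodium's JavaScript bindings (libsodium-wrappers); this is not our actual client code, just the two primitives mentioned above:

```
import sodium from "libsodium-wrappers";

// Illustrative only: derive a key from a password with Argon2id,
// then encrypt a file's bytes with a secretbox (XSalsa20-Poly1305).
async function encryptFile(password: string, fileBytes: Uint8Array) {
  await sodium.ready;

  const salt = sodium.randombytes_buf(sodium.crypto_pwhash_SALTBYTES);
  const key = sodium.crypto_pwhash(
    sodium.crypto_secretbox_KEYBYTES,
    password,
    salt,
    sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE,
    sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE,
    sodium.crypto_pwhash_ALG_ARGON2ID13
  );

  const nonce = sodium.randombytes_buf(sodium.crypto_secretbox_NONCEBYTES);
  const ciphertext = sodium.crypto_secretbox_easy(fileBytes, nonce, key);

  // The salt and nonce are not secret; they are stored alongside the ciphertext.
  return { ciphertext, nonce, salt };
}
```

In the actual clients the password-derived key only wraps a randomly generated master key (see the architecture doc), but the building blocks are the same.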

We have documented our architecture[8] and open-sourced our clients[9].

We did a soft launch on r/degoogle[10] some time ago, and have since ironed out issues and polished the product.

But we are far from where we want to be in terms of features (object and face detection, location clustering, image filters, ...) and user experience. We are hoping to use this post as an opportunity to collect feedback from fellow hackers.

If there's anything we can do better, please let us know, we would like to.

Best,

- Vishnu, Neeraj, Abhinav

[1]: https://ente.io

[2]: https://ente.io/apk

[3]: https://play.google.com/store/apps/details?id=io.ente.photos

[4]: https://apps.apple.com/in/app/ente-photos/id1542026904

[5]: https://web.ente.io

[6]: https://github.com/ente-io/bhari-frame/releases/latest

[7]: https://libsodium.gitbook.io

[8]: https://ente.io/architecture

[9]: https://github.com/ente-io

[10]: https://www.reddit.com/r/degoogle/comments/njatok/we_built_a...




I’ve been watching this project for a long time and personally am very excited. The fact that it’s #1 on HN today (congrats!) makes me think I’m not the only one.

There are also a lot of valid concerns in these comments about privacy and the use of algorithms. A lot of it depends on what you're looking to gain by adopting a new service or switching away from something else, and on your individual concerns.

Personally, I'm looking for a place to store personal photos: friends, family, travel etc. Critical needs:

- easy sharing, ideally not locked into Apple's ecosystem

- not to have my photos mined for advertising and social graph data (most important)

- ideally around for the long haul, but in my mind this is for sharing, not backup

I’m not particularly concerned about warrants, government surveillance etc. Again for me this is about sharing so the expectation of true privacy is low. Any photos I considered sensitive I would store elsewhere.

For me, the biggest point of confidence I have in this project is that they charge money from day 1 and don’t have a forever free plan. I’m excited about projects that offer the benefits of “social” but where the software, not my data, is the product.


re: "the expectation of true privacy" you might enjoy reading the Cypherpunk's manifesto [0]

"Privacy is necessary for an open society in the electronic age. Privacy is not secrecy. A private matter is something one doesn't want the whole world to know, but a secret matter is something one doesn't want anybody to know. Privacy is the power to selectively reveal oneself to the world."

[0] https://www.activism.net/cypherpunk/manifesto.html


I'm in the same boat: have been watching, love that they have a business model, and am waiting for the time when they cover my needs (face recognition, object / scene detection...). I'd even pay a $2/month "lurker" subscription which has like 100MB of storage, so I can check the features from time to time and support the team.


As someone who's never used cloud-based photo browsers... I always assumed the facial recognition aspect was primarily for social media apps that try to tag known faces from a user's friends group, to put it in those people's news feeds or something. It's one reason I avoid being photographed and ask people not to tag my name to my face if they do post a photo I'm in. I'm wondering, what's the utility of facial recognition if you're storing/sharing photos on a service that has no database of known faces? Or is this just for image editing or red eye removal or something?

[edit] as I'm rethinking it, would this just be for searching your own images for a particular person...?


> as I'm rethinking it, would this just be for searching your own images for a particular person...?

My Synology NAS has face recognition and it is wonderful even if (actually: especially since) it has no pre-existing database and doesn't (to the best of my knowledge) share its database.

For someone like me who manages photos for the entire family but isn't too good at recognizing faces, it is just brilliant.


I agree, Moments isn't a bad piece of software, especially being able to group/combine the same person who gets tagged as different people. My newborn was like 50 different people when I first uploaded our pics; merging them together was as easy as a few clicks.

I wonder if it's a good idea to use Synology as the onsite copy, and ente as the offsite leg of a 3-2-1 backup solution?


It's so incredibly useful to be able to bring up pictures when you don't remember the exact time or date that you took them.

Google photos has come in so clutch when you're searching through 50k photos.


To be able to categorize by person, ex: "list all photos of Jim".


This would be a useful feature for me too. I am also loath to tag faces on social media, with all that entails; but I find myself approaching a friend's birthday or other events wishing I could search my images for everything that included them from the past year.


So this is a project specifically marketed as E2E encrypted, and you are "waiting for the time when they are covering my needs (face recognition, object / scene detection...)"

You will be waiting a long long time for that.

The only way they can do that is client side, and if they go there we are back to the last few weeks discussion of Apple's new client side image scanning shit.

You do not want this service, it seems.

You want a non Google service who can do face recognition, and object/scene detection, but who'll pinky promise you they won't sell you out to advertisers or law enforcement or governments, even though they obviously could.


> we are back to the last few weeks discussion of Apple's new client side image scanning

Apple has always been indexing images on the client side. What changed is that they're now reporting the presence of a predetermined set of hashes to authorities.

If governments were to mandate that such reporting is necessary, it is likely that the enforcement will be on a device/OS level, extending the example set by Apple. Demanding compliance from every single cloud storage provider out there (E2EE or not) would be a suboptimal route for them to take.

My point being, "client side indexing" is not the evil here, and it is unlikely that storage providers will be the ones forced to share data. Your concerns should probably be directed at your operating system.


I don't think this is fair.

What iCloud Photos is doing for their client-side scanning is: (1) Not to your benefit. There is no positive outcome for you from your photos being scanned. (2) Mandatory if you want to use iCloud Photos.

In contrast, I presume this would be- (1) Only to your benefit, because all of this derived metadata around scenes and faces would also be encrypted end-to-end as part of the photo library. (2) Entirely optional.


What do you mean a long, long time?

Increasingly powerful GPU compute being released and constantly improving image recognition models out in the wild. I'd bet there's a nicely packaged, open source solution released in under 3 years.


I wonder how sales psychology might differ between a "lurker" subscription and an inexpensive limited plan? Lurker might have a more explicit "I think you're interesting and want to support/encourage you - thanks, we appreciate it" exchange. Or maybe defuse "but is it usable?" or "do I want to bother attempting to use it?" or yet-another-thing commitment concerns. Not "am I really going to use this?" but "does this look worth encouraging?". And maybe has a funnel story of "ok, now it's looking good, and I'll start using it for real... and not the mere limited plan". Sort of a patreon vibe, but blended with plans?


Looking at their pricing, for €0.99 / month you can get 10GB of storage, so go at it!


Storage is cheaper on S3


If you're gonna dick them around over the difference between €0.09/GB/Month and $US0.025/GB/Month, they're probably ecstatic to not have you as a customer.

Either you're whining about their entire ecosystem of encryption, key sharing, mobile apps, desktop app, web app, etc - not being worth a cup of coffee a month "Cause I can do it all myself using S3!!!", or you're planning on storing many times more than ~200GB on their platform.


The monthly storage costs are too high. For the price of 1TB from you (15€), I can buy more than 2 TB just about anywhere else.

Commercially, Apple and Google are both 2TB for 10 CHF and Amazon gives you unlimited as part of a Prime membership. Storage providers like Backblaze and Wasabi both charge around $5/TB and that's really the table-stakes price. For the more DIY-inclined, Hetzner sells a 2TB OwnCloud instance for 9.90€/month.

I'd prefer to buy software from you than storage. It's out of the question for me to pay you per TB but I'd consider paying a flat rate for software I then host myself.


I fully agree. It's a hard sell getting people to switch from an evil but known cloud provider to an unknown cloud provider that claims to not be evil.

What we do not need is more cloud offerings that can change, vanish or lock us out at the blink of an algorithm's eye.

What we need, rather, are reliable and easy-to-use solutions that allow us to retain full control of our data (i.e. self-hosted and offline) while having feature parity with the big cloud-only solutions.

I for one am convinced that there is plenty of money to be made that way. Perhaps not as much on autopilot as with the quasi-scam that is cloud computing, but people willingly paid hundreds or thousands for software before clouds and subscriptions. People will do so again, if you bring a convincing, unique or competitive product to market.

That being said, I like, appreciate and support this project for its impetus, even though I think its distribution strategy is misguided and fad-driven (re-selling cloud space instead of selling software). It's not too late to change that...


Hey, so the project had initially started off as a self-hostable software (with an option to buy a pre-configured device). We realized soon that it's hard to monetize such a product in the consumer space to the point where it can become self-sustaining.

We don't have a problem with offering a self-hosted variant. But given our limited engineering bandwidth we had to take a call on who our target market should be, and we felt that it was more important to make privacy accessible to people like my mom and dad. Hence this direction.


> We realized soon that it's hard to monetize such [self-hosted product]

Spot on. We iterated on a similar product in this space: "privacy preserving", "self-hosted", "open source" etc. But focused on local AI indexing & search of personal videos and photos [0], rather than backups.

We ultimately shelved VideoNinja because we weren't able to find a sustainable business angle:

* Non-technical people simply don't care (happy locked into Apple / Google).

* Technical people understood the proposition, but are super stingy. Case in point, see the responses in this very thread: "$10 per year max; I can buy a HDD for less!". That's one (cheap) restaurant meal per year.

So I fully understand your decision to go "cloud". Although that immediately takes your product off the table for me personally. I want nothing of mine (of value) in the cloud.

I feel there must be a way to square that circle, the market exists.

[0] https://video-ninja.com/


Just put a price on it, ffs! Make it extensible with plugins. To gain 100% trust, make it open source. I am happy to pay good money for a local, non-leaking AI-based tagging software for video and photos.


> To gain 100% trust make it open source.

I think until they've got a customer base and a proven model, a happy medium is to put the code in escrow and agree to give the source to paid licensees should the project be abandoned/go more than x months without updates/whatever.


Very surprised no one has mentioned Synology yet. This has been done. And it's awesome!

I currently have a self-hosted google photos clone and I only paid for the hardware. Highly recommend.


Synology's Moments is ok, but it has issues. Not mobile friendly at all, can only create one shareable link per album, and others can't contribute their pictures to your album. Those are the biggest issues in my experience.


I'm still not satisfied, but PhotoPrism seems to be moving in the right direction here. digiKam is great if you want everything on a single machine. Shotwell has other advantages. None of them have a good solution to immediately and automatically import any photo taken on your phone.


Use apps like PhotoSync and it will upload automatically when photos are taken.


While it unfortunately didn't work in the consumer market, there's a space for video recognition in the business space:

- Scene finding for directors/news channels. AP and other sources have a lot of material but you pretty much literally have to watch the entire video in order to find a good scene.

- Scene finding for the XXX crowd. Very underserved market.

- Scene finding for police/lawyers. While it may seem like the opposite of 'privacy preserving', defense attorneys are literally just swamped with video evidence in an attempt to make them give up. Similarly if you're suing a big company for something as simple as an on the job injury or harassment, and need to prove there's a pattern of harm... they'll give you everything and let you do the work of finding out that there was a pattern of bad behavior.

It's the kind of thing that'd be useful as an open source solution... or failing that having a company which is 100% neutral in operation is also good.

I'm currently using Microsoft for something like this because they're absolutely massive and, apart from their OpenAI division, they only care that what you process is legal.


> I want nothing of mine (of value) in the cloud.

What's the issue with the cloud if you encrypt client-side? It's off-site backup. Isn't it too risky to have your life's work on a few drives in the same location?


And then after a year of usage it hits the news that they botched the encryption, or that they helpfully back up the encryption key in the cloud too.


I’d pay for this if it could run locally. Not sure what it would take to be sustainable but solving this problem is worth at least $20/month to me.


I think too many technical people have too much of a distrust of the cloud. I, for one, am happy to offload as much as possible to the cloud (except latency-sensitive things like games) and not carry around drives and drives at home.


I get the decision but I think it misses part of the problem: how do you convince people like your mum and dad to start paying for backups and how do you convince them to pay extra for privacy?

I suspect the way it usually happens is that somebody your parents trust (like you) tells them to sign up for a privacy-preserving backup service.

But who's going to tell them to do that? Do you have the money to pay for advertising?

Normally, I'd suspect it's the tech-savvy younger folks who'd tell them to buy something like this but with your pricing and lack of self-hosted options, I suspect you've alienated a large portion of the tech-savvy audience you need to advocate for your product.


If their service works well and is convenient to use, I’ll be recommending it by word of mouth. In the case of my parents, if I can finally consolidate and de-duplicate the photos from our 3+ Apple Photos collections by pointing the service at “library” folders from a few computers and devices, I’ll be a big fan.


> how do you convince them to pay extra for privacy?

We are hopeful that we will be able to reduce the pricing as we scale up and hit a critical mass.

> who's going to tell them to do that?

We plan to implement a referral program, similar to what Dropbox did, to incentivize existing customers to spread the word.

That said, you do bring up interesting points. To repeat, we aren't averse to the idea of maintaining a self-hosted variant. Just that due to our limited bandwidth we had to choose one direction over another. Having advocates is important and I suppose with time we will have clarity on how to best do this without stretching ourselves too thin.


For our (nascent) product we went the other way and prioritised self-hosting at the expense of stretching ourselves too thin, as that's always been the #1 ask from folks looking for "consumer-first" alternatives.

Time will tell if it was the right way forward, but I just went with "you can't fight gravity" and built it the way folks expect it to be (ex: supabase / posthog / gitlab).


I really hope the self-hosted option becomes a thing, but unfortunately "we are not averse to the idea" means especially little in the tech world these days.

That being said, really really hoping for your success! It finally fills a MUCH needed gap in 2021 consumer image viewing software.

There are many many gaps in it right now. Synology is basically the only self-hosted photo solution that grandma could use. Honestly surprised that more people aren't taking advantage of the opportunity.


I think that's a bit apocalyptic. Plenty of time to observe and adjust.


Can I suggest adding pricing tier(s) between 100GB and 1000GB? I have between 100GB and 200GB of photos, and £14.99/month seems like a lot considering I only pay £2.49/month for Google storage. I'd definitely consider paying a premium for this service, but not 6x.


Drawing a direct parallel with Google will make this difficult, since they own their storage and network infrastructure and have ways to monetize your data. But here's an explanation on why there are large gaps between plans:

- Our 1TB plan costs only 3x the 100GB plan. This model works under the assumption that the average utilization of a 1TB plan (across all customers) will be ~30%.

- If we were to bring in an intermediary plan (say 500GB), we would have to increase the pricing of the 1TB plan (since at least 50% will now be utilized), and also set the price of the 500GB plan to at least 2x of the 100GB plan. Both plans now appear unattractive.

- Since Apple and Google don't support per GB billing yet (which IMO would have been the fairest way to go), we had to pick buckets, and the current ones seemed like the fairest possible.
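To make the arithmetic concrete (using the rough 1TB price quoted elsewhere in this thread; the utilization figures are assumptions):

```
// Hypothetical numbers: 1TB plan at ~€15/month (≈3x a ~€5 100GB plan).
const planPriceEUR = 15;
const planSizeGB = 1000;

for (const utilization of [0.3, 0.5]) {
  const perStoredGB = planPriceEUR / (planSizeGB * utilization);
  console.log(`${utilization * 100}% avg utilization -> €${perStoredGB.toFixed(3)} per GB actually stored`);
}
// At ~30% utilization the plan earns ~€0.050 per GB actually stored; if a 500GB
// tier pulled away the lighter users and pushed utilization to ~50%, that falls
// to ~€0.030, which is why the 1TB price would then have to rise.
```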

I hope this makes sense.


>If we were to bring in an intermediary plan (say 500GB), we would have to increase the pricing of the 1TB plan (since at least 50% will now be utilized), and also set the price of the 500GB plan to at least 2x of the 100GB plan

What happens if you start by pricing all tiers "honestly" (i.e. reasonably profitable even at 100% utilization)? Have you determined that the market won't bear that pricing? If so, is there any way to meet in the middle?

In general, you may be erring a little too much on the side of asking some customers to grossly overpay for their actual utilization and, in practical terms, 100GB to 1TB is just an extremely wide gap, as evidenced by your parent's comment.

So, it seems that most who tip over into the 100GB - 1TB plan will be there, overpaying, for a long time. And, obviously, most people who make it to 1TB will pass through that range. So, if you do see a higher concentration of users in that range than at 1TB (as intuition would suggest), then you're essentially "punishing" a plurality of your customers by asking them to subsidize a smaller group's pricing.

Failing other options, it may be better to do the inverse: raise the pricing of 1TB to accommodate a "friendlier" 500GB plan.


I definitely empathise with the difficulty of competing with the big cloud providers on price. Your service is inevitably going to end up more expensive. Having said that, I'd be interested to know how you're hosting the content.

When I was looking at setting up a similar service, it seemed like Backblaze B2 + Cloudflare might well be the best combination. B2 will sell you storage at $5/TB, and you can get free bandwidth out to Cloudflare's network. It's against Cloudflare's terms to use the free plan for image hosting that isn't just images as part of webpages. However, one of their staff members commented on a thread that they'd likely be willing to set up a custom plan for a business that wanted to do this. And I'd bet that Cloudflare's bandwidth would be a lot cheaper than B2's.


Pre-signed URLs generated with B2's S3 APIs are incompatible with Cloudflare at the moment. We are working around this by using a Cloudflare Worker to proxy data from B2 to the client. This is currently free if you're on the Bundled plan and Cloudflare's support has promised that when they decide to start charging, they will alert us in advance.

Interestingly, Workers Unbound charges $0.045/GB, which is more than B2's $0.01/GB.

A viable long term alternative could be Wasabi that offers free egress in return for a $6/TB plan. But we're waiting to see how things pan out before executing an expensive migration.


When you say incompatible, are you talking about the cache not working or something else? How are you working around this using workers?


B2 documentation suggests that after adding a CNAME (eg. cdn.ente.io) for their bucket endpoint (eg. bucket.s3.eu-central-003.backblazeb2.com), you will be able to replace the latter with the former. This breaks with the native B2 APIs with the following error:

```
{
  "code": "not_found",
  "message": "/api/top_level_url_mapping",
  "status": 404
}
```

The last I checked was a few months ago, not sure if things have been fixed now.

With Workers, we simply fetch the remote resource from B2 and return it back to the client, acting as a thin proxy.
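Something along these lines (a simplified sketch, not our production Worker; the hostname is just the example endpoint from above):

```
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    // Hypothetical B2 endpoint; in practice this is the bucket's S3-compatible URL.
    const upstream = "https://bucket.s3.eu-central-003.backblazeb2.com" + url.pathname + url.search;
    // Forward the request (including any pre-signed query params) and stream the response back.
    return fetch(upstream, { method: request.method, headers: request.headers });
  },
};
```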


Curious about alternatives. GB for GB, other services will always be cheaper. How do you help frame pricing? What about charging per picture? Likely a non-starter, but you get where I'm going with this. iPod = 1,000 songs in your pocket.

If not you, someone will figure this out. Charging by the GB seems hard. What if instead your levels were: 1,000 photos, 10,000 photos, 100,000 photos?

You might get people who store super high res files, but work that into the pricing.


I had thought about this a year ago when I was pitching the product to my parents who had no idea what a GB was. But I was put off by the possibility of abuse once I extended the framework to videos.


Appreciate your reply! It gets to the core of your value proposition though. Surely you could add in some limitations if needed. If it worked, maybe the biz would grow so fast you don't care about a little abuse.

Do you have any marketers to help you? Will be hard to navigate the messaging alone.


My phone photos are 2.2 MB each. 1,000 GB's is 1M MB's which equates to approximately 450,000 photos. At $18.99/TB/year, 1,000 photos would cost ~$0.42 a year.

Photos can easily be 30 MB each or more, especially from dedicated cameras. If all photos were 30 MB it would cost $5.69 per year for 1,000 photos.

Not making any point, just calculated it for myself and thought to share.


I like this line of thinking.

You know it really gets me thinking about packages rather than GB for this service. Maybe there's a "family plan" opportunity here. Do families value anti-surveillance in general, or is it simply lone actors?

Just the idea of archetypes flashed through my mind. An opportunity to sell to difference audiences. What kind of algos do individuals need, pro photographers, families?


What about Google photos is evil? I don't get it.


Okay, it's easy to downvote, but I'll elaborate instead. First of all, Google is training AI models on your data and is also able to create shadow profiles for people, including those who decide against using Google services.

They also used dark patterns on Android for years by enabling cloud sync by default for everything. So a lot of people got all their photos uploaded while they had no idea about the feature.

So it's not any different from Facebook, which constantly tried to collect as much data on you as possible. Do you know what is evil about Facebook?


I don't really get what's evil about AI models and cloud sync.

And I don't think anything is wrong about Facebook's business model. I think most people are uninformed about it and believe that they sell personal data, but if you understand the way they make money, it's very difficult to say that there is any particular issue with it.


Ah, what you really meant was "what's evil about selling my data?" which is a much larger question. And it sounds like you already have your answer.


They actually don't "sell".


They take your data and turn it into something that has value to them. With actual selling, that something is money. In this case it could be something else, but saying it that way will not help the general discourse of this problem at all. Much like being pedantic over terminology.


Ouch. This post reminds me of that one about GoogleSpeak: how Google limits thought about antitrust https://zyppy.com/googlespeak/


The other day I sent out a link made with Google Photos' "create link" function. That's not a share to another user, just a link that anyone can open, no Google account required. But one person showed me that hitting that link on her phone, Google wanted to authenticate her before showing the picture.

That is utterly unacceptable.


Genuinely curious - could you elaborate on why that is so unacceptable? What does requiring authentication imply, or lead to in the future?


Prevents sharing with friends who don't have a Google account. It breaks what could be a general purpose sharing mechanism.


This sounds like mild inconvenience.

What's evil about that?


Mild inconveniences can become problematic at scale. One person taking a crap in a lake is typically not a big deal. 1,000,000 people doing the same is a serious health risk. Scale matters.


Yeah, if you are client-side encrypted, where you choose to host doesn't really matter because even with a warrant there is nothing you could do to recover data, so why not go for something like Wasabi?


I can pay for a terabyte of Amazon Glacier for $50/year. Amazon Deep Glacier is around $12 per year.

$300/year for 2TB isn't happening. I can buy a 12TB HDD for less, if I shop around.

I'd like a service like this to keep small, well-compressed 1080p or 4k photos available for instant access, and original files in archival storage of some kind.

I'm totally glad to pay the $10/year for the baseline service, and another $12 for deep glacier costs. I'm not glad to pay thousands of dollars for a service like this over the lifetime of my photos. I'm not quite sure where the line between that is.

I'll also mention: open source, data export, and the option of self-hosting are helpful. I don't want to spin up an EC2 instance for this when I can just pay $12, but if you go out of business, I'd like to have the option. It could also be an option you only guarantee if the service is discontinued or has substantially different costs/terms.


> I can pay for a terabyte of Amazon Glacier for $50/year. Amazon Deep Glacier is $12 per month.

You can pay even less to store that data in /dev/null. To make a more realistic comparison you should also include data retrieval & data transfer costs. Reading a terabyte from those services costs around $100.
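Rough back-of-the-envelope, using approximate 2021 list prices from memory (treat the numbers as assumptions):

```
// Approximate AWS list prices, not quotes.
const egressPerGB = 0.09;            // data transfer out to the internet, first 10 TB
const glacierRetrievalPerGB = 0.01;  // Glacier standard retrieval
const sizeGB = 1000;                 // one terabyte

const cost = sizeGB * (egressPerGB + glacierRetrievalPerGB);
console.log(`Restoring 1 TB: ~$${cost}`); // ~$100
```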


I can think of close to zero times when I would need my full photo collection, in full resolution, all at once. In most cases, for showing photos, even 1080p highly-compressed is fine. In rare cases, I want to edit an old photo, and I want the original RAW file in full color depth and resolution.


With Amazon and Google you’re paying half in monthly fees and half with your mineable data. This service seems geared towards people who don’t want that.

Rolling your own on top of a cloud storage provider is great too but for an incremental $100-$200/year some people would pay for something that “just works”.


I’d love for something like this to exist (a fast, clean, well-designed mobile and desktop app for backing up my photos with E2E), but I’d only switch from one of the big providers if it were FOSS and I can bring my own backend target (e.g. S3, SMB, FTP).

In a perfect scenario I could generate my own private key to plug into my client devices and just have everything push to private S3 (and then from there archive to the cheapest, coldest glacier tier after it’s been synced to my home storage).

This to me would not be that complicated to build, but would essentially provide E2E Photostream and a backup of last resort in the cloud.

Obviously (as is the problem with all FOSS) you have the dilemma of how do the developers get paid, which I’m sure is why you went down this yet-another-paid-cloud-provider route instead of what I’ve suggested above.

All that said - I like what you’re trying to build, I could see it being useful to some, but providing E2E photo storage as a direct-to-consumer service is IMHO just asking to be held liable later for what your users store there should you gain any considerable traction.


I'm sure this isn't a popular opinion due to the technical know-how involved, but these days I much prefer to self-host my own services. Far too many times businesses have gone under, changed their practices, had pricing wildly fluctuate, or removed features I wanted. Having set up a handful of useful services on a cluster, I have much more peace of mind about my data, feature access, etc.

I would love to see a FOSS version of ente available for me to host. My family is currently split amongst multiple photo library services and it'd be nice to say "Here's ours."


Well you can, I wrote how here:

https://redbeardlab.com/2021/08/03/my-syncthing-setup-cheap-...

The nice thing is that S3QL allows setting a secret key, so your files just get encrypted before being pushed to the cloud.


+1 for custom storage target


I tried it, but unfortunately the complete lack of auto-categorization in all of those e2ee photo storage apps renders them unusable for anyone with a large library. Ente is not the first one to do this, there are many others with similarly lacking UX, like MEGA.

Both Apple Photos and Google Photos:

1. have easy search by location on a map of the world.

2. allow browsing to any date in an instant.

3. index photos by objects/faces and allow for instant searching - Apple even does it on-device.

Also, frankly, I don't trust you to stay around for long, so I would appreciate the option to store encrypted photos on a cloud of my choosing that I already pay for, with a separate subscription for using your app. Not sure what the Venn diagram of <cares about privacy>, <willing to pay for your storage>, <needs excellent browsing experience> looks like.

Looking forward to an app which works for people with large libraries. :)


All the features you mention are already addressed in the original post as planned future developments. Knowing that they are planned makes me put my trust more in Ente than in Mega (which I use as an alternative to Dropbox and am very satisfied with). Not that there’s anything wrong with confirming interest in their planned features; I’m just pointing Ente’s plans out for anyone who scrolls right to the comments.

As for possible bankruptcy, you can never be too certain, but it’s easier to stay in business with Indian costs of living than US. (The company is located in India.)


Have you tried the Synology Photos app? (https://www.synology.com/en-global/DSM70/SynologyPhotos [1])

While it does have some kinks it's surprisingly good and has the features you are looking for in a locally hosted/publicly available option. You do have to buy one of their NAS's however.

I have moved over to this partly for privacy and partly due to cost (I produce way too many photos per year to store them economically at Google)

[1] fyi this is the reasonably 'new' instantiation as linked to here, they EOL'd the very old, different app of the same name from their v old NAS's. Adding that here in case anyone has or buys an old NAS, you may get the old version of the app - I think you need a NAS with a decent processor to perform the face detection etc.


The biggest problem I've faced with their app suite is it seems to make my disks spin 24/7, constantly seeking even if there is zero external activity. It wouldn't be such a big problem if I didn't live in a small apartment and have to listen to them seek all night. Other people have reported the issue, but it doesn't seem like they plan on addressing it.


I think it only does this until the catalogue has been indexed, depending on what options are enabled. For instance, if face matching is enabled then it has to process all of the pictures for faces and group them.


I'll be honest I keep my NAS in a purpose built server closet in my house which is shielded.

Maybe costs of SSDs are coming down enough you could use those instead?


I personally would want both options. I use Mylio, which has many similar features and e2ee, but you manage your own space / cloud and you're still paying a monthly fee.

For non-nerd friends, managing your own cloud space is mostly a non-starter though. The best choice is cloud storage managed by the provider as an option, along with the self-hosted option.


Thank you! As someone who as of a week ago is hardcore switching from Apple to Linux, I applaud you. I've purchased a 16" MBP, both Airpod models, iPhone, and iPad in the last 24 months. Now on to System76!

Whatever the past is, I believe there's a new market in 2021 for Apple-switchers that will unleash new funds for companies like yours. The de-Google movement will pale in comparison to this in terms of economics. Looking into signing up just on principle. Non-E2EE, closed source, without the ability to self host is a dead end; why put a penny more towards it. Open source options may suck today but it's the only path forward. Thank you for what you do - whether your company succeeds or simply inspires 1,000 new companies in its place.

What are your plans for Linux support? Your site only mentions Android and iOS, I see electron mentioned, but again I'm one of these Apple switchers, I have no idea what I'm doing really but I'm willing to pay for solutions.

Take my money!!!


I think that you are overestimating the size of the audience of people who have nerd rage over whatever we’re pissed off at Apple about or have meaningful concerns about government surveillance.

They’d be better off focusing on making a better user experience instead of E2EE drama.


The size of the audience may be small indeed. But Apple users on average have deep pockets and are willing to spend. A quick search suggests Apple users spend >2x what Android users spend. No data yet on what Apple users spend compared to Linux users. It's part of the reason "but 5% marketshare!" was never a good argument against the rise of Mac/iPhone.


The moms and kids dropping $1000 on Candy Crush every month are not going to switch off Apple. The big spenders in Apple's ecosystem are not the tech literate. The tech literate normally cost way more to maintain from a client perspective. Also, they are a lot more concerned about costs with regards to space, as seen by a lot of comments on this page talking about how expensive storage is.


> without the ability to self host is a dead end

Indeed, this is why you would be foolish to use Ente as you cannot self host it. At any point they can choose to lock things down, make their clients closed source, etc etc, and you'd once again need to spend time jumping ship because you'd need to find a new ecosystem.

Ente is just convenient and is coming at the right time (hence the massive amount of upvotes) but does not give you total control nor your freedom back. Using them instead of something you can self host is just running in circles.

> open source may suck...

What? This was extremely random and out of place with the rest of your comment.

What you really want, if you care about self hosting and all the other stuff you mentioned, is Nextcloud[0]. And if you don't want to self host yet, you'd be better off hosting Nextcloud in a VPS, even on Linode you can just 1 click deploy a Nextcloud instance in their app store[1]. That way you don't become dependent on a service you cannot control/deploy yourself.

[0] https://github.com/nextcloud [1] https://www.linode.com/marketplace/apps/linode/nextcloud/


I think ente does fill a niche, people don't mind paying dollars for companies because it is supposed to guarantee a level of service/polish. And in the case of photos, if the service were to shut down, there's very likely a path one can take to perform a migration.

I'm a big user of open source solutions; I use Linux on my machine and use Syncthing to sync files across all my devices. I'm aware that my solution is not doable for everyone, and that's the problem with most open source solutions: the lack of polish/ease of use. There are tons of systems that aren't open source that we are forced to rely upon day to day (airplane software, traffic lights, telecommunications) and we've just accepted it because of convenience and trust.

What I'm trying to say is that we don't have to worry about self-hosting everything and force ourselves to only use open source tools. I do think that if we do use private tools, we should understand how our data can be exported to a new system if necessary so we're not "locked in".


Standard Notes seems like a good example of this balance to me. While you can self host, I assume 99% of people don't. It must be an option, otherwise I wouldn't use it.


Yes, Ente needs to have self-hosting on their roadmap or I won't support it.

> What? This was extremely random and out of place with the rest of your comment.

Edited to say "open source options may suck today"

Thanks for giving me the chance to explain. My comment here may give more context: https://news.ycombinator.com/item?id=28321460

I've tried NextCloud, even 1-click hosted by a third party. For all the power, it's not built with me in mind, it seems to treat my photos like files/data, not like photos. I want to pay money for that extra oomph, for algorithms, searchability (about $10/month for my photos seems about right), and I want to pay money so I don't have to pay with my time. Is there something I can buy that's on top of NextCloud?


There are many people like you willing to spend money on a good solution that gets the job done who have no interest in self hosting and reviewing the source.

After experiencing the ease of Google photos, any basic file management system to store photos is a downgrade after that.

If ente can figure out how to do the extras (search, face matching) without invading privacy (not even sure how possible this is) I can see this being valuable to the people who want to de-google and maybe even de-apple.


> not built with me in mind

Fair enough.

> on top of NextCloud

Not that I know of. Could have a look through the nextcloud marketplace for something or another. Tbh, I don't see any open platforms having Google/Apple photos kind of functionality for at least a little bit. Google and Apple trained their algorithms on the people using their free tiers for years. Google especially had access to so much information on the user using Google Photos that it was able to build the algorithms it has today. For an open platform to have this functionality, it would need to wait for an open source model/algorithm to exist, else it would need to build it itself by using user data (no E2EE then).

Unless Google open sourced whatever models it uses in Google Photos today, don't expect this level of searchability yet. Actually even if Google did, it would probably be so tied to Google user information and be incompatible with E2EE.


How to create an open source training set without surrendering my data? Like Numerai but instead of a hedge fund it's photo data: https://numer.ai/


Sorry for the delayed response, I missed this comment.

If you're on Linux you can either use our web app[1] or our desktop app[2]. The latter is just the former wrapped in electron, but with the ability to sync uploaded files to your local disk drive.

[1]: https://web.ente.io

[2]: https://github.com/ente-io/bhari-frame/releases/latest


I don't think I'm ready to invest in a photo hosting solution again, be it with my time, my money, or my data, without it being open source/self-hostable or at least open core with a community behind it.

Been duped too many times.


Similar sentiment here. I wish this project well, but photo storage is a long-term thing, and I've been bitten too many times (most recently by Apple shutting down Aperture, which left me with big libraries which are very difficult to migrate).

I considered writing my own software and making it open source, but then realized that photo hosting/sharing software with password-protected sharing features will be used by criminals to store/share CSAM. So, if I end up writing my own solution, it will sadly not be shared with anyone.

Incidentally, I think this service will run into a similar problem: end-to-end encryption is great, but if it gets to a certain size, governments will intervene.


Curious about the details of how you were duped.


Not OP but I have had many cloud photo accounts in the past: myphotoalbum, Kodak Gallery, photobucket, Flickr and more. Eventually all of them either shut down, or got sold and became unmaintained. Google Photos and Apple's are the only ones that I can trust will still be around in 10 years' time.


Picasa as well, though Google nowadays does a pretty great job at getting all pictures you have on your account together with https://get.google.com/albumarchive/


Pity that once uploaded there's no way to get your data back from Google. The API scrambles EXIF location metadata, while Takeout, besides being a pain to use on an ongoing basis, fails if you store too many files.


FWIW, ente processes all of the location metadata generated by Takeout during an import via web.ente.io.


That's probably the best one can do, other than reverse engineering the protocol used by the Google Photos Android app - as that app seems to be able to download files with full EXIF, unlike the official API.

Unfortunately, as mentioned, multiple users report that Takeout does not work once you get past certain size (I have 350GB and it fails every time). It's been failing for years, probably always. Of course Google doesn't care.

I guess if someone was in EU they could try to ask Google for their data under GDPR data portability, face inevitable non-answer and then go to court if they are determined enough.


I sincerely hope that someone sues.

Google has blocked access to their APIs for migration[1] which IMO contradicts with their stance on data portability[2]. It is hard to assume good intent here.

[1]: https://developers.google.com/photos/library/guides/acceptab...

[2]: https://datatransferproject.dev


As the famous saying goes, never ascribe to malice that which can be explained by incompetence. This is Google we are talking about, they are infamous for their lack of strategic focus and disorganization.


What are your plans for when your app is found to host content such as terrorist executions, child porn, etc.? (This isn't trolling, it's something that eventually happens with every product, and I've been wanting a non-Google version myself but wondering how that kind of abuse would be dealt with.)


Since it's a paid service with user accounts, you would be able to ban users that have been reported to use the service for illegal purposes. The same question can be asked of WhatsApp / iMessage / Signal / etc.


the answer is right here https://ente.io/transparency


It does not say how often it is updated. Wouldn't it be better to say "as of 8/29/2021, we have received no such requests and we are updating this page monthly".


Yes, this is a good first step towards a true warrant canary, but you need to date it and provide a cryptographic hash of the content.


I don't think they would be able to do anything about it, since (from what I could infer from reading) it is zero-knowledge, so no one from the company can access the pictures. I might be wrong, though


Well, depending on legislation, they could be ordered to change the code to send the user password to them on next login for that account and then decrypt everything…


The architecture of Ente (https://ente.io/architecture) prevents your unencrypted master key from being exposed to the server. The password authentication appears to be client-side, which means that the data could not be compromised solely by a malicious server-side change.

Now, Ente could still change its web application to somehow leak the master key and not disclose the changes in the source repo. One solution for this vulnerability is to package the entire web client as a browser extension, which is what Mega is doing:

https://github.com/meganz/web-extension


There are a couple of other ways to mitigate the problem for web applications. If you're willing to install a browser extension, then it might make more sense to use the Signed Pages extension[0] which applies PGP signature checking to web pages. The other solution is to use Secure Bookmarks[1], which combine SRI integrity hashes with Data URIs to ensure that a fixed bundle of JavaScript is running in the page.

[0] https://github.com/tasn/webext-signed-pages

[1] https://coins.github.io/secure-bookmark/


Yes, and that is a problem.


What is the problem/why is there a problem?


When push comes to shove, technology is subservient to society: https://en.m.wikipedia.org/wiki/Lavabit


Well, first and foremost, if I ran a service, I would not want to help either terrorists or pedophiles. I would be very unhappy if I was doing that.

Secondly, if you do provide service to terrorists or pedophiles, and take no steps to stop doing so, law enforcement and society in general is not going to be very happy with you.


The answer to this question is why the only solution in the long run is local storage.


Just imagined a dystopian future where storing data locally would be illegal, for the good of society of course /s


Not when you have government-mandated software checking your local files against hashes. Not today, but someday.


It is not possible to prove this, because the photos are encrypted.


Encrypted content can be decrypted.

Links and data transfers can be traced.

Warrants and subpoenas can make such traces / actions legal.


something that only showed up in mainstream media 10 years after smart phones got launched. gawd.


Please, please support custom storage backends; I'd love to use my Dropbox or S3 or whatever to still fully own my pictures. And I'd love to pay extra to opt out of any analysis, tagging, etc. of my photos. Basically I'd like the interface to be similar to Google Photos but with a privacy-focused storage engine and clients.


I concur. However, storage is how they plan to make money, so there will need to be a different monetization strategy for BYO storage. As yet I can't imagine any.

EDIT:

I think I have an idea! Add the S3/OneDrive/etc. support but comment it out. To make use of it one would have to download the source and Xcode, compile it, and deploy it. This puts a cap on the number of people who can do that, so you won't end up with everyone getting a free copy. Those people who are able to do it are likely to be asked for advice by their less techy friends, so this is basically free software to key influencers.... Ok, so this does not sound as exciting as it did before I started typing, but maybe this will lead to something...


The problem with that is that some kind fellow on GitHub will clone the project, uncomment the code to enable the premium features for free, and change its name. If it's released under a FOSS license, the original authors have little recourse.

This is what happened with Emby (a media server like Plex). The backend was open source and there was a license to activate premium features. Somebody cloned it, and then released the premium features to everyone for free.


So it's a little more complicated than that.

Our API server runs the following

- authentication

- replication

- differential sync

- and a few more errands that are necessary for the apps to function

The solution to this would be to offer a self-hosted variant where you can plug in your S3 credentials. But like I mentioned elsewhere in this thread, maintaining such a project comes with an overhead we cannot afford right now. Hopefully sometime in the future we will be able to afford the necessary engineering bandwidth.


I like how Joplin does it for notes. You authorize them as an application in Dropbox or give them credentials to a S3 bucket. Don't get me wrong. I want to pay for your service. I just have to be able to access and decrypt my files if you had to shut down your service all of a sudden.


Our pricing model is such that the product can sustain itself. Also, we have a desktop app[1] that syncs your uploaded data to a local drive, so you don't have to worry about lock-in.

But even if we do have to sunset the service due to unforeseeable reasons, our cold storage is relatively inexpensive and we will give our customers ample time to migrate out.

Also, in such a scenario we would want to publish our entire system in an easily deployable way so that all our efforts would not be in vain.

[1]: https://github.com/ente-io/bhari-frame/releases/latest


I see where you're coming from and I really appreciate that you're taking the time to respond. I know it's unlikely for a service like this to shut down from one day to the next but it's not impossible, plus the whole thing about a service having the ability to shut me out of my own data, that's just scary. And many of us are already paying for storage on Dropbox and have secondary backups set up for instance. I'm just saying that this would probably convince more people to switch, leveraging a service they're already paying for plus whatever you're charging to facilitate - less than the full service with storage would cost but enough to make you some money as well. Again, offering privacy in a field that was previously devoid of it is a great step in the right direction.


I would pay for a self hosted solution, or for a solution where I can plug into a backend you support.

I would also pay upfront, e.g. kickstarter


Heh. Yeah. Been building something like this, where you can have your choice of metadata storage and file storage. Out of the box, it would be Sqlite and the local FS, and then you can become adventurous. Postgres and S3? Elastic and S3? Sure.

Needless to say, years later, I am still building it. For one guy doing this on my own time, it's a lift. Maybe after I quit my job soon :)


Is there something you can share and possibly collaborate on with others? Just now on the drive home I contemplated doing a POC with S3 storage, but I acknowledge how much work that probably would be.


My journey with this started back in Java and Play 1. Now it's a Scalatra project. I am rewriting the front-end because the original was written with ES5 and Knockout, becoming essentially dead on arrival and pretty unmaintainable.

The idea is that the "engine" is going to be open-source, but the UI would be free and proprietary (you would be able to bolt on your own UI).

Once the UI is presentable to a point where I can actually test the engine against it, it would be ready for collaboration. But again, it's been a rough stop and go. No wonder something like this does not exist.

To be accurate, this is not a photo management project, it's a full on DAM. But I am doing photos first. Could end up being less ambitious at first, however. Even the baseline is a massive project.


You may want to take a look at: https://www.boxcryptor.com/en/


Re: Shared Albums

>the receiver just needs a free ente account.

I feel like there should be an even more frictionless option to make it easy for family to access photos. For example, if there were a way to just trigger a mailing list when an album is added to, that would be perfect. "Here is an update on our trip: [link]" I love that you mention you are security and privacy focused, and I see how this could conflict with that mission. Perhaps a tradeoff here could be allowing one viewing via link, with future viewings requiring an account?


> if there were a way to just trigger a mailing list when an album is added to, that would be perfect

We can do this if all of the participants are already on ente.

> allowing one viewing via link and future viewings require account

We are hoping to come up with an implementation similar to this, wherein a link to an album can be shared with N devices. We will persist an accessToken in the viewer's localStorage so that they can re-view the album multiple times without having to sign up.
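Roughly, the viewer side would look something like this (a sketch; the key and endpoint names are made up, not shipped code):

```
// Hypothetical storage key and endpoint for illustration only.
const TOKEN_KEY = "ente-public-album-token";

function rememberAlbumToken(token: string): void {
  localStorage.setItem(TOKEN_KEY, token);
}

async function fetchSharedAlbum(albumId: string) {
  const token = localStorage.getItem(TOKEN_KEY);
  if (!token) throw new Error("No access token; ask the owner for a fresh link");
  // The server would count distinct tokens (devices), so the same device can re-view freely.
  const res = await fetch(`/public-albums/${albumId}`, {
    headers: { "X-Access-Token": token },
  });
  return res.json();
}
```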


It's funny, I see this being the first feature they kill off unfortunately when it becomes the new super easy way of sharing CSAM on shady forums.


This looks super cool; however, it's not something I'd be interested in using myself if I can't self-host it (at least it looks like that's not possible from the website).


Self-hosting a zero knowledge service is probably unnecessary.

If you're hosting the service, there's no need for data to be encrypted client-side. Unless, of course, you were intending on running the service on a public cloud which you didn't control, but that's something I don't think many privacy conscious folk would do.

There's plenty of open source, self-hosted alternatives to Google Photos.


Yeah, having attempted to operate a service very similar to this (only more focused on general encrypted cloud storage) I will say there are no good economics in usage-based billing. You're much better off selling a license to use the software and give users the ability to use common cloud storage providers (minimally the s3-compatible ones but also things like Google Drive) as the backing for this. Even safer from a legal perspective would be not having accounts at all and allowing users to purchase a 1-year license based on license keys that are cryptographically validated but not stored anywhere. Then it's impossible to do anything user specific whether you are compelled to or not.
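For instance, a sketch of what "cryptographically validated but not stored anywhere" could look like, assuming an Ed25519-signed license payload and libsodium's JS bindings (all names are hypothetical):

```
import sodium from "libsodium-wrappers";

// Hypothetical format: "<base64(payload)>.<base64(signature)>", where the payload
// is JSON like {"plan":"pro","expires":"2022-08-29"} signed with the vendor's
// Ed25519 private key at purchase time. The app ships only the public key.
const VENDOR_PUBLIC_KEY_B64 = "REPLACE_WITH_VENDOR_PUBLIC_KEY";

async function isLicenseValid(licenseKey: string): Promise<boolean> {
  await sodium.ready;
  const [payloadB64, sigB64] = licenseKey.split(".");
  if (!payloadB64 || !sigB64) return false;

  const payload = sodium.from_base64(payloadB64, sodium.base64_variants.ORIGINAL);
  const sig = sodium.from_base64(sigB64, sodium.base64_variants.ORIGINAL);
  const pk = sodium.from_base64(VENDOR_PUBLIC_KEY_B64, sodium.base64_variants.ORIGINAL);

  if (!sodium.crypto_sign_verify_detached(sig, payload, pk)) return false;

  const { expires } = JSON.parse(sodium.to_string(payload));
  return new Date(expires).getTime() > Date.now(); // validated offline, nothing stored server-side
}
```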


To me it is a canary signal that I have the option to self-host.

Most likely, QoS would be better from ente's hosting and I would be inclined to take advantage of that. An open source server can be audited and offer an off-ramp should their service no longer suit me.

Then again, the economics of enabling self-hosted infrastructure are probably less exciting compared to locking users in to marked-up, white-labeled infrastructure.


How do you know it's zero knowledge?


The source code of the client-side apps appears to be available on GitHub. So if they're bluffing, it won't be too long until someone calls them out on it.


Without a fully described mechanism to confirm that the client you download is not compiled with additional code (i.e. without specifying exactly how the client is compiled, using which version of which compiler, and which compile flags, dependency versions, etc) any kind of "the code seems to be on github" is kind of meaningless.


Ideally they should support reproducible builds so that anyone can confirm that the hash of the app corresponds to a specific tag on the source repository. Unfortunately app stores are making it harder to know what the hash of the app you are installing is, but for side-loading this should still be possible.

For web apps, the situation is even more difficult, but there is a technique called Secure Bookmarks which allows you to confirm that a specific bundle of JavaScript is running (at the expense of some usability):

https://coins.github.io/secure-bookmark/
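
For the side-loading case, the check itself is just a digest comparison against whatever hash the project publishes for the release tag; something along these lines (the file path and expected digest are placeholders):

    // Sketch: verify a side-loaded APK against the digest published for a
    // release tag. The expected digest would come from the project's releases.
    import { createHash } from "crypto";
    import { readFileSync } from "fs";

    function apkMatchesRelease(apkPath: string, expectedSha256Hex: string): boolean {
      const actual = createHash("sha256").update(readFileSync(apkPath)).digest("hex");
      return actual === expectedSha256Hex.toLowerCase();
    }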


F-Droid supports reproducible builds. Any serious FOSS app, I think, must prioritise publishing to F-Droid.


Unless they only send compromised code to you personally and nobody else.


One way to mitigate that is through Binary Transparency, which would allow people to detect if a release is made for which there is no source code available (assuming the project already has reproducible builds). There is already a project attempting this for Arch Linux packages[0].

Of course it's still possible that an update could be sent to everyone which contains some code that only runs when a certain username is entered, so users would need to avoid updating the app until an audit by a trusted third party had approved it.

[0] https://github.com/kpcyrd/pacman-bintrans



That's just a non-binding promise. If that's enough for you, you don't need encryption at all.


I think the correct link is: https://ente.io/architecture


Again, just a promise.


Self-hosting is not worth the time and effort.


That is not categorically true.

On the business side, there's plenty of companies that have offered and succeeded with self-hosted software. On the client side, there's many individuals like myself willing to dedicate time, money, and effort to self-host services. I spent quite a bit of time setting up my NAS with self-hosted services, not only because the number of photos and media I store would be prohibitively expensive to host elsewhere (I do photography and videography as a hobby, 120 fps 10 bit footage adds up), but because I enjoy the hobby.


We have so many consumer-facing apps. You'd want to maintain all of those and still have a life in which to use them? Good luck!


Not everybody has to use "many" apps. You can self-host only the ones you care about.


Another thing to keep in mind with this kind of software is tracking data loss, corruption and deletion. I've used photo management services before, and have had data loss that I can't explain from this year or that year. Did I delete it? Did I do a migration wrong? Did the software silently delete it? I'm not quite sure. What is even worse is you cannot get 'another copy' of these photos from elsewhere, because they're all unique.

Having a 'recycle bin' and an ability to see the history of photo deletions, modifications and imports can be useful in tracking down what causes data loss. Also, having the masters accessible in a simple plain directory is essential: it lets you audit that the software is working correctly, back everything up in a simple manner, and migrate away easily if your service goes belly up.

Another issue is bitrot. Bitrot on your desktop can silently modify a photo, and then your photo management software detects this as the 'new version' and destroys the original good version. You have to mitigate this by storing a hash on import and restoring to the original hashed version.
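
A minimal sketch of that mitigation (hash on import, flag silent changes later; the manifest format here is just for illustration):

    // Record a SHA-256 per file at import time; on later scans, treat any
    // silent change as corruption rather than a "new version".
    import { createHash } from "crypto";
    import { readFileSync, writeFileSync, existsSync } from "fs";

    type Manifest = Record<string, string>; // path -> sha256 recorded at import
    const MANIFEST_PATH = "photo-hashes.json";

    function sha256(path: string): string {
      return createHash("sha256").update(readFileSync(path)).digest("hex");
    }

    function recordOnImport(paths: string[]): void {
      const manifest: Manifest = existsSync(MANIFEST_PATH)
        ? JSON.parse(readFileSync(MANIFEST_PATH, "utf8"))
        : {};
      for (const p of paths) manifest[p] = sha256(p);
      writeFileSync(MANIFEST_PATH, JSON.stringify(manifest, null, 2));
    }

    function findBitrot(): string[] {
      const manifest: Manifest = JSON.parse(readFileSync(MANIFEST_PATH, "utf8"));
      // Anything whose current hash differs from its import-time hash should
      // be restored from a known-good copy, not synced as a new version.
      return Object.keys(manifest).filter((p) => sha256(p) !== manifest[p]);
    }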


Sharing some of the steps we've taken at ente to reduce the probability of such events:

- All files uploaded to ente are versioned and older versions are available for 60 days from the day you updated them.

- File deletions are performed only as a function of user action. Deleted files are again recoverable for 60 days.

- Two copies of each file are maintained with separate storage providers. Both of these providers offer eleven 9s (99.999999999%) of durability.

- For each uploaded file, we compare the number of bytes uploaded from the client to that received on the server and request a reupload in case there is a mismatch (to be replaced with a hash check; see the sketch below).

We understand your concerns and will continue to invest in steps that improve data integrity and durability.
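
A rough sketch of what the hash check could look like on the client (the endpoint shape, field names and retry policy here are illustrative, not our actual implementation):

    // Illustrative only: replace the byte-count comparison with a content
    // hash comparison over the encrypted bytes that were uploaded.
    import { createHash } from "crypto";

    async function uploadWithIntegrityCheck(
      encryptedBytes: Uint8Array,
      uploadUrl: string,
      retriesLeft = 3
    ): Promise<void> {
      const localHash = createHash("sha256").update(encryptedBytes).digest("hex");
      const res = await fetch(uploadUrl, { method: "PUT", body: encryptedBytes });
      const { receivedSha256 } = await res.json(); // hash the server computed

      if (receivedSha256 !== localHash) {
        if (retriesLeft === 0) throw new Error("upload repeatedly failed integrity check");
        // Bytes were corrupted in transit: request a re-upload.
        await uploadWithIntegrityCheck(encryptedBytes, uploadUrl, retriesLeft - 1);
      }
    }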


Super cool. Did you roll your own storage solution or are you using one of the many cloud providers? If the latter, which one? I ask because I've done a ton of work in optimizing costs in this area (at large scales), and as the top comment mentioned, $15 is kind of steep for 1TB.


Hey, we're currently using two S3 compliant storage providers (Backblaze and Scaleway). I would love to talk more about how we could reduce our pricing. Please let me know if I can reach out to you over the email mentioned on your HN profile. Thanks!


More than welcome to!


Oh please do share some nice tips in this regard


Very reasonable pricing, though you could advertise the free 'trial' tier a bit more prominently. I thought the service was paid only until I re-checked the pricing page and read the tiny gray on black text before writing this comment.

You also didn't set a single tracking cookie. Nice.


I'll increase the opacity of that line, thanks for the feedback!


Your homepage says "protect your photos/faces etc. from algorithms"

The algorithms are what make Google Photos, Google Photos. If I wanted to just store my photos I'd throw them in an S3 bucket or Dropbox or something.

Google Photos lets me automatically categorise my photos by person, lets me search my library using text search for anything (e.g. I can search 'museum' and see pictures I've taken in museums). That is where the real value of Google Photos comes into play.

> But we are far from where we want to be in terms of features (object and face detection, location clustering, image filters, ...) and user experience. We are hoping to use this post as an opportunity to collect feedback from fellow hackers.

So you're going to implement algorithms then?


> So you're going to implement algorithms then?

Yes, we will implement the algorithms, purely on the client side, such that we don't hold indexes to your personal data.

But I understand how that piece of text could have thrown you off, I'll think of ways to rephrase it. Thanks for pointing it out.


Actually I'm really curious how you do this. If the photos aren't stored client side, then how do you search? Do you have a thumbnail of every photo client side? Is that enough? I mean ImageNet scores are still pretty low for small/fast neural nets. And ImageNet isn't even representative of real world photos. So obviously to be successful you're going to have to continue training. So how do you do this in a privacy preserving way? Even federated learning can have some issues because images can be reconstructed from gradients.


> Do you have a thumbnail of every photo client side

In the happy path the files/thumbnails are indexed before they are uploaded. But we are designing a framework that will pull files/thumbnails for indexing if they are unindexed or indexed by older models.

> how do you do this in a privacy preserving way

Our accuracy will not match that offered by services who index your data on their servers. But there's a trade off between user experience and privacy here, and we are hopeful that ente will be a viable option for an audience who is willing to sacrifice a bit of one for a lot of the other.
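
To make the "indexed by older models" part concrete, each index entry could carry the version of the model that produced it; a simplified sketch (not the actual schema):

    // Illustrative sketch of client-side re-indexing: entries remember which
    // model version produced them, and stale or missing ones get re-indexed.
    const CURRENT_MODEL_VERSION = 3; // hypothetical

    interface IndexEntry {
      fileId: string;
      modelVersion: number;
      labels: string[]; // e.g. ["beach", "dog"], derived on-device
    }

    function filesNeedingReindex(
      allFileIds: string[],
      index: Map<string, IndexEntry>
    ): string[] {
      return allFileIds.filter((id) => {
        const entry = index.get(id);
        return entry === undefined || entry.modelVersion < CURRENT_MODEL_VERSION;
      });
    }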


As someone who has worked on systems like these let me translate:

“Your stuff will be private, but in return the accuracy will be so bad that the UX is gonna suck!”

That’s the key piece people miss when they wanna do anything with ML… that it’s a different problem compared to writing code, because it’s not about the code anymore; it’s about having great training data!


Apple Photos seems to be using just Core ML[1] for on-device recognition and it does a pretty good job. As for Android, we plan to use tflite, but the accuracy is yet to be measured. And if customers do install our desktop app, we will be able to improve the indexes by re-indexing data with the extra bit of compute available.

We don't feel that the entire UX of a photo storage app will "suck" because of a reduced accuracy in search results, and we think that for some of us the reduced accuracy might not be a deal breaker.

[1]: https://developer.apple.com/documentation/coreml
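
To give a flavour of the on-device approach, here is a browser-side sketch using TF.js's published MobileNet model as a stand-in (the mobile clients would use tflite / Core ML; this is purely illustrative, not our actual pipeline):

    // Fully client-side labelling with a published model, so no image data or
    // labels leave the device. MobileNet here is only an example.
    import * as mobilenet from "@tensorflow-models/mobilenet";
    import "@tensorflow/tfjs";

    async function labelPhoto(img: HTMLImageElement): Promise<string[]> {
      const model = await mobilenet.load();
      const predictions = await model.classify(img);
      // Keep only reasonably confident labels for the local search index.
      return predictions.filter((p) => p.probability > 0.5).map((p) => p.className);
    }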


Up until recently I’ve used Apple Photos happily since it provided a good combination of convenience plus the privacy of on-device recognition. You have a compelling product if you can convince customers you are as reliable and more trustworthy than Apple. You do face the disadvantage of not being the default option for iOS/macOS but that should be balanced by being available cross-platform in Android, Linux, Windows.


Core ML and TFlite are just tools for running ML models. Generating the models is the hard part, and that is what encryption will make more difficult.


We will resort to models that are available in the public domain.


Bingo!


To be honest, that wasn't a concern with my question. I think most people on HN understand this aspect. My question was more about how you improve your models when you don't have the same feedback mechanisms as non-privacy preserving apps. Google can look at your photos and see what photos fail and collect the biased statistics. In a privacy preserving version you won't be able to do this. Sure, you can on an internal dataset, but then there are lots of questions about that dataset's bias and if it is representative of the real world. I mean how many people think ImageNet is representative of real world images? A surprising number.


As someone else who works on systems like these, I agree training data is the whole problem. However, you can use techniques like homomorphic encryption and gradient pooling to collect training data from client code while remaining end-to-end encrypted. It's hard, but it's not impossible.


Really? Have we had a revolution in homomorphic encryption such that it can be used for anything other than 1-million-times-slower proofs-of-concept?

I know IBM has released something lately, but given the source..

Does anyone use HE for the type of ML application you are describing?


So I guess there is more to the question that I'm asking.

> Our accuracy will not match that offered by services who index your data on their servers. But there's a trade off between user experience and privacy here,

I think most people here understand that[0]. We are on Hacker News after all and not Reddit or a more general public place. The concern isn't that you are worse. The concern is that your product has to advance and get better over time. That mechanism is unclear and potentially concerning. The answer to this is the answer to how you ensure continued privacy.

You talk about the "push files/thumbnails for indexing" and this is what is most concerning to me and at the heart of my original question. How are you collecting those photos for _your_ training set? Obviously this isn't just ImageNet (dear god I hope not). Are you creating your own JFT-300M? Where are those photos being sourced from? What's the bias in that dataset? Obviously there are questions about the model too (CNNs and Transformers have different types of biases and see images differently). But that's a bigger question of training methods and that gets complicated and nuanced fast. Obviously we know there is going to be some distillation going on.

There's a lot of concerns here and questions that won't really get asked of people that aren't pushing privacy based apps. But the biggest question is how you get feedback into your model and improve it. Non-privacy preserving apps are easier in this respect because you know what (real world) examples you're failing on. But privacy preserving methods don't have this feedback mechanism. We know homomorphic encryption isn't there yet and we know there are concerns with federated learning (images can be recreated from gradients). So the question is: how are you going to improve your model in a privacy preserving method?

[0] I think people also understand that on device NNs are going to be worse than server side NNs since there's a huge difference in the number of parameters and throughput between these and phone hardware can only do so much.


> how are you going to improve your model in a privacy preserving method

We will not improve our models with the help of user data, and will rely only on pre-trained models that are available in the public domain.


This is one of your best replies in the whole thread.

Yes to this. Prove it as well.


Why is it such a great reply? They didn't really answer my question.


I liked the clarity of response. Public models, not user data seems a clear answer to your question?


Not really. In fact it might suggest something I'm specifically more worried about. Datasets that we use in research aren't really appropriate in production. They have a lot of biases that we don't care much about in research but that matter in production, and that can also get you into a lot of political and cultural trouble. So really, if they are going to just use public datasets and not create their own, then I expect substantially lower performance, potential trouble ahead, and I'm concerned about who is running their machine learning operations.


Appreciate the detail here. Given your relevant experience sounds like something that the devs need to address.


Being in the ML community I have a lot of criticisms of it. There are far too many people, especially in production, that think "just throw a deep neural net at it and it'll work." There is far more to it than that. We see a lot of it[0]

[0] https://news.ycombinator.com/item?id=28252634


Wow fascinating. What do you ideally want to see in terms of datasets enabled by user data?

Having vendors vacuum up my data is sub-optimal from a privacy/ownership standpoint. I'm curious how to enable models without giving away my data. Open source models owned by society? Numerai style training (that I don't understand) https://numer.ai/ ?


Datasets are actually pretty hard to create. You can see several papers specifically studying ImageNet[0], including some on fairness and how labels matter. There's also Google's famous private JFT-300M dataset[1]. JFT was specifically made with heavy tails in the distribution to better help study these areas, which is specifically the problem we're interested in here and one that is not solved in ML. Even with more uniform datasets like CIFAR there are still many features that are noisy in the latent space.

This is often one of the issues with doing facial recognition, and why there are issues with people with darker skin. Even if you have the same number of dark-skinned people as light-skinned, you may be ignoring the fact that cameras often do not have high dynamic ranges, and so albedo and that dynamic range play a bigger role than simply "1M white people and 1M black people". There are tons of effects like this that add up quickly (this is just an easy-to-understand example and one that's nearer to the public discourse). You can think back to how Google's image search at one point showed black people if you searched gorilla. On one hand you can think "oh, got a dark-colored humanoid", or you can think "oh no... dear god...". That's not a mistake you want to make, even if we understand why the model made it. It is also hard to find these mistakes, especially because the specifics of them aren't shared universally across cultures, since this mistake has to do with historical context.

This is still an unsolved problem in ML. Not only do we have dataset biases (as discussed above) but models can also exaggerate these biases. So even if you get a perfectly distributed dataset your model can still introduce problems.

But in either case, we don't have the same concerns in research as we have in production. While there are people researching these topics, most of us are still trying to just get good at dealing with large data (and tails) in the first place. Right now the popular paradigm is "throw more data at the model." There are nuances and opinions as to why this may not be the best strategy and why we should be focusing on other aspects (opinions being key here).

Either way, "using publicly available datasets" is an answer that suggests 1) they might not understand these issues and 2) the model is going to have a ton of bias because they're just using off the shelf models. I want some confidence that these people actually understand ML instead of throwing a neural net at the problem and hitting go.

> I'm curious how to enable models without giving away my data.

Our best guess right now is homomorphic encryption. But right now this is really slow and not as accurate. There's federated learning but this has issues too. Remember, we can often reconstruct images from the dataset if we have the trained model[2]. You'll see in this reference that while the reconstructions aren't perfect, they are more than satisfactory. So right now we should probably rule out federated learning.

> Open source models owned by society?

Actually models aren't the big issue. Google and Facebook have no problem sharing their models because that isn't their secret sauce. The secret sauce is the data (like Google's proprietary JFT-300M) and the training methods (though most of the training methods are public as well as few are able to actually reproduce due to not having millions of dollars in compute).

I hope this accurately answers your questions and further expands on the reasoning behind my concerns (and specifically why I don't think the responses to me are sufficient).

[0] https://image-net.org/about.php

[1] https://arxiv.org/abs/1707.02968 (personally it bugs me that this dataset is proprietary and used in their research. Considering how datasets can allow for gaming the system I think this is harmful to the research space. We shouldn't have to just trust them. I don't think Google is being nefarious, but that's 300M images and mistakes are pretty easy to make).

[2] https://arxiv.org/abs/2003.14053


godelski, I really appreciate such a thoughtful response to my curiosity.

Looking at this while better understanding the problem, I wonder what features I really want for my own photo library. Thinking of iOS photos. Matching people together seems hard. But grouping photos by GPS location or date is trivial. So we have to get clear on what features are important for home photo libraries.

I can now see how the idea of "use public libraries = solution" falls short. It neither presents a viable solution nor demonstrates rigorous understanding.


Hey, that's what HN is about. You've got experts in very specific niches, and we should be able to talk to each other in detail, right? That's the advantage of this place as opposed to somewhere like Reddit. Though as the site grows, we face similar issues.

These are good points about GPS and other metadata. I didn't really think about that when thinking about this problem, but every album I create is pretty much a combination of GPS and temporally based (though I create this with friends). But I think you're right in suggesting that there are likely _simple_ ways to group some things that aren't currently being done.

> I can now see how the idea of "use public libraries = solution" falls short. It neither presents a viable solution or demonstrates rigorous understanding.

ML is hard. But everyone sells it as easy. But then again, if it was easy why would Google and Facebook pay such a high rate for researchers? There's a lot of people in this space and so it is noisy. But I think if you have a pretty strong math background you start to be able to pick out the signal from the noise better and see that there is a lot more to the research than getting SOTA results on benchmark datasets.


You can run algorithms locally and still violate privacy by uploading private facts derived from the data with algorithms. Saying you won’t hold “indexes” doesn’t begin to cover it.


Well, it does begin to cover it. Do you have to be so strident?


What do you think is meant by indexes?


But that will mean that for every new version of the algorithms, it will have to re-read all the photos from the last 15 years... my phone battery will die soon.

And if I need another kind of client to do that... like a NAS... why do I need the cloud?


> phone battery will die soon

Indexing will be opt-in. You will be able to run the indexing only on your desktop client for instance.

> Why I need the cloud?

So that you don't have to manage your own storage infrastructure? But if you would like to do that, then there are self-hosted alternatives that will better serve your use case.


Agree with the above poster. I don't care about algorithms. I want algorithms. But I want algorithms that only work for me. Screw off everyone else.

Apple used to sell this. Then they stopped.


Those "algorithms" can run locally, on a NAS or a desktop, generate the metadata and make it available to you only on your mobile.

I can see myself paying for such software if it was mature enough.


Synology Photos, for example, is already one such solution.


I have a Synology, actually. Is Synology Photos trustworthy?


The software with these features is called Synology Moments. I use it and I mostly love it, at the very least as a backup for my Google Photos.

My experience is that it works great, provided that you're on your local network. When away from home or traveling, less so. Maybe I could configure things better to alleviate that, I don't know, but I haven't managed to yet.

Sharing is less convenient. Trying to share a photo on-platform is a terrible experience for the receiver with multiple slow redirects, so much so that generally if you're on mobile it's easier/better to just download the photo to your device and share the photo directly. The Moments android app has a flow for doing this, which is nice. It also makes a certain amount of sense: the alternative would be others connecting to your NAS online, which is always going to be less nice than just connecting to Google photos.

The search capabilities are pretty decent. It can recognize people and tag them appropriately. It can recognize some things. In some ways, I prefer searching it over searching Google Photos. But again, only if you're on your local network with your NAS.

--

Edit: see aborsy's response to me below. Looks like I'm a version behind. Maybe on-platform photo sharing is better now, I'll update the software and check it out


Yeah, new version is about the same.

If you want to check it out, here's a couple photos from when I picked some peppers the other day:

https://ojensen5115.quickconnect.to/mo/sharing/pgdYsVEqu


In DSM 7, it’s called Synology Photos!


Thanks for the heads up! Looks like I have an update to install :)


For at-home NAS, is Synology the best for recreating Google services?


I've had a Synology for years and I have used their Photos and Moments apps.

It's pretty dang hard to recreate a Google service. It's great for backup and for having control over the photos - but dang, it's slow... if I need something real quick, I usually go to Google Photos... even when I'm home. Maybe I need to upgrade to a NAS with a faster processor, I don't know.

I've turned off the Google Photos facial recognition stuff because of privacy, but dang I miss the convenience. Moments has its own but it's not as good.

In Google Photos I can easily search for a city, some text, or an object, and it pops up quickly.


It's the best I've found so far. They have a number of apps (docs, drive, moments, etc), and I wouldn't say they are as good as Google right now, but they are quite workable.


I love my Synology. The differentiator between NAS devices is not the hardware but the software.


I have one myself and I would say it's the best out of all the alternative NASes out there. You pay a bit extra but it's worth it considering how easy it is to set up. I also paid a bit more for the Plus model so I could run Docker, which in turn gives you a huge selection of other apps beyond the built-in apps or the SynoCommunity apps.

https://github.com/awesome-selfhosted/awesome-selfhosted


> Those "algorithms" can run locally

But I don't want my GPU burning away running them when they could run much more efficiently and out of mind in the cloud.


Then you aren't the target audience?


Am I the only one who never realized you can search "museum" and see your museum photos?

Now that you've mentioned it, yes, I'd like to try that. But as a counterpoint to your argument, I've never needed it, and I suspect that a lot of people may not actually be getting the same value propositions that you're getting.

On the other hand, Google Photos is Google Photos. But it's often a mistake to compete directly with an established product. New ideas tend to win by transcending the competition.

I propose that if this Show HN turns into a product, it will be because it does something people didn't realize they wanted. Maybe that's privacy. I don't know.


I use it all the time - it's the killer feature of google photos. The premise is that if you come back from vacation with 300 photos, it's unlikely that you (the average non photography-nerd user) are going to sit there and tag them all. If in a few years you want to find "that photo of me you took on the beach in north carolina", with a quick search you can.

There are annoying limitations though, probably because the original team moved on and it's in maintenance stage. Using my example above, google photos has no idea what the "outer banks" are (which is where the beach photos were taken in north carolina) and returns no results. It also has trouble parsing out entities from search terms, so "north carolina beach maggie" isn't going to find pictures of Maggie on the beach in North Carolina (which you'd think they could really fix given that, well, they're google). Finally, there's no way (that I know of) to jump from search results to your full timeline; let's say that "north carolina beach" gets me a bunch of beach pictures from January 2015 (yeah, it was cold), but doesn't have _the_ picture from the trip that I know I want - there's no direct way to click to January 2015 from the results, which really sucks. (Instead you have to go back out of results and use their fiddly scroll to get there.)


Yeah, it's a killer feature, but I really wish they had some sort of a documented "search API".

Instead of natural language search, where I have no idea whether it understood me, I wish I could do (modifying your example):

"North Carolina" "Maggie Thomson" "Tom Morgan" -beach 2018

for all photos in NC, with Maggie and Tom, not in a beach from 2018

and even better, if it could tell me the number of results that would show up if we removed each keyword above.

I guess it's a tough problem, even for Google :(


> there's no direct way to click to January 2015 from the results, which really sucks. (Instead you have to go back out of results and use their fiddly scroll to get there.)

It's amusing how people's insights can turn myopic. Search in photos is the killer feature, and it even solves the problem that you have.

If you realize that you need to see photos from January 2015, don't try to scroll back in your photos feed. Just do a second search for "January 2015".


That's worse than click-to-this-photo-in-context though. Maybe I have 4000 photos from January 2015, so it doesn't help to search for the month.


To an extent it does. For example “Seattle night in July” shows me night pictures taken in Seattle the one July I was there a few years ago.


I try to use it often but it works pretty poorly and I always have to scroll through years of photos to look for what I need.

For me the killer features of Google Photos are:

- Free storage of photos (hence why I'll move after I run out of free space)

- Tagging faces

- Sharing albums


It's a great idea that works in a limited way. Getting that next 30% is going to take a while, never mind natural language queries.


There's more you can do, honestly. Search and assign people so you can find pictures with just them. This also works for pets. People, pets, objects, places, etc. Hell, I searched the car I use to drift and it showed up. It's really neat.


The search is really quite fun to play with, and very useful! I also like searching on the map and seeing where I’ve taken photos. Especially if I’m looking for one particular photo, it’s fun to zoom in from the world map


Thanks for pointing that out. I actually had the opportunity to sync my iPhone photos to Google Photos, but opted to decline. This made me reconsider; cheers.


Why would this feature, which is also a part of Photos, make you reconsider?


As much as I like apple / iCloud / my iPhone, I do like the idea of seeing all the places on a map that I’ve traveled with my lovely wife Emily. We’re hoping to go to the Seychelles if the next three months work out at my contract gig.

I like the idea of being able to type “water” and see a bunch of water bottles mixed in with all the water-y places we’ve visited.

What sealed the deal was to see it on a map. I typed “water” into Photos just now, and it did a pretty good job. But there’s something peculiar about being able to look at a pin and say “I’ve been at that pin.”

Just a silly thing. But it costs me nothing to get it, so I want it.


Yes, but I’m saying that’s a feature in Photos right now. So long as the photo has location data, you can see it on the map in Places on the Albums tab.


Thank you! I did finally figure out what you were pointing out. Apparently there is a “places” album, as you say.

For some reason, it only has 40 places, whereas I have 9,987 photos. I definitely have photos from Cancun, so I wonder if the location data somehow got stripped, possibly when I got a new iPhone around 4 months ago… though that doesn’t make much sense to me.

Anyway, I just wanted to say thanks for pointing out that the thing I wanted already exists on iOS, even if it didn’t have a pin on Cancun. I’ll check the exif data someday, perhaps, or sync to Google photos and see if it pops up.

Cheers!


I use this feature occasionally, but it also seems to be pretty bad for the searches I try. For example, if I search for 'dog', I do indeed get pictures back that contain my dog. However, there are a ton of false negatives -- that is to say, the 'dog' search doesn't show me all of the photos that most definitely and very clearly have my dog in them.

And it's not just dogs. Specific people, locations (before I turned off geotagging on my photos), scenery (mountains, outdoors), etc.

Sometimes this search is nice, but it's not good enough that I can really rely on it.


We need to make this stuff local again; that will be the real competitor to big-corp Foo... no servers, no end-to-end (just one end: the users), no service cost, no ads, no privacy issues, no random revocation of accounts without recourse. We can have face detection etc. locally if people want it... the compute cycles are there; it's going to happen eventually.


>lets me search my library using text search for anything

This is untrue, and actually one of the reasons I hope a strong competitor to Google Photos comes along soon. The search function is, for whatever reason, heavily censored and perhaps even biased in some circumstances. Worse, it is completely useless. For example, the query "fat" returns nothing, despite the fact that my gallery is filled with drawing reference photos that includes plus-sized people. "Black people" returns photos of non-black people, and (infamously, and perhaps for related reasons re: the shortcomings of Google's image recognition and tagging algorithm) "gorilla" returns 0 results. "Red shirt" returns an image of a blue decorative screen; "comic" returns anime and webpage screenshots; "woman" returns multiple photos consisting entirely of groups of men.

The situation is dire.


Think of it from Google's POV. Imagine if the tabloids found out about a situation of someone searching for 'fat' in the search bar and then it coming back with pictures of themselves or their friends - that could cause some serious controversy.


Well, this gets to the heart of one of the issues with the current approach to AI. Statistical consensus doesn't always align with a user's personal view or desires. I don't know how you solve the problem; my issue is that Google doesn't seem to know, either, but they insist that they do.


To think that someone can just throw their photos in S3 assumes people are ops, devops, or devs. That's a small slice of the population. What about everyone else?


I also mentioned Dropbox. I haven't used it for a while, though.


Besides search, another feature of Google Photos that I would need is the automatic inclusion of photos in shared albums based upon who is in them. Some examples:

I have an album shared with my parents which photos of my daughter are automatically added to.

I have an album shared with my daughter which photos of our dog are automatically added to.

I also like the collages, slideshows, movies and "this day x years ago" photos which Google Photos automatically creates and notifies me of.


You're willing to pay the price of those algorithms and the Google ecosystem. Others are not.

I'm excited to review this project. Thanks to the creators.

This has come at a perfect moment... as, this weekend, I'm literally downloading my entire Google Photos archive (one year at a time) to my local hard drive and figuring out a way forward.

I'm done with Google after a 'straw that broke the camel's back' moment with their payment system.


For me the features that make Google photo, Google photo are:

* it's free and comes by default with an Android phone.

* it just works.

If you can make an effortless way to get online backups of my photos at a reasonable price while regaining privacy, then I'll switch in a heartbeat without a single thought about any of those ML-based moat features Google has crammed in their service.


I want none of those features.

I want automatic backup, easy sharing, and accessibility from all devices.


Personally I'd find the pure storage and basic categories suitable. I dislike almost all the algorithms. Especially "memories" and shit like that.

Simple and reliable backup and reasonably speedy browsing is what I need.


> If I wanted to just store my photos I'd throw them in a S3 bucket or Dropbox or something.

Neither of those give you any privacy unless you do the encryption yourself in which case you have to build something to access them unencrypted. Have you checked out what the service actually does?


Wouldn't a Mega encrypted folder make sense for the average person?


Let's say you store your photos in Dropbox but inside an encrypted folder. What would you have to do to view the photos? Unless the client you encrypt your files with has a photo viewer, you'd have to download the pictures and decrypt them to look at them. The whole thing becomes very inconvenient very quickly.


https://mega.io/ does the encryption client-side. It works like ProtonMail: you decrypt to view.


On top of this, good algorithms should be run if it is possible to do it in a privacy friendly way.


>So you're going to implement algorithms then?

Jeesh, that's easy.

You encrypt the algorithms too.


I don't want Google at all in my life, so I think this product seems very attractive. But of course it depends on the user, what they value.


Sidenote: are you aware that "Ente" is German for "duck"? :)


If I recall correctly "ente" has a pleasant meaning in Portuguese. Google Translate says it means "loved" but I feel like my paperback dictionary said something else...

Edit: I think it's similar to "being"


Since OP seems to be from Kerala, it might be Malayalam. "Ente" in Malayalam (the language of Kerala) means "mine".


This!

Also, I've a thing for rubber ducks.

Also, the domain was available. :)


Hey, fellow Keralite here. Good domain, and good luck!


Yes, hence the icon for "simple" @ ente.io :)


What are currently the best open-source projects that allow you to fully automate and manage the deployment of your own personal (or multi-user) cloud photo/drive storage service? I found:

    [1]: https://github.com/nextcloud
    [2]: https://github.com/Piwigo with S3 extension: https://piwigo.org/ext/extension_view.php?eid=691
    [3]: others https://arstechnica.com/gadgets/2021/06/the-big-alternatives-to-google-photos-showdown/ which mentions the most feature packed to be https://photoprism.app/


I migrated my photo collection to https://github.com/jpsim/AWSPics about a year ago, pretty happy with it (so much so that I ended up contributing a number of features and bug fixes back to it). Basically all you have to do, after the initial setup, is an S3 sync to upload new photos, and a gallery web site and resized thumbnails get generated automatically.

All private, you configure usernames and passwords. The ongoing cost is just that of S3 standard / infrequent-access storage, which for my collection of ~50GB is currently costing me about ~$1/month. In terms of the auto-generated gallery (lambda function that traverses an S3 bucket) and the password-protection (CloudFront Origin Access Identity), you're locked in to AWS. But in terms of the data, you by definition have all the files in a simple folder tree on your local disk too, you can back it up wherever else you want, you can migrate it elsewhere quite easily. And AWSPics itself is open-source.
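
The sync step itself is the standard aws s3 sync against the source bucket; programmatically it boils down to roughly this (bucket name and paths are placeholders, not AWSPics specifics):

    // Rough equivalent of the sync step using the AWS SDK v3; in practice
    // "aws s3 sync" from the CLI does this for you. Names are placeholders.
    import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
    import { readFileSync } from "fs";

    const s3 = new S3Client({ region: "us-east-1" });

    async function uploadPhoto(localPath: string, key: string): Promise<void> {
      await s3.send(new PutObjectCommand({
        Bucket: "my-awspics-source-bucket", // placeholder bucket name
        Key: key,
        Body: readFileSync(localPath),
      }));
    }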


Add https://lomorage.com: self-hosted, cross-platform, mobile-friendly, supports multiple accounts, and allows login from multiple devices.


Or just Syncthing, if you don't need a specialized photo web interface. They apparently added client-side encryption support recently, so you can put it on some random vserver as well.

