Interesting that they upload to S3 first, only to pull it off soon after and process it.
I imagine this reduces complexity and lets them increase their upload throughput, but I wonder if 500px toyed with the idea of first uploading to local (temp) storage, having the workers process it, and only then persisting it. But I guess the cons are a) at some point it needs to go to S3 in its original form anyway, and b) it increases the risk of data loss.
I'm the CTO of 500px. Decoupling uploaders and converters simplifies the overall design and makes it more robust. Converters are a part of the legacy design that we are planning to retire. In the past, we would generate multiple sizes and crops for every image so it could be efficiently displayed on the site and/or mobile apps.
Today, we rely on the new Resizer service that can resize and crop images on the fly. It's an interesting piece of technology that we will be writing about on our blog: developers.500px.com
Thanks for sharing this and being on HN to answer - very interesting architecture! How do you deal with customers from remote locations (Asia, Oceania) complaining about slow upload speeds?
I will keep an eye on your blog. Thanks for sharing the link. Always looking to learn about memory-heavy operations. I'm currently the lead dev on software that revolves around user-generated content (images and audio). It's a never-ending road of (fun) challenges.
Performance is the main reason. Additionally, we needed a solution to modify images on the fly (watermarking and attribution). That's why we decided to build our own service using Go.
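For illustration, the general shape of an on-the-fly resize service looks something like this Go sketch. To be clear, this is not 500px's actual Resizer: the endpoint, query parameters, and fetchOriginal origin are all hypothetical, and it uses only the standard library plus golang.org/x/image/draw.

    package main

    import (
    	"image"
    	"image/jpeg" // importing also registers the JPEG decoder
    	_ "image/png"
    	"log"
    	"net/http"
    	"strconv"

    	"golang.org/x/image/draw"
    )

    // fetchOriginal is a hypothetical stand-in for pulling the source
    // image out of S3 (or any origin store).
    func fetchOriginal(id string) (image.Image, error) {
    	resp, err := http.Get("https://originals.example.com/" + id)
    	if err != nil {
    		return nil, err
    	}
    	defer resp.Body.Close()
    	img, _, err := image.Decode(resp.Body)
    	return img, err
    }

    func resizeHandler(w http.ResponseWriter, r *http.Request) {
    	width, err := strconv.Atoi(r.URL.Query().Get("w"))
    	if err != nil || width <= 0 || width > 4096 {
    		http.Error(w, "bad width", http.StatusBadRequest)
    		return
    	}
    	src, err := fetchOriginal(r.URL.Query().Get("id"))
    	if err != nil {
    		http.Error(w, "not found", http.StatusNotFound)
    		return
    	}
    	// Preserve the aspect ratio; CatmullRom is slower but high quality.
    	b := src.Bounds()
    	height := b.Dy() * width / b.Dx()
    	dst := image.NewRGBA(image.Rect(0, 0, width, height))
    	draw.CatmullRom.Scale(dst, dst.Bounds(), src, b, draw.Over, nil)

    	// A long TTL means each size is computed roughly once per CDN edge.
    	w.Header().Set("Content-Type", "image/jpeg")
    	w.Header().Set("Cache-Control", "public, max-age=31536000")
    	jpeg.Encode(w, dst, &jpeg.Options{Quality: 85})
    }

    func main() {
    	http.HandleFunc("/resize", resizeHandler)
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }

With a CDN in front of /resize and a long max-age, each (photo, width) pair is only ever computed once per edge, which is the trade-off discussed below.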
It could be that there are a lot of images/versions that simply never get seen, or seen rarely enough that it's inefficient to store them. And they're being cached by the CDN anyway, so popular images will still only be generated once per edge.
Bingo! Extra storage for all the resized images that never get seen, lack of flexibility to introduce new sizes, plus the requirements for watermarking and attribution on images are the main reasons why we are moving away from pre-conversion to dynamic image resizing.
Generating many conversions for all photos incurs both storage and computational overhead that may never pay off. By only generating the required images on the fly, we avoid this overhead while allowing more flexibility. If watermarks or attribution change in the future, it's easier to have the resize service handle the change as images expire from the CDN than to run a batch job re-processing every existing photo against the updated conversion requirements.
Also note that photos are stored indefinitely, so any additional permanent storage (new conversions) would add additional storage costs, minuscule as they might be, forever.
To me this is a very common practice. At least in my experience, a lot of AWS processing is done with S3 throughout the pipeline, whether you run EMR or Simple Workflow. You can also create temporary credentials per user in your app, which lets the user upload directly to "storage" (in this case S3) so you can process it from there.
And with Lambda triggered by S3 events, you can run processing automatically as well. However, if you want your uploaders to get feedback on their uploads immediately and/or synchronously, you need to upload to app servers.
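For what it's worth, handing the client a presigned PUT URL is pretty compact with the AWS SDK. Here's a rough Go sketch using aws-sdk-go (v1); the bucket, key, and region are made up for illustration, and the S3-event-to-Lambda wiring is configured on the bucket, not in this code.

    package main

    import (
    	"fmt"
    	"log"
    	"time"

    	"github.com/aws/aws-sdk-go/aws"
    	"github.com/aws/aws-sdk-go/aws/session"
    	"github.com/aws/aws-sdk-go/service/s3"
    )

    func main() {
    	sess := session.Must(session.NewSession(&aws.Config{
    		Region: aws.String("us-east-1"), // hypothetical region
    	}))
    	svc := s3.New(sess)

    	// Build the PUT request but don't send it; Presign signs it with
    	// short-lived credentials so the client can upload straight to S3.
    	req, _ := svc.PutObjectRequest(&s3.PutObjectInput{
    		Bucket: aws.String("example-uploads"),    // hypothetical bucket
    		Key:    aws.String("user-123/photo.jpg"), // hypothetical key
    	})
    	url, err := req.Presign(15 * time.Minute)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(url) // hand this to the uploader; S3 scales the ingest
    }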
I'd like to be able to have users upload directly to S3 (obviously less infrastructure and code to maintain), but without being able to provide immediate feedback on the upload, I've found it preferable to have our own application servers in the upload path. This lets us immediately detect unsupported formats and act on the completion of the upload (whether success or failure) without delay.
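That immediate feedback can be as simple as sniffing the image header before anything is persisted. A Go sketch using the standard library's image.DecodeConfig (the handler path and size limits are hypothetical, and the actual copy to S3 is elided):

    package main

    import (
    	"bytes"
    	"image"
    	_ "image/gif" // register decoders so DecodeConfig can sniff them
    	_ "image/jpeg"
    	_ "image/png"
    	"io"
    	"log"
    	"net/http"
    )

    func uploadHandler(w http.ResponseWriter, r *http.Request) {
    	// DecodeConfig reads only the image header, and TeeReader keeps a
    	// copy of those bytes so the full stream can be replayed later.
    	var head bytes.Buffer
    	cfg, format, err := image.DecodeConfig(io.TeeReader(r.Body, &head))
    	if err != nil {
    		http.Error(w, "unsupported or corrupt image", http.StatusUnsupportedMediaType)
    		return
    	}
    	if cfg.Width > 30000 || cfg.Height > 30000 {
    		http.Error(w, "image too large", http.StatusRequestEntityTooLarge)
    		return
    	}
    	_ = format // e.g. route JPEG vs PNG to different pipelines
    	// Replay the sniffed header plus the rest of the body when
    	// persisting; the actual upload to S3 is elided in this sketch.
    	_ = io.MultiReader(&head, r.Body)
    	w.WriteHeader(http.StatusCreated)
    }

    func main() {
    	http.HandleFunc("/upload", uploadHandler)
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }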
It also means that you need to scale your upload servers (IO & long-lived requests), which is more difficult than trusting S3 to scale for you. However, leaving your S3 bucket open to uploads has its downsides. We at Cloudinary upload to our own (autoscaled) instances and only then persist.
My initial thought was: wow, this looks like a great validation of the idea behind Joyent Manta[1] storage+process. I wonder if it would've been a good fit for 500px, and how it would compare in terms of price/performance? Granted, they now have a working system on S3, so maybe for the next up-and-coming competitor? ;-)
Great write-up, btw. And thanks for the heads-up about vips/nip2 -- I wasn't aware of those.
Thumbs up for VIPS; I used it 4-5 years ago to build a system that processed high-resolution images. It's fast, lean, and an amazing piece of software.
Totally agree, I've used it for some projects as well. The only thing I wasn't able to grok (which still kills me to this day) was that I couldn't get it to handle EPS files correctly. If anybody has done that with VIPS/IM, I would love to hear about it.
I'm replying here to let Melraidin (who posted a sibling comment) know that their account is dead, for reasons I can't understand since their comments seem helpful.
VIPS has been great to work with. The docs are generally very good and the performance has been great. The author (https://github.com/jcupitt/) has always been responsive to issues and questions and seems to be constantly working on the project.
My only, relatively minor, quibble is dev work happening on master, but when it's largely a one-man project, who am I to judge? So far we haven't come across any problems that weren't either self-inflicted or already fixed in master.
Are they somehow intercepting middle click on their site so that links do not open in a new tab? Control-click works fine for me, but middle click would just replace the article, which was inconvenient. (latest Chrome on Ubuntu)
I'm getting the same thing and it's incredibly frustrating on an otherwise interesting article. I had to Right Click > "Open In New Tab" for links I wanted to read later.
I was getting excited about using 500px for the specific reason that it seems to load much faster than Flickr. Then I loaded it up in a public space to get some inspiration by browsing other users' work - interspersed were images of women in various states of undress. I quickly exited the site and never went back, for fear of someone glancing over my shoulder and assuming I was doing something other than getting artistic inspiration.
Not arguing that the images aren't art but I personally don't want them popping up on my screen when I'm just trying to get inspiration for landscape photography.