Common problems with large file uploads (filepicker.io)
72 points by ananddass on July 18, 2012 | 34 comments



Given the title I was expecting the article to provide a solution.

From personal experience, the bigger the file, the more likely you are to experience a connection cut in the middle of the upload. That is why the most important thing is to support resumable uploads.

At the moment there is no clear consensus on how to handle that. Amazon S3 has one protocol[1], Google uses two revisions of a different protocol, one on YouTube[2], another on Google Cloud Storage[3]. Both work by first creating a session that you refer to when uploading the chunks. There is also the Nginx upload module[4], which delegates the session ID to the client for some reason.

And there is no browser client available to my knowledge.
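The gist of all of them is the same, though: create an upload session, then send the file in chunks you can retry. A rough browser-side sketch of the pattern (the /uploads endpoints, JSON shapes, and chunk size here are made up for illustration, not any of the real APIs):

    // Sketch only: '/uploads' is a hypothetical endpoint, not S3's or Google's protocol.
    var CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB per chunk

    function uploadResumable(file, done) {
      // 1. Create a session and get an ID back.
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/uploads', true);
      xhr.onload = function () {
        var session = JSON.parse(xhr.responseText); // e.g. { "id": "...", "offset": 0 }
        sendChunk(file, session.id, session.offset, done);
      };
      xhr.send(JSON.stringify({ name: file.name, size: file.size }));
    }

    function sendChunk(file, id, offset, done) {
      if (offset >= file.size) return done();
      var chunk = file.slice(offset, offset + CHUNK_SIZE);
      var xhr = new XMLHttpRequest();
      xhr.open('PUT', '/uploads/' + id, true);
      xhr.setRequestHeader('Content-Range',
        'bytes ' + offset + '-' + (offset + chunk.size - 1) + '/' + file.size);
      // 2. On success, continue from the next offset.
      xhr.onload = function () { sendChunk(file, id, offset + chunk.size, done); };
      // 3. On a dropped connection you would re-query the server for how much it
      //    received and resume from there; this sketch just retries the same chunk.
      xhr.onerror = function () { sendChunk(file, id, offset, done); };
      xhr.send(chunk);
    }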

That's all I know, folks.

[1]: http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUplo...

[2]: https://developers.google.com/youtube/2.0/developers_guide_p...

[3]: https://developers.google.com/storage/docs/developer-guide

[4]: http://www.grid.net.ru/nginx/resumable_uploads.en.html


I miss the ZMODEM protocol. Resumable file transfers over 56kbps was the bomb. Made me feel whole again (pun partially intended) back in '89.


Goddamn, I had forgotten all about that, haha. Rush of nostalgia from the BBS scene.

XMODEM was painful. ZMODEM was leet.


For the HTTP/2.0 discussion that was here earlier:

A way to continue an interrupted file upload.

Because POST variables are sent in order, if you put the file first and the other variables after, the server never sees them if the upload was interrupted. So when I code a form I always put the hidden ones first, so at least I can give a useful error message (since I know what the user was trying to do).

It would be better to decouple them and upload the files and the rest of the variables separately.
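Short of that, with the FormData API you can at least control the order explicitly. As far as I know the multipart body follows append order, so the small fields go first and arrive even if the file doesn't. Field names here are just examples:

    var fileInput = document.querySelector('input[type=file]');

    var fd = new FormData();
    // Small fields first, so the server sees them even if the
    // connection dies partway through the file.
    fd.append('action', 'upload');   // example field names
    fd.append('album_id', '42');
    fd.append('file', fileInput.files[0]);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload', true);
    xhr.send(fd);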


I'd really like to be able to use Dropbox as a magic upload handler for any file on my local HD, not just those in my Dropbox folder. They handle the logic of getting all my files into the cloud. Why can't I point a website at my Dropbox and say: here, this is handling the file upload?


You're basically describing https://www.filepicker.io, the company that wrote this blog post.


Dropbox has an API that will (theoretically) let you do this, but there haven't been a ton of people jumping up to implement it yet. It'll be cool when it shows up.


I've used the Dropbox API before to automatically upload photos in a Dropbox folder to Flickr. It runs on an interval (a cron job every 2 minutes). I'm sure you could do the same thing using FTP or a custom API on your destination server.


They released a delta API recently. It's a bit of a pain to perform this task as you can't differentiate between a new file and a renamed file or a file that's been moved from one location to another.

In all cases you get a delete (if the file isn't new) and then a new file event.
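For anyone who hasn't looked at it: /delta hands you a cursor plus a list of (path, metadata) entries, where a null metadata means the path is gone. That's all the information there is, so a rename genuinely comes through as a delete of the old path plus an add of the new one. Roughly how the consuming loop looks (field names from memory of the v1 docs, so double-check them):

    // Sketch of consuming one /delta page; shapes are from memory of the v1 docs.
    function applyDelta(delta, localState) {
      delta.entries.forEach(function (entry) {
        var path = entry[0];
        var metadata = entry[1];
        if (metadata === null) {
          delete localState[path];      // a deletion, or the old half of a rename
        } else {
          localState[path] = metadata;  // a new file, or the new half of a rename
        }
        // Nothing ties the two halves of a rename together.
      });
      return delta.cursor; // save this and pass it back on the next call
    }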

I implemented it, and it kinda sucks for this sort of thing. The purpose seems to be maintaining local state that mirrors the state on Dropbox. Not terribly interested in that... I just want to subscribe to specific events (webhooks, anyone?).

This was their solution to the frequently requested webhooks. It falls short. Way short.


ifttt.com has a dropbox-to-flickr recipe (http://ifttt.com/recipes/6804). Their dropbox channel provides 2 triggers, one for any new file in your public folder, and one specifically for new photos in your public folder (it doesn't say exactly what the definition of "photo" is, though).

I haven't tried it, so I don't know how gracefully it handles renames or moves.


Dropbox only supports files up to 150MB via their API. I inquired about bumping the limit on a per-app basis; no response.


Doesn't matter too much if you're just using Dropbox as a storage backend for a more complex service. If your box is full of encrypted 2MB segments backing a pseudo-filesystem you can upload files as big as you like and not have to worry about broken connections.
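The chunking half of that is simple enough. Here's a rough Node sketch (the key handling and the actual upload call are placeholders you'd have to fill in):

    var fs = require('fs');
    var crypto = require('crypto');

    var SEGMENT_SIZE = 2 * 1024 * 1024;  // 2 MB segments
    var key = crypto.randomBytes(32);    // placeholder: real key management needed

    // uploadSegment(name, data) is whatever pushes one blob into Dropbox (or any
    // other backend). A broken connection only costs you one segment, not the file.
    function uploadSegments(path, uploadSegment) {
      var index = 0;
      fs.createReadStream(path, { highWaterMark: SEGMENT_SIZE })
        .on('data', function (segment) {
          var iv = crypto.randomBytes(16);
          var cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
          var encrypted = Buffer.concat([iv, cipher.update(segment), cipher.final()]);
          uploadSegment(path + '.' + (index++), encrypted);
        });
    }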

The best part is that you can use other services with similar APIs for pseudo-RAID or just a union-mount. Sharing is more difficult than vanilla Dropbox, of course, but I'm working on it...


8gb+ files? I found a way, but you have to use a Java FTP applet. I tested these two: http://jupload.sourceforge.net/ and http://www.jfileupload.com/

Dragged and dropped an 8gb+ file and left it on for 5 hours. Worked perfectly. No time outs, no errors, and I'm on a shared hosting account at 1and1.

My problem with them is that it wasn't possible to hide the FTP username and password; they were always in the javascript files. I whined, I complained, I bitched, and there was nothing they could do about it. :( So you basically had to password protect the whole directory with .htaccess and be very careful with whom you shared the credentials.

If you don't want people to download and install software, just stick with Java FTP applets.


What exactly did you expect them to do about it? For a client-side tool to establish a plain FTP connection, it needs to possess authentication credentials.


You could always just hard-code the username/password into the applet and recompile. That shouldn't be too hard...

Or, if you control the FTP server, you could dynamically add and remove random virtual users/passwords to the FTP server (hopefully virtual users). Then when the client javascript gets the username/password, it could only be used once.
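Something like this on the server side, roughly. addFtpUser/removeFtpUser are stand-ins for however your FTP server actually manages virtual users (vsftpd user db, ProFTPD with a SQL backend, etc.):

    var http = require('http');
    var crypto = require('crypto');

    // Placeholders: wire these up to your FTP server's virtual user store.
    function addFtpUser(user, pass) { /* e.g. insert a row the FTP server reads */ }
    function removeFtpUser(user)    { /* e.g. delete that row again */ }

    http.createServer(function (req, res) {
      if (req.url === '/ftp-credentials') {
        var user = 'u_' + crypto.randomBytes(6).toString('hex');
        var pass = crypto.randomBytes(12).toString('hex');
        addFtpUser(user, pass);
        setTimeout(function () { removeFtpUser(user); }, 15 * 60 * 1000); // 15 min TTL
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ user: user, pass: pass }));
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);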


One could scan the .class file for string literals with relative ease. Obfuscation would be an improvement, but still not completely secure.


It would hardly be an improvement. Wireshark would be a first step for most reverse engineers when there's network authentication involved.


Well, we are talking about FTP, so the string is going to travel over the line in plaintext anyway.


It's been a long time since I've been on shared hosting, but I thought they usually offered some kind of anonymous upload-only FTP directory. Couldn't your users upload to that and then your application can read from that directory?


I've been dealing with browser-based large file uploads, which means dealing with lots of browser-specific issues.

Fortunately, things are getting better, especially for the webkit-based browsers. Firefox still has some issues, and I check https://bugzilla.mozilla.org/show_bug.cgi?id=678648 pretty regularly. Just today this bug, which was filed in 2003, changed from Status = NEW to Status = ASSIGNED.

Today is a good day.


To clarify, bug 678648 was logged in 2011 and is marked as a duplicate of bug 215450 (the one from 2003): "uploading files that are larger the 2GB fails" @ https://bugzilla.mozilla.org/show_bug.cgi?id=215450


First, I'm impressed that someone was uploading 2gb files back in 2003...

Agreed. Good to see that firefox is going to be able to do more than 2gb soon.


That title is a bit misleading.. On some platforms you can already use Firefox to do >2GB uploads, but there is still a 4GB limit..

If anyone wants to help beta-test a HTML5 uploader that calls archive.org's S3-like endpoint under the hood (no IE or Opera support yet, though Opera 12 is now working..): http://archive.org/upload/


I've experienced this issue before when establishing a publisher backend for a D2D pc game business. It seems to be basically impossible without a Java applet of some kind, and even then it's wonky at best and just 'fails' at worst. The real fix for the issue seemed to be simply providing an FTP connection and letting people connect through the native client of their choosing.

That really seems to be the key for this problem: develop a simple native app capable of FTP uploads that makes it easy for users to deliver files to your app within the context of their use. Most browsers can open native applications via a custom protocol, so you could enrich the process by having the native app be a part of (or try to blend seamlessly with) the major browsers.


As plenty of file transfer protocols, clients, and servers support resumable transfers (FTP, SFTP, rsync, proprietary browser-based tools, etc., or even basic HTTP if you arrange for the file to be pulled rather than pushed and your "client's server" has byte-range support), perhaps this should be titled "why you shouldn't use a single HTTP POST request from a browser to upload a large file". The general reason seems to be "because this is not a use case this feature is commonly designed for and tested against."
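For the pull variant: if the client hands your server a URL instead of the bytes, resuming is just a Range request against whatever is already on disk. A rough Node sketch (no validation or error handling, and it assumes the remote end honours Range):

    var fs = require('fs');
    var https = require('https');

    function pull(url, dest) {
      var offset = fs.existsSync(dest) ? fs.statSync(dest).size : 0;
      https.get(url, { headers: { Range: 'bytes=' + offset + '-' } }, function (res) {
        if (res.statusCode === 206) {
          res.pipe(fs.createWriteStream(dest, { flags: 'a' }));  // append the rest
        } else if (res.statusCode === 200) {
          res.pipe(fs.createWriteStream(dest));                  // no Range support: start over
        }
      }).on('error', function () {
        setTimeout(function () { pull(url, dest); }, 5000);      // retry and resume
      });
    }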


I ran into this problem with https://truefriender.com/. The solution I used was nginx instead of Apache: nginx streams the file to disk and then I can handle it with PHP. I still have the 2GB problem, but I've tested Perl and it can go past it; now I just have to implement that.


Being on Heroku, I've been bitten many times by the 30 second timeout. No luxury of changing it, let alone moving to nginx.


It may not work for ginormous files, but I've used a Flash SWF object to upload to S3, released as part of a Rails gem. The latest version is here: https://github.com/nathancolgate/s3-swf-upload-plugin


Hi everyone. We developed a solution for just that! Please feel free to look at http://forgetbox.com and give us feedback.

Our users send 130GB files, directly from Gmail...


Excuse me if this is a stupid question, but why would timeout issues on large files affect something like Heroku more often than other types of hosting services?



I use node.js with this plugin: https://github.com/felixge/node-formidable/

Works like a charm!
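For anyone who hasn't tried it, the gist looks something like this (roughly the formidable API as I remember it from the README; newer versions may rename things). The nice part is that it streams the upload to a temp file instead of buffering it in memory:

    var http = require('http');
    var formidable = require('formidable');

    http.createServer(function (req, res) {
      if (req.url === '/upload' && req.method === 'POST') {
        var form = new formidable.IncomingForm();
        form.uploadDir = '/tmp';  // the file is streamed to disk here as it arrives
        form.parse(req, function (err, fields, files) {
          // files.upload.path is the temp file on disk; move or process it from there.
          res.end(err ? 'upload failed' : 'upload received');
        });
        return;
      }
      res.end('<form action="/upload" enctype="multipart/form-data" method="post">' +
              '<input type="file" name="upload"><input type="submit"></form>');
    }).listen(8080);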


I immediately thought of node with this post. Makes uploading and streaming a breeze!


Split them into RAR/ZIP volumes with checksums on the client side, then upload...
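The checksum half is at least easy in newer browsers with Web Crypto; hash each piece before sending so the server can verify it (assuming SHA-256 works for you):

    // Hash one slice of a file with SHA-256 (Web Crypto, so modern browsers only).
    function hashChunk(file, start, end) {
      return file.slice(start, end).arrayBuffer()
        .then(function (buf) { return crypto.subtle.digest('SHA-256', buf); })
        .then(function (digest) {
          return Array.from(new Uint8Array(digest))
            .map(function (b) { return b.toString(16).padStart(2, '0'); })
            .join('');
        });
    }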



