Common problems with large file uploads (filepicker.io)
72 points by ananddass on July 18, 2012 | 34 comments



Given the title I was expecting the article to provide a solution.

From personal experience, the bigger the file, the more likely you are to experience a connection cut in the middle of the upload. That is why the most important thing is to support resumable uploads.

At the moment there is no clear consensus on how to handle that. Amazon S3 has one protocol[1], Google uses two revisions of a different protocol, one on YouTube[2], another on Google Cloud Storage[3]. Both work by first creating a session that you refer to when uploading the chunks. There is also the Nginx upload module[4], which delegates the session ID to the client for some reason.

And there is no browser client available to my knowledge.
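The gist of all of them is the same, though: create an upload session, then send the file in chunks you can retry. A rough browser-side sketch of the pattern (the /uploads endpoints, JSON shapes, and chunk size here are made up for illustration, not any of the real APIs):

    // Sketch only: '/uploads' is a hypothetical endpoint, not S3's or Google's protocol.
    var CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB per chunk

    function uploadResumable(file, done) {
      // 1. Create a session and get an ID back.
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/uploads', true);
      xhr.onload = function () {
        var session = JSON.parse(xhr.responseText); // e.g. { "id": "...", "offset": 0 }
        sendChunk(file, session.id, session.offset, done);
      };
      xhr.send(JSON.stringify({ name: file.name, size: file.size }));
    }

    function sendChunk(file, id, offset, done) {
      if (offset >= file.size) return done();
      var chunk = file.slice(offset, offset + CHUNK_SIZE);
      var xhr = new XMLHttpRequest();
      xhr.open('PUT', '/uploads/' + id, true);
      xhr.setRequestHeader('Content-Range',
        'bytes ' + offset + '-' + (offset + chunk.size - 1) + '/' + file.size);
      // 2. On success, continue from the next offset.
      xhr.onload = function () { sendChunk(file, id, offset + chunk.size, done); };
      // 3. On a dropped connection you would re-query the server for how much it
      //    received and resume from there; this sketch just retries the same chunk.
      xhr.onerror = function () { sendChunk(file, id, offset, done); };
      xhr.send(chunk);
    }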

That's all I know, folks.

[1]: http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUplo...

[2]: https://developers.google.com/youtube/2.0/developers_guide_p...

[3]: https://developers.google.com/storage/docs/developer-guide

[4]: http://www.grid.net.ru/nginx/resumable_uploads.en.html


I miss the ZMODEM protocol. Resumable file transfers over 56kbps was the bomb. Made me feel whole again (pun partially intended) back in '89.


Goddamn, I had forgotten all about that, haha. Rush of nostalgia from the BBS scene.

XMODEM was painful. ZMODEM was leet.


For the HTTP/2.0 discussion that was here earlier:

A way to continue an interrupted file upload.

Because POST variables are sent in order, if you put the file first and the other variables after, the server never sees them if the upload was interrupted. So when I code a form I always put the hidden ones first, so at least I can give a useful error message (since I know what the user was trying to do).

It would be better to decouple them and upload the files and the rest of the variables separately.
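Short of that, with the FormData API you can at least control the order explicitly. As far as I know the multipart body follows append order, so the small fields go first and arrive even if the file doesn't. Field names here are just examples:

    var fileInput = document.querySelector('input[type=file]');

    var fd = new FormData();
    // Small fields first, so the server sees them even if the
    // connection dies partway through the file.
    fd.append('action', 'upload');   // example field names
    fd.append('album_id', '42');
    fd.append('file', fileInput.files[0]);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload', true);
    xhr.send(fd);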


I'd really like to be able to use Dropbox as a magic upload handler for any file on my local HD, not just those in my Dropbox folder. They handle the logic of getting all my files into the cloud. Why can't I point a website at my Dropbox and say: here, this is handling the file upload?


You're basically describing https://www.filepicker.io, the company that wrote this blog post.


Dropbox has an API that will (theoretically) let you do this, but there haven't been a ton of people jumping up to implement it yet. It'll be cool when it shows up.


I've used the Dropbox API before to automatically upload photos in a Dropbox folder to Flickr. It runs on an interval (a cron job every 2 minutes). I'm sure you could do the same thing using FTP or a custom API on your destination server.


They released a delta API recently. It's a bit of a pain to perform this task as you can't differentiate between a new file and a renamed file or a file that's been moved from one location to another.

In all cases you get a delete (if the file isn't new) and then a new file event.
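For anyone who hasn't looked at it: /delta hands you a cursor plus a list of (path, metadata) entries, where a null metadata means the path is gone. That's all the information there is, so a rename genuinely comes through as a delete of the old path plus an add of the new one. Roughly how the consuming loop looks (field names from memory of the v1 docs, so double-check them):

    // Sketch of consuming one /delta page; shapes are from memory of the v1 docs.
    function applyDelta(delta, localState) {
      delta.entries.forEach(function (entry) {
        var path = entry[0];
        var metadata = entry[1];
        if (metadata === null) {
          delete localState[path];      // a deletion, or the old half of a rename
        } else {
          localState[path] = metadata;  // a new file, or the new half of a rename
        }
        // Nothing ties the two halves of a rename together.
      });
      return delta.cursor; // save this and pass it back on the next call
    }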

I implemented it, and it kinda sucks for this sort of thing. The purpose seems to be maintaining local state that mirrors the state on Dropbox. Not terribly interested in that... I just want to subscribe to specific events (webhooks, anyone?).

This was their solution to the frequently requested webhooks. It falls short. Way short.


ifttt.com has a dropbox-to-flickr recipe (http://ifttt.com/recipes/6804). Their dropbox channel provides 2 triggers, one for any new file in your public folder, and one specifically for new photos in your public folder (it doesn't say exactly what the definition of "photo" is, though).

I haven't tried it, so I don't know how gracefully it handles renames or moves.


Dropbox only supports files up to 150MB via their API. I inquired about bumping the limit on a per-app basis; no response.


Doesn't matter too much if you're just using Dropbox as a storage backend for a more complex service. If your box is full of encrypted 2MB segments backing a pseudo-filesystem you can upload files as big as you like and not have to worry about broken connections.
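The chunking half of that is simple enough. Here's a rough Node sketch (the key handling and the actual upload call are placeholders you'd have to fill in):

    var fs = require('fs');
    var crypto = require('crypto');

    var SEGMENT_SIZE = 2 * 1024 * 1024;  // 2 MB segments
    var key = crypto.randomBytes(32);    // placeholder: real key management needed

    // uploadSegment(name, data) is whatever pushes one blob into Dropbox (or any
    // other backend). A broken connection only costs you one segment, not the file.
    function uploadSegments(path, uploadSegment) {
      var index = 0;
      fs.createReadStream(path, { highWaterMark: SEGMENT_SIZE })
        .on('data', function (segment) {
          var iv = crypto.randomBytes(16);
          var cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
          var encrypted = Buffer.concat([iv, cipher.update(segment), cipher.final()]);
          uploadSegment(path + '.' + (index++), encrypted);
        });
    }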

The best part is that you can use other services with similar APIs for pseudo-RAID or just a union-mount. Sharing is more difficult than vanilla Dropbox, of course, but I'm working on it...


8gb+ files? I found a way, but you have to use a Java FTP applet. I tested these two: http://jupload.sourceforge.net/ and http://www.jfileupload.com/

Dragged and dropped an 8gb+ file and left it on for 5 hours. Worked perfectly. No time outs, no errors, and I'm on a shared hosting account at 1and1.

My problem with them is that it wasn't possible to hide the FTP username and password; they were always in the javascript files. I whined, I complained, I bitched, and there was nothing they could do about it. :( So you basically had to password protect the whole directory with .htaccess and be very careful with whom you shared the credentials.

If you don't want people to download and install software, just stick with Java FTP applets.


What exactly did you expect them to do about it? For a client-side tool to establish a plain FTP connection, it needs to possess authentication credentials.


You could always just hard-code the username/password into the applet and recompile. That shouldn't be too hard...

Or, if you control the FTP server, you could dynamically add and remove random virtual users/passwords to the FTP server (hopefully virtual users). Then when the client javascript gets the username/password, it could only be used once.
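Something like this on the server side, roughly. addFtpUser/removeFtpUser are stand-ins for however your FTP server actually manages virtual users (vsftpd user db, ProFTPD with a SQL backend, etc.):

    var http = require('http');
    var crypto = require('crypto');

    // Placeholders: wire these up to your FTP server's virtual user store.
    function addFtpUser(user, pass) { /* e.g. insert a row the FTP server reads */ }
    function removeFtpUser(user)    { /* e.g. delete that row again */ }

    http.createServer(function (req, res) {
      if (req.url === '/ftp-credentials') {
        var user = 'u_' + crypto.randomBytes(6).toString('hex');
        var pass = crypto.randomBytes(12).toString('hex');
        addFtpUser(user, pass);
        setTimeout(function () { removeFtpUser(user); }, 15 * 60 * 1000); // 15 min TTL
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ user: user, pass: pass }));
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);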


One could scan the .class file for string literals with relative ease. Obfuscation would be an improvement, but still not completely secure.


It would hardly be an improvement. Wireshark would be a first step for most reverse engineers when there's network authentication involved.


Well, we are talking about FTP, so the string is going to travel over the line in plaintext anyway.


It's been a long time since I've been on shared hosting, but I thought they usually offered some kind of anonymous upload-only FTP directory. Couldn't your users upload to that and then your application can read from that directory?


I've been dealing with browser-based large file uploads, which means dealing with lots of browser-specific issues.

Fortunately, things are getting better, especially for the webkit-based browsers. Firefox still has some issues, and I check https://bugzilla.mozilla.org/show_bug.cgi?id=678648 pretty regularly. Just today this bug, which was filed in 2003, changed from Status = NEW to Status = ASSIGNED.

Today is a good day.


To clarify, bug 678648 was logged in 2011 and is marked as a duplicate of bug 215450 (the one from 2003): "uploading files that are larger the 2GB fails" @ https://bugzilla.mozilla.org/show_bug.cgi?id=215450


First, I'm impressed that someone was uploading 2gb files back in 2003...

Agreed. Good to see that firefox is going to be able to do more than 2gb soon.


That title is a bit misleading.. On some platforms you can already use Firefox to do >2GB uploads, but there is still a 4GB limit..

If anyone wants to help beta-test a HTML5 uploader that calls archive.org's S3-like endpoint under the hood (no IE or Opera support yet, though Opera 12 is now working..): http://archive.org/upload/


I've experienced this issue before when establishing a publisher backend for a D2D pc game business. It seems to be basically impossible without a Java applet of some kind, and even then it's wonky at best and just 'fails' at worst. The real fix for the issue seemed to be simply providing an FTP connection and letting people connect through the native client of their choosing.

That really seems to be the key for this problem: develop a simple native app capable of FTP uploads that makes it easy for users to deliver files to your app within the context of their use. Most browsers can open native applications via a custom protocol, so you could enrich the process by having the native app be a part of (or try to blend seamlessly with) the major browsers.


As plenty of file transfer protocols, clients, and servers support resumable transfers (FTP, SFTP, rsync, proprietary browser-based tools, etc., or even basic HTTP if you arrange for the file to be pulled rather than pushed and your "client's server" has byte-range support), perhaps this should be titled "why you shouldn't use a single HTTP POST request from a browser to upload a large file". The general reason seems to be "because this is not a use case this feature is commonly designed for and tested against."
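For the pull variant: if the client hands your server a URL instead of the bytes, resuming is just a Range request against whatever is already on disk. A rough Node sketch (no validation or error handling, and it assumes the remote end honours Range):

    var fs = require('fs');
    var https = require('https');

    function pull(url, dest) {
      var offset = fs.existsSync(dest) ? fs.statSync(dest).size : 0;
      https.get(url, { headers: { Range: 'bytes=' + offset + '-' } }, function (res) {
        if (res.statusCode === 206) {
          res.pipe(fs.createWriteStream(dest, { flags: 'a' }));  // append the rest
        } else if (res.statusCode === 200) {
          res.pipe(fs.createWriteStream(dest));                  // no Range support: start over
        }
      }).on('error', function () {
        setTimeout(function () { pull(url, dest); }, 5000);      // retry and resume
      });
    }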


I ran into this problem with https://truefriender.com/. The solution I used was nginx instead of Apache: nginx streams the file to disk and then I can handle it with PHP. I still have the 2GB problem, but I've tested Perl and it can go past it; now I just have to implement that.


Being on Heroku, I've been bitten many times by the 30 second timeout. No luxury of changing it, let alone moving to nginx.


It may not work for ginormous files, but I've used a Flash SWF object to upload to S3, released as part of a Rails gem. The latest version is here: https://github.com/nathancolgate/s3-swf-upload-plugin


Hi everyone. We developed a solution for just that! Please feel free to look at http://forgetbox.com and give us feedback.

Our users send 130GB files, directly from Gmail...


Excuse me if this is a stupid question, but why would timeout issues on large files affect something like Heroku more often than other types of hosting services?



I use node.js with this plugin: https://github.com/felixge/node-formidable/

Works like a charm!
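For anyone who hasn't tried it, the gist looks something like this (roughly the formidable API as I remember it from the README; newer versions may rename things). The nice part is that it streams the upload to a temp file instead of buffering it in memory:

    var http = require('http');
    var formidable = require('formidable');

    http.createServer(function (req, res) {
      if (req.url === '/upload' && req.method === 'POST') {
        var form = new formidable.IncomingForm();
        form.uploadDir = '/tmp';  // the file is streamed to disk here as it arrives
        form.parse(req, function (err, fields, files) {
          // files.upload.path is the temp file on disk; move or process it from there.
          res.end(err ? 'upload failed' : 'upload received');
        });
        return;
      }
      res.end('<form action="/upload" enctype="multipart/form-data" method="post">' +
              '<input type="file" name="upload"><input type="submit"></form>');
    }).listen(8080);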


I immediately thought of node with this post. Makes uploading and streaming a breeze!


Split them into RAR/ZIP volumes with checksums on the client side, then upload...
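The checksum half is at least easy in newer browsers with Web Crypto; hash each piece before sending so the server can verify it (assuming SHA-256 works for you):

    // Hash one slice of a file with SHA-256 (Web Crypto, so modern browsers only).
    function hashChunk(file, start, end) {
      return file.slice(start, end).arrayBuffer()
        .then(function (buf) { return crypto.subtle.digest('SHA-256', buf); })
        .then(function (digest) {
          return Array.from(new Uint8Array(digest))
            .map(function (b) { return b.toString(16).padStart(2, '0'); })
            .join('');
        });
    }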



