Realtime encoding - over 150x faster (transloadit.com)
128 points by felixge on Dec 20, 2010 | 70 comments



Maybe, if we keep bringing it up on here, startups will finally start putting a couple of lines of boilerplate at the top of their blog pages so we will know what on earth it is that they do when we end up there from HN.

A guy can dream, can't he?

At least this time the logo at the top links back to the main product page. Oh, and I see that the title text of the logo has just such a description! You're almost there guys!


Thanks for the suggestion, we definitely screwed that up.


Yes, and by the way, they could set the video NOT to autostart, so that when you open a dozen links from an aggregator like HN, there isn't some guy's voice or, worse, cheesy demo music blaring from the PC, coming from an unknown tab deep down the stack.


You have the same name as a caching middleware for Django that has since gone derelict.

This makes me sad. :(


I don't really get the 150x bit:

"Since our servers can encode video much faster than most of your users can upload it, this means there is literally no more delay between the end of the upload and the video finishing encoding. In the screencast above this makes a 150x speed difference."

Surely the upper bound is 2x if you could transcode faster than the upload before.

Unless you are just measuring the time between the upload finishing and the transcode being done. But why would a user care about that metric rather than the total elapsed time?


> Unless you are just measuring the time between the upload finishing and the transcode being done.

That is indeed what we are measuring.

> But why would a user care about that metric rather than the total elapsed time?

Because that is the time when the user feels he has already done his part, but the site he is on is taking forever to do what he needs it to do.

Another reason why we measure the time from upload done to encoding done is because that's what our customers pay us to do.


With the exception of video formats (you mention QuickTime) that put the metadata at the end, how about real-time preview while the file is still uploading? Even a series of static thumbnails every 10 seconds would be an interesting new feature.


That's a very cool idea - we'll add support for that as well : ).
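(For illustration only: a minimal Node.js sketch of how such periodic thumbnails could be pulled out of a stream with ffmpeg. The output path, the 10-second interval and the fps filter are assumptions, not anything Transloadit has confirmed, and this only works for formats whose metadata sits at the front of the file, per the caveat above.)

    // thumbs.js - run as: node thumbs.js < video.mp4
    var spawn = require('child_process').spawn;

    // Read a video from stdin and write one JPEG for every 10 seconds
    // of footage. Assumes a reasonably recent ffmpeg build.
    var ffmpeg = spawn('ffmpeg', [
      '-i', 'pipe:0',        // take the input from stdin
      '-vf', 'fps=1/10',     // emit one frame every 10 seconds
      '-f', 'image2',        // write individual image files
      '/tmp/thumb_%03d.jpg'
    ]);

    process.stdin.pipe(ffmpeg.stdin);
    ffmpeg.stderr.pipe(process.stderr);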


I demand you address the OP's question of why you haven't fixed the physical constraints of crappy upload bandwidth. Seriously, wtf would we pay you for otherwise? </snark>


As funny as this sounds, fixing the upload bandwidth problem is also something we will work on. Getting a good route between your users and servers can make a real difference.

So at some point in the future we'll offer upload servers in all major geographic areas.


It's interesting how difficult this can be. I was recently evaluating Linode locations for a new virtual server, and though I'm geographically much closer to California, the Texas location gave me almost 20x the download speed. To be precise, I was able to download from a California node at 300K/s, while I got close to 6MB/s from Texas.

Comcast used to route my traffic to California through Seattle, Washington, then down to San Jose and then Fremont, but now, it's going to Texas first, then across to San Diego, then up to Fremont, across a saturated link.


Yeah, why don't they make a 150x faster javascript encoder, then I only have to upload much less data! </derp>


You're derping, but a browser plugin seems like it might actually be a good idea for this. Even if JS is too slow, someone who uploads lots of video through a site might well be willing to install a little plugin to make it faster, especially if it comes with other simple things like queuing a directory of videos, etc.


Wonder if something like the Google Native Client would be good for this in the future? (Don't know a whole heap about it so may be off the mark)


NaCl might be good for doing the transcoding on the client side before uploading - and it would be especially helpful if the transcoding is a downsampling (i.e. reducing the size of the stream), and if the client machine is beefy enough to do it without adversely affecting performance while pumping out bits at the rate of the upload connection.

However, it wouldn't be particularly useful for things like queuing uploads from a directory. The point of NaCl is running native machine code in a provably secure sandbox. It's about making possible web app features faster on the client side, rather than adding new capability, per se.


Most users who have uploaded videos are probably aware that it isn't available right away, but may not really understand why as far as encoding goes. If that process is sped up by such a significant factor, they don't have to worry about it. The 150x number is at worst misleading if you take Amdahl's law into consideration, but upload speeds really aren't at the mercy of site owners. Improving any part of the pipeline is going to be a big win, especially for video.


It's nice to see people thinking about processing input as a stream rather than waiting for the entire message to be received before doing anything.

If you start to think about input and output in a web app as streams rather than buffered data, a lot of neat possibilities arise for reducing latency.


This is often true, but not always - some video formats put header data at the end of the file, not the beginning, so you can't just start encoding as bits come in. Or if you can, you're encoding blind.


Sure, but we think it's possible to "prepare" those videos on the client side before uploading.


YouTube has been doing this for a long time now. It's great watching a video being processed and "read" as it's uploaded - it's an even more user-friendly step up from a progress bar, and it makes uploading to YouTube more "fun".


Do you have any references for youtube doing this? Youtube is the example I always mention when it comes to long waiting times after an upload : ).


I don't believe YT transcodes during upload, since last I checked they are doing 2-pass encoding (it's been a while though).

However, they do examine the buffer during an upload. Try uploading an MOV with bad edit points: they will warn you about A/V sync issues before the upload is complete. I assume they also abort uploads for invalid files.


A video starts playing the moment you navigate to that page. Note to OP: please add a play button to your video. The video should start only if the user has specifically requested it. It took me a minute to hunt down the tab that was the source of noise in my otherwise quiet work environment and, once I found it, I killed the tab without even glancing at the page title.


As you can probably tell, we are super excited about this, feel free to ask me anything : ).


What have you done about the security implications of pushing user-provided data into ffmpeg straight off the wire? How worried about this are you? Obviously ffmpeg was not built with untrusted input in mind. I have only paranoia as a justification, but I bet I could make it crash pretty hard, and with a bit more work I bet I could design an input that caused ffmpeg to just fill your disk forever.

How much work have you had to do around these issues? Did they prove to be bigger or smaller than you anticipated?


I'd bet that ulimit, quota, and chroot would go a long way to solving those issues.
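(A hedged sketch of that idea in the Node.js setup described downthread: wrap the ffmpeg child in a shell that applies ulimit caps before exec'ing it, so a hostile input can't burn CPU or grow output forever. The specific limits, flags and the spawnLimitedFfmpeg helper are illustrative, not anything Transloadit has confirmed; chroot and disk quotas would be set up outside this snippet.)

    var spawn = require('child_process').spawn;

    // Cap CPU seconds (-t), virtual memory in KB (-v) and output file
    // size in 512-byte blocks (-f) before exec'ing ffmpeg.
    function spawnLimitedFfmpeg(args) {
      var cmd = 'ulimit -t 600; ulimit -v 1048576; ulimit -f 4194304; ' +
                'exec ffmpeg "$@"';
      return spawn('sh', ['-c', cmd, 'ffmpeg'].concat(args));
    }

    // Usage: pipe the upload into the limited child as usual.
    var child = spawnLimitedFfmpeg(['-i', 'pipe:0', '-y', '/tmp/out.mp4']);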


This is really neat, but since you said "anything"... I have a legal question that I haven't been able to find the answer for:

How do you deal with the licensing issues regarding open source encoding software? Do you pay the MPEG-LA a fee directly for use of the software? Is it per file/per minute/flat fee?

Just wondering about the mechanics of this, mainly for an in-house streaming application.


IANAL, but generally, you need a license if you're using open-source encoders or decoders and you're the one who actually compiles them. (Again, not a lawyer, but that changes ffmpeg from "a source code description of an encoder" to "an encoder.")

MPEG video stuff is generally pretty cheap or free for low volume. Audio codecs are fairly expensive, on the other hand. AAC + MP3 + AMR is >$20K in minimum fees.


No, we don't pay the MPEG-LA (yet). The information they provide has given us reason to believe that we don't need a license at this point.


How many ffmpeg instances are you holding in memory? You have a single ffmpeg process for each upload that is happening, don't you? Additionally, if I am uploading files with B-frames and GOPs of 300 (which I assume will lead to a lot of frames being kept in memory, though I'm not sure about this), isn't your RAM requirement going to be massive if you have several uploads happening at the same time? And considering the length of time a single video takes, it's reasonably likely you'll have a lot running in parallel, is it not?


We have one ffmpeg process for each upload, yes. From our tests with the kind of videos we mostly deal with (user generated content), this seems to work quite well while only requiring reasonable memory allocations.

So sure, there could be problems we don't know about yet, but that's the price for doing something nobody has done before. We will deal with these problems as they come up. In the worst case that might mean falling back to normal encoding if an ffmpeg child process starts to misbehave.


So are you using plain command-line ffmpeg, or are you feeding the frames yourself? How do you throttle the speed of ffmpeg?


We are feeding raw bytes into a command line ffmpeg process via standard input.

ffmpeg only reads as much from stdin as it can process. Node.js gives us a 'drain' event when this happens, which means we can pump more data into it. So we are not throttling the speed, just the flow of data.
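(As a rough illustration of that flow - not Transloadit's actual code; the uploadStream argument and the ffmpeg command line are assumptions - the pump in Node.js looks something like this:)

    var spawn = require('child_process').spawn;

    function encodeWhileUploading(uploadStream, outputPath) {
      // ffmpeg reads its input from stdin via pipe:0.
      var ffmpeg = spawn('ffmpeg', ['-i', 'pipe:0', '-y', outputPath]);

      uploadStream.on('data', function (chunk) {
        // write() returns false once ffmpeg's stdin buffer is full...
        if (!ffmpeg.stdin.write(chunk)) {
          uploadStream.pause();
          // ...and 'drain' fires when ffmpeg has caught up again.
          ffmpeg.stdin.once('drain', function () {
            uploadStream.resume();
          });
        }
      });

      uploadStream.on('end', function () {
        ffmpeg.stdin.end();
      });
    }

(uploadStream.pipe(ffmpeg.stdin) would give the same backpressure handling in one line; the explicit pause/resume version just makes the 'drain' mechanism visible.)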


As a curious outsider, I have to ask: why are you giving away the recipe to your secret sauce? I mean, there are several of your competitors, such as Zencoder, etc., that could easily copy this idea and implement it. So why would you blog about implementation know-how that made your product cooler and enabled you to command premium pricing?


I don't really think it matters, for a number of reasons, chief among them being motivation. If you are motivated enough to copy this idea, you would reverse engineer it as soon as you heard what they are doing, without knowing exactly how. That's what I did, knowing that they are based on Node.js and knowing how Node.js handles buffers.

Ultimately, if you are a provider of services, these or any other, you will add functionality or run the risk of being eclipsed.


Well, anybody in the video encoding business will know how we did it. It's kind of painfully obvious. So it's not so much spilling the secret sauce.

Also, we don't believe individual features themselves will give us a long-term advantage. Consistently being a leader of innovation, however, will.


Looks great. Does this do encoding of streaming video or only files?


It only encodes uploaded files.


How will this cool feature help your startup make money? How much cost in time and effort did it take to add the feature?


1. We are infinitely [1] faster at converting video uploads than our competitors now. Speed matters. This is a strong USP.

2. Publicity. There are a bunch of features that would certainly be equally useful to our customers. However, none of them would get much interest from hacker news. We got a lot of signups and page views from this.

3. It took me probably 4 full days [2] to implement this, including the screencast and blog post. That's reasonable for the value added by this feature.

--fg

[1] We are really infinitely faster if the upload speed is slower than our encoding speed. This is probably the case for 99% of all consumer internet connections at this point. In any case, we reduced the time required for an encoding from SUM(Upload, Encoding) to MAX(Upload, Encoding); see the worked example below.

[2] We already made a lot of technology decisions that lowered the cost of implementing this. A competitor would certainly need much more resources to pull this off.
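To put illustrative numbers on the SUM/MAX claim in [1], assume a 10 minute upload and an encode that on its own would take 2 minutes:

    before: upload + encode       = 10 min + 2 min     = 12 min
    after:  max(upload, encode)   = max(10 min, 2 min) = 10 min

The job finishes the moment the upload does, so the wait the user actually notices after uploading drops from 2 minutes to roughly zero.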


Is this limited to 1-pass encoding only?


Yes, 2-pass encoding will not be doable in realtime.


One small question: if a user has their upload stream redirected to your service, and for some reason this upload is unable to finish, are we now forced to have the user try the upload again? It seems one advantage of the two-step process would be the ability to retry the process on behalf of the user rather than making them wait, and that should be weighed into the convenience formula.

I've never used your service so I'm not sure exactly how the upload stream is redirected to your platform, so this concern might not be totally valid if the upload is running through the client platform anyways.


> if a user has their upload stream redirected to your service, and for some reason this upload is unable to finish, are we now forced to have the user try the upload again?

If the upload doesn't finish the user needs to redo it.

> It seems one advantage of the two step process would be the ability to try the process again

You got me confused here. If the upload never finished, there is nothing you can do to fix this.

That being said, resumable file uploading is the next thing we'll tackle.


> It seems one advantage of the two step process would be the ability to try the process again

I should have been more specific: consider the special case where the encoder on your side encounters a random error while a "plain" upload to the client platform would have finished - so setting aside any network transmission errors.

Although, with resumable uploading and a simple go-between service on the client platform between the user and your platform all manner of fanciness could be achieved, I think. Thanks for quick response!


Your question is answered in the link itself. Did you read it?


The biggest challenge to me seems to be spinning up instances once you get a spike of realtime jobs in parallel. Keeping the 'realtime' promise is often hard once your systems go into production.


Of course. But since most people upload with insanely slow speeds, we can actually handle a spike much better with this feature than we could before : ).


I was thinking this should be done in realtime on the client machine and streamed up to the server. Then you cut down on both transfer and encoding time, and use far fewer server resources.


This means shipping a plugin, or porting ffmpeg and friends to flash / javascript. Maybe something to think about if we ever decide to raise VC : ).


Isn't this a great use case for Native Client?

Maybe it's not ready but it seems to be something to look forward to.


Isn't the whole purpose of pre-encoding before uploading to shrink the file size so the upload is faster?

Why would I want to upload 4GB of video when I can encode it down to 700MB then upload?


It's not for end-users. It's a service for developers to add to their web application. You can certainly try to coax all your users into pre-encoding their video - and understanding how to do that - if you like....


Kudos on launching the new feature!

The pricing is a tad expensive, but doesn't look bad at all when you consider the need for an on demand encoder for the entire time that the user is uploading.


We know we're a little expensive. But we did this so we could lower pricing in the future (which we will), and to attract people who are getting more value out of our service initially.

If you have a project and pricing is a deal breaker, just email us and we'll set you up with a discount.


"While this sounds easy in theory, it is rather difficult to pull off on most stacks."

Why is it difficult on most stacks? Because it's tying up a request handling thread?


Because most stacks don't have a single threaded, non-blocking I/O event loop.

Sure you can do this with threads, but it's gonna get very tricky to pump data between a socket, file and process using threaded programming.


First, I agree that it'd be silly not to use thread-per-CPU non-blocking I/O.

However, I don't see why it'd be tricky in a threaded server. The thread gets an fd to recv() the incoming data from, and it popen()s an ffmpeg process then loops to recv() and write() the data until done.


I'm not saying it can't be done. In fact the multipart parser we have built for Transloadit has already been ported to C++ [1], and I imagine Java has decent libraries as well.

But most people just sit on a request/response oriented Python/Ruby/PHP stack, possibly with a buffering load balancer in between (nginx buffers uploads).

If you build this from the ground up, it is certainly possible with a lot of technologies.

[1] https://github.com/FooBarWidget/multipart-parser


Is that "contact us" link correct? It adds an e-mail address to the end of the URL, which conveniently opens a comment page, but it still seems wrong.


Ouch, fixed it. Thanks for letting me know!


It is not faster, just parallel processing with the upload buffer. This would be a nice feature to add to most web servers/web scripting environments.


The fact that it's not actually faster is the entire beauty of it. While our competitors have been super busy actually making their encoding faster, we have just taken a huge bite of free lunch by making the encoding happen in parallel with the upload.


You may want to avoid using the word "sucks" in a professional context because there is a population of people for whom the word evokes the idea of oral sex. Try substituting "is not so good." This will have the added virtue of being super charming in your German accent.

edit: I usually don't bitch about being downmodded. But don't you guys know any old people? And know that old people tend to be in charge of things? In any event, you shouldn't rely upon business leaders of any age being ignorant of the language.

-http://www.thefreedictionary.com/sucks 5. Vulgar Slang To perform fellatio on.

-suck, Old English sucan, corresponding to Latin sugere, "to suck." It's of imitative origin. Meaning "do fellatio" is first recorded 1928.

-Slang sense of "be contemptible" first attested 1971


Here's the thing: Sure. "Old people" exist. They are often in charge of things. Many of them read blogs and surf the internet. Do you really, honestly think that somebody who surfs the internet and reads blogs, especially blogs about video encoding, won't be aware of the current meaning of the word?

Honestly?

You see the word "sucks", in the context of something not being very good, everywhere. Literally everywhere. I bet that if you googled any tech-related keyword, somewhere in the first ten results, "sucks" would be employed in the context of "is not so good".

My point is this: People that read that blog post _will_ know what "sucks" means in that context and will not infer anything sexual from it - and even if they did, it's hardly the end of the world. Surely nobody who spends any reasonable amount of time reading blogs about video encoding will be in the slightest offended by fellatio. After all, "old people" are no different to "young people" except they have more experiences under their belt (hyuk hyuk).


Luckily, centenarians aren't typically in the market for buffered video encoding vendors.


Since when has the word "sucks" evoked that?


Since before it hung out with the hip words.


I remember the difference happening about 12 years ago, though my neighborhood was probably behind the times somewhat. I remember that some of my fellow 7th graders and I were used to the old meaning, but it wasn't too hard to transition to the new one since we didn't really use it much anyway. Also, I remember a trailer for Bicentennial Man (1999) using the more "evocative" meaning.


Sure we know old people who may be offended by that word. But how many of those old people will be reading this blog post?



