Video streaming at scale with Kubernetes and RabbitMQ (alexandreolive.medium.com)
288 points by thunderbong 11 months ago | 92 comments



I thoroughly appreciated this article as I've been building a short-form video content streaming service and the performance hasn't been what I expected.

Granted, I knew that my service needs to be able to scale at different bottlenecks, but a lot of "build your own video service!" tutorials start with:

- Build a backend, return a video file

- Build a frontend, embed the video

And that leaves a lot to be desired in terms of performance. I think the actual steps should be:

- Build a backend that consists of:

  - A video ingestion service

  - A video upload / processing service that saves the video into chunks

  - A streaming service that returns video chunks (a minimal sketch of this piece follows below)

- Build a frontend that consists of:

  - A video streaming library, built or off the shelf, that can play video chunks as a stream

Edit: From the author's links, I found this website, which is very informative: https://howvideo.works/
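Since the streaming service is the piece these tutorials usually skip, here is a minimal sketch of it, assuming FastAPI and HLS output already written to disk by the processing service (paths and names here are hypothetical):

    # Minimal chunk-serving sketch: the playlist lists the chunk URLs,
    # and each chunk is a plain static file that a CDN can cache.
    from fastapi import FastAPI
    from fastapi.responses import FileResponse

    app = FastAPI()
    MEDIA_ROOT = "/var/media"  # hypothetical output dir of the processing service

    @app.get("/videos/{video_id}/playlist.m3u8")
    def playlist(video_id: str):
        return FileResponse(f"{MEDIA_ROOT}/{video_id}/playlist.m3u8",
                            media_type="application/vnd.apple.mpegurl")

    @app.get("/videos/{video_id}/{segment}")
    def segment(video_id: str, segment: str):
        return FileResponse(f"{MEDIA_ROOT}/{video_id}/{segment}",
                            media_type="video/MP2T")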


I helped work on howvideo.works, fun to see it helping people! The world of video is, I'd argue, one of those technical spaces that is extremely iceberg-y. You can get decently far using S3 + the HTML5 video tag, which I think creates a perception among some that video is just images but a little bigger, but that couldn't be further from the truth. You can really pick just about any step along the video pipeline from production to playback and go as deep for as many years as you'd like.

This is both a semi-shameless plug and probably a few levels deeper than what you're looking for, but I organize a conference for video developers called Demuxed. The YouTube channel[1] has 8 years worth of conference videos about streaming video (and the 9th year is happening in a couple of weeks). The bullet points you mentioned are definitely covered across a few talks, but it's certainly not in any kind of "how to" format.

[1]: https://youtube.com/demuxed


I'm the writer of the article; I LOVE howvideo.works. It helped me quite a lot when I started working on video processing. I'm still a beginner and always fall back to it when I'm unsure about something fundamental. Thanks for your work. I'll take a look at your YouTube channel.


I'm the writer of the article; thanks for your lovely comment. I skipped many essential parts of the architecture in the article to keep it concise. The following articles will be about the technical implementation of what I discussed in this one.


I've been using commercial streaming services for in-app video (Cloudflare, Bunny, Vimeo), and found performance & bandwidth use terrible. The HLS protocol on iOS doesn't work well for 5-10 second clips, since it needs 1-4 seconds. Now using compressed MP4 with progressive loading. Way better.
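For anyone wondering: "progressive loading" here usually just means an MP4 whose moov atom (the index) sits at the front of the file, so the browser can start playing before the download finishes. A sketch of producing one with standard ffmpeg flags, invoked from Python (filenames are made up):

    import subprocess

    # Remux without re-encoding and put the moov atom at the file head
    # so playback can start while the rest is still downloading.
    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-c", "copy",               # copy streams, no transcode
        "-movflags", "+faststart",  # relocate the moov atom to the front
        "output.mp4",
    ], check=True)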


> and found performance & bandwidth use terrible

Can you explain exactly what you mean?


>> Now using compressed MP4 with progressive loading

What exactly does this mean?


> Video Upload / Processing Service that saves the video into chunks

At this point you also need to choose which streaming protocol you want to use. You mostly have two choices: HLS if you want to get things done quickly, or MPEG-DASH if you want more control (but you'd need a separate HLS pipeline for iOS anyway…)
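If you go the HLS route, the packaging step is mostly one ffmpeg invocation; a rough sketch with standard flags (run from Python here, filenames made up), assuming the input is already encoded as H.264/AAC:

    import subprocess

    # Split an already-encoded MP4 into ~4-second HLS chunks plus a playlist.
    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-c", "copy",                  # already encoded; just repackage
        "-f", "hls",
        "-hls_time", "4",              # target chunk duration, seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", "out/segment_%03d.ts",
        "out/playlist.m3u8",
    ], check=True)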

> Build or use a video streaming library that can play video chunks as a stream

As someone who's worked on a web streaming player, I'd strongly recommend not building one yourself but using an existing one (or, in short: use HLS.js)



>> I've been building a short-form video content streaming service

What does it do?


Right now I'm basically trying to just re-create the TikTok / Youtube Shorts / Instagram Reels experience of infinitely scrolling videos.

Mostly just building for fun though.


I built a similar project and had great results with Cloudflare Stream.


This is nice if you only have to deliver in one format, but as soon as you want to show up on TVs you are stuck delivering in a lot of formats, and life gets complicated quickly.

Throw subtitles in multiple languages, and different audio tracks, into the mix, and all of a sudden streaming video becomes a nightmare.

Finally, if you are dealing with copyrighted materials, you have to be aware as to what country your user is physically residing in while accessing the videos, as you likely don't have a license to stream all your videos in every country all at once.

Throw this all into a blender and what is needed is a very fancy asset catalog management system, and that part right there ends up being annoyingly complicated.


Oh, this is just the tip of the iceberg. Many parts of on-demand video streaming are largely commoditized at this point. Add in support for linear (live) streaming and ad insertion and things start to get really interesting. :)


Article made me nostalgic.

My very first job was writing streaming media services for radio. Integrating the ad services over events embedded into the stream identifying content was a pretty nifty solution. You send the metadata to your ad server, it selects the ads based on your user and media metadata, you swap out the player's media url to play the ad(s), and keep the stream going in the background. When the commercial break is up, you kill the ad stream and swap the media stream player back in. Presto changeo, seamless ads in 2007.

You also track media playback through the stream embedded events.

The hard part back then was live video/audio encoding. These cards lived at the stations and streamed to our CDN. Geocoding for American service bases in foreign countries was interesting too; the streaming rights for content limit the countries you can stream in.

Monitoring the live encoders was an interesting problem - I set up an event-based solution so I could capture the audio waveform and analyze the frequency buckets on dozens of streams at once. I could tell if a stream was silent pretty quickly, and alert the station engineer via email.
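For anyone curious what that kind of check looks like today, a rough sketch with ffmpeg's silencedetect filter (the threshold, duration, and URL are illustrative):

    import subprocess

    # Sample 30 seconds of the stream and look for silencedetect log lines.
    result = subprocess.run([
        "ffmpeg", "-t", "30", "-i", "https://example.com/stream.m3u8",
        "-af", "silencedetect=noise=-50dB:d=5",  # flag >=5s below -50 dB
        "-f", "null", "-",
    ], capture_output=True, text=True)

    if "silence_start" in result.stderr:  # the filter logs to stderr
        print("stream appears silent; alert the station engineer")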

Back then, a lot of this stuff wasn't off the shelf like it is today. Even on the frontend, working with Flash, the Windows Media Player SDK, and the QuickTime SDK and integrating them with JavaScript was challenging. And the backends were all PHP and Java 1.4, and MySQL, and a whole lot of memcache.

Was a really great experience for a new professional programmer to write all that.


In 2007, and I would argue before that, live media encoding for streams was pretty much a solved problem, but it was often very expensive and out of reach of radio stations, and the information wasn't readily accessible, so you could potentially end up giving Microsoft or Real Networks lots of money.

I was working on streaming in the late 90s - that was the real wild west; I had to write solutions myself, and it felt very duct-taped together. Luckily it was an offshoot of a largish ISP, so we had bandwidth and a lot of servers. Incoming streams often used bonded ISDN lines.


My background is in linear (title VI and OTT) and ad-insertion. A big chunk of my job is explaining to folks that just because you solved VOD that doesn't mean you've solved linear. It's almost best to think of them as two distinct problems.


"fancy asset catalog management system" - was thinking about building such solution lately - do you know any open-source solutions of this kind?


I've built or been involved in building 4 closed-source media asset management systems (including my current place of work and one I cofounded), and you can get relatively far in the first month.

But the devil is in the details, and most people who attempt it don't even get the database models right on the first or second try; they underestimate the complexity, particularly when it comes to video, and struggle when the use cases widen. Document asset management systems are way easier but usually don't understand media as in depth as a media asset management system.


I built a similar video pipeline, not on Kubernetes but using EC2 instances for those hungry FFmpeg encoders.

The system differed in that it was not user-generated video content; it came from the cameras in our fitness studio.

Here is the article if anyone is interested in reading about it: https://dev.to/dvliman/building-a-live-streaming-app-in-cloj...


Awesome article! Curious -- why Clojure?


No specific reason. It could have been built in any language. It was just the language we were using and enjoyed at that time.


Because CS Degree


OP mentions that "I would love to be a little mouse and peek at YouTube's complete architecture to see how far we are from them." You can occasionally find posts (often linked here) from another player in streaming video which you might have heard of, discussing technical architecture. For example, this might be a little lower level than you may be interested in, as it relates to kernel optimizations to jack up bit throughput rates, but I dig this sort of thing -

https://www.youtube.com/watch?v=36qZYL5RlgY


I've heard somewhere that YouTube has their own transcoding hardware.


Indeed. https://blog.youtube/inside-youtube/new-era-video-infrastruc... , https://research.google/pubs/pub50300/ . (Search the paper title and you should be able to find the pdf itself elsewhere).

I'm not actually sure on balance how much transcode gets done in hardware vs software, since it's also very amenable to using batch compute that's otherwise idle. I'll guess that most or all live transcoding - streams, on-the-fly transcodes into formats not pregenerated - is done in hardware, and transcoding new formats for the back catalog is probably done on a mixture of mechanisms where and when capacity is available. (Source: Googler, not on YouTube though.)


For general video streaming, Mux.com has greatly decreased my development time. Getting playback working is straightforward. And for advanced use cases, like real time editing and preview in a web browser, it works as expected and doesn’t get in the way.


Fuck K8s. You literally don't need it. Maybe he needs it because he's building on Google Cloud.

AWS is easier, but you can do it with anything. The basic steps are:

1. Upload the file somewhere

2. Transcode it

3. Put the parts somewhere

4. Serve the parts

You should really transcode everything into HLS. It's 2023, and everything that matters supports it. If you want 4k you can use HLS or the other thing (which I keep forgetting the acronym for).

If you want to get fancy you can do rendition audio, which not everything supports. Rendition audio means sharing one audio stream among N video streams.
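For illustration, rendition audio in an HLS master playlist looks roughly like this: one EXT-X-MEDIA audio entry that every video variant references (names, URIs, and bitrates are made up):

    #EXTM3U
    #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",DEFAULT=YES,URI="audio/en.m3u8"
    #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,AUDIO="aud"
    video/360p.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,AUDIO="aud"
    video/720p.m3u8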

You can use FFmpeg to transcode, but I'd suggest using AWS MediaConvert. It's cheap, fast, and probably does everything you want. Using FFmpeg directly works, but why bother? You will get an option wrong and screw everything up. You don't want your video to not work on some random device that 50k people are using in some country you didn't think about.

He's using RabbitMQ but you should use SQS, because SQS can trigger lambdas...which means no polling required. But use whatever queue you want.

You can kick the process off by attaching a Lambda to S3, which will start the process when the file is uploaded.

You can kick your "availability activation" off by attaching a Lambda to the S3 output bucket.
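A minimal sketch of that first Lambda, assuming boto3, an existing MediaConvert job template, and an IAM role (the role ARN and template name here are hypothetical):

    import boto3

    def handler(event, context):
        # Fires when a new upload lands in the input bucket.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # MediaConvert uses an account-specific endpoint.
        endpoint = boto3.client("mediaconvert").describe_endpoints()["Endpoints"][0]["Url"]
        mc = boto3.client("mediaconvert", endpoint_url=endpoint)

        mc.create_job(
            Role="arn:aws:iam::123456789012:role/MediaConvertRole",  # hypothetical
            JobTemplate="hls-renditions",                            # hypothetical
            Settings={"Inputs": [{"FileInput": f"s3://{bucket}/{key}"}]},
        )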

Background: I help run a streaming service and built the backend pipeline.

This omits the entire "metadata management and analytics" side as well. That's left as an exercise for the user.


I would like to know what the costs come out to per minute of video encoded and how many outputs they're getting in order to compare this to something like Media Convert (AWS), Google Transcode or something dedicated like Mux.

For reference, Google Transcoder is about $0.13 per minute of encoding (at four resolutions), Mux is at $0.032/min, and AWS MediaConvert is $0.0188/min.

I should note I know Mux's pricing well; I use them a lot, happily. It gets a bit confusing with Google and MediaConvert because I'm not sure how these costs map to the resulting bitrate renditions that get created, and I've not got the time for a deeper dive to get a more straight apples-to-apples comparison (ignoring scale discounts).

[1] https://cloud.google.com/transcoder/pricing

[2] https://www.mux.com/pricing/video

[3] https://aws.amazon.com/mediaconvert/pricing/


Full disclosure, I currently work at Mux on the video product. Previously, though, I worked at an education startup with user-generated video content. Like many others commenting on this thread, I built a simple queuing system using RabbitMQ and Celery, transcoding on EC2 with ffmpeg. While we might have saved some money by doing this in house, we almost certainly discouraged users from uploading content because the entire video needed to be transcoded before it could be viewed. For use cases like breaking news or high-traffic user-generated content, you really want to minimize wait time, and that requires some kind of special sauce. At Mux, we encode content just in time for very fast publish times. It's very challenging to do this on your own.


This post is somewhat unfairly voted down.

Cloud services like S3 and Azure Storage were invented specifically for hosting images and video. That’s their origin story, their foundation, their very reason for being.

Similarly, cloud functions / lambda were invented for background processing of blobs. The first demos were always of resizing images!

Building out this infrastructure yourself is a little insane. Unless you’re Netflix, don’t bother. Just dump your videos into blobs.

It’s like driving to your cousin’s place, but step one is building your own highway because you couldn’t be bothered to check the map to see if one already existed.

PS: Netflix serves video from BSD running directly on bare metal because at that scale efficiency matters! If efficiency doesn’t matter that much, use blobs. Kubernetes is going to be even worse.


I actually worked on a system just like OP is describing, and we ran everything ourselves, including DBs (on k8s), so I can offer some perspective. For egress-heavy services such as video, minimizing outrageous cloud egress fees is key. Having the ability to tell your cloud account manager you can easily run multi-cloud or on-prem makes it a lot easier to negotiate sane rates.

Ultimately you're competing with the likes of Twitch (IVS), YouTube, and Cloudflare on price, and they ALL run their own compute, so at a certain size you will have to run your own hardware to stay competitive, especially now that ZIRP is in the rear-view mirror.

See Dropbox as another example of this but in documents/storage space


Depending on proprietary software, such as cloud offerings, for something as essential as encoding is not sustainable and creates technical debt, as your software/company will rely on the continued profitability of the cloud service.


But even if you don't use AWS, use something else. Any video encoder service will be cheaper and more reliable than K8s + your own hacked-together Docker instance. It's literally not worth it.

Do you really want to spend your time messing around with ffmpeg and setting the correct GOP values? Or trying to create your own b-frame track?

At some point you actually need to do what all these settings are for. Until that time you should use other services.


What would you recommend using as an alternative to being locked into AWS?


Full disclosure I currently work for Bitmovin, but we work well on AWS, Google and Azure. We have some migration tools and are one of Microsoft's recommended alternatives since they are sunsetting Azure Media Services. Our Streams product is a simple, one API call solution that just works, similar to some of the things mentioned already, but we also have a pretty advanced API that can do whatever you need. We're also about to wrap up a beta for GPU acceleration with NVIDIA and make that publicly available.


Recently got burnt by Microsoft deprecating Azure Media Services, so I can understand if the author doesn't want to use a cloud offering for encoding.


If I'm understanding correctly, you're suggesting using Lambda for processing. Is that a good idea? Lambda is actually expensive for heavy workloads.


I believe the Lambda would just trigger a MediaConvert process, so it would actually be doing very little.


Ah, got it, thanks.


If your video is short enough to encode within the Lambda limit, that is worth considering: https://aws.amazon.com/blogs/media/processing-user-generated... MediaConvert is expensive; generally the AWS video stack is expensive (MediaConvert, Elemental, MediaConnect).


Heck no, the Lambda kicks MediaConvert off.


One thing people pointed out is minimizing egress fees.

You do that by using CloudFlare, fastly, or another CDN.

You can get bandwidth costs down to < $0.001/GB by committing to $1500/mo in bandwidth. The CDNs will pull from S3 once, then cache it forever (assuming you do it right).
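The "do it right" part is mostly cache headers on the objects; a sketch with boto3 (bucket and key names made up):

    import boto3

    s3 = boto3.client("s3")
    # Chunks are immutable once written, so let the CDN cache them forever.
    s3.upload_file(
        "out/segment_001.ts", "my-video-bucket", "videos/123/segment_001.ts",
        ExtraArgs={
            "ContentType": "video/MP2T",
            "CacheControl": "public, max-age=31536000, immutable",
        },
    )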


Serverless on AWS is underrated if your workload can fit on it. I have an app that hasn't received a single ounce of maintenance in like 3 years that still "just works": it collects revenue from Stripe, does all the business logic on Lambdas, generates downloadable print-friendly PDFs on Lambdas, etc. The supporting tech is Dynamo + triggers for Lambda, and S3 and related triggers for Lambda. But it would be hard for a non-expert AWS user to fathom this sort of architecture, so I don't fault others for falling down the NIH rabbit hole.


Hey, just curious, can you link your app?


Not one mention of MPEG-DASH


You need to use MPEG-DASH if you are contractually obligated to use DRM. You can create an M3U8 and an MPEG-DASH manifest that share the same encrypted segment files.
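For illustration, Shaka Packager can emit both manifests over one set of fMP4 segments; a rough sketch (invoked from Python, paths made up, encryption flags omitted):

    import subprocess

    # One set of segments on disk, two manifests (DASH + HLS) pointing at them.
    subprocess.run([
        "packager",
        "in=video.mp4,stream=video,init_segment=v/init.mp4,"
        "segment_template=v/$Number$.m4s",
        "in=video.mp4,stream=audio,init_segment=a/init.mp4,"
        "segment_template=a/$Number$.m4s",
        "--mpd_output", "manifest.mpd",
        "--hls_master_playlist_output", "master.m3u8",
    ], check=True)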


I believe that's what "or the other thing (which I keep forgetting the acronym for)" was referring to.


Well it's an international standard, no biggie.


While the article provides guidance on utilizing standard software and services to construct a basic video upload platform, it lacks deeper insights into advanced scaling techniques.


We've built a similar pipeline architecture for our product. One key thing I'll mention is that we're using Shaka Streamer, which is a Python wrapper around Shaka Packager and ffmpeg. We queue our transcode jobs into a Redis queue and use k8s to scale the transcode workers based on queue volume. Lastly, as a few folks have mentioned, we have an experimental on-prem transcoding cluster with consumer-grade HW that is pretty cheap.

If you’re interested in working on transcoding I’d highly recommend taking a look at Shaka-packager/streamer.
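A stripped-down sketch of that worker loop, assuming redis-py and jobs enqueued as JSON (the queue name and job fields are made up):

    import json
    import subprocess

    import redis

    r = redis.Redis(host="redis", port=6379)

    while True:
        # Block until a transcode job arrives; k8s scales these workers
        # horizontally based on queue depth.
        _, raw = r.blpop("transcode-jobs")
        job = json.loads(raw)
        subprocess.run(
            ["shaka-streamer", "-i", job["input_config"], "-p", job["pipeline_config"]],
            check=True,
        )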


I used to work on a system like this and even built the logic to use preemptible pools effectively, just like OP. If I had to design it from scratch today, I would use Temporal for job scheduling; their durable compute concept is a perfect fit for this, and we had a lot of trouble maturing the equivalent scheduling system while trying to keep up with rapidly growing scale.


Shameless plug here. In case someone does not want to build any of this and still stream video, you can check https://www.gumlet.com


Must be expensive to run on Google Cloud.

Also looks pretty complex.

The stabilization step presumably does a video encode… that's extremely expensive in terms of time, compute, and money. I wonder why it's necessary.


Hello, I'm the writer of the article. Our solution gets videos from random people who present products we sent them. We get dodgy videos filmed on bad devices, and the process of contacting the user and getting them to re-upload another video in better quality is time-consuming for our team. We'd rather spend a little more on compute to try and save time overall. I hope this answers your question.


I wonder if it wouldn't be cheaper to run an on-prem farm of Best Buy-grade "gamer PCs" for smaller-scale networks like that.


Twitch used to use cheap cores with Intel Quick Sync, and maybe still does.


Slap one of these puppies in….

AMD Alveo MA35D Media Accelerator

https://www.xilinx.com/applications/data-center/video-imagin...


I've used Xilinx a fair bit for encoding. Once you get past the pain of compiling your tooling for it, it does speed up VOD encodes significantly.


How's the quality? I heard it was so-so, and I think you can't choose your own presets.


For what I have done (2K and down, main profile), the quality has been OK. I have also read some complaints about quality at high res, but I am a happy customer.


We've been working on an alternative, more affordable solution at Livepeer Studio (https://livepeer.studio/) that saves up to 80% on transcoding & delivery cost.

It uses underutilized infrastructure around the world and incentivizes independent network operators to join the network (kind of like a two-sided marketplace for video-specific compute).

Please sign up for a free account and check it out! We'd love to get your feedback.


Not necessarily. GCP, when used correctly, can be super cheap. You also don't know the contractual deals they have with GCP.


I was thinking the same. CF on the front would improve on it but still.

Hetzner or other bare metal providers would probably be a better idea.


CF meaning Cloudflare? If you're serving video through them, then you're in "enterprise plan" territory. You can't do that on the free or "self-serve" paid plans. $5k+/mo depending on bandwidth needs (and if you just need a CDN to push bits, CF won't be competitive on price; their enterprise prices are tailored for companies that want all sorts of managed services and private networking stuff).


Um. Cloudflare Stream starts at $5 per month, and you don't pay for encoding, only storage and bandwidth. You can serve a decent video library for $500 per month.

https://www.cloudflare.com/products/cloudflare-stream/


Ah, they must've changed up their billing structure to provide more add-ons for the self-serve plans since I last dealt with them. That's good.


Just use Livepeer for your encoding and decoding and you'll save like 95% of your cost for VOD.


Great article. The other day in my local dev group I gave a talk about something similar but much less sophisticated.


I have to ask: why bother with Kubernetes and all the associated config and pain? Why not just start a new spot instance? I can't see any reason for Kubernetes in this architecture, even though it's in the title of the post.

Also, personally I wouldn't use RabbitMQ… it's pretty heavyweight… there are lots of lightweight queues out there. Overall, this architecture looks like it could be simplified.

Also, the post doesn't mention whether the video encoding uses GPU hardware acceleration. It makes a big difference, especially if using spot instances… ffmpeg on CPU is extremely computationally expensive.

Presumably all input videos need reencoding to convert them to HLS.
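For reference, moving the encode to an NVIDIA GPU is mostly a codec swap in ffmpeg, assuming an NVENC-capable build (the bitrate and filenames are illustrative):

    import subprocess

    # The CPU path would be -c:v libx264; this decodes and encodes on the GPU.
    subprocess.run([
        "ffmpeg", "-hwaccel", "cuda", "-i", "input.mp4",
        "-c:v", "h264_nvenc",  # NVENC hardware encoder
        "-b:v", "5M",
        "output.mp4",
    ], check=True)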


Hello, I'm the writer of the article. We are using Kubernetes for our whole architecture, which consists of around 40 microservices and cron jobs. In this article, I just wanted to give an example of asynchronous architecture using Kubernetes and RabbitMQ.

We are using RabbitMQ because it's my company's target solution. There might be a better or lighter solution that would fit us, but having just one queue technology for everything is easier to maintain.

Great comment about GPU hardware acceleration for encoding; I'm going to look it up.


So Kubernetes is only in this architecture because other systems use it and it's required by the parent company, not because it's needed.

That's pretty important context.


That's not what I said; sorry if that was not clear. The parent company requires RabbitMQ; we are using Kubernetes because managing 40 microservices without it would be hell. In the article I only showed one user-facing API, but it's actually multiple services; I just did not want to complicate it too much.


I believe loads of auxiliary microservices have been omitted for brevity. Of course, those also don’t require Kubernetes, but maybe they have some standardised deployment system which keeps things manageable. Don’t forget about Observability and whatnot.


Why do you say RabbitMQ is heavyweight? What queues do you consider more lightweight and what would be your go-to in a situation like this?


They might be thinking of something like ZeroMQ, which is pretty well liked: https://zeromq.org/

That said, I wouldn't call RabbitMQ that heavyweight myself, at least when compared to something like Apache Kafka.


This is what I was wondering; in the article it looks like Kubernetes is just used to launch the Node containers. Why are the database and RabbitMQ outside of Kubernetes? This architecture looks like it's been cobbled together by a junior.


It is actually the opposite: it is currently considered good practice to run stateful workloads outside of Kubernetes and stateless workloads inside of it.


> It is actually the opposite: it is currently considered good practice to run stateful workloads outside of Kubernetes and stateless workloads inside of it.

Is that still true?

I wouldn't call the parent comment charitable enough: there definitely can be reasons for running stateful workloads even outside of containers altogether (familiarity included), but at the same time it feels like a lot of effort has been invested into making that a non-issue.

For example, how many database Operators are now available for Kubernetes: https://operatorhub.io/?category=Database&capabilityLevel=%5...

Honestly, as long as you have storage and config setup correctly, it's not like you even need an Operator, that's for more advanced setups. I've been running databases in containers (even without Kubernetes) for years, haven't had that many issues at small/medium scale.


There are some of us who still perform four extra steps before putting any DB in k8s, and we have good reasons.


Kubernetes loves stateless services. Nothing wrong with moving RabbitMQ or a database outside of it.


Except Kubernetes has a whole storage provisioning system that gives you redundancy and automatic failover. If you're going to the trouble of running Kubernetes, why not just run your whole infra on it?

I run https://atomictessellator.com solo, using Kubernetes, and my database, MinIO object store, application servers, and quantum workers are all on Kubernetes. It's self-healing, and it's much simpler to run all the infrastructure the same way.

Recently I had a node failure while I was sleeping, and the whole system healed itself. The monitoring system didn't even wake me, because the small blip of increased latency while the pods rebalanced wasn't above the alert threshold.

What happens in the article's infra when the RabbitMQ or database nodes fail? The whole system goes offline, which seems like a very silly setup when you have Kubernetes sitting right there, whose primary function is to handle all of this.


Rabbit and most databases have their own failover strategies. Putting it all on k8s is fine for a toy app, but I don't know why anyone would deploy a real system like that.


OK, I can only speak to my personal projects and 20+ years of experience at work.

We run all of our stateful and stateless workloads on 10+ Kubernetes clusters at work, in multiple datacenters on multiple continents, and we serve 500 million users a month with it.

I wrote the first Borg version of the DFP backend systems at Google, where we served billions of users billions of ads a day, and we used stateful infrastructure management on some of the first container runtime systems that inspired k8s during its development.

Using Rabbit and "most databases" native failover strategies is fine for toy projects, but when you're operating at this scale, you need automated infrastructure provisioning and all of the automated tooling around it.


There are layers to this. At the simplest level, you only have K8s people (and aren't willing to use cloud services). So you install the RabbitMQ Helm chart, hope for the best, and fix any issues that come up.

Then you get a bit worried that the Postgres Helm chart, while good, doesn't do what you want. So you update to use a dedicated clustered Postgres, using some Postgres clustering tech.

Finally, you're at so much scale you can throw giant wads of advertising cash at the problem, and you can use anything you like and it'll work. You just need to choose the best thing for your particular problem.


That is scale I can't comprehend.


What happens when your storage detaches from your k8s cluster? Your services start 503ing, hopefully, because you didn't design your system thinking that k8s == 100% uptime.


Anybody can invent random problems ad nauseam - that doesn’t prove anything.

I'm not claiming that it's totally bulletproof; I never said that. I'm saying that if you have a Kubernetes cluster anyway, why not benefit from its abilities? Especially when the alternative is single-node single points of failure, which is clearly inferior.

The "what if the storage detaches" argument could easily apply to the single node VMs too, in which case the outcome would be a total system failure.

We are discussing the contrast between the article's architecture and running everything on K8s ... and I'm saying that running everything on K8s is clearly better.


Yeah, what happens when someone in AWS clicks "delete database" accidentally? It's the same thing that happens when K8s blows up in some weird way. You restore from your backup. (Fun fact: deleting the instance deletes the backups!)


Again, I was replying to:

> What happens in the article infra when the rabbitmq or database nodes fail?


> Anybody can invent random problems ad nauseam

I agree; I was replying to this invented problem:

> What happens in the article infra when the rabbitmq or database nodes fail?

It makes sense if you read the reply as a reply.


I am sure it's difficult for someone to build and scale video infrastructure. A few companies are doing it for you; plug in the APIs, and you're done.

Gumlet (https://www.gumlet.com): Per-title encoding (Netflix's approach) to optimize and transcode your videos to boost engagement rates. Moreover, securing your videos is easy with digital rights management solutions paired with Widevine and FairPlay. Made for developers, by developers.

Mux: Developer-friendly video infrastructure for your on-demand & live video needs.

I love Gumlet because of their pricing and support.



