Justin.tv's Live Video Broadcasting Architecture (2010) (highscalability.com)
250 points by kd5bjo on July 3, 2019 | 119 comments



I worked at Twitch from 2014 to 2018. I was never on the video team, but here are some details that have changed.

Video:

- Everything is migrated from RTMP to HLS; RTMP couldn't scale across several consumer platforms

- This added massive delay (~30+ s) early on; the video team has been crushing it, getting this back down to the current sub-2 s

- Flash is dead (now HTML5)

- F5 storms ("flash crowds") are still a concern teams design around; e.g. 900k people hitting F5 after a stream blips offline due to the venue's connection

- afaik Usher is still alive and well, in much better health today

- Most teams are on AWS now; video was the holdout for a while because they needed specialized GPUs. EDIT: "This isn't quite right; it has more to do with the tight coupling of the video system with the network (eg, all the peering stuff described in the article)" -spenczar5

- Realtime transcoding is a really interesting architecture nowadays (I am not qualified to explain it)

Web:

- No more Ruby on Rails because no good way was found to scale it organizationally; almost everything is now Go microservices back + React front

- No more Twice

- Data layer was split up to per-team; some use PostgreSQL, some DynamoDB, etc.

- Of course many more than 2 software teams now :P

- Chat went through a major scaling overhaul during/after Twitch Plays Pokemon. John Rizzo has a great talk about it here: https://www.twitch.tv/videos/92636123?t=03h13m46s

Twitch was a great place to spend 5 years at. Would do again.


Hi glacials :) Small correction from someone at Twitch today:

> video was the holdout for a while because they needed specialized GPUs

This isn't quite right; it has more to do with the tight coupling of the video system with the network (eg, all the peering stuff described in the article).


Oh, edited! Thanks Spencer :)


F5 storm is an awesome name for the venue blip -> refresh reaction. I've certainly contributed my fair share to your storms. It's basically automatic.


Yeah, and the problem is that it often does work to fix issues. It's the web equivalent of "have you tried turning it off and turning it back on again?"...


Maybe something as simple as a textual overlay "Don't worry - the stream will be back soon" and a script to internally hammer the only element of the page that's actually needed for a video refresh


Text overlays would still be clunky; what's really needed is dynamic video spliced directly into the stream, so the viewer understands it's the broadcaster's connection that is poor, not the viewer's.

This could be achieved if twitch allowed broadcasters to upload a 2 second loop as a 'placeholder' for connection drops, and twitch could mix that in if it detects too many frame-drops.

It would need to be a setting though because some streamers just stream over a bad connection (e.g. Hitchhiking streams) and wouldn't want that interruption.


I guess this is actually being tested right now. I saw something like this last week, though I've also seen streams just drop.


Or, the live video equivalent of "retry" :)


I initially thought 'F5 storm' refers to people typing "F" in the chat.


On my first read-through I thought 'F5 Storm' referred to F5 load balancers, not hitting the F5 key to refresh the page.


> - No more Ruby on Rails because no good way was found to scale it organizationally; almost everything is now Go microservices back + React front

Ugh, I just... I keep trying to pretend I don't need to learn Go, but every highly scalable system that's recently been written about seems to be using it. Maybe I just need to stay away from systems that need to scale? Heh...


The keyword here is organizationally.

Technically speaking, you can build scalable systems using anything you want. But if you need to hire a couple hundred developers, you're better off going with Java 7 or Go than Ruby, Lisp, or Perl. The dumber and more uniform, the better.


The key differentiator there is static typing, which is how Facebook manages to make do with PHP, and Dropbox with Python: by adding type annotations for a static type-checker...


Static typing doesn't help you much, when you have 20 teams hating each other and 20 different ways to write statically typed C++ code.


It would be interesting to hear a comparison from someone using https://crystal-lang.org/ at scale. It's basically Ruby + types, which would make it the closest for isolating that one feature. It can't run Rails, but there are very similar web frameworks available.


It seems Stripe solved this problem with https://sorbet.org/ which is actually Ruby + types.


It misses other benefits of static typing, like being able to compile to efficient code. Also, Sorbet is still not popular enough to apply to many gems - it's a lot of work to implement at the moment.


It's not Ruby + types; it just looks like it at first glance. The semantics are all different (for good reasons).

I very much doubt that anyone is using Crystal at the scale Ruby/Rails is used yet. It's still far away from a 1.0 release and when I tried out 0.23 it had several issues, major ones for me were lacklustre debugging support and the type checker needing to load the whole AST into memory. Concurrency story was also not very strong but I know people were working on it actively at least.

Anyway, I wrote a non-trivial project in Crystal and overall really enjoyed it. Improving the debugging tooling would go a long way towards making it a real option.


I've heard Crystal's Ruby resemblance fades once you get past the beginner stage.


Personally, I think it’s hugely worth learning. Aside from some de facto behaviors it eschews, Go is very easy to pick up and learn the entirety of in a week or two, because the language itself is really not that large. So I’d argue the time investment is a good one for what you get.

Still, you definitely do not need Go to scale systems. People scale everything, perhaps most impressively PHP applications.


Go isn't the only language that scales, it just happens to be popular amongst the scripting language crowd as a next step. You're by no means limited in your choice. You could do Java, C#, Rust...


The good news is that it takes like an hour to learn enough go to be productive.


Before golang was a thing, there were highly scalable systems that handled way more traffic than anything written in golang today. Those systems were (and are) written in languages like C++ and Java and C#.

You're just seeing golang in articles because of hype.


Java and C# lag behind Golang on most performance metrics. Combine it with the awesome deployment story (single binary) and you'd be hard pressed to choose the former


I'd like to see those performance metrics. Other than that, this is not true in the slightest, not just from what I've observed, but from established performance people like Martin Thompson[1]. If you watch that talk, he mentions towards the end that they ported Aeron (originally Java) to C#, golang, and C++. The Java version was the fastest out of the box, but with some work, they were able to get the C# version to be faster. I suspect this mainly has to do with value types, which are being developed for the JVM as well.

What you're probably referring to is GC pauses. The golang GC is tuned for latency, at the expense of throughput. The JVM has several GCs, and is gaining several more like Shenandoah and ZGC, which allows you to select the GC that best fits your use case. You can tune for latency or throughput.
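
For concreteness, the main knob on the Go side is GOGC (or runtime/debug.SetGCPercent), which only trades memory for less frequent collections; there's no alternative collector to select. A minimal sketch:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Default is GOGC=100: a collection runs roughly when the heap has
        // grown 100% since the last GC. Raising it means fewer collections
        // (more throughput, more memory); unlike the JVM, there is no
        // separate throughput- or latency-oriented collector to swap in.
        previous := debug.SetGCPercent(400)
        fmt.Println("previous GOGC:", previous)
        // ... run the allocation-heavy workload here ...
    }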

A lot of Java deployments these days are in the form of uber/shaded jars, which is basically one jar file that contains the entire app, and run with a single command, not much different than running a binary.

[1] https://www.youtube.com/watch?v=Pz-4co8IaI8


The opposite is true. Golang is on average 2-3 times slower than Java. On the plus side, it uses less memory.


No, of course not. It may lose at some benchmarks made by Java or Python/Ruby coders.

Two areas where it is actually slower are:

1) Memory allocation

2) Regular expression performance

But you must understand that your Java app won’t have a performance advantage because of faster alloc speed IRL: GC will take lots of CPU, because Go’s is much cheaper at memory release. You can only see allocation advantage in micro benchmarks where the app stops before the GC will start.


> You can only see allocation advantage in micro benchmarks where the app stops before the GC will start.

The golang gc is tuned for latency at the expense of throughput, meaning if you look at the duration of time spent in GC over the course of the code execution, it would actually be longer compared to a GC tuned for throughput.

If you have a use case that requires high throughput, then you cannot change the GC behavior. Unlike on the JVM, where you have several GCs to choose from. The JVM is also getting two new low latency GCs for use cases that require low latency.

And it's not just microbenchmarks where Java does better than golang, it's especially longer running processes where the JVM's runtime optimizations really kick in. Not to mention that the JVM is getting value types as well to ease the load on the GC when required (it does an excellent job as it is even without value types).

I did a dummy port of the C# version of the Havlak code here[1] to Java, preserving the same behavior and not making any data structure changes. On the machine I tested on, the C# version took over 70 seconds to run, while the Java version took ~17 seconds. In comparison, the C++ version took ~24 seconds, and the golang version took ~30 seconds.

Yes, you could most likely spend much more time tuning the C++ version, avoiding allocations, and so on, but at the expense of readability. This is what the JVM gives you, you write straight-forward, readable code, and it does a lot of optimizations for you.

The brainfuck2 benchmark is especially interesting. Kotlin tops the list, but I was able to get Java to the same performance as Kotlin by writing it in a similar manner to the Kotlin code. Again, Java/Kotlin beat out even C++ when I tested them, and by quite a margin.

[1] https://github.com/kostya/benchmarks


How much CPU GC takes for any given GC implementation is largely down to the design of the application, its data structures and allocation graph.

Request / response servers which keep caches and other allocations prone to middle age death out of the GC heap are consistent with the generational hypothesis and ought to spend no more than a few (low single digit) percent in GC with a generational collector.



These microbenchmarks don't say anything about real-world use cases... but anyway, here are the latest results:

https://www.techempower.com/benchmarks/#section=data-r17&hw=...

c#, rust, go, c++, java, c, nim.. all tied at 7M.

again, this doesn't mean anything useful.


This benchmark isn't really useful as you pointed out. Microbenchmarks are always tricky, but check out the other two posts I just wrote here (about Martin Thompson and the benchmarks on GitHub) for hopefully more realistic benchmarks.


It's not about hype but about companies using more recent languages to solve the same problem. Why didn't Twitch pick Crystal, Rust, Scala, or JRuby?


Because of the use case. Go wins if all you need is the easiest way to write services with high concurrency requirements. I expect this is true for Twitch's systems.

Crystal is still immature, Rust is more suited to use cases where you want to avoid garbage collection.

Hype is not the only factor but it makes hiring easier. And anything Google puts its weight behind will get hyped. More often than not it's better to choose a technology which suits your organisational (read: hiring) needs.


I'd argue that Java or C# would have worked out just fine for Twitch. There was a recent post on Twitch's early architecture, and it seems they started out with Ruby. Unsurprisingly, they had to switch from it once they needed performance (similar story happened with Twitter).


Well deserved hype


> Realtime transcoding

I'm curious: Do services like Twitch specify a specific desired codec/bitrate that doesn't get transcoded? Transcoding seems like a lot of effort for lower quality end result.

If I were streaming, I would want to avoid transcoding as much as possible. Since we're talking about live broadcasting, there is a unique ability for the streamer to choose the format they upload.


In the RTMP days, the highest quality setting in the viewer was always a straight pass-through from the broadcaster, and the reduced versions were transcoded in the data center to fit down lower-bandwidth last-mile pipes.


Same in the HLS days. What used to be called “source” was just a remux from RTMP to TS.


Gotcha. For whatever reason, I had forgotten about lower-bandwidth copies.


> everything is now Go microservices back

Excuse the simple question: When I hear "microservices", I think serverless backend. Is that right, or are they different? If they're the same, how do you stream video with serverless? (Seems like streaming, websockets, etc... shouldn't be possible in a serverless environment...)


"Microservice" describes the size and scope of each deployment artifact. It answers the question "is the whole system just one big ball, is it broken up, how broken up is it?" It doesn't describe how it is deployed.

"Serverless" describes how a deployment artifact is deployed and runs. Generally it refers to a class of technologies in multiple domains whereby intricate knowledge of the underlying host is abstracted behind a cleaner API, with things like scaling, security, patching, etc handled by an infrastructure provider. While the term rose in prominence alongside "functions as a service", which is certainly a technology that generally qualifies as serverless, there are many serverless products out there: AWS Fargate for running containers, DynamoDB for a database, S3 for object storage, all of these are "serverless". A good signal is: if I can SSH into it, its not serverless.

A microservice can certainly be deployed serverless (ECS/Fargate or Google Cloud Run come to mind). A microservice can even refer to one or more logically related functions-as-a-service; the term speaks more to how the engineering teams organize their business domain into the code and how the APIs speak to each other, rather than the exact underlying technologies.
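
To make the distinction concrete, here's a minimal Go sketch of one microservice (the "friends" name and API are hypothetical); whether it's deployed to a VM, a container platform, or wrapped for Lambda is a separate decision:

    package main

    import (
        "encoding/json"
        "net/http"
    )

    // A hypothetical "friends" microservice: one small, independently deployed
    // artifact with its own internal API and its own datastore.
    type friend struct {
        Name   string `json:"name"`
        Online bool   `json:"online"`
    }

    func listFriends(w http.ResponseWriter, r *http.Request) {
        // A real service would query this team's own database here.
        json.NewEncoder(w).Encode([]friend{{Name: "example_user", Online: true}})
    }

    func main() {
        http.HandleFunc("/v1/friends", listFriends)
        // Run as a plain HTTP server like this, it is not serverless; the same
        // handler logic could instead sit behind API Gateway + Lambda and be.
        http.ListenAndServe(":8080", nil)
    }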


I finally understand the difference. Thank you.


Great explanation!


Microservices are about splitting code into different servers instead of a monolithic codebase. You end up with different servers (probably virtualized) for each domain of the application.

Like, instead of having the video decoding and the analytics code in the same monolith attached to same DB, you deploy a different server for each one, generally with a new DB for each. When the services need to talk to each other, they do it via network (REST, gRPC, etc.).


They're different. Microservices are still stateful applications that run 24/7. They are just really small in scope.

e.g. the Friends feature on Twitch is one microservice, running in its own autoscaling group, with internal APIs used by other microservices like Whispers.


My team follows microservice patterns, and have deployed services that utilise websockets over both serverless (Azure Functions and Lambda) as well as regular hosted services (on k8s, EC2 and Azure App service etc). Nothing stopping you there. On the streaming video side we did an app that used Azure Media services + Azure functions.. works well enough.

Not necessarily a good idea, but one 'feature' of microservices is the ability to pick different stacks, languages and delivery methods on an individual service level.


I work at twitch. Let me put it this way. My team that I am on (VOD) has ~8 backend engineers and we are in charge of something like ~2 dozen services.

We literally have services that are run entirely using AWS Lambda functions only.
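
Purely as illustration, a whole "service" in that style can be a single Go function behind Lambda (the event shape and logic here are made up):

    package main

    import (
        "context"

        "github.com/aws/aws-lambda-go/lambda"
    )

    type thumbnailRequest struct {
        VideoID string `json:"video_id"`
    }

    // handle is the entire deployment artifact: no server to manage, it just
    // runs when an event arrives.
    func handle(ctx context.Context, req thumbnailRequest) (string, error) {
        // e.g. enqueue a thumbnail job for the VOD; details omitted
        return "queued " + req.VideoID, nil
    }

    func main() {
        lambda.Start(handle)
    }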

This is a pretty big difference from teams I've worked on in the past, that have 8 engineers all working on a singular service.

"Microservices" is more of a philosophy than anything.


“+ React front” to say the least! Hope you are well, glacials.


Hehe, great job on that Chris :)


Didn't Twitch use elemental machines for transcoding?


No. Elemental is more of a high end encoding system for quality. Twitch is more about bulk cheap transcodes of good quality. Think about it. MLB has maybe 18 concurrent events. Twitch is running minimum in the 10k range.


No we never had Elementals. In the early days there was no way we could afford them. In the later days I don't think we would want them as we needed to scale so many transcode jobs that it was easier to have a large farm of dumb machines to organise jobs across.

There may have been an Elemental machine at one point that was used for testing/playing, but I really don't think so, and I know there wasn't one between 2010 and 2017.


Transcoding was a relatively late addition to the whole system— for a long time we only passed through the original video bits unchanged and tried to advise broadcasters about picking compromise settings.

By the time we decided transcoding was necessary, we had enough in-house video engineering knowledge to build our own system integrated with everything else.


We had transcoding as early as 2011, as that is when I made my first commits to the video jobs codebase, specifically to the transcoding jobs. It was quite late before we had the resources ($$$) to provide widespread availability of transcodes to the community.


Yeah; my perspective on “late” is probably pretty skewed, since I left as the rebranding was still being developed. I think the favorite new name when I left was something like Xarth; Twitch was a much better choice.


miss you!


Miss you too buddy! Hit me up when you're in Seattle next.


of course :) will be easier when I've relocated to sf


"F5 storms" are easy to handle. Intercept all keypress combinations for refresh and do what you want with it client side. (spread it out over time, use a high-performance endpoint to check if live or a combination)

Most people doesn't use the refresh button in the browser, so only a small amount of traffic will be uncontrolled.
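
For the "high-performance endpoint" part, a sketch of the idea in Go (paths and details made up): liveness is held in memory and the response is CDN-cacheable, so a refresh storm never touches anything expensive:

    package main

    import (
        "fmt"
        "net/http"
        "sync/atomic"
    )

    // Flipped by the ingest side when the broadcaster drops or reconnects;
    // answering the question never touches a database or the video path.
    var live atomic.Bool

    func liveHandler(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Cache-Control", "max-age=2") // let the CDN absorb most of the storm
        fmt.Fprintf(w, `{"live":%t}`, live.Load())
    }

    func main() {
        http.HandleFunc("/v1/channel/live", liveHandler)
        http.ListenAndServe(":8080", nil)
    }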


Do you have any data to support that? I personally don't have an F5 key on my keyboard (it requires pressing a modifier), so I pretty much always click the reload button to fix a stream blip. The impression I get from reading Twitch chat is that most people are using mobile. I doubt they have a keyboard plugged in and press F5 to refresh.

That said, you certainly don't need your video streaming servers to handle those hundred-thousand refresh requests.


Yeah anecdotally every non-tech person I know clicks the refresh button.

It also doesn't catch people watching on mobile web who refresh. And I don't believe their mobile app has a refresh button, whenever a stream glitches out for me I force kill and reload the mobile app.


You have any data on most people not using the refresh button in the browser?


I can barely follow along with this, it's very technical. I can't imagine how Kyle Vogt acquired the necessary knowledge to make this work. Example:

> The point of having multiple datacenters is not for redundancy, it's to be as close as possible to all the major peering exchanges. They picked the best locations in the country so they would have access to the largest number of peers.

This is the kind of thing where I would have to hire some kind of network engineering expert, and he just figured this stuff out and made it work? I can't fathom other people's intelligence sometimes.


He leveraged the YCombinator network to absorb a lot of information quickly. For example, I taught him basic networking (routers, switches, multicast/anycast, AS numbers, etc.). I shared my 10 years of knowledge with him in a single two-hour session because he's a genius, and then he ran from there, vastly exceeding my knowledge. I was there because Emmett asked Steve and Steve asked me to go over, and I was happy to help. I'm sure I wasn't the only one.


Like the other sibling comment, I would be very much interested in a talk like this: teaching networking with real-world examples and explaining 2-3 large-scale architectures w.r.t. networking. Maybe a long video (or a small series) and a follow-up on Twitch for Q&A. Would even pay for this.


That's pretty nice. A quick look at articles shows Emmett was the CEO. Which Steve is this? Huffman at Reddit?


Yes Steve Huffman


This is the talk I want to hear sometime.


> I can’t imagine how Kyle Vogt acquired the necessary knowledge to make this work.

I've worked on projects with Kyle, and he often goes into bulldozer mode. It is no surprise to me that Kyle could "learn" all he needed in order to get something like this set up (or at least learn enough to orchestrate a small group in constructing it). Kyle is, by all means, a "force of nature" as YC tends to define it.

The downside to Kyle's optimism is that he often has very little concern for the humanity of others. He can set up decent optics around his actions and decisions in the wake of what many might consider failures, but he has consistently abused those who try to give him good-faith constructive feedback and often brought co-workers to tears. This is all well-documented at least through the past 4-5 years. (Kyle does actually explicitly ask for "direct" feedback btw. He's just only capable of handling the feedback on a periodic, weekly or monthly basis).

A key lesson of this article (and of glacials' post above) is what can be achieved very quickly if technical debt is of minor concern. Kyle's key strength is in building a proof of concept that supports rapid iteration. This appears to be something the Justin / Twitch teams did very well.

A second lesson is in getting alignment among diverse engineers. Think about how the team might have debated the architecture presented. Think about how some of the choices might rub people the wrong way.

Finally, Kyle is a unique character in several ways but is not alone in possessing a transient "bulldozer" mentality. If you see yourself having the same pattern of behavior, get help before others get hurt. There are a variety of mitigations that can help, but they need explicit participation.


> I can’t imagine how Kyle Vogt acquired the necessary knowledge to make this work.

By this point in history, it wasn’t just him anymore and we’d done a few rounds of improvements already out of necessity. As I recall, he got us up and running at PAIX based mostly on research, but most of the other data centers were built out by a network engineer(1) we hired away from YouTube.

While he was working on the network engineering and keeping the original system afloat, I did a lot of the software work for the system described here.

(1) Name withheld out of courtesy


True. Kd5bjo is an incredible engineer.


I really should thank you for the opportunity— not many people would let a new college hire spend months on a green-field rewrite of something so core to the company.


CC? Dude's on webcasts with GigaOM. He was in customer sales roles with a VAR. He's not shy.


Don't be so hard on yourself; it's pretty common to read blog posts like this and come away with the idea that a super smart person took one look at the lay of the land and leapt directly from problem -> solution in one neat step. What you don't see is the people they talked to about the problem, their back-and-forth spitballing ideas, the various googling to see if there's a standard approach ... and most importantly you don't see any failed attempts.

Bear in mind that I don't think this is some deliberate attempt to appear superhuman; I think it's just accidental.


I'm sure if you tried building your own livestreaming or VOD service, you'd come up with similar solutions and insights. Peering problems are fairly obvious - put up a gigabit server in Germany and try livestreaming high bitrate video to a highspeed connection in San Francisco (or vice-versa), and watch as you run into problems despite having more than enough theoretical bandwidth.

When your users start to complain, you tend to develop the domain knowledge necessary to solve their problems pretty quickly.


Justin Kan goes a bit into Kyle's accelerated learning - https://www.youtube.com/watch?v=YzyatiQrQlQ


Pretty exceptional indeed. Also impressive that he was able to grow from founder-stage tech to that scale, since they're largely different problems.

Especially back in 2010. I feel like I'd have a much better shot of being able to figure out that scale these days than a decade ago. (If I spent my free time studying and not watching Age of Empires 2 on Justin.tv/Twitch).


If that was actually the case, why the f did they have a boatload of gear in 200 Paul? There's almost no peering exchange there whatsoever (until SFMIX about 3 years ago). Can think of a lot better connected places in the Bay.


It’s one of the reasons we moved out of there. Moving day was an ... interesting experience: lots of planning to minimize downtime, and everything that was actually planned went relatively well. Unfortunately, what we thought was a 90% plan turned out to be more like 50%. Several people pulled all-nighters on that one.

At the time PAIX had a reverse-billing setup: the more data you transferred, the cheaper your connection charge was; we managed to get all the way into the cheapest billing tier within the first billing cycle which was basically unheard-of at the time.


You didn't move out of 200 Paul until long after you were acquired though. I always wondered why you were there, as there are better connected facilities and those that had cheaper power. Was TelX a hedge or something?


I was long gone by the time the acquisition came along; looks like I was remembering the move into 200 Paul from somewhere downtown (Spear, maybe). The primary motivation was running out of space, but it also coincided with our first forays into peering. I never went deep into that side of things, so I can’t say for certain what the particular arrangements were.


Building this was really fun, and I’m very proud of what kd5bjo, Emmett, and many others did to help turn Justin.tv/Twitch into what it is today. We found a way to make what was fundamentally an unprofitable business (if you relied on CDNs) work by relentlessly focusing on reducing cost to the absolute bare minimum through good technology choices and innovating when necessary. Justin.tv would have died otherwise.


Thanks for sharing the information! Even if it's 9 years old now, it's very interesting for us dreamers.



I was the primary architect for Usher and the server-side of the video system described here. I’m happy to answer any questions, assuming I still remember the answers.


Why did you call it Usher and not T-pain


I don’t get your reference, but we chose the name Usher because it’s the software equivalent of the person at the theater who looks at your ticket and shows you where your seat is— it doesn’t actually handle any of the video data, it just knows where you should go to find it.


Surprised Microstrategy didn’t sue you guys (or vice versa). They built some access / authorization tool named Usher which seemed to be a total flop (why a BI company was even toying with this perplexes me)


It was/is an internal code name that never shows up except in URLs requested by the player and highly technical articles like this one; I’m not a lawyer, but I expect they might have a hard time showing consumer confusion.


Slightly off topic but this is my favorite justin.tv video, and also shows something of the website and how early in the days of live streaming we were back then: https://www.youtube.com/watch?v=BqgEm8XWXu8


Thanks for that! I hadn't seen it before, that's pretty funny.


"Live video can't be made by pushing video faster, it takes a completely differently architecture."

Funnily enough, that's pretty much how HLS, the modern live video standard, works - it's essentially a series of tiny video clips loaded & played one right after the other, distributed through the same CDN as normal video files.

Thanks to HLS, live video is actually much worse than it was 10 years ago with RTMP in terms of latency. There have been some recent efforts at getting it down, although they're generally not standardised, hard to scale (e.g. WebRTC), and/or a bit awkward.


They're starting to become standardized. Apple announced LHLS, a "low latency extension" to the HLS standard, https://developer.apple.com/documentation/http_live_streamin....

I don't think anyone really follows Apple's spec for various technical reasons, though. Most do some sort of chunked-transfer encoding, along with pre-signaling segments in playlists, as outlined by the Periscope folks here: https://medium.com/@periscopecode/introducing-lhls-media-str...


> I don't think anyone really follows Apple's spec for various technical reasons, though.

I would have assumed it's because Apple only announced this 4 weeks ago, and the only clients that support it are beta software.


None follow Apple's spec because it's a month old, not for technical reasons. Like it or not (TBH I'm on the fence about it), it WILL be the standard.


kind of? i think this is more about live replication infrastructure than the video carrier itself. with youtube, you can set up some CDNs, but with livestreaming you need to be continually ingesting and spitting out content at the same time to lots of places in the world at once


The video carrier can have a lot of effect on the replication properties, though. HLS is essentially a playlist of video URLs that the client fetches and stitches together, refreshing the playlist to get the names of new chunks. Without an extremely specialized web server, each chunk needs to be complete and published before being added to the playlist, which puts a lower bound of roughly one chunk duration on the overall latency.

RTMP, on the other hand, maintains a live socket between the server and client, and the server can forward each packet as it becomes available.
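
A naive client loop makes that latency floor concrete (not real player code; the URL and timing are made up). Every segment has to be fully encoded and published before it ever appears in the playlist the client is polling:

    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
        "time"
    )

    // fetchPlaylist returns the segment URLs listed in an HLS media playlist.
    func fetchPlaylist(url string) ([]string, error) {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        var segments []string
        sc := bufio.NewScanner(resp.Body)
        for sc.Scan() {
            line := strings.TrimSpace(sc.Text())
            if line == "" || strings.HasPrefix(line, "#") {
                continue // tags and comments; a real player parses these too
            }
            segments = append(segments, line)
        }
        return segments, sc.Err()
    }

    func main() {
        const playlist = "https://example.com/live/channel/index.m3u8" // hypothetical
        seen := map[string]bool{}
        for {
            segs, err := fetchPlaylist(playlist)
            if err != nil {
                fmt.Println("playlist refresh failed:", err)
            }
            for _, s := range segs {
                if !seen[s] {
                    seen[s] = true
                    fmt.Println("would fetch and play:", s)
                }
            }
            // A segment only shows up here once it is complete, so latency can
            // never drop below roughly one segment duration.
            time.Sleep(2 * time.Second)
        }
    }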


> They also don't have chats with the 100,000 people watching a channel. What they do is assign people into rooms of 200 people each so you can have a meaningful interaction in a smaller group. This also helps with scaling. I thought this was a pretty clever strategy.

Just curious if this is still a thing. I've watched an unhealthy amount of Twitch (not all with chat open) and never noticed this.


I'm sure it's not. Often, the streamer is reading the chat onscreen while streaming, and it's always identical to the one I'm seeing, except perhaps delayed by some seconds.


Maybe twitch creates an illusion by showing the streamer only a subset of people that is shared among all other groups. This allows everyone to see what the streamer sees and to communicate within their group.


This is definitely untrue for present-day Twitch. When you're in a chat room, you're in the chat room with everyone.


Really cool reading this when I just started 2 weeks ago and I was perusing the Video Architecture docs. It's cool to see how far Twitch has come :) thanks for sharing


If I started a Twitch (or Discord) competitor today, I would use Janus as a base for web-rtc to share the load https://janus.conf.meetecho.com/

In fact, Evasyst is a blend of Twitch and Discord and uses Janus for this reason. https://evasyst.com/


You would look at how much that costs and make a different decision. It can’t scale economically to Twitch size without massive amounts of investment into POPs (source: I built the HLS transcoding stack/infrastructure and HTML5 player at Twitch). A voice only service, sure because it’s 100 times less bandwidth.


Granted, we made the same evaluation about Wowza a decade ago, but it turned out to be useful as a small piece of a large system. Given how much Twitch has grown since then, no off-the-shelf system would be able to handle the scale.

A brand-new competitor, though, won’t have to deal with those scaling problems for a while. They have to figure out product-market fit first, which probably can be done off-the-shelf these days. And then they’ll need to figure out how to keep the lights on, which will take a similar amount of engineering work as it did for Justin/Twitch.


Twilio is another option to get started since they can handle a lot of the heavy lifting: https://www.twilio.com/video


Why pay Twilio for web-rtc though? Aren't there enough open source web-rtc wrappers like Janus?


Wowza is a great product. If you want to build a huge custom VOD + live broadcasting product, it is a great way to do it.


How would an IC SWE find a rocket ship startup like this in 2019?


Check out https://breakoutlist.com/, it's designed for answering just that question - with a bias towards Silicon Valley hype.

If you're interested in Atrium, reach out to me (see profile). We're hiring!


Awesome resource. Time to send some resumes over the weekend.

Who's with me?


I miss the Justin.Tv days. Made a lot of friends there.


Some parts of Twitch are still kind of like that. The Classic WoW community especially has been amazing. Met a lot of people there that I play games & just hang out in Discord with pretty much every day.


Twitch spun out of Justin.tv and then they killed Justin.tv :(


Hey friend


Can anyone recommend some stacks for hosting video streaming?

For 10k concurrent stream receivers in the US and Europe, what would you be looking at to get the stream under 30 seconds lag time?

I realize this is a hard problem but would like to explore it.


I used to think high scalability was something to aspire to. Now I think it's all bullshit.

The web is a terrible architecture for video. Torrents solved this problem. If you're trying to make a camel walk on two legs, you don't deserve praise but a tomato.



