Java at Alibaba [pdf] (jcp.org)
245 points by marktangotango on Oct 9, 2017 | 38 comments



There was a presentation with more info at the last JVM Language Summit: https://www.youtube.com/watch?v=X4tmr3nhZRg

The scary thing is that they have implemented coroutines (generalized async/await) by copying the stack back and forth instead of putting the stack in the heap like in Stackless Python or in Scheme.


For non-experts in this field, could you please explain why it's scary if it works great for their use cases?


Putting it in the heap would generally be safer, as there are fewer special cases to worry about in the garbage collector. Making a copy of the stack and managing it "manually" probably means a lot of special scanning logic in the GC.


Could you please explain which stack you are talking about here? If I am not wrong, Java language semantics do not have stack-allocated variables, hence my confusion. I sampled the video at random offsets but could not find the location where the coroutine implementation was discussed.


It is the call stack that is being copied. It doesn't have so much to do with stack-allocated variables (which may be created in Java according to the whim of the compiler) as with capturing the current state of a computation in a relatively lightweight manner, so that lightweight threads / coroutines can be saved and restored without the context switch and memory overhead of a full thread. This is basically the old "green threads" implementation used in the early JVMs.

The big idea behind this is the concept of a continuation, and a conversion between programs in "direct style" (the way you normally write them) and "continuation passing style", which makes control flow explicit. You could start reading here if interested: http://matt.might.net/articles/by-example-continuation-passi... Or read about how Scheme compilers work to support first-class continuations, and the tricks you can play with them.
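
As a rough illustration (not from the talk, just a minimal Java sketch): the same addition written in direct style and in continuation-passing style, where the "rest of the computation" becomes an explicit callback instead of living implicitly on the call stack.

    import java.util.function.Consumer;

    public class CpsDemo {
        // Direct style: control flow is implicit in the call stack.
        static int addDirect(int a, int b) {
            return a + b;
        }

        // Continuation-passing style: the "rest of the computation" is an
        // explicit argument, so the computation's state is captured in an
        // object rather than on the call stack.
        static void addCps(int a, int b, Consumer<Integer> k) {
            k.accept(a + b);
        }

        public static void main(String[] args) {
            System.out.println(addDirect(1, 2) * 10);           // direct style
            addCps(1, 2, sum -> System.out.println(sum * 10));  // CPS: continuation made explicit
        }
    }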


That makes sense. I wasn't sure which stack the parent was referring to: the stack of Java activation records, or the underlying C call stack of a JVM implemented in C.


Java has a stack (it uses a stack based bytecode after all) which is used to hold arguments and so forth, and it has local variables whose values live on the stack. What it doesn't have is object allocation on the stack - so if a local variable is a reference type then the object it refers to will be created on the heap.

Note the above only refers to the logical view of the language. A VM and JIT will have an ABI for method calls which may use registers to pass arguments (it may have several for interpreted and JITed code), and it can allocate objects purely on the stack if it can do good enough escape analysis.
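
A minimal sketch of that distinction (illustrative only; whether the heap allocation is actually eliminated depends on the particular VM and its escape analysis):

    public class StackVsHeap {
        static int sum(int n) {
            int total = 0;            // primitive local: lives on the stack (or in a register)
            int[] data = new int[n];  // reference local: the reference is on the stack,
                                      // but the array object itself is allocated on the heap
            for (int i = 0; i < n; i++) {
                data[i] = i;
                total += data[i];
            }
            return total;             // ...unless the JIT's escape analysis proves 'data' never
                                      // escapes this method and eliminates the heap allocation
                                      // (scalar replacement)
        }

        public static void main(String[] args) {
            System.out.println(sum(10)); // 45
        }
    }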


It's also far, far slower performance-wise to put it on the heap. Go also does stack copying for its goroutines/continuations, and it's probably the best implementation strategy we have right now.

There were a number of interesting subprojects in the Da Vinci Machine project, including full TCO (tail-call optimization). I wonder how much further along Java would be if more of those had been merged into mainline.


Either I'm being very dense, or something doesn't quite make sense about these numbers.

The slides say that on the busiest day of the year, they were serving around 175,000 transactions per second (126k of which, or maybe additionally, were payments).

On the next slide it says they have 'millions of JVMs' running an 'insurmountable number' of requests.

Is each JVM really taking 10s+ per transaction? Even assuming those JVMs are split between servers, messaging, caching, database, etc, I don't get why they need millions of them to handle c. 200k transactions per second.
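
Back-of-envelope, assuming "millions" means roughly 2 million JVMs (the slides don't give an exact figure): 2,000,000 JVMs ÷ 200,000 transactions/second ≈ 10 JVM-seconds of capacity per transaction, which is where the "10s+ per transaction" figure comes from.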

Needless to say, I am not DevOps, so feel free to set me right.


Each transaction is likely composed of hundreds of requests to various subsystems, and each of them might have vastly different constraints. I don't think averaging transactions across all the JVMs they are running is a meaningful metric.


There's probably a 10:1 or 100:1 page-view-to-transaction ratio, and a 10:1 or 100:1 API-request-to-page-view ratio.

200k transactions per second is a crazy number - a normal second for Amazon is probably a few thousand orders (4-5k). This is 40-50x higher.


This tweet suggests I'm probably off by quite a bit (overestimating Amazon's orders):

https://twitter.com/joelcomm/status/917229394212675584

That's $3,400/second in transactions for Amazon, and their average price per transaction is certainly more than $1, so the actual orders per second are even lower than that dollar figure suggests.
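
As a hedged illustration: at an assumed average order value of $34 (purely a made-up figure), $3,400/second would work out to only ~100 orders per second, well below the 4-5k/second guessed above.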


Probably for non-transactional stuff I'm guessing. For every transaction there are probably 100 reads from various sources. Not to mention all the other stuff, from invoicing to reporting.


With microservices, anything is possible!


Alibaba does a lot more than sell stuff on Singles' Day.


Impressive stuff. They have basically forked their entire backend stack, starting from the OS and moving upwards. I am particularly interested in how much AliOS deviates from mainline Linux.


> They have basically forked their entire backend stack, starting from the OS and moving upwards.

Isn't this pretty common at the $XXX billion dollar tech companies like Google, Facebook, etc?


Most likely, yes, but that doesn't make it any less impressive


>Alibaba Runs Millions of Custom JVMs

>They have basically forked their entire backend stack, starting from the OS and moving upwards.

I doubt this is a cool thing. Sure, they are the second biggest IT employer in the country and have excess resources for everything, but...

A thing about big dotcoms - architects there try to use off-the-shelf software for everything, even if the software is clearly unsuited for the task and its use will require hacks and massive re-engineering.

Putting effort into using off-the-shelf software without modifications and hacks greatly reduces all aspects of the infrastructure support burden.

BUT attempting to use off-the-shelf software everywhere at all costs will of course give you problems. And the bigger you are, the worse this is. Read the article from a month ago about how Alibaba got stuck using MySQL for mission-critical tasks, and how much effort they put into "unhacking" it.


> "Read the article from a month ago how Alibaba got stuck with using MySQL for mission critical tasks, and how much efforts they put to "unhack" it"

Do you have a link for that?


Can you add a link to the article (Alibaba/MySQL/etc.)?


A video from Alibaba's MySQL specialist: http://www.highload.ru/2015/abstracts/1915.html


I worked for Alipay, a subsidiary of Alibaba, around 2010. At that time, the tech stack was lagging behind compared to most Silicon Valley companies. But that was 7 years ago, so things might have changed since then.


It's certainly evolving.

Alibaba (as well as other large online companies) had a huge leap during the recent Mobile Age, as mobile phones made online services more accessible to the public.

And the government also (implicitly) pushed the mobile carriers (China Unicom, China Mobile, and China Telecom) to fuel this trend.

All this has made companies like Alibaba very rich.

Plus, Alibaba's sites like taobao.com need to handle a tremendous amount of traffic on an average day, and even heavier traffic during promotion events like Nov 11 (Double 11) Day (it's like Black Friday[0]).

They had the motive and resources to improve their system, and they have already opened some of those improvements on GitHub[1].

[0] https://en.wikipedia.org/wiki/Black_Friday_%28shopping%29 [1] https://github.com/alibaba


> I am particularly interested in how much AliOS deviates from mainline Linux.

What could be the benefits of not using the mainline Linux for a stack like that?


If I were to guess, the benefits would result from making very "tight" optimizations to e.g. networking or scheduling that are specific to the workloads they deal with regularly.

Remember, Linux is designed to run on as many machine configurations and support as many workloads as possible. To achieve this, kernel devs sometimes cannot adopt the most optimized implementation because it may lead to unforeseen corner cases.

On the other hand, a company like Alibaba knows exactly what kind of machines and workloads they are dealing with, which means that they can make those optimizations and gain a slight performance boost relative to mainline Linux.


Alibaba is one of the heaviest users of the JVM. They have done some amazing engineering work on the JVM.


Cool presentation! I'm not sure why the submitter editorialized this title, instead of the straightforward "Java at Alibaba" title from the presentation.


Java is well suited to very large projects only because developers are a "commodity." Even with the modern language updates, the core Java collections library is really broken for modern multi-paradigm programming. There are no true immutable collections in the core library, which makes Java really show its age with the newer language features compared to other, more modern languages such as Scala. Functions are bolted on as "single abstract method" (SAM) interfaces, which makes them essentially syntactic sugar instead of a first-class language feature. Java 9 makes some improvements, but I wouldn't consider using Java for any new projects unless I was at a big enterprise that can only hire really cheap developers and needs to hire a whole lot of them really fast.
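
For illustration, a small sketch of the two points being made (assuming Java 9+ for List.of; not from the slides):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.function.Function;

    public class CoreLibDemo {
        public static void main(String[] args) {
            // "Immutable" collections in java.util are unmodifiable views:
            // mutation is rejected at runtime, not expressed in the type system.
            List<String> backing = new ArrayList<>(List.of("a", "b"));
            List<String> view = Collections.unmodifiableList(backing);
            backing.add("c");
            System.out.println(view);            // [a, b, c] -- the "immutable" view changed

            // A lambda is just an instance of a single-abstract-method interface,
            // not a value of a distinct first-class function type.
            Function<Integer, Integer> square = x -> x * x;
            System.out.println(square.apply(5)); // 25
        }
    }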


I fail to see the connection between cheap developers and Java not being a truly functional language (which it never claimed or tried to be).


What kind of applications are you building that really require immutable collections and functional programming? In my opinion, Java is best suited for running web servers (and thus microservices), where the work doesn't really require any functional programming at all.


Web server performance benefits from concurrent programming models. Concurrent programming is certainly possible in Java, but Java/the JVM is not particularly well suited for it compared to other languages/VMs, e.g. Erlang.


The concurrency model you use is heavily dependent on the scale of your application. Java is just fine for the typical web server stack.


Inferences: Java developers are dumb commodities; Scala/Haskell devs are smart and get things done at 100x speed. Conclusion: ignore this comment.


Commodity doesn't mean dumb. There is a general set of skills that most Java developers have, and there are a lot of people who have them, so you can replace a developer with another one fairly easily. That's much harder to do with Scala, and much harder still with Elixir or Haskell.


In a "not big enterprise" environment, what would you choose instead?


I'm a Scala engineer. I also work with Elixir a bit, but Scala is preferred.


Does anyone know if their zprofiler is in the public domain?



