Google Team Refines GPU Powered Neural Machine Translation (nextplatform.com)
93 points by Katydid on March 24, 2017 | 21 comments



TL;DR: it's the code from "Massive Exploration of Neural Machine Translation Architectures", where they ran an extensive hyperparameter search using 250K hours of GPU time.

Direct GitHub link: https://github.com/google/seq2seq

Arxiv: https://arxiv.org/abs/1703.03906
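
To get a feel for how that kind of search blows up, here's a toy Python sketch (not the paper's actual code) enumerating a grid over the kinds of axes the paper varies. The specific values and the per-run cost are made-up assumptions, just to show how the GPU hours add up:

    # Toy sketch, not the paper's actual search code. The axis names follow
    # the paper; the values and the per-run cost are illustrative assumptions.
    from itertools import product

    grid = {
        "embedding_dim": [128, 512, 2048],
        "cell": ["lstm", "gru"],
        "encoder_depth": [1, 2, 4],
        "attention": ["additive", "multiplicative"],
        "beam_width": [1, 5, 10],
    }

    # Cartesian product over all axes: one dict per candidate configuration.
    configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
    print(len(configs), "configurations in this toy grid")  # 108

    # At a hypothetical 1,000 GPU hours per training run, even this small
    # grid burns serious compute; the paper reports 250K GPU hours in total.
    print(f"~{len(configs) * 1000:,} GPU hours at 1K hours/run")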


I'm surprised they used K80s and not something like the 1080*. Surely they don't need the double-precision performance? Am I missing something?


GTX series cards don't fit in 1U/2U/3U rack servers very well due to the design of the coolers. The Tesla cards are available in passively cooled configuration (just a big heatsink) which allows them to be cooled using the existing fans in the chassis.

Facebook and Microsoft have their own 8-way GPU server designs that use a completely different connector for the GPU (SXM) instead of normal PCIe slots, again because of the motherboard/chassis design. (SXM is also what exposes the NVLink bus.)

https://www.nextplatform.com/2017/03/17/open-hardware-pushes...


Companies like Google prefer Nvidia Tesla data-center cards over the GTX 1080. Also, if I remember correctly, the Nvidia K80 has two GPUs on one card, with 12 GB per GPU for a total of 24 GB. These servers have eight Nvidia K80s per system. A lot of compute power.
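
A quick back-of-the-envelope check of those numbers, using the eight-card configuration described above (the per-server card count is as stated in the comment, not a confirmed spec of Google's machines):

    # Quick arithmetic for the comment above. Dual-GPU K80 and 12 GB per GPU
    # are per the comment; 8 cards per server is the setup being described.
    cards_per_server = 8
    gpus_per_card = 2        # the K80 is a dual-GPU card
    mem_per_gpu_gb = 12

    gpus = cards_per_server * gpus_per_card
    print(gpus, "GPUs per server")                       # 16
    print(gpus * mem_per_gpu_gb, "GB GPU memory total")  # 192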


Using a GeForce card in a datacenter or "GPU cluster" setup voids the warranty, which might be a factor for them at scale.


That kind of warranty term is invalid in the EU; maybe Google should move their GPU clusters to the EU instead.


The 'rules' for data centers are kinda different (which is also why the DGX-1 can't be used): cooling is very different, and having ECC is a good thing.

It wouldn't surprise me if Nvidia cuts favorable bulk deals on the K80, as well.


Seems it mishandles the difference between Premiers and Prime Ministers.


A few people have already pointed this out, but it is the human translation which renders him as Premier Trudeau. The GNMT version renders him as Prime Minister Trudeau, and talks about the meeting between the two premiers.

I don't read Chinese, so I can't tell what the original says.


Really, there isn't much of a difference. The term premier is merely an abbreviation of the French premier ministre ('first minister'), and in French they don't distinguish between the roles -- the Premier of Ontario is called le Premier ministre de l'Ontario in French, just like the Prime Minister of Canada is called le Premier ministre du Canada.

In modern Canadian English it's conventional to use 'premier' to refer to provincial heads of government and 'prime minister' to refer to the federal head of government. This hasn't always been the case: until 1972, British Columbia referred to its provincial leader as its Prime Minister and before that other provinces did the same.


It seems the human mishandled this as well, calling Trudeau a premier.


Word sense disambiguation is just one of many factors that go into machine translation. What matters is average human evaluation.


In what language pair, going which way?


In English, Justin Trudeau is not the premier of Canada; he is the Prime Minister. The problem with the translation is that the machine translator would need to know specifically about Prime Ministers as a linguistic category, and know that Justin Trudeau is in that category. The Chinese side of the translation does not encode the difference.


With enough data, the Chinese side can encode <head of state> [Justin Trudeau] -> <head of state/premier> [Justin Trudeau].


In fact, the Chinese text says 加拿大+總理+杜鲁多 which is Canada+<head of state>+Trudeau. That's apparently enough context for the machine translation to pick up on, producing the correct "Prime Minister" title.

The only flaw in the GNMT version I can find is that 此行, "this visit" is left untranslated. (But I'm not a native Chinese speaker, so there might be more.)
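
As a toy illustration of the context sensitivity being described (GNMT learns this statistically rather than from rules, and the entity lists below are hypothetical stand-ins for what a real model learns from data), a hand-written disambiguation rule might look like this in Python:

    # Toy illustration only: nothing to do with GNMT internals, which learn
    # the mapping from data. Entity lists here are hypothetical stand-ins.
    COUNTRIES = {"加拿大"}   # Canada
    PROVINCES = {"安大略"}   # Ontario

    def render_zongli(preceding_entity: str) -> str:
        """Pick an English title for 總理 based on the attached entity."""
        if preceding_entity in COUNTRIES:
            return "Prime Minister"
        if preceding_entity in PROVINCES:
            return "Premier"
        return "premier"  # generic fallback when there is no context

    print(render_zongli("加拿大"))  # Prime Minister, matching GNMT's output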


Why not use DGX-1? Surely Google can afford a bunch of these.


Because the DGX-1 is insanely overpriced. Just because you can afford something doesn't make it a reasonable purchase.


Last I checked (around the time Nvidia launched it), the DGX-1 seemed "perfectly priced" against Intel's systems, as in Nvidia seemed to have priced it in terms of how many Xeon chips you'd need to get equivalent performance.

I guess that's one way to go if you value profits above anything else, but I think Nvidia is making a strategic mistake. If it doesn't have to price them that high, then it shouldn't. Because otherwise it's just leaving a bigger opening for Intel to enter the market in a big way.

I think a better long-term strategy would be to make it as hard as possible for Intel to enter the market, and one way to do that is to have reasonable but yet quite aggressive pricing for its GPUs.

Unlike Nvidia, Intel is going to be pressured to recoup the $31 billion it put into Altera and Mobileye, so even if Intel matches Nvidia's prices, it will struggle to recoup that money, while Nvidia can do just fine continuing to sell its GPUs.


And yet, (from what I've read) DGX-1s are getting snapped up faster than Nvidia can make them. If you have a >6-month wait list and people still want them at that price, then supply and demand suggests they may even be under-priced (of course, Nvidia wants that market saturation).


1. Clone DGX-1

2. Rent it out.

3. Profit!!!

I would pay for cloud DGX-1 servers.



