The end-to-end refresh of our server hardware fleet (facebook.com)
206 points by sloanesturz on March 10, 2017 | 145 comments



I really dislike this sort of naming scheme (Bryce Canyon, Honey Badger, Mono Lake, etc)

The names tell you nothing. You can't tell which one came before which, or even what they are. You just have to KNOW that information. A good naming scheme tells you information about the thing named.


Historically, code names were chosen specifically to give up no information and it seems like that tradition continues, perhaps unintentionally.


I don't see the problem. I work with naming schemes like XYZ1530, and to know which server is which or what is installed on it we have documentation; the name is only a reference for finding information in the docs.

So if you work with a server daily you know by heart what's on it; if not, any "descriptive" name would only mislead you, because things have probably changed a lot since it was named.

I think the same goes for hardware components: you have to look them up in the documentation anyway, because some dimension could have changed after a year.


But names like XYZ1530 are harder to recall, and these are names for a class of server.


I work here (FB) and I completely agree. I can never keep the names straight.


100℅ agreed, drives me crazy with Ubuntu, I have to google the names to get the versions.

Never understood what was wrong with 16.04.2 vs Xenial Xerus (had to go google the xerus part just now).


The Ubuntu names are chronologically in alphabetical order.


That doesn't really make it any better. Eclipse does the same thing (well they have since they ran out of Galilean moons), and I always end up needing to consult a chart to parse its version scheme: https://en.wikipedia.org/wiki/Eclipse_(software)#Releases

We already have a well developed and widely understood system for naming items in a sequence, it's called "numbers." I suggest people stop trying to be clever and use it.


Is writing ℅ instead of % some kind of meme? I'm seeing it everywhere all of a sudden, but I can't imagine you can type such an obscure character by accident and so frequently. Wondering if I'm not getting the joke, or if it's some kind of obtuse political or technical statement about something.


At least some Android keyboards (Gboard) put % and ℅ near each other on the same symbol page, and it's easy to mistake one for the other if you don't look closely.


They are on opposite sides of the keyboard, but on the same symbol page. But yeah, if you aren't looking closely, it's very easy to hit the wrong one. Surprised this hasn't been noticed by the Gboard team yet.


I can't understand why the c/o sign would be considered important enough to go on a phone keyboard, let alone on the same page as percent. Do people often write c/o? I might write it on a physical letter once or twice a year. Is it a cultural thing somewhere?


And then again, if people need to write c/o they could actually write... c/o.


Exactly that, I posted from my phone.


On my keyboard at least, if I long press % it uses the other one, and it's hard to notice when proofreading.


The idea is to avoid people assigning bias, i.e. if something is said to be Mk 1 and Mk 2, people are likely to desire the Mk 2 despite having no practical basis for that.


The whole point of refreshing the hardware fleet from Mk 1 to Mk 2 is that it has a practical basis that they will benefit from.

> Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory size increase from 12 GB to 16 GB. In tests with popular image classification models like ResNet-50, we were able to reach almost 100 percent improvement in throughput compared with Big Sur

Mk 2 is better than the Mk 1 in several important ways. They're not creating Mk 2 for no reason!


One of those reasons could be cost or speed of production. The Mk 1 could be better than the Mk 2, but the Mk 2 is easier to make.


Is there some information in Intel codenames I'm perhaps unaware of?


I haven't been able to keep those straight for years. Maybe this is just me getting old, but I miss the old days, when you could easily tell that a Pentium is faster than an 80486, and that a Pentium 133 is faster than a Pentium 100.

These days, CPU speed matters less than it did back then, but there still are CPU-hungry applications (I'm looking at you, Autodesk Inventor!), and if I had to put together a PC from scratch (which I think I'll actually do sometime this year), I would be kind of lost.


Part of the issue here, I think, is that CPUs are much more complex than they were then. You have a number of different CPU lines with different models on the market at any time.


That is true.

But that does make the decision of which CPU is best for a given use case and budget much more complex, too.

(Like I said, the impact of the CPU on overall system performance is less today than twenty years ago for many use cases, so it is not that much of a problem.)


Intel probably intentionally advertises with their weirdo socket names (1156 -> 1155 -> 1150 -> 1151) just to confuse people more. Heck, they probably choose the pin counts in such a strange order just to be more confusing. It's not like they have usable names (Socket H, H2, ...).


A long time ago Intel CPU core codenames were geographic features in or near Oregon.


I remember reading that they chose locations because their names aren't trademark-able. If you try calling your product SuperCPU and a competitor rushes to trademark the same name, they could force you to rename it, thus tactically interfering with your marketing plan. But if you call it "Chicago" they can't trademark it, because it's a place.


The naming conventions seem like a way more fun variant of US Intelligence Community naming conventions. But even there, there's some scheme by which terms unintentionally reveal a little about their classification and originating agency.


So much equipment and money devoted for something as pointless as Facebook. I wish it could go for something cooler and more useful. Something that hasn't been shown in studies to make us feel lonelier.


I ask: what is useful, then? People who are working on the Open Compute Project are definitely learning new things. Every engineering blog post from FB is reassurance of quality research done right. FB's work may not be impactful to you, but it is impactful to a lot of others. Ever heard of React and Cassandra? Both came out of Facebook. Sorry, I have to be harsh, but your comment is useless.


If they were a research organisation then I would have nothing but respect for Facebook's accomplishments. However they are not a research organisation but instead a for-profit corporation that devotes an obscene amount of money and effort to undermining privacy for all users of the internet, irrespective of whether you use their platform or not.

I don't think it's fair to judge the merits of an entity purely by what it brings to the world, without also considering what it in turn takes from it.


I think your concern for privacy has no respect for the individual accomplishment. Facebook, like many other for-profit organizations, will always monetize user data. You have the choice of not using it. The outcomes of their research have impact ranging from environmental science to computer science. You may disagree with how a for-profit should run, but I respect the people who work there performing top-tier research. You'd think all of the DoD research grants are less sinister? Perhaps, by your sentiment of social responsibility, we should never accept any grants from the DoD so that none of them go into military action.


I'm not much of a FB user myself, but my wife's auntie (70yo, Filipino, living with us in the US):

- Saw the first pictures of her grandkids on FB

- Organized a school reunion in NY with people flying in from all over the world on FB

- Chats weekly with her daughter working in Singapore over FB messenger

- Writes crazy long prayers for sick friends on FB

- Found former students living locally on FB that she socializes with.

Whether or not FB makes you lonelier seems to really be situational; at least for auntie (and many others), it's considered a tremendously positive thing in their lives.


Definitely situational.

Some people have gone to jail for 30 years for liking the wrong post on FB.

Guess that's one way to get lonely.


Can you give a source on that?


Of course [1].

There have been many similar cases, prison sentences ranging from 15 to over 30 years.

Facebook enables these people to express themselves, but it also helps the authorities to track down anyone who has the wrong opinion. If they really wanted they could do something about it.

[1] https://www.theguardian.com/world/2015/aug/07/man-jailed-for...

Edit: Here is the story about the single like: http://www.ibtimes.co.uk/thai-man-faces-32-years-prison-liki...


Sorry but this is an incredibly ignorant statement.

Irrespective of whether you find Facebook (the website) useful, Facebook (the company) is amazing. The technologies they have developed, e.g. Hive, HBase, Cassandra, and ORC, now power the Big Data analytics movement which is transforming enterprises around the world. And frankly, nothing in IT is changing the world for more people across more facets of their life than that. Likewise, their work on the machine learning front has been incredibly valuable.

And provided you don't make social media the core of your life then you will find it to be a useful tool.


They can't keep the most basic of APIs stable and reliable for even 6 months.


>Something that hasn't been shown in studies to make us feel lonelier.

I was curious about that as I'd say personally I've found the opposite effect that it lets you connect with friends more easily, so I tried to look up the studies.

Most articles seem to quote Ethan Kross, who found that people who reported being lonely used Facebook more, and concluded that Facebook made them lonely; it seems to me cause and effect would more likely run the other way. I mean, if I'm physically on my own I'll chat to friends online. I can't recall chatting online causing me to be physically alone.



Why don't you make something cooler/useful?


As a small business advertiser: Facebook is awesome. Nowhere else can you get $400 of sales for $10 in ads. That's a 40x ROI.


Your ROI is from profit, not revenue. You can easily get $400 in sales for $10 in ads if you don't care about profit.


You can get it for $0 in ads on Craigslist. ;)


So every time you spend $10 on ads, you get $400 of sales? And if you spend $0 on ads you get $400 less revenue for that period?

Sounds like fake news to me.


The last time they did this it flooded the market with dirt cheap E5-2670s. Are we in for a new updated deal?


I'm writing this on a dual E5-2670 v1 system I built for about $1000 (not including monitors): 16C/32T, Intel S2600CP motherboard, 48GB ECC, 256GB SSD (some cheap one), dual 3TB WD Red. Running docker and kvm on the host, and also currently using it as my main desktop. I have a freenas guest with ZFS handling the HDDs (PCI passthrough of a SATA controller), and a Windows 10 guest with GPU passthrough for office and games. All in all it's been a fun project, especially for the price. Incredible how cheap hardware can be these days. I paid $60 for the CPUs when I did my build. I think they've gone up about 50% since then, but are still a steal in my opinion.


Just out of curiosity, do you have an idea how much power that system uses?

I will probably get myself a new desktop computer sometime this year, and those specs sound pretty sweet. But I want to keep an eye on power consumption, too, and I don't want my desktop to either melt or have its fans create a tornado in my living room... ;-)


I went ahead and rebooted it so I could plug my Kill-A-Watt back in just for you ;). It idles at right around 110W. I did a quick sysbench to peg all the cores and it goes up to about 300W. Note that I have some other things sipping from the meter, so it's actually several watts lower for just the tower. With the Windows 10 guest running (host using a GTX 950, Windows an RX 460) it idles between 190 and 250W.


BTW I couldn't really tell any difference in the fan noise levels when the CPUs were pegged, though the temps went from ~40C to ~60C. It's already pretty quiet. Just barely above the level where I notice it.


Thanks! That is less than I would have expected.


I'm really interested in this too. I'm tossing up between something like this (maximum thread count but probably high power usage) or a Ryzen build (simple and low power).

That CPU has a TDP of 115W but I can't find much information on the idle power usage.

I would also like to see a write-up of the GPU pass-through setup since that's something I've been wanting to have on my local system for ages, i.e. vm host system => dev vm 1 ... vm N, Windows VM + dedicated GPU for gaming etc.


If I were doing a build today, I would be sorely tempted by Ryzen, mostly because it would be new hardware and the AM4 platform. That said, having a server board has turned out to be really nice for a lot of things, such as ECC memory and nice virtualization features. It looks like Ryzen has some issues with kvm passthrough so far[0]. My favorite resources are [1] and [2].

[0] https://www.reddit.com/r/VFIO/comments/5yo83m/ryzen_pcie_pas... [1] https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM... [2] https://www.reddit.com/r/VFIO


What is your host OS / hypervisor? I would like to make a similar setup and was wondering if Ubuntu as the main host would allow the PCI passthrough and all...


I'm using Arch with kvm/libvirt. Most of the configuration of guests is done via virt-manager. Since a lot of this is moving relatively fast, I recommend using some sort of a rolling distro. I believe the libvirt devs mostly use Fedora.
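
If it helps, the core of the GPU passthrough setup boils down to a few config snippets. This is a minimal sketch of the usual VFIO approach on Arch; the PCI IDs below are hypothetical, so substitute your own from lspci -nn:

  # /etc/default/grub: enable the IOMMU (intel_iommu=on for Intel, amd_iommu=on for AMD)
  GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

  # /etc/modprobe.d/vfio.conf: claim the guest GPU and its HDMI audio function
  # with vfio-pci before the host driver grabs them (IDs here are made up)
  options vfio-pci ids=10de:1234,10de:5678

  # /etc/mkinitcpio.conf: load the vfio modules early in the initramfs
  MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"

After regenerating the initramfs and rebooting, the GPU shows up as a PCI host device that you can attach to the guest in virt-manager.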


Since you mention that you custom-built the system, and the Intel S2600CP motherboard comes in the EE-ATX (Enhanced Extended ATX) format, can you please specify the computer case you are using? The official ones (Supermicro, etc.) are expensive if bought new, and the same goes for used ones on eBay: they are large and heavy, so the shipping fee kills the deal.


My HP z820 uses almost 300W at load with some fan noise, with dual E5-2660s, 128GB RAM, and an Nvidia Quadro K2000. I've heard that some people who do audio/video with similar rigs replaced the factory CPU coolers and fans with Thermaltake products for quieter rigs. It's noticeable but not what I would call annoying. It is louder than the 2010 Mac mini it replaces at normal load, but way quieter than when the mini blows at full speed during swapping or intensive usage.


I use a 2x E5-2665 v2 machine (S2600CP4) for building stuff. It's in a very small, very cramped 2S Supermicro case (2U, less than 60 cm deep). Fun machine. Effing loud as hell; it uses stacked 40 mm fans... :)


Yeah I originally thought about getting one of the 1U prebuilt systems from natex. So glad I went with the huge quiet case instead.


Lucky for my ears that the machine is with the others in a rack ;)


Where did you find them? eBay?


HP z820/z620/z420 and Dell T7600/T5600 are great used workstations that support the Xeon E5-26xx V1 series processors at very reasonable prices on eBay. Workstations are $300-$700 (maybe add CPU, RAM, HD, graphics card) shipped. I've bought a z820 and z620 at $420 & $499 respectively. My machines were partially usable, missing only a second CPU & graphics card in the z820; the z620 was fully usable with dual E5-2620s. I added a pair of E5-2660s, 128GB of RAM, and a second CPU cooler+fan to the z820 for $500 shipped and swapped the graphics card. So for $920, I got a 32-thread, 128GB beast of a z820 (probably would have cost $20K new) plus an E5-2640 and 16GB of RAM to sell. So easy to work with, too, with their tool-less design.

It's a great 4K video/photo editing & OpenShift lab machine. Literally 300 chrome tabs and it hasn't cracked past 16GB of RAM usage in RHEL. Just need to spin up some VMs now.


This used to be <$500, but currently $633 for a motherboard, dual E5-2670 CPUs, and 128GB ECC memory: http://www.natex.us/Intel-S2600CP-Motherboard-Package-Deal-p...


Highly recommend this case if you go with an Intel S2600CP (or any other SSI EEB mobo): https://www.newegg.com/Product/Product.aspx?Item=N82E1681185...


I am using the Fractal Design Define XL R2 (what a name) for my Supermicro dual CPU mobo, I drilled one extra mounting hole and the mobo mounted just fine.

Very quiet and solid.


+1 for natex. That's where I got my motherboard and it's worked great for about 6 months.


To the extent that dual-socket LGA2011 ATX/EEB boards moved up in price to the point where you could pay three or four times more for a suitable mobo than for sixteen cores of CPU.

eBay E5-2670


I hope some of the old hardware makes it to ebay, but it looks like many of the form factors are proprietary.


s/proprietary/opensource/

http://www.opencompute.org/ :)


What do you intend to do with it?

I've found that almost any kind of short lived experiment I can do cheaper on AWS than doing it with hardware that I own. If it is longer running then it might become viable to own the hardware.


It's sad to me this is becoming the status quo. Using other massively centralized companies for compute resources is a sad future.

It's bad for privacy, it's bad for diversity to protect against SPOFs, it's bad for general computing hardware (vendors primarily target the giants), it's bad for users via vendor lock-in, and it's bad for open source projects in the infrastructure space.

I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out I have to rebuild on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).

Sorry about the rant, but is there anything that would get you to stop giving the keys to the kingdom to Amazon?


Hardware is a means to an end. I've got plenty of it but at the end of the day what you do with it should be balanced by what it costs.

For companies that have instances running long term it can very well be cost effective to own the hardware. My email server, web server and DNS server are on my own hardware with a co-location facility that I trust.

But for experimental stuff where you need to spin up a hundred machines for an hour or two you just can't beat the cloud (and that's my only use case for the cloud, though I can see others go much further).

I don't like the monoculture any more than you do, but to see this as me having given the 'keys to the kingdom' to Amazon is several steps too far.


I misinterpreted what you meant by a "short lived experiment". I took that to basically mean any project when you're starting out. My apologies.

Whenever I'm experimenting I rarely need a burst of 100 instances, it's usually 1 or 2 instances to run things and I prefer to run them on my own hardware.


I feel you. But I think the lock-in problem can be solved if we can have some standardization of cloud services such that you can always move to another provider. That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there. I think Netflix is switching from Amazon to GCP; I hope they'll standardize it and open source it.


Chef, Puppet, etc., and various Apache projects all offer this already.

It's like leveraging Oracle-specific database features. You're a fool to do so.


> That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there.

I work for a company (Pivotal) that's had such a product -- Pivotal Cloud Foundry -- for several years. It creates an abstraction layer for apps or container images, your choice.

Deploy with BOSH to raw metal, OpenStack, vSphere, AWS, Azure or GCP. BOSH creates an abstraction layer over the IaaS.

We're also the main driving force behind Spring, Spring Boot and Spring Cloud Services; the latter is in part a generalisation and integration of Netflix OSS.

We cooperate a lot with Google and Microsoft. For example: https://cloud.google.com/solutions/cloud-foundry-on-gcp


> I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on generator and get exactly the same electricity. If Amazon goes out I have to build again on another cloud from a (hopefully recent) backup or just sit dead (like the recent s3 outage).

What's your use case?

Are you just futzing around at home? Sure, use a server in your bedroom. Who cares?

Are you delivering a service to other people? Then owning hardware is probably a bad idea. If it's in your house, your users are hosed if you lose power or your internet cuts out. Putting it in a DC just means you're handing the same keys to someone else, but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

Owning hardware is a bad deal for everyone involved unless you're big enough to build your own HADR infrastructure.


>but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

I don't buy this. I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services. You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact. Guess when Amazon does maintenance? That's right, you don't know and one screw up can mean instances in "degraded status" (a.k.a. you might as well terminate it and launch a new one) or all of S3 is down during critical business hours.

Of course your own hardware in a single data-center is going to be exposed to high probability of failures, but that's the equivalent of using a single instance in EC2 (which I have lost two of in the last 7 years of managing 15 or so of them for a small company).

I will admit that it takes strong ops skills to maintain high uptime on your own hardware, but that's just due to a lack of good open source tooling in this area. I would rather see a movement to improve tooling rather than continue to boost the stranglehold the public cloud is putting on everyone.


> I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services.

Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Yes, with enough money you can match Amazon for uptime or scalability or whatever metric you prefer. For the same money you can probably buy triple the capacity in Amazon or your preferred cloud provider, so this is mostly a game for people with really deep pockets, really large scale, or really poor budgeting.

> You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact.

How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

> all of S3 is down during critical business hours.

I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime. If you stand up a fairly complex system comprised of a number of loosely-coupled services, you're going to end up experiencing some outages, because you'll face the same challenges as Amazon and those guys aren't idiots. You'll lose your message queue due to a bug, or you'll lose a network switch and realize your failover takes 30 minutes to complete instead of the 5 seconds you hoped for, or you'll accidentally DDOS a subsystem when exercising a failover or a system upgrade, or something else. Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.


> I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime.

That needs a dollar-for-dollar or something to that effect qualification. It's possible but very expensive.

There are for instance long running (and I mean really long running, many years or even decades) experiments where any amount of downtime would cause a do-over.

One of my customers had something like this on the go. The amount of money they spent on their power and network redundancy was off the scale, but they definitely had better uptime than Amazon.

Their problems were more along the lines of 'this piece of equipment is nearly eol, how do we replace it without interrupting the work it does'.


Yes, sorry. I was assuming similar expense. Enough money can buy just about anything, including a few additional nines.

If your goal is to build out scale more reliably than Amazon, at the same or lower cost, that's tough and you're unlikely to achieve it unless your scale is approaching that of Amazon (and you have really good people).


>Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Putting a rack in a COLO is still self-managed for the purpose of what I'm talking about. It's easy to get multiple data centers where you are renting the space and electricity but you still own the hardware and can make agreements with various ISPs to get service from.

>How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

See comment above.

>Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

See comment above. "bringing down a DC" doesn't mean shutting everything off, it means from the perspective of your end users, your service is not available there.

> because you'll face the same challenges as Amazon and those guys aren't idiots.

No, but they have much different priorities. If all I want is static asset hosting, the loosely-coupled micro-service architecture you are referring to is completely overkill and results in the very instability you are claiming is normal.

>Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.

Nobody except Google and Microsoft is building something as complex as the entire AWS stack. The vast majority of AWS users are using a tiny percentage of the features that come with AWS and can get by on much simpler systems that are easier to reason about and maintain.

When you dump the majority of what Amazon is actually running, you have a much simpler system and architecture and actually can beat Amazon's uptime.


Amazon charges at least 15 to 20 times the going rate for bandwidth. So if you are serving large amounts of data, it could easily be the case that you can pay for enhanced uptime with just the savings on bandwidth alone.


My Raspberry Pi at home has had less downtime in the past 3 years than AWS.

Local datacenters in the city had even less.

I'm not sure where AWS is supposed to get that famous reliability from, but it's not in uptime. (I can't comment on storage reliability, because I only write a few terabytes of data a month; but otherwise, there are RAID 5 or other RAID setups to ensure data stays valid.)

AWS has its advantages in its immense scalability within of seconds, it has its advantages in convenience.

But its uptime isn't much better than most home connections.

Home statistics:

Power downtime since 2006: 29 minutes.

Internet downtime since 2006: 6 hours in 2014, plus two 30-minute outages in 2016.

This is on a 100/40 DSL line nowadays (the downtimes were, except for one, when switching ISPs), without any uninterruptible power supply, battery, or generator.

For comparison, this is equivalent to an uptime of 99.99%: the same as AWS advertises, but better than what they delivered this year or last.
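
Back-of-the-envelope check, using the outage numbers above (roughly 7.5 hours of total downtime over 11 years):

  > echo "scale=6; (1 - 7.5/(11*365*24)) * 100" | bc
  99.992300

which rounds to the 99.99% figure.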


You probably do not get how this works. Let me try to explain: when you talk about the uptime of your raspberry pi you are looking at a single, very simple instance of a computer. It's really easy to get an insane uptime out of a single machine.

Here's one for you:

  > uptime
    02:52:56 up 714 days, 16:53,  1 user,  load average: 0.00, 0.00, 0.00
Which is pretty average for a small, underutilized server. Essentially the uptime here is a function of how reliable the power supply is.

But that's not what AWS is offering.

They offer a far more complex solution which by the very nature of its complexity will have more issues than your - and mine - simple computers.

The utility lies in the fact that if you tried to imitate the level of complexity and flexibility that AWS offers, you'd likely not even get close to their uptimes.

So you're comparing apples and oranges, or more accurately, apples and peas.


Agreed. What I question is whether a lot of the complexity is actually needed for a lot of the systems being deployed. For example, people are building Docker clusters with job-based distributed systems for boutique B2B SaaS apps with a few thousand users. Is the complexity needed? And how much complexity needs to be added to manage the complexity?


How am I comparing apples and oranges?

The previous posters said that I should use AWS, because anything I set up myself will have more downtime than AWS.

Now. I've actually set up a few systems.

Some on rented dedicated servers, some on actual hardware at home.

Including web apps, databases backing dozens of services, etc.

As mentioned above, all of them have better uptime than AWS.

How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?


> How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?

That a single instance of something simple outperforming something complex does not mean anything when it comes to statistical reliability. In other words, if a million people do what you do, in general more of them will lose their data / have downtime than those same people hosting their stuff on Amazon. The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

And that's because your setup is extremely simple. The more complex it gets the bigger the chance you'll end up winning (or rather, losing) that particular lottery.


> The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

Or maybe because I have less complexity in my stack, so it’s easier to guarantee that it works.

Getting redundant electricity and network lines, and getting redundant data storage solutions is easy.

Ensuring that of 3 machines behind a loadbalancer at least 2 work is also easy.
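
A quick sanity check on that, assuming each of the 3 machines is independently up 99% of the time (P(at least 2 of 3 up) = 3*p^2*(1-p) + p^3):

  > echo "scale=6; 3 * 0.99^2 * 0.01 + 0.99^3" | bc
  .999702

So even with individually mediocre machines, two-of-three behind a load balancer already gets you to ~99.97%.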

Ensuring that nothing ever fails in a complex system of millions of interconnected machines, with services that have never been rebooted or tested in a decade (see the AWS S3 post-mortem), is a lot harder.


You're right. If you run fairly low volume services that don't need significant scale, you can possibly achieve better uptime than Amazon. You'll probably spend significantly more to get it, though, since your low volume service probably could run on a cheap VM instead of a dedicated physical server.

You're also likely rolling the dice on your uptime, since a hardware failure becomes catastrophic unless you are building redundancy (in which case you're almost certainly spending far more than you would with Amazon).


Actually, I've calculated the costs: if you only need to build for one special case, even with redundancy you tend to always be ~3-4 times cheaper than the AWS/Google/etc. offerings for the same thing.

But then again, you have only one special case, and can’t run anything else on that.


I agree, it is sad that heavy computing/data is being centralized around corporations. I really get a lot out of being able to see and touch my hardware. To me that's worth the additional cost. I love my little 100TB Synology box and it feels weird now sitting at my desk without its soft fan hum.


eh, owning hardware is fun and a great learning experience


Owning is passive, it's what you do with it that matters.



That's all regular Quanta gear. No idea if it was owned by FB or another OCP adopter. OCP is popular with mineral companies.


If Facebook is just now announcing their upgrade, should we expect these prices to go down?


This is the Nth generation. Facebook has been continuously decommissioning old servers for years already.


How long until Facebook joins the public cloud business with Amazon, Google, and Microsoft?


Never? Their infrastructure is cool but it's only around half of what a public cloud would need.


Never... I'm not convinced. Roll back to when Amazon was pre-AWS. Everybody thought they were crazy announcing they were getting into the datacenter and cloud business. I'd say it has worked out well for $AMZN.


> Everybody thought they were crazy announcing they were getting into the datacenter and cloud business.

All the comments I heard were positive about how they were diversifying by leveraging expertise they had been forced to develop for their own core platform, not that they were crazy. I'm sure there were some who said "crazy", but it definitely wasn't everyone.


I agree. Maybe Amazon sold it really well, but as far as I can remember, the response almost universally (including in financial circles) was that this was a great idea since it allowed them to leverage idle resources they needed to build out to handle peak loads (such as during the holiday season).


That never made sense. What happened during holiday season? Everyone on AWS was put on hold?


Amazon.com, even at its peak computing needs, is now a drop in the bucket.

Three years ago, in 2014, AWS was adding the equivalent hardware every day of what ran Amazon.com in 2004, when it was only a $700-million company. [1]

[1] https://www.enterprisetech.com/2014/11/14/rare-peek-massive-...


I don't know for sure but I'm guessing spot pricing for instances went way up.


Really? Let's take a look at the chart for 2006 when AWS was first introduced (as far as I can tell). Does not look like $AMZN stock really did anything positive in 2006. It is hard to find exact dates of when AWS products were released. Anybody know when EC2 went live to the public (full actual date)?

http://imgur.com/a/BIFQH

Sorry for linking to an image, but the Google Finance link to this chart did not work. Sigh!

Request for a startup: Make a finance interface as good as Bloomberg terminals for the web.


I don't think that had anything to do with AWS.

Here's an article from 2006 regarding the earnings. AWS isn't mentioned; just a drop in operating income and the announcement of Groceries and Baby and Toy stores.

http://www.slate.com/articles/business/moneybox/2006/07/the_...


Amazon didn't have to compete with AWS, Azure and GCP, Facebook does and Facebook doesn't have nearly as much of a record providing infrastructure to businesses. Cloud is a hard market to get into now.


Well, they could double it then and reap the benefits of scale.


Half in what sense?


In development effort. They have webscale datacenters, a hardware supply chain, OS provisioning, networking, object storage, etc. but AFAIK they don't have multitenant IaaS or PaaS, OSS/BSS, etc.


The other thing is that they are a decade behind AWS in the stuff they don't have... and AWS has been reinvesting all its profits in that area. That's hard to catch up on.


Ya, this is nice and all but until I can rent time on one of these servers I don't really care all that much. Are these OpenCompute designs hosted anywhere other than Facebook?

It feels like they're bragging more than anything.


Yes, you can host your apps on OpenCompute hardware today with Rackspace Cloud OnMetal among other providers. You might find the list of involved companies for the OpenCompute Project a good start. You can also buy or fabricate your own OpenCompute compatible hardware thanks to its open design.


They bought Parse, and then shut it down with no mention of a replacement.

It's hard to imagine that they're currently planning on getting into the Cloud space.


Parse


100 million hours of video played per day. Are people actually watching this video or are they just inflating the number?


Facebook has over a billion daily active users, so 100 million hours divided over 1 billion users is 0.1 hours/user, which is 6 minutes per user. Seems reasonable. Of course, there are lots of people who don't watch any videos, and on the flip side there are a lot of people who watch a lot of videos on Facebook. Edit: as pointed out below, the autoplaying videos might skew the numbers quite a bit.
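
The division, for anyone double-checking:

  > echo "scale=1; 100000000 / 1000000000 * 60" | bc
  6.0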


They've inflated metrics in the past: http://www.businessinsider.com/facebook-video-views-exaggera...

I'm not sure if there are any standards between platforms for these things that allow you to compare, though. I'd say, for example, that you should exclude watches that last less than 5s or so. YouTube and Netflix may not have thought to do it because it doesn't make much sense to them, but Facebook really needs to, since I assume most of their video watches are automatic (accidental) while scrolling through the feed.


It does matter to Netflix. They don't publish their numbers and just use them for internal metrics so you can bet that they are honest with themselves about their numbers.


Given a lot of auto-play video in my own feed, I'm presuming it's not all actually 'watched'. If they'd give a separate number for videos 'listened' to (where I actually unmute the audio), I'd take that number more seriously.


This wouldn't be too accurate anymore either, because now even audio autoplays.


Is this a setting or experiment or something? It doesn't happen for me.


Neither for me. Videos play, but the audio is always muted. I have to enable it to play for each video, and have for more than a year.


Apparently they've tested it for a year or so already and are now doing a wider rollout. http://www.wxyz.com/money/consumer/dont-waste-your-money/fac...


Autoplayed videos stress their server infrastructure even if no one is watching. It is OK to count them in the context of this article.


Given that you can't scroll past a video without it playing, it's got to be a mix


You can disable this in the settings.


But it defaults to on which means that's where it is for most people.


I'm disappointed in the "open rack" designs. For a really minor improvement in density they have broken compatibility with standard 19" gear.

One could argue that at FB scale it's worth it, but then MS seems to manage just fine with 19".


It's interesting. If they wanted to, they could compete with the likes of HP, Dell, Lenovo, and Cisco if they could ramp up production to accommodate customers. I wonder who does their manufacturing on the backend.


Facebook uses Quanta/QCT, Celestica, and Accton for a lot of their manufacturing. You can buy Facebook servers from companies like Hyve, AMAX, and Stack Velocity but they aren't really aiming at the mainstream server market.


Is it that much cheaper to custom build if you can't eBay them off at their half-life point to recover some of the cost?


Most of the cost is in processors and RAM so those parts can be sold at end of life. There are server recycling companies that specialize in this.


Here's an example of one: http://cashforelectronicscrapusa.com


Yeah, that kinda reminds me of expertsexchange.com

Is it electronic-scrap or is it electronics-crap?


The terms of use page refers to "CJ Environmental", which, when combined with the Inc. 500 reference on the home page, returns this profile:

http://www.inc.com/profile/cj-environmental

From the FAQs, "How does material get processed and refined? All materials are unique and subject to different methods of processing. Newer computers are refurbished and given new homes to maximize ROI for our customers. Scrap product is crushed or shredded before the refining process. All material is processed in accordance with all Federal, State and Local regulations. To learn more about our licensing and compliance measures contact us."


When can we start hosting our services with Facebook, similar to AWS, GCE, etc.?


Anyone know what CPUs/GPUs they use in these?


"Built in collaboration with our ODM partner QCT (Quanta Cloud Technology), the current Big Basin system features eight NVIDIA Tesla P100 GPU accelerators. These GPUs are connected using NVIDIA NVLink to form an eight-GPU hybrid cube mesh — similar to the architecture used by NVIDIA's DGX-1 system. This setup, combined with the NVIDIA Deep Learning SDK, utilizes this new architecture and interconnects to improve deep learning training across all GPUs.

Compared with Big Sur, Big Basin will bring us much better gain on performance per watt, benefiting from single-precision floating-point arithmetic per GPU increasing from 7 teraflops to 10.6 teraflops. Half-precision will also be introduced with this new architecture to further improve throughput."


So I just run the setup CD, right?


Assuming your question is serious: Yes, essentially, but you're going to have to temporarily connect some peripherals if you don't intend to pop in a pre-installed image on a drive. And of course, depending on what OS you intend to run on it you might end up with driver hassles, your best bet is likely a reasonably modern linux distro.


Facebook PXE boots into Anaconda, uses Chef to provision the host, and then ultimately uses their own scheduler to execute a workload.
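
For anyone unfamiliar with that flow, it's roughly the standard PXE-plus-kickstart pattern. What follows is a generic sketch (hostnames and paths invented, not Facebook's actual config):

  # pxelinux.cfg/default: network-boot the Anaconda installer with a kickstart file
  LABEL provision
    KERNEL images/centos7/vmlinuz
    APPEND initrd=images/centos7/initrd.img inst.ks=http://boot.example.com/ks.cfg ip=dhcp

  # ks.cfg, %post section: bootstrap the config-management agent
  %post
  curl -L https://omnitruck.chef.io/install.sh | bash
  %end

Once the kickstart finishes, chef-client converges the host and the scheduler takes over from there.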



This makes sense. No onboard graphics card.


No mention of AMD. No wonder Intel isn't worried. They have probably locked up contracts with Amazon, Microsoft, Google, Facebook, Oracle, IBM, OVH, Baidu, Alibaba, Salesforce, SAP, DO, etc., along with dozens of other slightly smaller players.

Which got me thinking: what % of the server market do these dozens of players own? 50%?




