The end-to-end refresh of our server hardware fleet (facebook.com)
206 points by sloanesturz on March 10, 2017 | 145 comments



I really dislike this sort of naming scheme (Bryce Canyon, Honey Badger, Mono Lake, etc)

The names tell you nothing. You can't tell which one came before which, or even what they are. You just have to KNOW that information. A good naming scheme tells you information about the thing named.


Historically, code names were chosen specifically to give up no information and it seems like that tradition continues, perhaps unintentionally.


I don't see the problem. I work with naming schemes like XYZ1530, and to know which server is which or what is installed on it we have documentation; the name is only a reference for finding information in the docs.

So if you work with a server daily you know by heart what's on it; if not, any "descriptive" name would only mislead you, because things have probably changed a lot since it was named.

I think the same goes for hardware components: you have to look them up in the documentation anyway, because some dimension could have changed after a year.


But names like XYZ1530 are harder to recall, and these are names for a class of server.


I work here (FB) and I completely agree. I can never keep the names straight.


100℅ agreed, drives me crazy with Ubuntu, I have to google the names to get the versions.

Never understood what was wrong with 16.04.2 vs Xenial Xerus (had to go google the xerus part just now).


The Ubuntu names are chronologically in alphabetical order.


That doesn't really make it any better. Eclipse does the same thing (well they have since they ran out of Galilean moons), and I always end up needing to consult a chart to parse its version scheme: https://en.wikipedia.org/wiki/Eclipse_(software)#Releases

We already have a well developed and widely understood system for naming items in a sequence, it's called "numbers." I suggest people stop trying to be clever and use it.


Is writing ℅ instead of % some kind of meme? I'm seeing it everywhere all of a sudden, but I can't imagine you can type such an obscure character by accident and so frequently. Wondering if I'm not getting the joke, or if it's some kind of obtuse political or technical statement about something.


At least some Android keyboards (Gboard) put % and ℅ near each other on the same symbol page, and it's easy to mistake one for the other if you don't look closely.


They are on opposite sides of the keyboard, but on the same symbol page. But yeah, if you aren't looking closely, it's very easy to hit the wrong one. Surprised this hasn't been noticed by the Gboard team yet.


I can't understand why the c/o sign would be considered important enough to go on a phone keyboard, let alone on the same page as percent. Do people often write c/o? I might write it on a physical letter once or twice a year. Is it a cultural thing somewhere?


And then again, if people need to write c/o they could actually write... c/o.


Exactly that, I posted from my phone.


On my keyboard at least, if I long press % it uses the other one, and it's hard to notice when proofreading.


The idea is to avoid people assigning bias, i.e. if something is said to be Mk 1 and Mk 2, people are likely to desire the Mk 2 despite having no practical basis for that.


The whole point of refreshing the hardware fleet from Mk 1 to Mk 2 is that it has a practical basis that they will benefit from.

> Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory size increase from 12 GB to 16 GB. In tests with popular image classification models like ResNet-50, we were able to reach almost 100 percent improvement in throughput compared with Big Sur

Mk 2 is better than the Mk 1 in several important ways. They're not creating Mk 2 for no reason!


One of those reasons could be cost or speed of production. The Mk 1 could be better than the Mk 2, but the Mk 2 is easier to make.


Is there some information in Intel codenames I'm perhaps unaware of?


I haven't been able to keep those straight for years. Maybe this is just me getting old, but I miss the old days, when you could easily tell that a Pentium is faster than an 80486, and that a Pentium 133 is faster than a Pentium 100.

These days, CPU speed matters less than it did back then, but there still are CPU-hungry applications (I'm looking at you, Autodesk Inventor!), and if I had to put together a PC from scratch (which I think I'll actually do sometime this year), I would be kind of lost.


Part of the issue here, I think, is that CPUs are much more complex than they were then. You have a number of different CPU lines with different models on the market at any time.


That is true.

But that does make the decision of which CPU is best for a given use case and budget much more complex, too.

(Like I said, the impact of the CPU on overall system performance is less today than twenty years ago for many use cases, so it is not that much of a problem.)


Intel probably intentionally advertises with their weirdo socket names (1156 -> 1155 -> 1150 -> 1151) just to confuse people more. Heck, they probably choose the pin counts in such a strange order just to be more confusing. It's not like they have usable names (Socket H, H2, ...).


A long time ago Intel CPU core codenames were geographic features in or near Oregon.


I remember reading that they chose locations because their names aren't trademark-able. If you try calling your product SuperCPU and a competitor rushes to trademark the same name, they could force you to rename it, thus tactically interfering with your marketing plan. But if you call it "Chicago" they can't trademark it, because it's a place.


The naming conventions seem like a way more fun variant of US Intelligence Community naming conventions. But even there, there's some scheme by which terms unintentionally reveal a little about their classification and originating agency.


So much equipment and money devoted for something as pointless as Facebook. I wish it could go for something cooler and more useful. Something that hasn't been shown in studies to make us feel lonelier.


I ask: what is useful, then? People who are working on the Open Compute Project are definitely learning new things. Every engineering blog post from FB is reassurance of quality research done right. FB's work may not be impactful to you, but it is impactful to a lot of others. Ever heard of React and Cassandra? Both came out of Facebook. Sorry, I have to be harsh, but your comment is useless.


If they were a research organisation then I would have nothing but respect for Facebook's accomplishments. However they are not a research organisation but instead a for-profit corporation that devotes an obscene amount of money and effort to undermining privacy for all users of the internet, irrespective of whether you use their platform or not.

I don't think it's fair to judge the merits of an entity purely by what it brings to the world, without also considering what it in turn takes from it.


I think your concern for privacy has no respect for the individual accomplishment. Facebook, like many other for-profit organizations, will always monetize user data. You have the choice of not using it. The outcomes of their research have impact ranging from environmental science to computer science. You may disagree with how a for-profit should run, but I respect the people who work there performing top-tier research. You'd think all of the DoD research grants are less sinister? Perhaps, by your sentiment of social responsibility, we should never accept any grants from the DoD so that none of them go into military action.


I'm not much of a FB user myself, but my wife's auntie (70yo, Filipino, living with us in the US):

- Saw the first pictures of her grandkids on FB

- Organized a school reunion in NY with people flying in from all over the world on FB

- Chats weekly with her daughter working in Singapore over FB messenger

- Writes crazy long prayers for sick friends on FB

- Found former students living locally on FB that she socializes with.

Whether or not FB makes you lonelier seems to really be situational; at least for auntie (and many others), it's considered a tremendously positive thing in their lives.


Definitely situational.

Some people have gone to jail for 30 years for liking the wrong post on FB.

Guess that's one way to get lonely.


Can you give a source on that?


Of course [1].

There have been many similar cases, prison sentences ranging from 15 to over 30 years.

Facebook enables these people to express themselves, but it also helps the authorities to track down anyone who has the wrong opinion. If they really wanted they could do something about it.

[1] https://www.theguardian.com/world/2015/aug/07/man-jailed-for...

Edit: Here is the story about the single like: http://www.ibtimes.co.uk/thai-man-faces-32-years-prison-liki...


Sorry but this is an incredibly ignorant statement.

Irrespective of whether you find Facebook (the website) useful, Facebook (the company) is amazing. The technologies they have developed, e.g. Hive, HBase, Cassandra, and ORC, now power the Big Data analytics movement which is transforming enterprises around the world. And frankly, nothing in IT is changing the world for more people across more facets of their life than that. Likewise, their work on the machine learning front has been incredibly valuable.

And provided you don't make social media the core of your life then you will find it to be a useful tool.


They can't keep the most basic of APIs stable and reliable for even 6 months.


>Something that hasn't been shown in studies to make us feel lonelier.

I was curious about that as I'd say personally I've found the opposite effect that it lets you connect with friends more easily, so I tried to look up the studies.

Most articles seem to quote Ethan Kross, who found that people who reported being lonely used Facebook more, and concluded that Facebook made them lonely; it seems to me cause and effect would more likely run the other way. I mean, if I'm physically on my own I'll chat to friends online. I can't recall chatting online causing me to be physically alone.



Why don't you make something cooler/useful?


As a small business advertiser: Facebook is awesome. Nowhere else can you get $400 of sales for $10 in ads. That's a 40x ROI.


Your ROI is from profit, not revenue. You can easily get $400 in sales for $10 in ads if you don't care about profit.


You can get it for $0 in ads on Craigslist. ;)


So every time you spend $10 on ads, you get $400 of sales? And if you spend $0 on ads you get $400 less revenue for that period?

Sounds like fake news to me.


The last time they did this it flooded the market with dirt cheap E5-2670s. Are we in for a new updated deal?


I'm writing this on a dual E5-2670 v1 system I built for about $1000 (not including monitors): 16C/32T, Intel S2600CP motherboard, 48GB ECC, 256GB SSD (some cheap one), dual 3TB WD Red. Running docker and kvm on the host, and also currently using it as my main desktop. I have a freenas guest with ZFS handling the HDDs (PCI passthrough of a SATA controller), and a Windows 10 guest with GPU passthrough for office and games. All in all it's been a fun project, especially for the price. Incredible how cheap hardware can be these days. I paid $60 for the CPUs when I did my build. I think they've gone up about 50% since then, but are still a steal in my opinion.


Just out of curiosity, do you have an idea how much power that system uses?

I will probably get myself a new desktop computer sometime this year, and those specs sound pretty sweet. But I want to keep an eye on power consumption, too, and I don't want my desktop to either melt or have its fans create a tornado in my living room... ;-)


I went ahead and rebooted it so I could plug my Kill-A-Watt back in just for you ;). It idles at right around 110W. I did a quick sysbench to peg all the cores and it goes up to about 300W. Note that I have some other things sipping from the meter, so it's actually several watts lower for just the tower. With the Windows 10 guest running (host using a GTX 950, Windows an RX 460) it idles between 190 and 250W.


BTW I couldn't really tell any difference in the fan noise levels when the CPUs were pegged, though the temps went from ~40C to ~60C. It's already pretty quiet. Just barely above the level where I notice it.


Thanks! That is less than I would have expected.


I'm really interested in this too. I'm tossing up between something like this (maximum thread count but probably high power usage) or a Ryzen build (simple and low power).

That CPU has a TDP of 115W but I can't find much information on the idle power usage.

I would also like to see a write-up of the GPU pass-through setup since that's something I've been wanting to have on my local system for ages, i.e. vm host system => dev vm 1 ... vm N, Windows VM + dedicated GPU for gaming etc.


If I were doing a build today, I would be sorely tempted by Ryzen, mostly because it would be new hardware and the AM4 platform. That said, having a server board has turned out to be really nice for a lot of things, such as ECC memory and nice virtualization features. It looks like Ryzen has some issues with kvm passthrough so far[0]. My favorite resources are [1] and [2].

[0] https://www.reddit.com/r/VFIO/comments/5yo83m/ryzen_pcie_pas... [1] https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM... [2] https://www.reddit.com/r/VFIO


What is your host OS / hypervisor? I would like to make a similar setup and was wondering if Ubuntu as the main host would allow the PCI passthrough and all...


I'm using Arch with kvm/libvirt. Most of the configuration of guests is done via virt-manager. Since a lot of this is moving relatively fast, I recommend using some sort of a rolling distro. I believe the libvirt devs mostly use Fedora.
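
If it helps, the core of the GPU passthrough setup boils down to a few config snippets. This is a minimal sketch of the usual VFIO approach on Arch; the PCI IDs below are hypothetical, so substitute your own from lspci -nn:

  # /etc/default/grub: enable the IOMMU (intel_iommu=on for Intel, amd_iommu=on for AMD)
  GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

  # /etc/modprobe.d/vfio.conf: claim the guest GPU and its HDMI audio function
  # with vfio-pci before the host driver grabs them (IDs here are made up)
  options vfio-pci ids=10de:1234,10de:5678

  # /etc/mkinitcpio.conf: load the vfio modules early in the initramfs
  MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"

After regenerating the initramfs and rebooting, the GPU shows up as a PCI host device that you can attach to the guest in virt-manager.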


Since you mention that you custom-built the system, and the Intel S2600CP motherboard comes in the EE-ATX (Enhanced Extended ATX) format, can you please specify the computer case you are using? The official ones (Supermicro, etc.) are expensive if bought new, and the same goes for used ones on eBay: they are large and heavy, so the shipping fee kills the deal.


My HP z820 uses almost 300W at load with some fan noise, with dual E5-2660s, 128GB RAM, and an Nvidia Quadro K2000. I've heard that some people who do audio/video with similar rigs replaced the factory CPU coolers and fans with Thermaltake products for quieter rigs. It's noticeable but not what I would call annoying. It is louder than the 2010 Mac mini it replaces at normal load, but way quieter than when the mini blows at full speed during swapping or intensive usage.


I use a 2x E5-2665 v2 machine (S2600CP4) for building stuff. It's in a very small, very cramped 2S Supermicro case (2U, less than 60 cm deep). Fun machine. Effing loud as hell; it uses stacked 40 mm fans... :)


Yeah I originally thought about getting one of the 1U prebuilt systems from natex. So glad I went with the huge quiet case instead.


Lucky for my ears that the machine is with the others in a rack ;)


Where did you find them? eBay?


HP z820/z620/z420 and Dell T7600/T5600 are great used workstations that support the Xeon E5-26xx V1 series processors at very reasonable prices on eBay. Workstations are $300-$700 (maybe add CPU, RAM, HD, graphics card) shipped. I've bought a z820 and z620 at $420 & $499 respectively. My machines were partially usable, missing only a second CPU & graphics card in the z820; the z620 was fully usable with dual E5-2620s. I added a pair of E5-2660s, 128GB of RAM, and a second CPU cooler+fan to the z820 for $500 shipped and swapped the graphics card. So for $920, I got a 32-thread, 128GB beast of a z820 (probably would have cost $20K new) plus an E5-2640 and 16GB of RAM to sell. So easy to work with, too, with their tool-less design.

It's a great 4K video/photo editing & OpenShift lab machine. Literally 300 chrome tabs and it hasn't cracked past 16GB of RAM usage in RHEL. Just need to spin up some VMs now.


This used to be <$500, but currently $633 for a motherboard, dual E5-2670 CPUs, and 128GB ECC memory: http://www.natex.us/Intel-S2600CP-Motherboard-Package-Deal-p...


Highly recommend this case if you go with an Intel S2600CP (or any other SSI EEB mobo): https://www.newegg.com/Product/Product.aspx?Item=N82E1681185...


I am using the Fractal Design Define XL R2 (what a name) for my Supermicro dual CPU mobo, I drilled one extra mounting hole and the mobo mounted just fine.

Very quiet and solid.


+1 for natex. That's where I got my motherboard and it's worked great for about 6 months.


To the extent that dual-socket LGA2011 ATX/EEB boards moved up in price to the point where you could pay three or four times more for a suitable mobo than for sixteen cores of CPU.

eBay E5-2670


I hope some of the old hardware makes it to ebay, but it looks like many of the form factors are proprietary.


s/proprietary/opensource/

http://www.opencompute.org/ :)


What do you intend to do with it?

I've found that almost any kind of short lived experiment I can do cheaper on AWS than doing it with hardware that I own. If it is longer running then it might become viable to own the hardware.


It's sad to me this is becoming the status quo. Using other massively centralized companies for compute resources is a sad future.

It's bad for privacy, it's bad for diversity to protect against SPOFs, it's bad for general computing hardware (vendors primarily target the giants), it's bad for users via vendor lock-in, and it's bad for open source projects in the infrastructure space.

I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out I have to rebuild on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).

Sorry about the rant, but is there anything that would get you to stop giving the keys to the kingdom to Amazon?


Hardware is a means to an end. I've got plenty of it but at the end of the day what you do with it should be balanced by what it costs.

For companies that have instances running long term it can very well be cost effective to own the hardware. My email server, web server and DNS server are on my own hardware with a co-location facility that I trust.

But for experimental stuff where you need to spin up a hundred machines for an hour or two you just can't beat the cloud (and that's my only use case for the cloud, though I can see others go much further).

I don't like the monoculture any more than you do, but to see this as me having given the 'keys to the kingdom' to Amazon is several steps too far.


I misinterpreted what you meant by a "short lived experiment". I took that to basically mean any project when you're starting out. My apologies.

Whenever I'm experimenting I rarely need a burst of 100 instances, it's usually 1 or 2 instances to run things and I prefer to run them on my own hardware.


I feel you. But I think the lock-in problem can be solved if we can have some standardization of cloud services such that you can always move to another provider. That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there. I think Netflix is switching from Amazon to GCP; I hope they'll standardize it and open source it.


Chef, Puppet, etc., and various Apache projects all offer this already.

It's like leveraging Oracle-specific database features. You're a fool to do so.


> That has to start with some big company developing an abstraction layer and open sourcing it, and then we can go from there.

I work for a company (Pivotal) that's had such a product -- Pivotal Cloud Foundry -- for several years. It creates an abstraction layer for apps or container images, your choice.

Deploy with BOSH to raw metal, OpenStack, vSphere, AWS, Azure or GCP. BOSH creates an abstraction layer over the IaaS.

We're also the main driving force behind Spring, Spring Boot and Spring Cloud Services; the latter is in part a generalisation and integration of Netflix OSS.

We cooperate a lot with Google and Microsoft. For example: https://cloud.google.com/solutions/cloud-foundry-on-gcp


> I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on generator and get exactly the same electricity. If Amazon goes out I have to build again on another cloud from a (hopefully recent) backup or just sit dead (like the recent s3 outage).

What's your use case?

Are you just futzing around at home? Sure, use a server in your bedroom. Who cares?

Are you delivering a service to other people? Then owning hardware is probably a bad idea. If it's in your house, your users are hosed if you lose power or your internet cuts out. Putting it in a DC just means you're handing the same keys to someone else, but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

Owning hardware is a bad deal for everyone involved unless you're big enough to build your own HADR infrastructure.


>but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.

I don't buy this. I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services. You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact. Guess when Amazon does maintenance? That's right, you don't know and one screw up can mean instances in "degraded status" (a.k.a. you might as well terminate it and launch a new one) or all of S3 is down during critical business hours.

Of course your own hardware in a single data-center is going to be exposed to high probability of failures, but that's the equivalent of using a single instance in EC2 (which I have lost two of in the last 7 years of managing 15 or so of them for a small company).

I will admit that it takes strong ops skills to maintain high uptime on your own hardware, but that's just due to a lack of good open source tooling in this area. I would rather see a movement to improve tooling rather than continue to boost the stranglehold the public cloud is putting on everyone.


> I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services.

Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Yes, with enough money you can match Amazon for uptime or scalability or whatever metric you prefer. For the same money you can probably buy triple the capacity in Amazon or your preferred cloud provider, so this is mostly a game for people with really deep pockets, really large scale, or really poor budgeting.

> You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact.

How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

> all of S3 is down during critical business hours.

I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime. If you stand up a fairly complex system comprised of a number of loosely-coupled services, you're going to end up experiencing some outages, because you'll face the same challenges as Amazon and those guys aren't idiots. You'll lose your message queue due to a bug, or you'll lose a network switch and realize your failover takes 30 minutes to complete instead of the 5 seconds you hoped for, or you'll accidentally DDOS a subsystem when exercising a failover or a system upgrade, or something else. Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.


> I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime.

That needs a dollar-for-dollar or something to that effect qualification. It's possible but very expensive.

There are for instance long running (and I mean really long running, many years or even decades) experiments where any amount of downtime would cause a do-over.

One of my customers had something like this on the go. The amount of money they spent on their power and network redundancy was off the scale, but they definitely had better uptime than Amazon.

Their problems were more along the lines of 'this piece of equipment is nearly eol, how do we replace it without interrupting the work it does'.


Yes, sorry. I was assuming similar expense. Enough money can buy just about anything, including a few additional nines.

If your goal is to build out scale more reliably than Amazon, at the same or lower cost, that's tough and you're unlikely to achieve it unless your scale is approaching that of Amazon (and you have really good people).


>Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.

Putting a rack in a COLO is still self-managed for the purpose of what I'm talking about. It's easy to get multiple data centers where you are renting the space and electricity but you still own the hardware and can make agreements with various ISPs to get service from.

>How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.

See comment above.

>Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.

See comment above. "bringing down a DC" doesn't mean shutting everything off, it means from the perspective of your end users, your service is not available there.

> because you'll face the same challenges as Amazon and those guys aren't idiots.

No, but they have much different priorities. If all I want is static asset hosting, the loosely-coupled micro-service architecture you are referring to is completely overkill and results in the very instability you are claiming is normal.

>Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.

Nobody except Google and Microsoft is building something as complex as the entire AWS stack. The vast majority of AWS users are using a tiny percentage of the features that come with AWS and can get by on much simpler systems that are easier to reason about and maintain.

When you dump the majority of what Amazon is actually running, you have a much simpler system and architecture and actually can beat Amazon's uptime.


Amazon charges at least 15 to 20 times the going rate for bandwidth. So if you are serving large amounts of data, it could easily be the case that you can pay for enhanced uptime with just the savings on bandwidth alone.


My Raspberry Pi at home has had less downtime in the past 3 years than AWS.

Local datacenters in the city had even less.

I'm not sure where AWS is supposed to get that famous reliability from, but it's not in uptime. (I can't comment on storage reliability, because I only write a few terabytes of data a month; but otherwise, there are RAID 5 or other RAID setups to ensure data stays valid.)

AWS has its advantages in its immense scalability within of seconds, it has its advantages in convenience.

But its uptime isn't much better than most home connections.

Home statistics:

Power downtime since 2006: 29 minutes.

Internet downtime since 2006: 6 hours in 2014, plus two 30-minute outages in 2016.

This is on a 100/40 DSL line nowadays (the downtimes were, except for one, when switching ISPs), without any uninterruptible power supply, battery, or generator.

For comparison, this is equivalent to an uptime of 99.99%: the same as AWS advertises, but better than what they delivered this year or last.
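
Back-of-the-envelope check, using the outage numbers above (roughly 7.5 hours of total downtime over 11 years):

  > echo "scale=6; (1 - 7.5/(11*365*24)) * 100" | bc
  99.992300

which rounds to the 99.99% figure.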


You probably do not get how this works. Let me try to explain: when you talk about the uptime of your raspberry pi you are looking at a single, very simple instance of a computer. It's really easy to get an insane uptime out of a single machine.

Here's one for you:

  > uptime
    02:52:56 up 714 days, 16:53,  1 user,  load average: 0.00, 0.00, 0.00
Which is pretty average for a small, underutilized server. Essentially the uptime here is a function of how reliable the power supply is.

But that's not what AWS is offering.

They offer a far more complex solution which by the very nature of its complexity will have more issues than your - and mine - simple computers.

The utility lies in the fact that if you tried to imitate the level of complexity and flexibility that AWS offers, you'd likely not even get close to their uptimes.

So you're comparing apples and oranges, or more accurately, apples and peas.


Agreed. What I question is whether a lot of the complexity is actually needed for a lot of the systems being deployed. For example, people are building Docker clusters with job-based distributed systems for boutique B2B SaaS apps with a few thousand users. Is the complexity needed? And how much complexity needs to be added to manage the complexity?


How am I comparing apples and oranges?

The previous posters said that I should use AWS, because anything I set up myself will have more downtime than AWS.

Now. I've actually set up a few systems.

Some on rented dedicated servers, some on actual hardware at home.

Including web apps, databases backing dozens of services, etc.

As mentioned above, all of them have better uptime than AWS.

How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?


> How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?

That a single instance of something simple outperforming something complex does not mean anything when it comes to statistical reliability. In other words, if a million people do what you do, in general more of them will lose their data / have downtime than those same people hosting their stuff on Amazon. The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

And that's because your setup is extremely simple. The more complex it gets the bigger the chance you'll end up winning (or rather, losing) that particular lottery.


> The only reason you don't see it is because there is a good chance that you are one of the lucky ones if you do things by yourself.

Or maybe because I have less complexity in my stack, so it’s easier to guarantee that it works.

Getting redundant electricity and network lines, and getting redundant data storage solutions is easy.

Ensuring that of 3 machines behind a loadbalancer at least 2 work is also easy.
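
A quick sanity check on that, assuming each of the 3 machines is independently up 99% of the time (P(at least 2 of 3 up) = 3*p^2*(1-p) + p^3):

  > echo "scale=6; 3 * 0.99^2 * 0.01 + 0.99^3" | bc
  .999702

So even with individually mediocre machines, two-of-three behind a load balancer already gets you to ~99.97%.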

Ensuring that nothing ever fails in a complex system of millions of interconnected machines, with services that have never been rebooted or tested in a decade (see the AWS S3 post-mortem), is a lot harder.


You're right. If you run fairly low volume services that don't need significant scale, you can possibly achieve better uptime than Amazon. You'll probably spend significantly more to get it, though, since your low volume service probably could run on a cheap VM instead of a dedicated physical server.

You're also likely rolling the dice on your uptime, since a hardware failure becomes catastrophic unless you are building redundancy (in which case you're almost certainly spending far more than you would with Amazon).


Actually, I've calculated the costs: if you only need to build for one special case, even with redundancy you tend to always be ~3-4 times cheaper than the AWS/Google/etc. offerings for the same thing.

But then again, you have only one special case, and can’t run anything else on that.


I agree, it is sad that heavy computing/data is being centralized around corporations. I really get a lot out of being able to see and touch my hardware. To me that's worth the additional cost. I love my little 100TB Synology box and it feels weird now sitting at my desk without its soft fan hum.


eh, owning hardware is fun and a great learning experience


Owning is passive, it's what you do with it that matters.



That's all regular Quanta gear. No idea if it was owned by FB or another OCP adopter. OCP is popular with mineral companies.


If Facebook is just now announcing their upgrade, should we expect these prices to go down?


This is the Nth generation. Facebook has been continuously decommissioning old servers for years already.


How long until Facebook joins the public cloud business with Amazon, Google, and Microsoft?


Never? Their infrastructure is cool but it's only around half of what a public cloud would need.


Never... I'm not convinced. Roll back to when Amazon was pre-AWS. Everybody thought they were crazy announcing they were getting into the datacenter and cloud business. I'd say it has worked out well for $AMZN.


> Everybody thought they were crazy announcing they were getting into the datacenter and cloud business.

All the comments I heard were positive about how they were diversifying by leveraging expertise they had been forced to develop for their own core platform, not that they were crazy. I'm sure there were some who said "crazy", but it definitely wasn't everyone.


I agree. Maybe Amazon sold it really well, but as far as I can remember, the response almost universally (including in financial circles) was that this was a great idea since it allowed them to leverage idle resources they needed to build out to handle peak loads (such as during the holiday season).


That never made sense. What happened during holiday season? Everyone on AWS was put on hold?


Amazon.com, even at its peak computing needs, is now a drop in the bucket.

Three years ago, in 2014, AWS was adding the equivalent hardware every day of what ran Amazon.com in 2004, when it was only a $700-million company. [1]

[1] https://www.enterprisetech.com/2014/11/14/rare-peek-massive-...


I don't know for sure but I'm guessing spot pricing for instances went way up.


Really? Let's take a look at the chart for 2006 when AWS was first introduced (as far as I can tell). Does not look like $AMZN stock really did anything positive in 2006. It is hard to find exact dates of when AWS products were released. Anybody know when EC2 went live to the public (full actual date)?

http://imgur.com/a/BIFQH

Sorry for linking to an image, but the Google Finance link to this chart did not work. Sigh!

Request for a startup: Make a finance interface as good as Bloomberg terminals for the web.


I don't think that had anything to do with AWS.

Here's an article from 2006 regarding the earnings. AWS isn't mentioned; just a drop in operating income and the announcement of Groceries and Baby and Toy stores.

http://www.slate.com/articles/business/moneybox/2006/07/the_...


Amazon didn't have to compete with AWS, Azure and GCP, Facebook does and Facebook doesn't have nearly as much of a record providing infrastructure to businesses. Cloud is a hard market to get into now.


Well, they could double it then and reap the benefits of scale.


Half in what sense?


In development effort. They have webscale datacenters, a hardware supply chain, OS provisioning, networking, object storage, etc. but AFAIK they don't have multitenant IaaS or PaaS, OSS/BSS, etc.


The other thing is that they are a decade behind AWS in the stuff they don't have... and AWS has been reinvesting all its profits in that area. That's hard to catch up on.


Ya, this is nice and all but until I can rent time on one of these servers I don't really care all that much. Are these OpenCompute designs hosted anywhere other than Facebook?

It feels like they're bragging more than anything.


Yes, you can host your apps on OpenCompute hardware today with Rackspace Cloud OnMetal among other providers. You might find the list of involved companies for the OpenCompute Project a good start. You can also buy or fabricate your own OpenCompute compatible hardware thanks to its open design.


They bought Parse, and then shut it down with no mention of a replacement.

It's hard to imagine that they're currently planning on getting into the Cloud space.


Parse


100 million hours of video played per day. Are people actually watching this video or are they just inflating the number?


Facebook has over a billion daily active users, so 100 million hours divided over 1 billion users is 0.1 hours/user, which is 6 minutes per user. Seems reasonable. Of course, there are lots of people who don't watch any videos, and on the flip side there are a lot of people who watch a lot of videos on Facebook. Edit: as pointed out below, the autoplaying videos might skew the numbers quite a bit.
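
The division, for anyone double-checking:

  > echo "scale=1; 100000000 / 1000000000 * 60" | bc
  6.0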


They've inflated metrics in the past: http://www.businessinsider.com/facebook-video-views-exaggera...

I'm not sure if there are any standards between platforms for these things that allow you to compare, though. I'd say, for example, that you should exclude watches that last less than 5s or so. YouTube and Netflix may not have thought to do it because it doesn't make much sense to them, but Facebook really needs to, since I assume most of their video watches are automatic (accidental) while scrolling through the feed.


It does matter to Netflix. They don't publish their numbers and just use them for internal metrics so you can bet that they are honest with themselves about their numbers.


Given a lot of auto-play video in my own feed, I'm presuming it's not all actually 'watched'. If they'd give a separate number for videos 'listened' to (where I actually unmute the audio), I'd take that number more seriously.


This wouldn't be too accurate anymore either, because now even audio autoplays.


Is this a setting or experiment or something? It doesn't happen for me.


Neither for me. Videos play, but the audio is always muted. I have to enable it to play for each video, and have for more than a year.


Apparently they've tested it for a year or so already and are now doing a wider rollout. http://www.wxyz.com/money/consumer/dont-waste-your-money/fac...


Autoplayed videos stress their server infrastructure even if no one is watching. It is OK to count them in the context of this article.


Given that you can't scroll past a video without it playing, it's got to be a mix


You can disable this in the settings.


But it defaults to on which means that's where it is for most people.


I'm disappointed in the "open rack" designs. For a really minor improvement in density they have broken compatibility with standard 19" gear.

One could argue that at FB scale it's worth it, but then MS seems to manage just fine with 19".


It's interesting. If they wanted to, they could compete with the likes of HP, Dell, Lenovo, and Cisco if they could ramp up production to accommodate customers. I wonder who does their manufacturing on the backend.


Facebook uses Quanta/QCT, Celestica, and Accton for a lot of their manufacturing. You can buy Facebook servers from companies like Hyve, AMAX, and Stack Velocity but they aren't really aiming at the mainstream server market.


Is it that much cheaper to custom build if you can't eBay them off at their half-life point to recover some of the cost?


Most of the cost is in processors and RAM so those parts can be sold at end of life. There are server recycling companies that specialize in this.


Here's an example of one: http://cashforelectronicscrapusa.com


Yeah, that kinda reminds me of expertsexchange.com

Is it electronic-scrap or is it electronics-crap?


The terms of use page refers to "CJ Environmental", which, when combined with the Inc. 500 reference on the home page, returns this profile:

http://www.inc.com/profile/cj-environmental

From the FAQs, "How does material get processed and refined? All materials are unique and subject to different methods of processing. Newer computers are refurbished and given new homes to maximize ROI for our customers. Scrap product is crushed or shredded before the refining process. All material is processed in accordance with all Federal, State and Local regulations. To learn more about our licensing and compliance measures contact us."


When can we start hosting our services with Facebook, similar to AWS, GCE, etc.?


Anyone know what CPUs/GPUs they use in these?


"Built in collaboration with our ODM partner QCT (Quanta Cloud Technology), the current Big Basin system features eight NVIDIA Tesla P100 GPU accelerators. These GPUs are connected using NVIDIA NVLink to form an eight-GPU hybrid cube mesh — similar to the architecture used by NVIDIA's DGX-1 system. This setup, combined with the NVIDIA Deep Learning SDK, utilizes this new architecture and interconnects to improve deep learning training across all GPUs.

Compared with Big Sur, Big Basin will bring us much better gain on performance per watt, benefiting from single-precision floating-point arithmetic per GPU increasing from 7 teraflops to 10.6 teraflops. Half-precision will also be introduced with this new architecture to further improve throughput."


So I just run the setup CD, right?


Assuming your question is serious: Yes, essentially, but you're going to have to temporarily connect some peripherals if you don't intend to pop in a pre-installed image on a drive. And of course, depending on what OS you intend to run on it you might end up with driver hassles, your best bet is likely a reasonably modern linux distro.


Facebook PXE boots into Anaconda, uses Chef to provision the host, and then ultimately uses their own scheduler to execute a workload.
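
For anyone unfamiliar with that flow, it's roughly the standard PXE-plus-kickstart pattern. What follows is a generic sketch (hostnames and paths invented, not Facebook's actual config):

  # pxelinux.cfg/default: network-boot the Anaconda installer with a kickstart file
  LABEL provision
    KERNEL images/centos7/vmlinuz
    APPEND initrd=images/centos7/initrd.img inst.ks=http://boot.example.com/ks.cfg ip=dhcp

  # ks.cfg, %post section: bootstrap the config-management agent
  %post
  curl -L https://omnitruck.chef.io/install.sh | bash
  %end

Once the kickstart finishes, chef-client converges the host and the scheduler takes over from there.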



This makes sense. No onboard graphics card.


No mention of AMD. No wonder Intel isn't worried. They have probably locked up contracts with Amazon, Microsoft, Google, Facebook, Oracle, IBM, OVH, Baidu, Alibaba, Salesforce, SAP, DO, etc., along with dozens of other slightly smaller players.

Which got me thinking: what % of the server market do these dozens of players own? 50%?




