OnMetal: The Right Way To Scale (rackspace.com)
154 points by philips on June 19, 2014 | 50 comments



This sounds pretty cool, but I have a few suggestions:

1. I want to know how much these things cost. Even a line like, "Under $x/hour." would be useful.

2. The blog post says that the memory and compute types have no disk. This confused me at first, but then I saw on http://www.rackspace.com/cloud/servers/onmetal/ that they have 32GB boot drives. You might want to add that to the comparison table in the blog post.

3. Has any thought been given to letting users provision OnMetal servers in specific cabinets? Sometimes users want to have servers next to each other for better network connectivity. Other times, they want them in different cabinets for increased reliability.

Lastly, kudos to Rackspace for building this. Most people don't realize how virtualization can hurt performance. Even without noisy neighbors, a modern Xeon still takes >500 cycles to switch between VM and host.


1) I don't think prices have been announced yet.

2) They do have 32GB SATADOMs, but these are intended only for boot media and configuration, nothing more. Think something like an SD Card in a solid-state machine.

3) Yes, I want this. No, we don't do it currently.

FWIW, we didn't build all this. Ironic existed long before we came around; we added a deploy driver, set it all up, and exposed it to customers. We'll have a blog post up shortly with exactly how we're running this in production.

The coolest part of all this is something nobody has mentioned here yet -- this is all on Open Compute hardware. When I managed servers, I always wanted access to the cool hardware and scale that companies like Facebook and Google had. Hopefully with OnMetal, we're going to give our customers the cool hardware I always wanted.

Edit: As a note, I'm a Racker and I work on the team that built this.



> Think something like an SD Card in a solid-state machine.

Ah, that's still way better than no disk. I was initially worried about having to deal with PXE. I wonder if logging will cause trouble for the SATADOMs. I'm guessing they don't do much in the way of wear-leveling.

I didn't recognize your nick at first, but we worked in the same office. I was part of the Cloudkick acquisition. I left Rackspace in August of 2012 to start Floobits. Small world. :)


I would say if you're running applications in the cloud, you should ship your logs and forgo logging them to disk. Or set up an awesome ES+Logstash cluster using IO nodes :).
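
If you're on Python, here's a minimal sketch of what I mean using the stdlib SysLogHandler (the collector hostname is just a placeholder for whatever rsyslog/Logstash endpoint you run):

  import logging
  from logging.handlers import SysLogHandler

  # Ship logs over the network instead of writing to the local SATADOM.
  # "logs.example.internal" is a placeholder for your log collector
  # (rsyslog, Logstash, etc.) listening on UDP 514.
  handler = SysLogHandler(address=("logs.example.internal", 514))
  handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

  log = logging.getLogger("myapp")
  log.addHandler(handler)
  log.setLevel(logging.INFO)

  log.info("request handled in %d ms", 42)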

You should've left off the last paragraph; I assumed you knew who I was :).


I agree with you about log shipping, but customers are bound to misconfigure their servers and log to the SATADOM.


Kudos, very cool. Maybe instead of 'no disk', something like 'SSD for boot media, no local bulk storage' would be clearer; I was a little confused by this, as were other posters below. It's also a little interesting that the 'High CPU' option has the fewest cores of the three options... Any comments on that?


It's low on other resources so it's probably lowest cost per CPU.


That's basically correct; we were trying to balance a few factors:

  - Total cost
  - Cost per core
  - Clock speed (this actually matters a lot for, say, a Python webapp)
We chose to run only a single CPU in this box to keep the total cost of the box low. This slightly increases the cost per core (switch ports, the chassis, rack space, etc. being basically fixed overhead per node), but by selling in smaller increments we think we can keep the total cost lower for most users.
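
To make the tradeoff concrete with purely illustrative numbers (not our actual costs), the back-of-the-envelope math looks like this:

  # Illustrative numbers only -- not actual Rackspace costs.
  fixed_overhead = 900.0    # switch port, chassis, rack space, etc. per node
  cpu_cost = 1100.0         # per socket
  cores_per_cpu = 10

  for sockets in (1, 2):
      total = fixed_overhead + sockets * cpu_cost
      cores = sockets * cores_per_cpu
      print("%d socket(s): total $%.0f, $%.0f per core" % (sockets, total, total / cores))

With numbers like these, the dual-socket box wins on cost per core but forces you to buy a bigger increment; the single-socket box keeps the unit you rent (and its total cost) smaller.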

We'll be gathering feedback on the resource balance, and iterating as we go.


Cool! Can you talk about some unexpected challenge you faced getting this rolled out?


Honestly? Using 512GB of ram is pretty hard. I learned during some of this that you can actually get tmpfs to be CPU limited if you try to write to it too quickly.
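
If you want to reproduce that, here's a rough sketch (assuming a tmpfs mounted at /mnt/tmpfs with at least 16GiB free): write a big buffer in a loop, watch one core peg in top, and note that the GiB/s figure tops out well below what the memory itself can do.

  import os, time

  PATH = "/mnt/tmpfs/scratch.bin"       # assumes a tmpfs mounted at /mnt/tmpfs
  CHUNK = b"\0" * (64 * 1024 * 1024)    # 64 MiB per write
  TOTAL = 16 * 1024**3                  # 16 GiB total; tmpfs needs that much free

  written = 0
  start = time.time()
  with open(PATH, "wb", buffering=0) as f:
      while written < TOTAL:
          f.write(CHUNK)
          written += len(CHUNK)
  elapsed = time.time() - start
  os.remove(PATH)
  print("%.1f GiB/s into tmpfs" % (TOTAL / elapsed / 1024**3))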


> Honestly? Using 512GB of ram is pretty hard.

It's extremely easy. Insert 512GB of data into postgres, there, done.


Not if you are in Oil and Gas :)


Working with IPMI can be tough. We nailed the issues eventually, but there were a few frustrating days of images not booting properly with no explanation. Until we found the right combination of settings, we had issues with flaky BMCs that would stop responding for no apparent reason.
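
For a flavor of the kind of workaround that ends up requiring (this is not our production code; the BMC address and credentials are placeholders, and it just drives plain ipmitool): retry the query, and cold-reset the BMC when it stops answering.

  import subprocess, time

  # Placeholder BMC address and credentials, for illustration only.
  HOST, USER, PASSWD = "bmc.example.internal", "admin", "secret"

  def ipmi(*args):
      cmd = ["ipmitool", "-I", "lanplus", "-H", HOST, "-U", USER, "-P", PASSWD] + list(args)
      try:
          return subprocess.run(cmd, capture_output=True, text=True, timeout=30)
      except subprocess.TimeoutExpired:
          return None  # treat a hung BMC the same as an error

  def power_status(attempts=5):
      for _ in range(attempts):
          result = ipmi("chassis", "power", "status")
          if result is not None and result.returncode == 0:
              return result.stdout.strip()
          # BMC isn't answering: cold-reset it and give it time to come back.
          ipmi("mc", "reset", "cold")
          time.sleep(30)
      raise RuntimeError("BMC still unresponsive after %d attempts" % attempts)

  print(power_status())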


Yup, this is what I like about Docker (I'm sure there's some overhead as well): application isolation without paying for it seriously in terms of performance.

That said, you can pull a lot of performance out of a modern 2U server: 24 cores and 256 GB of RAM can be had pretty cheaply these days, and paired with 8 SSDs you've got a real workhorse.

Even a traditional MySQL/PostgreSQL DB can maintain upwards of 140,000 tps on hardware like that.


And you can probably rack that for on the order of $11k.


Or rent it from EC2 for a month.


Finally the pendulum swings back toward reality.

Virtualization is interesting for many problem sets, but it does add overhead and it has its costs. All of these technologies are cost vs. reward, and for too long we've been drinking the virtualization Kool-Aid without realizing the real costs of those configurations.

Last year I did a super low-latency, high-qps project on a cloud box and was lucky to get 37k qps out of it. I had the exact same box reloaded as bare metal and was able to get 165k qps. Mind you, this is latency in the neighborhood of 10ms under really extreme qps loads. That said, the virtualization layer is not free; it's useful, as are many technologies, and it's an awesome tool for the right problem set.

I'm happy to see Rackspace coming out with this. I asked them for something like this years ago and am excited to see they've figured out a way to do it.

Like many have said, let's see the pricing. But before you compare, make sure you are really comparing apples to apples.


This is an interesting announcement for a few reasons:

1) It's true that VMs generally provide bad performance for high-I/O applications, particularly databases. There are various ways to mitigate this, such as using high-IOPS drives like AWS EBS PIOPS or SSD-backed storage from Google/Digital Ocean.

2) This is using OpenCompute and is all open source and going to be released, so they can take advantage of the efficiencies of that hardware architecture.

3) It gives you quick access to physical servers connected to your cloud environment: the flexibility and scalability of cloud combined with the benefits of dedicated hardware. This seems to be how they're differentiating against SoftLayer, who have bare metal servers available via API, but billed monthly and taking a few hours to provision (which is still pretty impressive).

However, there is no pricing announced - this will be key.

Also, will this start to eat into their managed services perhaps? Anticipating disruption?


Brute force is a pretty expensive way to get more I/O. Coincidentally, the other day I measured 50,737 mixed random IOPS under KVM and 104,518 IOPS under Docker on the same hardware. Could I have bought even more SSD to improve performance under KVM? Maybe, but at what cost?
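
If you want to get a comparable number yourself, fio is the right tool, but a rough queue-depth-1 sketch of the idea in Python looks like this (Linux-only; the test file path is a placeholder for a pre-created file larger than RAM, and real benchmarks need higher queue depths):

  import mmap, os, random, time

  PATH = "/mnt/test/randread.bin"   # placeholder: pre-created file, larger than RAM
  BLOCK = 4096
  SECONDS = 10.0

  fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache (Linux-only)
  blocks = os.fstat(fd).st_size // BLOCK
  buf = mmap.mmap(-1, BLOCK)                      # page-aligned buffer, required by O_DIRECT

  ops, deadline = 0, time.time() + SECONDS
  while time.time() < deadline:
      os.preadv(fd, [buf], random.randrange(blocks) * BLOCK)
      ops += 1
  os.close(fd)
  print("~%d random 4K read IOPS at queue depth 1" % (ops / SECONDS))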

Rackspace is definitely disrupting themselves with their cloud product, and today we can see that they're not doing it halfway.


Checking IOPS of a fully virtualized system (even with virtio, and it's unclear from your post whether you were using it or not) is not a fair comparison to something that is literally a process on the host system.


IME non-VM servers are an enormous win over EC2, and probably VMs in general. A previous employer moved a large data processing pipeline from EC2 to SoftLayer; we saw costs fall from ~$100k/mo to high $40k/mo while tripling the throughput, for roughly a 6x perf boost.

The one area that this announcement doesn't address, however, is contended networking. Rack-local servers with a 2G (or ideally 10G) switch capable of full simultaneous port-to-port bandwidth did amazing things for our app's performance. When you're looking at whole-app performance, you're sensitive to contention on all of CPU, I/O, and networking. Hadoop is particularly sensitive to contended I/O, and secondarily to networking.

edit: s/big data/large data/ -- people who say bigdata are tools


As we talk about here: http://developer.rackspace.com/blog/how-we-run-ironic-and-yo..., each instance is provisioned with redundant 10Gbit network links with minimal network over-provisioning in each cab.


oh awesome; did I miss that in the article? Reading is hard...


This is why I just say bigger than last week's data.


I think this is a neat development, but doesn't this post overstate the case for reduction in complexity?

I hope a Racker will correct me if I'm wrong, but:

- With the exception of the boot device, you're dependent on non-local storage over the network.

- You're not going to get dedicated networking gear

- OnMetal is still dependent on the virtual networking layer (OVS?)

I can appreciate having more bang for the buck, but there are enough shared components here that I would be very hesitant to assume the MTBF crosses the imaginary line where you can eschew fault-tolerant architecture patterns.


For the network, you're getting 2x 10Gbit connections, configured in an HA bond and plugged into a real switch. There's no current dependency on a virtual networking layer.
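
If you're curious, you can confirm the bond from inside an instance via the standard Linux bonding status file (a quick sketch; the interface name bond0 is an assumption):

  # Prints the bonding mode and the state of each 10Gbit slave link.
  with open("/proc/net/bonding/bond0") as f:
      print(f.read())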


That's good to know. How and where are ACLs handled?


Engineering manager on OnMetal here.

In our first iteration, you'll get two networks: "ServiceNet", which is our intra-DC network and is unmetered (i.e., free bandwidth), and "PublicNet", which is the internet and will be billed just like we bill for other cloud server bandwidth. Collectively you get 10Gb/s across these, which you can break down however you want. To you, these just show up as VLANs configured at build time.

We're working on hooking these boxes up to our "Cloud Networks", which are per-tenant software defined networks. If this sounds like a fun project, hit me up, contact info is in my profile.


How do IPs get assigned? Is that still over nova-network?


What makes you think it is non-local? Two of the options are RAM only (32GB and 512GB). That's memory chips plugged into the motherboard (of which normal consumer machines often have only 2 to 4GB). If your software needed something more than the boot drive, you would run a RAM disk using that memory. That's why the numbers are so large.

The other option is PCIe flash; that's a solid-state "disk" plugged into a motherboard expansion slot, basically. Again, since they flat-out state PCIe, they are saying it is local.


Are you saying it's appropriate to run a database on a RAM disk? This product is offered without a disk, so it has to be coupled with a SAN or similar remote storage system.


Our IO flavor has 2x 1.6TB PCIe flash cards; this is how we see most users running databases or otherwise persisting "hot" data. I've gotten to play with these over the last few months, and it's really incredible how fast they are - they totally crush even high-end SSDs. People have traditionally used these in a tiered setup to accelerate spinning disks, but the costs have come down to where it hardly makes sense to use anything else for low-latency persistent storage.

We're working to hook this up to our existing block storage offering as well, which will offer more flexibility to attach SSD or spinning disk backed block devices to any flavor.


Can you recommend a specific brand of PCIe flash card? How reliable are they?


We are running these in OnMetal: http://www.lsi.com/products/flash-accelerators/pages/nytro-w...

Reliability is similar to an SSD. The underlying storage tech is basically the same: some flash chips, with a controller that handles wear leveling and so on. Just that instead of running operations through a SATA controller it uses the PCIe bus, which offers higher bandwidth and lower latency.

Interestingly, Apple is shipping PCIe flash in all (I think) of their latest laptops.


Oh, I see, just plain SSDs with PCIe instead of SATA. Yes, that seems logical.


The thing here is OpenStack and the beefy specs. We have seen other companies allowing you to deploy bare metal; SoftLayer and Ubuntu MAAS jump to mind. I have been preaching bare metal for years now; it's great to see it spreading (again).


I never understood why people didn't do this before -- I've wanted great application-accessible APIs to PXE and autoconfiguration for a while.

Even better if it's standardized across competing hosting providers, or at least multiple physical locations.

OpenStack can do this -- makes a lot of sense.

I personally like provisioning hardware with a hardware-virtualized hypervisor anyway, running admin/keys/monitoring in another VM or the dom0 while running the application in a domU. But I hate multi-tenancy, especially across organizations, and often even across functions on the same server from the same organization.

Pretty excited by this; would love to set it up in a colo cage.


A key point of the post is that inconsistent VM performance is a problem that scaling startups have to do a lot of work to deal with, and some people here are scaling or have scaled startups, so...what are y'all's experiences?


Many people are still going to need or want to be really careful about selecting the best value. So the question is: if you rent, say, an 8GB VPS from Rackspace, DigitalOcean, Linode, or something similar, how inconsistent is the performance actually? And given that measurement, will the presumably somewhat more expensive "on metal" be a good value?
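
One cheap way to quantify that on any VPS: run a fixed CPU-bound loop once a second for a while and look at the spread of wall-clock times (a pure-stdlib sketch; it only captures CPU jitter and steal, not disk or network contention):

  import statistics, time

  def cpu_work():
      # Fixed CPU-bound workload; its wall time varies with steal/noisy neighbors.
      total = 0
      for i in range(5_000_000):
          total += i * i
      return total

  samples = []
  for _ in range(60):  # roughly one sample per second for a minute
      start = time.perf_counter()
      cpu_work()
      samples.append(time.perf_counter() - start)
      time.sleep(max(0.0, 1.0 - samples[-1]))

  mean = statistics.mean(samples)
  stdev = statistics.stdev(samples)
  print("mean %.3fs, stdev %.3fs, CV %.1f%%" % (mean, stdev, 100 * stdev / mean))

A low coefficient of variation over a day or two suggests you'd gain little from paying more for dedicated hardware; big spikes are the noisy-neighbor tax the post is talking about.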

I'm not even considering EC2 because I don't trust the reliability and I know that it's not a good value compared to the others. EC2 only makes sense to me if you really need to take advantage of features they have, like the private networking or DynamoDB, etc.


I'm looking into developing simplified MMOs. I don't care about SSD. I do care about latency and noisy neighbors. OnMetal sounds attractive, but it also sounds like it will be costly.

I think there's a market opportunity for simplifying the implementation of online multiplayer games, especially on the back end. Right now, there are a lot of painful barriers and some apparent pent-up demand.


Are there any other providers out there that have anything like this? For example, SoftLayer has "Bare Metal" servers, but I'm not sure if there's a virtualization layer.


Storm (Liquid Web) also has a bare metal product. http://www.stormondemand.com/servers/baremetal.html


SoftLayer bare metal is bare metal; it's pretty similar to what Rackspace announced.


These things have Teeth!


I think this is an excellent idea, combining the extreme flexibility of the cloud with the high performance of dedicated servers.


Can't they throw at least an HDD in the HighCpu/HighRam servers?

And wouldn't it be good to have a HighRam/HighIo server?


Would like to see HighRam+HighIo too, and that's definitely what I'd want to use for a database server with lots of data, but I'm guessing there is a constraint like space inside the chassis that forces a choice between PCIe flash and lots of RAM. Hopefully someone from Rackspace will comment.


Would be very interesting to have a side-by-side feature comparison with SoftLayer.


Would love to see some pricing...



