The slow rise of robots in the data center (datacenterdynamics.com)
103 points by vanburen on June 28, 2021 | 72 comments



Not sure if I really buy that the Switch security bot is a serious project, given that over-the-top security theater seems to be one of Switch's main marketing points. Their employees wear military tactical gear; their facility walls and gates are designed like they're meant to house a prison; they have racks of what appear to be real guns and riot gear displayed prominently behind the security folks that check you in. Factor in the Bond-villain architectural aesthetic and it's really just too much to take seriously.

Bizarro security roleplaying aside, Switch is probably the most professional and competent data center operator I've worked with.


I know half a dozen folks who have worked there, and they all independently described the security as a joke to impress clients. When I took a tour once, they claimed that because of their government contracts they can legally commandeer fuel at gunpoint to run their generators if they need to, which is a cute fantasy.


> commandeer fuel at gunpoint to run their generators if they need to, which is a cute fantasy.

Indeed it is. But sometimes real life has odd moments, and some data can be extraordinarily important:

> The dramatic rescue of title books from doomed abstract businesses proved a greater public good when all official land records were lost [in the Great Chicago Fire of 1871]. John G. Shortall, who forced a passing wagon driver at gunpoint to load his records, would thereafter be remembered for more than the arrangement of legal conveyances.

> Shortall and Hoard, Jones and Sellers and Chase Brothers and Co. - who have saved nearly all of their papers, including the indices to every piece of property in Cook County, and actual abstracts to a large proportion of this property. We have one firm - J.H. Rees and Co. - who has saved copies of all the maps and all the plats ever made of Cook County property.

> In April 1872, the Burnt Records Act was passed by the Illinois Legislature, and the existing abstract records of the three companies were made admissible as evidence in all courts of record.

https://www.ctic.com/history2.aspx


They can, but then they'll be charged with armed robbery afterwards.


Or possibly shot for attempted armed robbery.


> Not sure if I really buy that the Switch security bot is a serious project

It isn't. Switch is famous in the telecom/datacenter industry for weird security theatre. Notably, they started having conspicuously armed-with-AR15 guards at their Las Vegas area facility 15+ years ago. Compare them to any major carrier hotel or critical IX point in the USA and they stand out as weird. And they're not even as critical as they claim to be.

I can't share the details, but the real, hidden, non-theater security behind some of the US's major traffic exchange points very much exists; it's just not on public display for the rube customers taking tours.


> Factor in the Bond-villain architectural aesthetic and it's really just too much to take seriously.

-- Sith admin


Presumably the Fed/State government people who hand out the DC contracts like the security cosplay. :)


Doesn't explain the high level of competence, though.


The fact that the private sector can pay much better than the government can seems to explain it neatly. Concepts like honor and service to your country don't pay the bills as nicely as selling AMZN.


I would argue that that's half the problem. The other half would be the incentives that government workers have to excel. Where I am (not the US), government jobs have a reputation for being cushy roles where your progression is basically guaranteed.


This may have been true two decades ago. Now AMZN's biggest customer is the government.


When I worked at Oracle, our DC had similar, but less stylish, armed guards. It was because we had equipment for Federal Government contracts. Another, smaller (former Siebel!) DC with no government stuff I visited had basically zero security in comparison.

So it might be a requirement?


It is.

"Prior to FedRAMP, the Security Control Assessor (SCA) had to visit the data center to check the “gates, guards and guns” every single time, even if that specific assessor had previously visited that data center. That is no longer necessary. The FedRAMP ATO takes care of all of that"

https://www.linkedin.com/pulse/security-control-spotlight-in...


Isn't security theater exactly the thing that's needed to deter threats? When you see guards that look like Navy SEALs, it's a pretty strong deterrent.


Regular datacenters don't seem to be having major problems with technicals full of insurgents stealing hard disks, but maybe I'm just out of the loop.


No? A cursory reconnaissance (e.g. "I'm a potential customer and I'd like a tour") will usually show that sort of theatre for what it is.


Is physical attack a common method of breaching a datacenter? Sure, you need some physical security and auditing, but I would imagine the main threat is online.


Depends on your threat model.

(I.e., any serious attacker will look past the theatre.)


Deterrent of what? Did someone ever hold a datacenter hostage?


[deleted]


I think you're confused; the poster you're replying to is talking about a company named Switch that makes robots for monitoring/patrolling/securing data centers.

Nothing to do with packet-switched networks.


You are absolutely correct. I removed the post.


> "The server rack is more than 50 years old. There is no other piece of technology in data centers that has survived for so long," Zsolt Szabo told DCD back in 2016.

Quick, someone tell him about floors, walls, ceilings, doors, and electric light.


I know of several datacenter backup generators that are driven by 50+ year old locomotive diesel engines, and one where the backup generator is/was a WW2-era diesel sub engine.


The rack unit's dimensions are inherited from manual telephone switchboard jack-and-lamp panels ... and the standard rack is approximately 100 years old. https://archive.org/details/bstj2-3-112


Tell him about the SR-71 (entered service in 1966, and it's still the same airframes that are flying). Most Cessnas flying are also 50+ years old, because if you purchase a more recent one, it has to comply with all the newer rules.


The SR-71 hasn’t flown in 22 years, so that’s a bad example.

There are of course planes that have flown for that long, and longer. The B-52s come to mind. But the SR-71 isn’t one of them.


DC-3s are still flying commercial (not joyriding).


Yeah, but those aren't in data centers.


Yeah high speed high altitude reconnaissance is simply not mission-appropriate. That's why Switch uses predator drones loitering in a tight loop pattern for surveillance and a quick strike.


For now


This feels somewhat counterproductive to me. Would these old planes be noticeably more dangerous?


Not OP, but I wouldn't say significantly more dangerous, no. Older aircraft sometimes have to undergo modifications to mitigate faults that have been discovered over time in order to stay airworthy. By and large, a 50 year old aircraft is going to have well-understood failure modes with mandatory steps to mitigate them.

The mechanism by which this is enforced is called an "Airworthiness Directive", if you want to read more about it. Source: I used to fly a small, 50+ year-old aircraft recreationally.


Gone are the days of tape operators on roller skates, but this article is correct in its assessment that change in the datacenter moves at a very deliberate pace.

Oddly enough, I found the reading pace of this article similar to the change it describes. Nothing negative, mind you; just gently rolling in its delivery.


Hi, do you have any books or videos about skaters in data centers? I checked YouTube and only found one about switchboards. Thanks!


No. Sorry. My comment was from a conversation I had in the mid-80's with a couple of field service reps who worked the site and couldn't believe it themselves.

The magtapes used to hang on row after row of metal racks. The operators would retrieve the tapes, hold them on their forearms, and bring them back to what were usually a centralized set of tape drives.

Here's an offset into a video showing what it typically looked like. [0] The narrator refers to a "search tape" used to retrieve the person's information. The tape had to be retrieved from the racks.

The sheer weight and overall length of the rows/racks meant the rooms were often lower level with vinyl flooring over concrete (raised tile later). Imagine walking up and down for an 8-hour 5pm-1am shift. Not difficult to imagine skates.

Here's another video with a good representation of how it worked. [1]

As I have come to accept, the mid-80's was pre-everything. Deliberate change takes time.

[0]: https://youtu.be/Iddrm7mHPrY?t=111

[1]: https://www.youtube.com/watch?v=Cwj6pfhWBps


A few seconds after your starting point in that first video, you can see a loop of tape vibrating back and forth in a rectangular channel. That's a vacuum column, with a (mild) vacuum pulling the tape loop to the right. This lets the servomotors that move the tape past the heads start and stop rapidly, while the heavy tape reels can spin up and down more slowly.

See how the vibrating loop passes back and forth across a hole in the middle of the column? That's a pressure sensor. As the loop moves across the hole, the pressure in the hole changes, telling the drive whether it's time to reel in or dispense more tape.

More info: https://en.wikipedia.org/wiki/9_track_tape
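
The feedback logic is simple enough to sketch as a bang-bang controller. A toy model with invented names (the real drives did this in analog/electromechanical hardware, not software):

    # Toy model of a vacuum-column reel servo. One pressure sensor
    # per column; all names here are invented for illustration.
    def reel_action(loop_is_above_hole: bool) -> str:
        """Decide what the heavy supply reel should do, given where
        the tape loop sits relative to the sensor hole."""
        if loop_is_above_hole:
            # Loop is getting short: feed more tape into the column
            # before the fast capstan yanks it taut.
            return "spin reel forward: dispense tape"
        else:
            # Loop is getting long: take up slack before the loop
            # bottoms out of the column.
            return "spin reel backward: take up tape"

    # The capstan starts/stops the tape at the heads in milliseconds;
    # the reel servo only has to keep the loop between these two states.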


IIRC, once it had a stable hold on the tape the drive would begin reading in an effort to locate the start of the data. On old tapes this might take a number of retries with the drive rewinding, reading, rewinding, reading... etc. Sometimes it didn't work at all and you were stuck with a bad tape that needed special attention.


Thanks a lot for the links!


From personal experience, health and safety officers prefer workers to walk around facilities. Accidents, while rare, can result in hospitalisation (broken bones) when using skates, scooters, and bicycles.

I was once in a datacenter that had a golf cart to get from one end to another (yes - it was that long).


I've been in 3 or 4 (very large) datacenters over the years where they used 3-wheeled bicycles to get around. A bonus of this was there's a pretty roomy basket between the back wheels to haul stuff around. But yeah... OSHA (in the USA) applies to datacenters too, and I can't imagine them allowing roller skates.


Wow, that sounds like a fun time!


Couldn't resist. Here's another video showing a streamlined and "much more compact" computer room from around 1990. That 44GB of disk storage (DASD) probably cost $150,000 at the time. Somebody signed the P.O. and slept well that night. [0]
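
(Taking those numbers at face value, that works out to $150,000 / 44 GB ≈ $3,400 per gigabyte.)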

Best part. You could roller skate to the music.

[0]: https://www.youtube.com/watch?v=vlvUz3T4WTA


My sense is that robots are in use more than the article makes out. Anywhere there is a risk of human error or repetitive work that can be simplified, you'll find robots are a candidate.

The main issue with robots, in my experience, is whether you can get the desired throughput while accounting for planned and unplanned maintenance work, reconfigurations, etc. The upfront capital cost generally means you don't have the ability to keep a drop-in replacement ready to go when a change is required. That results in downtime. Folks familiar with Lean Six Sigma will know that hidden bottlenecks should be avoided, or you build up large backlogs during downtime.
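
A toy example of how that plays out (all numbers invented for illustration):

    # A robot station that is nominally fast enough, until unplanned
    # downtime lets a backlog pile up. All numbers are made up.
    ARRIVALS_PER_HOUR = 90    # work arriving at the station
    CAPACITY_PER_HOUR = 100   # robot throughput when it is up

    backlog = 0.0
    for hour in range(24):
        robot_up = hour not in (8, 9, 10)  # 3-hour unplanned outage
        backlog += ARRIVALS_PER_HOUR
        if robot_up:
            backlog = max(0.0, backlog - CAPACITY_PER_HOUR)
        print(f"hour {hour:2d}: backlog = {backlog:5.0f}")

    # At 90% utilization, a 3-hour outage leaves a ~270-job backlog
    # that takes ~27 hours of spare capacity (10/hour) to drain.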

Good news for datacenter engineers is that fixing robots may become part of your remit in the future (interesting niche given the mix of software and mechanical). More interesting work will always exist!


Are there any interesting mathematical or other tools that you'd recommend for evaluating hidden bottlenecks systematically? I wonder, for example, about Toyota's decision to add an inventory buffer for chips versus other automakers. Great decision in hindsight, but how did they evaluate it in context with other possible actions?


From what I remember this (at least used to) fall under the heading "operations research", which spawned a lot of mathematical tools used elsewhere like queueing theory. Related terms include supply chain engineering, logistics, industrial engineering, and the general ideas of scientific management, Taylorism, etc.
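
The canonical queueing-theory result gives a feel for why buffers matter. A generic M/M/1 sketch, nothing specific to Toyota:

    # M/M/1 queue: average number of jobs in the system is
    # L = rho / (1 - rho), where rho = arrival rate / service rate.
    def mm1_jobs_in_system(rho: float) -> float:
        assert 0 <= rho < 1, "queue is unstable at rho >= 1"
        return rho / (1.0 - rho)

    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization {rho:.2f}: "
              f"avg jobs in system = {mm1_jobs_in_system(rho):5.1f}")

    # 0.50 -> 1.0, 0.90 -> 9.0, 0.99 -> 99.0: running the system hot
    # makes backlogs explode, which is why paying for slack/buffers
    # can be rational even when it looks wasteful on paper.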


The only way to know for sure is by experiments/simulations.

The heart of any competent operations team, whether it's in computer systems or plain old logistics, is drills. You can't simulate every possible scenario, but you try to analyze and get an initial set, and then build that up as you run into actual production issues. What you want to avoid is running scripts for the first time in the middle of a disaster.


There are various quotes about robots at Google, but the picture labeled "Google" is of an ordinary tape library (looks like a StorageTek / Oracle library). If Google has built robots, that's not one of them.


Back in the 90's we half-joked that we needed a remote-controlled robot with three fingers to press Ctrl-Alt-Delete rather than drive down to the data center at 3am to reboot a jammed server. I sketched "R2Reboot".
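
(These days the three-fingered robot is built into the server as a BMC. A minimal sketch with an invented host and credentials; ipmitool's "chassis power cycle" is a real subcommand:)

    # Remotely power-cycle a wedged server via its BMC instead of
    # driving to the data center. Host and credentials are invented.
    import subprocess

    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", "bmc.example.com",
         "-U", "admin", "-P", "secret", "chassis", "power", "cycle"],
        check=True,
    )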


I'd think that a robot that could neatly wire up racks with cabling (ethernet, power, & fiber) would be quite doable and useful. Does anyone here have experience with such a project?


Having wasted way more hours of my life than I would like on rack-and-stack...I don't think it's possible to fully automate this in a scalable manner without a radical re-work of rack design. Which isn't going to happen any time soon, given that the current standards have so much momentum behind them.

As things are: even in well-managed data centers, the racks themselves are always somewhat finicky, with varying levels of precision in assembly from rack to rack that require odd workarounds for equipment installation more often than one would expect. And that's not to mention how incredibly variable the rack enclosures themselves can be, which has big implications for cable routing. Never mind the fact that there's basically no standard for port placement on rackmount systems.

Rack-and-stack labor is dirt cheap, or easily foisted off on your sysadmins for small deployments. I don't see a robot for this being competitive from a cost standpoint unless that robot is extremely general purpose and able to fulfill other roles.


Indeed. Even the largest DCs are putting so few racks up per day (still many, many racks) that throughput is rarely the problem.

Actually getting the parts on time has been more of an issue in my experience, or finding out that your entire rack is forfeit because it fell over at the docking bay.

I can count on one hand the number of times I've seen rack-and-stack go too slow for the consumers to actually use the hardware. Usually the hardware is sitting waiting to be provisioned for eons.

The real labor is pulling bad drives/boards and rejiggering the network after the racks are in. So many outages are caused by magic hands making an error and pulling the wrong cord.


Basically, a giant blade server is what a server rack optimized for automated server install and replacement would look like. Something like a cross between a vertical warehouse robot and a tape jukebox is what would service it.

Of course, at that point you're adding a bunch of cost and may be better off just designing the servers to last for ~10 years with enough redundancy (and hot/warm/cold spares) to not need any physical swapping, then replacing the whole thing with a forklift at end of life.


Submer seems to be working on exactly that. https://www.youtube.com/watch?v=zIB_BIEttFo (the robot is at the end)


A friend worked on Live (now called Bing) some years ago, between 2004 and 2007 I guess...

He told me they had a robot like the ones from StorageTek, but for unplugging faulty blade servers and plugging in new ones.

Makes sense for large scale installations.


>> I don't think it's possible to fully automate this in a scalable manner without a radical re-work of rack design.

I think the same thing when people talk about sending a 'plumbing robot' to your house.


This.

Contingent workforces are cheap (minimum wage essentially).

It's also cheaper to use connectors that mate on insertion than to use connectors that humans are accustomed to.


Maybe not rack-and-stack; this will definitely not fully replace traditional datacenter operations, but it will complement them and make them more streamlined. Thinking about how we currently do disk and cable swaps, memory, or even system board replacements misses the point. Of course it can't be automated, because there's no unified standard. There needs to be one for such tasks to be fully automated with robots; the rack, server, network switch, and even hot/cold aisles need to be redesigned to work with a standard that supports robotic arms instead of human ones. No more wires; maybe conductive rails.

This will take forever to accomplish (barring disruption in compute/storage) because:

* Remote human hands are ridiculously cheap, and they can perform functions robots can't without extra pay or a paid-for software upgrade (think shipment handling, etc.). The only real use case that comes to mind is one without economic justification, but of necessity: think datacenters in space, deep underwater, or in hostile environments. So maybe innovation in this field will come from the government this time, and trickle down to the private sector. Maybe.

* The cost of such an investment far outweighs the financial gains. In the very competitive cloud business, it's the services you offer that matter most, and the reliability of your systems. Sending a person with a code scanner to verify and do a quick disk or cable swap is not a risky endeavour. Unplugging the wrong cable and causing an outage never happens on properly designed systems with properly scheduled maintenance windows, so this is irrelevant (when was the last time you heard that in an outage retrospective?).


I think it's a lot more likely that the next step would be a different "smart rack", where connections, or a single huge-bandwidth network cable, would come into the rack in a single place, with a single connection per device in the rack. Software would then handle the routing.


It would be fascinating if we had infinitely reconfigurable data center network topologies. Among other things, you could have, e.g., just-in-time network rebalancers that could add more capacity between nodes that have a lot of traffic between them. The reconfiguration would have to be instant, or at least take less time than the actual data transfer, to have any benefits.
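
The control-plane side is easy to sketch; the hard part, the instant physical reconfiguration, is hand-waved here. All names and thresholds are hypothetical:

    # Hypothetical just-in-time rebalancer: watch the traffic matrix
    # and assign spare reconfigurable links to the hottest node pairs.
    # Purely illustrative; no real fabric API is assumed.
    from collections import Counter

    def plan_extra_links(traffic_gbps, link_capacity_gbps=100.0,
                         spare_links=4):
        """Assign a pool of spare links to the most oversubscribed pairs."""
        demand = Counter()
        for pair, gbps in traffic_gbps.items():
            # Links needed beyond the one baseline link each pair gets.
            extra = int(gbps // link_capacity_gbps)
            if extra > 0:
                demand[pair] = extra
        plan = []
        for pair, extra in demand.most_common():
            for _ in range(extra):
                if len(plan) == spare_links:
                    return plan
                plan.append(pair)
        return plan

    print(plan_extra_links({("rack1", "rack7"): 350.0,
                            ("rack2", "rack3"): 120.0}))
    # [('rack1', 'rack7'), ('rack1', 'rack7'),
    #  ('rack1', 'rack7'), ('rack2', 'rack3')]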


Things are already moving this direction with adoption of network virtualization solutions and overlays becoming the norm in corporate data centers.


Plexxi tried something along those lines. It was cool, but very expensive, so it didn't really go anywhere.


There has been a bunch of research on reconfigurable optical datacenter networks to do this.


You do that virtually, not physically.


The recent AWS Frankfurt event, when the datacenter was flooded with gas to push oxygen out, seems to make a good use case for robots there.


Why did they do that? Are they trying to kill sleeping sysadmins, vaguely threaten intruders, did they have a rat problem, or what? Seems ... inexplicable without the involvement of bureaucrats.


See: Halon fire extinguisher


It was a fire scare: an AC failure (some control mistake) -> rise in temperature -> some hardware started to smoke.


Aha. I wonder what that costs each time it triggers.


Presumably less than the value of the building and server racks.


Sponsored by CBRE.



