
Yes, this exactly. Knowing that my code is running, and that entire massive teams at Amazon have the literal job title of keeping it running, is such a freeing feeling. My own code might mess up and delete rows from Dynamo, but Dynamo itself will never go offline.



The psychological effects of technology often seem overlooked—and they’re hard to quantify.

I wish there was some easier way to quantify the “soft” aspects of everything that surrounds us, e.g. the long-term impact of beautiful and usable UI design, the reduction in “ambient” psychological stress, the impacts of chaos/consistency on how we feel, any robust measures of happiness, excitement, relaxation, etc.

They seem to be such important attributes, but they're not as easily measured as latency. Which in turn makes reaching consensus harder—and is probably why design by committee doesn't work?


My joke around the office is some people optimize for latency, some for throughput, some for memory usage. I, however, optimize for sleep.

My entire goal is for my services to never wake me up. And if they do, it had better be because I clearly messed up and am the sole human alive able to fix it.


I optimize for escape hatches. Because if everything is on fire, I want the ability to bail (fix it).

There's no point in waking up when everything is blowing up if you don't have access to fix the underlying issue. AWS doesn't actually guarantee 99.9% uptime. What they guarantee is that they'll give you some company currency[1] if they fail to meet or exceed 99.9% uptime.

So the next time the ship is on fire, don't worry, stay asleep and your Amazon account manager will be by shortly with a thimble of water to throw on you :)

[1] - https://aws.amazon.com/about-aws/whats-new/2019/03/aws-syste...


> I, however, optimize for sleep.

Bravo.

Having been on call for a truly monstrous number of things and seen them break in all kinds of ways... I can't agree more with this principle.

While the cloud isn't cheap, if your org can afford it, there's no question about it: go with managed services. Focus on building things; leave the ops to AWS/GCP/Azure.


We optimize for good engineers


> Dynamo will never go offline.

That's some ridiculous propaganda. AWS has had whole regions go down for hours.


Are you really sure?


Tracking exactly which individual DB entries ran and which didn't, whenever there's one of these "serverless", "painless" interruptions to an ETL, guarantees you an endless number of ever-changing status columns and re-run processes.

Are you really sure that it all re-ran? No. You have to query your latest batch - which means you have to pick arbitrary boundaries for when failures might have occurred, like, say, the last 24 hours. If it was a weekend, just query the whole database looking for any entry that's missing the "final" success status. Try not to run anything while you're doing this, unless your system is tolerant of the DB being locked for a full-table scan.

This is all very fragile and change-averse. Who would choose to run an ETL like this? Yet I have worked for large companies that all end up with this rickety system. Every one, every time. God forbid you want to run tests with a mock set of serverless AWS services.
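Concretely, the audit described above ends up looking something like this (a sketch against a hypothetical etl_runs table with a status column):

  import sqlite3

  # Stand-in for the real warehouse; the shape is what matters.
  conn = sqlite3.connect(":memory:")
  conn.execute("""
      CREATE TABLE etl_runs (
          id INTEGER PRIMARY KEY,
          started_at TEXT,   -- e.g. '2024-01-01 03:00:00'
          status TEXT        -- 'pending', 'running', 'final', ...
      )
  """)

  # The arbitrary 24-hour failure boundary. If the outage spanned a
  # weekend, this WHERE clause on started_at has to go, and you're
  # back to the full-table scan.
  stragglers = conn.execute("""
      SELECT id, started_at, status
      FROM etl_runs
      WHERE started_at >= datetime('now', '-1 day')
        AND status != 'final'
  """).fetchall()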


Here's the beautiful thing about thinking serverless. "Are you really sure that it all re-ran?" Yes, I am, because Dynamo streams and Kinesis streams guarantee at-least-once message delivery. If my Lambda code has a bug in it, which yes, sure, that happens, I can, as part of my exception handling, put that message into a dead letter queue and process it later or report on it. No polling, no querying; I know exactly what went wrong and when.
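A minimal sketch of that pattern, assuming boto3 and a hypothetical SQS queue standing in as the dead letter queue (Lambda also supports configuring failure destinations directly; this is the hand-rolled version):

  import json
  import boto3

  sqs = boto3.client("sqs")
  DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dlq"  # hypothetical

  def process(record):
      ...  # hypothetical business logic, where the bug would live

  def handler(event, context):
      # A Dynamo or Kinesis stream delivers each record at least once.
      for record in event["Records"]:
          try:
              process(record)
          except Exception as exc:
              # Park the failure for later reprocessing or reporting;
              # no polling, no batch-wide querying.
              sqs.send_message(
                  QueueUrl=DLQ_URL,
                  MessageBody=json.dumps({"record": record, "error": str(exc)}),
              )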

Meanwhile, I don't have to do things like /make sure the ETL process is running/ or /clear out a bunch of logs when a disk inadvertently fills up/ or any of those other stupid last-century compute problems. It's just a far, far easier way of thinking about application development.


> Dynamo streams or Kinesis streams

I can appreciate that. A lot more people than you might think use Kafka or SQS because of various requirements (like that being all their infrastructure supports), and you end up with a massive "source" DB that is a duplication of all-time messages. Such is the reality of a large, byzantine organization. If it can't be audited by someone else, at their own pace, it's not approved.


I generally agree that if you're setting up multi-step or "enterprise grade" ETL processes, it's a job better handled by something like Airflow (we've done a few smaller ones with Step Functions, which hasn't been terrible, but Airflow is still better).

Our use of AWS Lambda for ETL is usually just one-off small processes that don't have dependencies.

As soon as you start getting into "This ETL depends on X, Y, and Z being run first" I think you're out of "small utility" territory.
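For illustration, that dependency case in Airflow (a sketch assuming Airflow 2.4+ and hypothetical task names):

  from datetime import datetime
  from airflow import DAG
  from airflow.operators.empty import EmptyOperator

  with DAG("nightly_etl", start_date=datetime(2024, 1, 1),
           schedule="@daily") as dag:
      x = EmptyOperator(task_id="extract_x")
      y = EmptyOperator(task_id="extract_y")
      z = EmptyOperator(task_id="extract_z")
      load = EmptyOperator(task_id="load_warehouse")

      # "This ETL depends on X, Y, and Z being run first"
      [x, y, z] >> load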


This is so naive. Unless you’re just creating utilities all day long, at some point you’ll actually have a production service that people are using that needs to be monitored, even if it is built out of Lambda functions. And unless you’ve run AWS vs. colo at scale, you have no concept of the orders-of-magnitude cost savings on bandwidth (at 95th percentile billing vs. per gig) and compute.


I have run AWS vs colo at scale. For the cost of precisely 1 engineer fully loaded, I can have 500TB of bandwidth per month, every month. Guess what, it isn't going to take just 1 engineer to maintain your homegrown worldwide CDN.

Does it make sense for FAANG to run their own datacenters? Yes, of course. Does it make sense for you? No. This is what I don't get about arguments for running a bunch of VPSs-- for literally any environment that isn't thousands of engineers, it is more cost effective to let someone manage your underlying infrastructure for you, and for any workload at that scale, you aren't contracting with some random bare metal hosting shop anyway.


You seem to forget there are a ton of people who grew up on bare metal and know it quite well. And they can squeeze far more performance out of it than you could ever hope to see by outsourcing business-critical infra to a supplier who literally doesn't give a shit about you.

Have fun with an army of one-trick pony hires. One old-school diehard computer nerd knows enough to supplant 10 of those fools, but only costs 2x.


And many of them already work for AWS, whom I can pay a small premium to take advantage of, without having to hire one myself and then worry about replacing them when they decide to leave.


I think the point is that sometimes the AWS "small premium" is really large. It varies a lot. For some, the AWS costs are free or tiny. For others, AWS costs are larger than salaries, and they creep up over time.

About staff: maybe consider finding a good consulting/support firm that can look after your servers for <10% of the price of hiring one person, is friendly and on call, and will likely stick around even as its people change.

That solves the worry about replacing people, as well as the cost.

But as with people, it can take some luck to find a good one :-)

(I used to provide that kind of consulting/support service. Not as much now because there isn't much demand, and development & research work is more satisfying. But still a little, ticking along in the background.)


Maybe it's hubris, but I'm 99% sure I could run a homegrown worldwide CDN for the cost of one engineer's salary. I'd be willing to take that bet from anyone who thinks it can't be done.


You're forgetting that a huge part of that is maintaining accounts at a bunch of different colos and maintaining relationships with account managers and shipping hardware and all of the non-engineering work involved in running a CDN.

You'd be spending more than 1/2 your time just on the phone talking to people.


I definitely have not forgotten this aspect of it. Let's just say I've spent a lot of time on the phone for fun.

If we're talking about running a CDN company, with all the additional organizational complexity of customer service, then no, it can't be done on a single engineer's salary. What I have in mind is a homegrown CDN that rivals the uptime and deliverability metrics of existing commercial CDN providers. Within an existing org it can be done. Outside an existing org, think http://www.nosupportlinuxhosting.com/ to shave costs.

If we're willing to define parameters for what level of service this CDN needs to meet, and the ramp-up time, I'd be willing to take it on.


If you are not in the CDN business, why on earth would you want to do this?

This is my point-- I am not in the hosting business. I am in the application creation business. Every minute that I even have to think about modifying /etc/localtime to make sure the instance is running in UTC is a minute I'm not being productive. Multiply that by the million little things you have to do to keep even an EC2 instance running, and it's a colossal waste of time and effort. Similarly, I am not in the database tuning business. I plan out my data relationships, create the required Dynamo table or tables, and I never need to worry about replication or backups or any of those ancillary tasks that keep me from being productive.
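For what it's worth, "create the required Dynamo table" really is about this small a step (a sketch with boto3 and a hypothetical table name):

  import boto3

  dynamodb = boto3.client("dynamodb")

  # After this call, replication, backups, and scaling are AWS's problem.
  dynamodb.create_table(
      TableName="orders",  # hypothetical
      AttributeDefinitions=[
          {"AttributeName": "pk", "AttributeType": "S"},
          {"AttributeName": "sk", "AttributeType": "S"},
      ],
      KeySchema=[
          {"AttributeName": "pk", "KeyType": "HASH"},
          {"AttributeName": "sk", "KeyType": "RANGE"},
      ],
      BillingMode="PAY_PER_REQUEST",
  )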


This is a slightly different conversation than the original but I'll give it a shot.

I'm not a woodworker, machinist, farmer, or PhD student in literature. Nevertheless, I engage in all of these things. I enjoy crafting things and learning new techniques. I enjoy learning how to optimize plant growth. I enjoy reading old literature and gathering what I can from it about the historical context, linguistics, etc. All of these things, while not "productive," give me additional knowledge that I can pull from in "unrelated" tasks.

In the tech world, I'd say that realistically all abstractions are leaky. So engaging in the tasks you mentioned is productive. But even more than that: because abstractions are leaky, this knowledge is useful. If you don't have knowledge of the systems underlying the abstractions you're using, it's a footgun waiting to happen.

I can give an example. I had a boss who was doing some serverless work, and he made an assumption about JSON. I told him in review that JSON does not respect the ordering of keys. Well, he assumed it would not be a problem. It was pushed to production anyway. And a few weeks later it came back to bite him.
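The point is easy to demonstrate (a sketch in Python; the JSON spec treats object keys as unordered, so nothing downstream is obliged to preserve their order):

  import json

  a = json.loads('{"first": 1, "second": 2}')
  b = json.loads('{"second": 2, "first": 1}')

  # Equal as objects: key order carries no meaning in JSON itself,
  # so any serializer or intermediary may reorder keys freely.
  assert a == b
  print(json.dumps(a, sort_keys=True))  # {"first": 1, "second": 2}
  print(json.dumps(b, sort_keys=True))  # {"first": 1, "second": 2}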

Now what happens when you don't dig into your abstractions? You lack the knowledge to fix problems that happen at a lower level. When you've handed those "relationships" off to your database provider, what happens when the machinery for replication, backups, etc. breaks down? You have to sit around and wait for an expert to fix your app, meanwhile staring at your users and stonewalling.

No, you're not in the hosting business, you're not in the database tuning business, you're not in the sysadmin business, and you're not in the ECMAScript working group. Until you are. Then, when you're foisted into this role, you're a fish out of water. Specialization is for ants, but we're humans, not ants.


Yep, something like terraform + your favorite config management tool and monitoring system would do nicely for a mostly automated setup once done. I am guessing that a really nice setup could be created in under a month.


But are you including your own salary in that cost?

I believe he is comparing AWS bills vs. engineer time + other hosting bills.


Depends on scale, of course. There will always be some company that isn't of sufficient size to make it worth the cost.

This is how we actually end up having centralized services. The larger companies that have all this infrastructure, and have the size to justify building their own... well, they inevitably ask themselves, "Hey! We have all this infra, we have to pay these people anyway, why aren't we selling excess capacity?"

Then you get AWS. Then, as time progresses, people at even large institutions that could benefit from running their own infra just go and use another company's infra, where the other company has evolved into being an infra provider instead of a car parts store, book store (Amazon), or search engine.

To answer your question directly: yes, I think this can be done within one engineer's salary, even if I'm including my own. There would be a startup cost that eats into that salary in year 1, but it would be manageable, and then iterative upgrades/replacement parts would be a negligible ongoing cost after year 1. With some financial tricks, that startup cost can be spread out over a long enough period that it wouldn't even impact year 1.


Maybe you’re extraordinary, and for most people it’s not worth it.


I wouldn't say I am. But I think the perceived complexity (fear) of managing hardware is overblown. However, as time goes on and people become more afraid of it due to the loss of institutional knowledge, it won't be an overblown fear anymore. At that point institutions will have no choice but to contract out their infrastructure to a centralized institution or an oligopoly of institutions.

And we're approaching that, because as these types of skills are outsourced to these institutions, the number of people with the skills necessary to carry them out diminishes through attrition. Amazon doesn't need 10 companies' worth of engineers to have the requisite skills. If everyone buys into this model, then eventually the market will only bear as many engineers with that knowledge as Amazon needs.


"Extraordinary" here would be either a developer with solid Linux admin skills, or a Linux admin with basic developer skills.


Nobody’s saying serverless makes monitoring unnecessary. But a serverless architecture gets rid of an entire category of bureaucracy and issues that can arise from dedicated servers.



