How to port your OS to EC2 (2018)

zdw · on May 21, 2021

> Due to some bizarre breakage in EC2 — which I've been complaining about for ten years — the serial console is very "laggy". If you find that you're not getting any output, wait five minutes and try again.

This is the single most frustrating bug with EC2 I've experienced. In my case, I have Jenkins spinning up temporary executors in EC2, then getting the SSH host key from serial log to verify the connection (I appreciate the "importance of SSH host keys compared with flossing" comparison in this article...)

Machines take <1m to boot, then sit there for 4-8m just waiting for the log message to show up.
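For reference, here is a minimal sketch of the parsing half of that workflow. It assumes the image prints the `-----BEGIN SSH HOST KEY FINGERPRINTS-----` block that cloud-init writes to the console on many Linux AMIs; other images may format this differently. `extract_fingerprints` is a hypothetical helper, and fetching the console text (for example with `aws ec2 get-console-output`) is left to the caller.

```python
import re

# Markers as written to the console by cloud-init (an assumption about the image).
BEGIN = "-----BEGIN SSH HOST KEY FINGERPRINTS-----"
END = "-----END SSH HOST KEY FINGERPRINTS-----"


def extract_fingerprints(console_text):
    """Return {key_type: fingerprint}, or {} if the block has not appeared yet."""
    fingerprints = {}
    inside = False
    for line in console_text.splitlines():
        if BEGIN in line:
            inside = True
            continue
        if END in line:
            break
        if inside:
            # Lines look like: "ec2: 256 SHA256:abc... root@host (ED25519)"
            match = re.search(r"\b\d+\s+(\S+)\s+.*\((\w+)\)\s*$", line)
            if match:
                fingerprints[match.group(2)] = match.group(1)
    return fingerprints
```

An empty result just means the log has not caught up yet, so the caller keeps polling, which is exactly where the 4-8 minutes go.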

oppositelock · on May 21, 2021

There's a new console feature which addresses this, but yes, the previous incarnation was very frustrating. I'd like to see this on all instance types.

https://aws.amazon.com/about-aws/whats-new/2021/03/introduci...

paulddraper · on May 21, 2021

You should add the fingerprint as an EC2 tag or something from a boot script.

kelnos · on May 21, 2021

Doesn't that require giving the instance elevated privileges around setting tags? Some people might not want that.

paulddraper · on May 22, 2021

It does, though you can limit it to certain tag name like "SSH Fingerprint".

And of course you need the AWS CLI installed (which AWS-built AMIs have).

All in all seems fine for a Jenkins worker
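A rough sketch of what such a boot script could do, assuming Python is available on the instance. The function names and the tag key `SSHFingerprint` are made up for illustration; the fingerprint is the usual OpenSSH SHA256 form (unpadded base64 of the SHA-256 of the key blob), and the instance ID would come from the instance metadata.

```python
import base64
import hashlib


def sha256_fingerprint(pubkey_line):
    """OpenSSH-style SHA256 fingerprint of a line like 'ssh-ed25519 AAAA... comment'."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the base64 digest without '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def create_tags_command(instance_id, fingerprint, tag_key="SSHFingerprint"):
    """Build the argv for 'aws ec2 create-tags'; tag_key is a hypothetical name."""
    return [
        "aws", "ec2", "create-tags",
        "--resources", instance_id,
        "--tags", f"Key={tag_key},Value={fingerprint}",
    ]
```

The boot script would read a host key such as `/etc/ssh/ssh_host_ed25519_key.pub`, pass the result to `subprocess.run`, and the Jenkins side would read the tag instead of waiting on the serial log.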

cperciva · on May 22, 2021

Two problems with doing this:

1. If you provide an EC2 Role which allows setting that tag, any process on the instance which can access the Instance Metadata Store can record a different SSH fingerprint.

2. You can only have one EC2 Role attached at once, so doing this prevents you from using other roles.

jen20 · on May 22, 2021

It’s worse than that with respect to (1): any process on _any_ instance which has tag setting permissions can set the tag for _any other_ instance, since conditions don’t support scoping to instance ID.

Re (2): technically you cannot have any roles attached directly, but instead attach an instance profile (the distinction is clearer via the API than the console). The shape of the API for an instance profile clearly was designed to support multiple roles, but in practice is limited to one. It’s typical to create a role with many policies attached for each functional type of instance, so in practice it doesn’t matter too much.

klohto · on May 22, 2021

1) CreateTags does support scoping using StringEquals with ec2:SourceInstanceARN and comparing that with the aws:arn

  "aws:ARN": "${ec2:SourceInstanceARN}"

It will give you a warning when creating through web console, but the condition works.
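Spelled out as a full policy, that might look roughly like the sketch below. The `aws:ARN` / `${ec2:SourceInstanceARN}` condition is copied from the comment above and not verified here; the `aws:TagKeys` restriction and the tag key `SSHFingerprint` are additions for illustration, following the earlier suggestion to limit the tag name.

```python
import json


def tag_self_policy(tag_key="SSHFingerprint"):
    """Build an IAM policy document letting an instance tag only itself.

    tag_key is a hypothetical tag name; the self-scoping condition is the
    substitution described in the thread, not something verified here.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:CreateTags",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Only allow when the target is the calling instance.
                    "StringEquals": {"aws:ARN": "${ec2:SourceInstanceARN}"},
                    # Only allow the one tag key.
                    "ForAllValues:StringEquals": {"aws:TagKeys": [tag_key]},
                },
            }
        ],
    }


print(json.dumps(tag_self_policy(), indent=2))
```

The printed JSON can be pasted into a policy attached to the instance's role; test it against a second instance before relying on it.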

jen20 · on May 22, 2021

TIL - thanks for the heads up. Is this a documented substitution?

klohto · on May 22, 2021

I don't think so... I have been trying to find it for a few years now, but it just won't turn up anywhere.

FYI, I personally use resource tags as well on top of this, just to limit the scope (in case it magically disappears one day).

paulddraper · on May 23, 2021

> any process on the instance which can access the Instance Metadata Store can record a different SSH fingerprint.

A security issue really only if the process is able to pair it with a MITM attack.

> You can only have one EC2 Role attached at once.

Yes, so attach this policy to that IAM role.

klohto · on May 22, 2021

1. You can limit access to IMS to certain processes inside Linux.

2. Just combine the policies.

If you’re worried about malicious processes on the EC2 altering the fingerprint, you got bigger issues. You can also sign the fingerprint with gpg.

znpy · on May 22, 2021

You might want to look at SSH certificate authority and stuff like that.

No more checking SSH host keys: validation is automatic, and things only fail if something is wrong.
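A minimal sketch of the two pieces involved, assuming OpenSSH's `ssh-keygen` and a CA key pair you already manage. The function names, paths, and the `+52w` validity are placeholders, not a recommendation.

```python
def sign_host_key_command(ca_key_path, host_pubkey_path, hostname, validity="+52w"):
    """Build the ssh-keygen argv that signs a host public key with the CA key.

    -s: CA private key, -I: certificate identity, -h: host (not user) certificate,
    -n: principals (hostnames) the certificate is valid for, -V: validity interval.
    """
    return [
        "ssh-keygen", "-s", ca_key_path,
        "-I", hostname, "-h",
        "-n", hostname,
        "-V", validity,
        host_pubkey_path,
    ]


def cert_authority_line(domain_pattern, ca_pubkey_line):
    """Build the known_hosts line that makes clients trust the CA for a host pattern."""
    key_type, key_b64 = ca_pubkey_line.split()[:2]
    return f"@cert-authority {domain_pattern} {key_type} {key_b64}"
```

The signed certificate is then referenced from `HostCertificate` in the instance's `sshd_config`, and the client (here, the Jenkins controller) carries the `@cert-authority` line in its `known_hosts`.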

goneri · on May 21, 2021

Since this article, Cloud-Init supports FreeBSD, NetBSD and OpenBSD natively. Cloud-Init handles all the interaction with the cloud provider and the metadata. So besides the constraint of the disk image format (qcow2, raw, etc.), the same Cloud-Init base image is likely to work seamlessly across different cloud vendors.

I maintain some Cloud images (qcow2) for BSD https://bsd-cloud-image.org/ that are based on Cloud-Init.

thayne · on May 21, 2021

> The ENA driver is probably the hardest thing to port, since as far as I know there's no way to get your hands on the hardware directly, and it's very difficult to do any debugging in EC2 without having a working network.

I'm certainly not an expert on these things, but couldn't you run your target OS in a VM on a better-supported OS in EC2 (for example Linux), and pass through a second ENA network interface to the OS you are porting?

nijave · on May 21, 2021

EC2 just released a serial console, too. https://aws.amazon.com/about-aws/whats-new/2021/03/introduci...

This article appears to be a couple years old

eyberg · on May 21, 2021

They've had serial output for a while now and one problem with this new version of it is that you can only get the output that is produced while you are actually connected to it. So if you'd like to see output from say yesterday - it's not going to happen.

Google Cloud will store this info (even after reboot) for a while.

Azure is probably the best at dealing with this as they can take both screenshots and dump text continuously into a bucket of files so you can always have everything available.

my123 · on May 21, 2021

That’s a perfectly valid workflow. It’s what we do sometimes at AWS too.

With .metal instances, it’s a good workflow to have.

geofft · on May 21, 2021

So... how do you get an ENA driver if you don't have one already? I see Amazon provides Linux and FreeBSD drivers at https://github.com/amzn/amzn-drivers - is the answer to start with the FreeBSD one (because it's under a more permissive license) and adapt it to your OS?

cperciva · on May 22, 2021

I imagine you could start with either; if you need the Linux ENA driver relicensed as BSD, I imagine Amazon would oblige. From my experience, the people working on this side of AWS are quite open source friendly.

Syonyk · on May 21, 2021

That's... very basic. Not an awful lot of useful information about the hardware environment.

The challenge for things like EC2 is more in that most cloud hypervisors, for security/performance/complexity reasons, have stopped emulating the older hardware a lot of "basic osdev projects" rely on. So you're stuck with a very modern set of hardware for interrupt controllers, disk IO, etc. I was hoping to see something more along those lines here, not just how to make and upload a disk image.

cperciva · on May 21, 2021

I agree that it's basic information -- but that's the most important information to have! This blog post was essentially constructed from the replies I sent people who asked me to help them.

AndrewUnmuted · on May 21, 2021

Just want to sneak in here to thank you for Tarsnap. It's the best storage service I've ever used!

actually_a_dog · on May 21, 2021

> ...most cloud hypervisors, for security/performance/complexity reasons, have stopped emulating the older hardware a lot of "basic osdev projects" rely on.

If you're really determined to do basic/hobbyist OS dev on EC2, you can always just run it under QEMU, which is pretty much what you'd do if you're developing on your own hardware, anyway. I know that's kinda beside the point, and reintroduces all those performance issues, but, security isn't a concern here, and it's certainly no more complex than just running it on your own machine.

mikece · on May 21, 2021

Is there a particular OS you had in mind that doesn't deal well with modern hardware?

Syonyk · on May 21, 2021

I guess it depends on how you read "Your OS." I read it as "Hobby osdev project."

"Port your OS to EC2," as I read it (I live in weird weeds so this may not be how other people read it) implies an OS you've written ("your OS") that currently doesn't run on EC2 (or you wouldn't need to "port" it - that implies, at least to me, getting something running where it doesn't currently run).

A lot of the hobby OSes out there, even if they're 64-bit, don't always support the latest and greatest drive controllers, interrupt controllers, etc. So if you don't have NVMe support (or virtio block device support - not sure if EC2 supports that or not), you won't be able to read the disk. Legacy ATA/SATA interfaces are often what's supported.

I would have expected, based on the title and what it implied, a list of the hardware supported in various instance types. If the virtualized instances support virtio, mention it. If they only support SATA or NVMe, mention that. And perhaps some references to the relevant specs to implement those.

But if all you have to do to get your OS running is just put it on a disk image, then I'm not sure it really qualifies as "porting to EC2." Just "Installing on" would be more useful phrasing for that.

eyberg · on May 21, 2021

You are correct. There are actually quite a few different configurations available on AWS alone - not to mention most cloud providers are very different. What works on one instance type on AWS won't immediately run on Azure without additional drivers.

Google Cloud is probably the most friendly one for hobby osdev because they are based on KVM.

For instance when starting with https://github.com/nanovms/nanos (whom I'm with) we targeted the t2 instances first and it actually took a longer time to come up with support for some of the newer t3 instances because we had to add ENA and NVMe drivers.

actually_a_dog · on May 21, 2021

I believe the GP comment was referring to the plethora of basic/hobbyist OS'es out there. For instance, see https://wiki.osdev.org/Bare_Bones#Implementing_the_Kernel

I'm pretty sure EC2 would barf on that, even if you were running in 32-bit mode.

Syonyk · on May 21, 2021

Correct.

Though I see no reason why EC2 would barf on it, as long as you support the proper hardware. They just don't include a lot of legacy hardware you find on a more standard bit of x86 hardware.

Now, if you wanted a really interesting article, discuss how to port your OS to run on the EC2 ARM instances!

haolez · on May 21, 2021

Waiting for someone to port MenuetOS[0] :D

[0] http://www.menuetos.net/