One other way to observe these steps a bit more clearly is to look at the startup code provided by a chip's manufacturer or generated by an IDE that targets the chip, such as Keil.
Most will put this type of code in '.S' startup assembly files, which often also contain information like the memory addresses for hardware interrupts to use, alongside linker scripts that tell the linker about the memory available on the chip.
For example, ST's 'CubeF0' package has some example projects for their simpler ARM chips:
http://www.st.com/en/embedded-software/stm32cubef0.html
I think it is good practice with microcontrollers to tell GCC not to include generic startup logic, using flags such as --specs=nosys.specs (or even -nostdlib, adding -lgcc and -lc back only if necessary), so you can be sure you are doing the right thing for your particular chip.
The generic compiler-provided startup code is fine for AVRs under most circumstances, since they're only produced by a single manufacturer and don't have fancy features like PLLs which need to be initialized. It's the more complicated chips where it gets a little hairy, especially if you have to deal with things like external RAM.
There actually isn't any reason STM or NXP or any other ARM vendor couldn't provide AVR-like usability and an awesome libc. For example, the assembly blobs used on most STM32s are very simple.
It's simply that they want to push people to their proprietary IDEs (to increase vendor lock-in). STM used to provide an okayish libc implementation but then decided to bury it inside CUBE(MX).
If your design becomes complicated enough, sure, go ahead and change the startup code. But that's quite a small portion of all uses, and for the rest we should have AVR-like ease of use.
I guess I rarely see the case for an AVR when you're not prototyping on an Arduino. You can get chips that are an order of magnitude more powerful per dollar, per watt, or by any other metric you'd want to use.
Have you read "The Amazing $1 Microcontroller" [0], which might have been posted here before? It's a survey of available microcontrollers, with the intent of helping you choose one. If you have read it, are you in agreement with the article, for example on how AVRs compare to others?
I haven't confirmed what you've said yet, but it makes sense since 8051/8052 MCUs are generally used for their compatibility with legacy code.
I've been working on an agricultural spray controller that uses some SiLabs part only because there is already assembly code written for it. The company that sells the controller doesn't want to spend the time rewriting it in C and moving to ARM or PIC.
It should be noted that relying on static initialization as suggested in the article, while at times useful, can open a new set of problems unless initialization order is well-defined (it's out of scope for the article, so I imagine the author is aware and just opting for brevity). Some compilers (Clang and GCC) provide an explicit ordering mechanism, and there are proposals before the C++ standards committee at the moment, but until then relying on static initialization is something one should think about potentially solving differently. Depending on the architecture, there also tend to be well-defined hooks for functions to run pre- and post-main(), which is another consistent model for initialization/shutdown.
I had a really good experience designing an RTOS initialization system that depended on static initialization but didn't have issues with ordering (and in fact didn't even have a main; after init it went right into the idle thread).
Basically I only let children of one class be statically initialized (and enforced this with a tool that goes through the generated binary and type information). These 'subsystem' classes then get callbacks after all of the static initializers have run, which allows them to find the other subsystems they depend on, and then that dependency graph is walked to initialize the full system (the dependency graph has to be a DAG or the system faults). Combined with a code review rule that static initializations only happen in one compilation unit in an anonymous namespace, and that nothing else happens there, this means that no one can really touch other subsystems before they've all been initialized, and therefore the order doesn't matter.
I was really happy with how it turned out, despite being completely off the wall compared to how C++ normally works.
The symmetrical Static Destruction Order Fiasco is also a very fun problem to deal with. The solution proposed for SIOF in the C++ FAQ does nothing to help you deal with this one, though.
Is the "static initialization order fiasco" a thing that can happen in Go? The order of package-level variable initialization is well defined in the spec, and because Go doesn't allow import cycles, it seems like the issue mentioned in the linked page is not possible.
In embedded you can usually modify every step from boot to main. They are just pieces of code you can interact with and make do whatever you want, initializing stuff in whatever order you want, provided you know what you're doing.
Deviating with a war story now: I once modified some startup code so it self-detected its boot location and initialized the RAM data according to this location shift. It was fun, but debugging code you relocated is a pain because you need to somehow relocate the symbols as well.
Well, it was a bootloader, and the self-detection was useful for bootloader self-update. The updated bootloader would be copied to a different section and on reset would start and copy itself over the original bootloader section.
I also wanted to maintain full functionality even at the shifted location in case the self-copy process somehow failed. It never did fail, at least to my knowledge.
In the embedded software context, you typically do your own static initialization in a startup.S file or equivalent. There should be zero ambiguity as to when your DATA section is copied to RAM.
There's generally a difference between "static initialization" copying .data from ROM to RAM, and running static initialization functions out of the table. That table doesn't really have ordering constraints generally.
> Depending on the architecture, there tends to also be well-defined hooks for functions to run pre and post-main which is another consistent model for initialization/shutdown.
Is atexit the post-main hook to which you were referring, or is that just the posix hook into a larger class of post-main hooks?
I attached an example for Arduino Zero variants [1] (it is directly from Atmel's SDK). It is quite straightforward to see what happens before main(). In an embedded system, you can even change it easily if you want.
In this example, the stack pointer is updated at the very first point after powering up the device [2], and the underscore-prefixed variables are defined in its linker script [3].
Personally, I think globals with constructors are kind of an anti-pattern. I've noticed a high correlation between crappy apps and apps with a lot of pre-main() code. Especially if some of your globals have threads: because then you potentially have threads running AFTER main exits... I've seen a number of apps with random crashes at shutdown because a thread that was started in a global was still running, and since main had returned, globals were being cleaned up and the thread used something that had just been cleaned up.
This article seems to be targeted toward embedded programming, so global state isn't that big of a deal because you have complete control of the chip. It's quite common to not even have threads or an operating system at all.
This is a real problem; I'm not sure rooting all mutable state on the stack will help in this case without language-level metadata restricting frees, passing references, and dereferencing.
It seems like the Rust answer of making it difficult to share mutable state is a solid one: you explicitly manage concurrent state rather than implicitly allowing sharing. In addition, Rust has some interesting run-once constraints you can add to closures that work well with global initialization. I'll admit I haven't seen enough Rust to see it work well in action, but the pieces are there.
To me one curious thing about the main() function without arguments is that these arguments are still pushed onto the stack. You can find them by dereferencing a pointer to another stack variable and playing around with the offset.
// Just tested with mingw-gcc on Win8.1
// Maybe you need to play around with the offset, +28
#include <stdio.h>

int main() {
    int i; // Another stack variable
    printf("%lld\n", (long long)&i + 28);                         // &argv
    printf("%p\n", *((void**)((long long)&i + 28)));              // argv
    printf("%s\n", ((char**)*((void**)((long long)&i + 28)))[0]); // argv[0]
    return 0;
}
Disclaimer: I don't know how this behaves in an embedded environment.
> Disclaimer: I don't know how this behaves in an embedded environment.
The handful of times I've written startup code for an embedded micro, I've always passed zero arguments to the program's "command line", not even its own program name.
I.e., from the perspective of _start, declare the application's entry point as int main(int argc, char **argv), call it with main(0, NULL), and ignore the return value.
ARM passes the first several arguments in registers anyway, so your UB trick wouldn't reveal anything interesting.
On Windows this depends on the calling convention. The calling convention determines the order in which function arguments are passed (in order or in reverse), who takes care of cleaning up the stack (caller or callee), and even whether the arguments are passed on the stack or in CPU registers.
In Visual Studio you can control this behavior with the __cdecl, __clrcall, __stdcall, __fastcall, __thiscall, and __vectorcall [1] calling convention function prefixes, but it also depends on the optimization options [2] of the compiler.
Passing arguments via CPU registers has a huge performance benefit, but also some drawbacks.
I don't know much about AMD64, but if the arguments are passed in registers, then how is it possible to retrieve argc/argv later, since other functions are likely to touch those registers?
The C standard says arguments to main() should be defined as:
int main(void)
or
int main(int argc, char *argv[])
(or equivalent, e.g. int can be replaced by a typedef name defined as int, or the type of argv can be written as char **argv).
In embedded systems, many manufacturers' default startup code does not even provide argc/argv. In that case this won't work unless you implement your own startup code.
I have a strong feeling that while this obviously works on i386 Linux (and probably any Unix, for that matter), if it works even on amd64 it's pure coincidence.
In embedded applications a lot of things happen before you can even attach a debugger. Most microcontrollers go through a non programmable hardware initialisation sequence which initialises enough hardware for the processor to run. The processor then runs from a section of read only (or at least write protected) memory. This code performs a further stage of initialisation and patching. This can be used by the manufacturer to implement their own bootloader or to perform configuration of the hardware to get around certain hardware bugs. Finally this code jumps to a known memory/flash location where the code that the engineer has written is located. It is from this point that you can usually start debugging.
Depending on the platform, one can get one to three levels of loaders, which have some range of configuration items (pins/register config) and programmability before you even get to 'application' code. (Trying to recall if I've ever hit more than three...) And it's often accessible by the developers, not just the manufacturers. Different mfrs range from open to openly annoying about the details of access at that level, but you can often get debugger access at those early bootloader stages, though most embedded dev environments sensibly default to later stages.
Sometimes strange bugs manifest due to misalignments between configs at early boot stages and what the later code needs, so you want to be able to review and change those stages.
Is any of this even public for us as developers? Can we influence it, and can it influence us? If not, should we even think about it?
I had a lot of curiosity about how JTAG works and how I can build my own debugger for general targets, but it seems documentation for this is thin. I should look over OpenOCD to see how those guys are doing stuff.
Most manufacturers do in broad terms disclose how the boot sequence works. Here is an example of the information disclosed about a Cypress microcontroller [1]. In some applications the startup time can be critical, so this needs to be known. The code that runs before the engineer's code doesn't usually have any effect, and it puts the microcontroller into a known, documented state, so you don't really need to think about it.
It depends heavily on the SoC in question. On an unlocked STM32F4, you can attach JTAG pretty much immediately even if it's way off in the weeds executing garbage, but BCM283x (RPi) has to set up the pinmux for the JTAG first from the ARM core, so you have to already be running some of your own code.
This is a misleading headline for the general audience.
What he writes is an introduction to bare metal programs for somebody who has worked under an operating system before.
In both cases a lot of code is run before main(). But very different code. In bare metal you more likely need to care what it does. In the operating system case the developers of the OS more likely have done more for you than you will ever need to know about.
A couple of years ago there was a conference presentation (FOSDEM?) about what a program does before reaching main(). The presenter (a Brit, IIRC) went pretty fast, did not go into too much detail, and it took him 30-40 minutes. I think the study was done on a BSD. Maybe somebody can provide a link; the video is online, but Google refuses to help me...
This seems to be concentrating on what the system loader is doing without mentioning whatever the statically linked standard library gets up to before calling the main function?
On microcontrollers, you often don't have a standard library, or if you do, it is limited, and doesn't do things before main. The article describes pretty well what happens on a microcontroller.
I would add that in some (most?) environments, the code that gets executed before main() is easily accessible: it sits in files within the project (written in assembly most of the time).
Cool, he is talking about the loader though, which would imply that this is all language independent? Except for where he mentions things getting executed before main in C++, which I'm assuming would be the doings of the C++ standard library.
It's still pretty language independent. QEMU, for instance, is a C codebase that depends on static initialization functions that run before main to register the different components.
In any other context, yes, you're missing C library predecessors to main(). But it's specifically referencing embedded systems, which is why hardware initialization and memory segments are mentioned.
You can have pre-main functions in C as well (as a GCC extension). Functions defined with `__attribute__((constructor))` will be called before main(). You can even set a priority on them, and set post-main destructor callbacks. This works even for shared libraries loaded at run time.
I've used this as a trick to automatically run unit tests, though, not for any real work.
I love the XML question (whether your app does stuff with XML).
There have historically been so many big security holes when parsing XML that it is now a security code smell if you are working with it (especially in lower-level languages like C or C++).
At least in gcc you can use constructor attribute to run code before main. I've seen this used to initialize modules and create a dependency tree of modules to initialize before main even starts.
To be picky, the linker script lays out a sequence of calls to the global constructors so that the C runtime startup knows where to find and execute them.
Is it strange that C++ is able to do this but C itself cannot? Sometimes I hear the argument that "C is lower level", and there is the idiom that "C++ is C but better" (not that I believe either is true).
It's somewhat misleading when they say the C++ code runs before main. What's going on is that, after the same initialization steps, the first code of your program is executed.
C and C++ have different syntax for what that code is, but it's pure semantics as far as the computer is concerned and C++ gives you zero extra power.
It is certainly possible to do this in C by editing the startup code. In embedded systems, the startup source code is provided and exposed by the manufacturer's SDK, so it is quite straightforward to do this.
Besides that, I can't see much difference between doing something before main() and just putting some code on the very top of main().
That's funny. In Swift I needed to make sure something runs as early as possible and I had to use the same trick: put the code in the declaration of a variable, like this:
var loadTheme: Bool = { Style.loadTheme(); return true; }()
Nit: that’s not the best wording. If the executable is being executed by something like `exec`, then these are just “arguments”; the command line (i.e. a shell) isn’t involved. Do embedded systems support something like `exec`?
Doing some of that kind of super low level code in a C++ static initialization constructor is almost certainly a poor choice. Like setting up memory and the stack pointer. Doing hardware init there can make a lot of sense, but you have to be careful from an architectural perspective.
> * Memory segments are initialized. Memory segments such as .bss (for uninitialized data), .data (for initialized data such as static variables, global variables, local static variables, addresses of functions, and function pointers), and .text (where the actual code resides) are initialized and a valid stack is set up.
> * Command line arguments are received. This may not be relevant in embedded systems as in embedded systems we don’t usually call main() with arguments
> * The stack pointer is configured. This is necessary because the program needs to know where to start from.
Note that some microcontrollers may require a start.c or cstart file that has the initialization code (whether this file is manually created or auto generated).
> Now that we know what happens before main(), we might wonder if there’s a way we can control what happens before main, or if there is a way we can write our own version of what happens before main(). In C, the answer is largely no. Whatever happens before main() is largely dependent on your architecture and compiler. However this is in fact possible in C++.
> One way you can do this in C++ is by declaring a global class object. Since global variables are initialized before main(), the constructor of the class you have initialized as global will run before main(). You can therefore place the code you want to run before main() in such a constructor.
It suggests only the avenue of C++ global constructors to make code run before main() (as others have noted, __attribute__((constructor)) is basically the same thing for C). But there are other ways to make code run before main, such as by use of linker scripts and assembly files to put code in _start that eventually calls main() (note that it's this latter way by which all of the things they mention are done).
You can't really initialize the stack pointer in a constructor, or indeed in C++ at all; there's no syntax for it and the compiler may use the stack to allocate local variables in the function prologue.
You can if you're super duper careful and throw in a little inline asm, depending on the architecture. ARM for instance is very likely not to need to spill to the stack on tiny little leaf functions.
It's an absolutely terrible idea, but I've seen engineers be so afraid of asm files that they'd try something like this.
The ARM Cortex-M series of chips is I believe kind enough to initialize the stack pointer for you before your code even executes, by copying your chosen stack pointer value from a special reserved location in the interrupt vector table. So in principle you could write all your hardware init code in C.
Yes, exactly this. As someone who has written startup code in many flavors of assembly (PIC, ARMv4, MIPS, PowerPC, AVR...), I appreciated what ARM did with the design of the Cortex-M architecture -- they designed it so that you could write fully-functional embedded software (firmware) without a line of assembly.
Normally, the 2 places you can't avoid assembly are (1) the startup code (because you're doing things like disabling interrupts and setting the stack pointer) and (2) interrupt service routines - usually there is a little bit of magic on the front and back ends (for example, on an older ARM7 chip, the CPU didn't automatically push / save any registers onto the stack; you had to do it yourself if you needed that).
With the Cortex-M, the CPU design and its microcode took care of all that, so all of the messy assembly stuff went away. Now, as someone who started writing 6502 ASM as a kid, I kind of miss it, but as someone who has to build lots of systems and ship products on deadlines, I like the change.
Eh, on the M4s, they borked it. There's an erratum that the floating point spill doesn't take into account the divide pipeline, so it may not wait long enough for the pipeline flush and can corrupt the register save. So you have to write your own asm interrupt prologue anyway. : /