One other way to observe these steps a bit more clearly is to look at the startup code provided by a chip's manufacturer or generated by an IDE that targets the chip, such as Keil.
Most will put this type of code in '.S' startup assembly files, which often also contain information like the memory addresses for hardware interrupts to use, alongside linker scripts that tell the linker about the memory available on the chip.
For example, ST's 'CubeF0' package has some example projects for their simpler ARM chips:
http://www.st.com/en/embedded-software/stm32cubef0.html
I think it is good practice with microcontrollers to tell GCC not to include generic startup logic, using flags such as --specs=nosys.specs (or even -nostdlib, adding -lgcc and -lc back only if necessary), so you can be sure you are doing the right thing for your particular chip.
The generic compiler-provided startup code is fine for AVRs under most circumstances, since they're only produced by a single manufacturer and don't have fancy features like PLLs which need to be initialized. It's the more complicated chips where it gets a little hairy, especially if you have to deal with things like external RAM.
There actually isn't any reason STM or NXP or any other ARM vendor couldn't provide AVR-like usability and an awesome libc. For example, the assembly blobs used on most STM32s are very simple.
It's simply that they want to push people to their proprietary IDEs (to increase vendor lock-in). STM used to provide an okayish libc implementation but then decided to bury it inside CUBE(MX).
If your design becomes complicated enough, sure, go ahead and change the startup code. But that's quite a small portion of all uses, and for the rest we should have AVR-like ease of use.
I guess I rarely see the case for an AVR when you're not prototyping on an Arduino. You can get chips that are an order of magnitude more powerful per dollar, per watt, or by any other metric you'd want to use.
Have you read "The Amazing $1 Microcontroller" [0], which might have been posted here before? It's a survey of available microcontrollers, with the intent of helping you choose one. If you have read it, are you in agreement with the article, for example on how AVRs compare to others?
I haven't confirmed what you've said yet, but it makes sense since 8051/8052 MCUs are generally used for their compatibility with legacy code.
I've been working on an agricultural spray controller that uses some SiLabs part only because there is already assembly code written for it. The company that sells the controller doesn't want to spend the time rewriting it in C and moving to ARM or PIC.
It should be noted that relying on static initialization as suggested in the article, while at times useful, can open a new set of problems unless initialization order is well-defined (it's out of scope for the article, so I imagine the author is aware and just opting for brevity). Some compilers (Clang and GCC) provide an explicit ordering mechanism, and there are proposals before the C++ standards committee at the moment, but until then relying on static initialization is something one should think about potentially solving differently. Depending on the architecture, there also tend to be well-defined hooks for functions to run pre- and post-main(), which is another consistent model for initialization/shutdown.
I had a really good experience designing an RTOS initialization system that depended on static initialization but didn't have issues with ordering (and in fact didn't even have a main; after init it went right into the idle thread).
Basically I only let children of one class be statically initialized (and enforced this with a tool that goes through the generated binary and type information). These 'subsystem' classes then get callbacks after all of the static initializers have run, which allows them to find the other subsystems they depend on, and then that dependency graph is walked to initialize the full system (the dependency graph has to be a DAG or the system faults). Combined with a code review rule that static initializations only happen in one compilation unit in an anonymous namespace, and that nothing else happens there, this means that no one can really touch other subsystems before they've all been initialized, and therefore the order doesn't matter.
I was really happy with how it turned out, despite being completely off the wall compared to how C++ normally works.
The symmetrical Static Destruction Order Fiasco is also a very fun problem to deal with. The solution proposed for SIOF in the C++ FAQ does nothing to help you deal with this one, though.
Is the "static initialization order fiasco" a thing that can happen in Go? The order of package-level variable initialization is well defined in the spec, and because Go doesn't allow import cycles, it seems like the issue mentioned in the linked page is not possible.
In embedded you can usually modify every step from boot to main. They are just pieces of code you can interact with and make do whatever you want, initializing stuff in whatever order you want, provided you know what you're doing.
Deviating with a war story now: I once modified some startup code so it self-detected its boot location and initialized the RAM data according to this location shift. It was fun, but debugging code you relocated is a pain because you need to somehow relocate the symbols as well.
Well, it was a bootloader, and the self-detection was useful for bootloader self-update. The updated bootloader would be copied to a different section and on reset would start and copy itself over the original bootloader section.
I also wanted to maintain full functionality even at the shifted location in case the self-copy process somehow failed. It never did fail, at least to my knowledge.
In the embedded software context, you typically do your own static initialization in a startup.S file or equivalent. There should be zero ambiguity as to when your DATA section is copied to RAM.
There's generally a difference between "static initialization" copying .data from ROM to RAM, and running static initialization functions out of the table. That table doesn't really have ordering constraints generally.
> Depending on the architecture, there tends to also be well-defined hooks for functions to run pre and post-main which is another consistent model for initialization/shutdown.
Is atexit the post-main hook to which you were referring, or is that just the posix hook into a larger class of post-main hooks?
I attached an example for Arduino Zero variants [1] (it is directly from Atmel's SDK). It is quite straightforward to see what happens before main(). In an embedded system, you can even change it easily if you want.
In this example, the stack pointer is updated at the very first point after powering up the device [2], and the underscore-prefixed variables are defined in its linker script [3].
Personally, I think globals with constructors are kind of an anti-pattern. I've noticed a high correlation between crappy apps and apps with a lot of pre-main() code. Especially if some of your globals have threads: because then you potentially have threads running AFTER main exits... I've seen a number of apps with random crashes at shutdown because a thread that was started in a global was still running, and since main had returned, globals were being cleaned up and the thread used something that had just been cleaned up.
This article seems to be targeted toward embedded programming, so global state isn't that big of a deal because you have complete control of the chip. It's quite common to not even have threads or an operating system at all.
This is a real problem; I'm not sure rooting all mutable state on the stack will help in this case without language-level metadata restricting frees, passing references, and dereferencing.
It seems like the Rust answer of making it difficult to share mutable state is a solid one: you explicitly manage concurrent state rather than implicitly allowing sharing. In addition, Rust has some interesting run-once constraints you can add to closures that work well with global initialization. I'll admit I haven't seen enough Rust to see it work well in action, but the pieces are there.
To me one curious thing about the main() function without arguments is that these arguments are still pushed onto the stack. You can find them by dereferencing a pointer to another stack variable and playing around with the offset.
// Just tested with mingw-gcc on Win8.1
// Maybe you need to play around with the offset, +28
#include <stdio.h>

int main() {
    int i; // Another stack variable
    printf("%lld\n", (long long)&i + 28);                         // &argv
    printf("%p\n", *((void**)((long long)&i + 28)));              // argv
    printf("%s\n", ((char**)*((void**)((long long)&i + 28)))[0]); // argv[0]
    return 0;
}
Disclaimer: I don't know how this behaves in an embedded environment.
> Disclaimer: I don't know how this behaves in an embedded environment.
The handful of times I've written startup code for an embedded micro, I've always passed zero arguments to the program's "command line", not even its own program name.
I.e., from the perspective of _start, declare the application's entry point as int main(int argc, char **argv), call it with main(0, NULL), and ignore the return value.
ARM passes the first several arguments in registers anyway, so your UB trick wouldn't reveal anything interesting.
On Windows this depends on the calling convention. The calling convention determines the order in which function arguments are passed (in order or in reverse), who takes care of cleaning up the stack (caller or callee), and even whether the arguments are passed on the stack or in CPU registers.
In Visual Studio you can control this behavior with the __cdecl, __clrcall, __stdcall, __fastcall, __thiscall, and __vectorcall [1] calling convention function prefixes, but it also depends on the optimization options [2] of the compiler.
Passing arguments via CPU registers has a huge performance benefit, but also some drawbacks.
I don't know much about AMD64, but if the arguments are passed in registers, then how is it possible to retrieve argc/argv later, since other functions are likely to touch those registers?
The C standard says arguments to main() should be defined as:
int main(void)
or
int main(int argc, char *argv[])
(or equivalent, e.g. int can be replaced by a typedef name defined as int, or the type of argv can be written as char **argv).
In embedded systems, many manufacturers' default startup code does not even provide argc/argv. In that case this won't work unless you implement your own startup code.
I have a strong feeling that while this obviously works on i386 Linux (and probably any Unix, for that matter), if it works even on amd64 it's pure coincidence.
In embedded applications a lot of things happen before you can even attach a debugger. Most microcontrollers go through a non programmable hardware initialisation sequence which initialises enough hardware for the processor to run. The processor then runs from a section of read only (or at least write protected) memory. This code performs a further stage of initialisation and patching. This can be used by the manufacturer to implement their own bootloader or to perform configuration of the hardware to get around certain hardware bugs. Finally this code jumps to a known memory/flash location where the code that the engineer has written is located. It is from this point that you can usually start debugging.
Depending on the platform, one can get one to three levels of loaders, which have some range of configuration items (pins/register config) and programmability before you even get to 'application' code. (Trying to recall if I've ever hit more than three...) And it's often accessible by the developers, not just the manufacturers. Different mfrs range from open to openly annoying about the details of access at that level, but you can often get debugger access at those early bootloader stages, though most embedded dev environments sensibly default to later stages.
Sometimes strange bugs manifest due to misalignments between configs at early boot stages and what the later code needs, so you want to be able to review and change those stages.
Is any of this even public for us as developers? Can we influence it, and can it influence us? If not, should we even think about it?
I had a lot of curiosity about how JTAG works and how I can build my own debugger for general targets, but it seems documentation for this is thin. I should look over OpenOCD to see how those guys are doing stuff.
Most manufacturers do in broad terms disclose how the boot sequence works. Here is an example of the information disclosed about a Cypress microcontroller [1]. In some applications the startup time can be critical, so this needs to be known. The code that runs before the engineer's code doesn't usually have any effect, and it puts the microcontroller into a known, documented state, so you don't really need to think about it.
It depends heavily on the SoC in question. On an unlocked STM32F4, you can attach JTAG pretty much immediately even if it's way off in the weeds executing garbage, but BCM283x (RPi) has to set up the pinmux for the JTAG first from the ARM core, so you have to already be running some of your own code.
This is a misleading headline for the general audience.
What he writes is an introduction to bare metal programs for somebody who has worked under an operating system before.
In both cases a lot of code is run before main(). But very different code. In bare metal you more likely need to care what it does. In the operating system case the developers of the OS more likely have done more for you than you will ever need to know about.
A couple of years ago there was a conference presentation (FOSDEM?) about what a program does before reaching main(). The presenter (a Brit, IIRC) went pretty fast, did not go into too much detail, and it took him 30-40 minutes. I think the study was done on a BSD. Maybe somebody can provide a link; the video is online, but Google refuses to help me...
This seems to be concentrating on what the system loader is doing without mentioning whatever the statically linked standard library gets up to before calling the main function?
On microcontrollers, you often don't have a standard library, or if you do, it is limited, and doesn't do things before main. The article describes pretty well what happens on a microcontroller.
I would add that in some (most?) environments, the code that gets executed before main() is easily accessible: it sits in files within the project (written in assembly most of the time).
Cool, he is talking about the loader though, which would imply that this is all language independent? Except for where he mentions things getting executed before main in C++, which I'm assuming would be the doings of the C++ standard library.
It's still pretty language independent. QEMU, for instance, is a C codebase that depends on static initialization functions that run before main to register the different components.
In any other context, yes, you're missing C library predecessors to main(). But it's specifically referencing embedded systems, which is why hardware initialization and memory segments are mentioned.
You can have pre-main functions in C as well (as a GCC extension). Functions defined with `__attribute__((constructor))` will be called before main(). You can even set a priority on them, and set post-main destructor callbacks. This works even for shared libraries loaded at run time.
I've used this as a trick to automatically run unit tests, though, not for any real work.
I love the XML question (whether your app does stuff with XML).
There have historically been so many big security holes when parsing XML that it is now a security code smell if you are working with it (especially in lower-level languages like C or C++).
At least in gcc you can use constructor attribute to run code before main. I've seen this used to initialize modules and create a dependency tree of modules to initialize before main even starts.
To be picky, the linker script lays out a sequence of calls to the global constructors so that the C runtime startup knows where to find and execute them.
Is it strange that C++ is able to do this but C itself cannot? Sometimes I hear the argument that "C is lower level", and there is the idiom that "C++ is C but better" (not that I believe either is true).
It's somewhat misleading when they say the C++ code runs before main. What's going on is that, after the same initialization steps, the first code of your program is executed.
C and C++ have different syntax for what that code is, but it's pure semantics as far as the computer is concerned and C++ gives you zero extra power.
It is certainly possible to do this in C by editing the startup code. In embedded systems, the startup source code is provided and exposed by the manufacturer's SDK, so it is quite straightforward to do this.
Besides that, I can't see much difference between doing something before main() and just putting some code on the very top of main().
That's funny. In Swift I needed to make sure something runs as early as possible and I had to use the same trick: put the code in the declaration of a variable, like this:
var loadTheme: Bool = { Style.loadTheme(); return true; }()
Nit: that’s not the best wording. If the executable is being executed by something like `exec`, then these are just “arguments”; the command line (i.e. a shell) isn’t involved. Do embedded systems support something like `exec`?
Doing some of that kind of super low level code in a C++ static initialization constructor is almost certainly a poor choice. Like setting up memory and the stack pointer. Doing hardware init there can make a lot of sense, but you have to be careful from an architectural perspective.
> * Memory segments are initialized. Memory segments such as .bss (for uninitialized data), .data (for initialized data such as static variables, global variables, local static variables, addresses of functions, and function pointers), and .text (where the actual code resides) are initialized and a valid stack is set up.
> * Command line arguments are received. This may not be relevant in embedded systems as in embedded systems we don’t usually call main() with arguments
> * The stack pointer is configured. This is necessary because the program needs to know where to start from.
Note that some microcontrollers may require a start.c or cstart file that has the initialization code (whether this file is manually created or auto generated).
> Now that we know what happens before main(), we might wonder if there’s a way we can control what happens before main, or if there is a way we can write our own version of what happens before main(). In C, the answer is largely no. Whatever happens before main() is largely dependent on your architecture and compiler. However this is in fact possible in C++.
> One way you can do this in C++ is by declaring a global class object. Since global variables are initialized before main(), the constructor of the class you have initialized as global will run before main(). You can therefore place the code you want to run before main() in such a constructor.
It suggests only the avenue of C++ global constructors to make code run before main() (as others have noted, __attribute__((constructor)) is basically the same thing for C). But there are other ways to make code run before main, such as by use of linker scripts and assembly files to put code in _start that eventually calls main() (note that it's this latter way by which all of the things they mention are done).
You can't really initialize the stack pointer in a constructor, or indeed in C++ at all; there's no syntax for it and the compiler may use the stack to allocate local variables in the function prologue.
You can if you're super duper careful and throw in a little inline asm, depending on the architecture. ARM for instance is very likely not to need to spill to the stack on tiny little leaf functions.
It's an absolutely terrible idea, but I've seen engineers be so afraid of asm files that they'd try something like this.
The ARM Cortex-M series of chips is I believe kind enough to initialize the stack pointer for you before your code even executes, by copying your chosen stack pointer value from a special reserved location in the interrupt vector table. So in principle you could write all your hardware init code in C.
Yes, exactly this. As someone who has written startup code in many flavors of assembly (PIC, ARMv4, MIPS, PowerPC, AVR...), I appreciated what ARM did with the design of the Cortex-M architecture -- they designed it so that you could write fully-functional embedded software (firmware) without a line of assembly.
Normally, the 2 places you can't avoid assembly are (1) the startup code (because you're doing things like disabling interrupts and setting the stack pointer) and (2) interrupt service routines - usually there is a little bit of magic on the front and back ends (for example, on an older ARM7 chip, the CPU didn't automatically push / save any registers onto the stack; you had to do it yourself if you needed that).
With the Cortex-M, the CPU design and its microcode took care of all that, so all of the messy assembly stuff went away. Now, as someone who started writing 6502 ASM as a kid, I kind of miss it, but as someone who has to build lots of systems and ship products on deadlines, I like the change.
Eh, on the M4s, they borked it. There's an erratum that the floating point spill doesn't take into account the divide pipeline, so it may not wait long enough for the pipeline flush and can corrupt the register save. So you have to write your own asm interrupt prologue anyway. : /