If we're in the realm of "non-standard linker tricks"...
Linkers will concatenate sections of the same name. You can use this trick to produce a concatenation of arrays across several files:
$ cat t1.c
__attribute__((section("some_array"))) int a[] = {1, 2, 3};
$ cat t2.c
__attribute__((section("some_array"))) int b[] = {4, 5, 6};
$ cat t.c
#include <stdio.h>
extern const int __start_some_array;
extern const int __stop_some_array;
int main() {
    const int* ptr = &__start_some_array;
    const int n = &__stop_some_array - ptr;
    for (int i = 0; i < n; i++) {
        printf("some_array[%d] = %d\n", i, ptr[i]);
    }
    return 0;
}
$ gcc -std=c99 -o t t.c t1.c t2.c && ./t
some_array[0] = 1
some_array[1] = 2
some_array[2] = 3
some_array[3] = 4
some_array[4] = 5
some_array[5] = 6
This is the mechanism that linkers use "under the hood" to get a list of C++ object initializers that need to run pre-main().
It's unfortunate that there is no standard way of getting at this functionality in portable C, or to get it in C++ without actually running code pre-main(). Sometimes you really want a linker-initialized list of things (of some sort) that have been linked in, without actually running code pre-main() (which has all kinds of issues).
I would love to see a thing like this standardized in both C and C++. C++ compilers need it under the hood anyway.
What a weird coincidence seeing a comment about mergeable sections today...
A patch I helped review that's related to this in LLVM just landed today: https://reviews.llvm.org/D72194. (LLVM previously had support, but would perform bad merges in specific and rare edge cases; that patch fixes those cases).
> The code I posted above compiles and runs without any custom linker script.
What?! Ok, I've seen the Linux kernel's custom linker script define symbols for custom ELF section boundaries, but TIL that you can get these automagically. I'm curious to see if LLD implements this logic, too.
How exactly is the ordering of items across the section decided? Is it relatively random or is it based on the order that the object files are passed to the linker?
This is true; to my knowledge, I would NOT rely on the order always being consistent. But I think you can enforce the order in the linker file, i.e. load the arrays one after the other, and then the order should stay constant. But you still have several places that you need to keep synchronized and, really, most people don't look in the linker file at all.
You're right, you can manually specify the ordering of object files in the linker file. Though, like you say, this is rarely done because of how brittle it is.
I was referring more to the way shown above, where I've seen a linker change the list of object files from a sorted list of filenames, to a hash table of filenames, which obviously changed the ordering with which it iterated across all object files.
It's probably already in link order. If you have to know the names of the symbols in code to be able to sort them, it sort of defeats the purpose of the scheme in the first place.
You might want to sort them by some other criteria.
For example, the FreeBSD kernel's use of this construct embeds a well-defined priority[1], and the array is sorted by this priority value at boot time.[2]
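Not the FreeBSD code itself, just a hedged sketch of that pattern; the section name "init_set", the entry layout, and the "lower priority runs first" rule are all assumptions:

#include <stdlib.h>

struct init_entry {
    int priority;                 /* lower value runs first (assumption) */
    void (*fn)(void);
};

/* Each registration drops one entry into the "init_set" section. */
#define REGISTER_INIT(prio, func) \
    static struct init_entry init_##func \
        __attribute__((section("init_set"), used)) = { prio, func }

/* Start/stop symbols provided by the linker for the concatenated section. */
extern struct init_entry __start_init_set[];
extern struct init_entry __stop_init_set[];

static int by_priority(const void *a, const void *b) {
    const struct init_entry *x = a, *y = b;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

static void run_inits(void) {
    size_t n = (size_t)(__stop_init_set - __start_init_set);
    qsort(__start_init_set, n, sizeof(struct init_entry), by_priority);
    for (size_t i = 0; i < n; i++)
        __start_init_set[i].fn();
}

The real sys/linker_set.h macros are a bit different (they store pointers into the set, if I recall correctly), but the shape of the idea is the same.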
The BSDs define some macros in sys/linker_set.h to make this slightly easier. The mechanism is used somewhat widely in both kernel and userspace in FreeBSD.
One place it's used is to avoid central "registration" for multiple "things". For example, consider a program with many fairly separate "subcommands". Using this mechanism allows adding a new subcommand simply by linking in a new source file (without changing a shared source file to list the new command).
I use this idea in embedded work, where the code is split into "modules". The same firmware source is shared amongst various differing pieces of hardware, each of which utilise a different subset of modules.
I like to use this technique for CLI command and help tables. It eliminates having a single giant table in one file; instead, you have a C macro that makes the system aware of your command right next to your command definition.
int my_command(..) { ... }
COMMAND(my_command, "command_name", "example command")
HELP("use --foo argument for this...")
HELP("use --bar argument for that...")
Also used this all the time back when I programmed assembly language.
It's definitely a problem that this is not standardized. For example, TI has their own C compiler for their ARM CPUs; not everybody uses gcc.
This is used for constructor & destructor functions in C code. The function pointers get put in an array in a certain section which is collated then the startup and exit code calls them. Ordering is not guaranteed.
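For reference, the user-facing form looks roughly like this with GCC/Clang; the compiler emits a pointer to each such function into .init_array/.fini_array, and the startup/exit code walks those arrays:

#include <stdio.h>

__attribute__((constructor))
static void before_main(void) { puts("runs before main()"); }

__attribute__((destructor))
static void after_main(void)  { puts("runs after main() returns or exit() is called"); }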
I got about 90% of the way to having a working ruby single-file executable builder which used objcopy to embed a sqlite database of the source files into an MRI build. Then YARV happened and the ruby build chain changed just enough that I needed to throw it out and start again.
Every now and again I ponder having another go with mruby...
FWIW, this is roughly how Tclkits work in the Tcl world. Although by default they use a Metakit database instead of SQLite.
Currently they append the database to the end of the executable, which has some problems and I'm working to make including it in the image more standardized as part of XVFS [0].
I needed this last week! Building an embedded firmware image for a dashboard display, with lots of PNG files for icons etc. 61 of them.
The original developer wrote a tool to expand the PNGs to BMPs (arrays of 32-bit pixel values) and generate a C array definition as a text file. Which is lots bigger than the original PNG (13K => 100K sort of thing). Then included that C source in the build. Used up 700K of my firmware image space which was only 1MB to begin with.
So I wrote a tool to represent the raw PNG file as a C declaration, then added parts of libpng to the boot code to decompress the images during initialization. Even with libpng, I saved over 400K. Now the images use RAM instead of ROM, but that's ok I had buttloads of RAM.
Anyway, this is a much slicker way of including binary files in an image. I may go back and change my build.
There are even smaller png libs than libpng, try a stripped down stb_image for example. Wouldn't use that for user-supplied images, but since you control them all it should be fine.
I do this with embedded graphics on boards for a lot of reasons. Works great - and STB is pretty easy to modify if you need a funny one (e.g. monochrome pbm for small displays). Animated gif, png, and a bunch of others all work pretty smoothly. Just include the bits you need so it can be even smaller.
Might be worth recompressing the PNGs too if you haven't already. In particular, using 8-bit PNG can save a lot of filesize if you don't need loads of colours.
Nope. Dumb Visual Studio doesn't support embedded binary in cross-compile. That's the tool the client wanted, so I'm stuck with it.
Funny, they (VisualGDB) even have a tutorial on doing it. But the option 'Embedded Resource' doesn't exist in my (modern) Visual Studio.
Ha! And the feedback I gave VisualGDB about their outdated tutorial, they responded "Just pay HERE and we'll be glad to solve your problem!" They can go ahead and have outdated documentation, screw them.
Could you not use the objcopy method mentioned here outside Visual Studio (or somewhere before the build process) and then just use the .o at link time, referencing it from the outside at linkage? I'd expect there to be at least a way to:
- hook custom tool calls
- link external objects into the project
Also, hello fellow automotive embedded SW eng! I know your suffering! :))
VisualGDB uses a tool-created makefile for cross-compiling and building. Not a lot of room for me to innovate. I suppose the .o trick could work. But it still requires careful hand-scripted builds. I already have that - I run a script to read a binary file and emit .c declaration, which I pipe to a C file and include in the project.
I suppose it lets me drop support of the script, which is something.
Because it doesn't depend on the static linker generating symbols, which feels a bit like a non-standard/non-portable feature. But, who knows, maybe the vast majority of ld variants actually implement it.
One of the Capture the Flag challenges at CCC last year was to figure out a way to leak the contents of a file on a remote compiler server. The server would accept some C code and just give you a boolean true/false value indicating whether the code compiled or not, without ever executing it.
Others came up with a solution abusing the C preprocessor by defining macros that would make the known structure of the file valid C and therefore they could just #include it. But my solution works with arbitrary files, without knowing the structure beforehand.
As others have pointed out here, you can use inline assembly and the .incbin directive to include a file. But how could that influence whether the compilation succeeds or fails? I figured out how to guess a byte of the file and create metadata sections only accepted by the linker if the guess was correct.
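The .incbin trick mentioned above looks roughly like this with GCC/Clang on ELF targets; "blob.bin" and the symbol names are placeholders:

/* Top-level inline assembly: the assembler pastes the raw file bytes
   into .rodata and brackets them with two global labels. */
__asm__(
    ".section .rodata\n"
    ".global blob_start\n"
    "blob_start:\n"
    ".incbin \"blob.bin\"\n"
    ".global blob_end\n"
    "blob_end:\n"
    ".previous\n"
);

extern const char blob_start[], blob_end[];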
We didn't actually use .incbin on that challenge, interestingly; however, we did use it (along with some nested static constructor function trickery) for Online Calc from TokyoWesterns CTF. For that we had some straightforward tricks to get around the forbidden characters, then we abused the flag format to get the #include to work. After that we could leak the flag byte-by-byte using Linux's BUILD_BUG_ON_ZERO, which is basically an upgraded static_assert. I think this is the code we ran:
The advantage is that it works anywhere, with any compiler.
The disadvantage is that it can increase compile time. I would limit each .c file to 10MB; it seems there's a quadratic increase in build time with file size, at least with gcc.
Also, instead of "0x%02x" I use decimal notation and no spaces in order to decrease the .c file size:
I've never noticed the "quadratic" build time increase you mention, so I did a test[0]. Files ranging in size from 1 MB to 50 MB, 3 trials each. These are the results, and they look absolutely linear[1].
Nice, I love it when people actually try it out. In the end, I much prefer the method outlined in the article, less processing overall, no need to transform it into an array nor compile the array into an object afterwards. You just skip 2 steps and bring it to an object directly!
I stand corrected! I shouldn't have guessed "quadratic" without measuring.
Large files will make gcc use more memory, and if you're building on a low-RAM machine, the OS will start to swap. That's probably the effect I saw. Nothing to do with "quadratic" compilation time.
Yes, so long as you're willing to take a build-dep on vim. An od+awk+sed+sh combo will get you there from a POSIX base—and you have to add some wrapper text around the output of xxd anyway…
No. So long as you're willing to take a build-dep on that code.
Either you're expecting the code from that repository to exist somewhere in the compilation-host system already, or you are vendoring the code by copying it into your own repository and taking on any potential maintenance, unportability, any necessary configuration integration to make sure, for instance, that you're compiling for the host rather than the target in a cross-compile setup (did you make sure?)—
Either way, if you use that code as part of your build process, you are taking a build-dep on that code. The fact that its only build-dep is a C compiler may play into making this a sensible choice as compared with something that has even more transitive dependencies, but it does not make it go away.
If you click through from the article to the mailing list post that inspired it, you'll see that that's what they originally used, but it created a memory problem.
I also did quite a bit of benchmarking, and never saw any quadratic behaviour [0]. However, there is a floor for RSS and elapsed time, but you would need to test assets less than ~1MB to see it. For assets over a limited range, the graph might look quadratic, but if you cover a large enough range you will see there are two regimes of behaviour.
So, by encoding in base64, you’ve made a more complicated solution which is also slower and harder to use? This seems like it is worse in every conceivable way. The only reason I can see for embedding base64 is if you need the source code to be compact, which makes sense for e.g. JavaScript but makes no sense for C.
I’ll take the simple, fast, and easy solution, thank you.
Where did they define the symbols `_binary_quine_c_start` and `_binary_quine_c_end`? I would expect that the symbol names would need to be passed to the command that produced the object file from the binary file:
> I would expect that the symbol names would need to be passed to the command that produced the object file from the binary file:
It's the other way around. The command produces these symbols based on the file name:
$ readelf --symbols myself.o
Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_quine_c_start
     3: 0000000000000113     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_quine_c_size
     4: 0000000000000113     0 NOTYPE  GLOBAL DEFAULT    1 _binary_quine_c_end
Those are standardized names generated by ld, derived from the file name (in this case "quine", which is treated by the author both as C source and a binary data file).
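For reference, an invocation of roughly this shape produces such an object with GNU ld (an assumption about how myself.o was built here, not something stated in the thread):

$ ld -r -b binary -o myself.o quine.c

GNU ld turns the input file name into the _binary_quine_c_* symbols by replacing characters that aren't valid in identifiers with underscores; objcopy can produce the same kind of object if you also spell out the output format explicitly.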
I bet there's a CLI switch to change the defaults.
That is pretty cool, but also not the behaviour I'd expect coming from other languages where this would be "get the contents of this file from the current directory of my runtime environment, and assign them to immutable variable `a`."
Unfortunately, this technique is a bit problematic with modern C compilers. Because the `start` and `end` symbols are unrelated objects, as far as the compiler is concerned, the subtraction `&end - &start` to get the length of the data invokes undefined behavior. Just for that reason, I feel the include file with a hex dump is the better method.
In C, accessing data outside the bounds of an object is undefined (like, you can’t legally “go past the end” of an array and end up in some other object). The “start” and “end” pointers, as far as the compiler is concerned, are totally different objects, so it may optimize out a loop from one pointer to the other, since it’s impossible to increment an address so that it goes from pointing at one thing to another.
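One commonly used way to sidestep the pointer subtraction, sketched here with the symbol names from the readelf output above; whether this satisfies the strictest reading of the standard is debatable, but it avoids subtracting pointers into unrelated objects:

#include <stddef.h>
#include <stdint.h>

extern const char _binary_quine_c_start[];
extern const char _binary_quine_c_end[];

static size_t blob_size(void) {
    /* Convert the addresses to integers first, then subtract. */
    return (size_t)((uintptr_t)_binary_quine_c_end -
                    (uintptr_t)_binary_quine_c_start);
}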
So is the idea that just because the two objects happen to appear sequentially in memory under a certain implementation, the compiler (or linker in this case?) has no obligation to ensure that assumption holds?
It follows the same implementation experience guidelines as #include by leaving the search paths implementation defined, with the understanding that implementations are not monsters and will generally provide…
Really? We are going to "understand that implementations are not monsters" after what they've done with "undefined"? I think maybe these standards should be written from the perspective that implementations are sociopathic demons summoned against their will.
(But I really want this feature. I regularly use xxd and some Makefile rules to embed assets in my executables. For instance, in a web service I might have all my default configuration and templates in the executable with command line options to send them to standard output and override them from external files. Then on the chance someone needs to make a change they can just make their own file and use it.)
May I ask what's wrong with using xxd? I don't have an opinion about C, but C++ is already pretty complicated, and adding features to the language when there are already well-known solutions doesn't seem wise.
(For the record, I'm aware that there are size limitations when using xxd, but there are also other solutions).
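For anyone unfamiliar with it, the xxd flow mentioned above looks roughly like this (the file name logo.png is made up):

$ xxd -i logo.png > logo_png.h
$ head -n 2 logo_png.h
unsigned char logo_png[] = {
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d,

The generated header also ends with an unsigned int logo_png_len holding the file size.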
I wouldn't do it this way, within the toolchain. Better to just append the data to the finished executable, with a header that contains an identifying marker. The program can scan its own image and look for that marker to get at the data. That will work on any OS with any executable and linker format, and it can be done post-production (users or downstream distributors can receive the binary and add customized data to it without requiring a dev environment with a toolchain, just a tiny utility you can bundle with the program).
Scanning for the marker can be avoided, if we do the following:
/* inside the program, at file scope */
#include <stdint.h>

struct {
    char marker[17];    /* the 16-byte marker string plus its NUL */
    uint32_t offset;    /* 0 until the external utility patches it in */
} binary_stuff = { "dW5pcXVlbWFyawo=", 0 };
Then your tiny utility program opens the executable and looks for the sequence "dW5pcXVlbWFyawo=" inside it. Having found it, it puts the offset of the data into the 32-bit offset field which follows, and writes the data at that offset.
When the program runs, if it sees a nonzero value in binary_stuff.offset, it can open its image and directly proceed to that offset to read the stuff or map it: no searching.
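A sketch of what that runtime side could look like, assuming a Linux host (it uses /proc/self/exe to locate the program's own image) and assuming the appended data runs to the end of the file; binary_stuff is the marker struct from above:

#include <stdio.h>
#include <stdlib.h>

static char *load_appended_blob(size_t *size_out) {
    if (binary_stuff.offset == 0)
        return NULL;                         /* nothing was appended */
    FILE *self = fopen("/proc/self/exe", "rb");
    if (!self)
        return NULL;
    fseek(self, 0, SEEK_END);
    long size = ftell(self) - (long)binary_stuff.offset;
    char *buf = malloc((size_t)size);
    if (buf) {
        fseek(self, (long)binary_stuff.offset, SEEK_SET);
        if (fread(buf, 1, (size_t)size, self) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(self);
    if (buf && size_out)
        *size_out = (size_t)size;
    return buf;
}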
This isn't portable. Executable files aren't binary blobs, they're a structured format. If you append data to an executable there is no guarantee it's actually going to end up mapped in memory for you. You'd have to put it into an actual ELF segment, and at that point you're back to using the linker.
It's a lot more sensible to ask the linker to do this as in OP than to hack together something like you've described.
> If you append data to an executable there is no guarantee it's actually going to end up mapped in memory for you.
I didn't state it clearly enough, but I didn't say anything about it being mapped. It almost certainly isn't mapped. Loaders do not blindly map the whole thing to memory; then you would end up with debug info unconditionally mapped.
Even if it were mapped, the program wouldn't easily find it with the latter approach I described: the offset given in the structure is measured from the start of the file, not from some address in memory.
> ask the linker
It is not portable either. For instance, it doesn't work with Microsoft's linker which is called link.exe.
The advantage of the approach described in the article is that there's less work to do at startup time. For applications that run on end-users' machines, we should do whatever we can to minimize work done at startup, getting it as close as possible to just mapping a big blob into memory and jumping to the code that does the real work of the program.
> For applications that run on end-users' machines, we should do whatever we can to minimize work done at startup.
That is not an absolute given. If this is some tool that is launched a large number of times from frequently running shell scripts, or from a busy web server's CGI, I'd tend to agree.
The principle you are championing is not widely followed anyway. All sorts of applications that people use on their machines on a daily basis have atrocious startup times. Though "everyone's startup time sucks" is no excuse, it does remove the motivation to worry about single digit millisecond differences.
With my technique we don't actually have to do anything at all until the blob is required. If there are ways of executing the program that don't require the data, that initialization need not even take place. With lazy initialization we can move the cost out of the startup time.
Even if you work at minimizing startup time, there is a lot of activity in a modern application, like attaching shared libraries and opening various files. The contribution from this approach is like a rounding error in the least significant digit of startup time compared to the default stuff that happens before your program even gains control.
You should check out how self-extracting archives work. All of those that I am aware of have a first part that is a regular executable, and the archive to be extracted is appended to that. File formats like zip have an advantage here as they keep their index at the end of the file, after the compressed data, so there is no need to scan for the start of the archive.
The footer could work. If you round up the file to some reasonable power-of-two block size and put the footer right at the end of the last block, that would be highly portable; it would work on systems/devices that only write binary files in multiples of a block size (even if you write a short block at the end).
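A minimal sketch of such a footer, under the assumption that a fixed-size record sits at the very end of the (block-padded) file; the magic string and field layout are made up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct blob_footer {
    char     magic[8];      /* "BLOBFTR" plus NUL (made-up marker) */
    uint64_t blob_offset;   /* where the appended data starts in the file */
    uint64_t blob_size;
};

static int read_footer(const char *exe_path, struct blob_footer *out) {
    FILE *f = fopen(exe_path, "rb");
    if (!f)
        return -1;
    /* Read the last sizeof(footer) bytes and check the marker. */
    int ok = fseek(f, -(long)sizeof *out, SEEK_END) == 0 &&
             fread(out, sizeof *out, 1, f) == 1 &&
             memcmp(out->magic, "BLOBFTR", 8) == 0;
    fclose(f);
    return ok ? 0 : -1;
}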
The author puts a lot of work towards making this proposal a reality (as far as I as a casual twitter/slack observer can see) and I'm looking forward to it.
Unfortunately, the proposal has been stalled by the C++ committee and the author is uninterested in continuing it. See their post here https://thephd.github.io/full-circle-embed
Thing is, embedding binary files is not that useful, and there are easy enough ways to do it for those few that really needed it.
The C standard has been very cautious about adding features that would make the standard and the compilers more complex since it is the lingua franca of computing.
Nowadays there is really just a handful of archs in use and their compilers are all backends for GCC/LLVM, so they are relaxing the gating of features a bit.
Any Turing-complete language can be as dynamic as you want if you're motivated enough. After all, many interpreted languages are implemented in C...
Many dynamic recompiler/JIT implementations effectively do something like what you describe: they map some executable portion of memory and output native code that is then executed.
Heh, I used this trick two years ago when I went to Hackaday Belgrade and ported a NES emulator to the badge. The ROM that was being played would be compiled in using the same basic method. Neat!