const char *foo = "foo";
This was recently mentioned on bugzilla, and the problem is usually underestimated, so I thought I would give some details about what is wrong with the code above.
The first common mistake here is to believe foo is a constant. It is a pointer to a constant. In practical ELF terms, this means the pointer lives in the .data section, and the string constant in .rodata. The following code defines a constant pointer to a constant:
const char * const foo = "foo";
The above code will put both the pointer and the string constant in .rodata. But keeping a constant pointer to a constant string is pointless. In the above examples, the string itself is 4 bytes (3 characters and a terminating zero). On 32-bit architectures, a pointer is 4 bytes, so storing the pointer and the string takes 8 bytes: a 100% overhead. On 64-bit architectures, a pointer is 8 bytes, putting the total weight at 12 bytes: a 200% overhead.
The overhead is always the same size, though, so the longer the string, the smaller the overhead relative to the string size.
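As a rough illustration of that arithmetic, the sketch below computes the pointer overhead for the machine compiling it. The function name is made up for this example, and it only counts the pointer itself, nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* The example string: 3 characters plus the terminating zero. */
static_assert(sizeof("foo") == 4, "\"foo\" occupies 4 bytes");

/* Hypothetical helper: size of a pointer on this machine, as a percentage
   of a string of the given length (plus its terminating zero). */
size_t native_pointer_overhead_percent(size_t string_length)
{
    return 100 * sizeof(const char *) / (string_length + 1);
}
```

With 4-byte pointers this gives 100 for "foo", and with 8-byte pointers 200. The result shrinks as the string gets longer.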
But there is another, not well known, hidden overhead: relocations. When loading a library in memory, its base address varies depending on how many other libraries were loaded beforehand, or depending on the use of address space layout randomization (ASLR). This also applies to programs built as position independent executables (PIE). For pointers embedded in the library or program image to point to the appropriate place, they need to be adjusted to the base address where the program or library is loaded. This process is called relocation.
The relocation process requires information which is stored in .rel.* or .rela.* ELF sections. Each pointer needs one relocation. The relocation overhead varies depending on the relocation type and the architecture. REL relocations use 2 words, and RELA relocations use 3 words, where a word is 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures.
On x86 and ARM, to mention the most popular 32-bit architectures nowadays, REL relocations are used, which makes a relocation weigh 8 bytes. This puts the pointer overhead for our example string at 12 bytes (4 for the pointer plus 8 for the relocation), or 300% of the string size.
On x86-64, RELA relocations are used, making a relocation weigh 24 bytes! This puts the pointer overhead for our example string at 32 bytes (8 for the pointer plus 24 for the relocation), or 800% of the string size!
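The numbers above can be checked with a small sketch. The two structs are local mirrors of the 32-bit REL and 64-bit RELA entry layouts, and their names, like the helper names, are made up for this example rather than taken from any header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of ELF relocation entries (hypothetical names). */
struct rel32  { uint32_t r_offset; uint32_t r_info; };                   /* REL: 2 words of 4 bytes */
struct rela64 { uint64_t r_offset; uint64_t r_info; int64_t r_addend; }; /* RELA: 3 words of 8 bytes */

static_assert(sizeof(struct rel32) == 8, "REL entry on 32-bit is 8 bytes");
static_assert(sizeof(struct rela64) == 24, "RELA entry on 64-bit is 24 bytes");

/* Bytes spent on the pointer and its relocation, excluding the string. */
size_t pointer_cost(size_t pointer_size, size_t relocation_size)
{
    return pointer_size + relocation_size;
}

/* That cost as a percentage of the string (length plus terminating zero). */
size_t cost_percent(size_t cost, size_t string_length)
{
    return 100 * cost / (string_length + 1);
}
```

For "foo", pointer_cost(4, sizeof(struct rel32)) is 12 bytes (300%), and pointer_cost(8, sizeof(struct rela64)) is 32 bytes (800%).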
Another hidden cost of using a pointer to a constant is that every time it is used in the code, the pointer value has to be loaded from memory first. A function as simple as
const char *bar(void) { return foo; }
weighs one instruction more when foo is defined as const char *. On x86, that instruction weighs 2 bytes. On x86-64, 3 bytes. On ARM, 4 bytes (or 2 in Thumb). That weight can vary depending on the additional instructions required, but you get the idea: using a pointer to a constant also adds overhead to the code, both in time and space. Also, if the string is defined as a constant instead of being used as a literal in the code, chances are it's used several times, multiplying the number of such instructions. Update: Note that in the case of const char * const, the compiler will optimize these instructions away and avoid reading the pointer, since it's never going to change.
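To see the difference for yourself, here is a minimal sketch with both kinds of pointer side by side. The names foo_mutable, foo_fixed, bar_mutable and bar_fixed are made up for this example, and the exact instructions depend on the compiler, flags and architecture.

```c
#include <assert.h>

/* Mutable pointer: it can be reassigned at run time, so code using it
   generally has to load its current value from memory. */
const char *foo_mutable = "foo";

/* Constant pointer: its value never changes, so the compiler can use
   the address of the string directly. */
const char *const foo_fixed = "foo";

/* Either way, the pointer is an object of its own, separate from the string. */
static_assert(sizeof(foo_mutable) == sizeof(char *), "the pointer is its own object");

const char *bar_mutable(void) { return foo_mutable; }
const char *bar_fixed(void) { return foo_fixed; }
```

Compiling this with something like gcc -O2 -S and comparing the two functions in the generated assembly should show the extra load in bar_mutable.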
The symbol for foo is also exported, making it available to other libraries or programs. That might not be required, and it adds its own overhead: an entry in the symbol table (16 bytes in 32-bit ELF, 24 bytes in 64-bit ELF), an entry in the string table for the symbol name (strlen("foo") + 1 bytes), an entry in the symbol hash chain table (4 bytes if only one type of hash table, sysv or GNU, is present; 8 if both are present), and possibly an entry in the symbol hash bucket table, depending on the other exported symbols (4 or 8 bytes, as for the chain table). It can also affect the size of the Bloom filter in the GNU symbol hash table.
So here we are, with a seemingly tiny 3-character string possibly taking 64 bytes or more! Now imagine what happens when you have an array of such tiny strings. And this doesn't apply only to strings: it applies to any kind of global pointer to constant data.
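As a back-of-the-envelope tally, the sketch below adds up the pieces for x86-64. It assumes an 8-byte pointer, one 24-byte RELA relocation, a 24-byte symbol table entry (the struct is a local mirror of the 64-bit ELF layout, with a made-up name), and 4 bytes per hash chain entry. Hash bucket and Bloom filter effects are left out, as is the extra code for loading the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of a 64-bit ELF symbol table entry (hypothetical name). */
struct sym64 {
    uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    uint64_t      st_value;
    uint64_t      st_size;
};
static_assert(sizeof(struct sym64) == 24, "symbol entry on 64-bit is 24 bytes");

/* Hypothetical helper: bytes used by an exported
   `const char *symbol_name = "value";` under the assumptions above. */
size_t exported_pointer_total(const char *symbol_name, const char *value,
                              size_t hash_tables)
{
    size_t string     = strlen(value) + 1;        /* the string itself, in .rodata */
    size_t pointer    = 8;                        /* the pointer, in .data */
    size_t relocation = 24;                       /* RELA: 3 words of 8 bytes */
    size_t symbol     = sizeof(struct sym64);     /* symbol table entry */
    size_t name       = strlen(symbol_name) + 1;  /* symbol name in the string table */
    size_t hash_chain = 4 * hash_tables;          /* one chain entry per hash table */
    return string + pointer + relocation + symbol + name + hash_chain;
}
```

Under those assumptions, exported_pointer_total("foo", "foo", 1) comes to 68 bytes, of which only 4 are the string.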
In conclusion, using a definition like
const char *foo = "foo";
is almost never what you want. Instead, you want to use one of the following forms:
- For a string meant to be exported:
const char foo[] = "foo";
- For a string meant to be used in the same source file:
static const char foo[] = "foo";
- For a string meant to be used across several source files of the same library:
__attribute__((visibility("hidden"))) const char foo[] = "foo";
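A small sketch of what the array form gives you, using the file-local variant. The names greeting, greeting_as_pointer and greeting_length are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Array form: the object is the string itself, with no separate pointer. */
static const char greeting[] = "foo";

static_assert(sizeof(greeting) == 4, "3 characters plus the terminating zero");

/* The array still converts to a const char * wherever one is expected. */
const char *greeting_as_pointer(void) { return greeting; }

/* The length is known at compile time, without calling strlen(). */
size_t greeting_length(void) { return sizeof(greeting) - 1; }
```

Code that previously took the pointer can usually keep working unchanged, since the array decays to a pointer when passed around. Anything that assigned a new value to foo, however, would need rethinking.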