...

compiled binary, GNU size, nm, analysing linker map files

To measure memory at compile time, one can analyse the resulting binary and associated metadata, such as memory map files. Analysis at compile time allows assessing the full Flash usage but only part of the RAM usage: statically allocated variables are known at this point, whereas stack and heap usage only become known at runtime. The analysis can be performed with more or less granularity (i.e. how precisely one can tell which modules use more or less memory), depending on the tool used. Available tools include GNU objdump, size, and nm; the memory map file generated by the linker can also be used for manual inspection or automated parsing.

For global granularity, use the GNU size command on your binary. The text and data sections represent memory used for code and initialized variables, respectively, and both use space in Flash (in this output format, the text column also includes read-only data such as .rodata and the vector table). So in the example below, Flash usage is 15840 + 56 = 15896 bytes. The bss section (which for historical reasons stands for Block Started by Symbol) stores uninitialized (zero-initialized) variables, and therefore it does not occupy space in Flash. Since the variables in bss and data need to be modified at runtime, they occupy space in RAM (the initial values of data are copied from Flash to RAM at startup); thus in this example the static RAM usage amounts to 1032 + 56 = 1088 bytes.

Code Block
lakers-FORK $ size target/thumbv7em-none-eabihf/debug/lakers-no_std
   text    data     bss     dec     hex filename
  15840      56    1032   16928    4220 target/thumbv7em-none-eabihf/debug/lakers-no_std
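To track these numbers automatically (e.g. in CI), the table printed by size can be parsed by a small program. The Rust sketch below assumes the default Berkeley format shown above (a header line, then rows whose first three columns are text, data, and bss in decimal) and applies the same arithmetic: Flash = text + data, static RAM = data + bss. The names StaticUsage and flash_and_static_ram are hypothetical.

```rust
/// Memory figures derived from one row of GNU `size` (Berkeley format) output.
#[derive(Debug, PartialEq)]
pub struct StaticUsage {
    pub flash: u64,      // text + data (initial values of data are stored in Flash)
    pub static_ram: u64, // data + bss
}

/// Parses the output of `size <binary>` and returns the usage of the first file listed.
/// Assumes the Berkeley format: a header line followed by rows of
/// `text data bss dec hex filename`, with text/data/bss in decimal.
pub fn flash_and_static_ram(size_output: &str) -> Option<StaticUsage> {
    // Skip the header line and take the first non-empty data row.
    let row = size_output.lines().skip(1).find(|l| !l.trim().is_empty())?;
    let cols = row
        .split_whitespace()
        .take(3) // text, data, bss
        .map(|c| c.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if cols.len() != 3 {
        return None; // malformed row
    }
    let (text, data, bss) = (cols[0], cols[1], cols[2]);
    Some(StaticUsage { flash: text + data, static_ram: data + bss })
}
```

The input would typically be the captured stdout of running size on the ELF file; reading it from a string keeps the parsing logic independent of how the tool is invoked.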

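We can see a more detailed table with the objdump -h <binary file> command, which prints all the section headers and their attributes. For example, the .text section contains code, while the .vector_table, .rodata, and .data sections contain data. The VMA column is the address a section has while the program runs, and the LMA column is where it is stored in the image: .data, for instance, lives in RAM at runtime (VMA 0x20000000), but its initial values are stored in Flash (LMA 0x00010bc0, right after .rodata).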

Code Block

$ objdump -h target/thumbv7em-none-eabihf/release/lakers-no_std
Idx Name          Size      VMA       LMA       File off  Algn
  0 .vector_table 00000400  00000000  00000000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         0000ceec  00000400  00000400  00010400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata       000038d0  0000d2f0  0000d2f0  0001d2f0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .data         00000038  20000000  00010bc0  00030000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
(...)
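The VMA/LMA reasoning can be turned into a simple classification rule: a section whose flags include CONTENTS and LOAD is stored in the image (Flash), and a section whose VMA falls in the RAM region also occupies RAM while the program runs. The Rust sketch below applies this rule to the four sections listed above, with sizes and addresses copied from that listing. The Section struct, the classify function, and the RAM_BASE threshold are illustrative assumptions (0x2000_0000 is the start of the Cortex-M SRAM region, where .data is placed above); since the listing is truncated, the totals cover only those four sections.

```rust
/// One row of `objdump -h` output, reduced to the fields needed here.
pub struct Section {
    pub name: &'static str,
    pub size: u32,
    pub vma: u32,
    pub loaded: bool, // true if the flags include CONTENTS and LOAD (stored in the image)
}

/// Assumed start of RAM on the target (the Cortex-M SRAM region).
const RAM_BASE: u32 = 0x2000_0000;

/// Returns (flash_bytes, ram_bytes) for a list of sections.
pub fn classify(sections: &[Section]) -> (u32, u32) {
    let mut flash = 0;
    let mut ram = 0;
    for s in sections {
        if s.loaded {
            flash += s.size; // contents are stored in Flash, at the LMA
        }
        if s.vma >= RAM_BASE {
            ram += s.size; // the section lives in RAM while the program runs
        }
    }
    (flash, ram)
}

/// The four sections from the objdump -h excerpt (release build).
pub fn example_sections() -> Vec<Section> {
    vec![
        Section { name: ".vector_table", size: 0x400, vma: 0x0000_0000, loaded: true },
        Section { name: ".text", size: 0xceec, vma: 0x0000_0400, loaded: true },
        Section { name: ".rodata", size: 0x38d0, vma: 0x0000_d2f0, loaded: true },
        Section { name: ".data", size: 0x38, vma: 0x2000_0000, loaded: true },
    ]
}
```

A .bss section is normally listed with ALLOC but without CONTENTS/LOAD, so under this rule it would be entered with loaded: false and count towards RAM only, while .data counts towards both.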

Sometimes, we want to measure only the sizes of certain parts of our code. For example, in lakers, we normally want to measure how much memory is needed by the library itself, but want to discard things like the cryptographic backend, since it changes across platforms.

One way to do that is by analysing the memory map file generated by the linker (one might need to enable it by passing a flag such as -Clink-args=-Map=/tmp/lakers.map to rustc, which forwards -Map=/tmp/lakers.map to the linker). Different linkers generate slightly different memory map files, but in general they show which symbols are placed in each section of memory, together with their address and size. For example, … In the example below, the .text section (in Flash) begins at address 0x400 (__stext), right after the vector table. The first input section is a reset handler introduced by the cortex-m-rt crate, and the next one is the prepare_message_1 function, which starts at address 0x458 and uses 0x13c bytes (its symbol value, 0x459, has the lowest bit set because it is a Thumb function; the same applies to Reset at 0x401). Using a script to parse this file and selecting only the target libraries or functions gives a very granular insight into Flash usage by the program. Similarly, .rodata can be analysed to obtain read-only data in Flash, and .data and .bss to obtain static RAM usage.

Code Block

$ cat /tmp/lakers_no-std.map | grep " .text" -A 8
     400      400    15a4c     4 .text
     400      400        0     1         __stext = .
     400      400       58     4         /home/gfedrech/Developer/inria/dev/lakers-FORK/target/thumbv7em-none-eabihf/debug/deps/libcortex_m_rt-ab9dabb33bc95171.rlib(cortex_m_rt-ab9dabb33bc95171.cortex_m_rt.bd536e3d6951dd08-cgu.0.rcgu.o):(.Reset)
     400      400        0     1                 $t.1
     401      401       3e     1                 Reset
     440      440        0     1                 $d.12
     458      458      13c     2         /home/gfedrech/Developer/inria/dev/lakers-FORK/target/thumbv7em-none-eabihf/debug/deps/lakers_no_std-3f752946f41f98ae.08piolvegulzp6fpuoijkvkau.rcgu.o:(.text._ZN6lakers28EdhocInitiator$LT$Crypto$GT$17prepare_message_117h45bd752ef830d0b1E)
     458      458        0     1                 $t.0
     459      459      13c     1                 lakers::EdhocInitiator$LT$Crypto$GT$::prepare_message_1::h45bd752ef830d0b1
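A minimal sketch of such a script, written in Rust and assuming the lld-style layout shown above: four columns (VMA, LMA, Size, Align) followed by an output section, an input section, or a symbol. It treats lines whose description contains ":(" as input sections (an object file followed by the section name in parentheses) and sums their sizes when the description matches a caller-supplied predicate. The ":(" heuristic and the function names are assumptions for illustration; GNU ld, for example, uses a different map layout.

```rust
/// One input-section line of an lld-style map file.
#[derive(Debug)]
pub struct InputSection {
    pub addr: u64,
    pub size: u64,
    pub desc: String, // e.g. "/path/obj.rcgu.o:(.text._ZN6lakers...)"
}

/// Extracts input-section lines: numeric VMA, LMA, Size, Align columns
/// followed by a description containing ":(" (object file plus section name).
pub fn input_sections(map: &str) -> Vec<InputSection> {
    map.lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let addr = u64::from_str_radix(cols.next()?, 16).ok()?; // VMA (hex)
            let _lma = cols.next()?;
            let size = u64::from_str_radix(cols.next()?, 16).ok()?; // Size (hex)
            let _align = cols.next()?;
            let desc = cols.collect::<Vec<_>>().join(" ");
            // Output sections (".text") and symbols ("Reset") lack ":(" and are skipped.
            desc.contains(":(").then(|| InputSection { addr, size, desc })
        })
        .collect()
}

/// Sums the sizes of the input sections whose description matches `keep`.
pub fn total_size(map: &str, keep: impl Fn(&str) -> bool) -> u64 {
    input_sections(map)
        .iter()
        .filter(|s| keep(s.desc.as_str()))
        .map(|s| s.size)
        .sum()
}
```

Matching on the mangled section name (e.g. `6lakers` in `_ZN6lakers...`) rather than on the object file path is one possible filter; excluding the crypto backend then amounts to also requiring that the description does not contain `lakers_crypto`.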

Stack and heap (RAM at Runtime)

Stack and heap: memory painting

...

, probe-rs

Measuring RAM at runtime in embedded systems can be challenging due to the lack of an operating system that keeps track of memory usage. One way of circumventing this is the technique of “memory painting”: filling the RAM with a known pattern (e.g. 0xDEAD_BEEF) before the program executes, letting it run, and finally counting how many bytes still hold the pattern.

To fill the memory, we can use a simple loop that writes the pattern to the memory. We need, however, to find what the start and stop addresses should be. There are a few ways to do that:

Look at the target datasheet. For example, in the nRF52840, the RAM starts at 0x20000000 and has 256 KiB, ending at 0x20040000.
Look at or configure the GNU linker script. For example, our application sets RAM : ORIGIN = 0x20000000, LENGTH = 64K, meaning that the RAM begins at 0x20000000 (as per the datasheet) and that we use a total of 64 KiB.
Look at the generated memory map file, and find where the symbols _stack_start and __sheap are. Since the stack grows from the top down (e.g. from 0x20000000 + 64 KiB towards 0x20000000) and the heap grows from the bottom up, the region between __sheap and _stack_start is the total size of our allocatable RAM (i.e. discarding sections such as .bss, .data, and .uninit).

We now know that we want to paint the memory from __sheap up to _stack_start.

Before continuing, remember that we want to write to the RAM before our code starts executing, otherwise we risk overwriting the stack that is already in use. One way of doing that is writing the loop in assembly (using only registers); another is doing it in the reset handler or in some pre-initialisation code of your platform. The cortex-m-rt crate provides a pre_init hook that runs before main (and before static variables are initialised, so it must not use them), which is a convenient place for the stack painting code. In the code below, we first obtain the address where the heap starts, using the __sheap symbol defined by the linker script. Next, since our code is already executing, we do not want to overwrite stack memory that is already in use, so we read the current value of the stack pointer and stop a small constant offset below it, since we are using the stack while painting it. Finally, we run the loop that writes the pattern to the memory.

Code Block

use core::arch::asm;
use core::ptr;

// Start of the heap, provided by the cortex-m-rt linker script
extern "C" {
    static mut __sheap: u8;
}

// Runs before main and before .data/.bss are initialised, so it must not use statics
#[cortex_m_rt::pre_init]
unsafe fn pre_init() {
    // Lowest address to paint: the start of the heap
    let heap_start = ptr::addr_of!(__sheap) as usize;

    // Highest address to paint: just below the current main stack pointer
    let stack_pointer: usize;
    asm!("mrs {}, msp", out(reg) stack_pointer, options(nomem, nostack, preserves_flags));

    // Paint one word at a time, keeping a 4-byte margin below the stack pointer
    let mut addr = heap_start;
    while addr < stack_pointer - 4 {
        ptr::write_volatile(addr as *mut u32, 0xDEAD_BEEF);
        addr += 4;
    }
}

Next, we flash and run the program, and after it finishes, we can use a debugger to inspect the memory and learn how much of our pattern was overwritten. We can use the command probe-rs read <WIDTH> <ADDRESS> <WORDS> and parse its output. For example, when reading the first 2 words after __sheap we can see our pattern.

Code Block
# read 2 words starting at __sheap (value 0x20000440 comes from memory map file)
$ probe-rs read b32 0x20000440 2 --chip nRF52840_xxAA
deadbeef deadbeef

Assuming our RAM size is set to 4096 = 0x1000 bytes (as per the memory.x file), the stack starts at the top of RAM, 0x20000000 + 0x1000 = 0x20001000, so we can compute the runtime memory usage as follows:

Code Block

$ painted_words=$(( (0x20001000 - 0x20000440) / 4 ))
$ remaining_bytes=$(probe-rs read b32 0x20000440 $painted_words --chip nRF52840_xxAA | tr ' ' '\n' | grep deadbeef | wc -l | awk '{print $1*4}')
$ echo $(( (painted_words * 4) - remaining_bytes))
908
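The pipeline above counts every word in the region that no longer holds the pattern. A stricter variant also computes a high-water mark: since the stack grows downwards from _stack_start, the first overwritten word found when scanning upwards from __sheap marks the deepest point the stack reached, and any intact words above it (e.g. untouched gaps inside stack frames) are still counted as used. The Rust sketch below computes both, assuming the program does not use the heap (so the bottom of the region is only reached by the stack) and that probe-rs read prints whitespace-separated hex words, as in the example above. The function names are hypothetical.

```rust
/// Pattern written by the pre_init painting code.
const PATTERN: u32 = 0xDEAD_BEEF;

/// Parses whitespace-separated hex words, as printed by `probe-rs read b32 ...`.
pub fn parse_words(output: &str) -> Result<Vec<u32>, std::num::ParseIntError> {
    output
        .split_whitespace()
        .map(|w| u32::from_str_radix(w, 16))
        .collect()
}

/// Bytes that no longer hold the pattern, wherever they are (same as the shell pipeline).
pub fn used_bytes_total(words: &[u32]) -> usize {
    words.iter().filter(|&&w| w != PATTERN).count() * 4
}

/// High-water mark: from the first overwritten word (scanning upwards from
/// __sheap) to the top of the painted region, everything counts as used stack.
pub fn used_bytes_high_water(words: &[u32]) -> usize {
    let intact_bottom = words.iter().take_while(|&&w| w == PATTERN).count();
    (words.len() - intact_bottom) * 4
}
```

The high-water figure is always greater than or equal to the total count, so it is the more conservative number to use when sizing the stack.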

Measuring Execution Time

timers, GPIOs connected to logic analysers

...

If the constrained device talks to a computer or gateway, just run Wireshark on the computer.
If two devices talk to each other, you need a third device that understands the protocol to sniff the conversation. Some IoT platforms offer facilities to save the conversation as a .cap file, which can later be analysed in Wireshark.

drafts

nm (actually not recommended)

One way to do that is by using the nm (name list) utility. By default it lists all symbol names and their addresses. To also get the symbol sizes, we set the -S flag. In the example below, we filter out the crypto backend and get the last symbol belonging to the lakers library (EdhocBuffer (...) Default), which occupies 0x22 bytes.

Code Block

$ nm -S target/thumbv7em-none-eabihf/debug/lakers-no_std  | grep -v lakers_crypto | grep lakers | tail -1
00010da7 00000022 T _ZN86_$LT$lakers_shared..buffer..EdhocBuffer$LT$_$GT$$u20$as$u20$core..default..Default$GT$7default17hd2a38afa1cbc0a83E

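Summing the sizes of all relevant symbols with awk gives the Flash usage of our application. Note that nm also lists symbols placed in RAM (e.g. in .data and .bss), so the symbol type column may need to be checked as well.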

Code Block
$ nm -S target/thumbv7em-none-eabihf/debug/lakers-no_std | grep -v lakers_crypto | grep lakers | awk '{sum += strtonum("0x" $2)} END {print sum}'
33230
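The same sum can be computed without awk (strtonum is a gawk extension). The Rust sketch below parses `nm -S` lines of the form address, size, type, name, and, unlike the pipeline above, keeps only symbols whose nm type marks them as code or read-only data (T/t, R/r), so that RAM symbols from .data and .bss (D/d, B/b) are not counted as Flash. The name filter mirrors the grep calls above; the function name is hypothetical.

```rust
/// Sums the sizes of symbols in `nm -S` output that live in Flash
/// (nm types T/t = code, R/r = read-only data) and whose name matches `keep`.
pub fn flash_symbol_bytes(nm_output: &str, keep: impl Fn(&str) -> bool) -> u64 {
    nm_output
        .lines()
        .filter_map(|line| {
            // Sized symbols have four columns: address, size, type, name.
            let cols: Vec<&str> = line.split_whitespace().collect();
            if cols.len() != 4 {
                return None; // undefined or unsized symbols have fewer columns
            }
            let size = u64::from_str_radix(cols[1], 16).ok()?;
            let in_flash = matches!(cols[2], "T" | "t" | "R" | "r");
            (in_flash && keep(cols[3])).then_some(size)
        })
        .sum()
}
```

Initial values of .data symbols also occupy Flash; they are left out here to match the intent of the pipeline above, but adding "D" | "d" to the type filter would include them.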

Versions Compared

Old Version 6

New Version 7
