Replacing __aeabi_dsub to save space (-flto issues)

I'm trying to cram a lot of code into a reasonably small ARM microcontroller. I've done a massive amount of work on size optimisation already, and I'm down to the point where I need double arithmetic, but __aeabi_ddiv, __aeabi_dadd and __aeabi_dsub are some of the biggest functions on the whole device.

Both __aeabi_dadd and __aeabi_dsub are ~1700 bytes each, despite doing basically the same job (the very top bit of doubles is the sign bit). Neither function references the other one.

Realistically all I need to do is replace __aeabi_dsub with:

double __aeabi_dsub(double a, double b) {
  // flip top bit of 64 bit number (the sign bit)
  ((uint32_t*)&b)[1] ^= 0x80000000; // assume little endian
  return a + b;
}

and I'd save ~1700 bytes - so flipping the sign of the second argument, then adding them using __aeabi_dadd.
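
For reference, the same idea can be written without the pointer cast, which sidesteps strict-aliasing and endianness concerns. This is only a sketch: it assumes double is a 64-bit IEEE 754 value stored with the same byte order as uint64_t, and the name dsub_via_dadd is hypothetical (the real override would be named __aeabi_dsub).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name for illustration; on the target this body would
   be the definition of __aeabi_dsub. Assumes 64-bit IEEE 754 doubles. */
double dsub_via_dadd(double a, double b) {
    uint64_t bits;
    memcpy(&bits, &b, sizeof bits);   /* copy out the raw bit pattern of b */
    bits ^= UINT64_C(1) << 63;        /* flip the top bit (the sign bit) */
    memcpy(&b, &bits, sizeof b);      /* copy the negated value back */
    return a + b;                     /* on soft-float ARM EABI this becomes a call to __aeabi_dadd */
}
```

With optimisation enabled, GCC usually turns small fixed-size memcpy calls like these into plain register moves, but that is worth checking in the disassembly since a freestanding build still expects a memcpy to exist. As far as I know, IEEE 754 gives x - y the same result as x + (-y) for non-NaN inputs, so the main behavioural difference should be the sign of a NaN result.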

I'm aware that this may not be 100% compatible with the IEEE spec, but on this platform I'm ok with that in order to save > 1% of my available flash.

My problem is that when I add that function, the linker complains with undefined reference to __aeabi_dsub - which seems strange given that it's the act of defining it that causes the error.

This appears to be related to link time optimisation (-flto): turning it off makes everything work perfectly, but it adds 8k to the firmware size, so it no longer fits in available flash!

So what do I need to do to be able to replace the built-in function __aeabi_dsub when link time optimisation is active?

thanks!

Mooring answered 20/9, 2018 at 15:49 Comment(7)
Tried compiling it with -Os yet? Really, before hacking std libraries, better try optimizing the code. Also if your processor has an FPU you can utilize it and get rid of these functions.Ankara
Yes, it's using -Os already, and I have made many other code optimisations first including replacing things like sin with slower but smaller versions (which works fine). This question is about GCC, FLTO and built-in functions, @toohonestforthissite your personal views on language choice shouldn't come into it. The build is for the BBC micro:bit, a device for school children. IMO the vast majority of 10 year olds aren't going to get far with interrupts, pointers, and Nordic's Bluetooth softdevice.Mooring
There you go, I've edited the question to remove reference to JavaScript so maybe we can actually try and fix the problem - which could be a real issue for anyone trying to use double math on some of the smaller microcontrollers.Mooring
This is a familiar activity for me when creating a boot loader. Have you used -ffreestanding? This will often eliminate the issue. Please provide the gcc version as well.Ibrahim
Thank you! Please can you post that as the answer? That did it. The act of adding -ffreestanding actually added ~250 bytes to the firmware size (I guess some of the assumptions about builtins were broken), but adding my dsub code saved 1680 bytes, so it's still a very clear winMooring
There would be more to my answer than that, including a rationale for why you have this issue. -ffreestanding is not the only solution. See: Static libraries with lto for some more background. Either -lc or -lgcc that you link with may not be LTO friendly. I doubt your question will be reopened; most people cannot understand you (or were too distracted by what you are trying to accomplish), which you should take as a compliment. I used nsjs years ago and it was under 100k at that time.Ibrahim
Thanks for all your help @artlessnoise - I'll check it out a bit more, it may well be as you say. It's a shame I and others won't get to see your more detailed answer :(Mooring

The solution for me (as suggested by @artless-noise) was to use the -ffreestanding compiler flag. GCC has this to say about it:

Assert that compilation targets a freestanding environment... A freestanding environment is one in which the standard library may not exist, and program startup may not necessarily be at main. The most obvious example is an OS kernel.

So it seems to make a lot of sense for an embedded environment anyway...

This added ~250 bytes to the firmware size (about 0.1%), I guess because it stopped the compiler from relying on some assumptions about built-in operators. However, it did allow me to add my own __aeabi_dsub implementation, which saved 1680 bytes in total.

Mooring answered 11/1, 2021 at 8:34 Comment(0)
