printf() debugging library using string table "decoder ring"

Asked 2/8, 2011 at 12:54 Answered 28/7, 2013 at 11:32

I'm writing to see if any of you have ever seen or heard of an implementation of the idea I'm about to describe.

I'm interested in developing a printf-style debugging library for an embedded target. The target is extremely remote, and the comms bandwidth budget between me and the target is extremely tight, so I want to be able to get the debugging messages in a very efficient format.

Quite often, debug statements look something like the following:

myDebugLibraryPrintf("Inside loop, processing item %d out of %d.\n", i, numItems);

When this is expanded into text, the string printed is something like "Inside loop, processing item 5 out of 10.\n", a total of 42 bytes. Over 90% of the data printed out by this statement is static, literal text that is known at compile time; only the "5" and "10" aren't.

What I'd like to do is be able to send back only those two integers (8 bytes instead of 42). Once I've received that data, I'd have some kind of "decoder ring" that lets me "reconstitute" the received data and print out the full debug message here at my location.

I'd generate the "decoder ring" by automatically (as part of the build process) giving every myDebugLibraryPrintf() statement a unique ID at compile time, and generating a table that maps those unique IDs to the original format strings. Then, any time myDebugLibraryPrintf() is called on the target, it transmits the unique ID and any of the "%d", "%f", etc. varargs values seen in the format string, but the format string itself is NOT transmitted. (I'll probably just disallow "%s" items for now...) Back at my location, we'll have a program that looks up the unique IDs in the table, finds the appropriate format string, and uses it to reconstruct the original debug message.
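As a rough illustration of that round trip (a sketch only, not an existing library): the names kFormatTable, encode_msg and decode_msg are made up, the second table entry is invented, every message is assumed to carry exactly two integer arguments, and both ends are assumed to share the same byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table the build step would generate: array index == message ID. */
static const char *const kFormatTable[] = {
    "Inside loop, processing item %d out of %d.\n",   /* ID 0 */
    "Sensor %d reported value %d.\n",                 /* ID 1 (invented example) */
};

/* Target side: pack a 16-bit ID followed by two 32-bit arguments.
   Returns the number of bytes written (always 10 in this sketch). */
static size_t encode_msg(uint8_t *buf, uint16_t id, int32_t a, int32_t b)
{
    memcpy(buf, &id, sizeof id);
    memcpy(buf + 2, &a, sizeof a);
    memcpy(buf + 6, &b, sizeof b);
    return 10;
}

/* Host side: look the ID up and rebuild the text.
   Returns the message length, or -1 for an unknown ID. */
static int decode_msg(const uint8_t *buf, char *out, size_t out_size)
{
    uint16_t id;
    int32_t a, b;
    memcpy(&id, buf, sizeof id);
    memcpy(&a, buf + 2, sizeof a);
    memcpy(&b, buf + 6, sizeof b);
    if (id >= sizeof kFormatTable / sizeof kFormatTable[0])
        return -1;
    return snprintf(out, out_size, kFormatTable[id], (int)a, (int)b);
}
```

With the ID included, the example message costs 10 bytes on the wire instead of 42; a real version would need the argument count and types per ID, which the build step could derive from the format string.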

I feel like someone has probably had this idea before and I figured maybe someone in the community would have seen something like it (or even know of an open-source library that does this).

Constraints:

1. To clarify, I'm dealing with C/C++ here, and I'm not interested in a 100%-complete replacement implementation of printf() -- things like non-literal format strings, %s (string) format specifiers, or more advanced format specifiers like putting the width or precision in the varargs list with %*.*d don't need to be supported.
2. I want the string table to be generated automatically as part of the build process so that adding debug involves no more work than adding a traditional printf(). If any more than the minimum amount of effort is required, nobody on my project will use it.
3. Doing extra work as part of the build process to generate the string table is pretty much assumed. Fortunately, I have control of all the source code that I'm interested in using this library with, and I have a lot of flexibility within the build process.

Thanks!

Erechtheus answered 2/8, 2011 at 12:54 Comment(5)

What you're proposing is a simple form of data compression. You could probably save yourself a lot of time and effort, and still get 90+% of the benefit, simply by filtering your program's debug output through gzip before sending it across the link, and filtering it through gunzip at the other end. gzip/gunzip will automatically build up symbol tables and do the compression for you, without constraining your program's output the way a manual tokenizing scheme would. – Dioptase 10/8, 2011 at 22:54

Is it C or C++? (Tag editors seem to disagree) – Thermocline 11/8, 2011 at 0:18

@Jeremy Friesner: In my discussions with colleagues, compression has come up as an option as well, and it seems like it could be a good option -- however, it seems like a stretch to say I'd get comparable benefit from it. In the example I gave, I was able to effectively send 42 bytes worth of information in 8 bytes -- a space savings of 80%. Can gzip really achieve a space savings of 80% on this kind of data? (I need to do an experiment to find out.) Of course, even if gzip can't achieve the same savings, I may be willing to accept the relative inefficiency in the name of lower complexity. – Erechtheus 11/8, 2011 at 12:16

@AShelly: I guess it could be either C or C++. Currently, I don't see any reason that I'd need to use any C++-specific language features, so ideally I'd like to shoot for the "lowest common denominator" and see a "C-only" solution. However, most of the code on my project is already in C++, so if we had to rely on something that only worked in C++, then I'd be fine with that. – Erechtheus 11/8, 2011 at 12:32

I've just created 1000 lines of your type of output, and run it through gzip. The change was from 44,000+ chars to 1,700+ chars. The new length is under 4% of the old length. Try running your typical debugger output through gzip... – Vinery 12/8, 2011 at 19:47

I've only seen this idea implemented with a pre-defined set of strings. The code would look like debug_print(INSIDE_LOOP_MSG_ID, i, n). When developers wanted to add new messages they would have to put the new text in a specific header file and give it a new ID.

I think the idea of generating it on the fly from a normal-looking print statement is an interesting challenge. I haven't come across any existing implementations.

One idea might be a macro/template which turns the first string argument into a hash value at compile time. So the developer writes debug_print("test %d",i), which gets compiled to something like debug_port_send(0x1d35, i). Writing a post-processing script to extract the strings and hashes for use on the receiving side should be simple. (The simplest way to resolve hash collisions would be to emit an error message and force the user to alter the wording slightly.)

edit:
So I tried this with the compile-time hash at the link above.

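#include <stdint.h>
#include <stdio.h>
#include <string.h>

// CONSTHASH() is the compile-time string hash macro from the link above.
#define QQuot_(x) #x
#define QQuote(x) QQuot_(x)
// Adjacent string literals concatenate by themselves, so no ## is needed.
#define Debug_Print(s, v) (Send( CONSTHASH(QQuote(__LINE__) s), RawBits(&(v), sizeof(v))))

// Copy the argument's bytes out with memcpy instead of casting the pointer.
uint32_t RawBits(const void *p, size_t n)
{
   uint32_t bits = 0;
   memcpy(&bits, p, n < sizeof bits ? n : sizeof bits);
   return bits;
}

void Send(unsigned long hash, unsigned long value)
{
   printf("Sending %lx %lx\n", hash, value); //replace with COMMS
}

int main()
{
   int i = 1;
   float f= 3.14f;
   Debug_Print("This is a test %d", i);
   i++;
   Debug_Print("This is a test %d", i);
   Debug_Print("This was test %f", f);
}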

With a little more cleverness you could support multiple arguments. Examining the disassembly shows that all the hashes are indeed computed at compile time. Output is as expected: the two identical strings do not collide, because the line number is part of the hashed text. (This page confirms the hex is correct for 3.14):

Sending 94b7555c 1
Sending 62fce13e 2
Sending 506e9a0c 4048f5c3

All you need now is a text-processing script that can be run on the code, which extracts the strings from Debug_Print, calculates the hashes and populates a table on your receiver side. The receiver gets a hash value from the Send call, looks up the string that goes with it, and passes that, along with the argument(s), to a normal printf call.
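A minimal sketch of that receiver-side table, assuming the extraction script hands over the strings exactly as the target hashed them (line number prefix included). FNV-1a is used here purely as a stand-in for the real compile-time hash, and fnv1a, build_table and lookup are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, only a stand-in for whatever hash the target-side macro computes. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

struct entry { uint32_t hash; const char *format; };

/* Fill `table` from the extracted strings.
   Returns 0 on success, or -1 if two different strings share a hash,
   in which case the build should fail and the wording be changed. */
static int build_table(const char *const *strings, size_t n, struct entry *table)
{
    for (size_t i = 0; i < n; ++i) {
        table[i].hash = fnv1a(strings[i]);
        table[i].format = strings[i];
        for (size_t j = 0; j < i; ++j)
            if (table[j].hash == table[i].hash &&
                strcmp(table[j].format, strings[i]) != 0)
                return -1;
    }
    return 0;
}

/* Return the string registered for `hash`, or NULL if it is unknown. */
static const char *lookup(const struct entry *table, size_t n, uint32_t hash)
{
    for (size_t i = 0; i < n; ++i)
        if (table[i].hash == hash)
            return table[i].format;
    return NULL;
}
```

A linear scan is enough for a few hundred messages; sorting the table once and using bsearch() would be the obvious next step for larger ones.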

The only problem I see is that the nested macros in the compile time hash are confusing my refactoring plug-in and killing my IDE responsiveness. Disabling the add-in removed that issue.

Thermocline answered 2/8, 2011 at 18:36 Comment(4)

I'd been thinking about forming the unique ID based on the __FILE__ and __LINE__ at which the printf() was called. Since I have ALL of the source code of interest, it wouldn't be difficult to assign a distinct number to each source file (either automatically or explicitly) and then form a 32-bit unique ID using something like uniqueID = (fileId << 14) | lineNum, but your hashing idea might actually let me get away with a 16-bit unique ID, especially if I include the file and line as part of the hash key and use gperf to generate the hash function. – Erechtheus 3/8, 2011 at 12:2
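For reference, the (fileId << 14) | lineNum packing from the comment above could be sketched as follows. The helper names are invented; 14 bits for the line number allows lines up to 16383 and leaves 18 bits of a 32-bit ID for the file.

```c
#include <stdint.h>

#define LINE_BITS 14u
#define LINE_MASK ((1u << LINE_BITS) - 1u)   /* 0x3FFF, i.e. lines 0..16383 */

/* Combine a per-file number and a line number into one 32-bit ID. */
static uint32_t make_id(uint32_t file_id, uint32_t line)
{
    return (file_id << LINE_BITS) | (line & LINE_MASK);
}

/* Recover the two parts on the host side. */
static uint32_t id_file(uint32_t id) { return id >> LINE_BITS; }
static uint32_t id_line(uint32_t id) { return id & LINE_MASK; }
```

For example, file 3, line 120 packs to 49272 and unpacks back to (3, 120).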

I would just define long name constants for indexing into a string table. So if the string is "Inside loop, processing item %d out of %d.\n" I would use an identifier Inside_loop_processing_item_d_out_of_d (= index into a string table). It allows you to have a very readable debug statement and, at the same time, avoid having to go to macros / compile-time hashing. Hopefully, you won't hit limits on the symbol length for most print statements. The symbol table would need to be constructed manually. – Boar 11/8, 2011 at 0:3

Doesn't that violate constraint #2? "no more work than adding a traditional printf(). " – Thermocline 11/8, 2011 at 0:15

@ritesh: Ashelly is right -- I'm trying to avoid the need to manually maintain the string table. – Erechtheus 11/8, 2011 at 12:17

I've seen something that accomplishes something similar on the ARM platform. I believe it's called the "Embedded Trace Macrocell". A series of macros translates statements like TRACE_POWER_SYSTEM_VOLTAGE_REGULATOR_TRIGGER(inputX); into two register writes to the ETM registers. Note that this ONLY accepts 16-bit, 32-bit and 64-bit integers as arguments, though.

We can use the ARM tools to extract these (timestamped) buffers. Then we apply a pre-compiled bit of trickery to convert the first (index) register write into an output file that looks like this:

timestamp  | POWER SYSTEM    |    VOLTAGE REGULATOR TRIGGER    | 0x2380FF23

The code has been examined to determine the data type of the argument, so we don't have to bother. It can also be annotated with a "real time" timestamp (instead of ms since powerup), and file and line numbers of the trace statements.

ARM is set up to store this circular buffer internally (and very quickly), so it can be used in production. Even if you don't have the hardware support, though... some aspects of this could be easily reproduced.

Note that it's extremely important when analyzing a trace, that you only use a 'decode' file that matches the particular version of the code running on the device.

Jeans answered 10/8, 2011 at 22:50 Comment(1)

This is also a useful idea -- not quite what I was imagining, but interesting nonetheless. This ETM thing reminds me of the "Trace Analysis Tool and Library" (mc.com/products/software/tatl/) ("TATL" -- no relation to Zelda) I've used on PPCs in the past, but faster via hardware support (always cool). You touched on a key maintainability issue in your last sentence -- to deal with this I was thinking of tracking various versions of the string table through some kind of (cryptographic) signature and having the embedded target report the signature of the table it's using at bootup. – Erechtheus 11/8, 2011 at 12:24
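A minimal sketch of that version check, with one loud assumption: a 32-bit FNV-1a fingerprint stands in for the cryptographic signature mentioned above, so it catches mismatched builds but offers no protection against tampering. table_fingerprint and table_matches are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold every format string, including its terminating NUL, into one
   32-bit fingerprint (FNV-1a; NOT a cryptographic signature). */
static uint32_t table_fingerprint(const char *const *formats, size_t n)
{
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */
    for (size_t i = 0; i < n; ++i) {
        const char *s = formats[i];
        do {
            h ^= (uint8_t)*s;
            h *= 16777619u;                 /* FNV-1a prime */
        } while (*s++ != '\0');
    }
    return h;
}

/* Host side: decode only if the value the target reported at boot
   matches the table that is about to be used. */
static int table_matches(uint32_t reported_by_target,
                         const char *const *formats, size_t n)
{
    return table_fingerprint(formats, n) == reported_by_target;
}
```

The build would compute the same value over the generated table and compile it into the target image, so the boot message costs only four bytes.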

I seem to recall many tools for extracting string literals for the purpose of internationalization. GNU strings can extract the strings directly from the executable. This should help with part of the task.

Carnify answered 10/8, 2011 at 23:8 Comment(0)

I had the same problem PLUS I wanted to reduce the image size (due to tiny embedded flash). My solution is sending the file name and line (which should be 14-20 bytes) and having a source parser on the server side, which will generate a map of the actual texts. This way the actual code will contain no "format" strings, but a single "filename" string for each file. Furthermore, file names can easily be replaced with an enum (unlike replacing every string in the code) to reduce the COMM throughput.

I hope the sample pseudo-code will help clarify the idea:

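/* target code */
/* The format string is deliberately dropped: only file, line and arguments are sent.
   As written, PRINT needs at least one argument after the format. */
#define PRINT(format,...) send(__FILE__,__LINE__,__VA_ARGS__)
...

/* host code (c++); needs <istream>, <string>, <vector> and using namespace std */
void PrintComm(istream& in)
{
    string fileName;
    int    line = 0, nParams = 0;
    in>>fileName>>line>>nParams;
    // The vector owns its storage, so nothing is deleted by hand and
    // the nParams == 0 case no longer frees an uninitialized pointer.
    vector<int> params(nParams > 0 ? nParams : 0);
    for (int i=0; i<nParams; ++i)
        in>>params[i];
    const char* format = FindFormat(fileName,line);
    ...
}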
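The server-side source parser is not shown above. A very rough sketch of its core step, pulling the format literal out of one source line, might look like this; extract_format is a made-up name, and the scan is deliberately naive (one call per line, no comments, no multi-line literals, and any identifier ending in PRINT would also match).

```c
#include <stddef.h>
#include <string.h>

/* Copy the string literal that follows "PRINT(" on `src_line` into `out`,
   leaving escape sequences exactly as written in the source.
   Returns 1 on success, 0 if there is no such call, the literal is
   unterminated, or it does not fit in `out`. */
static int extract_format(const char *src_line, char *out, size_t out_size)
{
    const char *p = strstr(src_line, "PRINT(");
    size_t n = 0;

    if (p == NULL || out_size == 0)
        return 0;
    p += strlen("PRINT(");
    while (*p == ' ' || *p == '\t')
        ++p;
    if (*p != '"')
        return 0;
    ++p;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0') {   /* keep pairs such as \" or \n together */
            if (n + 1 >= out_size)
                return 0;
            out[n++] = *p++;
        }
        if (n + 1 >= out_size)
            return 0;
        out[n++] = *p++;
    }
    if (*p != '"')
        return 0;                           /* unterminated literal */
    out[n] = '\0';
    return 1;
}
```

Running this over every line of every file, and recording the file name and line number alongside each extracted string, gives the map that FindFormat would consult.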

Adonis answered 28/7, 2013 at 11:32 Comment(0)
