I know it is not supported, but I am wondering if there are any tricks around it. Any tips?
Reflection in general is a means for a program to analyze the structure of some code. This analysis is used to change the effective behavior of the code.
Reflection as analysis is generally very weak; usually it can only provide access to function and field names. This weakness comes from the language implementers essentially not wanting to make the full source code available at runtime, along with the appropriate analysis routines to extract what one wants from the source code.
Another approach is to tackle program analysis head on, by using a strong program analysis tool, e.g., one that parses the source text exactly the way the compiler does. (People often propose abusing the compiler itself for this, but that usually doesn't work; the compiler machinery wants to be a compiler, and it is darn hard to bend it to other purposes.)
What is needed is a tool that:
- Parses language source text
- Builds abstract syntax trees representing every detail of the program. (It is helpful if the ASTs retain comments and other details of the source code layout such as column numbers, literal radix values, etc.)
- Builds symbol tables showing the scope and meaning of every identifier
- Can extract control flows from functions
- Can extract data flow from the code
- Can construct a call graph for the system
- Can determine what each pointer points-to
- Enables the construction of custom analyzers using the above facts
- Can transform the code according to such custom analyses (usually by revising the ASTs that represent the parsed code)
- Can regenerate source text (including layout and comments) from the revised ASTs.
Using such machinery, one implements analysis at whatever level of detail is needed, and then transforms the code to achieve the effect that runtime reflection would accomplish. There are several major benefits:
- The detail level or amount of analysis is a matter of ambition (e.g., it isn't limited to what runtime reflection happens to provide)
- There isn't any runtime overhead to achieve the reflected change in behavior
- The machinery involved can be general and applied across many languages, rather than be limited to what a specific language implementation provides.
- This is compatible with the C/C++ idea that you don't pay for what you don't use. If you don't need reflection, you don't need this machinery. And your language doesn't need to have the intellectual baggage of weak reflection built in.
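To make the idea concrete, here is a sketch of what the output of such a source-to-source transformation might look like for a small struct. The struct, the table layout and print_fields are hypothetical illustrations, not actual DMS output; the point is that the "reflective" information ends up as ordinary compiled tables, so nothing is analyzed at runtime.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Original, hand-written declaration. */
struct point {
    double x;
    double y;
    int id;
};

/* --- Everything below is what a hypothetical generator might emit. --- */
enum field_kind { FIELD_DOUBLE, FIELD_INT };

struct field_desc {
    const char *name;
    size_t offset;          /* byte offset of the field inside the struct */
    enum field_kind kind;
};

static const struct field_desc point_fields[] = {
    { "x",  offsetof(struct point, x),  FIELD_DOUBLE },
    { "y",  offsetof(struct point, y),  FIELD_DOUBLE },
    { "id", offsetof(struct point, id), FIELD_INT },
};
static const size_t point_field_count =
    sizeof point_fields / sizeof point_fields[0];

/* Generic routine driven by the table rather than by the struct type.
   Writes "name=value;" pairs into out; returns chars written or -1. */
int print_fields(const void *obj, const struct field_desc *fields,
                 size_t count, char *out, size_t outlen) {
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        const char *base = (const char *)obj + fields[i].offset;
        int n;
        if (fields[i].kind == FIELD_DOUBLE) {
            double v;
            memcpy(&v, base, sizeof v);
            n = snprintf(out + used, outlen - used, "%s=%g;", fields[i].name, v);
        } else {
            int v;
            memcpy(&v, base, sizeof v);
            n = snprintf(out + used, outlen - used, "%s=%d;", fields[i].name, v);
        }
        if (n < 0 || (size_t)n >= outlen - used) return -1;  /* truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, a struct point holding {1.5, -2, 7} is rendered as x=1.5;y=-2;id=7; without print_fields knowing anything about struct point itself.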
See our DMS Software Reengineering Toolkit for a system that can do all of the above for C, Java, and COBOL, and most of it for C++.
[EDIT August 2017: Now handles C11 and C++2017]
Tips and tricks always exist. Take a look at the Metaresc library https://github.com/alexanderchuranov/Metaresc
It provides an interface for type declarations that also generates metadata for each type. Based on this metadata you can easily serialize/deserialize objects of any complexity. Out of the box it can serialize/deserialize XML, JSON, YAML, XDR, Lisp-like notation and C-init notation.
Here is a simple example:
```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "metaresc.h"

TYPEDEF_STRUCT (point_t,
                double x,
                double y
                );

int main (int argc, char * argv[])
{
  point_t point = {
    .x = M_PI,
    .y = M_E,
  };
  MR_PRINT ((point_t, &point, XML));
  return (EXIT_SUCCESS);
}
```
This program will output:

```
$ ./point
<?xml version="1.0"?>
<point_t>
<x>3.1415926535897931</x>
<y>2.7182818284590451</y>
</point_t>
```
The library works with recent GCC and Clang on Linux, macOS, FreeBSD and Windows. The custom macro language is only one of the options: you can also write declarations as usual and generate type descriptors from DWARF debug info. This moves complexity into the build process, but makes adoption much easier.
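Metaresc's macros are the library's own. As a rough plain-C illustration of the underlying idea (a single declaration that expands into both the type and its metadata-driven code), here is an X-macro sketch. POINT_FIELDS, point_to_xml and the output format are hypothetical and are not Metaresc's API or implementation; only double fields are handled.

```c
#include <stdio.h>
#include <string.h>

/* Single source of truth: the field list of the struct (an X-macro). */
#define POINT_FIELDS(X) \
    X(double, x)        \
    X(double, y)

/* Expansion 1: the struct definition itself. */
#define AS_MEMBER(type, name) type name;
typedef struct { POINT_FIELDS(AS_MEMBER) } point_t;
#undef AS_MEMBER

/* Expansion 2: an XML-like serializer generated from the same list.
   Uses the local variables n, used, out, len and p of point_to_xml. */
#define AS_XML(type, name)                                              \
    n = snprintf(out + used, len - used,                                \
                 "<" #name ">%.17g</" #name ">", p->name);              \
    if (n < 0 || (size_t)n >= len - used) return -1;                    \
    used += (size_t)n;

/* Returns the number of chars written, or -1 if out is too small. */
int point_to_xml(const point_t *p, char *out, size_t len) {
    size_t used = 0;
    int n = snprintf(out, len, "<point_t>");
    if (n < 0 || (size_t)n >= len) return -1;
    used = (size_t)n;
    POINT_FIELDS(AS_XML)
    n = snprintf(out + used, len - used, "</point_t>");
    if (n < 0 || (size_t)n >= len - used) return -1;
    used += (size_t)n;
    return (int)used;
}
#undef AS_XML
```

Adding a field to POINT_FIELDS updates both the struct and the serializer, which is the property a metadata-generating declaration gives you.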
Any tricks around it? Any tips?
The compiler can probably generate a 'debug symbol file' on request, which a debugger uses to help debug the code. The linker may also generate a 'map file'.
A trick/tip might be to generate and then read these files.
I know of the following options, but all come at a cost and have a lot of limitations:
- Use libdl (#include <dlfcn.h>)
- Call a tool like objdump or nm
- Parse the object files yourself (using a corresponding library)
- Involve a parser and generate the necessary information at compile time.
- "Abuse" the linker to generate symbol arrays.
I'll use unit test frameworks as examples further down, because automatic test discovery is a typical case where reflection comes in very handy, and it's something most unit test frameworks for C fall short of.
Using libdl (#include <dlfcn.h>) (POSIX)
If you're on a POSIX environment, a little bit of reflection can be done using libdl. Plugins are developed that way.
Use #include <dlfcn.h> in your source code and link with -ldl.
Then you have access to the functions dlopen(), dlerror(), dlsym() and dlclose(), with which you can load, access and run shared objects at runtime. However, they do not give you easy access to the symbol table.
Another disadvantage of this approach is that you basically restrict reflection to objects loaded as a dynamic library (a shared object loaded at runtime via dlopen()).
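A minimal sketch of looking up a symbol by name at runtime. lookup_symbol is a hypothetical helper; it uses dlopen(NULL, ...) to get a handle for the running program, so the executable and the libraries it was started with are searched. Assumptions: a glibc-like system; functions defined in your own executable are only visible if it is linked with -rdynamic (GCC/Clang), and glibc versions before 2.34 also need -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Looks up a symbol by name in the running program and the libraries it
   was linked against. Returns NULL if the symbol is not found. */
void *lookup_symbol(const char *name) {
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */
    if (!self) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    dlerror();                              /* clear any stale error */
    void *sym = dlsym(self, name);
    if (dlerror() != NULL)                  /* lookup failed */
        sym = NULL;
    /* The main program and its startup libraries are never unloaded,
       so the returned address stays valid after dlclose. */
    dlclose(self);
    return sym;
}
```

The result has to be cast to the right function pointer type before calling it, e.g. (size_t (*)(const char *))lookup_symbol("strlen"); nothing checks that the type is correct.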
Running nm or objdump
You could run nm or objdump to show the symbol table and parse the output.
For me, nm -P --defined-only -g xyz.o gives good results, and parsing the output is trivial.
You'd be interested in the first word of each line only, which is the symbol name, and maybe the second one, which is the section type.
If you don't know the object name statically, i.e. the object is actually a shared object, then at least on Linux you might want to skip symbol names starting with '_'.
objdump, nm or similar tools are also often available outside POSIX environments.
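A sketch of running nm and parsing its POSIX-format (-P) output, where each symbol line starts with "name type". parse_nm_line and list_global_code_symbols are hypothetical helpers; the command assumes GNU nm (for --defined-only), and the file name is passed to the shell unescaped, so it must be trusted.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose */
#include <stdio.h>
#include <string.h>

/* One parsed line of `nm -P` output: "name type [value [size]]". */
struct nm_symbol {
    char name[256];
    char type;      /* e.g. 'T' = global code, 'D' = initialized data */
};

/* Returns 1 if line is a symbol line, 0 otherwise (e.g. "xyz.o:"). */
int parse_nm_line(const char *line, struct nm_symbol *out) {
    char type[8];
    if (sscanf(line, "%255s %7s", out->name, type) != 2) return 0;
    if (type[1] != '\0') return 0;          /* type is a single letter */
    out->type = type[0];
    return 1;
}

/* Runs nm on objfile and calls cb for each defined global code ('T')
   symbol, skipping names that start with '_'. Returns the number of
   symbols reported, or -1 on error. */
int list_global_code_symbols(const char *objfile,
                             void (*cb)(const char *name, void *ctx),
                             void *ctx) {
    char cmd[512];
    int n = snprintf(cmd, sizeof cmd, "nm -P --defined-only -g %s", objfile);
    if (n < 0 || (size_t)n >= sizeof cmd) return -1;
    FILE *p = popen(cmd, "r");
    if (!p) return -1;
    char line[512];
    struct nm_symbol sym;
    int count = 0;
    while (fgets(line, sizeof line, p)) {
        if (parse_nm_line(line, &sym) && sym.type == 'T' && sym.name[0] != '_') {
            cb(sym.name, ctx);
            count++;
        }
    }
    return pclose(p) == 0 ? count : -1;
}
```

Feeding these names into dlsym() is one way to get naming-convention test discovery.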
Parsing the object files yourself
You could parse the object files yourself. You probably don't want to implement that from scratch, but rather use an existing library. This is how nm, objdump and even libdl are implemented. You could peek at the source code of nm, objdump and libdl and the libraries they use in order to find out how they do what they do.
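To show what such parsing involves at the lowest level, here is a sketch that reads a 64-bit ELF file and walks its .symtab using only the structure definitions from <elf.h>, instead of a library such as libelf or libbfd. Assumptions: Linux or another ELF platform providing <elf.h>, and a 64-bit, unstripped, well-formed file. elf64_for_each_function and elf64_has_function are hypothetical helpers.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *slurp(const char *path, size_t *len) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned char *buf = NULL;
    size_t cap = 0, n = 0, r;
    do {
        if (n == cap) {
            cap = cap ? cap * 2 : 1 << 16;
            unsigned char *tmp = realloc(buf, cap);
            if (!tmp) { free(buf); fclose(f); return NULL; }
            buf = tmp;
        }
        r = fread(buf + n, 1, cap - n, f);
        n += r;
    } while (r > 0);
    fclose(f);
    *len = n;
    return buf;
}

/* Calls cb(name, ctx) for every STT_FUNC entry in the SHT_SYMTAB sections
   of a 64-bit ELF file. Returns the number of functions, or -1 on error. */
int elf64_for_each_function(const char *path,
                            void (*cb)(const char *name, void *ctx), void *ctx) {
    size_t len;
    unsigned char *img = slurp(path, &len);
    if (!img) return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_shoff > len ||
        (size_t)eh->e_shnum * sizeof(Elf64_Shdr) > len - eh->e_shoff) {
        free(img);
        return -1;
    }
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    int count = 0;
    for (size_t i = 0; i < eh->e_shnum; i++) {
        const Elf64_Shdr *symsh = &sh[i];
        if (symsh->sh_type != SHT_SYMTAB || symsh->sh_link >= eh->e_shnum) continue;
        const Elf64_Shdr *strsh = &sh[symsh->sh_link];  /* linked string table */
        if (symsh->sh_offset > len || symsh->sh_size > len - symsh->sh_offset ||
            strsh->sh_offset > len || strsh->sh_size > len - strsh->sh_offset) continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(img + symsh->sh_offset);
        const char *names = (const char *)(img + strsh->sh_offset);
        size_t nsyms = symsh->sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsyms; j++) {
            if (ELF64_ST_TYPE(syms[j].st_info) != STT_FUNC) continue;
            if (syms[j].st_name >= strsh->sh_size) continue;
            if (cb) cb(names + syms[j].st_name, ctx);
            count++;
        }
    }
    free(img);
    return count;
}

struct name_query { const char *wanted; int found; };

static void match_name(const char *name, void *ctx) {
    struct name_query *q = ctx;
    if (strcmp(name, q->wanted) == 0) q->found = 1;
}

/* Returns 1 if the file defines a function called name, 0 if not, -1 on error. */
int elf64_has_function(const char *path, const char *name) {
    struct name_query q = { name, 0 };
    if (elf64_for_each_function(path, match_name, &q) < 0) return -1;
    return q.found;
}
```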
Involving a Parser
You could write a parser and code generator which generates the necessary reflective information at compile time and stores it in the object file. Then you have a lot of freedom and could even implement primitive forms of annotations. That's what some unit test frameworks like AceUnit do.
I found that writing a parser which covers straightforward C syntax is fairly trivial. Writing a parser which really understands C and can deal with all cases is NOT trivial. So this approach has limitations, which depend on how exotic the C syntax is that you want to reflect upon.
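As an illustration of how far "straightforward C syntax" gets you, here is a deliberately naive scanner that finds test functions by name in source text and emits a registry table to compile in afterwards. It only recognizes definitions spelled like void test_xxx(void) { and ignores comments, macros and preprocessor conditionals, which is exactly where a real parser becomes necessary. find_test_functions and emit_registry are hypothetical helpers, not AceUnit's implementation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Naive scan for definitions of the form "void test_<name>(void) {".
   Stores up to max names (each < 64 chars); returns how many were found. */
int find_test_functions(const char *src, char names[][64], int max) {
    int count = 0;
    const char *p = src;
    while (count < max && (p = strstr(p, "void ")) != NULL) {
        /* "void" must start a token, not end one (e.g. "myvoid "). */
        if (p != src && (isalnum((unsigned char)p[-1]) || p[-1] == '_')) {
            p += 5;
            continue;
        }
        const char *id = p + 5;
        while (*id == ' ') id++;
        size_t n = 0;
        while (isalnum((unsigned char)id[n]) || id[n] == '_') n++;
        const char *after = id + n;
        while (*after == ' ') after++;
        if (n > 0 && n < 64 && strncmp(id, "test_", 5) == 0 &&
            strncmp(after, "(void)", 6) == 0) {
            const char *body = after + 6;
            while (isspace((unsigned char)*body)) body++;
            if (*body == '{') {               /* a definition, not a prototype */
                memcpy(names[count], id, n);
                names[count][n] = '\0';
                count++;
            }
        }
        p = after;
    }
    return count;
}

/* Prints a C registry table for the names found. */
void emit_registry(FILE *out, char names[][64], int count) {
    if (count == 0) return;   /* an empty initializer list is not valid C */
    for (int i = 0; i < count; i++)
        fprintf(out, "void %s(void);\n", names[i]);
    fprintf(out, "const struct { const char *name; void (*fn)(void); } tests[] = {\n");
    for (int i = 0; i < count; i++)
        fprintf(out, "    { \"%s\", %s },\n", names[i], names[i]);
    fprintf(out, "};\n");
}
```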
"Abusing" the linker to generate symbol arrays
You could put references to symbols which you want to reflect upon in a special section and use a linker configuration to emit the section boundaries so you can access them in C.
I've described how this works in "Dependency injection in C - better way than linker-defined arrays?".
But beware: this depends on a lot of things and is not very portable. I have only tried this with GCC/ld, and I know it doesn't work with all compilers/linkers. Also, it's almost guaranteed that dead code elimination will not detect how you call this stuff, so if you use dead code elimination, you will have to add all the reflected symbols as entry points.
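A minimal GCC/GNU ld sketch of the technique: each registration macro drops a {name, function} record into a custom section, and the __start_/__stop_ symbols that GNU ld synthesizes for sections whose names are valid C identifiers delimit the resulting array. REGISTER_TEST, refl_tests and run_registered_tests are hypothetical names. As noted above, this is not portable, and with --gc-sections or similar dead code elimination you may need extra care to keep the section.

```c
#include <stddef.h>
#include <stdio.h>

struct test_entry {
    const char *name;
    void (*fn)(void);
};

/* Declares a test function and places a {name, fn} record for it in the
   "refl_tests" section; the function body follows the macro. */
#define REGISTER_TEST(f)                                          \
    static void f(void);                                          \
    static struct test_entry refl_entry_##f                       \
        __attribute__((used, section("refl_tests"))) = { #f, f }; \
    static void f(void)

/* Provided by GNU ld because "refl_tests" is a valid C identifier. */
extern struct test_entry __start_refl_tests[];
extern struct test_entry __stop_refl_tests[];

static int calls;

REGISTER_TEST(test_one) { calls += 1; }
REGISTER_TEST(test_two) { calls += 10; }

/* Walks the linker-built array and runs every registered test. */
size_t run_registered_tests(void) {
    size_t n = 0;
    for (struct test_entry *t = __start_refl_tests; t < __stop_refl_tests; t++) {
        printf("running %s\n", t->name);
        t->fn();
        n++;
    }
    return n;
}
```

No central list of tests exists in the source; the linker assembles it from every translation unit that uses REGISTER_TEST.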
Pitfalls
For some of these mechanisms, dead code elimination can be a problem, in particular when you "abuse" the linker to generate symbol arrays. It can be worked around by declaring the reflected symbols as entry points to the linker, which, depending on the number of symbols, may be neither nice nor convenient.
Conclusion
Combining nm and libdl can actually give quite good results. The combination can be almost as powerful as the level of reflection used by JUnit 3.x in Java. The level of reflection given is sufficient to implement a JUnit 3.x-style unit test framework for C, including test-case discovery by naming convention.
Involving a parser is more work and limited to objects that you compile yourself, but gives you most power and freedom. The level of reflection given can be sufficient to implement a JUnit 4.x-style unit test framework for C, including test-case discovery by annotations. AceUnit is a unit test framework for C that does exactly this.
Combining parsing and the linker to generate symbol arrays can give very nice results, provided your environment is sufficiently under your control that you can be sure this use of the linker works for you.
And of course you can combine all approaches to stitch together the bits and pieces until they fit your needs.
Based on the responses to How can I add reflection to a C++ application? (Stack Overflow) and the fact that C++ is considered a "superset" of C, I would say you're out of luck.
There's also a nice long answer about why C++ doesn't have reflection (Stack Overflow).
I needed reflection in a bunch of structs in a C++ project.
I created an XML file with the description of all those structs; fortunately the field types were primitive types.
I used a template (not a C++ template) to auto-generate a class for each struct, along with setter/getter methods.
In each class I used a map to associate string names with class members (pointers to members).
I didn't regret using reflection because it opened new ways to design my core functionality that I couldn't even imagine without reflection.
(BTW, it was an external report generator for a program that uses a raw database)
So, I used code generation, function pointers and maps to simulate reflection.
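The same idea can be sketched in plain C: a generator emits one getter/setter pair per field, plus a name-sorted table of function pointers that acts as the map. struct employee, its fields and all helper names are hypothetical, and values travel as double only to keep the sketch short; the project described above generated C++ classes instead.

```c
#include <stdlib.h>
#include <string.h>

struct employee {          /* hypothetical example struct */
    int id;
    double salary;
};

/* Accessors a generator would emit, one pair per field. */
static double get_id(const struct employee *e) { return e->id; }
static void   set_id(struct employee *e, double v) { e->id = (int)v; }
static double get_salary(const struct employee *e) { return e->salary; }
static void   set_salary(struct employee *e, double v) { e->salary = v; }

struct accessor {
    const char *name;
    double (*get)(const struct employee *);
    void (*set)(struct employee *, double);
};

/* The "map": kept sorted by name so bsearch can be used for lookup. */
static const struct accessor employee_accessors[] = {
    { "id",     get_id,     set_id },
    { "salary", get_salary, set_salary },
};

static int cmp_accessor(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const struct accessor *)elem)->name);
}

/* Returns the accessor pair for a field name, or NULL if unknown. */
const struct accessor *find_accessor(const char *name) {
    return bsearch(name, employee_accessors,
                   sizeof employee_accessors / sizeof employee_accessors[0],
                   sizeof employee_accessors[0], cmp_accessor);
}

/* Sets a field by name; returns 0 on success, -1 if the field is unknown. */
int set_field(struct employee *e, const char *name, double value) {
    const struct accessor *a = find_accessor(name);
    if (!a) return -1;
    a->set(e, value);
    return 0;
}
```

A report generator can then drive everything from strings, e.g. set_field(&e, "salary", 1234.5), without compiled-in knowledge of the struct layout.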
You would need to implement it yourself from the ground up. In plain C, no runtime information whatsoever is kept about structures and composite types. Metadata simply does not exist in the standard.
- Implementing reflection for C would be much simpler... because C is a simple language.
- There are some basic options for analyzing a program, like detecting whether a function exists by calling dlopen/dlsym, depending on your needs.
- There are tools for creating code that can modify/extend itself using tcc.
- You may use the above tool to create your own code analyzers.
For similar reasons to the author of the question, I have been working on a C-type-reflection-API along with a C reflection graph database format and a clang plug-in that writes reflection metadata.
The intent is to use the C reflection API for writing serialization and deserialization routines, such as mappers for ASN.1, function argument printers, function proxies, fuzzers, etc. Clang and GCC both have plugin APIs that allow access to the AST but there currently is no standard graph format for C reflection metadata.
The proposed C reflection API is called Crefl:
The Crefl API provides runtime access to reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field (member), array, constant, variable.
- The Crefl reflection graph database format for portable reflection metadata.
- The Crefl clang plug-in outputs C reflection metadata used by the library.
- The Crefl API provides task-oriented query access to C reflection metadata
A C reflection API provides access to runtime reflection metadata for C structure declarations with support for arbitrarily nested combinations of: intrinsic, set, enum, struct, union, field, array, constant, variable. The Crefl C reflection data model is essentially a transcription of the C data types in ISO/IEC 9899:9999.
- C intrinsic data types:
  - integer types
  - floating-point types
  - complex number types
  - boolean type
- nested struct, union, field, and bitfield
- arrays and pointers
- typedef type aliases
- enum and enum constants
- functions and function parameters
- const, volatile and restrict qualifiers
- GNU C style attributes (using __attribute__).
The library is still a work in progress. The hope is to find others who are interested in reflection support in C.
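To illustrate what walking such a reflection graph involves (this is not the Crefl API or data format), here is a hypothetical descriptor graph in plain C, where struct and array nodes point to their member and element types, together with a walker that lists every leaf field path. All type and function names are made up for the sketch, and arrays are treated as leaves.

```c
#include <stdio.h>
#include <string.h>

enum type_kind { KIND_INTRINSIC, KIND_STRUCT, KIND_ARRAY };

struct type_desc;

struct member_desc {
    const char *name;
    const struct type_desc *type;
};

struct type_desc {
    enum type_kind kind;
    const char *name;                   /* e.g. "double", "point" */
    const struct member_desc *members;  /* KIND_STRUCT only */
    size_t member_count;
    const struct type_desc *element;    /* KIND_ARRAY only */
    size_t length;                      /* KIND_ARRAY only */
};

/* Appends "path:type\n" for every leaf reachable from t. */
static int walk(const struct type_desc *t, const char *prefix,
                char *out, size_t len, size_t *used) {
    if (t->kind == KIND_STRUCT) {
        for (size_t i = 0; i < t->member_count; i++) {
            char path[128];
            int n = snprintf(path, sizeof path, "%s%s%s", prefix,
                             *prefix ? "." : "", t->members[i].name);
            if (n < 0 || (size_t)n >= sizeof path) return -1;
            if (walk(t->members[i].type, path, out, len, used) < 0) return -1;
        }
        return 0;
    }
    int n = (t->kind == KIND_ARRAY)
        ? snprintf(out + *used, len - *used, "%s:%s[%zu]\n",
                   prefix, t->element->name, t->length)
        : snprintf(out + *used, len - *used, "%s:%s\n", prefix, t->name);
    if (n < 0 || (size_t)n >= len - *used) return -1;
    *used += (size_t)n;
    return 0;
}

/* Returns the number of chars written to out, or -1 if it does not fit. */
int describe_leaves(const struct type_desc *t, char *out, size_t len) {
    size_t used = 0;
    if (len == 0) return -1;
    out[0] = '\0';
    return walk(t, "", out, len, &used) < 0 ? -1 : (int)used;
}

/* Hand-written graph for: struct point { double x, y; };
   struct shape { struct point origin; int tags[4]; }; */
static const struct type_desc t_double = { .kind = KIND_INTRINSIC, .name = "double" };
static const struct type_desc t_int = { .kind = KIND_INTRINSIC, .name = "int" };
static const struct member_desc point_members[] = { { "x", &t_double }, { "y", &t_double } };
static const struct type_desc t_point = { .kind = KIND_STRUCT, .name = "point",
                                          .members = point_members, .member_count = 2 };
static const struct type_desc t_tags = { .kind = KIND_ARRAY, .element = &t_int, .length = 4 };
static const struct member_desc shape_members[] = { { "origin", &t_point }, { "tags", &t_tags } };
static const struct type_desc t_shape = { .kind = KIND_STRUCT, .name = "shape",
                                          .members = shape_members, .member_count = 2 };
```

For t_shape the walker produces origin.x:double, origin.y:double and tags:int[4], one per line; a clang plug-in would generate such a graph instead of it being written by hand.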
Parsers and debug symbols are great ideas. However, the gotcha is that C does not really have arrays. Just pointers to stuff.
For example, there is no way by reading the source code to know whether a char * points to a character, a string, or a fixed array of bytes based on some "nearby" length field. This is a problem for human readers let alone any automated tool.
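A small illustration of the point: the same char * type covers a single character, a string and a length-delimited buffer, and an array parameter silently becomes a pointer, so neither the declaration nor sizeof tells a tool which one is meant. struct packet is a hypothetical example of a "nearby" length field.

```c
#include <stddef.h>
#include <string.h>

/* Three very different things, all reachable through char *. */
static char one_char = 'x';
static char *p_char = &one_char;                  /* a single character */
static char *p_string = "hello";                  /* NUL-terminated string */
static char buffer[16] = { 'a', 'b', 0, 'c' };    /* 16 raw bytes, may contain 0 */

/* A length field that only a human (or a convention) connects to data. */
struct packet {
    size_t len;     /* number of valid bytes in data */
    char *data;     /* NOT a string: may contain '\0' bytes */
};
static struct packet pk = { 4, buffer };

/* A parameter declared as an array is really a pointer, so inside the
   function the length 16 is gone: this returns sizeof(char *). */
size_t param_size(char buf[16]) {
    return sizeof buf;
}
```

Here strlen(pk.data) is 2 while pk.len is 4, and a tool looking only at the type char * has no way to know that len governs data.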
Why not use a modern language, like Java or .NET? It can be faster than C as well.