Is it safe to use the address of a static local variable within a function template as a type identifier?

Asked 26/1, 2017 at 6:41 Answered 26/1, 2017 at 7:50

Solved c++ templates language-lawyer rtti

I wish to create an alternative to std::type_index that does not require RTTI:

template <typename T>
int* type_id() {
    static int x;
    return &x;
}

Note that the address of the local variable x is used as the type ID, not the value of x itself. Also, I don't intend to use a bare pointer in reality. I've just stripped out everything not relevant to my question. See my actual type_index implementation here.

Is this approach sound, and if so, why? If not, why not? I feel like I am on shaky ground here, so I am interested in the precise reasons why my approach will or will not work.

A typical use case might be to register routines at run-time to handle objects of different types through a single interface:

class processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        handlers[type_id<T>()] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) {
        auto it = handlers.find(type_id<T>());
        if (it != handlers.end()) {
            it->second(&t);
        } else {
            throw std::runtime_error("handler not registered");
        }
    }

private:
    std::map<int*, std::function<void (void const*)>> handlers;
};

This class might be used like so:

processor p;

p.register_handler<int>([](int const& i) {
    std::cout << "int: " << i << "\n";
});
p.register_handler<float>([](float const& f) {
    std::cout << "float: " << f << "\n";
});

try {
    p.process(42);
    p.process(3.14f);
    p.process(true);
} catch (std::runtime_error& ex) {
    std::cout << "error: " << ex.what() << "\n";
}

Conclusion

Thanks to everyone for your help. I have accepted the answer from @StoryTeller as he has outlined why the solution should be valid according to the rules of C++. However, @SergeBallesta and a number of others in the comments have pointed out that MSVC performs optimizations which come uncomfortably close to breaking this approach. If a more robust approach is needed, then a solution using std::atomic may be preferable, as suggested by @galinette:

std::atomic_size_t type_id_counter{0};

template <typename T>
std::size_t type_id() {
    static std::size_t const x = type_id_counter++;
    return x;
}

If anyone has further thoughts or information, I am still eager to hear it!

Timm answered 26/1, 2017 at 6:41 Comment(17)

@A.S.H I've added a typical use case. Obviously I've simplified it, but I have used this pattern in actual production code. – Timm 26/1, 2017 at 7:20

Thanks for sharing. Please take a look at my comment to the answer of @galinette. since the handlers map is mainly used as an indirection tool for the execution, it is expected to be much faster if replaced by a vector, and using some auto-incremented enum for the types instead of some random pointers. – Simons 26/1, 2017 at 8:28

Side note : use an unordered_map instead of a map, the performance may vary a lot and you don't use any ordering – Copper 26/1, 2017 at 8:32

@Simons It would undoubtedly be faster, though you could end up with a bunch of unused entries in the vector if the processor wasn't the only thing using type_id. An unordered_map or a sorted vector would probably be faster than a map. In reality, the "handlers" would be doing far more work than they are in the example, so the lookup time for the routines would probably be negligible. – Timm 26/1, 2017 at 8:33

If you really want to use pointer as type id, at least make it more clear in the code : do not return a pointer, use uintptr_t, which is the unsigned integer type which has the same size as a pointer type, or typedef it. And comment! – Copper 26/1, 2017 at 8:36

@Copper I admit that naming the local variable id is misleading, since the address is the actual ID, but I thought the question would be clear from the title. Of course I would comment this code in real life. And I chose not to cast to uintptr_t or void* because I wanted to keep things as simple as possible. I'd rather not risk invoking additional C++ voodoo if it isn't necessary. – Timm 26/1, 2017 at 8:42

I've improved my answer with a (possibly hacky) way to secure that pattern against optimizing compilers... – Sciomachy 26/1, 2017 at 9:43

have you seen this stackoverflow question? https://mcmap.net/q/276988/-compile-time-constant-id – Dollar 26/1, 2017 at 10:59

This seems to be more suitable for code review. – Lowdown 26/1, 2017 at 12:21

You can even use function pointer of an inlined template function as a type id. Just let the function return a pointer to itself. No need for the static variable. – Adenectomy 26/1, 2017 at 12:57

@Adenectomy That's a good idea, though it would have to be a different function. A function returning a pointer to itself will have an infinitely recursive return type! – Timm 26/1, 2017 at 13:50

The type could just be a void *. – Barela 26/1, 2017 at 13:55

@Barela Function pointers cannot be cast to void* (isocpp.org/wiki/faq/…). – Timm 26/1, 2017 at 13:56

@JosephThomson You are right ... but afaik you can safely cast every function pointer to any other function pointer type. https://mcmap.net/q/408639/-cast-a-function-pointer – Adenectomy 26/1, 2017 at 14:4

@Adenectomy But it does say that the result of such a conversion is unspecified. While I doubt that anything unexpected would happen in practice, doesn't this mean that technically it might (e.g. you might get two different pointers from two separate reinterpret_casts of the same pointer). – Timm 26/1, 2017 at 14:28

@Adenectomy Using just a function pointer would be far more risky on MSVC. It's very likely that all function template instantiations will be folded into one with the /opt:icf linker option. – Helotism 26/1, 2017 at 14:40

@Helotism I was just about to ask about that. Just gave it a test on MSVC, and it does indeed fold them all into one. Scratch that idea. – Timm 26/1, 2017 at 14:53

Yes, it will be correct to an extent. Template functions are implicitly inline, and static objects in inline functions are shared across all translation units.

So, in every translation unit, you will get the address of the same static local variable for the call to type_id<Type>(). You are protected here from ODR violations by the standard.

Therefore, the address of the local static can be used as a sort of home-brewed run-time type identifier.
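The property described above can be checked with a minimal sketch. It reuses the type_id function template from the question; the helper check_type_ids is hypothetical and only there for illustration.

```cpp
#include <cassert>

template <typename T>
int* type_id() {
    static int x;   // one object per instantiation; only its address matters
    return &x;
}

// Hypothetical self-check, not part of the original code.
inline void check_type_ids() {
    assert(type_id<int>() == type_id<int>());       // same type, same address
    assert(type_id<int>() != type_id<float>());     // different types, different addresses
    assert(type_id<int>() != type_id<unsigned>());
}
```

This only demonstrates the behaviour on a conforming toolchain; it says nothing about linker options such as identical COMDAT folding.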

Queston answered 26/1, 2017 at 7:27 Comment(6)

The problem is not whether a type will always be represented by same address, but whether 2 different types could share same address... – Sciomachy 26/1, 2017 at 7:52

@SergeBallesta - type_id<int>() and type_id<float>() are completely different functions. I see no leeway in the standard that allows them to share the same static locals. – Queston 26/1, 2017 at 7:53

@SergeBallesta - In fact §14.8 ¶2 – Queston 26/1, 2017 at 7:59

I've read 14.8.2 again. It does declare that template functions all have their own copy of the static variable, no problem with that. My question is whether an optimizing compiler can merge variables that share the same value throughout the program. – Sciomachy 26/1, 2017 at 9:8

@SergeBallesta Not if their addresses are observable ([intro.object]/8). That said, I've seen MSVC with the /opt:icf linker option merge some COMDATs in non-standard ways (the behaviour is at least documented). I'm not sure if OP's solution would be affected, but the one I gave here shouldn't be, as the variable id is not const. – Helotism 26/1, 2017 at 9:34

@SergeBallesta After doing a couple of tests, it looks like OP's code will be safe on MSVC as long as the local static is not const. MSVC's non-standard behaviour was discussed here. That question uses MSVC 2013. In my tests, MSVC 2015 U3 doesn't have the problem (at least in a simple test), but the latest 2017 RC does. Non-const works fine across versions. – Helotism 26/1, 2017 at 13:29

This is coherent with the standard because C++ uses templates, not generics with type erasure as in Java, so each declared type will have its own function implementation containing a static variable. All those variables are different and as such should have different addresses.

The problem is that their value is never used and, worse, never changed. I remember that optimizers can merge string constants. As optimizers do their best to be far more clever than any human programmer, I would be afraid that an overzealous optimizing compiler might discover that, as those variable values are never changed, they will all keep a 0 value, so why not merge them all to save memory?

I know that because of the as if rule, the compiler is free to do what it wants provided the observable results are the same. And I am not sure that the addresses of static variables that will always share the same value shall be different or not. Maybe someone could confirm what part of the standard actually cares for it?

Current compilers still compile translation units separately, so they cannot be sure whether another translation unit will use or change the value. So my opinion is that the optimizer will not have enough information to decide to merge the variables, and your pattern is safe.

But as I really do not think that the standard protects it, I cannot say whether future versions of C++ builders (compiler + linker) will invent a global optimizing phase that actively searches for unchanged variables that could be merged. More or less the same as they actively search for UB to optimize out parts of code... Only common patterns, where disallowing them would break too large a code base, are protected from it, and I do not think that yours is common enough.

A rather hacky way to prevent an optimizing phase from merging variables that have the same value would just be to give each one a different value:

int unique_val() {
    static int cur = 0;  // normally useless but more readable
    return cur++;
}
template <typename T>
void * type_id() {
    static int x = unique_val();
    return &x;
}

OK, this does not even try to be thread safe, but that is not a problem here: the values themselves will never be used. But you now have different variables with static storage duration (per 14.8.2 of the standard, as said by @StoryTeller) that, except in race conditions, have different values. As they are odr-used they must have different addresses, and you should be protected against future improvements of optimizing compilers...

Note: I think that as the value will not be used, returning a void * sounds cleaner...
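If the unsynchronized counter is a concern, the same trick can be sketched with std::atomic. This is only a variation under the assumption that C++11 is available; the names are kept from the snippet above, and the values are still never read back.

```cpp
#include <atomic>
#include <cassert>

// Hands out a fresh value on every call; the atomic increment keeps
// concurrent first calls from different instantiations race-free.
inline int unique_val() {
    static std::atomic<int> cur{0};
    return cur.fetch_add(1);
}

template <typename T>
void* type_id() {
    static int x = unique_val();  // initialized once per T; only &x is used
    return &x;
}
```

Since C++11 the initialization of each local static is itself thread safe, so the atomic only has to protect the shared counter.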

Just an addition stolen from a comment from @bogdan. MSVC is known to have very aggressive optimization with the /OPT:ICF flag. The discussion suggests that it may not be conformant, and that it only applies to variables marked as const. But it reinforces my opinion that even if OP's code seems conformant, I would not dare to use it without additional precautions in production code.

Sciomachy answered 26/1, 2017 at 7:49 Comment(21)

What if its declared static volatile? Will that keep the optimizer at bay? – Tatia 26/1, 2017 at 9:44

why not declaring a static point of type T? in that case the variables will be of a different types, so it should prevent clever optimization – Dollar 26/1, 2017 at 9:53

@AlessandroTeruzzi A constant value of 0 for an int, long, char and their unsigned equivalent could all share same address... – Sciomachy 26/1, 2017 at 9:57

You could use a type like template <typename T> struct dummy. Presumably the compiler wouldn't be able to optimize those instances away. Then return a void*, assuming there isn't any problem with using ==, < and hash on void* pointers that point to different types of objects. – Timm 26/1, 2017 at 10:15

@JosephThomson: AFAIK, < on pointers to objects not being members of a same array, or bytes not belonging to same object is UB... Why do you need that? But it could be a different question... – Sciomachy 26/1, 2017 at 10:34

I meant less not <. My question is whether it is okay to compare and hash void* in this way. And whether this would stop the optimizer. – Timm 26/1, 2017 at 10:47
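For what it's worth, comparing and hashing through the standard function objects can be sketched like this. The names type_key and key_of are hypothetical, not taken from the linked type_index implementation. std::less is specified to give a strict total order over pointers even when they point to unrelated objects, and std::hash is specialized for all pointer types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical wrapper around the address used as the identifier.
class type_key {
public:
    explicit type_key(void const* p) : ptr_(p) {}

    friend bool operator==(type_key a, type_key b) { return a.ptr_ == b.ptr_; }
    friend bool operator!=(type_key a, type_key b) { return !(a == b); }

    // Ordering goes through std::less rather than the built-in operator<.
    friend bool operator<(type_key a, type_key b) {
        return std::less<void const*>()(a.ptr_, b.ptr_);
    }

    std::size_t hash() const { return std::hash<void const*>()(ptr_); }

private:
    void const* ptr_;
};

template <typename T>
type_key key_of() {
    static char tag;  // non-const on purpose; only its address is used
    return type_key(&tag);
}
```

Whether this stops a non-conforming optimizer is a separate question that the sketch does not answer.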

uintptr_t x = (uintptr_t)&x; merge THAT, you smarty pants compiler! – Par 26/1, 2017 at 11:0

"And I am not sure that the addresses of static variables that will always share the same value shall be different or not." – "Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses." (§1.8/6 in N4140). No point in wasting time to protect against potential future insanity IMO. – Mcclary 26/1, 2017 at 12:37

FWIW the comparison with generics is not very relevant: In C# (and other .NET languages), a static member of a generic type will have an instance per reification, just like with templates. In Java, OTOH, you'll have only one instance of a static field in a generic class, due to type erasure (you can't even specify the type parameters when accessing a static field of a generic class). So it very much depends on the way the language implements generics, not on the mechanism itself. – Lutero 26/1, 2017 at 12:54

@ArneVogel: The note 4) in same paragraph says Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference. – Sciomachy 26/1, 2017 at 13:42

@LucasTrzesniewski: I was thinking about Java generics. I have made it more explicit. – Sciomachy 26/1, 2017 at 13:44

@Serge Ballesta: If my well-formed program cannot observe the difference, why would I care? The hack you propose may actually introduce UB in a multi-threaded program, as you yourself point out, which is IMO a worse can of worms. Obviously an atomic would solve that issue, but then again, I'm not convinced you're trying to solve a problem that currently exists… By the way, implementations are allowed to merge string literals… "string constants" is an ambiguous term here. They are certainly not allowed to merge objects of type const char[]. – Mcclary 26/1, 2017 at 14:7

@n.m I think casting to uintptr_t may not be guaranteed to work, since pointers may have multiple integer representations (en.cppreference.com/w/cpp/language/reinterpret_cast). – Timm 26/1, 2017 at 14:59

@SergeBallesta: Well of course the as-if rule applies - but if the program is going to compare the address of the two objects and behave differently if they are the same, then the program can observe the difference, and as-if does not apply. That's just saying that if you don't take the address of the objects, they can overlap (or if you do, that you don't compare these addresses). – Unreadable 26/1, 2017 at 15:21

@JosephThomson sorry it's a joke. different objects are guaranteed to have distinct addresses (unless subobjects are involved, which are not present here). – Par 26/1, 2017 at 16:9

@JosephThomson : in your link from cppreference, I read : "2) Any pointer can be converted to any integral type large enough to hold the value of the pointer (e.g. to std::uintptr_t)" – Copper 27/1, 2017 at 13:43

@Copper Yes, but it doesn't specify what the result of the conversion will be. It says that a pointer may have multiple integer representations: "the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations". This would mean that the uintptr_t may not necessarily be a unique identifier. – Timm 27/1, 2017 at 15:7

@JosephThomson : you misread the whole sentence. "A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value, otherwise the resulting pointer cannot be dereferenced safely (the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations)" Means if the integer is large enough, the round-trip is guaranteed. If the integer is not large enough, it is not guaranteed. uintptr_t is there exactly for that purpose since size is guaranteed if this type is defined. – Copper 27/1, 2017 at 15:29

@Copper I don't think so. It specifically talks about the round trip conversion from integer to pointer to integer not being guaranteed, since pointers may have multiple integer representations. I think this is an interpretation of the standard where it says, "The mapping function [from pointer to integer] is implementation-defined." – Timm 28/1, 2017 at 3:52

"A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value". Again I am 100% sure of this, and this is the very reason for uintptr_t existence. If a compiler cannot guarantee the pointer<>integer conversion, it must not declare this type. This is why the type is optional. – Copper 28/1, 2017 at 9:30

@Copper All that appears to be guaranteed is that casting from pointer to integer (of sufficient size) and back to pointer again will give you the original pointer. The standard doesn't specify what the result of the pointer to integer conversion will be though, so it is technically not safe to assume that casting a pointer to an integer will always give you the same value. The pointer may have multiple integer representations. Whether or not this happens in practice is another question. – Timm 29/1, 2017 at 3:33
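The one guarantee both sides agree on, pointer to integer and back to the same pointer type, can be sketched as follows. The helper round_trip_ok is hypothetical, and the sketch assumes the implementation provides std::uintptr_t, which is an optional type.

```cpp
#include <cassert>
#include <cstdint>

template <typename T>
int* type_id() {
    static int x;
    return &x;
}

// Converts the identifier to an integer and back. Only the round trip
// pointer -> integer -> pointer is guaranteed to restore the original
// value; the integer itself is implementation-defined.
template <typename T>
bool round_trip_ok() {
    int* p = type_id<T>();
    std::uintptr_t n = reinterpret_cast<std::uintptr_t>(p);
    int* q = reinterpret_cast<int*>(n);
    return p == q;
}

// Hypothetical self-check.
inline void check_round_trip() {
    assert(round_trip_ok<int>());
}
```

The sketch deliberately compares pointers, not integers, since that is the part the quoted wording covers.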

Post-comment edit : I did not realize at first read that the address was used as the key, not the int value. That's a clever approach, but it suffers IMHO a major flaw : the intent is very unclear if someone else finds that code.

It looks like an old C hack. It's clever, efficient, but the code does not self-explain at all what the intent is. Which in modern c++, imho, is bad. Write code for programmers, not for compilers. Unless you have proven that there is a serious bottleneck which requires bare metal optimization.

I would say it should work but I'm clearly not a language lawyer...

An elegant, but complex constexpr solution, may be found here or here

Original answer

It is "safe" in the sense that this is valid c++ and you can access the returned pointer in all your program, as the static local will be initialized at first function call. There will be one static variable per type T used in your code.

But :

Why returning a non const pointer? This will allow callers to change the static variable value, which is clearly not something you would like
If returning a const pointer, I see no interest in not returning by value instead of returning the pointer

Also, this approach for getting a type id will only work at compile time, not at run time with polymorphic objects. So it will never return the derived class type from a base reference or pointer.
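A minimal sketch of that limitation, assuming the type_id template from the question. Base, Derived, and id_of are hypothetical names used only for illustration.

```cpp
#include <cassert>

template <typename T>
int* type_id() {
    static int x;
    return &x;
}

struct Base { virtual ~Base() {} };
struct Derived : Base {};

// T is deduced from the static type of the argument, so a Derived object
// seen through a Base reference yields the identifier of Base.
template <typename T>
int* id_of(T const&) {
    return type_id<T>();
}

inline void check_static_type() {
    Derived d;
    Base const& b = d;
    assert(id_of(b) == type_id<Base>());     // static type wins
    assert(id_of(d) == type_id<Derived>());
    assert(id_of(b) != id_of(d));            // same object, different IDs
}
```

Getting the dynamic type this way would need something extra, such as a virtual function that returns the identifier.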

How will you initialize the static int values? Here you do not initialize them so this is not valid. Maybe you wanted to use the non const pointer for initializing them somewhere?

There are two better possibilities:

1) Specialize the template for all the types you want to support

template <typename T>
int type_id() {
    static const int id = typeInitCounter++;
    return id;
}

template <>
int type_id<char>() {
    static const int id = 0;
    return id;  //or : return 0
}

template <>
int type_id<unsigned int>() {
    static const int id = 1;
    return id;  //or : return 1
}

//etc...

2) Use a global counter

std::atomic<int> typeInitCounter{0};

template <typename T>
int type_id() {
    static const int id = typeInitCounter++;
    return id;
}

This last approach is IMHO better because you don't have to manage types. And as pointed out by A.S.H, a zero-based incremented counter allows using a vector instead of a map, which is much simpler and more efficient.
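A rough sketch of the vector idea, assuming the counter-based type_id above. The processor class is adapted from the question; next_type_id and the handlers_ member are hypothetical names, and the counter lives in a function-local static instead of a global purely to keep the sketch self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

inline std::size_t next_type_id() {
    static std::atomic<std::size_t> counter{0};
    return counter.fetch_add(1);
}

template <typename T>
std::size_t type_id() {
    static std::size_t const id = next_type_id();  // assigned on first use
    return id;
}

// Hypothetical vector-backed variant of the question's processor class.
class processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size()) handlers_.resize(id + 1);  // grow on demand
        handlers_[id] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) const {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size() || !handlers_[id]) {
            throw std::runtime_error("handler not registered");
        }
        handlers_[id](&t);
    }

private:
    std::vector<std::function<void(void const*)>> handlers_;  // indexed by type ID
};
```

Slots for types that other code requested IDs for stay empty, which is the trade-off against a map.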

Also, use an unordered_map instead of a map for this, you do not need ordering. This gives you O(1) access instead of O(log(n))

Copper answered 26/1, 2017 at 6:56 Comment(9)

Your edit covers my earlier comment, so I deleted that. That said: "How will you initialize the static int values? Here you do not initialize them so this is not valid." -- Statics are implicitly initialised to zero. This is guaranteed by the standard. Even if they weren't, it wouldn't be a problem, since the statics aren't accessed. – Atli 26/1, 2017 at 7:4

It doesn't matter if they change the value of the static. What matters is its address only. You sort of missed the point here. Your single sentence about it being a run-time thing only is however the correct and biggest caveat. – Queston 26/1, 2017 at 7:12

The previous commenters are correct that it is only the address that matters. Besides, in reality I wrap the int* in a type_index type, as mentioned in the question. I am really interested in why my solution is/is not sound. What rules in C++ mean that the solution will/will not work? Maybe I should add the language-lawyer tag. – Timm 26/1, 2017 at 7:26

The "better" solution involves one additional object, requires people to remember to make the typeInitCounter atomic and is marginally less performant. Really the original is fine, but it'd be more sensible to return an opaque type or void*. – Strophanthus 26/1, 2017 at 7:40

Voo, that a subjective matter but with the "better" approach the intent is clear and easy to understand without a single comment, which for me is good. The pointer key approach (which I did not see at all at first answer, I admit) looks like an old C hack. It's clever, efficient, but the code does not self-explain at all what the intent is. Which in modern c++, imho, is bad. – Copper 26/1, 2017 at 8:13

FWIW, returning an auto-incremented identifier has an additional advantage IMO. That is, the handlers map in the processor object can be made of a vector instead of a map, and that's a great boost of performance since that map is mainly used as an indirection tool. – Simons 26/1, 2017 at 8:25

As an old C programmer, OP's intent was evident for me :-). But your analysis that it is not the modern C++ way confirms my fear: future evolutions of optimizing compilers could not care much for such a pattern... – Sciomachy 26/1, 2017 at 9:14

@Copper I do like your version using atomic. It seems like the "proper" way to do things to me, and I would definitely use it if I wanted the most robust solution. – Timm 26/1, 2017 at 15:22

To be fair, "clever but not clear" is one of the best reasons to add a comment. // Maps each distinct template parameter T to a unique address. clarifies things nicely, for example. – Edmond 26/1, 2017 at 19:25

As mentioned by @StoryTeller, it works just fine, but only at run time.
That means you can't use it where a constant expression is required, as follows:

template<int *>
struct S {};

//...

S<type_id<char>()> s;

Moreover, it's not a fixed identifier. Therefore you have no guarantee that char will be bound to the same value across different runs of your executable.

If you can deal with these limitations, it's just fine.

If you already know the types for which you want a persistent identifier, you can use something like this instead (in C++14):

template<typename T>
struct wrapper {
    using type = T;
    constexpr wrapper(std::size_t N): N{N} {}
    const std::size_t N;
};

template<typename... T>
struct identifier: wrapper<T>... {
    template<std::size_t... I>
    constexpr identifier(std::index_sequence<I...>): wrapper<T>{I}... {}

    template<typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template<typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

Then create your identifiers as follows:

constexpr auto id = ID<int, char>;

You can use those identifiers more or less as you did with your other solution:

handlers[id.get<T>()] = ...

Moreover, you can use them wherever a constant expression is required.
As an example as a template parameter:

template<std::size_t>
struct S {};

// ...

S<id.get<B>()> s{};

In a switch statement:

switch(value) {
case id.get<char>():
    // ...
    break;
case id.get<int>():
    // ...
    break;
}

And so on. Note also that they are persistent across different runs as long as you don't change the position of a type in the template parameter list of ID.

The main drawback is that you must know all the types for which you need an identifier when you introduce the id variable.

Menchaca answered 26/1, 2017 at 7:50 Comment(4)

OP's version can be adapted so that it can be used as a constant expression, with the caveat that it will work nicely as a template argument only from C++17 onwards. – Helotism 26/1, 2017 at 9:15
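One possible adaptation, offered as a guess at what is meant here and not as the commenter's actual code: move the object into a static data member so that taking its address is a constant expression. The names type_tag and int_id are hypothetical.

```cpp
#include <cassert>

// One non-const static data member per T; its address is a constant
// expression because the object has static storage duration.
template <typename T>
struct type_tag {
    static char value;
};

template <typename T>
char type_tag<T>::value;

template <typename T>
constexpr char* type_id() {
    return &type_tag<T>::value;
}

// The identifier can now initialize a constexpr variable.
constexpr char* int_id = type_id<int>();
```

Passing the result of type_id<T>() directly as a pointer template argument is the part that needs C++17, as noted in the comment.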

@Helotism Yeah, sure, that's definitely another valid approach. It still has the problem that identifiers change through different executions. Am I wrong? – Menchaca 26/1, 2017 at 9:32

That's correct, there's no guarantee that they won't change between executions. – Helotism 26/1, 2017 at 9:41

@Helotism The code above is what I came up with when I was trying to define something that gives me consistent identifiers across different executions. I thought it was worth sharing the code with the OP, since he has more or less the problems I had at the time. That's all. ;-) – Menchaca 26/1, 2017 at 9:44
