Is it safe to use the address of a static local variable within a function template as a type identifier?

4

40

I wish to create an alternative to std::type_index that does not require RTTI:

template <typename T>
int* type_id() {
    static int x;
    return &x;
}

Note that the address of the local variable x is used as the type ID, not the value of x itself. Also, I don't intend to use a bare pointer in reality. I've just stripped out everything not relevant to my question. See my actual type_index implementation here.
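For readers wondering what wrapping the bare pointer might look like, here is a minimal sketch. It is not the asker's actual type_index implementation (that link is not reproduced here); the name type_index_sketch and its members are hypothetical. It assumes only that the address of a per-instantiation static is unique, and it uses std::less and std::hash so that ordering and hashing never rely on a raw < between unrelated pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical wrapper: callers can compare, order, and hash identifiers,
// but never see or dereference the underlying pointer.
class type_index_sketch {
public:
    template <typename T>
    static type_index_sketch of() {
        // Deliberately non-const: only the address is used, and the thread
        // reports that MSVC's /OPT:ICF may fold const statics together.
        static char tag;
        return type_index_sketch(&tag);
    }

    friend bool operator==(type_index_sketch a, type_index_sketch b) { return a.p_ == b.p_; }
    friend bool operator!=(type_index_sketch a, type_index_sketch b) { return a.p_ != b.p_; }

    // std::less yields a total order even for pointers to unrelated objects.
    friend bool operator<(type_index_sketch a, type_index_sketch b) {
        return std::less<void const*>()(a.p_, b.p_);
    }

    std::size_t hash() const { return std::hash<void const*>()(p_); }

private:
    explicit type_index_sketch(void const* p) : p_(p) {}
    void const* p_;
};

namespace std {
// Lets the wrapper be used as a key in std::unordered_map.
template <>
struct hash<type_index_sketch> {
    std::size_t operator()(type_index_sketch t) const { return t.hash(); }
};
}
```

With something like this, the handlers container in the example below could be keyed on the wrapper instead of int*.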

Is this approach sound, and if so, why? If not, why not? I feel like I am on shaky ground here, so I am interested in the precise reasons why my approach will or will not work.

A typical use case might be to register routines at run-time to handle objects of different types through a single interface:

class processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        handlers[type_id<T>()] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) {
        auto it = handlers.find(type_id<T>());
        if (it != handlers.end()) {
            it->second(&t);
        } else {
            throw std::runtime_error("handler not registered");
        }
    }

private:
    std::map<int*, std::function<void (void const*)>> handlers;
};

This class might be used like so:

processor p;

p.register_handler<int>([](int const& i) {
    std::cout << "int: " << i << "\n";
});
p.register_handler<float>([](float const& f) {
    std::cout << "float: " << f << "\n";
});

try {
    p.process(42);
    p.process(3.14f);
    p.process(true);
} catch (std::runtime_error& ex) {
    std::cout << "error: " << ex.what() << "\n";
}

Conclusion

Thanks to everyone for your help. I have accepted the answer from @StoryTeller as he has outlined why the solution should be valid according to the rules of C++. However, @SergeBallesta and a number of others in the comments have pointed out that MSVC performs optimizations which come uncomfortably close to breaking this approach. If a more robust approach is needed, then a solution using std::atomic may be preferable, as suggested by @galinette:

std::atomic_size_t type_id_counter{0};  // brace-init; "= 0" is ill-formed before C++17

template <typename T>
std::size_t type_id() {
    static std::size_t const x = type_id_counter++;
    return x;
}

If anyone has further thoughts or information, I am still eager to hear it!

Timm answered 26/1, 2017 at 6:41 Comment(17)
@A.S.H I've added a typical use case. Obviously I've simplified it, but I have used this pattern in actual production code.Timm
Thanks for sharing. Please take a look at my comment to the answer of @galinette. since the handlers map is mainly used as an indirection tool for the execution, it is expected to be much faster if replaced by a vector, and using some auto-incremented enum for the types instead of some random pointers.Simons
Side note : use an unordered_map instead of a map, the performance may vary a lot and you don't use any orderingCopper
@Simons It would undoubtedly be faster, though you could end up with a bunch of unused entries in the vector if the processor wasn't the only thing using type_id. An unordered_map or a sorted vector would probably be faster than a map. In reality, the "handlers" would be doing far more work than they are in the example, so the lookup time for the routines would probably be negligible.Timm
If you really want to use pointer as type id, at least make it more clear in the code : do not return a pointer, use uintptr_t, which is the unsigned integer type which has the same size as a pointer type, or typedef it. And comment!Copper
@Copper I admit that naming the local variable id is misleading, since the address is the actual ID, but I thought the question would be clear from the title. Of course I would comment this code in real life. And I chose not to cast to uintptr_t or void* because I wanted to keep things as simple as possible. I'd rather not risk invoking additional C++ voodoo if it isn't necessary.Timm
I've improved my answer with a (possibly hacky) way to secure that pattern against optimizing compilers...Sciomachy
have you seen this stackoverflow question? https://mcmap.net/q/276988/-compile-time-constant-idDollar
This seems to be more suitable for code review.Lowdown
You can even use function pointer of an inlined template function as a type id. Just let the function return a pointer to itself. No need for the static variable.Adenectomy
@Adenectomy That's a good idea, though it would have to be a different function. A function returning a pointer to itself will have an infinitely recursive return type!Timm
The type could just be a void *.Barela
@Barela Function pointers cannot be cast to void* (isocpp.org/wiki/faq/…).Timm
@JosephThomson You are right ... but afaik you can safely cast every function pointer to any other function pointer type. https://mcmap.net/q/408639/-cast-a-function-pointerAdenectomy
@Adenectomy But it does say that the result of such a conversion is unspecified. While I doubt that anything unexpected would happen in practice, doesn't this mean that technically it might (e.g. you might get two different pointers from two separate reinterpret_casts of the same pointer).Timm
@Adenectomy Using just a function pointer would be far more risky on MSVC. It's very likely that all function template instantiations will be folded into one with the /opt:icf linker option.Helotism
@Helotism I was just about to ask about that. Just gave it a test on MSVC, and it does indeed fold them all into one. Scratch that idea.Timm
27

Yes, it will be correct to an extent. Function templates, like inline functions, may be defined in every translation unit that uses them, and static objects in such functions are shared across all translation units.

So, in every translation unit, you will get the address of the same static local variable for the call to type_id<Type>(). You are protected here from ODR violations by the standard.

Therefore, the address of the local static can be used as a sort of home-brewed run-time type identifier.
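The two properties this answer relies on can be checked directly. The following is a minimal sketch that repeats the asker's type_id and adds a hypothetical helper, check_type_id_properties, which is not part of the original post. It only exercises a single translation unit, so it illustrates the claim rather than proving the cross-translation-unit guarantee.

```cpp
#include <cassert>

template <typename T>
int* type_id() {
    static int x;  // one object per instantiation; only its address matters
    return &x;
}

// Hypothetical helper: asserts the properties described above.
inline void check_type_id_properties() {
    // Stable: repeated calls for the same type yield the same address.
    assert(type_id<int>() == type_id<int>());
    // Distinct: different instantiations own different objects.
    assert(type_id<int>() != type_id<float>());
    assert(type_id<int>() != type_id<unsigned>());
}
```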

Queston answered 26/1, 2017 at 7:27 Comment(6)
The problem is not whether a type will always be represented by same address, but whether 2 different types could share same address...Sciomachy
@SergeBallesta - type_id<int>() and type_id<float>() are completely different functions. I see no leeway in the standard that allows them to share the same static locals.Queston
@SergeBallesta - In fact §14.8 ¶2Queston
I've read 14.8.2 again. It does declare that template functions all have their own copy of the static variable, no problem there. My question is whether an optimizing compiler can merge variables that share the same value throughout the program.Sciomachy
@SergeBallesta Not if their addresses are observable ([intro.object]/8). That said, I've seen MSVC with the /opt:icf linker option merge some COMDATs in non-standard ways (the behaviour is at least documented). I'm not sure if OP's solution would be affected, but the one I gave here shouldn't be, as the variable id is not const.Helotism
@SergeBallesta After doing a couple of tests, it looks like OP's code will be safe on MSVC as long as the local static is not const. MSVC's non-standard behaviour was discussed here. That question uses MSVC 2013. In my tests, MSVC 2015 U3 doesn't have the problem (at least in a simple test), but the latest 2017 RC does. Non-const works fine across versions.Helotism
12

This is consistent with the standard: C++ uses templates, not generics with type erasure as in Java, so each instantiated type gets its own function implementation containing its own static variable. All those variables are distinct and as such should have different addresses.

The problem is that their value is never used and, worse, never changed. I remember that optimizers can merge string constants. Since optimizers try hard to be far more clever than any human programmer, I am afraid that an overzealous optimizing compiler could discover that, as those variables are never changed, they will all keep the value 0, so why not merge them all to save memory?

I know that because of the as-if rule, the compiler is free to do what it wants provided the observable results are the same. And I am not sure whether the addresses of static variables that always share the same value must be different. Maybe someone could confirm which part of the standard actually covers this?

Current compilers still compile translation units separately, so they cannot be sure whether another translation unit will use or change the value. So my opinion is that the optimizer will not have enough information to decide to merge the variables, and your pattern is safe.

But as I really do not think that the standard protects it, I cannot say whether future versions of C++ toolchains (compiler + linker) will introduce a global optimization phase that actively searches for unchanged variables that could be merged, more or less the same way they actively search for UB to optimize out parts of code. Only common patterns, where disallowing them would break too large a code base, are protected from that, and I do not think that yours is common enough.

A rather hacky way to prevent an optimization phase from merging variables that have the same value would be to give each one a different value:

int unique_val() {
    static int cur = 0;  // normally useless but more readable
    return cur++;
}
template <typename T>
void * type_id() {
    static int x = unique_val();
    return &x;
}

OK, this does not even try to be thread safe, but that is not a problem here: the values themselves will never be used. But you now have different variables with static storage duration (per 14.8.2 of the standard, as said by @StoryTeller) that, except under race conditions, have different values. As they are odr-used they must have different addresses, and you should be protected against future improvements in optimizing compilers...

Note: I think that as the value will not be used, returning a void * sounds cleaner...


Just an addition stolen from a comment from @bogdan. MSVC is known to have very aggressive optimization with the /OPT:ICF flag. The discussion suggests that it is not conformant, and that it only applies to variables marked as const. But it reinforces my opinion that even if OP's code seems conformant, I would not dare to use it in production code without additional precautions.

Sciomachy answered 26/1, 2017 at 7:49 Comment(21)
What if its declared static volatile? Will that keep the optimizer at bay?Tatia
why not declare a static point of type T? In that case the variables will be of different types, so it should prevent clever optimizationDollar
@AlessandroTeruzzi A constant value of 0 for an int, long, char and their unsigned equivalent could all share same address...Sciomachy
You could use a type like template <typename T> struct dummy. Presumably the compiler wouldn't be able to optimize those instances away. Then return a void*, assuming there isn't any problem with using ==, < and hash on void* pointers that point to different types of objects.Timm
@JosephThomson: AFAIK, < on pointers to objects not being members of a same array, or bytes not belonging to same object is UB... Why do you need that? But it could be a different question...Sciomachy
I meant less not <. My question is whether it is okay to compare and hash void* in this way. And whether this would stop the optimizer.Timm
uintptr_t x = (uintptr_t)&x; merge THAT, you smarty pants compiler!Par
"And I am not sure that the addresses of static variables that will always share the same value shall be different or not." – "Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses." (§1.8/6 in N4140). No point in wasting time to protect against potential future insanity IMO.Mcclary
FWIW the comparison with generics is not very relevant: In C# (and other .NET languages), a static member of a generic type will have an instance per reification, just like with templates. In Java, OTOH, you'll have only one instance of a static field in a generic class, due to type erasure (you can't even specify the type parameters when accessing a static field of a generic class). So it very much depends on the way the language implements generics, not on the mechanism itself.Lutero
@ArneVogel: The note 4) in same paragraph says Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference.Sciomachy
@LucasTrzesniewski: I was thinking about Java generics. I have made it more explicit.Sciomachy
@Serge Ballesta: If my well-formed program cannot observe the difference, why would I care? The hack you propose may actually introduce UB in a multi-threaded program, as you yourself point out, which is IMO a worse can of worms. Obviously an atomic would solve that issue, but then again, I'm not convinced you're trying to solve a problem that currently exists… By the way, implementations are allowed to merge string literals… "string constants" is an ambiguous term here. They are certainly not allowed to merge objects of type const char[].Mcclary
@n.m I think casting to uintptr_t may not be guaranteed to work, since pointers may have multiple integer representations (en.cppreference.com/w/cpp/language/reinterpret_cast).Timm
@SergeBallesta: Well of course the as-if rule applies - but if the program is going to compare the address of the two objects and behave differently if they are the same, then the program can observe the difference, and as-if does not apply. That's just saying that if you don't take the address of the objects, they can overlap (or if you do, that you don't compare these addresses).Unreadable
@JosephThomson sorry it's a joke. different objects are guaranteed to have distinct addresses (unless subobjects are involved, which are not present here).Par
@JosephThomson : in your link from cppreference, I read : "2) Any pointer can be converted to any integral type large enough to hold the value of the pointer (e.g. to std::uintptr_t)"Copper
@Copper Yes, but it doesn't specify what the result of the conversion will be. It says that a pointer may have multiple integer representations: "the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations". This would mean that the uintptr_t may not necessarily be a unique identifier.Timm
@JosephThomson : you misread the whole sentence. "A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value, otherwise the resulting pointer cannot be dereferenced safely (the round-trip conversion in the opposite direction is not guaranteed; the same pointer may have multiple integer representations)" Means if the integer is large enough, the round-trip is guaranteed. If the integer is not large enough, it is not guaranteed. uintptr_t is there exactly for that purpose since size is guaranteed if this type is defined.Copper
@Copper I don't think so. It specifically talks about the round trip conversion from integer to pointer to integer not being guaranteed, since pointers may have multiple integer representations. I think this is an interpretation of the standard where it says, "The mapping function [from pointer to integer] is implementation-defined."Timm
"A pointer converted to an integer of sufficient size and back to the same pointer type is guaranteed to have its original value". Again I am 100% sure of this, and this is the very reason for uintptr_t existence. If a compiler cannot guarantee the pointer<>integer conversion, it must not declare this type. This is why the type is optional.Copper
@Copper All that appears to be guaranteed is that casting from pointer to integer (of sufficient size) and back to pointer again will give you the original pointer. The standard doesn't specify what the result of the pointer to integer conversion will be though, so it is technically not safe to assume that casting a pointer to an integer will always give you the same value. The pointer may have multiple integer representations. Whether or not this happens in practice is another question.Timm
6

Post-comment edit : I did not realize on first read that the address was used as the key, not the int value. That's a clever approach, but IMHO it suffers from a major flaw: the intent is very unclear to someone else who finds that code.

It looks like an old C hack. It's clever and efficient, but the code does not explain its intent at all, which in modern C++ is, IMHO, bad. Write code for programmers, not for compilers, unless you have proven that there is a serious bottleneck which requires bare-metal optimization.

I would say it should work but I'm clearly not a language lawyer...

An elegant, but complex constexpr solution, may be found here or here

Original answer

It is "safe" in the sense that this is valid C++ and you can access the returned pointer throughout your program, as the static local will be initialized at the first function call. There will be one static variable per type T used in your code.

But :

  • Why return a non-const pointer? This allows callers to change the value of the static variable, which is clearly not something you want
  • If you return a const pointer, I see no benefit over returning by value

Also, this approach to getting a type id only sees the static (compile-time) type, not the dynamic type of a polymorphic object at run time. So it will never return the derived class type from a base reference or pointer.
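A small sketch of that limitation follows. Base, Derived, type_id_of, and check_static_type_is_used are hypothetical names introduced only for illustration; type_id is the asker's function. The helper deduces T from the argument the same way process() does in the question.

```cpp
#include <cassert>

template <typename T>
int* type_id() {
    static int x;  // only the address is used
    return &x;
}

struct Base { virtual ~Base() {} };
struct Derived : Base {};

// Deduces T from the static type of the argument.
template <typename T>
int* type_id_of(T const&) {
    return type_id<T>();
}

inline void check_static_type_is_used() {
    Derived d;
    Base const& as_base = d;  // dynamic type Derived, static type Base

    // The identifier follows the static type of the expression...
    assert(type_id_of(as_base) == type_id<Base>());
    assert(type_id_of(as_base) != type_id<Derived>());
    // ...and only yields Derived's id when the static type is Derived.
    assert(type_id_of(d) == type_id<Derived>());
}
```

Recovering the dynamic type would need a virtual function (or RTTI), which this technique deliberately avoids.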

How will you initialize the static int values? Here you do not initialize them so this is not valid. Maybe you wanted to use the non const pointer for initializing them somewhere?

There are two better possibilities:

1)Specialize the template for all the types you want to support

template <typename T>
int type_id() {
    static const int id = typeInitCounter++;
    return id;
}

template <>
int type_id<char>() {
    static const int id = 0;
    return id;  //or : return 0
}

template <>
int type_id<unsigned int>() {
    static const int id = 1;
    return id;  //or : return 1
}

//etc...

2)Use a global counter

std::atomic<int> typeInitCounter{0};  // brace-init; "= 0" is ill-formed before C++17

template <typename T>
int type_id() {
    static const int id = typeInitCounter++;
    return id;
}

This last approach is IMHO better because you don't have to manage types. And as pointed out by A.S.H, a zero-based incrementing counter allows using a vector instead of a map, which is much simpler and more efficient.
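To illustrate the vector idea, here is a minimal sketch that adapts the processor class from the question. The name vector_processor and the growth strategy are assumptions, not something any poster wrote. As noted in the comments on the question, slots stay empty for ids handed out to types that were never registered with this particular processor.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

std::atomic<std::size_t> type_id_counter{0};

template <typename T>
std::size_t type_id() {
    // Initialized once, on the first call for each T.
    static std::size_t const id = type_id_counter++;
    return id;
}

// Hypothetical variant of the question's processor, indexed by counter id.
class vector_processor {
public:
    template <typename T, typename Handler>
    void register_handler(Handler handler) {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size()) {
            handlers_.resize(id + 1);  // unused slots hold empty std::function objects
        }
        handlers_[id] = [handler](void const* v) {
            handler(*static_cast<T const*>(v));
        };
    }

    template <typename T>
    void process(T const& t) const {
        std::size_t const id = type_id<T>();
        if (id >= handlers_.size() || !handlers_[id]) {
            throw std::runtime_error("handler not registered");
        }
        handlers_[id](&t);
    }

private:
    std::vector<std::function<void(void const*)>> handlers_;
};
```

Lookup becomes a bounds check plus an index, at the cost of one vector slot per id up to the largest registered one.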

Also, use an unordered_map instead of a map for this; you do not need ordering. This gives you O(1) average access instead of O(log(n)).

Copper answered 26/1, 2017 at 6:56 Comment(9)
Your edit covers my earlier comment, so I deleted that. That said: "How will you initialize the static int values? Here you do not initialize them so this is not valid." -- Statics are implicitly initialised to zero. This is guaranteed by the standard. Even if they weren't, it wouldn't be a problem, since the statics aren't accessed.Atli
It doesn't matter if they change the value of the static. What matters is its address only. You sort of missed the point here. Your single sentence about it being a run-time thing only is however the correct and biggest caveat.Queston
The previous commenters are correct that it is only the address that matters. Besides, in reality I wrap the int* in a type_index type, as mentioned in the question. I am really interested in why my solution is/is not sound. What rules in C++ mean that the solution will/will not work? Maybe I should add the language-lawyer tag.Timm
The "better" solution involves one additional object, requires people to remember to make the typeInitCounter atomic and is marginally less performant. Really the original is fine, but it'd be more sensible to return an opaque type or void*.Strophanthus
Voo, that's a subjective matter but with the "better" approach the intent is clear and easy to understand without a single comment, which for me is good. The pointer key approach (which I did not see at all at first answer, I admit) looks like an old C hack. It's clever, efficient, but the code does not self-explain at all what the intent is. Which in modern c++, imho, is bad.Copper
FWIW, returning an auto-incremented identifier has an additional advantage IMO. That is, the handlers map in the processor object can be made of a vector instead of a map, and that's a great boost of performance since that map is mainly used as an indirection tool.Simons
As an old C programmer, OP's intent was evident for me :-). But your analysis that it is not the modern C++ way confirms my fear: future evolutions of optimizing compilers could not care much for such a pattern...Sciomachy
@Copper I do like your version using atomic. It seems like the "proper" way to do things to me, and I would definitely use it if I wanted the most robust solution.Timm
To be fair, "clever but not clear" is one of the best reasons to add a comment. // Maps each distinct template parameter T to a unique address. clarifies things nicely, for example.Edmond
6

As mentioned by @StoryTeller, it works just fine at runtime.
However, it is not a constant expression, which means you can't use it like this:

template<int *>
struct S {};

//...

S<type_id<char>()> s;

Moreover, it's not a fixed identifier, so you have no guarantee that char will be bound to the same value across different runs of your executable.

If you can deal with these limitations, it's just fine.


If you already know the types for which you want a persistent identifier, you can use something like this instead (in C++14):

template<typename T>
struct wrapper {
    using type = T;
    constexpr wrapper(std::size_t N): N{N} {}
    const std::size_t N;
};

template<typename... T>
struct identifier: wrapper<T>... {
    template<std::size_t... I>
    constexpr identifier(std::index_sequence<I...>): wrapper<T>{I}... {}

    template<typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template<typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

Then create your identifiers as follows:

constexpr auto id = ID<int, char>;

You can use those identifiers more or less as you did with your other solution:

handlers[id.get<T>()] = ...

Moreover, you can use them wherever a constant expression is required.
For example, as a template parameter:

template<std::size_t>
struct S {};

// ...

S<id.get<B>()> s{};

In a switch statement:

switch(value) {
case id.get<char>():
    // ...
    break;
case id.get<int>():
    // ...
    break;
}

And so on. Note also that they are persistent across runs as long as you don't change the position of a type in the template parameter list of ID.

The main drawback is that you must know all the types for which you need an identifier when you introduce the id variable.
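Putting the pieces of this answer together, a minimal sketch might look like the following (C++14 or later). The type list int, char, double and the helper tag_of are hypothetical choices for illustration; the static_asserts show that each identifier is simply the position of the type in the template parameter list of ID.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <typename T>
struct wrapper {
    constexpr wrapper(std::size_t n) : N{n} {}
    const std::size_t N;
};

template <typename... T>
struct identifier : wrapper<T>... {
    template <std::size_t... I>
    constexpr identifier(std::index_sequence<I...>) : wrapper<T>{I}... {}

    template <typename U>
    constexpr std::size_t get() const { return wrapper<U>::N; }
};

template <typename... T>
constexpr identifier<T...> ID = identifier<T...>{std::make_index_sequence<sizeof...(T)>{}};

// Hypothetical type list; the order fixes the identifiers.
constexpr auto id = ID<int, char, double>;

static_assert(id.get<int>() == 0, "first type in the list");
static_assert(id.get<char>() == 1, "second type in the list");
static_assert(id.get<double>() == 2, "third type in the list");

// Hypothetical helper showing the identifiers used as case labels.
inline char tag_of(std::size_t value) {
    switch (value) {
    case id.get<int>():    return 'i';
    case id.get<char>():   return 'c';
    case id.get<double>(): return 'd';
    default:               return '?';
    }
}
```

Asking for a type that is not in the list, such as id.get<float>(), fails to compile, which is the compile-time counterpart of the "handler not registered" error in the question.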

Menchaca answered 26/1, 2017 at 7:50 Comment(4)
OP's version can be adapted so that it can be used as a constant expression, with the caveat that it will work nicely as a template argument only from C++17 onwards.Helotism
@Helotism Yeah, sure, that's definitely another valid approach. It still has the problem that identifiers change through different executions. Am I wrong?Menchaca
That's correct, there's no guarantee that they won't change between executions.Helotism
@Helotism The code above is what I came up with when I was trying to define something that gives me consistent identifiers across different executions. I thought it was worth sharing with the OP, since he has more or less the same problems I had at the time. That's all. ;-)Menchaca
