Should I use integer ID or pointers for my opaque objects?

O

3

6

I'm writing an abstraction layer on top of some graphics API (DirectX9 and DirectX11) and I would like your opinion.

Traditionally I would create a base class for each concept I want to abstract.
So in typical OO fashion I would have for example a class Shader and 2 subclasses DX9Shader and DX11Shader.

I would repeat the process for textures, etc... and when I need to instantiate them I have an abstract factory that will return the appropriate subclass depending on the current graphics API.
Following RAII, the returned pointer would be encapsulated in a std::shared_ptr.

So far so good but in my case there are a few problems with this approach:

I need to come up with a public interface that encapsulate the functionality of both APIs (and other APIs in the future).
The derived class are stored in separate DLLs (one for DX9, one for DX11 etc...) and having a shared_ptr to them in the client is a curse: on exit the graphic dlls are unloaded and if the client still has a shared_ptr to one of the graphics objects boom, crash due to calling code from unloaded DLL.

This prompted me to re-design the way I do things: I thought I could just return raw pointers to the resources and have the graphics API clean after itself but there's still the issue of dangling pointers on the client side and the interface problem. I even considered manual reference counting like COM but I thought that would be a step backwards (correct me if I'm wrong, coming from the shared_ptr world, manual reference counting seems primitive).

Then I saw the work of Humus where all his graphics classes are represented by integer IDs (much like what OpenGL does). Creating a new object only returns its integer ID, and stores the pointer internally; it's all perfectly opaque!

The classes that represent the abstraction (such as DX9Shader etc...) are all hidden behind the device API which is the only interface.
If one wants to set a texture, it's just a matter of calling device->SetTexture(ID) and the rest happens behind the scenes.

The downfall is that the hidden part of the API is bloated, there is a lot of boiler plate code required to make it work and I'm not a fan of a do-it-all class.

Any ideas/thoughts ?

Overthrust answered 30/11, 2011 at 9:50 Comment(0)

E

3

You say that the main problem is that a DLL is unloaded while still having a pointer to its internals. Well... don't do that. You have a class instance, who's members are implemented in that DLL. It is fundamentally an error for that DLL to be unloaded so long as those class instances exist.

You therefore need to be responsible in how you use this abstraction. Just as you need to be responsible with any code you load from a DLL: stuff that comes from the DLL must be cleaned up before you unload the DLL. How you do that is up to you. You could have an internal reference count that gets incremented for every object the DLL returns and only unload the DLL after all referenced objects go away. Or anything, really.

After all, even if you use these opaque numbers or whatever, what happens if you call one of those API functions on that number when the DLL is unloaded? Oops... So it doesn't really buy you any protection. You have to be responsible either way.

The downsides of the number method that you may not be thinking about are:

Reduced ability to know what an object actually is. API calls can fail because you passed a number that isn't really an object. Or worse, what happens if you pass a shader object into a function that takes a texture? Maybe we're talking about a function that takes a shader and a texture, and you accidentally forget the order of the arguments? The rules of C++ wouldn't allow that code to even compile if those were object pointers. But with integers? It's all good; you'd only get runtime errors.
Performance. Every API call will have to look this number up in a hashtable or something to get an actual pointer to work with. If it's a hashtable (ie: an array), then it's probably fairly minor. But it's still an indirection. And since your abstraction seems very low-level, any performance loss at this level can really hurt in performance-critical situations.
Lack of RAII and other scoping mechanisms. Sure, you could write a shared_ptr-esque object that would create and delete them. But you wouldn't have to do that if you were using an actual pointer.

It just doesn't seem worthwhile.

Edaedacious answered 30/11, 2011 at 10:6 Comment(4)

I agree with you, but I can't be responsible for what the client does with the pointers I return to them. On my side I do the clean up, but if they hold onto pointers longer than they should what can I do about it ? The idea was to use shared_ptr to do the reference counting for me, so should I check the ref count of all objects before unloading the library ? What happens in the case the client forgot to release his/hers shared_ptr ? Then we've got a leak/deadlock. – Overthrust 30/11, 2011 at 10:26

I think it's easier to detect an invalid handle (not found in the hash_map), than to detect an invalid pointer (unless it's stored in a hash_map itself in which case we pay the cost of the lookup, and still risk an invalid pointer indirection). Your point about type safety is very good, I totally agree there. However I don't remove RAII altogether; I still store those objects internally before I return them. – Overthrust 30/11, 2011 at 10:26

"On my side I do the clean up, but if they hold onto pointers longer than they should what can I do about it ?" Do what happens when you create an object, delete it, and then try to call a function on the deleted pointer. IE: crash. I simply don't see the need to go to such lengths just to trap this kind of error. After all, what are you going to do about it even if you catch it? The most you might do is print a nicer error than a SegFault/GPF. But you're still going to have to ASSERT or something of that nature; you can't just continue the program as though everything is fine. – Edaedacious 30/11, 2011 at 10:37

Yep there's no going around that, and that's the most sensible thing to do. – Overthrust 30/11, 2011 at 10:48

C

3

Does it matter? To the user of the object, it is just an opaque handle. its actual implementation type doesn't matter, as long as I can pass the handle to your API functions and have them do stuff with the object.

You can change the implementation of these handles easily, so make it whatever is easier for you now.

Just declare the handle type as a typedef of either a pointer or an integer, and make sure that all client code uses the typedef name, then the client code doesn't depend on the specific type you chose to represent your handles.

Go for the simple solution now, and if/when you run into problems because that was too simple, change it.

Carrelli answered 30/11, 2011 at 9:56 Comment(4)

That's the problem, neither are simple: returning a raw pointer I need to have a public interface for each class and deal with invalid pointers. returning an id I need to move all the implementation code in one class and implement a bit of boiler plate code. This seems like a tight trade-off to me; at the moment I have both implemented and can't decide which one to use. I agree that for the client it doesn't matter, but it's all about implementation details here. I also agree that I should go for the simpler version, if there was one ! – Overthrust 30/11, 2011 at 10:5

You don't need to put everything into one single class for integer ids to work. And you don't need to provide a public interface per class if using pointers. (Like I said, the client can treat the pointer as a opaque handle which is passed around just like an integer id would be, and then you internally treat and use it as a pointer) – Carrelli 30/11, 2011 at 10:20

That's a good point, I've seen that paradigm at work before in C. The handle would be a typedef void* and the client uses a set of functions that take that handle as a parameter. I just thought with C++ we'd have better tools/ways to deal with those situations. – Overthrust 30/11, 2011 at 10:33

Well, you have more tools to deal with those situations. Which one is better depends on your requirements. :) – Carrelli 30/11, 2011 at 11:41

E

3

You say that the main problem is that a DLL is unloaded while still having a pointer to its internals. Well... don't do that. You have a class instance, who's members are implemented in that DLL. It is fundamentally an error for that DLL to be unloaded so long as those class instances exist.

You therefore need to be responsible in how you use this abstraction. Just as you need to be responsible with any code you load from a DLL: stuff that comes from the DLL must be cleaned up before you unload the DLL. How you do that is up to you. You could have an internal reference count that gets incremented for every object the DLL returns and only unload the DLL after all referenced objects go away. Or anything, really.

After all, even if you use these opaque numbers or whatever, what happens if you call one of those API functions on that number when the DLL is unloaded? Oops... So it doesn't really buy you any protection. You have to be responsible either way.

The downsides of the number method that you may not be thinking about are:

Reduced ability to know what an object actually is. API calls can fail because you passed a number that isn't really an object. Or worse, what happens if you pass a shader object into a function that takes a texture? Maybe we're talking about a function that takes a shader and a texture, and you accidentally forget the order of the arguments? The rules of C++ wouldn't allow that code to even compile if those were object pointers. But with integers? It's all good; you'd only get runtime errors.
Performance. Every API call will have to look this number up in a hashtable or something to get an actual pointer to work with. If it's a hashtable (ie: an array), then it's probably fairly minor. But it's still an indirection. And since your abstraction seems very low-level, any performance loss at this level can really hurt in performance-critical situations.
Lack of RAII and other scoping mechanisms. Sure, you could write a shared_ptr-esque object that would create and delete them. But you wouldn't have to do that if you were using an actual pointer.

It just doesn't seem worthwhile.

Edaedacious answered 30/11, 2011 at 10:6 Comment(4)

I agree with you, but I can't be responsible for what the client does with the pointers I return to them. On my side I do the clean up, but if they hold onto pointers longer than they should what can I do about it ? The idea was to use shared_ptr to do the reference counting for me, so should I check the ref count of all objects before unloading the library ? What happens in the case the client forgot to release his/hers shared_ptr ? Then we've got a leak/deadlock. – Overthrust 30/11, 2011 at 10:26

I think it's easier to detect an invalid handle (not found in the hash_map), than to detect an invalid pointer (unless it's stored in a hash_map itself in which case we pay the cost of the lookup, and still risk an invalid pointer indirection). Your point about type safety is very good, I totally agree there. However I don't remove RAII altogether; I still store those objects internally before I return them. – Overthrust 30/11, 2011 at 10:26

"On my side I do the clean up, but if they hold onto pointers longer than they should what can I do about it ?" Do what happens when you create an object, delete it, and then try to call a function on the deleted pointer. IE: crash. I simply don't see the need to go to such lengths just to trap this kind of error. After all, what are you going to do about it even if you catch it? The most you might do is print a nicer error than a SegFault/GPF. But you're still going to have to ASSERT or something of that nature; you can't just continue the program as though everything is fine. – Edaedacious 30/11, 2011 at 10:37

Yep there's no going around that, and that's the most sensible thing to do. – Overthrust 30/11, 2011 at 10:48

D

0

Regarding your p. 2: Client is always unloaded before libraries.

Every process has its library dependency tree, with .exe as tree root, user Dll at intermediate levels, and system libraries at low level. Process is loaded from low to high level, tree root (exe) is loaded last. Process is unloaded starting from the root, low-level libraries are unloaded last. This is done to prevent situations you are talking about.

Of course, if you load/unload libraries manually, this order is changed, and you are responsible to keep pointers valid.

Dunn answered 30/11, 2011 at 9:59 Comment(1)

Yep that's exactly my situation, I manually load the graphic dlls since on some system they might fail to load (DX11 on Windows XP for example). – Overthrust 30/11, 2011 at 10:8

Recommended topics

Hot tags