Is there a way to write a BSTR literal?

When calling a function that expects a BSTR it'd be nice to be able to write something like:

iFoo->function( bs"HELLO" );

However the only workaround I'm aware of is to use a wrapper that calls SysAllocString etc., e.g.:

iFoo->function( WideString(L"HELLO").c_bstr() );

which is kind of ugly. Is there actually such an option to create a BSTR literal?

Motivation: easier-to-read code, and faster runtime performance by avoiding an allocation and deallocation.

Clarification: I am only talking about situations where the caller (i.e. us) has ownership of the BSTR, for example: calling a function that takes a BSTR [in] parameter. Of course, it would be silly to supply a pointer to a BSTR literal to a function which will go on to try and free the string.

Paralogism answered 13/1, 2015 at 9:31 Comment(18)
A very ugly solution is L"\xA\0" "HELLO" + 2 – Paralogism
Not only ugly, wait until someone calls SysFreeString() on that :) – Minimal
BSTRs are by definition dynamically managed. The concept could certainly be tossed into a literal, but it wouldn't be a BSTR. And it would be disastrous if ever used in a place where it was eventually freed. The OLE libs have many places (variant functions, marshaller code, etc.) where things like VARIANT members are managed behind the scenes. Placing something like this in such a place would be disastrous. You could always punt and just use any of the canned BSTR smart pointer classes like _bstr_t or CComBSTR. – Biddick
@FrédéricHamidi Functions that accept BSTR shouldn't be freeing it (i.e. memory is managed by the caller); this is the same as complaining that you can't do printf("hello"); because printf might free the string! – Paralogism
@Biddick If the called function does free the string (and we did not use the literal I am suggesting) it'd break the program anyway as there would be a double free. – Paralogism
@MattMcNabb Why would you free a BSTR that is being freed already? VariantChangeType, for example, called on a loaded VARIANT with a "literal" invalid BSTR for "1000" changing to VT_LONG would be UB from the rafters. BSTR [in,out] marshalling by definition will free the input BSTR and replace it with an output BSTR multiple times. There is zero sense in literal BSTRs. I just reread your comment and I think you agree. I concur, something being sent an [in] BSTR should not be freeing it (except SysFreeString of course). – Biddick
@Biddick When creating a VARIANT that owns its BSTR you would allocate one. BSTRs have well defined ownership semantics. Only the owner should free the string. There is no [in, out] BSTR, only BSTR * which has callee-ownership. – Paralogism
@MattMcNabb Declaring [in,out] BSTR in MIDL will create BSTR * in the generated header and proxy/stub (it did last I checked anyway; been a while). I completely agree it's all about ownership. – Biddick
@Biddick OK, so when it is a caller-ownership situation then why shouldn't the caller be able to use a literal in order to avoid wasting time with an allocation and deallocation? – Paralogism
Answering my own question... I guess that some memory checker tool might try to check that any pointer supplied to a function expecting BSTRs actually corresponds to something that exists in the OLE allocation table. – Paralogism
@MattMcNabb That is a good question. Since everything BSTR is supposed to play by the rules, how it is built is not up to you; it's up to MS. They document the rules, which is nice, but they're also free to change them. Trying to literalize that is easily as tedious as, if not more tedious than, just playing by the rules. Ex: your proposed "very ugly" solution is not conforming (it doesn't have two terminating nulls). Why would you want to do that? And you can toss BSTR caching, which COM does for you, out entirely. – Biddick
Great brain-food, btw. uptick =P – Biddick
@Biddick I always took "two null characters" to mean two null narrow characters (i.e. one null wide character) - because requiring two null wide characters is just too strange! :) – Paralogism
It caught me somewhat by surprise as well, since the rest of the documentation freely interchanges "characters" with wide or narrow depending on the context. I wish I had a Windows box to verify, but as memory serves there are 32 bits of nothingness at the end of a valid BSTR. If you have one handy (a Windows box) I'm truly curious which it is. Otherwise I'll check tomorrow at work and report back. – Biddick
@Biddick Tried it just now, here is the result ... – Paralogism
Definitely worth noting in your question imho. The ABABAB looks like typical MS debug fill. I humbly apologize for my inaccuracy (and it makes me wonder where I saw all those added octets). Now I'm genuinely curious whether MS just ignores state junk they may be keeping when given a non-rule-allocated BSTR in their functions. Good question! – Biddick
@Biddick AB stands for allocated block i.e. an uninitialized memory block that was allocated with LocalAlloc(). – Mileage
Similar question: #20265116 – Paralogism
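
For readers following the layout discussion in the comments, here is a minimal portable sketch of the memory shape being argued about: a 4-byte byte-count prefix, the UTF-16 text, and a single 16-bit null terminator. HelloLayout and hello_text are hypothetical names used only for illustration, and char16_t stands in for the Windows OLECHAR. An object like this is not a real BSTR: it was not allocated by SysAllocString, so it must never reach SysFreeString or any code that assumes COM allocation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration only; not part of any Windows header.
struct HelloLayout {
    std::uint32_t byte_length; // size of the text in bytes, terminator not counted
    char16_t text[6];          // "HELLO" plus one 16-bit null terminator
};

// 5 UTF-16 code units * 2 bytes = 10, the same 0xA used in the "ugly" literal.
static const HelloLayout hello_layout = {
    static_cast<std::uint32_t>(5 * sizeof(char16_t)), u"HELLO"
};

// The length prefix occupies the 4 bytes immediately before the text.
static_assert(offsetof(HelloLayout, text) == 4, "unexpected padding");

// The pointer a caller would pass: it points at the text, not at the prefix.
inline const char16_t* hello_text() { return hello_layout.text; }
```

The point of the sketch is only that the pointer refers to the first character while the length lives just before it, which is why the hand-written literal in the first comment needs the "+ 2".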
R
4

To follow up on @MSalters's answer, a custom user-defined literal could look something like this:

CComBSTR operator "" _bstr (const char* str, std::size_t len)
{
    return CComBSTR(len, str);
}

Then you can do this (as CComBSTR has a BSTR conversion operator defined):

iFoo->function( "HELLO"_bstr );

You can even overload the operator for multiple input string literal types:

CComBSTR operator "" _bstr (const wchar_t* str, std::size_t len)
{
    return CComBSTR(len, str);
}

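CComBSTR operator "" _bstr (const char16_t* str, std::size_t len)
{
    // wchar_t is 16 bits wide on Windows, so the data can be reinterpreted as-is; keep the const
    return CComBSTR(len, reinterpret_cast<const wchar_t*>(str));
}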

iFoo->function( L"HELLO"_bstr ); // calls wchar_t* version with UTF-16 encoded data

iFoo->function( u"HELLO"_bstr ); // calls char16_t* version with UTF-16 encoded data

iFoo->function( u8"HELLO"_bstr ); // calls char* version with UTF-8 encoded data...

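Note the last case. The operator cannot tell whether it is being passed ANSI or UTF-8 data, and CComBSTR assumes ANSI when passed char* data, so you should use a different literal suffix for UTF-8 input and convert it explicitly. (This relies on u8 literals having type const char*, which holds before C++20; from C++20 on they are const char8_t* and need their own overload.) For example: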

CComBSTR operator "" _utf8bstr (const char* str, std::size_t len)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    std::wstring wstr = conv.from_bytes(std::string(str, len));
    return CComBSTR(wstr.length(), wstr.c_str());
}

iFoo->function( u8"HELLO"_utf8bstr );
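
As a side note on the conversion step: std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17, and on Windows MultiByteToWideChar with CP_UTF8 is a common replacement. The sketch below is a portable, hand-written stand-in meant only to show why the operator cannot reuse the incoming byte length: the UTF-16 length differs from the UTF-8 length. The names utf8_to_utf16 and _utf16 are hypothetical, and the decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units.
// Assumption: the input is well-formed; malformed sequences are not detected.
std::u16string utf8_to_utf16(const char* s, std::size_t len) {
    std::u16string out;
    std::size_t i = 0;
    while (i < len) {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (std::size_t k = 0; k < extra && i < len; ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical suffix; a Windows version would wrap the result in a CComBSTR.
std::u16string operator""_utf16(const char* s, std::size_t len) {
    return utf8_to_utf16(s, len);
}
```

For instance, a six-byte UTF-8 string containing one two-byte character yields five UTF-16 code units, so the length handed to the BSTR constructor has to come from the converted string, as the _utf8bstr example does with wstr.length().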
Rigorous answered 13/1, 2015 at 19:16 Comment(3)
This actually looks like a good case for user-defined literals. – Lux
One must be careful not to write BSTR b = "hello"_bstr; – Paralogism
Yup, you would have to use CComBSTR b = "hello"_bstr; or better auto b = "hello"_bstr; instead. – Rigorous
L
4

User-defined literals would be the way to go:

"HELLO"_bstr calls BSTR operator "" _bstr ( const char*, std::size_t ), which can then call SysAllocString()

User-defined literals are new in VS14 (Visual Studio 2015).

[edit]

Based on the comments, it might be better to return a _bstr_t or other class which takes ownership of the SysAllocString() result and implicitly converts to BSTR. This temporary will be destroyed at the end of the full expression, and therefore after iFoo->function( "HELLO"_bstr ); returns.
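
The lifetime rule can be checked with a small portable sketch. Everything here is a stand-in: OwnedWide plays the role of _bstr_t or CComBSTR, new[]/delete[] replace SysAllocStringLen/SysFreeString, callee replaces iFoo->function, and the _owned suffix and the counters are hypothetical names used only to observe when the string is released.

```cpp
#include <cstddef>

// Counts live strings so the lifetime can be observed without <windows.h>.
static int live_strings = 0;
static int live_during_call = -1;

// Hypothetical owning wrapper standing in for _bstr_t / CComBSTR.
class OwnedWide {
public:
    OwnedWide(const char16_t* s, std::size_t n) : p_(new char16_t[n + 1]) {
        for (std::size_t i = 0; i < n; ++i) p_[i] = s[i];
        p_[n] = u'\0';
        ++live_strings;                            // real code: SysAllocStringLen
    }
    OwnedWide(OwnedWide&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    OwnedWide(const OwnedWide&) = delete;
    OwnedWide& operator=(const OwnedWide&) = delete;
    ~OwnedWide() {
        if (p_) { delete[] p_; --live_strings; }   // real code: SysFreeString
    }
    // Implicit conversion, mirroring the BSTR conversion of the real classes.
    operator const char16_t*() const { return p_; }
private:
    char16_t* p_;
};

OwnedWide operator""_owned(const char16_t* s, std::size_t n) {
    return OwnedWide(s, n);
}

// Stand-in for iFoo->function(): records how many strings are alive while it runs.
std::size_t callee(const char16_t* s) {
    live_during_call = live_strings;
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

std::size_t call_with_literal() {
    // The temporary is destroyed at the end of this full expression,
    // which is after callee() has returned.
    return callee(u"HELLO"_owned);
}
```

The same rule is what makes storing the converted raw pointer unsafe: the temporary is gone at the end of the statement that created it, so the wrapper object itself has to be kept if the string must outlive the call.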

Lux answered 13/1, 2015 at 11:12 Comment(9)
That solves the "ugly syntax" problem, although it causes a memory leak if used like iFoo->function( "hello"_bstr );. – Paralogism
@MattMcNabb: Isn't the destructor called when iFoo->function returns? – Variola
@MattMcNabb: I'd have to check the exact rules. It's fixable by returning a temporary which implicitly converts to BSTR but frees the string afterwards. But I thought that passing a BSTR implies passing ownership. – Lux
@TonyK: A BSTR is just a pointer, there is no destructor to call if the operator returns a BSTR directly. You would have to make the operator return an instance of a class instead, and that class could have a BSTR conversion operator defined. – Rigorous
@MSalters: simply passing a BSTR does not imply passing ownership. It depends on the contract of the parameter. If the parameter is marked as [out] or [out,retval], the function allocates the BSTR and passes ownership to the caller. If the parameter is marked as [in,out], the function is allowed to reallocate the source BSTR, and the caller maintains ownership of whatever the BSTR is set to upon exit. If the parameter is marked as [in], ownership is not changed at all. – Rigorous
@RemyLebeau: Yikes, that explains why I didn't remember the exact rules. And it also looks like typical C: type-safe in name only, one type with multiple different behaviors. [out] is de facto not part of the type system. I don't think it ever makes sense to pass a "BSTR literal" to an [out] argument, yet the compiler has no way to prevent it. – Lux
@MSalters: a parameter marked with [out] must be passed by address, not by value. So you would not be able to pass a BSTR to an [out] parameter because it is expecting a BSTR* instead. So a custom operator "" _bstr that returns a BSTR (directly or otherwise) would only be usable with [in] parameters anyway. – Rigorous
@RemyLebeau: Ok, then my idea of a temporary object with implicit conversion to BSTR and destruction on exit is indeed safe. – Lux
@MSalters: yes, it is. I have added an example to demonstrate it. – Rigorous
