Why is std::codecvt only used by file I/O streams?
I've been implementing a codecvt facet to handle indentation of output streams. It can be used like this and works fine:

std::cout << indenter::push << "I'm indented" << indenter::pop << "\n I'm not...";

However, while I can imbue a std::codecvt into any std::ostream, I was very confused to find that my code works with std::cout and std::ofstream but not with, for example, std::ostringstream, even though all of them inherit from the base class std::ostream.

The facet is constructed normally, the code compiles, it doesn't throw any exceptions... It's just that none of the member functions of the std::codecvt are called.

For me that is very confusing, and I had to spend a lot of time figuring out that std::codecvt does nothing on streams that are not file I/O streams.

Is there any reason std::codecvt is not used by all classes derived from std::ostream?

Furthermore, does anyone have an idea which facilities I could fall back on to implement the indenter?

Edit: this is the part of the language I'm referring to:

All file I/O operations performed through std::basic_fstream use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

Source: https://en.cppreference.com/w/cpp/locale/codecvt


Update 1:

I've made a small example illustrating my problem:

#include <iostream>
#include <locale>
#include <fstream>
#include <sstream>

static auto invocation_counter = 0u;

struct custom_facet : std::codecvt<char, char, std::mbstate_t>
{
  using parent_t = std::codecvt<char, char, std::mbstate_t>;

  custom_facet() : parent_t(std::size_t { 0u }) {}

  using parent_t::intern_type;
  using parent_t::extern_type;
  using parent_t::state_type;

  virtual std::codecvt_base::result do_out (state_type& state, const intern_type* from, const intern_type* from_end, const intern_type*& from_next,
                                                               extern_type* to, extern_type* to_end, extern_type*& to_next) const override
  {
    while (from < from_end && to < to_end)
    {
      *to = *from;

      to++;
      from++;
    }

    invocation_counter++;

    from_next = from;
    to_next = to;

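    // Characters were copied into the output range, so report ok (or partial
    // if the output range filled up first) rather than noconv.
    return from == from_end ? std::codecvt_base::ok : std::codecvt_base::partial;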
  }

  virtual bool do_always_noconv() const noexcept override
  {
    return false;
  }
};

std::ostream& imbueFacet (std::ostream& ostream)
{
  ostream.imbue(std::locale { ostream.getloc(), new custom_facet{} });

  return ostream;
}

int main()
{
  std::ios::sync_with_stdio(false);

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
    auto ofstream = std::ofstream { "testFile.txt" };

    ofstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";

  {
     auto osstream = std::ostringstream {};

     osstream << imbueFacet << "test\n";
  }

  std::cout << "invocation_counter = " << invocation_counter << "\n";
}

I would expect invocation_counter to increase after streaming into the std::ostringstream, but it doesn't.


Update 2:

After more research I found out that I could use std::wbuffer_convert. To quote https://en.cppreference.com/w/cpp/locale/wbuffer_convert:

std::wbuffer_convert is a wrapper over stream buffer of type std::basic_streambuf<char> which gives it the appearance of std::basic_streambuf<Elem>. All I/O performed through std::wbuffer_convert undergoes character conversion as defined by the facet Codecvt. [...]

This class template makes the implicit character conversion functionality of std::basic_filebuf available for any std::basic_streambuf.

This way I can apply a facet to a std::ostringstream:

auto osstream = std::ostringstream {};

osstream << "test\n";
  
auto facet = custom_facet{};
  
std::wstring_convert<custom_facet, char> conv;
  
auto str = conv.to_bytes(osstream.str());

However, I lose the ability to chain facets using the streaming operator <<.

This confuses me even more: why is std::codecvt not implicitly used by ALL output streams? All output streams write through a std::basic_streambuf, whose interface seems well suited to std::codecvt, since a codecvt only needs an input and an output character sequence, both of which std::basic_streambuf already provides.

So why is the std::codecvt handling implemented in std::basic_filebuf instead of std::basic_streambuf? std::basic_filebuf inherits from std::basic_streambuf after all...

Either I have some fundamental misunderstanding on how streams work in C++ or std::codecvt is poorly integrated in the standard. Maybe this is why it is marked as deprecated?

Teenyweeny answered 23/11, 2020 at 23:3 Comment(3)
I don't know about the facet shenanigans, but maybe just use std::format and forget about iostreams altogether? - Etam
@PasserBy I've thought about using std::format, but the advantage of streams is that the code can use any stream it gets. I'm using the indenter in my JSON serializer, which can write to any output stream through a reference to an object derived from std::ostream. That way I can serialize into a std::ofstream, a std::ostringstream, or std::cout. Using std::format I would lose this flexibility, as the serializer is recursive, calling each of the object's members to serialize. - Teenyweeny
C++98's std::codecvt is not deprecated; only the C++11 Unicode conversion facets derived from it are. - Pr

The std::codecvt facet was originally intended to handle I/O conversions between the on-disk and in-memory character representations. Quoting section 39.4.6 of Bjarne Stroustrup's The C++ Programming Language (fourth edition):

Sometimes, the representation of characters stored in a file differs from the desired representation of those same characters in main memory. ... the codecvt facet provides a mechanism for converting characters from one representation to another as they are read or written.

The intended purpose was thus to use std::codecvt only for adapting characters between file (disk) and memory, which partly answers your question:

Why is std::codecvt only used by file I/O streams?

From the docs we see that:

All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

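This explains why std::ofstream (which uses a file-based stream buffer) and std::cout (which in your test is linked to the standard output FILE stream) invoke std::codecvt. Note that the standard only specifies this behaviour for file streams; whether std::cout consults the facet depends on how the implementation backs it.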

Now, to use the high-level std::ostream interface you need to provide an underlying streambuf. std::ofstream provides a filebuf, and std::ostringstream provides a stringbuf (which makes no use of std::codecvt). See this post on streams, which also highlights the following:

...in the case of ofstream, there are also a few extra functions which forward to additional functions in the filebuf interface
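
A minimal sketch of this distinction, using only the standard library: the helper buffer_kind below is a hypothetical name introduced for illustration. It inspects the stream buffer behind a std::ostream& with dynamic_cast; of the buffer types it recognises, only std::filebuf is specified to consult the imbued std::codecvt.

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of any library): reports which kind of
// stream buffer sits behind a std::ostream. Only std::basic_filebuf is
// specified to route its output through the imbued std::codecvt.
std::string buffer_kind(std::ostream& os)
{
  std::streambuf* buf = os.rdbuf();

  if (dynamic_cast<std::filebuf*>(buf) != nullptr)
    return "filebuf";
  if (dynamic_cast<std::stringbuf*>(buf) != nullptr)
    return "stringbuf";
  return "other";
}
```

Passing a std::ofstream yields "filebuf" and a std::ostringstream yields "stringbuf"; what std::cout reports depends on the standard library implementation.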

However, to invoke the character conversion of a std::codecvt on a std::ostringstream, whose underlying buffer is a std::stringbuf rather than a std::filebuf, you can use std::wbuffer_convert, as you indicated in your post.

Note that your second update only uses std::wstring_convert, not std::wbuffer_convert. Both have been deprecated since C++17.

With std::wbuffer_convert you can wrap the stream buffer of the original std::ostringstream and attach the wrapper to a new std::ostream as follows:

// Create a std::ostringstream
auto osstream = std::ostringstream{};

// Create the wrapper for the ostringstream
std::wbuffer_convert<custom_facet, char> wrapper(osstream.rdbuf());

// Now create a std::ostream which uses the wrapper to send data to
// the original std::ostringstream
std::ostream normal_ostream(&wrapper);
normal_ostream << "test\n";

// Flush the stream to invoke the conversion
normal_ostream << std::flush;

// Check the invocation_counter
std::cout << "invocation_counter after wrapping std::ostringstream with "
                "std::wbuffer_convert = "
            << invocation_counter << "\n";

Together with the complete example here, the output would be:

invocation_counter start of test1 = 0
invocation_counter after std::ofstream = 1
> test printed to std::cout
invocation_counter after std::cout = 2
invocation_counter after std::ostringstream (should not have changed)= 2
ic after test1 = 2
invocation_counter after std::ostringstream with std::wstring_convert = 3
ic after test2 = 3
invocation_counter after wrapping std::ostringstream with std::wbuffer_convert = 4
ic after test3 = 4

Conclusion

std::codecvt was intended for converting between the on-disk and in-memory representations. That is why it is only invoked by streams backed by a filebuf, such as std::ofstream and, in the test above, std::cout. However, the stringbuf of a std::ostringstream can be wrapped in a std::wbuffer_convert and attached to a separate std::ostream instance, which then invokes the std::codecvt.
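
For the indenter itself, one alternative that does not rely on std::codecvt at all is a filtering stream buffer that sits in front of any other std::streambuf. The following is only a sketch under assumptions: the names indenting_buf, push, pop and indent_demo are hypothetical and are not taken from the indenter in the question, and indentation is controlled through member functions rather than stream manipulators.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering stream buffer: forwards every character to another
// buffer and writes the current indentation before the first character of
// each non-empty line.
class indenting_buf : public std::streambuf
{
public:
  explicit indenting_buf(std::streambuf* dest, std::size_t width = 2)
    : dest_(dest), width_(width) {}

  void push() { ++level_; }
  void pop() { if (level_ > 0) --level_; }

protected:
  // No put area is set up, so every character written arrives here.
  int_type overflow(int_type ch) override
  {
    if (traits_type::eq_int_type(ch, traits_type::eof()))
      return traits_type::not_eof(ch);

    const char c = traits_type::to_char_type(ch);
    if (at_line_start_ && c != '\n')
    {
      const std::string pad(level_ * width_, ' ');
      const auto n = static_cast<std::streamsize>(pad.size());
      if (dest_->sputn(pad.data(), n) != n)
        return traits_type::eof();
    }
    at_line_start_ = (c == '\n');
    return dest_->sputc(c);
  }

  // Flushing the filter flushes the wrapped buffer.
  int sync() override { return dest_->pubsync(); }

private:
  std::streambuf* dest_;
  std::size_t width_;
  std::size_t level_ = 0;
  bool at_line_start_ = true;
};

// Usage sketch: writes through the filter into a std::ostringstream.
std::string indent_demo()
{
  std::ostringstream target;
  indenting_buf filter(target.rdbuf());
  std::ostream out(&filter);

  out << "root\n";
  filter.push();
  out << "child\n";
  filter.pop();
  out << "root again\n";
  return target.str();
}
```

Because the filter only depends on std::streambuf, the same wrapper can be placed in front of the buffer of a std::ofstream, a std::ostringstream, or std::cout.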

Findley answered 13/10, 2021 at 12:3 Comment(0)
