C++ reading unsigned char from file stream

Asked 2/3, 2009 at 23:0 Answered 19/11, 2022 at 13:13

I want to read unsigned bytes from a binary file. So I wrote the following code.

#include <iostream>
#include <fstream>
#include <vector>
#include <istream>
#include <string>

int main()
{
    std::string filename("file");
    size_t bytesAvailable = 128;
    size_t toRead = 128;

    std::basic_ifstream<unsigned char> inf(filename.c_str(), std::ios_base::in | std::ios_base::binary);
    if (inf.good())
    {
        std::vector<unsigned char> mDataBuffer;
        mDataBuffer.resize(bytesAvailable);
        inf.read(&mDataBuffer[0], toRead);
        size_t counted = inf.gcount();
        std::cout << counted << std::endl; // prints 0 on Linux
    }
    return 0;
}

This always reads 0 bytes, as the value of counted shows.

Some references on the web say that I need to set the locale to make this work, but it is not clear to me exactly how to do that.

The same code works when I use char instead of unsigned char.

The above code using unsigned char seems to work on Windows, but fails when run on coLinux (Fedora, kernel 2.6.22.18).

What do I need to do to make it work on Linux?

Polyzoic asked 2/3, 2009 at 23:0

Not an answer to the question, but related. Remember that the definition of the string class in C++ is typedef basic_string<char> string;, so you can always make an unsigned char string class a la typedef basic_string<unsigned char> bytestring;. – Floeter 2/3, 2009 at 23:47

true, but I want to read a BINARY file – Polyzoic 3/3, 2009 at 6:29

.read() and .write() can be used for binary or text data; the stream operators << and >> are for formatted text only. All data on a computer is ultimately binary; what matters is how you choose to interpret it. – Hawser 3/3, 2009 at 15:35

If you want "binary" use uint8_t... just ignore that it's a typedef alias for unsigned char. – Floeter 3/3, 2009 at 16:35

This problem has been solved here: https://mcmap.net/q/582369/-why-does-this-specialized-char_traits-lt-uint8_t-gt-and-codecvt-lt-uint8_t-gt-for-use-with-the-basic_ifstream-template-throw-std-bad_cast/331024 It has a full implementation of char_traits<uint8_t> and codecvt<uint8_t,char,...> – Chandless 10/10, 2013 at 22:28

The C++ standard (as of C++03) only requires the implementation to provide explicit specializations of the character traits for two character types (C++11 adds char16_t and char32_t, and C++20 adds char8_t):

std::char_traits<char>
std::char_traits<wchar_t>

The streams and strings use those traits to determine a variety of things, such as the EOF value, how to compare a range of characters, and how to widen a character to an int.

If you instantiate a stream like

std::basic_ifstream<unsigned char>

you have to make sure that there is a corresponding character-traits specialization that the stream can use, and that this specialization does something useful. In addition, streams use facets (such as codecvt, num_get and num_put) to do the actual character conversion and the formatting and parsing of numbers, and you have to provide specializations of those manually as well. The standard doesn't even require the implementation to have a complete definition of the primary template, so you could just as well get a compile error:

error: specialization std::char_traits could not be instantiated.

I would use ifstream instead (which is a basic_ifstream<char>) and then go and read into a vector<char>. When interpreting the data in the vector, you can still convert them to unsigned char later.

Carducci answered 2/3, 2009 at 23:31

I did not get a compiler error, no hints in documentation, nothing, but silent failure and a wasted day. Thank you Bjarne Stroustrup and Dennis Ritchie. – Jeanmariejeanna 26/7, 2013 at 6:15

Don't use basic_ifstream<unsigned char>, as it requires specializations that the library may not provide.

Using a fixed-size array:

linux ~ $ cat test_read.cpp
#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        const size_t bytesAvailable = 128; // must be a constant: standard C++ has no variable-length arrays

        ifstream inf( filename.c_str(), ios::in | ios::binary );
        if( inf )
        {
                unsigned char mDataBuffer[ bytesAvailable ];
                inf.read( (char*)( &mDataBuffer[0] ), bytesAvailable ) ;
                size_t counted = inf.gcount();
                cout << counted << endl;
        }

        return 0;
}
linux ~ $ g++ test_read.cpp
linux ~ $ echo "123456" > file
linux ~ $ ./a.out
7

Using a vector:

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str(), ios::in | ios::binary );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
                mDataBuffer.resize( bytesAvailable ) ;

                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                mDataBuffer.resize( counted ) ;
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=128
7 size=7

Using reserve instead of resize in the first call (a mistake, shown for comparison):

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str(), ios::in | ios::binary );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
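                mDataBuffer.reserve( bytesAvailable ) ; // allocates capacity only: size() stays 0

                // Undefined behavior: &mDataBuffer[0] indexes an empty vector
                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                // resize() value-initializes the new elements, zeroing the bytes read above
                mDataBuffer.resize( counted ) ;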
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=0
7 size=7

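As you can see, without the call to .resize( counted ), the size of the vector will be wrong, so keep that in mind. The reserve() version is also broken in a subtler way: reserve() only allocates capacity, so &mDataBuffer[0] indexes an empty vector (undefined behavior), and even if the read appears to work, the later resize( counted ) value-initializes the new elements and overwrites the bytes that were read with zeros. Resize before reading, then shrink with resize( counted ), as in the vector version. Casting the buffer pointer to char* for read() is common practice; see the cppreference page for std::basic_istream::read.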

Hawser answered 2/3, 2009 at 23:6

This is reading signed chars. I know this works. I specifically want to read unsigned chars – Polyzoic 2/3, 2009 at 23:15

just change the char[] to unsigned char[]. – Hawser 2/3, 2009 at 23:17

Is it possible to do so without casting? – Polyzoic 2/3, 2009 at 23:20

Why don't you want to cast? You could use the C++ reinterpret_cast<char*>( &mDataBuffer[0] ). – Hawser 2/3, 2009 at 23:23

@David: There is no difference between signed and unsigned chars on disk (or, for that matter, between 4 chars and an int!). – Irreformable 2/3, 2009 at 23:26

@Simon: ints have endianess issues ;-) – Floeter 2/3, 2009 at 23:37

There is nothing wrong with using a vector. It's probably overkill in this case, since it's a fixed size, but it will work correctly. – Clamworm 3/3, 2009 at 5:22

@KeithB: True enough, there are more caveats, such as thinking the size is correct without the resize, but I've added the 2 new versions to demonstrate. – Hawser 3/3, 2009 at 15:55

If you're on Windows you can directly use:

using ufstream = std::basic_fstream<unsigned char, std::char_traits<unsigned char>>;
ufstream file;

On Linux there is no such luck, because the library does not provide the unsigned char facets (such as codecvt<unsigned char, char, mbstate_t>), so follow @Johannes' approach.

Afore answered 19/11, 2022 at 13:13


A much easier way:

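#include <fstream>
#include <iostream>
#include <vector>

using namespace std;


int main()
{
    vector<unsigned char> bytes;
    ifstream file1("main1.cpp", ios_base::in | ios_base::binary);
    // get() returns an int_type, so EOF can be told apart from a 0xFF byte
    ifstream::int_type ch;
    while ((ch = file1.get()) != ifstream::traits_type::eof())
    {
        bytes.push_back(static_cast<unsigned char>(ch));
    }
    size_t size = bytes.size();
    cout << size << endl;
    return 0;
}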

Rhianna answered 3/3, 2009 at 0:36

That is very inefficient. Try running benchmarks with 1 GB files; the per-call overhead will make a big difference. – Hawser 3/3, 2009 at 4:36

@David: It makes no difference in the file. 0xFF is 255 when stored in an unsigned char, or -1 when stored in a signed char, which is why the cast is not a bad thing. For multi-byte values, the only difference would arise if the endianness differs. – Hawser 3/3, 2009 at 15:37

@David: Endianness is usually only a problem when switching between architectures, e.g. PowerPC vs. x86. – Hawser 3/3, 2009 at 15:58
