How to use memset while handling strings in C++? [closed]

I come from a Python background and have recently started learning C++. I was studying the C/C++ function memset, following the online example at https://www.geeksforgeeks.org/memset-in-cpp/, and ran into some errors:

/**
 * @author      : Bhishan Poudel
 * @file        : a02_memset_geeks.cpp
 * @created     : Wednesday Jun 05, 2019 11:07:03 EDT
 * 
 * Ref: 
 */

#include <iostream>
#include <vector>
#include <cstring>

using namespace std;

int main(int argc, char *argv[]){
    char str[] = "geeksforgeeks";

    //memset(str, "t", sizeof(str));
    memset(str, 't', sizeof(str));

    cout << str << endl;

    return 0;
}

Error when using single quotes 't'
This prints extra characters.

tttttttttttttt!R@`

Error when using "t" with double quotes

$ g++ -std=c++11 a02_memset_geeks.cpp 
a02_memset_geeks.cpp:17:5: error: no matching function for call to 'memset'
    memset(str, "t", sizeof(str));
    ^~~~~~
/usr/include/string.h:74:7: note: candidate function not viable: no known
      conversion from 'const char [2]' to 'int' for 2nd argument
void    *memset(void *, int, size_t);
         ^
1 error generated.

How do I use memset in C++?

Further Study
An excellent tutorial on the shortcomings of memset is available here: https://web.archive.org/web/20170702122030/https:/augias.org/paercebal/tech_doc/doc.en/cp.memset_is_evil.html

Liriodendron answered 5/6, 2019 at 15:13 Comment(20)
"t" and 't' are not the same. – Barouche
Most online learning resources for C++ are crap, and AFAIK that site is no exception. Give this a try instead: #388742 – Ideograph
I used single quote 't' but still get extra characters in the output. – Liriodendron
It is no longer a properly zero-terminated C string after you do this. You lost the 0. Consider passing sizeof(str)-1 instead. – Promycelium
@HansPassant Then how do I use it? Is it deprecated and not used nowadays? – Liriodendron
Why even use memset in C++? The reason old C functions exist is backwards compatibility. – Antisepticize
It is a loaded gun, you aimed it at your left foot and pulled the trigger. You have to aim right. – Promycelium
This is quite relevant: https://mcmap.net/q/501154/-difference-between-sizeof-and-strlen-in-c-duplicate – Ideograph
Also, use std::string instead of char[]. – Antisepticize
@HansPassant So he should shoot his right foot then? ;) – Herb
@dbush: Or possibly worse... – Lucilius
You should not change the question underneath people who are answering it. If you take a comment or answer on board and it is still not working, you can ask another question, but this sort of editing, which replaces one question with another, is destructive. – Barouche
Don't use std::memset; use std::fill. It's just as fast and safer. – Infracostal
Questions referring to given answers or containing an edit with something that belongs in an answer always look a bit odd IMHO. You can answer your own question, though. – Ideograph
If you look closely at the page you cited and count a bit, you'll notice that "geeksforgeeks" has 13 characters, and that row of t's that represents the output has 14. So the example code produces extra output, too. As you can see from the answers, that's not unexpected -- the code is simply wrong. – Soniferous
You really ought to avoid using namespace std - it is a bad habit to get into, and can silently change the meaning of your program when you're not expecting it. Get used to using the namespace prefix (std is intentionally very short), or importing just the names you need into the smallest reasonable scope. Is it really so hard to write std::memset? – Agraffe
@HansPassant Is shooting at the right foot any better? =P – Ionium
@Barouche I am not sure why this question was put on hold: it has a complete MWE, asks the question correctly, and does not contain vulgar words or bad comments. – Liriodendron
@astro123 The original question had "t" in double quotes, and I voted to close it as a typo (clearly memset doesn't accept pointers as its second argument). And then you edited the question, completely changing its meaning. Now it became a valid, answerable question, but in doing so you invalidated previous answers, which is actually not that great. – Barouche
IMO the only correct answer to this question is "You don't" - at least not while you are learning C++. Maybe when you are an expert. And even then, probably not. – Nikaniki
V
69

This declaration

char str[] = "geeksforgeeks";

declares a character array initialized from a string literal, that is, a sequence of characters followed by the terminating zero character '\0'.

The declaration is equivalent to

char str[] = 
{ 
    'g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's', '\0'
};

This call of the function memset

memset(str, 't', sizeof(str));

overwrites every character of the array, including the terminating zero.

So the next statement

cout << str << endl;

has undefined behavior: operator<< keeps reading characters until it finds a terminating zero, and since the array no longer contains one, it reads past the end of the array.

You could write instead

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', sizeof( str ) - 1 );
    
    std::cout << str << '\n';
}

Or the following way

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', std::strlen( str ) );
    
    std::cout << str << '\n';
}

Both versions leave the terminating zero in the array unchanged.

If you do want to overwrite all characters of the array, including the terminating zero, then replace this statement

std::cout << str << '\n';

with this one

std::cout.write( str, sizeof( str ) ) << '\n';

as shown in the program below, because the array no longer contains a string.

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', sizeof( str ) );
    
    std::cout.write( str, sizeof( str ) ) << '\n';
}

As for this call

memset(str, "t", sizeof(str));

here the type of the second argument (const char[2], which decays to const char *) does not match the type of the second function parameter, which is int, and there is no implicit conversion between them. See the declaration of the function

void * memset ( void * ptr, int value, size_t num );

Thus the compiler issues an error message.
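The type mismatch can be checked at compile time. The sketch below uses only standard type traits and is meant as an illustration of the types involved, not as something you would put in a real program.

```cpp
#include <type_traits>

// 't' is a character literal of type char; char converts implicitly to int,
// which is why memset(str, 't', n) compiles.
static_assert(std::is_same<decltype('t'), char>::value, "'t' is a char");
static_assert(std::is_convertible<char, int>::value, "char converts to int");

// "t" is a string literal: an array of two const chars, 't' and '\0'.
static_assert(std::is_same<decltype("t"), const char (&)[2]>::value,
              "\"t\" is an lvalue of type const char[2]");

// After array-to-pointer decay it becomes a const char*, which has no
// implicit conversion to int, so memset(str, "t", n) cannot compile.
static_assert(!std::is_convertible<const char*, int>::value,
              "const char* does not convert to int");
```

If all four assertions hold, the translation unit compiles silently, which matches the compiler message quoted in the question.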

Apart from character arrays (which are still used very often, even in C++), you can also use the standard class std::string (a specialization of std::basic_string), which stores and manages the characters for you.

In this case there is no need to use the standard C function memset to fill a string with a single character. The simplest way to do this is the following

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.assign( s.length(), 't' );
    
    std::cout << s << '\n';
}

Another way is to use the standard algorithm std::fill or std::fill_n declared in the header <algorithm>. For example

#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>

int main()
{
    std::string s( "geeksforgeeks" );
    
    std::fill( std::begin( s ), std::end( s ), 't' );
    
    std::cout << s << '\n';
}

or

#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>

int main()
{
    std::string s( "geeksforgeeks" );
    
    std::fill_n( std::begin( s ), s.length(), 't' );
    
    std::cout << s << '\n';
}

You can even use the member function replace of the class std::string, in one of the following ways

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.replace( 0, s.length(), s.length(), 't' );
    
    std::cout << s << '\n';
}

Or

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.replace( std::begin( s ), std::end( s ), s.length(), 't' );
    
    std::cout << s << '\n';
}
Vogeley answered 5/6, 2019 at 15:27 Comment(7)
The original post clearly indicates that the user is trying to learn C++. Please mention at least that none of this is relevant if you use std::string, which should be used here instead of this complicated C stuff. (It might be relevant to know, though not at the beginning of a course.) – Hellenism
@Hellenism The original post clearly indicates that the user is trying to know how to use memset with character arrays. :) – Vogeley
Good answer. If you want it to be better for the OP: note the difference in the type system. C++ has a static type system, where variables have a fixed static type. Python has a fully dynamic type system, where values have a type and variables do not. This is probably the source of his confusion involving 't' and "t". – Bradleybradly
What do you mean by "simulates strings"? – Resiniferous
@Ray You are mistaken. For starters, a correct declaration would look like const char *str = "geeksforgeeks"; And in any case string literals in C and C++ are immutable. Any attempt to change a string literal results in undefined behaviour. – Vogeley
@VladfromMoscow You're right, of course. I wasn't paying close enough attention to exactly what was being done after the strlen call. I revise my suggestion to: You might want to also explain that if the string were declared as const char *str = "geeksforgeeks";, sizeof will no longer report the length of the string, but rather the size of the pointer. (Even if declaring it as a pointer to string literal in this particular example leads to further problems, I've seen enough people make the mistake of doing sizeof of a pointer to string that I think it's worth covering why that doesn't work.) – Farny
@Farny Thanks. But that would make the answer too broad for a simple question. :) – Vogeley
L
31

Error when using single quotes 't' This prints extra characters.

That's because you overwrote the null terminator.

The terminator is part of the array's size (an array is not magic), though it's not part of the logical string size.

So, I think you meant:

memset(str, 't', strlen(str));
//               ^^^^^^
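The difference between array size and logical string size matters even more once the string is only reachable through a pointer, a pitfall also raised in the comments on another answer. The sketch below compares the three quantities. The helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Size of the array object, terminator included. This only compiles for
// real arrays; passing a pointer is rejected at compile time.
template <std::size_t N>
constexpr std::size_t array_size(const char (&)[N]) { return N; }

// What sizeof yields once the string is only reachable through a pointer:
// the size of the pointer itself, unrelated to the text it points to.
inline std::size_t size_via_pointer(const char* p) { return sizeof(p); }

// Logical string length: the number of characters before the first '\0'.
inline std::size_t logical_length(const char* p) { return std::strlen(p); }
```

For char str[] = "geeksforgeeks"; array_size(str) is 14 and logical_length(str) is 13, while size_via_pointer(str) is simply sizeof(char*), typically 8 on a 64-bit platform.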

Error when using "t" with double quotes

Completely different thing. You told the computer to set every character in the string to a string. That doesn't make sense, so it won't compile.


How to use memset in C++?

Don't.

Either use the type-safe std::fill, in combination with std::begin and std::end:

std::fill(std::begin(str), std::end(str)-1, 't');

(If you're worried about performance, don't be: this will just delegate to memset where possible via template specialisation, optimisation not required, without sacrificing type-safety; example here in libstdc++.)

Or just use a std::string to begin with. 😊
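Both suggestions can be sketched as a pair of small overloads. The name fill_chars is hypothetical, and the array version assumes the last element of the array holds the terminating '\0'.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: type-safe counterpart of
// memset(str, c, sizeof(str) - 1). std::end(buf) - 1 points at the last
// element, assumed to be the '\0', so the fill stops just before it.
template <std::size_t N>
void fill_chars(char (&buf)[N], char c)
{
    static_assert(N > 0, "the array must hold at least the terminator");
    std::fill(std::begin(buf), std::end(buf) - 1, c);
}

// Hypothetical helper: the std::string version needs no terminator
// bookkeeping, because the string tracks its own length.
inline void fill_chars(std::string& s, char c)
{
    std::fill(s.begin(), s.end(), c);
}
```

Because the array overload takes a reference to an array, passing a plain char* does not compile, which rules out the sizeof-of-a-pointer mistake by construction.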


I was learning the function memset in C++ from https://www.geeksforgeeks.org/memset-in-cpp/ where the example is given as below

Don't attempt to learn C++ from random websites. Get yourself a good book instead.

Lucilius answered 5/6, 2019 at 15:23 Comment(13)
Unfortunately it really is sizeof in the original example. A pity that such code is used to "teach" C++ :( – Ideograph
I am learning C++ online from geeksforgeeks.org/memset-in-cpp. The example is taken from there, and no warnings were given there. Thanks for the usage info. – Liriodendron
Updated to address both comments. – Lucilius
Also, I come from a Python background, where single quotes and double quotes are the same, so I got another error as well. – Liriodendron
@astro123 Another reason to work from a good book instead. There are different kinds of literals in C++, which is completely different from Python. – Lucilius
This site (geeksforgeeks) should be forever banned. – Barouche
@astro123: "learning online from geeksforgeeks.org/memset-in-cpp": there's your first problem. That tutorial has a serious bug in its tiny example. This is not rare on geeksforgeeks.org. There is some good stuff, but it's often mixed in with bad stuff, and until you're already an expert you won't know how to tell the difference. Unlike Stack Overflow, geeksforgeeks doesn't have a voting mechanism for people to review posts and indicate their quality, so you have no way of knowing which ones to trust. – Fungal
@PeterCordes It's a shame SO Documentation went the way it did... there's clearly a demand for voted-on, curated tutorials. I'm sure somebody will figure out the right design, eventually. – Disillusion
Why "don't"? Isn't the implementation of memset often significantly faster? fill may not always be implemented to use an assembly directive (not even at -O4), while memset should always be using it if it's available. Also strlen should never be used. Just never. If you know the size of the string at compile time (and sizeof does) use it. If you don't know it at compile time, strlen is unsafe. – Fendig
@Fendig No, a mainstream implementation will delegate to memset via specialisation, i.e. when the template args suggest it's called for (optimisation level not required) - e.g. libstdc++. There's no need to try to "beat the toolchain", because it's better than us. On the other hand, you sacrifice your type safety and I have seen long-standing hidden bugs when someone's changed a type and not scanned for all its uses, one of which was a naughty memset on what had previously been a C array. – Lucilius
@LightnessRacesinOrbit You would expect templates to be specialized for char*, but I have seen compilers used in production code (I think it was Sun's) being 10x slower when using copy() instead of memcpy on vectors. Just because there are mechanisms in the language to handle this doesn't mean that the compiler you use handles it. And when the issue is of practical rather than theoretical importance, you have to use the tools you have rather than the ones you think you should have. – Fendig
@Fendig Such a blatantly substandard implementation should not be used in the first place. – Lucilius
@LightnessRacesinOrbit Again, when the issue is of practical rather than theoretical importance, you have to use the tools you have rather than the ones you think you should have. I see that you still have strlen in the answer, btw. That function should never be used. In any code.... ever. – Fendig
S
5

This is the signature of memset:

void* memset( void* dest, int ch, std::size_t count );

Converts the value ch to unsigned char and copies it into each of the first count characters of the object pointed to by dest. If the object is a potentially-overlapping subobject or is not TriviallyCopyable (e.g., scalar, C-compatible struct, or an array of trivially copyable type), the behaviour is undefined. If count is greater than the size of the object pointed to by dest, the behaviour is undefined.

(source)

For the first call, memset(str, 't', sizeof(str));, the code compiles, but it overwrites the terminating null character as well, so printing the array runs past its end: the 18 characters tttttttttttttt!R@` are the 14 t's followed by garbage. I suggest using sizeof(str) - 1 for a char array.

For the second call, memset(str, "t", sizeof(str));, you are passing a string as the second argument. This is why the compiler complains with error: invalid conversion from ‘const char*’ to ‘int’
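The quoted description says that the value is converted to unsigned char and then written byte by byte. Below is a minimal sketch of what that implies, assuming 8-bit bytes. The helper names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Fill a small buffer with `value` via memset and return the first byte,
// to show that only the low-order byte of `value` is stored
// (assuming 8-bit bytes).
inline unsigned char first_byte_after_memset(int value)
{
    unsigned char buf[4];
    std::memset(buf, value, sizeof(buf));
    return buf[0];
}

// memset works on bytes, not on elements: every byte of the integer is
// set to 1, so the result is 0x01010101 rather than 1.
inline std::uint32_t int_after_memset_with_one()
{
    std::uint32_t x = 0;
    std::memset(&x, 1, sizeof(x));
    return x;
}
```

Under that assumption, first_byte_after_memset(0x174) yields 0x74, because the high bits are discarded by the conversion to unsigned char. This byte-wise behaviour is why memset suits character buffers but is a common source of surprises when applied to arrays of wider types.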

Stylet answered 5/6, 2019 at 15:55 Comment(13)
potentially-overlapping subobject of what? It's not automatically UB to modify the object-representation of other objects in C++. For example, uint32_t has a fully defined object representation (except for the endian byte-order). So it's not clear what kind of overlap you're talking about, because memset only takes one pointer arg; the other args are by value. That phrasing makes sense for memcpy which forbids overlap, unlike memmove. – Fungal
@PeterCordes To be fair, that phrase was plagiarised from cppreference.com. So if it's wrong, cppreference.com needs to be corrected. – Lucilius
@LightnessRacesinOrbit: on cppref, that phrase is a hyperlink to a definition that makes sense. It's somewhat plausible for it to be UB if a memset might be modifying the bytes of another object as well (because the pointer is to a subobject of a struct that's declared with [[no_unique_address]], allowing a compiler to do whatever it wants, including create bitfields for narrow or bool types I guess). I'm less clear on the "base class subobject" part; possibly that's UB because it could overwrite a vtable pointer? – Fungal
@PeterCordes - it's talking about something like this. Here, base is trivially copyable, but it is not safe for memset (or memmove) because it is a potentially overlapping subobject. Note that sizeof(base) == 8, yet when it is used as a base of derived (which itself has a char member), sizeof(derived) == 8! So the members of derived are stored in the padding of base. Hence it is unsafe to overwrite an arbitrary base& with memset since you'd also clobber the derived member in this case. – Absorber
Note also how this is reflected in the code generation on gcc for zeroing base in b = base{}: it does a qword and byte write, because it can't safely extend that to a single qword write because the padding may be reused. Then see base2 and derived2: these are identical except that base2 is struct not class. Then it becomes an aggregate and I guess overlapping is banned (note how the b = base2{} codegen changes). – Absorber
@BeeOnRope: You mean dword + byte to zero base (you said qword twice). Interesting. The only difference between class and struct is that class defaults to private: while struct defaults to public:. It appears that putting derived members into the padding of the base depends only on visibility, and switches if you use those tags to have private members in the base2 struct and public members of the base class. godbolt.org/z/3VLeiS – Fungal
@Peter Yes, it's to do with visibility although I wasn't sure why. As above I thought it was keying off whether base was an Aggregate or not (basically the most POD-like thing C++ offers). It is not keying off of standard layout, that I checked. – Absorber
@BeeOnRope: I think it might be a design decision that could have gone either way. Or maybe not: according to itanium-cxx-abi.github.io/cxx-abi/abi.html#POD: "This ABI uses the definition of POD only to decide whether to allocate objects in the tail-padding of a base-class subobject. While the standards have broadened the definition of POD over time, they have also forbidden the programmer from directly reading or writing the underlying bytes of a base-class subobject with, say, memcpy." (x86-64 uses the same C++ ABI). With some stuff about how POD in C++ has evolved. – Fungal
@PeterCordes - right, well the design decision must be in the context of the platform ABI, not just at the compiler level, since everyone has to agree on this, right? Anyways, the only property that I found that wasn't contradicted by practice, regarding whether padding could be used by a derived class, was "aggregate". See here. base is POD, trivial, and standard layout, but it is still not safe. It is not aggregate, however. Of course, this is not a proof :). – Absorber
@BeeOnRope: Ah, I wasn't aware that "aggregate" had a specific technical meaning which included having no private / protected members. What are Aggregates and PODs and how/why are they special?. I haven't checked this, but I think from the C++ ABI's notes on "POD" that (some draft of) ISO C++ must say that you can step on the padding of an aggregate, but not necessarily in general for any POD / trivially-copyable type. So you can put derived members in that padding when the base is not an aggregate. That's what this C++ ABI chooses to agree on. – Fungal
@PeterCordes - yes, I just finished reading (skimming) that FAQ also :). I don't find the word aggregate in the Itanium ABI you linked. The ABI was written long ago, before a lot of the changes in the C++ standard mentioned in the FAQ, and before some of the terms even existed. In particular, finer distinctions were introduced in later standards that the ABI doc wouldn't know about. 1/x – Absorber
I didn't read the ABI, but based on searches I can't find language covering the case. It mentions "base class subobject" in the section you linked as one type of "potentially-overlapping subobject" (the other one being data members with no_unique_address), but all further references to "base class subobject" don't seem relevant (they are about vtables), and further refs to "p-o subobject" all seem to concern the data member case, not the base-class case. I'm making a specific question on the topic, will link it here. – Absorber
@PeterCordes - FYI. – Absorber
A
5

Vlad has helpfully answered the first part of your question, but I feel like the second part could be explained a little more intuitively:

As others have mentioned, 't' is a character whereas "t" is a string, and strings have a null terminator at the end. This makes "t" an array of not one but two characters - ['t', '\0']! This makes memset's error more intuitive - it can coerce a single char to an int easily enough, but it chokes when it's given an array of chars. Just like in Python, int(['t', '\0']) (or ord(['t', '\0'])) doesn't compute.

Ancier answered 6/6, 2019 at 2:14 Comment(1)
And to be even more precise, when passing "t", one passes the address of 't' in "t". So if it were converted to the int parameter in memset, it would be the pointer to 't' getting converted to int, rather than the value of the string getting converted to int. – Fendig
