Does C have a string type? [closed]

7

58

I have recently started programming in C, coming from Java and Python. Now, in my book I have noticed that to make a "Hello World" program, the syntax is something like this:

char message[10];
strcpy(message, "Hello, world!");
printf("%s\n", message);

Now, this example is using a char array and I wondered - what happened to strings? Why can't I simply use one of those? Maybe there is a different way to do this?

Hyoscyamus answered 5/2, 2013 at 14:3 Comment(8)
C doesn't have strings. – Bonspiel
You need char message[14]; – Holbein
Your strcpy will overflow your char array, by the way. You need at least a char array of length 14 (13 chars + nul terminator). – Attune
Also note your sample code has a buffer overrun: strlen("Hello, World!") > 10. You actually need 14 chars to store that string. You might also want to look at strncmp; the N is because it takes a parameter 'n' that is the size of the buffer and stops overruns like this. – Aggregation
@Aggregation strncmp is the wrong function for two reasons. Firstly, it's a cmp function instead of a cpy function. Secondly, you should use strlcpy instead, which makes sure a nul termination byte is written. strncpy may give you an unterminated string. – Attune
@wich: I meant strncpy, but was unaware of strlcpy, which does look like a better option. Thanks, I've learnt something today. – Aggregation
@ariel - The string type is from C++. Perhaps that is what you want to learn instead? :-) – Haileyhailfellowwellmet
String support is included in the standard library header <string.h>. Even Java's String comes from the standard library. Note that String is not a primitive data type in Java but a class name. – Noteworthy
94

C does not have, and never has had, a native string type. By convention, the language uses arrays of char terminated with a null character, i.e., with '\0'. Functions and macros in the language's standard libraries provide support for these null-terminated character arrays, e.g., strlen iterates over an array of char until it encounters a '\0' character, and strcpy copies from the source string until it encounters a '\0'.
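To make the convention concrete, here is a minimal sketch of how a length function can work on a null-terminated array. The name my_strlen is made up for illustration; real library implementations of strlen are usually more optimized, but the observable behavior is the same.

```c
#include <stddef.h>

/* my_strlen is a hypothetical stand-in for the standard strlen.
   It walks the array and counts characters until it reaches the
   terminating '\0', which is not included in the count. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

Note that the function has no way to know how large the underlying array is. It trusts that a '\0' is present somewhere.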

The use of null-terminated strings in C reflects the fact that C was intended to be only a little more high-level than assembly language. Zero-terminated strings were already directly supported at that time in assembly language for the PDP-10 and PDP-11.

It is worth noting that this property of C strings leads to quite a few nasty buffer overrun bugs, including serious security flaws. For example, if you forget to null-terminate a character string passed as the source argument to strcpy, the function will keep copying sequential bytes from whatever happens to be in memory past the end of the source string until it happens to encounter a 0, potentially overwriting whatever valuable information follows the destination string's location in memory.

In your code example, the string literal "Hello, world!" will be compiled into a 14-byte long array of char. The first 13 bytes will hold the letters, comma, space, and exclamation mark and the final byte will hold the null-terminator character '\0', automatically added for you by the compiler. If you were to access the array's last element, you would find it equal to 0. E.g.:

const char foo[] = "Hello, world!";
assert(foo[12] == '!');
assert(foo[13] == '\0');

However, in your example, message is only 10 bytes long. strcpy is going to write all 14 bytes, including the null-terminator, into memory starting at the address of message. The first 10 bytes will be written into the memory allocated on the stack for message, and the remaining four bytes will be written past the end of the array, into whatever happens to sit next to it on the stack. Formally this is undefined behavior, so the consequence is hard to predict (in this simple example, it might not hurt a thing), but in real-world code it usually leads to corrupted data or memory access violation errors.
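One way to avoid the overrun is to check the required size before copying. The following is only a sketch: copy_checked is a hypothetical helper name, not a standard function, and refusing to copy (rather than truncating) is just one possible policy.

```c
#include <stddef.h>
#include <string.h>

/* copy_checked is a hypothetical helper. It copies src into dst only
   if src, including its terminating '\0', fits in dst_size bytes.
   Returns 1 on success, 0 if the string does not fit (dst is then
   left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus terminator */
    if (needed > dst_size) {
        return 0;
    }
    memcpy(dst, src, needed);
    return 1;
}
```

With a char message[10] destination, copy_checked(message, sizeof message, "Hello, world!") returns 0 instead of writing past the array. With char message[14] it succeeds.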

Hardunn answered 5/2, 2013 at 14:20 Comment(2)
An array of char that does not have a '\0'-byte in it is not a string. – Etalon
Does writing 4 extra bytes corrupt the initial 10 bytes, or does it corrupt some other memory location (4 bytes) on the stack? In what scenario will a segmentation fault occur instead of memory corruption? – Disarm
17

There is no string type in C. You have to use char arrays.

By the way, your code will not work, because the array must be large enough to hold the whole string plus one additional zero terminating character.

Sigler answered 5/2, 2013 at 14:6 Comment(0)
17

To compare with the languages you mentioned:

Java:

String str = new String("Hello");

Python:

str = "Hello"

Both Java and Python have the concept of a "string"; C does not. C has character arrays, which can be either read-only or modifiable.

C:

char * str = "Hello";  // the string "Hello\0" is pointed to by the character pointer
                       // str. This "string" can not be modified (read only)

or

char str[] = "Hello";  // the characters: 'H''e''l''l''o''\0' have been copied to the 
                       // array str. You can change them via: str[x] = 't'

A C "string" is a character array: a sequence of contiguous characters with a sentinel character at the end (the null terminator '\0', not to be confused with the NULL pointer). Note that the sentinel character is auto-magically appended for you in the cases above.
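The difference between the two declarations can be sketched as follows. The function name capitalize_copy is made up for this illustration, and out is assumed to point at a buffer of at least 6 bytes.

```c
#include <string.h>

/* capitalize_copy is a hypothetical demo function. It modifies a
   writable array copy of "hello", stores the result in out (at least
   6 bytes), and returns 1 if the literal itself is unchanged. */
int capitalize_copy(char *out)
{
    const char *literal = "hello";  /* points at the literal itself */
    char array[] = "hello";         /* private, writable 6-byte copy */

    array[0] = 'H';                 /* fine: this memory is ours */
    /* literal[0] = 'H'; is rejected by the compiler because of const.
       Without const it would compile but be undefined behavior. */

    memcpy(out, array, sizeof array);   /* sizeof array is 6, with '\0' */
    return strcmp(literal, "hello") == 0;
}
```

Declaring the pointer as const char * is a common habit that lets the compiler catch accidental writes to a literal.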

Proximal answered 5/2, 2013 at 14:27 Comment(0)
9

In C, a string simply is an array of characters, ending with a null byte. So a char* is often pronounced "string", when you're reading C code.

Handsome answered 5/2, 2013 at 14:6 Comment(0)
7

C does not support a first class string type.

C++ has std::string

Attune answered 5/2, 2013 at 14:6 Comment(0)
2

C does not have its own String data type as Java does.

In C, we can only represent a string using a character array or a character pointer. For example:

 char message[10]; 
 or 
 char *message;

But you need to declare at least:

    char message[14]; 

to copy "Hello, world!" into the message variable:

  • 13 : length of the "Hello, world!"
  • 1 : for '\0' null character that identifies end of the string
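The same "length plus one" rule applies when the char *message form is used, because the pointer itself holds no characters and needs memory to point at. A minimal sketch, assuming heap allocation is acceptable (duplicate_string is a hypothetical helper name; POSIX offers a similar strdup):

```c
#include <stdlib.h>
#include <string.h>

/* duplicate_string is a hypothetical helper. It allocates exactly
   strlen(src) + 1 bytes and copies src, terminator included.
   Returns NULL if allocation fails. The caller must free() the
   result. */
char *duplicate_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* 13 + 1 for "Hello, world!" */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, src, size);
    }
    return copy;
}
```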
Geof answered 5/2, 2013 at 14:10 Comment(0)
1

First, you don't need to do all that. In particular, the strcpy is redundant - you don't need to copy a string just to printf it. Your message can be defined with that string in place.

Second, you've not allowed enough space for that "Hello, World!" string (message needs to be at least 14 characters, allowing the extra one for the null terminator).
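Both points can be handled at once by initializing the array from the literal and letting the compiler work out the size. This is a sketch only, and print_message is a hypothetical wrapper name used to keep the example small.

```c
#include <stdio.h>

/* With an empty [] the compiler sizes the array from the initializer:
   13 characters plus the terminating '\0', so 14 bytes. */
static const char message[] = "Hello, world!";

/* print_message is a hypothetical wrapper. It returns whatever printf
   returns, i.e. the number of characters written. */
int print_message(void)
{
    return printf("%s\n", message);
}
```

No strcpy is involved, so there is no destination size to get wrong.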

On the why, though, it's history. In assembler, there are no strings, only bytes, words etc. Pascal had strings, but there were problems with static typing because of that: string[20] was a different type than string[40]. There were languages even in the early days that avoided this issue, but that caused indirection and dynamic allocation overheads, which were much more of an efficiency problem back then.

C simply chose to avoid the overheads and stay very low level. Strings are character arrays. Arrays are very closely related to pointers that point to their first item. When array types "decay" to pointer types, the buffer-size information is lost from the static type, so you don't get the old Pascal string issues.
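The loss of size information on decay can be observed with sizeof. The sketch below uses made-up function names. The caller sees the full array type, while the callee only receives a pointer.

```c
#include <stddef.h>

/* Hypothetical demo: inside the declaring scope the array type still
   carries its size, so sizeof reports all 14 bytes. */
size_t size_seen_by_caller(void)
{
    char buffer[14] = "Hello, world!";
    return sizeof buffer;
}

/* Hypothetical demo: the argument has decayed to a pointer, so sizeof
   reports the size of a pointer, whatever array was passed in. */
size_t size_seen_by_callee(const char *buffer)
{
    return sizeof buffer;
}
```

This is why C functions that write into a buffer conventionally take the buffer size as a separate parameter.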

In C++, there's the std::string class which avoids a lot of these issues - and has the dynamic allocation overheads, but these days we usually don't care about that. And in any case, std::string is a library class - there's C-style character-array handling underneath.

Hamish answered 5/2, 2013 at 14:17 Comment(0)