Don't the %[] and %[^] specifiers in scanf(), sscanf() or fscanf() store the input in a null-terminated character array?
Here's what the Beez C guide (LINK) says about the %[] format specifier:

It allows you to specify a set of characters to be stored away (likely in an array of chars). Conversion stops when a character that is not in the set is matched.

I would appreciate it if you could clarify some basic questions that arise from this premise:

1) Is the input fetched by those two format specifiers stored in the arguments (of type char*) as a plain character array, or as a character array with a terminating \0 (a string)? If it is not stored as a string, how can I make it so, in cases like the program below, where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?

2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered. But when scanning resumes for the next format specifier, does it start from the negated character where it stopped earlier? In my program I intend to ignore the |, hence I used %*c. But I tested and found that if I use %c and an additional argument of type char, the character | is indeed stored in that argument.

3) Lastly, but crucially for me, what is the difference between passing a character array and passing a string (a null-terminated character array) for a %s format specifier in printf()? In my other program, titled character array vs string, I passed a character array (not null-terminated) for a %s format specifier in printf() and it was printed just as a string would be. What is the difference?

//Program to illustrate %[^] specifier

#include <stdio.h>

int main()
{
    char *ptr = "fruit|apple|lemon", type[10], fruit1[10], fruit2[10];

    sscanf(ptr, "%[^|]%*c%[^|]%*c%s", type, fruit1, fruit2);
    printf("%s,%s,%s", type, fruit1, fruit2);
}

//character array vs string

#include <stdio.h>

int main()
{
    char test[10] = {'J', 'O', 'N'};
    printf("%s", test);
}

Output: JON

//Using %c instead of %*c

#include <stdio.h>

int main()
{
    char *ptr = "fruit|apple|lemon", type[10], fruit1[10], fruit2[10], char_var;

    sscanf(ptr, "%[^|]%c%[^|]%*c%s", type, &char_var, fruit1, fruit2);
    printf("%s,%s,%s,and the character is %c", type, fruit1, fruit2, char_var);
}

Output: fruit,apple,lemon,and the character is |

Flourish answered 9/5, 2013 at 4:46 Comment(0)
  1. It is null terminated. From sscanf():

    The conversion specifiers s and [ always store the null terminator in addition to the matched characters. The size of the destination array must be at least one greater than the specified field width.

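  2. The excluded characters are not consumed by the scan set and remain to be processed by the next directive. An alternative format string, which matches each | literally and limits every field to the size of its buffer:

    if (sscanf(ptr, "%9[^|]|%9[^|]|%9s", type, fruit1, fruit2) == 3)
        printf("%s,%s,%s", type, fruit1, fruit2);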
    
  3. The array is actually null terminated as remaining elements will be zero initialized:

    char test[10]={'J','O','N' /*,0,0,0,0,0,0,0*/ };
    

If it were not null terminated, printf() would keep reading until a null character was found somewhere in memory, possibly overrunning the end of the array and causing undefined behaviour. It is possible to print a non-null-terminated array by giving a precision:

    char buf[] = { 'a', 'b', 'c' };
    printf("%.*s", 3, buf);
Fauna answered 9/5, 2013 at 4:52 Comment(10)
Can you clearly state that in the other answer, Tony's claim for the second part is wrong when he claims that %*c shouldn't consume the excluded character? - Pyrrho
I intuitively agree with you that the excluded character should remain to be processed, but then, Tony is also a veteran and can't be ignored just like that.... - Pyrrho
@Rüppell'sVulture, it is unprocessed by the scan set. - Fauna
And it remains open to be processed by the next format specifier, which is %*c or %c in my case? - Pyrrho
What does your alternative format specifier for 2) do? It only specifies that the format specifier should fetch 9 characters max. But why use that when we know it would stop at the negated character anyway? - Pyrrho
Yes, to be processed by the next part of the format specifier. - Fauna
To prevent buffer overrun. It might look unnecessary here, but what if the input came from an external source? - Fauna
And finally, what would you say to Tony's answer to 3) that it is undefined? You say the character array is clearly null terminated during initialization as the remaining elements are 0, but he says it is UB. - Pyrrho
Updated my answer, which clarifies what Tony meant. - Fauna
+1 from me :-). Thanks for helping explain how our answers relate/differ. Cheers. - Ethical

1) Is the input fetched by those two format specifiers stored in the arguments (of type char*) as a plain character array, or as a character array with a terminating \0 (a string)? If it is not stored as a string, how can I make it so, in cases like the program below, where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?

They're stored in ASCIIZ format - with a NUL/'\0' terminator.

2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered. But when scanning resumes for the next format specifier, does it start from the negated character where it stopped earlier? In my program I intend to ignore the |, hence I used %*c. But I tested and found that if I use %c and an additional argument of type char, the character | is indeed stored in that argument.

It shouldn't consume the next character. Show us your code or it didn't happen ;-P.

3) Lastly, but crucially for me, what is the difference between passing a character array and passing a string (a null-terminated character array) for a %s format specifier in printf()? In my other program, titled character array vs string, I passed a character array (not null-terminated) for a %s format specifier in printf() and it was printed just as a string would be. What is the difference?

(Edit: the following addresses the question above, which asks about array behaviour in general and is broader than the code snippet in the question. That snippet specifically poses a case like char[10] = "abcd";, which is safe.)

%s must be passed a pointer to ASCIIZ text. Even if that text sits in a char array, it is the mandatory NUL terminator that defines the textual content, not the array length. You must NUL terminate your character array or you have undefined behaviour. You might get away with it sometimes: strncpy into the array will NUL terminate it if and only if there's room to do so; static arrays start with all-0 content, so if you only overwrite elements before the final one you'll still have a NUL; and your char[10] example has the elements whose values aren't specified filled with NULs. Still, you should generally take responsibility for ensuring NUL termination.

Ethical answered 9/5, 2013 at 4:55 Comment(10)
I added the code. Check it out. It works as I had said. Now I have two answers for the 2nd part, one by you and one by hmjd, and they contradict each other. You say it shouldn't consume, he says it does. - Pyrrho
@Rüppell'sVulture, I said unconsumed. - Fauna
You said it shouldn't consume the next character. - Pyrrho
Your answer to 3) kind of clashes with hmjd's as well. He clearly states that the character array is null terminated due to the very nature of the initialization. So there is no question of UB as per him. What would you say? - Pyrrho
For char x[10] = "123456789"; you're guaranteed a NUL, but it's fragile: if you miscount and have, say, char x[10] = "1234567890", there are no unspecified trailing elements to fill with NUL. You're best off using char x[] = "whatever"; if you only need one NUL and won't write more data into the buffer, or if you're reading data into a buffer sometimes you'll want an extra x[9] = '\0'; to ensure termination. Crucially, if you have char x[10]; x[0] = 'A'; it won't be terminated automatically. - Ethical
You should also consider using max-width specifications in your scanf conversions to ensure you don't write data beyond the end of the provided buffer. - Ethical
And re consumption... I'm saying "%[^|]" won't consume the '|'; the following "%*c" does consume it. You claim your test didn't see this... I'm saying you mis-coded or misinterpreted your test. - Ethical
@TonyD I didn't get a new-message alert for your new comments as one is expected to. Don't know why it fails to do so sometimes. I was online this long but saw your comments on my own just now. - Pyrrho
@Rüppell'sVulture: oh weird, I'll use an explicit "@..." in future. Cheers. - Ethical
@TonyD No, that's not the problem. Since I am the OP for the question, any activity on it should alert me. It generally does, but sometimes fails. - Pyrrho
