Read wide char from a stream created with fmemopen

I'm trying to read a wide char from a stream that was created using fmemopen with a char *.

char *s = "foo bar foo";
FILE *f = fmemopen(s,strlen(s),"r");

wchar_t c = getwc(f);

getwc crashes with a segmentation fault; I confirmed this with GDB.

I know this is due to opening the stream with fmemopen, because calling getwc on a stream opened normally works fine.

Is there a wide char version of fmemopen, or is there some other way to fix this problem?

Wismar answered 10/8, 2017 at 22:12 Comment(4)
Please post a proper MCVE, the fmemopen invocation is invalid - Peephole
@AnttiHaapala Oh, whoops, I missed that part. Sorry. - Wismar
@MDXF: From the examples one might get the impression that perhaps iconv_open() and iconv() might be a better solution to the underlying problem. - Beardsley
@MDXF: In fact, at least GNU libc uses iconv in the background - it uses a separate buffer for already-converted data. After you have set the locale (all, or LC_CTYPE), you can use nl_langinfo(CODESET) to obtain the character set in a form you can supply to iconv_open(). While this is not ISO C, it is POSIX.1, and should be quite portable. (Since there is even GNU libiconv, this approach should be relatively easy to port across to any system using standard C, including Windows.) - Beardsley

The second line should read FILE *f = fmemopen(s, strlen(s), "r");. As posted, fmemopen has undefined behavior and might return NULL, which causes getwc() to crash.

Changing the fmemopen() line and adding a check for NULL fixes the crash but does not meet the OP's goal.

It seems that wide orientation is not supported on streams opened with fmemopen(), at least in the GNU C library. Note that fmemopen is not defined in the C Standard but in POSIX.1-2008, and is not available on many systems (such as OS X).

Here is a corrected and extended version of your program:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

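int main(void) {
    const char *s = "foo bar foo";
    /* fmemopen() takes a non-const buffer; "r" mode never writes to it */
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;   /* wint_t rather than wchar_t, so that WEOF is representable */

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    /* mode 0 queries the orientation; mode 1 requests wide orientation */
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    fclose(f);
    return 0;
}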

Output on Linux:

default wide orientation: -1
selected wide orientation: -1

The loop prints nothing: getwc() returns WEOF immediately.

Explanation for fwide(f, 0) from the Linux man page:

SYNOPSIS

#include <wchar.h>
int fwide(FILE *stream, int mode);

When mode is zero, the fwide() function determines the current orientation of stream. It returns a positive value if stream is wide-character oriented, that is, if wide-character I/O is permitted but char I/O is disallowed. It returns a negative value if stream is byte oriented, i.e., if char I/O is permitted but wide-character I/O is disallowed. It returns zero if stream has no orientation yet; in this case the next I/O operation might change the orientation (to byte oriented if it is a char I/O operation, or to wide-character oriented if it is a wide-character I/O operation).

Once a stream has an orientation, it cannot be changed and persists until the stream is closed.

When mode is nonzero, the fwide() function first attempts to set stream's orientation (to wide-character oriented if mode is greater than 0, or to byte oriented if mode is less than 0). It then returns a value denoting the current orientation, as above.

The stream returned by fmemopen() is byte-oriented and cannot be changed to wide-character oriented.

Vibratile answered 13/8, 2017 at 15:14 Comment(6)
So there's no way to fmemopen a string and read wide characters from it? - Wismar
@MDXF: Indeed I'm afraid the Glibc implementation does not support wide orientation. - Vibratile
fwide does not change the orientation if the orientation is already defined. So the second call to fwide has zero effect. You can try opening the stream this way: fmemopen(s, strlen(s), "r,ccs=UNICODE"); - Citriculture
@VadimHryshkevich: The first call to fwide() is a query for the current orientation. It returns byte-oriented. The second call attempts to change the orientation to wide and indeed fails. Your proposed approach is interesting. It is nonstandard but common on some systems. - Vibratile
@chqrlie: This is from the fwide() man page: "Once a stream has an orientation, it cannot be changed and persists until the stream is closed." So the second call to fwide() has zero effect. P.S. 1. I have looked into the source code of fwide() on my Linux distribution: if the stream's orientation is nonzero, fwide() just returns. 2. From the source code of fmemopen(): there is no way to change the orientation of the stream in this function. 3. It should be possible to use freopen(NULL,"r",fmemopen(...)) to get a stream without orientation, but I have tried this without luck. - Citriculture
@vadim_hr: yes, I already quoted the man page in the answer (just added more paragraphs for clarity) and got to the same conclusion: The stream returned by fmemopen() is byte-oriented and cannot be changed to wide-character oriented. It is a pity that a stream orientation cannot be changed once set and that it is not handled generically enough to work for memory streams transparently. - Vibratile
  1. Your second line does not use the correct number of parameters, does it? (This has since been corrected.)

    FILE *fmemopen(void *buf, size_t size, const char *mode);

  2. glibc's fmemopen does not (fully) support wide characters AFAIK. There's also open_wmemstream(), which supports wide characters but is just for writing.

  3. Is _UNICODE defined? See wchar_t reading.
    Also, have you set the locale to an encoding that supports Unicode, for example, setlocale(LC_ALL, "en_US.UTF-8");?

  4. Consider using a temporary file. Consider using fgetwc / 4 instead.

I have replaced my code with the code from @chqrlie, since it is closer to the OP's code, but added the locale setup; without it, the program fails to produce correct output for extended/Unicode characters.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    /* select a UTF-8 locale so multibyte sequences such as "€" can be decoded */
    setlocale(LC_ALL, "en_US.UTF-8");
    const char *s = "foo $€ bar foo";
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;   /* wint_t rather than wchar_t, so that WEOF is representable */

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    fclose(f);
    return 0;
}
Underpainting answered 13/8, 2017 at 6:28 Comment(0)
  1. You can use getwc() only on an unoriented or wide-oriented stream. From the getwc() man page: The stream shall not have an orientation yet, or be wide-oriented.

  2. It is not possible to change a stream's orientation once it has one. From the fwide() man page: Calling this function on a stream that already has an orientation cannot change it.

  3. A stream opened with glibc's fmemopen() is byte-oriented and therefore cannot be made wide-oriented in any way. uClibc reportedly has an fmemopen() routine without this limitation.

Conclusion: You need to use uClibc or another library or make your own fmemopen().

Citriculture answered 17/8, 2017 at 8:36 Comment(0)
