Since you're using MinGW (actually MinGW-w64, but that shouldn't matter in this case), you have access to the Windows API, so the following should work for you. It could probably be cleaner, and I haven't tested it properly, but it should at least give you a good idea:
#define _WIN32_WINNT 0x0600
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <windows.h>

int main (void)
{
    int argc;
    int i;
    LPWSTR *argv;

    // Split the UTF-16 command line into an argv-style array.
    argv = CommandLineToArgvW(GetCommandLineW(), &argc);
    if (argv == NULL)
    {
        // Ask the system for a readable description of the failure.
        LPSTR error = NULL;
        DWORD length = FormatMessageA(
            (
                FORMAT_MESSAGE_ALLOCATE_BUFFER |
                FORMAT_MESSAGE_FROM_SYSTEM |
                FORMAT_MESSAGE_IGNORE_INSERTS),
            NULL,
            GetLastError(),
            0,
            (LPSTR)&error, 0,
            NULL);
        if (length != 0)
        {
            // Never pass the message itself as the format string.
            fprintf(stderr, "%s\n", error);
            LocalFree(error);
        }
        return EXIT_FAILURE;
    }

    for (i = 0; i < argc; ++i)
        wprintf(L"argv[%d]: %ls\n", i, argv[i]);

    // You must free argv using LocalFree!
    LocalFree(argv);
    return EXIT_SUCCESS;
}
Bear in mind this one issue with it: Windows will not compose your strings for you. I use my own Windows keyboard layout that uses combining characters (I'm weird), so when I type
example -o àlf
in my Windows Command Prompt, I get the following output:
argv[0]: example
argv[1]: -o
argv[2]: a\u0300lf
The a\u0300 is U+0061 (LATIN SMALL LETTER A) followed by a representation of the Unicode code point U+0300 (COMBINING GRAVE ACCENT). If I instead use
example -o àlf
which uses the precomposed character U+00E0 (LATIN SMALL LETTER A WITH GRAVE), the output differs:
argv[0]: example
argv[1]: -o
argv[2]: \u00E0lf
where \u00E0 is a representation of the precomposed character à, Unicode code point U+00E0. I may be an odd person for typing this way, but combining characters are not exotic: Vietnamese code page 1258 actually includes them. This shouldn't ordinarily affect filename handling, but you may run into some difficulty.
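To make the difference concrete, here is a minimal sketch showing that the two spellings are different strings as far as plain C comparison is concerned. The names same_code_units, decomposed and precomposed are made up for illustration; nothing here is Windows-specific.

```c
#include <stdbool.h>
#include <wchar.h>

// Illustrative helper (not a library function): true only when both
// strings hold exactly the same code units. No normalization is applied.
static bool same_code_units(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}

// The two spellings of the argument from the examples above.
static const wchar_t decomposed[]  = L"a\u0300lf"; // a + COMBINING GRAVE ACCENT
static const wchar_t precomposed[] = L"\u00E0lf";  // LATIN SMALL LETTER A WITH GRAVE
```

same_code_units(decomposed, precomposed) returns false, and wcslen reports 4 for the first string and 3 for the second, even though both render as the same text.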
For arguments that are just strings, you may want to look into normalization with the NormalizeString function. The documentation and the examples linked from it should help you understand how the function works. Normalization and a few other things in Unicode can be a long journey, but if this sort of thing excites you, it's also a fun journey.
argv array. Does it encode the command line using UTF-8 or ANSI? If it's ANSI then you should check whether MinGW supports wmain to use wchar_t * parameters. Otherwise just ignore the decrepit ANSI strings (IMO, the entire ANSI API is worthless garbage nowadays that so often leads to mojibake) and call CommandLineToArgvW and manually encode to UTF-8 via WideCharToMultiByte if you need char * strings. – Ketone
GetCommandLineA to get an ANSI encoded copy of the command line, and so you get the mojibake "Ω" => "O", since that's the closest mapping your ANSI character set (probably 1252) has for the Greek Omega character. This is worthless. Use GetCommandLineW, CommandLineToArgvW, and WideCharToMultiByte to get UTF-8 encoded command line arguments. – Ketone