Is there a way to print Runes as individual characters?
Program's Purpose: Rune Cipher

Note - I am linking to my own GitHub page below, solely for the purpose of showing what I needed help with (and got help with - thanks once again to all of you!)


Final Edit:

I have now (thanks to the extremely useful answers provided by the extremely amazing people below) completed the project I've been working on, and for future readers I am also providing the full code.

Again, this wouldn't have been possible without all the help I got from the people below - thanks to them, once again!

Original code on GitHub

Code

(Shortened down a bit)

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#define UNICODE_BLOCK_START 0x16A0
#define UNICODE_BLOCK_END   0x16F1

int main(void) {
    setlocale(LC_ALL, "");
    wchar_t SUBALPHA[] = L"ᛠᚣᚫᛞᛟᛝᛚᛗᛖᛒᛏᛋᛉᛈᛇᛂᛁᚾᚻᚹᚷᚳᚱᚩᚦᚢ";
    wchar_t DATA[] = L"hello";

    int lenofData = 0;
    while (DATA[lenofData] != L'\0')
        lenofData++;

    for (int i = 0; i < lenofData; i++) {
        printf("DATA[%d]=%lc", i, DATA[i]);
        DATA[i] = SUBALPHA[i];
        printf(" is now Replaced by %lc\n", DATA[i]);
    }
    printf("%ls\n", DATA);

    return 0;
}

Output:

DATA[0]=h is now Replaced by ᛠ

...

DATA[4]=o is now Replaced by ᛟ
ᛠᚣᚫᛞᛟ

Question continues below

(Note that it's solved, see Accepted answer!)

In Python3 it is easy to print runes:

for i in range(5794, 5855):
    print(chr(i))

outputs

ᚢ ᚣ (..) ᛝ ᛞ

How do I do that in C,

  • using variables (char, char arrays[], int, ...)?

Is there a way to e.g. print ᛘᛙᛚᛛᛜᛝᛞ as individual characters?

When I try it, it just prints out warnings about a multi-character character constant 'ᛟ'.

I have tried having them as an array of char, and as a "string" (e.g. char s1[] = "ᛟᛒᛓ";)

  • And then printing out the first char (ᛟ) of s1: printf("%c", s1[0]); Now, this might seem very wrong to others.

One Example of how I thought of going with this:

Print a rune as "an individual character":

To print e.g 'A'

  • printf("%c", 65); // 'A'

How do I do that (if possible), but with a rune?

I have also tried printing its numeric value as a char, which results in question marks and other "undefined" results.

As I do not really remember exactly all the things I've tried so far, I will try my best to formulate this post.

If someone spots a very easy (maybe, to him/her, even plain-obvious) solution (or trick/workaround),

I would be super happy if you could point it out! Thanks!

This has bugged me for quite some time. It works in Python, and it works (as far as I know) in C if you just print a literal (not through any variable), e.g. printf("ᛟ"); works. But, as I said, I want to do the same thing through variables (like char runes[]="ᛋᛟ";) and then: printf("%c", runes[0]); // to get 'ᛋ' as the output

(Or similar; it does not need to be %c, and it does not need to be a char array/char variable.) I am just trying to understand how to do the above (hopefully not too unreadably).

I am on Linux, and using GCC.

External Links

Python3 Cyphers - At GitHub

Runes - At Unix&Linux SE

Junicode - At Sourceforge.io

Cleodell answered 27/2, 2021 at 16:37 Comment(14)
Does this answer your question? How to print Unicode codepoints as characters in C?Entire
or char *a[] = { "ᛘ","ᛙ","ᛚ","ᛛ","ᛜ","ᛝ","ᛞ" }; then printf("%s\n", a[5]);Plagal
@e2-e4 I do not think this will work EDIT : It does, on modern displays and the like. It may not be portable.Doloroso
To store them, maybe try using wchar_tDominquedominquez
well, C's chars are bytes, and runes are clearly not in the 0-255 range; since C's default encoding is ASCII, you can use a multi-byte encoding like UTF-8, and functions like printf (not on every system, though) can actually understand that and print it correctly. So the character ᚢ will be 3 bytes: [0xE1 0x9A 0xA2], try it out: char n[4] = {0xe1, 0x9a, 0xa2, '\0'};printf("%s\n", n);Swollen
@Swollen this seems (I have not tested it yet) to be the issue, I will try this,Cleodell
@AntoninGAVREL Will do, thanks.Cleodell
@BladeMigh as it is stored in a wchar_t it will occupy 4 bytes which is the wchar_t size.Halfmoon
@antonin: on some platforms (Linux and Mac, for example). On Windows it's two bytes and can't hold Unicode characters in the astral planes.Crumple
Final - Last comment of this Question: I have now provided the program I was trying to work on, and this is the provided code - for the future readers. But this wouldn't have been possible (at all) without all of you helping me - thanks again!Cleodell
It's best not to include the solution in the question. You've accepted an answer, but if you want to provide more information about how you solved it, I suggest posting another answer (which you can link to from the question).Venereal
@KeithThompson Hey! Thanks for pointing it out; I thought it would be easier to understand what the question was about(by, taking a look at the code, - and The solution is not in the question, the solution Is very well indeed below But what I have posted above is the program that I used the below solution to :) (one probably asks why- and that's because since my writing, Is really confusing - this is not a joke, I often have hard times formulating myself) Edit: Will obviously post it as an answ. If it looks more readable; thanks again [+1]Cleodell
@e2-e4 the encoding of a C file itself is unspecified, but if your compiler understands UTF-8, assuming the terminal understands, I see no reason why this would not be okay.Aseity
@AntoninGAVREL Again, THANKS for helping me out. This was flawless. Absolutely brilliant! Still, (obviously) works - just wanted to say this; Have a good corona-Free day on you! (And on all other's reading this comment!)Cleodell

Stored on the stack as a string of (wide) characters

If you want to add your runes (wchar_t) to a string, then you can proceed the following way:

(an earlier revision used wcsncpy, which was overkill for copying single characters; thanks chqrlie for noticing)

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

#define UNICODE_BLOCK_START 0x16A0 // see the wikipedia link for the start
#define UNICODE_BLOCK_END   0x16F0 // last Runic wide char (newer Unicode versions define up to 0x16F8)

int main(void) {
  setlocale(LC_ALL, "");
  // block size + 1 for the inclusive range + 1 for the terminating L'\0'
  wchar_t buffer[UNICODE_BLOCK_END - UNICODE_BLOCK_START + 2];

  int i = 0;
  for (wchar_t wc = UNICODE_BLOCK_START; wc <= UNICODE_BLOCK_END; wc++)
    buffer[i++] = wc;
  buffer[i] = L'\0';

  printf("%ls\n", buffer);
  return 0;
}

About Wide Chars (and Unicode)

To understand a bit better what a wide char is, think of it as a value that can exceed the original range used for a character, which was 2^8 = 256 (or, with left shifting, 1 << 8).

That range is enough when you just need to print what is on your keyboard, but when you need to print Asian characters or other Unicode characters it was not enough anymore, and that is the reason why the Unicode standard was created. You can find more about the very different and exotic characters that exist, along with their ranges (named Unicode blocks), on Wikipedia - in your case, Runic.

Range U+16A0..U+16FF - Runic (86 characters), Common (3 characters)

NB: The assigned Runic wide chars end slightly before 0x16FF (the last code points of the block are unassigned).

You can use the following function to print your wide char as bits:

void print_binary(unsigned int number)
{
    char buffer[36]; // 32 bits, 3 spaces and one '\0'
    int pos = 0;
    for (int i = 0; i < 32; i++) {
        if (i && !(i % 8))
            buffer[pos++] = ' ';
        buffer[pos++] = '0' + !!(number & (0x80000000u >> i));
    }
    buffer[pos] = '\0';
    printf("%s\n", buffer);
}

which you call in your loop with:

print_binary((unsigned int)wc);

It will give you a better understanding of how your wide char is represented at the machine level:

                                  ᛞ
00000000 00000000 00010110 11011110

NB: You will need to pay attention to detail: do not forget the final L'\0', and you need to use %ls to get the output with printf.

Halfmoon answered 27/2, 2021 at 17:54 Comment(5)
I am not so sure about your case, on linux it works fine, it may have to do with the size of wc on windows, try to replace sizeof(wc) by 2 in the functionHalfmoon
It worked for me with a minor change (probably just my computer), detected through GDB at line 12 (the wchar_t buffer[5855 - 5794 + 2 + 1];): I added + 1 and it worked in my case (highly likely that the system I am using uses another kind, which is why it maybe failed). It printed out ᚢᚣ... all the way to ᛝᛞ - coming back in a few mins, really - thank you for this huge help!Cleodell
Ok @chqrlie I agree, will edit. William: glad it worked!Halfmoon
NB: Your wide chars start as states wikipedia and as for the ending chars it stops at 0x16F1, giving you some nice extra runic words ;) @chqrlie it is %s that I use in the case of printing the bits, not the wide chars word.Halfmoon
For consistency with Unicode block definitions and the size of the buffer wchar_t array, UUICODE_BLOCK_END should be the code point for the last runic character (0x16F0 or possibly 0x16F8) and the test should be wc <= UUICODE_BLOCK_ENDKindred

To hold a character outside of the 8-bit range, you need a wchar_t (which isn't necessarily Unicode). Although wchar_t is a fundamental C type, you need to #include <wchar.h> to use it, and to use the wide character versions of string and I/O functions (such as putwc shown below).

You also need to ensure that you have activated a locale which supports wide characters, which should be the same locale as is being used by your terminal emulator (if you are writing to a terminal). Normally, that will be the default locale, selected with the string "".

Here's a simple equivalent to your Python code:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
  setlocale(LC_ALL, "");
  /* As indicated in a comment, I should have checked the
   * return value from `putwc`; if it returns EOF and errno
   * is set to EILSEQ, then the current locale can't handle
   * runic characters.
   */
  for (wchar_t wc = 5794; wc < 5855; ++wc)
    putwc(wc, stdout);
  putwc(L'\n', stdout);
  return 0;
}

(Live on ideone.)

Crumple answered 27/2, 2021 at 17:8 Comment(14)
I will try this; this looks really, promising - (already tried it) will try it out a bit, coming back in a few mins, - again; **Thanks so much for your time; **Cleodell
this does work, although - is it possible to assign the output (of the above code) to a, for example array? +Edit: Reading this one cplusplus.com/reference/cwchar right now to at least get some context about what it is; thanks againCleodell
You can store a wchar_t in an array of wchar_t elements using array assignment. (wa[0] = 5794). Which is just the same as storing a char in char array (or an int in an int array). Obviously, you can't store an arbitrary wchar_t in a char array because it might overflow. (Equally, you can't store a long long in an array of short.) If you want to create a formatted wide string from a format specification, see swprintf (similar to snprintf). I suspect that none of those are what you really are asking.Crumple
If you're looking for a way to convert an array of wide characters into a UTF-8 encoding of the same string (which you could store in a char array because UTF-8 encodes a Unicode string into a sequence of 8-bit codes) then you can just use snprintf with a %ls format conversion. Assuming: your current locale is a UTF-8 locale, and your C library implements standard C99 or more recent.Crumple
It's important to understand the difference between a "character" (an abstract representation of a letter or other graphic symbol) and the encoding of a character.Crumple
I have read your comments, (will(Must) read them multi times again, (I am that kind of slow-reader) but still - thanks for all this! even if it isn't what I maybe, exactly expected it still points out quite a few really important parts!Cleodell
Note that this will only work if the user's locale is something unicode-based (which it probably is). You could instead use if (!setlocale(LC_ALL, "C.UTF-8")) { fprintf(stderr, "No unicode support!\n"); exit(1); } which should work on any system that supports unicode regardless of what the user's locale is.Fighterbomber
@ChrisDodd This was something brilliant to add, thank you.Cleodell
@Chris: I don't entirely agree. Calling setlocale(LC_ALL, "C.UTF-8"); will change to a UTF-8 locale if it exists on the host. But it doesn't change the coding being used by the terminal, so the test doesn't help you much ...Crumple
(cont'd) On most systems, there is a locale definition for UTF-8 multibyte characters, but if the terminal is set to GB 18030 then UTF-8 output will be gibberish. The locale set with "" is what the terminal actually uses (normally). All of that is independent of the wide character encoding. These days, this is usually unicode (or the unicode BMP), which allows the use of multiple locales. If the locale is not UTF-8, attempting to wputc a runic character will result in an EILSEQ error, which is a more reliable error signal. (Not 100% reliable, but more reliable.)Crumple
@rici: If the output device can't support UTF-8, the setlocale call should fail. If the terminal does not support utf-8, it is (more) likely that the putwc with a unicode codepoint would output gibberish regardless of which locale is used, not an EILSEQ.Fighterbomber
Just to mention : I had to install juniper (I think it was called) (by apt) see my Unix & Linux SE post for more "context"; Just for the future readers. //Have a great weekend and THANKS to both of you + all others who helped! //Wishes from a Swede!Cleodell
@chris: the program has no idea what the terminal supports. (The terminal could be across the world connected by telnet or whatever.) The only thing the user can really do is get their shell to set the locale to fit the terminal they are using. Putwc uses the current locale to figure out how to convert a wchar_t value into a multibyte encoding. If you select a utf-8 locale, that's what will be used, whether or not the terminal supports utf8. If your locale reflects the encoding the terminal uses, then characters which cannot be encoded will return EILSEQ.Crumple
I'd say "try it and see" but I'm acutely aware of how difficult it is to try experiments like this. You'd need both a non utf-8 locale, perhaps an iso-8859-x encoding since Big-5 is rare outside China, and a terminal which uses that encoding instead of unicode. But it can be done if you're determined enough to test it.Crumple