Different representation of Unicode code points in Japanese and Chinese
I am trying to display the glyph for the Unicode code point U+95E8. This code point belongs to the CJK (Chinese, Japanese, Korean) Unified Ideographs block.

I am trying to find out whether the glyph for this particular code point can differ between Japanese and Chinese.

When I display U+95E8 in a JTextArea, I see the "门" character on Linux and Windows. But when I display the same code point on my embedded device, the displayed character changes to:

[image: japanese_glyph]

I want to know whether the code point U+95E8 should have a uniform representation in all CJK (Chinese, Japanese, Korean) locales, or whether it differs for some of them. Could this behaviour be caused by different fonts being installed on different devices? I apologize for my ignorance; I do not have much experience with internationalization.

import java.awt.*;
import java.awt.event.*;
import java.util.Locale;

import javax.swing.*;

public class TextDemo extends JPanel implements ActionListener {

    public TextDemo() {
    }

    public void actionPerformed(ActionEvent evt) {
    }

    /**
     * Create the GUI and show it.  For thread safety,
     * this method should be invoked from the
     * event dispatch thread.
     * @throws InterruptedException 
     */
    private static void createAndShowGUI() throws InterruptedException {

        JFrame frame = new JFrame(java.util.Locale.getDefault().getDisplayName());

        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        Container contentPane = frame.getContentPane();
        contentPane.setLayout(new SpringLayout());

        Dimension size = new Dimension(500, 500);
        frame.setSize(size);
        JTextArea textArea = new JTextArea();

        //Font font1 = new Font("SansSerif", Font.BOLD, 20);
        //textArea.setFont(font1);

        textArea.setEditable(true);
        textArea.setSize(new Dimension(400,400));
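        // setDefaultLocale is a static method of JComponent; it sets the locale
        // that newly created components start with.
        JComponent.setDefaultLocale(java.util.Locale.SIMPLIFIED_CHINESE);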

        textArea.setText("Printing U+95E8 : \u95e8");                
        contentPane.add(textArea);        
        frame.setVisible(true);
    }

    public static void main (String[] args) {
        java.util.Locale.setDefault(java.util.Locale.JAPANESE);
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                try {
                    createAndShowGUI();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        });
    }
}
Dreher answered 22/7, 2014 at 18:30 Comment(0)

Generally, CJK characters in Unicode are “unified”, which means that a single code point is used even though the character has traditionally been somewhat different for the different languages. In theory, a single font can contain multiple glyphs for a code point, with some selection mechanism. In practice, a font that contains CJK characters typically has a single design for them, reflecting the design of Traditional Chinese, Simplified Chinese, Japanese, or Korean. In this sense, some fonts might be called “Traditional Chinese”, “Japanese”, etc.
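
To make the unification concrete, here is a small Java sketch (the class name HanUnificationDemo is made up for illustration). It prints what the Unicode character data built into Java says about U+95E8. The point is that the string carries only the code point, classified as a unified Han ideograph. Nothing in the text itself says whether it is meant as Chinese or Japanese, so the choice of glyph shape is left to the font.

```java
final class HanUnificationDemo {

    /** Describes a code point using only Java's built-in Unicode character data. */
    static String describe(int codePoint) {
        return String.format("U+%04X block=%s script=%s ideographic=%b",
                codePoint,
                Character.UnicodeBlock.of(codePoint),
                Character.UnicodeScript.of(codePoint),
                Character.isIdeographic(codePoint));
    }

    public static void main(String[] args) {
        // The same string is used for Chinese and Japanese text alike.
        String text = "\u95e8";
        System.out.println(describe(text.codePointAt(0)));
    }
}
```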

Obviously, you should select the font according to the language of the text.

The glyph in the image in the question looks somewhat odd, and it deviates from the glyphs for U+95E8 in some common fonts, which generally show rather similar designs for this character. So for this specific character, the variation can be expected to be only in the general style (e.g., serif vs. sans-serif, stroke width). It seems that the font being used is somehow oddly designed, at least for this character.

Devotion answered 22/7, 2014 at 20:33 Comment(1)
Adding the test code also. Can you please let me know how I can recognize the correct font/environment in my PC that is causing "门" to be printed, so that I can install the same font in my embedded device? - Dreher

Adding to Jukka's answer:

Here is some more info on the "Han unification": http://en.wikipedia.org/wiki/Han_unification

There are two main ways one can render the glyph desired:

  1. Use a locale-specific font (that is, different fonts for Traditional Chinese, Simplified Chinese, Japanese, and Korean). The designers of such fonts take care to do the right thing. This is Jukka's answer. As an example, take a look at the Noto family of fonts (http://www.google.com/get/noto/cjk.html). Download the "Language specific fonts in OTF" files:
    • The Simplified Chinese font is NotoSansHans-Regular.otf
    • The Traditional Chinese font is NotoSansHant-Regular.otf
    • The Japanese font is NotoSansJP-Regular.otf
    • The Korean font is NotoSansKR-Regular.otf
  2. Use a generic CJK font with multiple locale-specific glyphs. As an example, you can again use the CJK Noto font, this time the "Multilingual fonts in OTF" option. See "Script Table and Language System Record" in http://www.microsoft.com/typography/otspec/chapter2.htm. For this to work, the font must contain that information, the text rendering engine must understand how to deal with the language setting, and the API must expose it.

Now, the stuff below is very low level. When you use something like JTextArea, you have no control. You use what the implementers of JTextArea decided to do.

You can call the setDefaultLocale of your component, and that might help. It is recommended that you do that, no matter what. But if you want to be sure what is going on, take control and specify a language-specific font.
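
As a rough sketch of "taking control", the following Java code picks one of the language-specific Noto files listed above based on a Locale, loads it with Font.createFont, and sets it on the JTextArea. The class name CjkFontChooser, the fontDir parameter, and the rule for telling Traditional from Simplified Chinese are assumptions made for this example. It also assumes the OTF files have been downloaded into fontDir and that the Java runtime on the device can load them.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import javax.swing.JTextArea;

final class CjkFontChooser {

    /** Maps a locale to one of the language-specific Noto font file names. */
    static String fontFileFor(Locale locale) {
        String language = locale.getLanguage();
        if (language.equals("ja")) {
            return "NotoSansJP-Regular.otf";
        }
        if (language.equals("ko")) {
            return "NotoSansKR-Regular.otf";
        }
        // Assumption: treat Taiwan, Hong Kong, Macau, or an explicit Hant
        // script as Traditional Chinese; everything else falls back to
        // Simplified Chinese (an arbitrary default for this sketch).
        String country = locale.getCountry();
        boolean traditional = "Hant".equals(locale.getScript())
                || country.equals("TW") || country.equals("HK") || country.equals("MO");
        return traditional ? "NotoSansHant-Regular.otf" : "NotoSansHans-Regular.otf";
    }

    /** Loads the matching font file from fontDir and applies it to the text area. */
    static void applyFont(JTextArea area, File fontDir, Locale locale, float size)
            throws IOException, FontFormatException {
        File fontFile = new File(fontDir, fontFileFor(locale));
        Font base = Font.createFont(Font.TRUETYPE_FONT, fontFile);
        area.setFont(base.deriveFont(size));
        area.setLocale(locale);
    }
}
```

With the code from the question, this could be called as CjkFontChooser.applyFont(textArea, new File("fonts"), Locale.SIMPLIFIED_CHINESE, 20f), where "fonts" is a hypothetical directory holding the downloaded files.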

how can I recognize the correct font/environment in my PC that is causing "门" to be printed.

You can't do that reliably. The layers below Java might do their own fallback operations. And you can't legally distribute the Windows fonts.
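
What you can do is list the installed fonts that claim to cover the code point. The sketch below (the class name FontCoverage is made up for illustration) uses Font.canDisplay for that. It only yields candidates: it does not tell you which font Swing or the platform actually picked for the JTextArea, because of the fallback mentioned above.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

final class FontCoverage {

    /** Returns the names of the given fonts that report a glyph for codePoint. */
    static List<String> namesCovering(Font[] fonts, int codePoint) {
        List<String> names = new ArrayList<>();
        for (Font font : fonts) {
            if (font.canDisplay(codePoint)) {
                names.add(font.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // All fonts the Java runtime can see on this machine.
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (String name : namesCovering(installed, 0x95E8)) {
            System.out.println(name);
        }
    }
}
```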

So that I can install the same font in my embedded device

Don't. Use a good-quality open-source font. The Noto fonts are a very good option.

Cartilage answered 24/7, 2014 at 21:54 Comment(0)