In Windows, how do you enter a character outside of the Unicode Basic Multilingual Plane?

Asked 18/3, 2012 at 2:26 Answered 3/4, 2014 at 18:3

I know that Windows has supported supplemental planes since Windows XP.

I have fonts which I know have characters outside the basic multilingual plane (BMP).

For these characters, the Unicode codepoint consists of five hexadecimal digits.

I do not know how to enter these characters in applications.

Windows seems to support keyboard entry only for characters in the BMP. You can enter a decimal number or, in some applications, a four-digit hexadecimal number.

Can someone confirm how entry is managed? I don't care if it is directly from the keyboard or application-assisted. (The default Windows "Character Map" application only supports characters in the BMP, so I need suggestions -- preferably for an application supporting at least Unicode Version 5, if not 6.)

In Java, these characters are managed using "surrogate pairs" in UTF-16. I'm concerned that Windows may also have some of the old "Unicode is 16 bit" legacy, causing it to have a similar issue. Even getting confirmation that I need to punch in surrogate pair numbers would be an answer.
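For reference, the surrogate pair arithmetic mentioned above can be sketched as follows. This is only an illustration of the standard UTF-16 encoding rule, not of how any particular Windows input method behaves; the function names `to_surrogate_pair` and `from_surrogate_pair` are made up for this example.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1B000)
print(f"U+1B000 -> {high:04X} {low:04X}")  # prints "U+1B000 -> D82C DC00"
```

So a five-digit code point such as U+1B000 corresponds to the two 16-bit units D82C and DC00, which is what an application that only understands 16-bit values would have to be given.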

Thanks!

Mcclinton answered 18/3, 2012 at 2:26 Comment(0)

OK, I clearly do not know what you are talking about.

Anyway, referring to:

The default Windows "Character Map" application only supports characters in the BMP, so I need suggestions -- preferably to an application supporting at least Unicode Version 5, if not 6.

I've found a link to an application that could help.

https://www.babelstone.co.uk/Software/BabelPad.html

Download it, then select Tools -> Character map from the menu.

Hope it helps.

If not, sorry for the misunderstanding; I was just trying to help.

Bluh answered 18/3, 2012 at 2:57 Comment(1)

It looks like babelstone.co.uk/Software/BabelMap.html is explicitly a character map application currently supporting Unicode 6.0. That should work. – Mcclinton 18/3, 2012 at 13:53

At least in MS Word 2007, the Alt+X method works for non-BMP characters, too: enter U+ followed by the Unicode number in hexadecimal, then Alt+X. The characters U+ may be omitted if the preceding character is not a digit or a letter A–F or X. You may need to explicitly select the font of the text (i.e., Word does not necessarily switch to a font that contains the character, as it normally does with BMP characters).

In Word, you can alternatively use the Insert → Symbol command and then, in the insertion window, select a font that contains the character you need.

Using the UnicodeInput program, you can enter a character by pressing Alt++ and then entering the Unicode number. It supports non-BMP too, but with an odd restriction, due to a program bug: it does not work for non-BMP characters if the fourth digit from the right is a letter (e.g., U+1B000).

BabelPad, mentioned in Martin’s answer, is a great alternative and lets you select characters both by number and by Unicode name.

There are probably other Unicode editors too that let you work with non-BMP characters; check out Alan Wood’s list of Unicode and Multilingual Programs and Utilities.

Nagoya answered 18/3, 2012 at 5:52 Comment(1)

It is odd. Supplementary planes have existed since Unicode 2.0 (where they were reserved, but unused). Alan Wood's list doesn't mention whether an application supports characters outside of the BMP, and I know for a fact that some of the applications mentioned are limited to the BMP. -- The Alt+X tip is handy. I didn't know that it supported characters outside the BMP. – Mcclinton 18/3, 2012 at 14:16

I have now composed a small utility that can be used in a web browser in an application-like manner: Full Unicode Input utility. It’s similar to Character Map in Windows but lets you access all Unicode planes and collects the selected characters in an area, from which they can be copied as a unit. Somewhat quick and dirty, but functional.

Nagoya answered 2/10, 2012 at 8:24 Comment(2)

It looks pretty neat, but... Which version of Unicode is it based upon? For instance, if you go to "CJK Unified Ideographs Extension B", I see a bunch of characters in that section, but the only one shown as belonging to that section is labeled simply "first" and the others are listed as "unassigned." It looks like the Unicode database being used is old. – Mcclinton 25/5, 2013 at 3:6

@yam655, it’s based on version 6.2 (the newest one). There was a bug in handling code points defined (in the character database) as ranges, with just First and Last point having their entries. This affected the info shown about a code point on mouseover and the color of the cell, not the functionality. I have now made a quick and dirty fix to this. – Nagoya 25/5, 2013 at 5:47
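To illustrate the kind of range handling described in this comment, here is a rough sketch of expanding First/Last entries from the Unicode character database. It is not the utility's actual code; the helper names `load_ranges` and `lookup` are hypothetical, and the sample lines are only meant to follow the UnicodeData.txt field layout (the Extension B end point shown is the one I believe applied in version 6.2).

```python
import re

# Illustrative sample lines in the UnicodeData.txt format (semicolon-separated fields).
SAMPLE = """\
1B000;KATAKANA LETTER ARCHAIC E;Lo;0;L;;;;;N;;;;;
20000;<CJK Ideograph Extension B, First>;Lo;0;L;;;;;N;;;;;
2A6D6;<CJK Ideograph Extension B, Last>;Lo;0;L;;;;;N;;;;;
"""


def load_ranges(text):
    """Return (first, last, name) tuples, merging each First/Last pair into one range."""
    ranges = []
    pending = None  # (first code point, range name) while waiting for the Last line
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name = fields[1]
        match = re.fullmatch(r"<(.+), (First|Last)>", name)
        if match and match.group(2) == "First":
            pending = (code_point, match.group(1))
        elif match and match.group(2) == "Last" and pending:
            ranges.append((pending[0], code_point, pending[1]))
            pending = None
        else:
            # Ordinary entry: a range covering a single code point.
            ranges.append((code_point, code_point, name))
    return ranges


def lookup(ranges, code_point):
    """Return the name covering code_point, or None if this table has no entry for it."""
    for first, last, name in ranges:
        if first <= code_point <= last:
            return name
    return None


ranges = load_ranges(SAMPLE)
print(lookup(ranges, 0x20001))  # prints "CJK Ideograph Extension B"
```

Treating only the First and Last lines as individual entries, without merging them, would produce exactly the symptom reported above: every code point in between looks unassigned.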

I will shamelessly plug a little tool I wrote for entering symbols in Windows, because I find the solutions usually presented too cumbersome for frequent daily use. My personal use case is typing, for example, the Swedish å on an international US keyboard without having to switch layouts.

It allows entering Unicode characters through a popup window, not dissimilar to how this works in Apple OS X.

See https://github.com/mjvh80/SymWin for details. It's free and open source, but must (currently) be compiled. If there is sufficient interest I could add a pre-built version.

The tool can be configured per key, e.g. by copy/pasting symbols once from a site such as http://copypastecharacter.com.

Nl answered 3/4, 2014 at 18:3 Comment(0)
