Zero-width space vs zero-width non-joiner
H

2

7

What is the difference between the zero-width space (U+200B) and the zero-width non-joiner (U+200C) from a practical point of view?

I have already read the Wikipedia articles, but I can't tell whether these characters are interchangeable or not.

I think they are completely interchangeable, but then I can't understand why Unicode has two of them instead of one.

Hedonic answered 26/11, 2017 at 23:8 Comment(1)
The zero width non joiner breaks up ligatures but does not create a word break. The zero width space is a word break, used in languages that do not use spaces to separate words.Torsion
M
6

A zero-width non-joiner is almost nonexistent. Its only purpose is to split things in two. For example, 123 zero-width-non-joiner 456 is two numbers with nothing in between.

A zero-width space is a space character, just a very very narrow one. For example 123 zero-width-space 456 is two numbers with a space character in between.
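As a rough illustration (a sketch, not part of the original answer), Python's standard `unicodedata` module can show that the two characters are distinct code points even though neither one is visible. The helper name `describe` and the sample strings are made up for this example.

```python
import unicodedata

ZWSP = "\u200b"  # zero-width space
ZWNJ = "\u200c"  # zero-width non-joiner


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Both strings look like "123456" on screen, but each holds 7 code points.
with_zwsp = "123" + ZWSP + "456"
with_zwnj = "123" + ZWNJ + "456"

print(describe(ZWSP))
print(describe(ZWNJ))
print(len(with_zwsp), len(with_zwnj))
print(with_zwsp == with_zwnj)  # the strings differ despite looking identical
```

So in both cases there is "something" between 123 and 456; what that something means to a renderer or a line breaker is where the two characters differ.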

Manikin answered 26/11, 2017 at 23:34 Comment(7)
Well, thanks, I upvoted, but I still don't understand. In both cases we have "something" between 123 and 456. Correct? Yes. Then, if we type 123 then-any-of-these-special-characters 456 in a plain text editor with a monospaced font, we will not see any space (even a very, very narrow one). Correct? Yes, again. So, it seems they are completely interchangeable? But then, why do we have two instead of one? (That was my original question).Hedonic
So it's somewhat similar to the <b> vs <strong> difference: both look the same, but semantically they are different. Say, a zero-width space should match \s in a regexp while a non-joiner should not. There could also be a difference in browser support.Pass
123 zero-width-non-joiner 456 is one number (123456) with no ligature between 3 and 4. There wouldn't normally be a ligature there so its use is redundant in that example.Torsion
@RaymondChen Well, let's assume we have some text which allows ligatures, for example of one-of-these-special-characters ficial. We can use U+200B and (I haven't tested, though) it will prevent the ligature between f and i just as well as U+200C. So, what's the difference? (Hm, maybe when we use U+200B, we have 2 different words, of and ficial, instead of the single word official when we use U+200C?)Hedonic
@RaymondChen My bad, I see, you already mentioned it in another comment.Hedonic
Actually, not quite correct. A zero-width non-joiner will prevent characters from "joining". For instance the f and i will not join in the "fi" ligature. But the real use is for complex scripts (most Indic scripts, Arabic, a few others). There is also a matching "zero-width joiner". And you can always go to the source: unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf (page 373, 422, 452, etc, just search for "zero width non-joiner" and "zero width joiner")Evangelize
@johnc.j. This answer doesn't answer the question at all, can you mark the other answer as the correct answer? The question was "What is the difference between ZWSP and ZWNJ from a practical point of view?", and as the other answer explained, "A zero width space (ZWSP) does everything a ZWNJ does, but it also creates opportunities for line breaks." That basic information isn't in this answer at all.Carilla
G
6

A zero width non joiner (ZWNJ) only interrupts ligatures. These are hard to notice in the Latin alphabet; they are most frequent in serif fonts displaying certain combinations of lowercase letters. A few writing systems, such as the Arabic abjad, use ligatures very prominently.

[Image: example of the ligature fi]

A zero width space (ZWSP) does everything a ZWNJ does, but it also creates opportunities for line breaks. Very good for displaying file paths and long URLs, but beware that it might screw up copy pasting.
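A minimal sketch of the file path use case, assuming you control the text before it is displayed: insert a ZWSP after each separator so a renderer may wrap there, and strip it again before the text is reused. The function names `add_break_opportunities` and `strip_zwsp` and the sample path are hypothetical, and whether a line actually wraps depends on the rendering engine.

```python
ZWSP = "\u200b"  # zero-width space


def add_break_opportunities(path: str, separators: str = "/\\") -> str:
    """Insert a ZWSP after each separator so a renderer may wrap there."""
    out = []
    for ch in path:
        out.append(ch)
        if ch in separators:
            out.append(ZWSP)
    return "".join(out)


def strip_zwsp(text: str) -> str:
    """Remove every ZWSP, e.g. before using text that was copied back."""
    return text.replace(ZWSP, "")


original = "/usr/local/share/a-very-long-directory-name/file.txt"
display = add_break_opportunities(original)

# The display string looks the same but is no longer a valid path as-is.
print(display == original)
print(strip_zwsp(display) == original)
```

The second function is the guard against the copy-paste problem: a path copied from the displayed text still carries the invisible characters until they are stripped.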

By the way, I tested regular expression matching in Python 3.8 and JavaScript 1.5, and in neither of them does either character match \s. Unicode classifies these characters as format characters (similar to direction markers and such) rather than as space or punctuation. Other code points in the same Unicode block (e.g. Thin Space, U+2009) are classified as spaces by Unicode and do match \s.
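The Python side of that test can be reproduced with something like the following sketch, which uses only the standard `re` and `unicodedata` modules. The dictionary `CHARS` and the helper `matches_whitespace` are names chosen for this example.

```python
import re
import unicodedata

CHARS = {
    "ZWSP": "\u200b",        # zero-width space
    "ZWNJ": "\u200c",        # zero-width non-joiner
    "THIN SPACE": "\u2009",  # a real space character from the same block
}


def matches_whitespace(ch: str) -> bool:
    """True if the single character is matched by \\s in a str pattern."""
    return re.fullmatch(r"\s", ch) is not None


for label, ch in CHARS.items():
    # General category: Cf = format character, Zs = space separator
    print(label, unicodedata.category(ch), matches_whitespace(ch))
```

In Python 3 this reports category `Cf` and no `\s` match for both zero-width characters, and `Zs` with a match for the thin space.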

Grouper answered 14/9, 2021 at 21:20 Comment(0)
