Word wrapping in pango with mixed scripts

3

13

I have a text box implementation that uses Pango. If I put in a string that starts with a word in a right-to-left script, followed by a space, followed by a word in a left-to-right script, Pango's word wrapping gets messed up (using PANGO_WRAP_WORD_CHAR). For the string العربية ENGLISH I get the following:

Bad word wrapping

If I add the unicode character U+200F after the space, then I get the expected word wrapping:

Expected word wrapping

Also, if I replace the Arabic script above with Hindi (which is left-to-right, like the English next to it), I still get the problem, so it doesn't seem to be strictly a left-to-right versus right-to-left issue. In the Hindi case, a hack that inserts U+200E after the space resolves the problem.

Is this a bug in Pango? Are there workarounds I can try that are generic enough to fix the problem without breaking other cases? The workaround I'm currently using inserts either U+200E or U+200F after every space, based on the direction of the previous strongly directional character in the string, but I'm not sure whether there are strings that this will cause problems with.
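For readers who want to try the same workaround, here is a minimal sketch of it in Python. It is not the code from my text box: the function name `insert_direction_marks` is made up for illustration, and it assumes that "strongly directed" means Unicode bidi class `L` (left-to-right) or `R`/`AL` (right-to-left), as reported by the standard `unicodedata` module.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def insert_direction_marks(text: str) -> str:
    """Insert LRM or RLM after every space, matching the direction of the
    most recent strongly directional character (hypothetical helper)."""
    out = []
    last_mark = None  # no strong character seen yet
    for ch in text:
        out.append(ch)
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            last_mark = LRM
        elif bidi in ("R", "AL"):
            last_mark = RLM
        elif ch == " " and last_mark is not None:
            # Spaces before the first strong character are left alone.
            out.append(last_mark)
    return "".join(out)


if __name__ == "__main__":
    # Print the result with escapes so the invisible marks can be seen.
    print(ascii(insert_direction_marks("Hello world")))
```

The result would then be handed to the layout (for example via `pango_layout_set_text`). Whether the extra marks are harmless for every string is exactly the open question above; the sketch only shows the mechanics.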

Update: I was able to reproduce this problem on Ubuntu 12.04 with gedit (with the Enable Text Wrapping and Do not split words over two lines settings enabled). I simply typed Hello world over and over until it wrapped several times, then replaced all instances of world with पहुंचगया, and everything collapsed to a single line.

Diversified answered 9/12, 2015 at 18:47 Comment(2)
The issue is that Pango treats an LTR word after an RTL word (or the inverse) as one word, so it will not break them across two lines if you choose to wrap on words. Outwash
I updated the question to mention that the problem also occurs when I have only LTR scripts alternating (e.g. English and Hindi). Diversified
5

U+200F and U+200E are the RIGHT-TO-LEFT MARK and the LEFT-TO-RIGHT MARK, respectively. So:

  • between each English text and Arabic text, put a RIGHT-TO-LEFT mark
  • between each Arabic text and English text, put a LEFT-TO-RIGHT mark

It is a bug, because Pango should do this automatically when displaying text; but since Pango isn't doing it, you have to do it manually.
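A rough sketch of that manual step in Python, under one reading of the two rules above: a mark is inserted only where the strong direction changes, and the mark matches the run that follows. The helper names `strong_direction` and `mark_direction_changes` are invented for this example, and direction is taken from the Unicode bidi classes `L`, `R` and `AL`.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def strong_direction(ch):
    """Return 'ltr', 'rtl', or None for neutral and weak characters."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):
        return "rtl"
    return None


def mark_direction_changes(text):
    """Insert a mark right before each run whose strong direction differs
    from the previous run (hypothetical helper, one reading of the rules)."""
    out = []
    previous = None
    for ch in text:
        direction = strong_direction(ch)
        if direction is not None:
            if previous is not None and direction != previous:
                out.append(RLM if direction == "rtl" else LRM)
            previous = direction
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(ascii(mark_direction_changes("abc \u0639\u0631 def")))
```

Note that this only covers changes of direction, so it would not touch the English and Hindi case from the question, where both scripts are left-to-right.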

Outwash answered 10/12, 2015 at 16:39 Comment(1)
Thanks. I'm trying to figure out whether this is a bug in Pango or whether these symbols are required. If I put the same text in a Microsoft Word text box or a Qt text edit box, the results are what I expect (they look like the bottom picture). Diversified
4

It seems to me to be a bug or an incomplete feature, since it appears with mixed scripts.

It seems you are using an old Pango build, maybe from Ubuntu 12.04?

Ubuntu 12.04 contains Gedit 3.4
Ubuntu 15.10 contains Gedit 3.10

Pango underwent a radical change in 3.6: it replaced its shaping engine with HarfBuzz. [2]

I couldn't reproduce the bug using Gedit on Ubuntu 15.10: it always moves two words down together, and it does not allow me to resize its window to try splitting those two words. See the screenshot.

pango shaping mixed scripts in gedit

Update:

It seems the behavior has changed:

  • It does not wrap the first English-script word when the text starts with Arabic.

    pango-view  --text "وقعت أطراف سياسية ليبية اليوم في المغرب اتفاق سلام برعاية أممية aljazeeranet" --width=70 --margin=0 --wrap=word 
    


  • Same as the previous case: it does not wrap or enforce the width.

    pango-view  --text "elections الجزيرة" --width=30 --margin=0 --wrap=word
    


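To rerun the same kind of test in word-char mode (the mode the question is about), a small Python wrapper can build the pango-view command lines. This is only a sketch: `pango_view_command` and `run_if_available` are hypothetical helper names, the flags are the ones used in the commands above, and `word-char` as a value for `--wrap` is an assumption to check against `pango-view --help` on your system.

```python
import shlex
import shutil
import subprocess


def pango_view_command(text, width, wrap="word-char", margin=0):
    """Build a pango-view argument list using the flags shown above.
    The default wrap value 'word-char' is an assumption."""
    return [
        "pango-view",
        "--text", text,
        f"--width={width}",
        f"--margin={margin}",
        f"--wrap={wrap}",
    ]


def run_if_available(command):
    """Run the command only if the program is installed; return whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=False)
    return True


if __name__ == "__main__":
    # Print the command lines for both wrap modes so they can be compared.
    sample = "Hello \u092a\u0939\u0941\u0902\u091a\u0917\u092f\u093e " * 5
    for mode in ("word", "word-char"):
        print(shlex.join(pango_view_command(sample, 70, wrap=mode)))
```

Passing one of the built lists to `run_if_available` opens the pango-view window when the tool is installed, which makes it easy to compare `--wrap=word` against `--wrap=word-char` on the same string.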
References:

Cameraman answered 18/12, 2015 at 7:24 Comment(4)
Thanks for the help. After your post, I tried this in Ubuntu 15.10's gedit and I get the same result (using the Hello world test case I mentioned above). Diversified
I'm also not sure what version of Pango Ubuntu 15.10 is using. Searching for pango under /usr/lib, the only Pango library I see is libpango-1.0.so.0 (and some others with similar versioning). Diversified
@pauld, OK, I could reproduce it. I hadn't noticed the --wrap=word, as your question was clear about word-char wrapping mode. Same behavior in Ubuntu 15.10. So this answer is irrelevant to the problem; I will remove it later. I should add that it's not just U+200F and U+200E: any control character makes the English word wrap. So far I have tested U+202C, U+061C, U+202A, and U+2069, and the same thing happens when putting the English word first and the Arabic word second. Cameraman
@Sneetsher, I think your answer is relevant as it shows how to reproduce the behavior. Outwash
2

Note: we recently upgraded the version of Pango we use from 1.36.1 to 1.38.1, and this issue went away. So I believe this was a bug in Pango or HarfBuzz that has since been fixed.
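If you need to decide at runtime whether to keep the mark-insertion workaround, a version comparison along these lines may help. It is a sketch built only on the two data points above (1.36.1 showed the problem, 1.38.1 did not); the exact release that fixed it is not known, so anything in between is reported as unknown, and older versions are only assumed to be affected. The name `wrapping_bug_status` is made up for the example.

```python
LAST_KNOWN_AFFECTED = (1, 36, 1)  # version that showed the problem
FIRST_KNOWN_FIXED = (1, 38, 1)    # version where the problem was gone


def parse_version(version):
    """Turn a dotted version string such as '1.38.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def wrapping_bug_status(version):
    """Classify a Pango version string against the two known data points."""
    parsed = parse_version(version)
    if parsed >= FIRST_KNOWN_FIXED:
        return "fixed"
    if parsed <= LAST_KNOWN_AFFECTED:
        # Assumption: versions older than 1.36.1 behave like 1.36.1.
        return "likely affected"
    return "unknown"


if __name__ == "__main__":
    for candidate in ("1.36.1", "1.37.0", "1.38.1"):
        print(candidate, wrapping_bug_status(candidate))
```

In C, the equivalent runtime check would use `pango_version_check(1, 38, 1)`, which returns NULL when the loaded library is at least that version.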

Diversified answered 3/6, 2016 at 21:28 Comment(0)
