Symbol not displaying properly [closed]
D

5

30

The symbol is: ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ

What's so special about this symbol and where did it come from?

What can be done to validate against such input? Or even better, how can such symbols be displayed properly (i.e. without letting them overlap other elements)?

Deathbed answered 18/12, 2015 at 6:49 Comment(13)
what's with the line? or just on my screen?Coons
@DrixsonOseña:- I guess that's what OP is asking! It's there on my screen as wellSabir
@RahulTripathi I had no idea :)Coons
That is some kind of modifier for a character. Normally you would use just one, but you can make crazy combinations. E.g. you could enter the letter ä directly, or as an a followed by the two-dot modifier.Hoffman
@chaosifier:- Maybe because you have not mentioned where you got this symbol from? What's the source, etc. (Not the downvoter, btw)Sabir
Guys on 9gag have been using this symbol for a while because of its weird behavior. I tried to find out more about it on Google, but Google replied with a 400 error, so I had to post this question here.Deathbed
but google replied with a 400 error - that's kinda interesting in itself! I wonder why that happensHands
If you paste this into a password input, it pastes a lot of characters. I tried pasting it on Facebook and they won't accept it; YouTube won't paste it either :DCoons
@chaosifier: How did you use it on 9GAG, since Google is replying with a 404 error? For a change I checked with Bing. It returns results about some Arabic character. Not sure though if it matches your purpose of using it. +1 for an interesting question.Rauscher
@mk08, They use it on the comments section. Thank you.Deathbed
It's beautiful. ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْSurmullet
Related: How does Zalgo text work?Chukker
@Traubenfuchs It's f***ed up :D ؤْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْ‌​ْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْْChili
H
20

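Well, since this seems to be less trivial for others than I thought, here is my answer.

This is done with combining marks; Unicode has a whole block of them called Combining Diacritical Marks.

To give you an example, you can write an ä directly (the precomposed character U+00E4) or as an a followed by the combining diaeresis U+0308, which also renders as "ä".

Now you can go overboard with such marks, as here: "ä̈̈̈̈̈̈", where I entered the letter followed by a run of combining diaereses.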
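As a minimal sketch of the two spellings (JavaScript, assuming an engine that supports String.prototype.normalize; the variable names are only illustrative), the precomposed and the combined form are different strings that normalize to each other:

```javascript
// Two ways to spell "ä": one code point, or a base letter plus a combining mark.
const precomposed = '\u00E4';   // LATIN SMALL LETTER A WITH DIAERESIS
const decomposed = 'a\u0308';   // "a" + COMBINING DIAERESIS

// They render the same but are not the same string.
const sameString = precomposed === decomposed;                  // false

// Normalization converts between the two spellings.
const sameAfterNFD = precomposed.normalize('NFD') === decomposed;  // true
const sameAfterNFC = decomposed.normalize('NFC') === precomposed;  // true

// Stacking: nothing stops you from appending more combining marks.
const stacked = 'a' + '\u0308'.repeat(6);
console.log(sameString, sameAfterNFD, sameAfterNFC, stacked.length);
```

Each extra combining mark adds one code unit to the string while still attaching to the same visible letter, which is why the stack grows upward instead of sideways.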

To protect yourself against such "Unicode" attacks, you could limit the number of combining characters that are allowed to follow one another. I cannot give you an exact example, since your tags give no hint about your server-side language. If you have a plain English website, you might try limiting input to ASCII characters only. However, I would not recommend that, since I would then not be allowed to sign with my name :-)

I would just limit the number of consecutive combining characters. That can be done with a regex.
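A minimal sketch of such a regex-based limit, written in JavaScript because the server-side language is unknown. The helper name limitCombiningMarks and the default limit of 2 are assumptions for illustration; \p{M} needs the u flag (ES2018 or later).

```javascript
// Hypothetical helper: keeps at most `maxMarks` combining marks in a row
// and drops the rest of the run.
function limitCombiningMarks(text, maxMarks = 2) {
  // \p{M} matches any Unicode combining mark (general category M).
  // Group 1 captures the marks we keep; the trailing \p{M}+ is the excess.
  const run = new RegExp(`(\\p{M}{${maxMarks}})\\p{M}+`, 'gu');
  return text.replace(run, '$1');
}

// The symbol from the question: U+0624 followed by many U+0652.
const input = '\u0624' + '\u0652'.repeat(161);
const cleaned = limitCombiningMarks(input);
console.log(input.length, cleaned.length);
```

Legitimate text rarely needs more than a few marks on one base letter, so a small limit should leave normal names and words untouched, though the right threshold depends on the languages you expect.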

If you just want to prevent the characters from "breaking out" of their container, try using style="overflow:auto" on it, which seems to confine how the text is rendered.

Hoffman answered 18/12, 2015 at 7:16 Comment(7)
I hadn't noticed before that this is displayed differently in other browsers. If Rahul Tripathi is right and this special character is an Arabic one (I didn't take the time to check this particular one), I could imagine that some browsers/operating systems don't have support for Arabic characters installed, so I would guess it is a bug or missing support in this case.Hoffman
Since I had some more questions, and since some members were saying that this question was not programming related, I had to update the question and uncheck your answer. Sorry for the inconvenience caused; I should have included everything in the beginning.Deathbed
@Deathbed now you have a solution in my answer how to fix it :)Hoffman
Is that how Facebook handles such input? Can the overlapping nature of the symbol be stopped without having to validate the input i.e. by using HTML/CSS alone?Deathbed
@chaosifier:- Not with HTML/CSS alone, I don't think. You can use some other language like JavaScript to validate it.Sabir
@Deathbed I don't use Facebook, so no idea. However, the edit of " 一二三" (this is 123, I guess) gives you a hint: put it into a div with overflow:auto.Hoffman
@rekire, that worked like a charm. I think you could include that in your answer. I have rolled back your changes to the question to allow others to see the problem. Thanks a lot for your answer, really appreciate it.Deathbed
S
8

I just copied the symbol to SQL Server and Visual Studio and found that the symbol got converted to

enter image description here

So it looks like a combination of the ْ symbol (which looks like an Arabic symbol) that the browser is not able to recognize.

The symbol is the Arabic letter waw with hamza above (U+0624).

Also the same symbol is interpreted correctly by IE.

enter image description here

So it looks like some browsers are not able to recognize the symbol.

EDIT:

To validate such input, you can usually use some sort of validation (e.g. restricting the user to ASCII characters only) in a language like JavaScript or PHP, which lets you restrict the user's input to the characters of your choice.
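A minimal sketch of the ASCII-only check in JavaScript; the function name isAsciiOnly is hypothetical. Note that this rejects every non-English name and word as well, so it only suits strictly English input.

```javascript
// Hypothetical validator: true only if every character is in the
// 7-bit ASCII range (code points 0 to 127).
function isAsciiOnly(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

console.log(isAsciiOnly('plain text'));                      // true
console.log(isAsciiOnly('\u0624' + '\u0652'.repeat(161)));   // false
```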

Or even better, how can such symbols be displayed properly

If the browser cannot render the symbol the way you have shown, then as a workaround you can confine those characters, e.g. by putting them inside a div with overflow:auto, but that would not be a good solution. A better one would be to use a validation script.

Sabir answered 18/12, 2015 at 7:2 Comment(1)
Why do you think that IE is correct and firefox (which produces the line) is wrong? I'm not an expert for arabic, but my first guess would have been the other way round. The line seems like the logical consequence of stacking combining marks.Chukker
P
5

It is strange that on screen you see only one character followed by a line drawn from nowhere.

But when inspected with Chrome, it is actually a sequence of characters: the first has Unicode code point 1572 (U+0624), followed by 161 characters with code point 1618 (U+0652) that draw the line. After that comes code point 32, a space.
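The same inspection can be sketched in JavaScript. The helper name describeCodePoints is an assumption, and the sample string is a shortened stand-in for the real symbol:

```javascript
// Hypothetical helper: lists each code point of a string in decimal and hex.
function describeCodePoints(text) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return {
      decimal: cp,
      hex: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    };
  });
}

// Shortened sample: base letter, three marks, trailing space.
const sample = '\u0624' + '\u0652'.repeat(3) + ' ';
const points = describeCodePoints(sample);
console.log(points);
```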

Prevocalic answered 18/12, 2015 at 7:3 Comment(4)
"(Unicode) code point", not "ASCII code".Cryptograph
True but limited. 1572 & 1618 are ASCII code (decimal system) and if you convert those two into hex you get 624 & 652 respectively. Now use &#x<HexCode>; and you will see magic. So &#x624; is a Unicode to a first character that you see in the question and &#x652; is Unicode to rest of 161 characters... :DPrevocalic
ASCII vs Unicode has nothing to do with decimal vs hexadecimal. ASCII is a 7-bit character set, so the largest code point is 127; there is no "ASCII code" (code point) 1572. You are talking about another character set, Unicode, so the term "ASCII" is not correct.Cryptograph
Yes, that's true. Unicode is superset of ASCII. I have read https://mcmap.net/q/73389/-what-39-s-the-difference-between-ascii-and-unicode ... Thanks for correcting me..Prevocalic
U
2

I am not sure if parsing your symbols in Javascript is gonna be helpful but here is a script that does that:

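var text = 'your symbol goes here',
    // U+0624 is the base letter, U+0652 the repeated diacritic (sukun).
    // No "|" inside the character class: there it would match a literal pipe.
    regex1 = /[\u0624\u0652]/g,
    result;
// note that the symbol consists of the letter and the repeated diacritics;
// to remove the symbol completely:
result = text.replace(regex1, '');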

Here is a way to see what kind of characters are included in the symbol and how these characters make it look so weird (it uses a JavaScript regex):

https://regex101.com/r/yW4aM8/3

You may want to declare the encoding with a meta tag (charset=UTF-8) so the entire symbol renders consistently in all browsers, rather than trying it only in IE. I would say the only reason your symbol looks weird is that the diacritics (the repeated characters) are not used correctly; otherwise, the characters included are all legitimate. I wouldn't be surprised if this symbol is just someone trying to abuse a form input or something similar for this very effect.

The symbol uses only Arabic characters. For reference, the Arabic block in Unicode covers the following range (JavaScript regex), as listed at unicode.org:

/[\u0600-\u06FF]/g

/[\u0600-\u06FF]/g.exec('text here');

// it's advised that you wrap the Arabic words in spans to control and show them correctly, do the following:
'text includes arabic words'.replace(/([\u0600-\u06FF]+)/g, '<span class="xyz">$1</span>');

and the css would be:

.xyz { unicode-bidi: bidi-override; }

I hope that helps a bit. Good luck.

Undistinguished answered 21/12, 2015 at 11:51 Comment(1)
Thanks mate. It was helpful.Deathbed
Z
0
$ echo -n ؤْْ | recode utf8..dump
UCS2   Mne   Description

0624   wH    arabic letter waw with hamza above
0652   0+    arabic sukun
0652   0+    arabic sukun
0652   0+    arabic sukun
[...lots of repeated lines...]
0652   0+    arabic sukun

That's the Arabic waw (w) with a lot of diacritics: one hamza (precomposed into the character waw with hamza above) and about 160 repeated sukun diacritics.
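For readers without recode, a rough JavaScript sketch of the same breakdown, assuming an engine with String.prototype.normalize and Unicode property escapes. The variable names are illustrative, and the count of 160 is taken from the estimate above rather than measured:

```javascript
// The precomposed base letter from the dump above.
const wawWithHamza = '\u0624';   // ARABIC LETTER WAW WITH HAMZA ABOVE

// Canonical decomposition splits it into plain waw plus a combining hamza.
const parts = Array.from(wawWithHamza.normalize('NFD'), (ch) =>
  ch.codePointAt(0).toString(16).padStart(4, '0')
);

// Count the combining marks in a stand-in for the full symbol.
const symbol = wawWithHamza + '\u0652'.repeat(160);
const markCount = (symbol.match(/\p{M}/gu) || []).length;

console.log(parts, markCount);
```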

Zygophyte answered 18/12, 2015 at 19:34 Comment(0)
