What is this character ( Â ) and how do I remove it with PHP?

It's a capital A with a ^ on top: Â

It is showing up in strings pulled from webpages. It shows up where there was previously an empty space in the original string on the original site. This is the actual character that is stored in my database. It's also what displays on my website when I echo a string that contains it.

I realize it's a character encoding problem from when I originally processed the webpages, but I am now stuck with these characters in my database. I have to convert this character when it is displayed, or somewhere else in the PHP before outputting HTML that contains it. I cannot reprocess the original documents.

I have tried str_replace() and html_entity_decode(), and neither does anything.

What else should I try?

Steppe answered 25/8, 2011 at 7:27 Comment(1)
You should not remove them with str_replace(); you should fix the encoding problem first. Take a look at this: stackoverflow.com/search?q=mysql+encoding and this: stackoverflow.com/search?q=php+encoding – Irregular

"Latin 1" is your problem here. There are approx 65256 UTF-8 characters available to a web page which you cannot store in a Latin-1 code page.

For your immediate problem you should be able to use:

$clean = str_replace(chr(194), " ", $dirty);
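Why this works: a non-breaking space (U+00A0) is encoded in UTF-8 as the two bytes 0xC2 0xA0. When those bytes are read as Latin-1, 0xC2 shows up as Â and 0xA0 as a non-breaking space, so chr(194) is exactly that stray first byte. A minimal sketch reproducing the effect, assuming the mbstring extension is installed (`showAsLatin1()` is a hypothetical helper name):

```php
<?php
// Hypothetical helper: reinterpret UTF-8 bytes as Latin-1 and re-encode
// the result as UTF-8, i.e. show what a Latin-1 reader "sees".
function showAsLatin1(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$nbsp = "\u{00A0}";             // non-breaking space, stored as bytes C2 A0
echo bin2hex($nbsp), "\n";      // c2a0
echo showAsLatin1($nbsp), "\n"; // "Â" followed by a non-breaking space
```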

However, I would switch your database to UTF-8 as soon as possible, as the problem will almost certainly recur.
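One caveat: chr(194) is also the first byte of every other UTF-8 character from U+0080 to U+00BF, such as £ (0xC2 0xA3) or © (0xC2 0xA9), so replacing it on its own leaves a broken stray byte behind. A narrower sketch, assuming the unwanted characters really are UTF-8 non-breaking spaces, replaces only the two-byte sequence (`replaceUtf8Nbsp()` is a hypothetical name):

```php
<?php
// Replace only the UTF-8 non-breaking space (bytes C2 A0) with a plain
// space; other characters whose first byte is C2 (e.g. £) are untouched.
function replaceUtf8Nbsp(string $dirty): string
{
    return str_replace("\xC2\xA0", ' ', $dirty);
}

$dirty = "Price:\xC2\xA0\xC2\xA35";  // "Price:" + NBSP + "£5", UTF-8 encoded
echo replaceUtf8Nbsp($dirty), "\n";  // Price: £5
```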

Chapiter answered 25/8, 2011 at 8:19 Comment(5)
The Unicode codespace goes up to U+10FFFF, so that's about a million code points, give or take a few illegal ones. – Lamarckism
Here's a useful chart to reference characters like this: ascii-code.com – Steppe
@Ignacio: very true. I was limiting myself to the UTF-16 character set. :-} – Chapiter
UTF-16 has the same number of characters. You probably meant UCS-2. – Frontward
Thanks for this trick. In case someone searches for how to print Latin-1 text from SQL Server in WordPress, here it is: str_replace(chr(194), " ", mb_convert_encoding($val, 'UTF-8', 'ISO-8859-1')); – Meeker

This works for me:

$string = "Sentence ‘not-critical’ and \n sorting ‘not-critical’ or this \r and some ‘not-critical’ more. ' ! -.";
// Remove every byte that is not printable ASCII (0x20-0x7F), LF or CR
$output = preg_replace('/[^\x20-\x7F\x0A\x0D]+/', '', $string);
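This pattern deletes every byte outside printable ASCII, line feed and carriage return, so a UTF-8 non-breaking space (0xC2 0xA0) disappears entirely instead of becoming a space, and tabs are removed too; that likely explains the missing whitespace mentioned in the comments. A sketch that first maps non-breaking spaces to ordinary spaces and also keeps tabs, assuming the input is UTF-8 (`stripToAscii()` is a hypothetical name):

```php
<?php
// Hypothetical helper: keep word boundaries by turning UTF-8 non-breaking
// spaces into plain spaces, then drop every byte that is not printable
// ASCII, tab, line feed or carriage return.
function stripToAscii(string $text): string
{
    $text = str_replace("\xC2\xA0", ' ', $text);
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7F]+/', '', $text);
}

echo stripToAscii("one\xC2\xA0two \u{2018}three\u{2019}"), "\n"; // one two three
```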
Decentralize answered 5/9, 2017 at 21:23 Comment(2)
Some whitespace is missing from my text now, but the encoded characters are gone. – Ethnarch
This was the only answer that got rid of the Â character for me. – Aggarwal

It isn't really one character; it is most likely caused by a mismatch between the encoding of the content and the encoding the browser uses to display it. Try setting the encoding of your output page to the encoding you are actually using.

For example, in the <head> section, output:

echo "<META http-equiv='Content-Type' content='text/html; charset=UTF-8'>";

(Adjust UTF-8 to whatever you are using)

Hod answered 25/8, 2011 at 7:32 Comment(4)
+1. This problem needs fixing at the root cause (although merely changing headers might not be enough, depending on the situation). – Misleading
This is the actual character that is stored in my database. Does that change the situation at all? My database encoding is Latin 1 (the default). I'm not very familiar with encoding issues. – Steppe
Oh yes, sorry, I didn't read the question carefully. In that case, after you pull data from another site, you need to detect its encoding and convert it to your database's encoding before storing it. Usually that is done by parsing a header like the one I gave, but depending on the site you crawl it can get complicated. – Hod
That sounds like the correct solution to the problem. I will make that change when I get back to that part of the project. Any suggestions for temporary solutions in PHP before outputting the character, or is it impossible? – Steppe
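For the longer-term fix Hod describes (convert crawled text to the database's encoding before storing it), a sketch might look like this. It assumes the mbstring extension and the Latin-1 database mentioned above, and, since it does not read the page's declared charset, it falls back on a simple heuristic: valid UTF-8 is treated as UTF-8, and anything else is assumed to be Latin-1 already. `toDbEncoding()` is a hypothetical name.

```php
<?php
// Hypothetical helper: convert crawled text to the database encoding
// before storing it. Assumption: text that is valid UTF-8 is UTF-8 and
// anything else is already Latin-1 (a real crawler should prefer the
// charset the page declares in its headers or <meta> tag).
function toDbEncoding(string $text, string $dbEncoding = 'ISO-8859-1'): string
{
    $from = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : 'ISO-8859-1';
    // Characters the target encoding cannot represent are replaced by
    // mbstring's substitute character ('?' by default).
    return mb_convert_encoding($text, $dbEncoding, $from);
}

echo bin2hex(toDbEncoding("a\xC2\xA0b")), "\n"; // 61a062: NBSP is now one Latin-1 byte
echo bin2hex(toDbEncoding("caf\xE9")), "\n";    // 636166e9: already Latin-1, unchanged
```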

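I use this one a lot:

function cleanStr($value){
    // Drop the stray Â (matches its UTF-8 bytes when this file is saved as UTF-8)
    $value = str_replace('Â', '', $value);
    // Transliterate the rest to ASCII and drop anything with no equivalent;
    // the exact output depends on the iconv implementation and locale
    $value = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $value);
    return $value;
}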
Wendling answered 22/2, 2017 at 9:35 Comment(1)
That turns £ into lb. – Novation

Since this is coming from the database, the best option is to remove it there with an SQL query like:

UPDATE products SET description = REPLACE(description, 'Â', ' ');
Ankylose answered 20/3, 2020 at 18:3 Comment(0)

Use the code below:

echo "<META http-equiv='Content-Type' content='text/html; charset=UTF-8'>";
echo htmlspecialchars_decode($your_string, ENT_QUOTES);
Crossways answered 29/5, 2014 at 8:34 Comment(0)

This problem occurs when different parts of your site use different character sets.

To solve this (using utf-8 in the examples):

In the <head> of your page, add the charset:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

In any form you submit, add accept-charset:

<form name="..." method=".." id=".."  accept-charset="utf-8">

If you are using PHP with MySQLi to process your form, make sure the database connection also uses your charset. Procedural style:

mysqli_set_charset($link, "utf8");

and object-oriented style:

$mysqli->set_charset("utf8");
Allargando answered 22/2, 2017 at 9:52 Comment(0)

I actually needed all of this:

    <!DOCTYPE html>
    <html lang="en-US">
    <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Paillette answered 29/4, 2019 at 23:58 Comment(0)

To remove the â character from a string:

mysqli_set_charset($con, "utf8");

$price = "₹ 250.00";

$price2 = preg_replace('/[^\x20-\x7F]+/', '', $price);

Result: " 250.00" (the space after ₹ is kept; wrap the call in trim() to drop it)
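The â is the same kind of mix-up with a different character: ₹ (U+20B9) is stored in UTF-8 as 0xE2 0x82 0xB9, and a page that decodes those bytes as Windows-1252 shows them as â‚¹. The regular expression removes all three bytes and keeps the space that follows. A short sketch reproducing the garbling, assuming mbstring is installed:

```php
<?php
// Decode the UTF-8 bytes of "₹" as Windows-1252 to reproduce the garbling.
$rupee = "\u{20B9}";
echo bin2hex($rupee), "\n";                                      // e282b9
echo mb_convert_encoding($rupee, 'UTF-8', 'Windows-1252'), "\n"; // â‚¹
```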

Guarantor answered 9/7, 2019 at 11:7 Comment(0)

I was facing the same problem. It was solved when I used the utf8_decode() function.
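utf8_decode() converts a UTF-8 string to ISO-8859-1, so the stored bytes 0xC2 0xA0 become the single Latin-1 byte 0xA0, a non-breaking space on a Latin-1 page. The function is deprecated as of PHP 8.2, and mb_convert_encoding() is one of the suggested replacements. A minimal sketch, assuming mbstring is installed (`utf8ToLatin1()` is a hypothetical name):

```php
<?php
// Hypothetical wrapper: the mbstring equivalent of utf8_decode(),
// converting UTF-8 to ISO-8859-1 (Latin-1).
function utf8ToLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

echo bin2hex(utf8ToLatin1("caf\xC3\xA9\xC2\xA0")), "\n"; // 636166e9a0
```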

Duff answered 11/7 at 6:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.