How to replace all XHTML/HTML line breaks (<br>) with new lines?

Asked 12/3, 2010 at 21:51 Answered 5/3, 2024 at 11:36

I am looking for the best br2nl function. I would like to replace all instances of <br> and <br /> with newlines \n. Much like the nl2br() function, but the opposite.

I know there are several solutions in the PHP manual comments but I'm looking for feedback from the SO community on possible solutions.

Sarmatia answered 12/3, 2010 at 21:51 Comment(2)

Are you sure you want to replace the HTML/XHTML line break elements with physical line breaks? Because nl2br does not replace the physical line breaks but just adds HTML/XHTML line break elements. – Harcourt 12/3, 2010 at 22:0

I'm not using this function to negate or recover a string that was returned from nl2br. I am using it to sanitize text in a legacy database (from a webapp that allowed html) before I import it into my database. I just said the opposite of nl2br because people generally know that function. – Sarmatia 12/3, 2010 at 22:23

104

I would generally say "don't use regex to work with HTML", but on this one I would probably go with a regex, considering that <br> tags generally look like either:

<br>
or <br/>, with any number of spaces before the /

I suppose something like this would do the trick :

$html = 'this <br>is<br/>some<br />text <br    />!';
$nl = preg_replace('#<br\s*/?>#i', "\n", $html);
echo $nl;

A couple of notes on the pattern:

starts with <br
followed by any number of whitespace characters: \s*
optionally, a /: /?
and, finally, a >
all matched case-insensitively (the i after the closing #), as <BR> would be valid in HTML
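Wrapped up as a reusable function, the pattern above could look like the sketch below. The name br2nl comes from the question, not from PHP itself (there is no built-in of that name), and the optional $newline parameter is an assumption added for illustration.

```php
<?php
// Minimal sketch: replace <br>, <br/>, <br /> (any case) with a newline.
// br2nl and $newline are illustrative names, not part of PHP.
function br2nl(string $html, string $newline = "\n"): string
{
    // #...# are the pattern delimiters; the trailing i makes the match case-insensitive.
    return preg_replace('#<br\s*/?>#i', $newline, $html);
}

echo br2nl('this <br>is<br/>some<br />text <br    />!');
```

Note that "\n" must be written in double quotes: inside single quotes PHP treats '\n' as a literal backslash followed by the letter n.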

Cuticula answered 12/3, 2010 at 21:57 Comment(5)

To be very nit-picky =] : <input type="text" value="<br />"> is allowed in html (not xhtml). And in a CDATA section <br /> is "normal" text. – Foremast 12/3, 2010 at 22:50

@Foremast : humph, true :-) ;; I was writing this using DOM, and when I finished, I saw you had posted the same kind of solution I would have proposed (except I used getElementsByName, and not XPath), so I didn't post it -- maybe I should edit my answer, though, for the sake of completeness, as it's been accepted... – Cuticula 13/3, 2010 at 10:37

But this solution is faster and less memory consuming (if this is a matter). If you don't have completely arbitrary documents I'd probably consider these edge-cases acceptable. – Foremast 13/3, 2010 at 12:12

Shouldn't the second argument be "\\n"? this is the only thing that works on my setup here. – Customer 19/7, 2012 at 15:9

My HTML looks like – Sorkin 19/10, 2019 at 14:44

You should be using the PHP_EOL constant to get platform-independent newlines.

In my opinion, using non-regexp functions whenever possible makes the code more readable.

$newlineTags = array(
  '<br>',
  '<br/>',
  '<br />',
);
$html = str_replace($newlineTags, PHP_EOL, $html);

I am aware this solution has some flaws, but wanted to share my insights still.
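One of those flaws is case sensitivity: str_replace would leave <BR> untouched. A small sketch using str_ireplace, the case-insensitive sibling of str_replace, is shown below. The function name br2nlSimple is made up for this example.

```php
<?php
// Sketch: same tag list as above, but matched case-insensitively.
// br2nlSimple is an illustrative name, not a PHP built-in.
function br2nlSimple(string $html): string
{
    $newlineTags = array('<br>', '<br/>', '<br />');
    // str_ireplace accepts an array of search strings, just like str_replace.
    return str_ireplace($newlineTags, PHP_EOL, $html);
}

echo br2nlSimple('one<br>two<BR/>three<Br />four');
```

Variants that are not in the list, such as a tag with several spaces before the slash, are still left alone. Covering those is where a regex becomes the simpler option.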

Carbylamine answered 25/8, 2014 at 12:16 Comment(2)

And regular expressions usually require heavier computations. – Ekaterinodar 24/9, 2017 at 12:5

@BenBITDesign Regarding your suggested edit, please note that it is absolutely not true that regex in general requires more computation. In fact, without having timed this specific case it’s quite likely that the PCRE engine can perform this replacement more efficiently than str_replace, especially when just-in-time compilation is enabled. – Ingemar 31/10, 2019 at 12:5

If the document is well-formed (or at least well-formed-ish) you can use the DOM extension and xpath to find and replace all br elements by a \n text node.

$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);

$toBeReplaced = array();
foreach($xpath->query('//br') as $node) {
    $toBeReplaced[] = $node;
}

$linebreak = $doc->createTextNode("\n");
foreach($toBeReplaced as $node) {
    $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
}

echo $doc->saveHTML();

prints

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head><title>...</title></head>
<body>abc
def<p>ghi
jkl</p>
</body>
</html>

edit: shorter version with only one iteration

$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);

$linebreak = $doc->createTextNode("\n");
foreach($xpath->query('//br') as $node) {
  $node->parentNode->removeChild($node);
}

echo $doc->saveHTML();

Foremast answered 12/3, 2010 at 22:13 Comment(4)

You don’t need to do two rounds. You can replace the nodes with the first foreach. – Harcourt 12/3, 2010 at 22:19

That seems to be so ;-) For some (unknown) reason I remembered it to break the xpath iterator. – Foremast 12/3, 2010 at 22:27

Shorter version doesn't add the $linebreak node. Anyway this is exactly what I needed, thanks. – Solfa 26/5, 2020 at 14:10

The same with the linebreak replacement and without xpath: 3v4l.org/UiJ1m#v8.2.7 – Clavicembalo 23/6, 2023 at 7:2
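Putting the comments above together, a single-pass variant that actually inserts the newline could look like the following sketch. The function name brToNewlineDom is hypothetical, and silencing libxml warnings is an assumption about how messy the legacy markup might be.

```php
<?php
// Sketch of the single-pass DOM approach; brToNewlineDom is an illustrative name.
function brToNewlineDom(string $html): string
{
    // Assumption: legacy markup may be sloppy, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Replace each br element with a fresh "\n" text node in the same loop.
    foreach ($xpath->query('//br') as $node) {
        $node->parentNode->replaceChild($doc->createTextNode("\n"), $node);
    }

    return $doc->saveHTML();
}

echo brToNewlineDom('<html><body>abc<br />def<p>ghi<br />jkl</p></body></html>');
```

As with the original answer, saveHTML() returns a whole document, so the result still contains the html and body wrappers around the text.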

From the nl2br comments:

<?php
function br2nl($string){
  $return=eregi_replace('<br[[:space:]]*/?'.
    '[[:space:]]*>',chr(13).chr(10),$string);
  return $return;
}
?>

Ordinand answered 12/3, 2010 at 22:15 Comment(1)

the posix regular expression module has been deprecated. From the ereg_replace manual page: "This function has been DEPRECATED as of PHP 5.3.0 and REMOVED as of PHP 6.0.0. Relying on this feature is highly discouraged." – Foremast 12/3, 2010 at 22:34
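Since the ereg family is gone from current PHP versions, the same function can be sketched with preg_replace instead. This is a translation of the snippet above under the assumption that \s is an acceptable stand-in for [[:space:]]. The name br2nlCrlf is made up here to avoid implying it is the manual's version.

```php
<?php
// Sketch: PCRE translation of the eregi_replace version above.
// br2nlCrlf is an illustrative name.
function br2nlCrlf(string $string): string
{
    // \s* replaces [[:space:]]*, the i modifier replaces the "i" in eregi,
    // and "\r\n" is the same as chr(13).chr(10).
    return preg_replace('#<br\s*/?\s*>#i', "\r\n", $string);
}

echo br2nlCrlf('first<br>second<BR />third');
```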

Thanks to @antti for the accepted answer, and to @konstantin-xflash-stratigenas, who pointed out a defect.

I tried to write a better regex to cover those cases too:

$html = 'this <br>is<br/>some<br />text <br    />, <br//>, <br xyz/>!';

$html_nl = preg_replace('/<br[^>]*>/i', "\n", $html);

echo htmlspecialchars($html_nl);

Although this does not yet cover the defect that @volkerk pointed out.
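One more thing to be aware of: <br[^>]* matches any tag whose name merely starts with "br". A possible tightening, offered as a sketch rather than a tested fix for every input, is to add a word boundary after the tag name. The <broken> tag in the sample string is invented purely to show the difference.

```php
<?php
// Sketch: \b requires the tag name to end right after "br".
// The <broken> tag below is a made-up example, not real HTML.
$html = 'this <br>is<br/>some<br //>text<br xyz/>, <broken>!';

$html_nl = preg_replace('/<br\b[^>]*>/i', "\n", $html);

echo htmlspecialchars($html_nl);
```

Like the pattern above, this still treats a <br> inside an attribute value or a CDATA section as a line break.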

Roundish answered 5/3, 2024 at 11:36 Comment(0)
