Reliably split user-submitted textarea value on newlines [duplicate]

The input string comes from a textarea in which users are supposed to enter each item on its own line.

When processing the form, it is easy to explode the textarea input into an array of single items like this:

$arr = explode("\n", $textareaInput);

It works fine, but I am worried about it not working correctly on other systems (I can currently only test on Windows). I know newlines are represented as \r\n, \n, or just \r depending on the platform. Will the above line of code also work correctly under Linux, Solaris, BSD or other OSes?

Doviedow answered 16/2, 2010 at 16:37 Comment(0)

A lone '\r' as a line terminator is an old classic Mac OS convention that isn't really used anymore (not since OS X, which is Unix-based).

Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
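As a minimal sketch of that suggestion (the helper name splitLines and the sample input are made up for illustration), the explode can be followed by an rtrim that strips only a trailing "\r". Note that this does not treat a lone "\r" as a separator.

```php
<?php
// Hypothetical helper: split on "\n", then strip the "\r" left over from Windows-style "\r\n".
function splitLines(string $text): array
{
    $lines = explode("\n", $text);

    // An explicit character list makes rtrim() remove only trailing "\r", not spaces or tabs.
    return array_map(function (string $line): string {
        return rtrim($line, "\r");
    }, $lines);
}

$items = splitLines("apple\r\nbanana\ncherry");
print_r($items); // three elements: apple, banana, cherry
```

Empty lines are kept as empty strings, so filter them afterwards if they are not wanted.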

Paxton answered 16/2, 2010 at 16:40 Comment(0)

You can use preg_split to do that.

$arr = preg_split('/[\r\n]+/', $textareaInput);

It splits on any run of \r and \n characters. You can also use \s to match any whitespace character, though that also splits on the spaces and tabs within a line.
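To illustrate the difference between the two character classes, here is a small sketch with a made-up input string. It assumes items may contain spaces, which is where \s behaves differently.

```php
<?php
$input = "red apple\r\ngreen pear";

// Splitting on runs of line-break characters keeps each line intact.
$byLine = preg_split('/[\r\n]+/', $input);

// Splitting on runs of any whitespace also breaks at the spaces inside each line.
$byWhitespace = preg_split('/\s+/', $input);

print_r($byLine);       // red apple, green pear
print_r($byWhitespace); // red, apple, green, pear
```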

Edit
It occurred to me that, while the previous code works fine, it also removes empty lines. If you want to preserve empty lines, you may want to try this instead:

$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);

It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.

For example:

<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>

Prints:

Array
(
    [0] => Windows
    [1] => 
    [2] => Mac
    [3] => 
    [4] => Unix
    [5] => 
    [6] => Done!
)
Ebonieebonite answered 16/2, 2010 at 16:40 Comment(1)
@Michael Brooks: It is a regular expression. The [\r\n] means either \r or \n. It looks for any combination of the two characters... And no, you are incorrect. I do in fact prefer working on Linux, and I am well aware of the differences. - Ebonieebonite
$arr = preg_split( "/[\n\r]+/", $textareaInput );
Been answered 16/2, 2010 at 16:39 Comment(0)

You can normalize the input:

<?php

$foo = strtr($foo, array(
    "\r\n" => "\n",
    "\r" => "\n",
    "\n" => "\n",
));

?>
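Once the input is normalized this way, a plain explode on "\n" is enough. A minimal sketch, assuming a hypothetical helper named normalizedLines and a made-up sample string:

```php
<?php
// Hypothetical helper combining the normalization above with explode().
function normalizedLines(string $text): array
{
    // With an array argument, strtr() tries the longest key first,
    // so "\r\n" becomes a single "\n" rather than two.
    $normalized = strtr($text, array(
        "\r\n" => "\n",
        "\r"   => "\n",
    ));

    return explode("\n", $normalized);
}

$lines = normalizedLines("Windows\r\nMac\rUnix\nDone!");
print_r($lines); // Windows, Mac, Unix, Done!
```

The "\n" => "\n" pair is left out here because it maps the character to itself.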

Alternatively, you can split with a regular expression:

<?php

$foo = preg_split ("/[\r\n]+/", $foo);

?>
Calculus answered 16/2, 2010 at 16:49 Comment(0)

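The following code should do the job

<?php
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k => $string) {
    $split[$k] = trim($string);
    // Compare against '' instead of using empty(), which would also drop a line containing just "0".
    if ($split[$k] === '') {
        unset($split[$k]);
    }
}
// Join the remaining lines into one string with no newlines.
$join = implode('', $split);
?>

to get a string with the newlines completely stripped. It won't work correctly with JS though :(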

Protomorphic answered 15/11, 2011 at 10:36 Comment(0)

The system-agnostic regex technique involves the \R escape sequence.

PHP Documentation on Escape Sequences

It really is as simple as calling preg_split('~\R~', $textareaInput).

\R - line break: matches \n, \r and \r\n

Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.

If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.

If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~
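The three patterns can be compared side by side. This is only a sketch with a made-up input string. The trim() call on the whole input is an addition here, since the pattern only consumes whitespace next to a line break and would leave whitespace at the very start or end of the input untouched.

```php
<?php
$text = "Windows \r\n\r\nMac\r Unix\nDone!";

// \R alone: every line break is a separator, so the empty line is preserved.
$all = preg_split('~\R~', $text);
// "Windows ", "", "Mac", " Unix", "Done!"

// \R+: a run of consecutive line breaks counts as one separator.
$collapsed = preg_split('~\R+~', $text);
// "Windows ", "Mac", " Unix", "Done!"

// \s*\R\s*: whitespace around each line break is swallowed by the separator.
$trimmed = preg_split('~\s*\R\s*~', trim($text));
// "Windows", "Mac", "Unix", "Done!"

print_r($trimmed);
```

Because \s also matches line breaks, the last pattern drops empty lines in the same way the + quantifier does.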

Bellboy answered 16/9, 2020 at 15:22 Comment(0)
