split a string on newlines in .NET
K

17

971

I need to split a string on newlines in .NET, and the only way I know of to split strings is with the Split method. However, that does not let me (easily) split on a newline, so what is the best way to do it?

Kenon answered 10/10, 2009 at 9:25 Comment(2)
Why would it not? Just split on System.Environment.NewLineUglify
But you have to wrap it in a string[] and add an extra argument and... it just feels clunky.Kenon
M
1693

To split on a string you need to use the overload that takes an array of strings:

string[] lines = theText.Split(
    new string[] { Environment.NewLine },
    StringSplitOptions.None
);

Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on any of the three line breaks, and preserve empty lines and spacing in the text:

string[] lines = theText.Split(
    new string[] { "\r\n", "\r", "\n" },
    StringSplitOptions.None
);
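
As a minimal sketch of how the delimiter list behaves on text with mixed line endings (the `LineSplitDemo` class and `SplitLines` name are made up for illustration, not part of the framework):

```csharp
using System;

public static class LineSplitDemo
{
    // Splits on Windows (\r\n), classic Mac (\r) and Unix (\n) line breaks.
    // "\r\n" is listed first so that, as far as I can tell from the
    // documented matching rules, a CR-LF pair is consumed as one delimiter
    // instead of being seen as a CR followed by a separate LF.
    public static string[] SplitLines(string text)
    {
        return text.Split(
            new string[] { "\r\n", "\r", "\n" },
            StringSplitOptions.None);
    }
}
```

Calling `LineSplitDemo.SplitLines("one\r\ntwo\n\nthree\rfour")` should give five elements, with the empty line between "two" and "three" preserved as an empty string.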
Marius answered 10/10, 2009 at 9:29 Comment(19)
I suspected as much. Also, you need another StringSplitOptions argument in there. I was just hoping there was a less clunky way of doing it.Kenon
Move the clunk into your own method - possibly in your own StringUtils class.Occupancy
@RCIX: Sending the correct parameters to the method is a bit awkward because you are using it for something that is a lot simpler than what it's capable of. At least it's there, prior to framework 2 you had to use a regular expression or build your own splitting routine to split on a string...Marius
Excellent, I used it. How can I know the "environment.newline" of the output of a Process called (commandline)?Koel
@Leandro: The Environment.NewLine property contains the default newline for the system. For a Windows system for example it will be "\r\n".Marius
@Marius so I don't know why the output returns \r\r\n :S I'm on Windows7 64x using Framework 2Koel
@Leandro: One guess would be that the program splits on \n leaving an \r at the end of each line, then outputs the lines with a \r\n between them.Marius
I thought \r and \n didn't have meaning in .NET strings? Shouldn't vbLf, vbCrLf, and vbCr be used instead?Lubeck
@Samuel: The \r and \n escape sequences (among others) have a special meaning to the C# compiler. VB doesn't have those escape sequences, so there those constants are used instead.Marius
If you want to accept files from lots of various OS's, you might also add "\n\r" to the start and "\r" to the end of the delimiter list. I'm not sure it's worth the performance hit though. (en.wikipedia.org/wiki/Newline)Ira
@user420667: That's a possibility, but as it seems so very unlikely that you would encounter them on anything that runs C#, it's more likely that it would just cause some unwanted effect some times.Marius
.netcf doesn't have this overload, but the solution from Clement works well!Anarchist
Why the downvote? If you don't explain what it is that you think is wrong, it can't improve the answer.Marius
@Marius C# is not just used in .NET, but the open-source Mono as well, including the very popular Unity cross-platform game engine. I would say it's more likely to be in a cross-platform game or desktop app than Java these days (especially with all the Oracle hate going around). Mono has allowed C# to be used to develop apps for iOS/Android/WindowsPhone/MacOSX/Windows/WindowsStore with a shared codebase, so interacting with non-Windows line endings is quite likely. Even if it was restricted to Windows, the text file could come from anywhere.Strapping
@novaterata: Yes, I have used Mono too. The Environment.NewLine constant is specifically intended for frameworks on other systems, and contains the new line combination for that system. To handle a file that contains different newlines you can use the code that matches different strings.Marius
@Marius Sure, I just didn't agree that foreign platform line endings were unlikely, especially for C#Strapping
@novaterata: Aha, now I understand what you mean. That was a discussion about newline combinations from very unusual systems, e.g. Commodore 64 and Acorn BBC.Marius
This is a great answer. The only things I would recommend to improve it: in your first example, omit StringSplitOptions.None, since that is the default value and not required. In the second example, use StringSplitOptions.RemoveEmptyEntries instead; this is the other option.Cookie
@SendETHToThisAddress: When you are using a string array as the first parameter, the options parameter is actually required. Using StringSplitOptions.RemoveEmptyEntries means that you would remove empty lines from the result. That is a possible use case, but leaving the lines intact is usually the expected behaviour.Marius
B
152

What about using a StringReader?

using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
    // Reads only the first line; call ReadLine() in a loop until it returns null to get every line.
    string line = reader.ReadLine();
}
Breeks answered 14/11, 2012 at 1:10 Comment(6)
This is my favorite. I wrapped in an extension method and yield return current line: gist.github.com/ronnieoverby/7916886Narvaez
This is the only non-regex solution I've found for .netcf 3.5Anarchist
Especially nice when the input is large and copying it all over to an array becomes slow/memory intensive.Cancel
As written, this answer only reads the first line. See Steve Cooper's answer for the while loop that should be added to this answer.Karnes
This doesn't return a line when the string is emptyReborn
@Karnes Gee, thanks, I wasn't able to figure out why it wasn't getting all the lines, because I don't have a minimally functional human-ish brain.Restrainer
C
86

Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...

Instead, use an iterator like this:

public static IEnumerable<string> SplitToLines(this string input)
{
    if (input == null)
    {
        yield break;
    }

    using (System.IO.StringReader reader = new System.IO.StringReader(input))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

This will allow you to do a more memory-efficient loop around your data:

foreach(var line in document.SplitToLines()) 
{
    // one line at a time...
}

Of course, if you want it all in memory, you can do this:

var allTheLines = document.SplitToLines().ToArray();
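
One thing worth knowing before swapping Split for a StringReader is that the two do not count lines the same way at the edges. The sketch below is only an illustration (the `ReaderVsSplit` class and its method names are hypothetical); it counts lines both ways so the difference can be checked on an empty string or on text that ends with a newline.

```csharp
using System;
using System.IO;

public static class ReaderVsSplit
{
    // Counts lines the way StringReader.ReadLine sees them:
    // an empty string has no lines, and a trailing newline
    // does not start a new (empty) line.
    public static int CountWithReader(string input)
    {
        int count = 0;
        using (var reader = new StringReader(input))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    // Counts lines the way a plain Split on "\n" sees them:
    // there is always at least one element, and a trailing
    // newline produces a final empty element.
    public static int CountWithSplit(string input)
    {
        return input.Split(new string[] { "\n" }, StringSplitOptions.None).Length;
    }
}
```

For the empty string the reader reports 0 lines while Split reports 1 element; for `"a\nb\n"` the reader reports 2 and Split reports 3. Which behaviour is right depends on what the caller treats as a line.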
Conjunctiva answered 1/5, 2014 at 12:49 Comment(1)
I have been there... (parsing large HTML files and running out of memory). Yes, avoid string.Split. Using string.Split may result in usage of the Large Object Heap (LOH) - but I am not 100% sure of that.Heed
R
60

You should be able to split your string pretty easily, like so:

aString.Split(Environment.NewLine.ToCharArray());
Rabbet answered 10/10, 2009 at 9:29 Comment(7)
On a non-*nix system that will split on the separate characters in the Newline string, i.e. the CR and LF characters. That will cause an extra empty string between each line.Marius
Correct me if i'm wrong, but won't that split on the characters \ and n?Kenon
@RCIX: No, the \r and \n codes represent single characters. The string "\r\n" is two characters, not four.Marius
@IainMH: No, you shouldn't. As I explained it will return an extra empty line between each line.Marius
if you add the parameter StringSplitOptions.RemoveEmptyEntries, then this will work perfectly.Atrioventricular
@Ruben: No, it will not. Serge already suggested that in his answer, and I have already explained that it will also remove the empty lines in the original text that should be preserved.Marius
@Marius That assumes, of course, that you actually want to preserve empty lines. In my case I don't, so this is perfect. But yeah, if you're trying to keep empty line data for your users, then you'll have to do something less elegant than this.Wigley
L
27

Based on Guffa's answer, in an extension class, use:

public static string[] Lines(this string source) {
    return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Laurustinus answered 2/6, 2011 at 15:34 Comment(0)
S
13

Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:

var lines = input
  .ReplaceLineEndings()
  .Split(Environment.NewLine, StringSplitOptions.None);
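
A variation on the same idea, sketched under the assumption that .NET 6 or later is available: passing an explicit replacement to `ReplaceLineEndings` makes the result independent of the platform the code runs on. The `NormalizedSplit` class and `SplitLines` name are made up for this example.

```csharp
using System;

public static class NormalizedSplit
{
    // Requires .NET 6 or later for String.ReplaceLineEndings.
    public static string[] SplitLines(string input)
    {
        // Rewrite every recognised line ending as "\n", then split on '\n'.
        return input.ReplaceLineEndings("\n").Split('\n');
    }
}
```

With this, `NormalizedSplit.SplitLines("a\r\nb\rc\nd")` should return four elements whether it runs on Windows or Linux.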
Siegfried answered 1/2, 2022 at 11:9 Comment(0)
U
11

For a string variable s:

s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)

This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.

This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:

var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);

What not to do:

  • Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
  • Split on the separator Environment.NewLine.ToCharArray(), because on Windows this will create one empty string element for each new line.
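
Both pitfalls can be reproduced in a few lines. This is only a sketch: the `SplitPitfalls` class is hypothetical, and it uses the literal `"\r\n"` instead of Environment.NewLine so that it behaves the same on any operating system.

```csharp
using System;

public static class SplitPitfalls
{
    // Splitting on the individual characters of "\r\n" treats CR and LF as
    // two separate delimiters, so every CR-LF pair yields an empty element.
    public static string[] SplitOnChars(string text)
    {
        return text.Split("\r\n".ToCharArray());
    }

    // RemoveEmptyEntries hides those artefacts, but it also drops the
    // genuinely empty lines that were in the text.
    public static string[] SplitOnCharsRemovingEmpty(string text)
    {
        return text.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }
}
```

`SplitOnChars("a\r\nb")` gives three elements ("a", "", "b") although the text has two lines, and `SplitOnCharsRemovingEmpty("a\r\n\r\nb")` gives two elements although the text has three lines, one of them blank.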
Underbodice answered 4/10, 2012 at 15:56 Comment(0)
C
11

Regex is also an option:

    private string[] SplitStringByLineFeed(string inpString)
    {
        string[] locResult = Regex.Split(inpString, "[\r\n]+");
        return locResult;
    }
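
Note that the character class with `+` treats any run of CR and LF characters as a single delimiter, so blank lines disappear. The sketch below contrasts it with an alternation that matches exactly one line break at a time; the pattern `\r\n|\r|\n` and the `RegexLineSplit` class are my own choices for illustration, not something taken from this answer.

```csharp
using System.Text.RegularExpressions;

public static class RegexLineSplit
{
    // "[\r\n]+" matches a whole run of CR/LF characters at once,
    // so consecutive line breaks (blank lines) collapse.
    public static string[] SplitCollapsingBlankLines(string input)
    {
        return Regex.Split(input, "[\r\n]+");
    }

    // Matching one line break per delimiter keeps blank lines
    // as empty strings in the result.
    public static string[] SplitKeepingBlankLines(string input)
    {
        return Regex.Split(input, "\r\n|\r|\n");
    }
}
```

For the input `"a\r\n\r\nb"` the first method returns two elements and the second returns three, with an empty string in the middle.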
Cresset answered 9/1, 2013 at 21:45 Comment(2)
If you want to match lines exactly, preserving blank lines, this regex string would be better: "\r?\n".Lati
Rory's answer above worked the best for splitting console output lines while still preserving blank lines.Libelous
L
7

I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.

The following block of code extends the string object so that it is available as a natural method when working with strings.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;

namespace System
{
    public static class StringExtensions
    {
        public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
        {
            return s.Split(new string[] { delimiter }, options);
        }
    }
}

You can now use the .Split() function from any string as follows:

string[] result;

// Call the extension as a static method, passing the string and the delimiter
result = StringExtensions.Split("My simple string", " ");

// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");

// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);

To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
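
Which delimiter to pass matters when the text uses CR-LF. The following sketch (with a hypothetical `LfSplit` class, and using the built-in string-array overload so it does not depend on the extension above) shows what is left behind when CR-LF text is split on "\n" alone, and one way to avoid it by normalising first.

```csharp
using System;

public static class LfSplit
{
    // Splitting CR-LF text on "\n" alone leaves a trailing '\r' on each line.
    public static string[] SplitOnLfOnly(string text)
    {
        return text.Split(new string[] { "\n" }, StringSplitOptions.None);
    }

    // Rewriting CR-LF as LF before splitting avoids the stray '\r',
    // at the cost of one extra copy of the text.
    public static string[] NormalizeThenSplit(string text)
    {
        return text.Replace("\r\n", "\n")
                   .Split(new string[] { "\n" }, StringSplitOptions.None);
    }
}
```

`SplitOnLfOnly("a\r\nb")` yields "a\r" and "b", while `NormalizeThenSplit("a\r\nb")` yields "a" and "b".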

Comment: It would be nice if Microsoft implemented this overload.

Leslie answered 10/7, 2016 at 16:18 Comment(4)
Environment.NewLine is preferred to hard-coding either \n or \r\n.Inaugurate
@MichaelBlackburn - That is an invalid statement because there is no context. Environment.NewLine is for cross-platform compatibility, not for working with files using different line terminations than the current operating system. See here for more information, so it really depends on what the developer is working with. Use of Environment.NewLine ensures there is no consistency in the line return type between OS's, where 'hard-coding' gives the developer full control.Leslie
@MichaelBlackburn - There is no need for you to be rude. I was merely providing the information. .NewLine isn't magic; under the hood it is just one of the strings above, chosen by a switch on whether it is running on Unix or on Windows. The safest bet is to first do a string replace for all "\r\n" and then split on "\n". Where using .NewLine fails is when you are working with files saved by other programs that use a different method for line breaks. It works well if you know that every file you read uses the line breaks of your current OS.Leslie
So what I'm hearing is the most readable way (maybe higher memory use) is foo = foo.Replace("\r\n", "\n"); string[] result = foo.Split('\n');. Am I understanding correctly that this works on all platforms?Fredericafrederich
B
4

I'm currently using this function (based on other answers) in VB.NET:

Private Shared Function SplitLines(text As String) As String()
    Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function

It tries to split on the platform-local newline first, and then falls back to each possible newline.

I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.

Here's how to join the lines back up, for good measure:

Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
    Return String.Join(Environment.NewLine, lines)
End Function
Bonina answered 14/5, 2013 at 19:35 Comment(2)
@Lubeck - note the quotations. They actually do have that meaning. "\r" = return. "\r\n" = return + new line. (Please review this post and the accepted solution here.)Leslie
@Kraang Hmm.. I haven't worked with .NET in a long time. I would be surprised if that many people up voted a wrong answer. I see that I commented on Guffa's answer too, and got clarification there. I've deleted my comment to this answer. Thanks for the heads up.Lubeck
B
2

Well, actually split should do:

//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);

//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);

// Output (separate lines)
for( int i = 0; i < splitted.Length; i++ )
{
    Console.WriteLine("{0}: {1}", i, splitted[i]);
}
Bilbe answered 10/10, 2009 at 9:35 Comment(2)
The RemoveEmptyEntries option will remove empty lines from the text. That may be desirable in some situations, but a plain split should preserve the empty lines.Marius
yes, you're right, I just made this assumption, that... well, blank lines are not interesting ;)Bilbe
N
1

I did not know about Environment.NewLine, but I guess this is a very good solution.

My try would have been:

        string str = "Test Me\r\nTest Me\nTest Me";
        var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();

The additional .Trim removes any \r or \n that might still be present (e.g. when on Windows but splitting a string with OS X newline characters). Probably not the fastest method though.

EDIT:

As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.

Newtonnext answered 10/10, 2009 at 9:32 Comment(4)
The Trim will also remove any white space at the beginning and end of lines, for example indentation.Marius
".Trim removes any \r or \n that might be still present" - ouch. Why not write robust code instead?Gilud
Maybe I got the question wrong, but it was/is not clear that whitespace must be preserved. Of course you are right, Trim() also removes whitespace.Newtonnext
@Max: Wow, wait until I tell my boss that code is allowed to do anything that is not specifically ruled out in the specification... ;)Marius
T
1
string[] lines = text.Split(
  Environment.NewLine.ToCharArray(), 
  StringSplitOptions.RemoveEmptyEntries);

The RemoveEmptyEntries option will make sure you don't have empty entries due to \n following a \r.

(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.

Timisoara answered 10/10, 2009 at 9:36 Comment(2)
The RemoveEmptyEntries option will also remove empty lines, so it doesn't work properly if the text has empty lines in it.Marius
You probably want to preserve genuine empty lines : \r\n\r\nOccupancy
G
1

Examples here are great and helped me with a current "challenge": splitting RSA keys to be presented in a more readable way. Based on Steve Cooper's solution:

    string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
    {
        //Split the string into a list of substrings, each n characters long
        var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
        
        //Check if there are any characters left after split, if so add the rest
        if(txt.Length > ((txt.Length / n)*n) )
            Lines.Add(txt.Substring((txt.Length/n)*n));

        //Create return text, with extras
        string txtReturn = "";
        foreach (string Line in Lines)
            txtReturn += AddBefore + Line + AddAfterExtra +  Environment.NewLine;
        return txtReturn;
    }

Presenting an RSA key at a width of 33 characters, with quotes, is then simply:

Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));

Output:

[Image: output of Splitstring()]

Hopefully someone finds it useful...
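
The fixed-width chunking step on its own can also be written as a single loop. This is a minimal sketch, not the function above: the `FixedWidth` class and `Chunk` method are hypothetical names, and it returns the pieces instead of a joined string.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Cuts text into pieces of at most `width` characters;
    // the last piece may be shorter.
    public static List<string> Chunk(string text, int width)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));

        var pieces = new List<string>();
        for (int i = 0; i < text.Length; i += width)
        {
            // Math.Min keeps the final Substring call inside the string.
            pieces.Add(text.Substring(i, Math.Min(width, text.Length - i)));
        }
        return pieces;
    }
}
```

`FixedWidth.Chunk("ABCDEFGH", 3)` returns "ABC", "DEF" and "GH"; the pieces can then be joined with String.Join and Environment.NewLine.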

Glycerin answered 17/10, 2020 at 21:50 Comment(0)
U
-2

Silly answer: write to a temporary file so you can use the venerable File.ReadLines

var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
    writer.Write(s);
}
var lines = File.ReadLines(path);
Underbodice answered 4/10, 2012 at 15:59 Comment(0)
H
-3
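using System.Collections.Generic;
using System.IO;

// Sample input; replace with the text you want to split.
string textToSplit = "first\r\nsecond\nthird";

List<string> lines = new List<string>();
if (textToSplit != null)
{
    using (StringReader reader = new StringReader(textToSplit))
    {
        // ReadLine returns null once the end of the string is reached.
        for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            lines.Add(line);
        }
    }
}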
Hargrave answered 29/3, 2014 at 12:40 Comment(0)
P
-6

Very easy, actually.

VB.NET:

Private Function SplitOnNewLine(input as String) As String
    Return input.Split(Environment.NewLine)
End Function

C#:

string splitOnNewLine(string input)
{
    return input.split(environment.newline);
}
Positivism answered 7/7, 2017 at 21:5 Comment(2)
Totally incorrect and doesn't work. Plus, in C#, it's Environment.NewLine just like in VB.Millimeter
See End-of-line identifier in VB.NET? for the different options for new line.Heed
