Replace CRLF using powershell
M

7

48

Editor's note: Judging by later comments by the OP, the gist of this question is: How can you convert a file with CRLF (Windows-style) line endings to a LF-only (Unix-style) file in PowerShell?

Here is my powershell script:

 $original_file ='C:\Users\abc\Desktop\File\abc.txt'
 (Get-Content $original_file) | Foreach-Object {
 $_ -replace "'", "2"`
-replace '2', '3'`
-replace '1', '7'`
-replace '9', ''`
-replace "`r`n",'`n'
} | Set-Content "C:\Users\abc\Desktop\File\abc.txt" -Force

With this code I am able to replace 2 with 3, 1 with 7, and 9 with an empty string. However, I am unable to replace the carriage return + line feed (CRLF) with just a line feed (LF); that part doesn't work.

Moretta answered 1/10, 2013 at 23:35 Comment(1)
Set-Content writes from the pipeline to a file. Each item from the pipeline is written on a new line.Detain
D
47

You have not specified the version; I'm assuming you are using PowerShell v3.

Try this:

$path = "C:\Users\abc\Desktop\File\abc.txt"
(Get-Content $path -Raw).Replace("`r`n","`n") | Set-Content $path -Force

Editor's note: As mike z points out in the comments, Set-Content appends a trailing CRLF, which is undesired. Verify with: 'hi' > t.txt; (Get-Content -Raw t.txt).Replace("`r`n","`n") | Set-Content t.txt; (Get-Content -Raw t.txt).EndsWith("`r`n"), which yields $True.

Note this loads the whole file in memory, so you might want a different solution if you want to process huge files.

UPDATE

This might work for v2 (sorry nowhere to test):

$in = "C:\Users\abc\Desktop\File\abc.txt"
$out = "C:\Users\abc\Desktop\File\abc-out.txt"
(Get-Content $in) -join "`n" > $out

Editor's note: Note that this solution (now) writes to a different file and is therefore not equivalent to the (still flawed) v3 solution. (A different file is targeted to avoid the pitfall Ansgar Wiechers points out in the comments: using > truncates the target file before execution begins). More importantly, though: this solution too appends a trailing CRLF, which may be undesired. Verify with 'hi' > t.txt; (Get-Content t.txt) -join "`n" > t.NEW.txt; [io.file]::ReadAllText((Convert-Path t.NEW.txt)).endswith("`r`n"), which yields $True.

Same reservation about being loaded to memory though.

Detain answered 2/10, 2013 at 0:6 Comment(22)
That will almost work. Set-Content will still insert an extra CR/LF at the end.Spontoon
I see this: $PSVersionTable.PSVersion shows Major 2, Minor 0, Build -1, Revision -1.Moretta
@Zespri: This is the error message that I get when I execute your script: $path = "C:\Users\abc\Desktop\File\abc.txt" (Get-Content $path -Raw).Replace("`r`n","`n") | Set-Content $path -Force Get-Content : A parameter cannot be found that matches parameter name 'Raw'. At line:2 char:24 + (Get-Content $path -Raw <<<< ).Replace("`r`n","`n") | Set-Content $path -Force + CategoryInfo : InvalidArgument: (:) [Get-Content], ParameterBindingException + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.PowerShell.Commands.GetContentCommandMoretta
Yep, this is because the example is for powershell v3 and you are using v2. There is no -Raw switch for v2.Detain
Great, I updated to PowerShell v3 and your code worked, but it still leaves CR/LF at the end, like mike mentioned. I just want all LFs and no CR/LFs.Moretta
Your suggestion for PowerShell v2 will erase the file's content, because the redirection will create a new empty file before the subshell can read it. Please remove it.Substratosphere
@AnsgarWiechers is it different in v3? Because in v3 it "works for me". And thank you for your feedback.Detain
The behavior is identical in PowerShell v2 and v3. Using the redirection operator truncates the file before it's read by Get-Content.Substratosphere
@AnsgarWiechers, ok, thank you again for spotting this. Initially, I thought along the same lines as you, but then I tested it and it worked. So I assumed that the file is read before the redirection operation kicks in. Apparently I have not tested it properly. Should be fine now.Detain
@AW and zespri, big thanks. FYI - AW's answer below failed for me with posh2 but worked with posh3.Mustee
PSv5+ offers a solution to the trailing CRLF problem: Set-Content -NoNewline. The truncation of the output file with > can be avoided by using | Out-File … (or | Set-Content …) instead.Kalin
thanks for the -raw here I had some text files that were throwing inconsistent content lengths until I added that. saved from hours of grief.Coneflower
@Kalin I think you got the wrong end of the stick. The solution was already fixed according to Ansgar Wiechers' feedback so your comment does not reflect reality. I'm going to roll it back. Please comment here before editing again if you disagree.Detain
@AndrewSavinykh: My apologies for my (since rolled-back) 2nd editor's note: I had missed that you're now writing to a different file - that fact is worth mentioning in the answer, though, because it means that your v2 solution is not equivalent. The 1st (since rolled-back) editor's note still stands, however: your v3 solution is flawed, as demonstrated in the note. Please edit your answer accordingly - there's all the information you need in the comments. Given the popularity of your answer, not fixing the problem is a disservice to future readers.Kalin
@AndrewSavinykh: I've since realized: your edit to the v2 solution fixed its data-destroying flaw, yet now suffers the same shortcoming as the v3 solution (appending an unwanted CRLF). In the interim, until you get around to fixing your answer, I've reinstated updated versions of my editor's notes as a service to future readers. I do hope you get around to fixing your answer. Do let me know where I'm wrong. (I won't fight you on rolling back my edit again, but I encourage users who read this to check out the edit history).Kalin
@mklement0, sorry, what you are saying does not make sense to me. Text files should end with a newline. If they do, then no additional new line is added by the code snippet. If you want something other than the regular situation (that is absence of "CRLF" where it's due), you can ask a different question. You also can provide a different answer if this one feels unsatisfactory to you, although I can't understand why. As far as I can see there is nothing that needs "fixing".Detain
@mklement0, same as you I'm open to feedback, so if you want to try again and explain to me why your version of truth is better than mine, I'm all ears. - Just remember that you already made a mistake here that you realized later. Are you sure you are holding your ground because of "service to future reader" and not because it does not feel good admitting that you were mostly wrong?Detain
@AndrewSavinykh: To quote from the OP's comment above: "your code worked, but it still leaves CR/LF at the end" (I've since added a note to the question to make its gist more obvious). In other words: a CRLF sequence at the end of the file is undesired - the intent is to convert all CRLF newlines to LF-only newlines - the output mustn't have any CRLF. Both your solutions fall short, because they end the output with a CRLF (with an additional one in the v3 solution, if the input had a trailing newline, unlike in the v2 solution, but that's a moot point).Kalin
@AndrewSavinykh: Re "You also can provide a different answer if this one feels unsatisfactory to you": I have, but given your answer's popularity, it may not get noticed, which is why I'm hoping to get your answer fixed.Kalin
@Kalin would it be acceptable, if we roll back your comments both of the question and of my answer and I'll add the link to your answer in the end saying: "While having a new line at the end of the text file is standard, sometimes it's desirable to avoid it. If you are in this situation please refer to mklement0's answer (link)"?Detain
Let us continue this discussion in chat.Detain
I know this thread is very old but I had the same issue and couldn't find any other source on the topic. I kind of solved the issue @AndrewSavinykh had with the last CRLF printed by Set-Content, one just has to remove the last 2 lines of the file according to this thread #11643543. This leaves the file with a blank line after the last LF, avoids the usage of .NET, and should work with Powershell above version 2 (if you have version 5 just use -NoNewLine)Wozniak
K
71

This is a state-of-the-union answer as of Windows PowerShell v5.1 / PowerShell Core v6.2.0:

  • Andrew Savinykh's ill-fated answer, despite being the accepted one, is, as of this writing, fundamentally flawed (I do hope it gets fixed - there's enough information in the comments - and in the edit history - to do so).

  • Ansgar Wiechers' helpful answer works well, but requires direct use of the .NET Framework (and reads the entire file into memory, though that could be changed). Direct use of the .NET Framework is not a problem per se, but is harder to master for novices and hard to remember in general.

  • A future version of PowerShell Core may introduce a Convert-TextFile cmdlet with a -LineEnding parameter to allow in-place updating of text files with a specific newline style: see GitHub issue #6201.

In PSv5+, PowerShell-native solutions are now possible, because Set-Content now supports the -NoNewline switch, which prevents undesired appending of a platform-native newline[1]:

# Convert CRLFs to LFs only.
# Note:
#  * (...) around Get-Content ensures that $file is read *in full*
#    up front, so that it is possible to write back the transformed content
#    to the same file.
#  * + "`n" ensures that the file has a *trailing LF*, which Unix platforms
#     expect.
((Get-Content $file) -join "`n") + "`n" | Set-Content -NoNewline $file

The above relies on Get-Content's ability to read a text file that uses any combination of CR-only, CRLF, and LF-only newlines line by line.
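To illustrate that behavior, here is a minimal sketch. The scratch file name is made up for the demonstration, and the file is written with a mix of CRLF, LF-only, and CR-only newlines. Because Get-Content hands back each line without its line ending, a per-line -replace of "`r`n" (as in the question's script) never finds anything to replace; the conversion has to happen when the lines are joined again.

```powershell
# Hypothetical scratch file with mixed line endings: CRLF, LF-only, CR-only.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'mixed-newlines-demo.txt'
[IO.File]::WriteAllText($demo, "one`r`ntwo`nthree`rfour")

# Get-Content returns the lines with their original line endings stripped.
$lines = @(Get-Content -LiteralPath $demo)
$lines.Count    # number of lines read (four, if every newline style is recognized)

# Joining with LF yields LF-only text; no CR should survive.
$joined = $lines -join "`n"
$joined.Contains("`r")

# Clean up the scratch file.
Remove-Item -LiteralPath $demo
```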

Caveats:

  • You need to specify the output encoding to match the input file's in order to recreate it with the same encoding. The command above does NOT specify an output encoding; to do so, use -Encoding. By default, without -Encoding:

    • In Windows PowerShell, you'll get "ANSI" encoding, your system's single-byte, 8-bit legacy encoding, such as Windows-1252 on US-English systems.

    • In PowerShell (Core), v6+, you'll get UTF-8 encoding without a BOM.

  • The input file's content as well as its transformed copy must fit into memory as a whole, which can be problematic with large input files, though is rarely a concern with text files.

  • There's a small risk of file corruption, if the process of writing back to the input file gets interrupted.
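As a sketch of the encoding caveat, the variant below passes -Encoding on both the read and the write. The scratch file name and the choice of utf8 are assumptions for illustration only; substitute whatever encoding your input actually uses. Note that in Windows PowerShell, -Encoding utf8 writes a BOM, whereas PowerShell (Core) v6+ does not.

```powershell
# Hypothetical scratch file containing CRLF newlines.
$file = Join-Path ([IO.Path]::GetTempPath()) 'crlf-encoding-demo.txt'
[IO.File]::WriteAllText($file, "alpha`r`nbeta`r`n")

# Read everything up front, join with LF, add a trailing LF,
# and state the encoding explicitly on both sides (utf8 is an assumption).
((Get-Content -LiteralPath $file -Encoding utf8) -join "`n") + "`n" |
    Set-Content -NoNewline -Encoding utf8 -LiteralPath $file

# Inspect the result via .NET, which preserves the newlines as written.
$converted = [IO.File]::ReadAllText($file)
$converted.Contains("`r")

# Clean up the scratch file.
Remove-Item -LiteralPath $file
```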


[1] In fact, if there are multiple strings to write, -NoNewline also doesn't place a newline between them; in the case at hand, however, this is irrelevant, because only one string is written.

Kalin answered 22/2, 2018 at 3:25 Comment(0)
S
32

Alternative solution that won't append a spurious CR-LF:

$original_file ='C:\Users\abc\Desktop\File\abc.txt'
$text = [IO.File]::ReadAllText($original_file) -replace "`r`n", "`n"
[IO.File]::WriteAllText($original_file, $text)
Substratosphere answered 2/10, 2013 at 8:9 Comment(3)
Nicely done (works in v2 too). A tip re use of relative paths: Use (Convert-Path $original_file) to convert relative paths to full paths first, because the .NET framework's idea of what the current directory is usually differs from PS's.Kalin
What would the replace clause look like if you wanted to switch Unix to Windows, but it was possible that it was already Windows?Bohaty
@Bohaty Use a negative lookbehind assertion: '(?<!\r)\n', "`r`n" (replace LF with CR-LF only if LF is not preceded by CR).Substratosphere
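A small sketch of the lookbehind replacement from the comment above, using a made-up sample string that mixes CRLF and LF-only newlines. Because an LF that already follows a CR is left alone, running the replacement a second time should change nothing, which makes it safe for files that are already in Windows format.

```powershell
# Made-up sample: the first newline is CRLF, the other two are LF-only.
$mixed = "a`r`nb`nc`n"

# Replace LF with CRLF only where the LF is not already preceded by CR.
$crlf = $mixed -replace '(?<!\r)\n', "`r`n"

# A second pass finds no bare LF, so the text stays the same.
$again = $crlf -replace '(?<!\r)\n', "`r`n"
$again -eq $crlf
```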
A
3

Below is my script for converting all files recursively. You can specify folders or files to exclude.

# Regexes for folders and files to skip (matched against the provider path).
# The dot in ".vs" is escaped so that it matches a literal dot only.
$excludeFolders = 'node_modules|dist|\.vs'
$excludeFiles = '.*\.map.*|.*\.zip|.*\.png|.*\.ps1'

Function Dos2Unix {
    [CmdletBinding()]
    Param([Parameter(ValueFromPipeline)] $fileName)

    Process {
        Write-Host -NoNewline "."

        # -Raw reads the whole file as a single string, keeping its newlines.
        $fileContents = Get-Content -Raw -LiteralPath $fileName
        If ($fileContents -match "`r`n")
        {
            Write-Host "`r`nCleaning file: $fileName"
            # Note: in Windows PowerShell, -Encoding utf8 writes a BOM.
            Set-Content -NoNewline -Encoding utf8 -LiteralPath $fileName -Value ($fileContents -replace "`r`n","`n")
        }
    }
}

Get-ChildItem -File "." -Recurse |
Where-Object {$_.PSParentPath -notmatch $excludeFolders} |
Where-Object {$_.PSPath -notmatch $excludeFiles} |
ForEach-Object { $_.PSPath | Dos2Unix }
Assuntaassur answered 22/4, 2020 at 10:47 Comment(2)
Heads up: This is opinionated to use utf8 as encoding and not add a new line at the end. I used this after accidentally pushing an entire project with crlf to VCS which murdered the gradle build.Whitehurst
Digging a little more, this is caused by PowerShell adding the BOM to the start of the file. For ways around that either check here or don't use PowerShell to rewrite your files :')Whitehurst
P
2

Adding another version based on the examples above by @ricky89 and @mklement0, with a few improvements:

Script to process:

  • *.txt files in the current folder
  • replace LF with CRLF (Unix to Windows line-endings)
  • save resulting files to CR-to-CRLF subfolder
  • tested on 100MB+ files, PS v5;

LF-to-CRLF.ps1:

# get current dir
$currentDirectory = Split-Path $MyInvocation.MyCommand.Path -Parent

# create subdir CR-to-CRLF for new files
$outDir = $(Join-Path $currentDirectory "CR-to-CRLF")
New-Item -ItemType Directory -Force -Path $outDir | Out-Null

# get all .txt files
Get-ChildItem $currentDirectory -Force | Where-Object {$_.extension -eq ".txt"} | ForEach-Object {
  $file = New-Object System.IO.StreamReader -Arg $_.FullName
  # Resulting file will be in CR-to-CRLF subdir
  $outstream = [System.IO.StreamWriter] $(Join-Path  $outDir $($_.BaseName + $_.Extension))
  $count = 0 
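  # read line by line; ReadLine() strips each line's LF or CRLF, and WriteLine() appends the writer's newline (CRLF on Windows)
  # compare against $null so that empty lines do not end the loop early
  while ($null -ne ($line = $file.ReadLine())) {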
        $count += 1
        $outstream.WriteLine($line)
    }
  $file.close()
  $outstream.close()
  Write-Host ("$_`: " + $count + ' lines processed.')
}
Pharmaceutics answered 18/5, 2017 at 5:30 Comment(0)
M
1

A one-liner for CMD (cmd.exe) that converts a file to LF-only line endings:

powershell -NoProfile -command "((Get-Content 'prueba1.txt') -join \"`n\") + \"`n\" | Set-Content -NoNewline 'prueba1.txt'"

You can put this command in a .bat file.

Minsk answered 14/12, 2020 at 17:6 Comment(0)
B
0

The following will be able to process very large files quickly.

$file = New-Object System.IO.StreamReader -Arg "file1.txt"
$outstream = [System.IO.StreamWriter] "file2.txt"
$count = 0 

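# Compare against $null: a plain while ($line = ...) would stop at the first empty line.
while ($null -ne ($line = $file.ReadLine())) {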
      $count += 1
      $s = $line -replace "`n", "`r`n"
      $outstream.WriteLine($s)
  }

$file.close()
$outstream.close()

Write-Host ([string] $count + ' lines have been processed.')
Bedazzle answered 6/9, 2015 at 14:39 Comment(1)
On Windows, this works for LF -> CRLF conversion (the opposite of what the OP wanted), but only accidentally so: System.IO.StreamReader can also read LF-only files, and .ReadLine() returns a line without its original line ending (whether it was LF or CRLF), so the -replace operation does nothing. On Windows, System.IO.StreamWriter appends CRLF when using .WriteLine(), so that's how the CRLF line breaks end up in the output file.Kalin
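For the direction the OP asked about (CRLF to LF), the same streaming idea can be adapted by setting the writer's NewLine property, so that the result no longer depends on the platform default. This is only a sketch under stated assumptions: the scratch file names are hypothetical, the writer uses StreamWriter's default encoding (UTF-8 without a BOM), and the output always ends with a trailing LF because .ReadLine() cannot tell whether the last input line had one.

```powershell
# Hypothetical scratch files; the input uses CRLF and contains an empty line.
$inPath  = Join-Path ([IO.Path]::GetTempPath()) 'stream-demo-in.txt'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'stream-demo-out.txt'
[IO.File]::WriteAllText($inPath, "first`r`n`r`nthird`r`n")

$reader = New-Object System.IO.StreamReader -ArgumentList $inPath
$writer = New-Object System.IO.StreamWriter -ArgumentList $outPath
# Make WriteLine() emit LF regardless of the platform's default newline.
$writer.NewLine = "`n"
try {
    # Compare against $null so that empty lines do not end the loop.
    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine($line)
    }
}
finally {
    $reader.Close()
    $writer.Close()
}

# Read the result back, then clean up the scratch files.
$result = [IO.File]::ReadAllText($outPath)
Remove-Item -LiteralPath $inPath, $outPath
```

Only one line is held in memory at a time, which is the point of the streaming approach; the trade-off is that the output goes to a second file rather than being rewritten in place.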
