CMD or Powershell command to combine (merge) corresponding lines from two files [duplicate]
I

6

1

Is it possible, using CMD or PowerShell, to combine two files into one file like this:

file1-line1 tab file2-line1
file1-line2 tab file2-line2
file1-line3 tab file2-line3

So it takes file 1 line 1, then inserts a tab, and then inserts file 2 line 1. It then does the same for all subsequent lines in each file.

Interatomic answered 1/12, 2014 at 16:52 Comment(0)
P
6

In PowerShell, and assuming both files have exactly the same number of lines:

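# @(...) ensures an array even if a file has only one line; otherwise
# Get-Content returns a single string and [$i] would index its characters.
$f1 = @(Get-Content file1)
$f2 = @(Get-Content file2)

for ($i = 0; $i -lt $f1.Length; ++$i) {
  $f1[$i] + "`t" + $f2[$i]
}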
Plethora answered 1/12, 2014 at 16:57 Comment(1)
Nicely done. Two things worth noting: (a) This solution reads both files into memory as a whole (as arrays of lines), which may be problematic with large files. (b) The behavior is also reasonable if the line count differs: indexing beyond the bounds of an array in PowerShell yields $null; in the context of string concatenation, this is equivalent to the empty string, so once file2 has run out of lines, you'll simply get nothing after the "`t". – Rubefaction
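Building on the comment above, here is a minimal sketch that also covers the case where file2 is the longer file, by looping up to the larger of the two line counts. The function name Merge-LineArrays and its parameter names are hypothetical, chosen only for illustration. It still holds both inputs in memory, and it relies on out-of-range indexing yielding $null, which is not the case under Set-StrictMode -Version 3 or later.

```powershell
# Hypothetical helper (not from the answer above): merges two arrays of
# lines, padding the shorter one with empty strings.
function Merge-LineArrays {
  param(
    [string[]] $Left,
    [string[]] $Right,
    [string]   $Separator = "`t"
  )

  # Loop up to the longer of the two inputs.
  $count = [Math]::Max($Left.Count, $Right.Count)

  for ($i = 0; $i -lt $count; ++$i) {
    # Out-of-range indexing yields $null, which -f renders as ''.
    '{0}{1}{2}' -f $Left[$i], $Separator, $Right[$i]
  }
}

# Possible usage with files (the @(...) guarantees arrays):
#   Merge-LineArrays @(Get-Content file1) @(Get-Content file2)
```

With this approach, lines that exist only in the longer file are kept, with an empty field on the other side of the separator.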
W
4

Probably the simplest solution is to use a Windows port of the Linux paste utility (e.g. paste.exe from the UnxUtils):

paste C:\path\to\file1.txt C:\path\to\file2.txt

From the man page:

DESCRIPTION

Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output.


For a PowerShell(ish) solution, I'd use two stream readers:

$sr1 = New-Object IO.StreamReader 'C:\path\to\file1.txt'
$sr2 = New-Object IO.StreamReader 'C:\path\to\file2.txt'

# Loop until both files are exhausted.
while ($sr1.Peek() -ge 0 -or $sr2.Peek() -ge 0) {
  # A reader that has already reached EOF contributes an empty string.
  if ($sr1.Peek() -ge 0) { $txt1 = $sr1.ReadLine() } else { $txt1 = '' }
  if ($sr2.Peek() -ge 0) { $txt2 = $sr2.ReadLine() } else { $txt2 = '' }

  "{0}`t{1}" -f $txt1, $txt2
}

# Release the file handles.
$sr1.Close()
$sr2.Close()

This avoids reading both files entirely into memory before merging them, which carries the risk of memory exhaustion with large files.

Whydah answered 1/12, 2014 at 20:48 Comment(0)
N
2
@echo off
setlocal EnableDelayedExpansion
rem The next line must contain a literal TAB character after the equals sign:
set "TAB=	"
rem The first file is read with the FOR /F command;
rem the second file is read via stdin.
< file2.txt (for /F "delims=" %%a in (file1.txt) do (
   rem Clear line2 first, because SET /P leaves the variable unchanged on an empty line
   set "line2="
   rem Read the next line from file2.txt
   set /P "line2="
   rem Echo the lines of both files separated by a tab
   echo %%a%TAB%!line2!
))

Further details at this post

Nephridium answered 1/12, 2014 at 19:39 Comment(0)
R
2

A generalized solution supporting multiple files, building on Ansgar Wiechers' great, memory-efficient System.IO.StreamReader solution:

PowerShell's ability to invoke members (properties, methods) directly on a collection and have them automatically invoked on all items in the collection (member-access enumeration, v3+) allows for easy generalization:

# The input file paths.
$files = 'file1', 'file2', 'file3'

# Create stream-reader objects for all input files.
# Note: Convert-Path converts the $files elements to *full paths*, which is
#       necessary, because .NET's current dir. usually differs from PowerShell's.
$readers = [IO.StreamReader[]] (Convert-Path -LiteralPath $files)

# Keep reading while at least 1 file still has more lines.
while ($readers.EndOfStream -contains $false) {

  # Read the next line from each stream (file).
  # Streams that are already at EOF fortunately just return "".
  $lines = $readers.ReadLine()
  
  # Output the lines separated with tabs.
  $lines -join "`t"

}

# Close the stream readers.
$readers.Close()

Get-MergedLines (source code below; invoke with -? for help) wraps the functionality in a function that:

  • accepts a variable number of filenames - both as an argument and via the pipeline

  • uses a configurable separator to join the lines (defaults to a tab)

  • allows trimming trailing separator instances

function Get-MergedLines() {
<#
.SYNOPSIS
Merges lines from 2 or more files with a specifiable separator (default is tab).

.EXAMPLE
Get-MergedLines file1, file2 '<->'

.EXAMPLE
Get-ChildItem file? | Get-MergedLines
#>
  param(
    [Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
    [Alias('PSPath')]
    [string[]] $Path,

    [string] $Separator = "`t",

    [switch] $TrimTrailingSeparators
  )

  begin { $allPaths = @() }

  # Note: += to "grow" arrays is generally best avoided, given
  #       that a new array must be created every time; for *small*
  #       arrays, however, this method is convenient, without noticeably 
  #       impacting performance.
  process { $allPaths += $Path } 

  end {

    # Resolve all paths to full paths, which may include wildcard resolution.
    # Note: By using full paths, we needn't worry about .NET's current dir.
    #       potentially being different.
    $fullPaths = (Resolve-Path $allPaths).ProviderPath

    # Create stream-reader objects for all input files.
    $readers = [System.IO.StreamReader[]] $fullPaths

    # Keep reading while at least 1 file still has more lines.
    while ($readers.EndOfStream -contains $false) {

      # Read the next line from each stream (file).
      # Streams that are already at EOF fortunately just return "".
      $lines = $readers.ReadLine()
      
      # Join the lines.
      $mergedLine = $lines -join $Separator

      # Trim (remove) trailing separators, if requested.
      if ($TrimTrailingSeparators) {
        $mergedLine = $mergedLine -replace ('^(.*?)(?:' + [regex]::Escape($Separator) + ')+$'), '$1'
      }

      # Output the merged line.
      $mergedLine

    }

    # Close the stream readers.
    $readers.Close()

  }

}
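
To illustrate what -TrimTrailingSeparators does, the following minimal sketch applies the same -replace pattern as the function above to two made-up merged lines. The sample strings and variable names are assumptions for illustration only; the separator is assumed to be the default tab.

```powershell
# Assumption: the separator is a tab, as in the function's default.
$Separator = "`t"

# Same pattern as in Get-MergedLines: lazily capture the line, then match
# one or more separators anchored at the end of the string.
$pattern = '^(.*?)(?:' + [regex]::Escape($Separator) + ')+$'

# Sample merged lines (made up for illustration).
$withTrailing    = "a`tb`t`t"   # 4 files; the 3rd and 4th ran out of lines
$withoutTrailing = "a`t`tc"     # empty field in the middle, none at the end

# Only separators at the very end of the line are removed.
$trimmed   = $withTrailing    -replace $pattern, '$1'
$unchanged = $withoutTrailing -replace $pattern, '$1'
```

Because the pattern is anchored at the end of the string, empty fields in the middle of a merged line keep their separators. Only the run of separators left behind by exhausted files at the end is dropped.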
Rubefaction answered 28/4, 2017 at 18:30 Comment(0)
P
1

PowerShell solution:

# @(...) guarantees arrays even for single-line files.
$file1 = @(Get-Content file1)
$file2 = @(Get-Content file2)
$outfile = "file3.txt"

for($i = 0; $i -lt $file1.length; $i++) {
  "$($file1[$i])`t$($file2[$i])" | out-file $outfile -Append 
}
Polished answered 1/12, 2014 at 16:58 Comment(1)
Your command is missing -InputObject before the output string in the out-file call, because the string is otherwise implicitly treated as the -Encoding argument. That said, unless you're concerned about memory (in which case the System.IO.StreamReader solution may be a better choice), $(for($i = 0; $i -lt $file1.length; $i++) { "$($file1[$i])`t$($file2[$i])" }) | Out-File $outfile is simpler and faster and also doesn't require you to ensure truncation of a preexisting output file beforehand. – Rubefaction
S
0

A number of recently locked [duplicate] questions link to this question.

I do not agree with those closures, because this question concerns text files whereas the others concern CSV files. As a general rule, I would advise against manipulating files that represent objects (such as XML, JSON and CSV) as plain text. Instead, I recommend importing these files (to objects), making the required changes, and then using ConvertTo-*/Export-* to write the results back to a file.

One example where all the general solutions given for this question produce incorrect output for these "duplicates" is when both CSV files have a column (property) name in common.
The general-purpose Join-Object cmdlet (a separately installed command, not part of PowerShell itself; see also: In Powershell, what's the best way to join two tables into one?) joins two object lists side by side when the -On parameter is simply omitted. Therefore, this solution is a better fit for the other (CSV) "duplicate" questions. Take Merge 2 csv files in powershell [duplicate] from @Ender as an example:

$A = ConvertFrom-Csv @'
ID,Name
1,Peter
2,Dalas
'@

$B = ConvertFrom-Csv @'
Class
Math
Physic
'@

$A | Join $B

ID Name  Class
-- ----  -----
1  Peter Math
2  Dalas Physic

Compared with the "text" merge solutions given for this question, the general Join-Object cmdlet can deal with inputs of different lengths and lets you decide what to include (LeftJoin, RightJoin or FullJoin). In addition, you control which columns to include ($A | Join $B -Property ID, Name), their order ($A | Join $B -Property ID, Class, Name) and a lot more that cannot be done by just concatenating text.

Specific to this question:

As this specific question concerns text files rather than CSV files, you need to add a header (property) name (e.g. -Header File1) while importing each file, and remove the header (Select-Object -Skip 1) when exporting the result:

$File1 = Import-Csv .\File1.txt -Header File1 
$File2 = Import-Csv .\File2.txt -Header File2
$File3 = $File1 | Join $File2
$File3 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
    Select-Object -Skip 1 | Set-Content .\File3.txt
Swayder answered 9/2, 2019 at 15:31 Comment(0)
