In PowerShell, what's the best way to join two tables into one?
Asked Answered
E

5

33

I'm fairly new to PowerShell, and am wondering if someone knows of any better way to accomplish the following example problem.

I have an array of mappings from IP address to host-name. This represents a list of active DHCP leases:

PS H:\> $leases

IP                    Name
--                    ----
192.168.1.1           Apple
192.168.1.2           Pear
192.168.1.3           Banana
192.168.1.99          FishyPC

I have another array of mappings from MAC address to IP address. This represents a list of IP reservations:

PS H:\> $reservations

IP                    MAC
--                    ---
192.168.1.1           001D606839C2
192.168.1.2           00E018782BE1
192.168.1.3           0022192AF09C
192.168.1.4           0013D4352A0D

For convenience, I was able to produce a third array of mappings from MAC address to IP address and host name using the following code. The idea is that $reservations should get a third field, "Name", which is populated whenever there's a matching "IP" field:

$reservations = $reservations | foreach {
    $res = $_
    $match = $leases | where {$_.IP -eq $res.IP} | select -unique
    if ($match -ne $NULL) {
        "" | select @{n="IP";e={$res.IP}}, @{n="MAC";e={$res.MAC}}, @{n="Name";e={$match.Name}}
    }
}

The desired output is something like this:

PS H:\> $ideal

IP                    MAC                 Name
--                    ---                 ----
192.168.1.1           001D606839C2        Apple
192.168.1.2           00E018782BE1        Pear
192.168.1.3           0022192AF09C        Banana
192.168.1.4           0013D4352A0D

Is there any better way of doing this?

Embolden answered 4/12, 2009 at 18:31 Comment(2)
@Michael How were you able to get a table for IP Address to Host Name for all active leases? I have been using the DHCP module but can not figure it out. I am also trying to combine some tables.Degreeday
@Degreeday I was using netsh's dhcp command to get a listing, and then parsing the output with regular expressions. There is also the DHCP Server Management API, but it is only available to native code. I'm not aware of any powershell module or .net library for doing this kind of thing. I ended up writing P/Invoke wrappers to the DHCP Server Management API.Embolden
W
39

After 1.5 years, the cmdlet I had pasted in the original answer has undergone so many updates that it has become completely outdated. Therefore I have replaced the code and the ReadMe with a link to the latest version.

Join-Object

Combines two object lists based on a related property between them.

Description
Combines properties from one or more objects. It creates a set that can be saved as a new object or used as it is. An object join is a means for combining properties from one (self-join) or more object lists by using values common to each.

Main features

  • An intuitive idiomatic PowerShell syntax
  • SQL like joining features
  • Smart property merging
  • Predefined join commands for updating, merging and specific join types
  • Well defined pipeline for the (left) input objects and output objects (preserves memory when correctly used)
  • Performs about twice as fast as Compare-Object on large object lists
  • Supports (custom) objects, data tables and dictionaries (e.g. hash tables) for input
  • Smart properties and calculated property expressions
  • Custom relation expressions
  • Easy installation (dot-sourcing)
  • Supports PowerShell for Windows (5.1) and PowerShell Core

The Join-Object cmdlet reveals the following proxy commands with their own (-JoinType and -Property) defaults:

  • InnerJoin-Object (Alias InnerJoin or Join), combines the related objects
  • LeftJoin-Object (Alias LeftJoin), combines the related objects and adds the rest of the left objects
  • RightJoin-Object (Alias RightJoin), combines the related objects and adds the rest of the right objects
  • FullJoin-Object (Alias FullJoin), combines the related objects and adds the rest of the left and right objects
  • CrossJoin-Object (Alias CrossJoin), combines each left object with each right object
  • Update-Object (Alias Update), updates the left object with the related right object
  • Merge-Object (Alias Merge), updates the left object with the related right object and adds the rest of the new (unrelated) right objects

ReadMe

The full ReadMe (and source code) is available from GitHub: https://github.com/iRon7/Join-Object

Installation

There are two versions of this Join-Object cmdlet (both versions supply the same functionality):

Install-Module -Name JoinModule

Install-Script -Name Join

(or rename the Join.psm1 module to a Join.ps1 script file)
and invoked the script by dot sourcing:

. .\Join.ps1

Answer

To answer the actual example in the question:

$reservations |LeftJoin $leases -On IP

IP          MAC          Name
--          ---          ----
192.168.1.1 001D606839C2 Apple
192.168.1.2 00E018782BE1 Pear
192.168.1.3 0022192AF09C Banana
192.168.1.4 0013D4352A0D

Performance

A little word on performance measuring:
The PowerShell pipeline is designed to stream objects (which saves memory), meaning that both¹ lists of input objects usually aren't (shouldn't be) resident in memory. Normally they are retrieved from somewhere else (i.e. a remote server or a disk). Also, the output usually matters where linq solutions are fast but might easily put you on the wrong foot in drawing conclusions because linq literally defers the execution (lazy evaluation), see also: fastest way to get a uniquely index item from the property of an array.
In other words, if it comes to (measuring) performance in PowerShell, it is important to look to the complete end-to-end solution, which is more likely to look something like:

 import-csv .\reservations.csv |LeftJoin (import-csv .\leases.csv) -On IP |Export-Csv .\results.csv

(1) Note: unfortunately, there is no easy way to build two parallel input streams (see: #15206 Deferred input pipelines)

(more) Examples

More examples can be found in the related Stack Overflow questions at:

side-by-side join examples

And in the Join-Object test script.

Please give a 👍 if you support the proposal to Add a Join-Object cmdlet to the standard PowerShell equipment (#14994)

Warhol answered 3/8, 2017 at 11:39 Comment(2)
Great function, but careful if using it from inside a Powershell module. Because of the way that scoping works, the $Left and $Right variables will not be available in the -Merge script block (modules' variables are private to the module, hence the script block cannot see them). This is not an issue if dot sourcing or including the function directly in the script that's calling it.Psychosocial
@Omni, I did some testing with putting the cmdlet in a module but couldn't confirm your issue. Nevertheless, during revising the cmdlet I found a rather big bug which is described at Unexpected results when reusing custom objects in the pipeline. I have created a branched version, which pasted in this answer. I suspect that this is actually the issue you ran into (I will appreciate it if you could confirm whether or not this is the case).Warhol
F
5

This can also be done using my module Join-Object

Install-Module 'Join-Object'

Join-Object -Left $leases -Right $reservations -LeftJoinProperty 'IP' -RightJoinProperty 'IP'

Regarding performance, I tested against a sample data of 100k lines:

  1. Hashtable example posted by @js2010 run in 8 seconds.
  2. Join-Object by me run in 14 seconds.
  3. LeftJoin by @iRon run in 1 minute and 50 seconds
Floreneflorentia answered 20/3, 2021 at 0:46 Comment(0)
P
4

Easiest way I've found to Merge two Powershell Objects is using ConvertTo-Json and ConvertFrom-Json

One liner based on the OPs Senario:

$leases | foreach {(ConvertTo-Json $_) -replace ("}$", (ConvertTo-Json ($reservations | where IP -eq $_.IP | select * -ExcludeProperty IP)) -Replace "^{", ",")} 
| ConvertFrom-Json

Results in:

IP          Name  Mac
--          ----  ---
192.168.1.1 Apple 001D606839C2
192.168.1.2 Pear  00E018782BE1

For another example lets make a couple objects:

$object1 = [PSCustomObject]@{"A" = "1"; "B" = "2"}
$object2 = [PSCustomObject]@{"C" = "3"; "D" = "4"}

Merge them together using Json by replacing the opening and closing brackets:

(ConvertTo-Json $object1) -replace ("}$", $((ConvertTo-Json $object2) -Replace "^{", ",")) | ConvertFrom-Json

Output:

A B C D
- - - -
1 2 3 4

Another example using a group of objects:

$mergedObjects = [PSCustomObject]@{"Object1" = $Object1; "Object2" = $Object2}
Object1     Object2
-------     -------
@{A=1; B=2} @{C=3; D=4}

Can just do the same again within a foreach:

$mergedObjects | foreach {(ConvertTo-Json $_.Object1) -replace ("}$", $((ConvertTo-Json $_.Object2) -Replace "^{", ",")) | ConvertFrom-Json}

Output:

A B C D
- - - -
1 2 3 4
Palpate answered 29/7, 2022 at 1:57 Comment(0)
P
3

You can use script block like this

$leases | select IP, NAME, @{N='MAC';E={$tmp=$_.IP;($reservations| ? IP -eq $tmp).MAC}}
Paulpaula answered 13/8, 2020 at 5:52 Comment(1)
+1 for arguably the most "powershell-y" way to do it without a third-party module. Although I do find the original foreach option more readable IMO.Westerfield
S
3

Here's a simple example using a hashtable. With big arrays, this turns out to be faster.

$leases =
'IP,Name
192.168.1.1,Apple
192.168.1.2,Pear
192.168.1.3,Banana
192.168.1.99,FishyPC' | convertfrom-csv

$reservations =
'IP,MAC
192.168.1.1,001D606839C2
192.168.1.2,00E018782BE1
192.168.1.3,0022192AF09C
192.168.1.4,0013D4352A0D' | convertfrom-csv

$hashRes=@{}
foreach ($resRecord in $reservations) {
  $hashRes[$resRecord.IP] = $resRecord
}

$leases | foreach {
  $other = $hashRes[$_.IP]

  [pscustomobject]@{IP=$_.IP
                   MAC=$other.MAC
                  Name=$_.name}
}
IP           MAC          Name
--           ---          ----
192.168.1.1  001D606839C2 Apple
192.168.1.2  00E018782BE1 Pear
192.168.1.3  0022192AF09C Banana
192.168.1.99              FishyPC
Stricker answered 17/3, 2021 at 15:29 Comment(2)
I like this a lot as it is pretty adaptable as well. One thing I might change, unless there is an underlying reason that I'm missing, is that you don't need to create $other and since ForEach loops are faster than pipeline to ForEach-Object, you might want to change how $leases is handled to match your earlier loop. Something like this...Foreach ($leaseRecord in $leases){[PSCustomObject]@{IP=$leaseRecord.IP; MAC=$hashRes[$leaseRecord.IP].MAC; Name=$leaseRecord.Name}}Incapacious
Create an inner join in the same loop with the outer join: $reservations_outer=$reservations.clone(); $leases | foreach { $lease=$_; $reservations_outer | where { $_.IP -eq $lease.IP} | add-member -notepropertymembers @{name=$lease.name} -passthru } -outvariable reservations_inner.Machismo

© 2022 - 2024 — McMap. All rights reserved.