Using Office365 excel array formulas, how to remove duplicates, keeping the last value?
Asked Answered
F

5

6

With data in A1 - B5:

A 1  //remove
A 2
B 3  //remove
B 2
C 1

How do you remove duplicates in column A, keeping the last set of values in other columns? Results should look like this:

A 2
B 2
C 1

I've tried combinations of Filter, Unique, and xlookup but haven't found an approach that works yet.

Flighty answered 25/4, 2024 at 20:14 Comment(1)
Why not use XLOOKUP() with UNIQUE() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))? Will it be slow as well? or why not use LOOKUP() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))Unionist
U
6

There are many possibilities of doing this, I assume these are two more methods which one can apply, although I have not made any speed test yet.

enter image description here


Method One:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))

Method Two:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))

Method Three: (using a reverse binary xlookup for speed)

=LET(d,SORT(A1:B5,,-1),  a,CHOOSECOLS(d,1),  b,CHOOSECOLS(d,2),  u,SORT(UNIQUE(a)),
    HSTACK(u,XLOOKUP(u,a,b,"",0,-2))  ) 

Unionist answered 26/4, 2024 at 4:3 Comment(5)
Very fast. Used method two since it uses xlookup, but changed to use a reverse binary lookup by sorting the data in reverse to begin with, but kept the unique variable sort ascending so the results come out ascending. This method is simple, fast, and intuitive. Thank yoU!!Flighty
For posterity... Binary modification to the answer: =LET(d,SORT(A1:B5),,-1),a,CHOOSECOLS(d,1),b,CHOOSECOLS(d,2),u,SORT(UNIQUE(a)),HSTACK(u,XLOOKUP(u,a,b,"",0,-2)))Flighty
@Flighty if you wish you can edit the answer to put up your idea there. Also what I see you have wrong syntax for sort function the bracket after b5 shouldn't be there, but I got your formula might be a typo. Also I will prefer index instead of choosecolsUnionist
Why do you prefer index instead of choosecols?Flighty
@Flighty it is shorter in comparison to the other.Unionist
T
5

There are likely many options. Here's one using XMATCH to search last-to-first.

=CHOOSEROWS(A1:B5,UNIQUE(XMATCH(A1:A5,A1:A5,0,-1)))

enter image description here

EDIT:

Similar option: calculate the UNIQUE items first (thanks to @ScottCraner):

=CHOOSEROWS(A1:B5,XMATCH(UNIQUE(A1:A5),A1:A5,0,-1))
Turfy answered 25/4, 2024 at 20:28 Comment(8)
This works, but running this on 50,000 rows, it takes a long time to process. Is there are more efficient way?Flighty
Possibly. I suspect this thread will turn into a bit of Code Golf.Turfy
=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,ROW(B1:B5),MAX,0,0),,-1)) NOt sure if quicker or not.Yokoyama
Still very slow when there are only a few (less than 1%) duplicates... but it does work though. Would be great if it could also work on 500,000 rows.Flighty
I doubt any formula is going to work quickly on that many rows with that number of duplicates. It may be better to use VBA and a dictionary. But even then, using arrays, one would still be looping. @FlightyYokoyama
Dynamic arrays work very fast with many complicated structures on large number of rows. This seems like a simple thing to do, but haven't figured it out yet.Flighty
But you say first 50K rows and in another comment you say 500K rows which one should be taken into account?Unionist
Approx 50k rows is what I'm working with, but if it can be fast enough to handle 500k rows, even better.Flighty
Y
3

Another play on CHOOSEROWS is to use GROUPBY(at time of writing only available to insiders):

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,SEQUENCE(ROWS(B1:B5)),MAX,0,0),,-1))

enter image description here

Yokoyama answered 25/4, 2024 at 20:49 Comment(4)
Can't get this to work. What is "MAX"? the formula listed probably doesn't match what's in your spreadsheet (can't see fully).Flighty
What do you get in return? I assume you are getting the #Name error. I thought they had released it to everyone, but I may be wrong and it is only available to insiders at present.Yokoyama
Yes, #Name error. Not an insider.Flighty
Ah, Well I will put that caveat in the answer, sorry.Yokoyama
G
2

Here's an interesting method:

=LET(
    a, GROUPBY(A1:A5, B1:B5, ARRAYTOTEXT, , 0),
    HSTACK(CHOOSECOLS(a, 1), --RIGHT(CHOOSECOLS(a, 2)))
)

enter image description here

Gaye answered 26/4, 2024 at 4:52 Comment(1)
Sir this one, using RIGHT() will not return if the values in Column B is double or triple digits. =LET(a, GROUPBY(A1:A5,B1:B5,ARRAYTOTEXT,,0), HSTACK(TAKE(a,,1), --TEXTAFTER(", "&TAKE(a,,-1),", ",-1)))Unionist
M
2

Bottom Unique

  • I'm not sure what you mean by BigBen's formulas being slow.
  • Here is a more dynamic version of the second one which runs in under a second for 500k rows on my old Windows 64-bit and Office 64-bit configuration. You cannot seriously expect it to run faster, I mean, =SEQUENCE(500000) takes nearly half a second.
=LET(data,A2:F500001,unique_col,1,
    du,INDEX(data,,unique_col),
    u,UNIQUE(du),
    CHOOSEROWS(data,XMATCH(u,du,,-1)))
  • The data was =RANDARRAY(500000,,1000,9999,1) in A2 and =SEQUENCE(500000) in B2, then copy/pasted values and copied all of it to C2:F2. This resulted in the expected 9000 unique rows (1.8%).
  • An interesting observation was that if INDEX(data,,unique_col) is replaced with CHOOSECOLS(data,unique_col) it takes 2 seconds (TAKE(data,,unique_col) performs the same).
  • Maybe the latter or some other formula in your workbook is the reason behind a bad performance. Share some feedback.
Marimaria answered 26/4, 2024 at 6:28 Comment(1)
For 50,000 rows, BigBen's took about a minute to come back. Mayukh's about one second.Flighty

© 2022 - 2025 — McMap. All rights reserved.