Using Office365 excel array formulas, how to remove duplicates, keeping the last value?

Asked 25/4, 2024 at 20:14 Answered 26/4, 2024 at 6:28

Solved excel excel-formula office365 array-formulas dynamic-arrays

With data in A1 - B5:

A 1  //remove
A 2
B 3  //remove
B 2
C 1

How do you remove duplicates in column A, keeping the last set of values in other columns? Results should look like this:

A 2
B 2
C 1

I've tried combinations of Filter, Unique, and xlookup but haven't found an approach that works yet.

Flighty answered 25/4, 2024 at 20:14 Comment(1)

Why not use XLOOKUP() with UNIQUE() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))? Will it be slow as well? or why not use LOOKUP() --> =LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5))) – Unionist 26/4, 2024 at 3:48

There are many possibilities of doing this, I assume these are two more methods which one can apply, although I have not made any speed test yet.

Method One:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, LOOKUP(δ,α,B1:B5)))

Method Two:

=LET(α, A1:A5, δ, UNIQUE(α), HSTACK(δ, XLOOKUP(δ,α,B1:B5,,,-1)))

Method Three: (using a reverse binary xlookup for speed)

=LET(d,SORT(A1:B5,,-1),  a,CHOOSECOLS(d,1),  b,CHOOSECOLS(d,2),  u,SORT(UNIQUE(a)),
    HSTACK(u,XLOOKUP(u,a,b,"",0,-2))  )

Unionist answered 26/4, 2024 at 4:3 Comment(5)

Very fast. Used method two since it uses xlookup, but changed to use a reverse binary lookup by sorting the data in reverse to begin with, but kept the unique variable sort ascending so the results come out ascending. This method is simple, fast, and intuitive. Thank yoU!! – Flighty 26/4, 2024 at 18:2

For posterity... Binary modification to the answer: =LET(d,SORT(A1:B5),,-1),a,CHOOSECOLS(d,1),b,CHOOSECOLS(d,2),u,SORT(UNIQUE(a)),HSTACK(u,XLOOKUP(u,a,b,"",0,-2))) – Flighty 26/4, 2024 at 18:10

@Flighty if you wish you can edit the answer to put up your idea there. Also what I see you have wrong syntax for sort function the bracket after b5 shouldn't be there, but I got your formula might be a typo. Also I will prefer index instead of choosecols – Unionist 26/4, 2024 at 18:24

Why do you prefer index instead of choosecols? – Flighty 29/4, 2024 at 18:3

@Flighty it is shorter in comparison to the other. – Unionist 29/4, 2024 at 18:4

There are likely many options. Here's one using XMATCH to search last-to-first.

=CHOOSEROWS(A1:B5,UNIQUE(XMATCH(A1:A5,A1:A5,0,-1)))

EDIT:

Similar option: calculate the UNIQUE items first (thanks to @ScottCraner):

=CHOOSEROWS(A1:B5,XMATCH(UNIQUE(A1:A5),A1:A5,0,-1))

Turfy answered 25/4, 2024 at 20:28 Comment(8)

This works, but running this on 50,000 rows, it takes a long time to process. Is there are more efficient way? – Flighty 25/4, 2024 at 20:37

Possibly. I suspect this thread will turn into a bit of Code Golf. – Turfy 25/4, 2024 at 20:37

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,ROW(B1:B5),MAX,0,0),,-1)) NOt sure if quicker or not. – Yokoyama 25/4, 2024 at 20:47

Still very slow when there are only a few (less than 1%) duplicates... but it does work though. Would be great if it could also work on 500,000 rows. – Flighty 25/4, 2024 at 20:49

I doubt any formula is going to work quickly on that many rows with that number of duplicates. It may be better to use VBA and a dictionary. But even then, using arrays, one would still be looping. @Flighty – Yokoyama 25/4, 2024 at 20:52

Dynamic arrays work very fast with many complicated structures on large number of rows. This seems like a simple thing to do, but haven't figured it out yet. – Flighty 25/4, 2024 at 20:57

But you say first 50K rows and in another comment you say 500K rows which one should be taken into account? – Unionist 26/4, 2024 at 3:54

Approx 50k rows is what I'm working with, but if it can be fast enough to handle 500k rows, even better. – Flighty 26/4, 2024 at 17:27

Another play on CHOOSEROWS is to use GROUPBY(at time of writing only available to insiders):

=CHOOSEROWS(A1:B5,TAKE(GROUPBY(A1:A5,SEQUENCE(ROWS(B1:B5)),MAX,0,0),,-1))

Yokoyama answered 25/4, 2024 at 20:49 Comment(4)

Can't get this to work. What is "MAX"? the formula listed probably doesn't match what's in your spreadsheet (can't see fully). – Flighty 25/4, 2024 at 20:55

What do you get in return? I assume you are getting the #Name error. I thought they had released it to everyone, but I may be wrong and it is only available to insiders at present. – Yokoyama 25/4, 2024 at 20:58

Yes, #Name error. Not an insider. – Flighty 25/4, 2024 at 20:59

Ah, Well I will put that caveat in the answer, sorry. – Yokoyama 25/4, 2024 at 21:0

Here's an interesting method:

=LET(
    a, GROUPBY(A1:A5, B1:B5, ARRAYTOTEXT, , 0),
    HSTACK(CHOOSECOLS(a, 1), --RIGHT(CHOOSECOLS(a, 2)))
)

Gaye answered 26/4, 2024 at 4:52 Comment(1)

Sir this one, using RIGHT() will not return if the values in Column B is double or triple digits. =LET(a, GROUPBY(A1:A5,B1:B5,ARRAYTOTEXT,,0), HSTACK(TAKE(a,,1), --TEXTAFTER(", "&TAKE(a,,-1),", ",-1))) – Unionist 26/4, 2024 at 5:32

Bottom Unique

I'm not sure what you mean by BigBen's formulas being slow.
Here is a more dynamic version of the second one which runs in under a second for 500k rows on my old Windows 64-bit and Office 64-bit configuration. You cannot seriously expect it to run faster, I mean, =SEQUENCE(500000) takes nearly half a second.

=LET(data,A2:F500001,unique_col,1,
    du,INDEX(data,,unique_col),
    u,UNIQUE(du),
    CHOOSEROWS(data,XMATCH(u,du,,-1)))

The data was =RANDARRAY(500000,,1000,9999,1) in A2 and =SEQUENCE(500000) in B2, then copy/pasted values and copied all of it to C2:F2. This resulted in the expected 9000 unique rows (1.8%).
An interesting observation was that if INDEX(data,,unique_col) is replaced with CHOOSECOLS(data,unique_col) it takes 2 seconds (TAKE(data,,unique_col) performs the same).
Maybe the latter or some other formula in your workbook is the reason behind a bad performance. Share some feedback.

Marimaria answered 26/4, 2024 at 6:28 Comment(1)

For 50,000 rows, BigBen's took about a minute to come back. Mayukh's about one second. – Flighty 26/4, 2024 at 18:5

Bottom Unique

Recommended topics

Hot tags