Remove duplicates within Excel cell
Asked Answered
C

8

7

Say I have the following text string in one single Excel cell:

John John John Mary Mary

I want to create a formula (so no menu functions or VBA, please) that would give me, on another cell

John Mary

How can I do this?

What I've tried so far was search the internet and SO about the issue and all I could find were solutions involving Excel's built-in duplicate removal or something involving countif and the replacement of duplicates for "". I've also taken a look at the list of Excel functions, especially those from the "Text" category, but couldn't find anything interesting, that could be done on one cell.

Costa answered 17/9, 2014 at 18:6 Comment(4)
What have you tried? What kind of approach are you taking? It is a trivial problem to solve with VBA; not so simple with just formulas. You might be able to do it with helper columns.Untutored
@RonRosenfeld, I'll update the Q with my puny attempts at solving the issue.Costa
I can appreciate the need for only using formulas, but may I ask why you can't use some VBA for this with a Macro attached to a shape or button on the sheet? I'm purely curious. Even an event built in to the Worksheet. hmm?Lauralee
@PJRosenburg, many of our users use Calc instead of Excel; from my experience, Calc doesn't do such a fine job interpreting anything but the simplest VBA macros.Costa
A
6

The answer is here: https://www.extendoffice.com/documents/excel/2133-excel-remove-duplicate-characters-in-string.html

Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x
'Updateby20140924
With CreateObject("Scripting.Dictionary")
    .CompareMode = vbTextCompare
    For Each x In Split(txt, delim)
        If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
    Next
    If .Count > 0 Then RemoveDupes2 = Join(.keys, delim)
End With
End Function

Put the code above in a module

Use =RemoveDupes2(A2,",") A2 contains repeated text separated by , You may change the delimiter

Aluminate answered 5/7, 2016 at 12:2 Comment(1)
This didn't work for me, for numbers of 13 characters.Eight
E
1

Assuming you'll never have more than two distinct names in a cell, this should work:

=MID(A1&" ",1,FIND(" ",A1&" "))&
 MID(SUBSTITUTE(A1&" ",MID(A1&" ",1,FIND(" ",A1&" ")),"")&" ",1,
 FIND(" ",SUBSTITUTE(A1&" ",MID(A1&" ",1,FIND(" ",A1&" "))&" ","")))

It will show John Mary for all of these:

John John John Mary Mary
John Mary
John Mary John Mary
John Mary Mary
John John Mary

It will show John for all of these:

John
John John
John John John

And it will show nothing if A1 is blank.

Ethmoid answered 17/9, 2014 at 22:53 Comment(4)
This works for two distinct names indeed. I'll see if I can generalize it to a potential 16 names. Thanks!Costa
This particular strategy would get really hairy for 3 names, and you may hit some Excel limits long before you get to 16 names. It's basically a recursive algorithm: A) grab first word, B) remove it from the rest of the string, C) concatenate A and B, [repeat steps A-C on the results of step B, ...]. Your question highlights the weakness of Excel's string operations. (Especially the inability to use regular expressions.) Also, Excel's absence of an eval-like function makes it difficult to this sort of thing without resorting to VBA or using "hidden" cells.Ethmoid
@RickHitchcock is there any way to do this without the trailing comma and for up to 3 unique strings?Eight
Not sure what you mean about the trailing comma, but I certainly wouldn't recommend using this strategy for more than 2 unique strings. Excel doesn't handle this very well without resorting to VBA.Ethmoid
U
0

As I wrote, it is trivial to solve with VBA. If you cannot use VBA, one method is to use helper columns.

Assume: Your string is in A1

Enter the following formulas:

C1:  =IFERROR(INDEX(TRIM(MID(SUBSTITUTE($A$1," ",REPT(" ",99)),(ROW(INDIRECT("1:" & LEN($A$1)-LEN(SUBSTITUTE($A$1," ",""))+1))-1)*99+((ROW(INDIRECT("1:" & LEN($A$1)-LEN(SUBSTITUTE($A$1," ",""))+1))=1)),99)),ROWS($1:1),1),"")

D1:  =IF(COUNTIF(C1:$C$5,C1)=1,C1,"")

Select C1 and D1 and fill down until you start getting blanks

E1:  =D1
E2:  =TRIM(CONCATENATE(D2," ",E1))

Select E2 and fill down.

The contents of the last cell filled in column E will be your result.

If you want to have a cell which automatically returns the contents of the last cell in column E range, you can use a formula like:

=LOOKUP(REPT("z",99),$E$1:$E$100)
Untutored answered 17/9, 2014 at 19:8 Comment(2)
Thanks, Ron. This doesn't seem to work out-of-the-box for my real case, but I'll give it further thought.Costa
@WaldirLeoncio It works on your sample data. If your sample is not representative, edit your original question to provide information about the failures.Untutored
R
0

Without a formula: Text to Columns with space as the delimiter, transpose the output, apply Remove Duplicates to each of the columns individually, then transpose the result.

Rikkiriksdag answered 17/9, 2014 at 19:26 Comment(1)
Thank you for your input, but I really need this to work without user intervention. It is a complex spreadsheet for a client and I don't want him to do anything but fill in cells.Costa
G
0

Found a solution that might work if you are also the one making the list.

when you make the list if you are doing it by combining the cell above with the current line, you can check to see if the value is already in the above cell using the following code:

if(iserror(find(value_to_be_added,previous_concatenation)),
    previous_concatenation&" "&value_to_be_added,previous_concatenation)
Gregorio answered 29/8, 2018 at 19:39 Comment(0)
M
0

Did you try the textjoin function? (available in Excel 2016, not sure about previous versions). Was just looking for something similar and this seems to do the job for me on a column where I have multiple values more than once.

=TEXTJOIN(delimiter;ignore_empty;text)
  • define delimiter in any way you need it
  • ignore empty can be true or false, depending on what serves your needs
  • text would be your array of values - using the unique function within here (see example below) will filter out any multiples of any string (I am using it for numbers and it works)

Example:

=TEXTJOIN(" ";TRUE;UNIQUE($A$1:$A$16))

Guess this might be Excel's equivalent to google sheets' join function. Textjoin comes up if you type in =join - I took the formula provided in user11308575's post above but removed the parantheses and its content, then went from there.

Hope this helps (even though the thread is already old) ;)

Microelement answered 13/7, 2020 at 11:39 Comment(1)
The theory is there, but you just missing the point that OP has a string in a single cell where he needs to remove duplicates from.Gemeinschaft
G
0

If one has access to TEXTJOIN one could use:

=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[not(preceding::*=.)]"))
Gemeinschaft answered 21/7, 2020 at 21:9 Comment(1)
When I tried this on a cell containing "John John John Mary Mary", it only returned "John". I would love for this to work but it's not for me. I wish I understood better what it was doing.Child
C
-1

I found the answer below in this thread https://superuser.com/questions/643909/remove-duplicate-entries-in-one-cell

=join(" ",unique(transpose(split(A1," "))))

Cauchy answered 4/4, 2019 at 0:20 Comment(1)
The original answer concerns using that function in Google Sheets. Microsoft Excel doesn't have the "join" function.Costa

© 2022 - 2024 — McMap. All rights reserved.