Excel 2007 - Generate unique ID based on text?
Asked Answered
B

6

7

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.

It would be best if I didn't have to use VBA really. But if I have to, so be it.

Banda answered 8/11, 2011 at 7:17 Comment(1)
Any requirements for the length or characters used in the ID?Flub
A
4

Solution Without VBA.

Logic based on First 8 characters + number of character in a cell.

= CODE(cell) which returns Code number for first letter

= CODE(MID(cell,2,1)) returns Code number for second letter

= IFERROR(CODE(MID(cell,9,1)) If 9th character does not exist then return 0

= LEN(cell) number of character in a cell

Concatenating firs 8 codes + adding length of character on the end

If 8 character is not enough, then replicate additional codes for next characters in a string.

Final function:

=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)

enter image description here

Aliaalias answered 1/2, 2015 at 22:24 Comment(5)
This is not quote unique, because the CODE of an individual character can be either 2 or 3 digits; so a combination of say 6 letters may have the same code as a combination of say 5 other letters.Clyve
Provide such examples.Aliaalias
Try converting this string of ASCII codes here back to letters; I count at least 6 ways to make proper names out of this string by flipping 1/2/3 digit characters around: 6510097109236666111983283116463280101116101 [try starting with this pattern: 232331232223222333]. Remember - the key to user inputs being calculated is always dealing with corner cases. It is the unlikely user inputs which create the most pain if your data entry is not able to handle all cases.Clyve
You are talking about reverse function. Task was to assign ID to a list of Real unique names. But enhancement to my solution is very simple to avoid mixing 2 and 3 digits ASCII codes: =(1000+CODE(B2))&IFERROR(1000+CODE(MID(B2,2,1)),0)&IFERROR(1000+CODE(MID(B2,3,1)),0)&IFERROR(1000+CODE(MID(B2,4,1)),0)&IFERROR(1000+CODE(MID(B2,5,1)),0)&IFERROR(1000+CODE(MID(B2,6,1)),0)&IFERROR(1000+CODE(MID(B2,7,1)),0)&IFERROR(1000+CODE(MID(B2,8,1)),0)&LEN(B2)Aliaalias
This is an interesting solution - force all codes to be 4 digit numbers instead of 2 or 3. Easy to do and easy to read after because if the first two digits are 10, then it's a 2 digit code remaining, and otherwise it's a 3 digit code remaining. I like it.Clyve
T
1

Sorry, I didn't found a solution with formula only even if this thread might help (trying to calculate the points in a scrabble game) but I didn't find a way to be sure the generated hash would be unique.

Yet, here is my solution, based on a UDF (Used-Defined Function):

Put the code in a module:

Public Function genId(ByVal sName As String) As Long
'Function to create a unique hash by summing the ascii value of each character of a given string
    Dim sLetter As String
    Dim i As Integer
    For i = 1 To Len(sName)
        genId = Asc(Mid(sName, i, 1)) * i + genId
    Next i
End Function

And call it in your worksheet like a formula:

=genId(A1)

[EDIT] Added the * i to take into account the order. It works on my unit tests

Ted answered 8/11, 2011 at 7:55 Comment(5)
Hi! This works pretty well :) Allthough, I do get the same results for a few names, if the name has the same amount of characters. I think I'll just split the string and pick the first letter of each and then add this ID. Should probably be unique then :)Banda
seems like the algo is missing the order! (it will generate the same ID for james Doe and Doe james. I'll edit my answer to improve my function (FWIW, I've multiplied the id by the index so that it somehow takes the order into account. I hope that will be enoughTed
@chrisneilsen: why not? I understand this doesn't use any standard lib to create a hash but I wish to read in which case it wouldn't workTed
@Ted consider this: for simple three letter words there are 52^3 = 140608 possible words. Your algo will produce a max number of 732 - clearly you can't produce unique IDs for 140000 words with <700 values! The ratio get worse the longer the words get.Wong
@chrisneilsen: ok got it. Thanks for taking time to answerTed
W
0

May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID

Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long

Function GUID() As String
    Dim ID(0 To 15) As Byte
    Dim i As Long

    If CoCreateGuid(ID(0)) = 0 Then
        For i = 0 To 15
            GUID = GUID & Format(Hex$(ID(i)), "00")
        Next
    Else
        GUID = "Error while creating GUID!"
    End If

End Function

Test using

Sub testGUID()
    MsgBox GUID
End Sub

How to best implement depends on your needs. One way would be to write a macro to get a GUID populate a column where names exist. (note, using it as a udf as is is no good, since it will return a new GUID when recalculated)

EDIT
See this answer for creating a SHA1 hash of a string

Wong answered 8/11, 2011 at 10:18 Comment(0)
C
0

Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.

If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.

If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values to your list the identifers will toggle themselves up and down, so you need to be careful if your data changes.

Charades answered 8/11, 2011 at 23:13 Comment(0)
B
0

Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.

=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
Blenheim answered 28/8, 2015 at 14:19 Comment(1)
Note that this only works if there are no strings which have the same characters but in a different order. ie: 21 jump street & 12 jump street would be the same in this method.Clyve
S
0

You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?

If so, you can use the below formula. It works by individually taking each character's ASCII code - 40 [assuming normal characters, this puts numbers at between 8 & 57, and letters at between 57 & 122], and multiplying that characters code by 10 ^ [that character's digit placement in the word]. Basically it takes that character code [-40], and concatenates each code onto the next.

EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"'s appended to it.

=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")

Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.

EDIT FOR ALTERNATIVE METHOD

I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10*[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:

By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:

=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)

[The RIGHT("00000000000000"... portion of the formula is meant to keep all codes the same number of characters]

Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:

001000000001001

This only works with the following assumptions:

The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.

And,

The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.

Shaunshauna answered 28/8, 2015 at 14:52 Comment(2)
Note: I tried to do this by individually counting the number of characters from a-z in each word and placing that number (assuming 0-9) in the digit of a 10^26 number, and it would have worked if 10^26 wasn't outside of Excel's accuracy with floating point values. Shown here: =TEXT(SUM(LEN(A3)*10^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26},1),""))*10^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}),"#")Clyve
[In the above example Alphabet is a named range containing a single string of "abcd...z"].Clyve

© 2022 - 2024 — McMap. All rights reserved.