Finding similar sounding text in VBA [closed]
Asked Answered
S

5

16

My manager tells me that there is a way to evaluate names that are spelled differently but sound similar in the way they are pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot find any info on Google.

Does this exist? Does anyone know if it is available for VBA (Access)?

Stripper answered 22/10, 2009 at 14:36 Comment(1)
This is a usefull and legitimate question,,, The author states clearly his intention... And provide enough info to generate answears (as we can see below) There is a valuable information in the question and the answers...Panettone
E
20

Nice question! You're question includes a great example of the idea itself.

There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by the phonetic rather than the actual spelling. In this question, Soundits and Soundex are similar sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]

About Soundex:

The Soundex algorithm is predicated on characteristics of English such as:

  1. The first letter has high significance
  2. Many consonants sound similar
  3. Consonants affect pronunciation more than vowels

One warning: Soundex was designed for names. The shorter the better. As a name grows longer, the Soundex becomes less reliable.

Resources:

  1. Here is an example that uses VBA for Access.
  2. There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
  3. There is a lot of information about Soundex and other variants such as Soundex2 (Search for 'Soundex' and 'VBA').

Code Example:

Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.

Option Compare Database
Option Explicit

Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
    Dim strSource As String
    Dim strOut As String
    Dim strValue As String
    Dim strPriorValue As String
    Dim lngPos As Long

    If Not IsError(varText) Then
        strSource = Trim$(Nz(varText, vbNullString))
        If strSource <> vbNullString Then
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&

            Do
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If

    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If

Exit_Handler:
    Exit Function

Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
    Case "B", "F", "P", "V"
        SoundexValue = "1"
    Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        SoundexValue = "2"
    Case "D", "T"
        SoundexValue = "3"
    Case "L"
        SoundexValue = "4"
    Case "M", "N"
        SoundexValue = "5"
    Case "R"
        SoundexValue = "6"
    Case vbNullString
        SoundexValue = "0"
    Case Else
        'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function

Levenshtein distance

Another method of comparing strings is to get the Levenshtein distance. Here is the example given in VBA, it is taken from LessThanDot Wiki:

Function LevenshteinDistance(word1, word2)

Dim s As Variant
Dim t As Variant
Dim d As Variant
Dim m, n
Dim i, j, k
Dim a(2), r
Dim cost

   m = Len(word1)
   n = Len(word2)

   ''This is the only way to use
   ''variables to dimension an array
   ReDim s(m)
   ReDim t(n)
   ReDim d(m, n)

   For i = 1 To m
       s(i) = Mid(word1, i, 1)
   Next

   For i = 1 To n
       t(i) = Mid(word2, i, 1)
   Next

   For i = 0 To m
       d(i, 0) = i
   Next

   For j = 0 To n
       d(0, j) = j
   Next


   For i = 1 To m
       For j = 1 To n

           If s(i) = t(j) Then
               cost = 0
           Else
               cost = 1
           End If

           a(0) = d(i - 1, j) + 1             '' deletion
           a(1) = d(i, j - 1) + 1             '' insertion
           a(2) = d(i - 1, j - 1) + cost      '' substitution

           r = a(0)

           For k = 1 To UBound(a)
               If a(k) < r Then r = a(k)
           Next

           d(i, j) = r

       Next

   Next

   LevenshteinDistance = d(m, n)

End Function
Ernst answered 22/10, 2009 at 14:37 Comment(1)
BTW, when I cited Simil(), I was referring to an implementation of Levenshtein distance. The one I use is based on code found here: planetsourcecode.com/vb/scripts/…Longboat
F
4

Here are a couple working examples of the SOUNDEX algorithm in VBA:

Frontolysis answered 22/10, 2009 at 14:42 Comment(2)
This answer is an example of one of the several problems with link-only answers. One of the links no longer works, so the 50% of the answer is suddenly a waste of time for anyone who clicks it. If the author would have taken 15 seconds to copy/paste the relevant code into the answer, then the example's usefulness could live on "forever".Expunction
@Expunction Good point... alas, hard to travel back in time 10 years.Frontolysis
L
3

In addition to Soundex, which is often gives you too loose a match to be really useful, you should also look at Soundex2 (a variant of Soundex that is more granular), and for a different kind of matching, Simil(). I use all three.

Longboat answered 23/10, 2009 at 1:23 Comment(0)
S
0

You are looking for SOUNDEX.

Stadtholder answered 22/10, 2009 at 14:39 Comment(0)
R
0

Also consider using the first two or three letters of the first name and last name. In a database I had of 10,000 names Jo Sm (Joe/John/Joan Smith) returned only three or four records.

Also what type of first names. Are you going to get folks using the shortened version? For example my legal first name is Anthony but I'm always called Tony.

Reprobative answered 22/10, 2009 at 23:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.