Search and replace whole words which can be separated not only by a space
Asked Answered
P

2

3

I'm looking for a way to search and replace whole words. The whole words can be separated not only by a space but .,;:/? etc.

I'm looking to do something like this

replace([address], ***--list of separators, like .,;:/?--*** & [replacewhat] & ***--list of separators, like .,;:/?--*** ," " & [replacewith] & " ")

I don't know how to pass a list of separators instead of running a replace function once for each combination of separators (which combined with 300 words I'm replacing would amount to an insane number of queries).

Phonsa answered 30/7, 2012 at 20:0 Comment(3)
The problem is you can't define the rules under which this would operate. There are too many variations/permutations. As a result no set of logic can do it right. The best you could hope for is a process which would flag ones needing human review; check the ones you want updated and then let the system update after the manual review. Otherwise you'll be developing an AI to deal with the permuatations.Triggerfish
that's not really true, ifirst of all, MS already does it (you can search for whole words) and second, i can come up with a list of combinations and then just run each of my 300 words with every set. i'm really trying to avoid that but it's 100% possible. another thing i can do is run a replace that will replace all characters with a space and then replace my 300 words if they're surrounded by spaces. so there are ways of doing it, i'm just trying to find the best way of doing this. i have to believe that there are ways to search for whole words. thank you!!Phonsa
i already have code that will give me all permutations of 2 of any number under 9. i'm just really trying to avoid going that way.Phonsa
B
14

You can do a replacement with a regular expression using a pattern with the \b marker (for the word boundary) before and after the word you want to replace.

Public Function RegExpReplaceWord(ByVal strSource As String, _
    ByVal strFind As String, _
    ByVal strReplace As String) As String
' Purpose   : replace [strFind] with [strReplace] in [strSource]
' Comment   : [strFind] can be plain text or a regexp pattern;
'             all occurences of [strFind] are replaced
    ' early binding requires reference to Microsoft VBScript
    ' Regular Expressions:
    'Dim re As RegExp
    'Set re = New RegExp
    ' with late binding, no reference needed:
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True
    're.IgnoreCase = True ' <-- case insensitve
    re.pattern = "\b" & strFind & "\b"
    RegExpReplaceWord = re.Replace(strSource, strReplace)
    Set re = Nothing
End Function

As written, the search is case sensitive. If you want case insensitive, enable this line:

re.IgnoreCase = True

In the Immediate window ...

? RegExpReplaceWord("one too three", "too", "two")
one two three
? RegExpReplaceWord("one tool three", "too", "two")
one tool three
? RegExpReplaceWord("one too() three", "too", "two")
one two() three
? RegExpReplaceWord("one too three", "to", "two")
one too three
? RegExpReplaceWord("one too three", "t..", "two")
one two three

... and for your range of delimiters ...

? RegExpReplaceWord("one.too.three", "too", "two")
one.two.three
? RegExpReplaceWord("one,too,three", "too", "two")
one,two,three
? RegExpReplaceWord("one;too;three", "too", "two")
one;two;three
? RegExpReplaceWord("one:too:three", "too", "two")
one:two:three
? RegExpReplaceWord("one/too/three", "too", "two")
one/two/three
? RegExpReplaceWord("one?too?three", "too", "two")
one?two?three
? RegExpReplaceWord("one--too--three", "too", "two")
one--two--three
? RegExpReplaceWord("one***too***three", "too", "two")
one***two***three
Boll answered 30/7, 2012 at 21:23 Comment(11)
am i doing something wrong? i'm using this in a query Expr2: RegExpReplaceWord([newaddress1],"st","street") and it's not replacing. NewAddress1 Expr2 1 COLUMBIA ST 1 COLUMBIA STPhonsa
ohh, it's case sensitive, no problem, this is so elegant!!)))))))Phonsa
ok, i apologize, i'm new to this site and don't know how things work yet. both of your solutions are answers but i tried marking yours as an answer and it took it off Remou's. i'd like to give you both credit, how is this done? i already 1-upped you, is that enough? PS Oh, hi HansUp))))) i remember you from ShellAndWait and your WTF clause))))))) thank you againPhonsa
I didn't think to warn you about case sensitivity; I updated the answer.Boll
You can accept only one answer. I suggest you choose the one which works best for you. And you can also upvote all answers you find useful. I'm fine with a vote. And, yes, I remember you were amused by my Case Else in that earlier question; it cracked me up.Boll
BTW, what would you want from this? RegExpReplaceWord("ST JAMES INFIRMARY", "ST", "STREET")Boll
disregard, i misunderstood your question. i'm hoping this won't happen because i'm only changing the address field but sometimes the names ARE in the address field, so it will just be a mistake, nothing i can do.Phonsa
@Remou, i'm sorry, i don't know what happened. i unaccepted the answer and upvoted you and tried to reply to you but your post disappeared alltogether. i didn't know i can only mark one answer. your solution was very interesting but if i had to choose - i'd choose vba and vbs. sorry and thank youPhonsa
hey, HansUp, i hope you don't mind, i posted your solution (giving you credit) at Access-Programmers.co.uk, i had a thread there going from 2 years ago on this))))) and i linked back here. the thread is here access-programmers.co.uk/forums/…Phonsa
Fine with me :-) System is scolding me for chatting in comments, so Hans out.Boll
let us continue this discussion in chatPhonsa
D
0

Thank you for your answer. It was of great help to me.

However, as the number of iterations of this code increased due to increase in my data size, I realized that this piece of code is slowing down my application. For instance, 10,000 iterations of this code take about 20 seconds.

I was using below code based on your answer:

Function CleanString(ByVal InputString As String, Optional SplWords = "USP|BP|EP|IP|JP", _
                Optional Delim As String = "|") As String
Dim i As Integer
Dim ArrIsEmpty As Boolean
Dim ArrSplWords() As String
Dim Wrd As Variant
Dim RE As Object

CleanString = InputString
ArrSplWords = Split(SplWords, Delim)

Set RE = CreateObject("VBScript.RegExp")
RE.Global = True
RE.ignorecase = True
For Each Wrd In ArrSplWords
    RE.Pattern = "\b" & Wrd & "\b"
    If RE.test(CleanString) Then
        CleanString = RE.Replace(CleanString, "")
    End If
Next Wrd
CleanString = Application.WorksheetFunction.Trim(CleanString)
End Function

To tackle the issue of slowness, I decided to ditch the RegExp approach and came up with below code. Based on my evaluation, the below function is about 25 times faster (I timed it using timer function over 1000 iterations of each code).

Function CleanString(ByVal InputString As String, Optional SplWords As String = "USP|BP|EP|IP|JP", _
                Optional Delim As String = "|", Optional WordSeparator As String = " ", _
                Optional SplChar As String = "~|`|!|@|#|$|%|^|&|*|-|+|=|'|<|>|,|.|/|\|?|:|;") As String
Dim TestStr As String
Dim ArrSplChar() As String
Dim Char As Variant
Dim TestWords() As String
Dim Wrd As Variant
Dim Counter As Integer

TestStr = InputString
ArrSplChar = Split(SplChar, Delim, -1, vbTextCompare)

For Each Char In ArrSplChar
    TestStr = Replace(TestStr, Char, WordSeparator & Char & WordSeparator, 1, -1, vbTextCompare)
Next Char

TestWords = Split(TestStr, WordSeparator, -1, vbTextCompare)

For Each Wrd In TestWords
    Counter = IIf(Wrd = "", Counter + 1, Counter)
    If InStr(1, LCase(SplWords), LCase(Wrd), vbTextCompare) = 0 Then
        CleanString = CleanString & " " & Wrd
        Counter = Counter + 1
    End If
Next Wrd
CleanString = IIf(Counter - 1 = UBound(TestWords) - LBound(TestWords), _
                        Application.WorksheetFunction.Trim(InputString), _
                        Application.WorksheetFunction.Trim(CleanString))
End Function

This function looks a little messier than the regExp based function, but it is faster than the regExp based function.

Both the above functions generate the same output and can be called as follows:

Sub TestSub()
Debug.Print CleanString("Paracetamol USP")
End Sub

This will print "Paracetamol" in the immediate window.

Desiraedesire answered 8/1, 2022 at 0:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.