I am looking to implement a VBA trie-building algorithm that is able to process a substantial English lexicon (~50,000 words) in a relatively short amount of time (less than 15-20 seconds). Since I am a C++ programmer by practice (and this is my first time doing any substantial VBA work), I built a quick proof-of-concept program that was able to complete the task on my computer in about half a second. When it came time to test the VBA port however, it took almost two minutes to do the same -- an unacceptably long amount of time for my purposes. The VBA code is below:
Node Class Module:
Public letter As String
Public next_nodes As New Collection
Public is_word As Boolean
Main Module:
Dim tree As Node
Sub build_trie()
Set tree = New Node
Dim file, a, b, c As Integer
Dim current As Node
Dim wordlist As Collection
Set wordlist = New Collection
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
Do While Not EOF(file)
Dim line As String
Line Input #file, line
wordlist.add line
Loop
For a = 1 To wordlist.Count
Set current = tree
For b = 1 To Len(wordlist.Item(a))
Dim match As Boolean
match = False
Dim char As String
char = Mid(wordlist.Item(a), b, 1)
For c = 1 To current.next_nodes.Count
If char = current.next_nodes.Item(c).letter Then
Set current = current.next_nodes.Item(c)
match = True
Exit For
End If
Next c
If Not match Then
Dim new_node As Node
Set new_node = New Node
new_node.letter = char
current.next_nodes.add new_node
Set current = new_node
End If
Next b
current.is_word = True
Next a
End Sub
My question then is simply, can this algorithm be sped up? I saw from some sources that VBA Collection
s are not as efficient as Dictionary
s and so I attempted a Dictionary
-based implementation instead but it took an equal amount of time to complete with even worse memory usage (500+ MB of RAM used by Excel on my computer). As I say I am extremely new to VBA so my knowledge of both its syntax as well as its overall features/limitations is very limited -- which is why I don't believe that this algorithm is as efficient as it could possibly be; any tips/suggestions would be greatly appreciated.
Thanks in advance
NB: The lexicon file referred to by the code, "corncob_caps.txt", is available here (download the "all CAPS" file)
For...Each
and not by index. Collections have the performance characteristics of lists: #4828463. I'm surprised that movingDim
s around had any impact. – Live