Set data structure in VBA
Asked Answered
C

4

12

I'm looking for a set data structure to use in Excel VBA. What I found so far is Scripting.Dictionary which seems to be a map.

Is there also something like a set in VBA?

Basically I'm looking for a data structure that is efficient for finding out if a particular value has already been added.

Consistent answered 18/1, 2017 at 13:8 Comment(17)
Maybe you mean CollectionMorbihan
an array maybe? Or a collection? Or a class?Marvel
I try to find duplicates. As far as I understand arrays and collections or not indexed, so I'd have to traverse the whole thing to find out if a certain value is alread present.Consistent
Per definitonem a set cannot contain duplicates, right?Morbihan
Exactly and normally it has an efficient way of ensuring this and a method to check if a value is already present. This is what I want to use to find duplicate rows in a sheet.Consistent
Maybe you look here excelfunctions.net/find-duplicate-rows.htmlMorbihan
Scripting.Dictionary keys are unique, Exists method allows to check if a key is already in the dictionary.Fourierism
@Fourierism Thats what I'm trying to do now, thank you. As I only need the key and not the value, I thought the Dictionary might be overkill.Consistent
@Morbihan Thank you, for the idea. I try do it programatically, so this doesn't work in my case.Consistent
@Consistent Do you want to highlight the duplicated cells in a column, or to retrieve a list of distinct (or a list of duplicated) values from a column?Fourierism
@Fourierism I concatenate the cells of each row and want to find the ones that are the same. So far it's a programming problem, I don't know about presentation yet.Consistent
@Consistent Dictionary is actually pretty fast, and it's a hashtable which is the right implementation for a set. I personally would just write a wrapper for Dictionary that exposes the methods you need. I'll add an answer for you.Discomposure
@Consistent I wrote a HashSet class from scratch, but it's probably overkill for your purposes. It uses a dynamic array which grows as items are added, with a default load factor of 75% to minimize collisions. Collisions are handled with linked lists. If you're interested let me know - in many cases it's faster than Scripting.Dictionary because it uses ASCII instead of Unicode - I use djb2 for string hashing.Discomposure
Using a Dictionary or Collection for unique values in VBA; and using SQL against Excel.Desalinate
Answering the question - take a look at .NET ArrayList, you can instantiate the object within VBS and VBA environment Set ArrayList = CreateObject("System.Collections.ArrayList"), it has such methods as Add, Contains and Sort.Fourierism
@ZevSpitz Thank you, very informative! Especially the first one.Consistent
@Fourierism Thank you for the suggestion. The problem is, there will be very many rows. ArrayList#contains is not efficient, from the doc: This method performs a linear search; therefore, this method is an O(n) operation, where n is Count. So if I use it to find all the duplicated rows I'll have O(n^2) as opposed O(n) when using a HashSet. Also, I guess this is only possible when .NET is installed.Consistent
F
4

Take a look at .NET ArrayList, it has such methods as Add, Contains, Sort etc. You can instantiate the object within VBS and VBA environment:

Set ArrayList = CreateObject("System.Collections.ArrayList")

Scripting.Dictionary also may fit the needs, it has unique keys, Exists method allows to check if a key is already in the dictionary.

However, SQL request via ADODB probably will be more efficient for that case. The below examples shows how to retrieve unique rows via SQL query to the worksheet:

Option Explicit

Sub GetDistinctRecords()

    Dim strConnection As String
    Dim strQuery As String
    Dim objConnection As Object
    Dim objRecordSet As Object

    Select Case LCase(Mid(ThisWorkbook.Name, InStrRev(ThisWorkbook.Name, ".")))
        Case ".xls"
            strConnection = "Provider=Microsoft.Jet.OLEDB.4.0;User ID=Admin;Data Source='" & ThisWorkbook.FullName & "';Mode=Read;Extended Properties=""Excel 8.0;HDR=YES;"";"
        Case ".xlsm", ".xlsb"
            strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source='" & ThisWorkbook.FullName & "';Mode=Read;Extended Properties=""Excel 12.0 Macro;HDR=YES;"";"
    End Select

    strQuery = "SELECT DISTINCT * FROM [Sheet1$]"
    Set objConnection = CreateObject("ADODB.Connection")
    objConnection.Open strConnection
    Set objRecordSet = objConnection.Execute(strQuery)
    RecordSetToWorksheet Sheets(2), objRecordSet
    objConnection.Close

End Sub

Sub RecordSetToWorksheet(objSheet As Worksheet, objRecordSet As Object)

    Dim i As Long

    With objSheet
        .Cells.Delete
        For i = 1 To objRecordSet.Fields.Count
            .Cells(1, i).Value = objRecordSet.Fields(i - 1).Name
        Next
        .Cells(2, 1).CopyFromRecordset objRecordSet
        .Cells.Columns.AutoFit
    End With

End Sub

Source data should be placed on the Sheet1, the result is output to the Sheet2. The only limitation for that method is that ADODB connects to the Excel workbook on the drive, so any changes should be saved before query to get actual results.

If you want to get only the set of non-distinct rows, then the query should be as follows (just an example, you have to put your set of fields into query):

    strQuery = "SELECT CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country FROM [Sheet1$] GROUP BY CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country HAVING Count(*) > 1"
Fourierism answered 18/1, 2017 at 15:4 Comment(3)
Wow, I didn't know it's possible to run SQL queries within Excel, nice! For my use case, it should work without saving too, as I use it for validation. Is this somehow possible with this method? I think I can not accept this answer as I'm asking for a set data structure. It would be missleading for people looking for a data structure if this was the accepted answer.Consistent
@Consistent could you please clarify what kind of validation do you need? How do valid and not-valid rows should be displayed?Fourierism
Your answer seems to do what I was trying to do except for needing to save first. So far it's only about validating, how the rows will be displayed is not decided yet. So far I resorted to using Scripting.Dictionary, it's not a Set, but it gets the job done.Consistent
P
3

You could use a collection and do the following function, collections enforce unique key identifiers:

Public Function InCollection(Col As Collection, key As String) As Boolean
  Dim var As Variant
  Dim errNumber As Long

  InCollection = False
  Set var = Nothing

  Err.clear
  On Error Resume Next
    var = Col.Item(key)
    errNumber = CLng(Err.Number)
  On Error GoTo 0

  '5 is not in, 0 and 438 represent incollection
  If errNumber = 5 Then ' it is 5 if not in collection
    InCollection = False
  Else
    InCollection = True
  End If

End Function
Panier answered 18/1, 2017 at 14:27 Comment(2)
Thank you for the detailed answer! What is the difference/advantage to using Scripting.Dictionary. Using a dictionary seems more intuitive as it has the method Exists. I was looking for another datastructure because both collection and dictionary seem to be key-value structures, I only need the key.Consistent
This is a good source of information for collections vs dictionary (experts-exchange.com/articles/3391/…). In reality both work fine for this purpose so pick whichever is more comfortable.Panier
D
3

Simply write a wrapper for Scripting.Dictionary that exposes only set-like operations.

clsSet

Option Explicit

Private d As Scripting.Dictionary

Private Sub Class_Initialize()
    Set d = New Scripting.Dictionary
End Sub

Public Sub Add(var As Variant)
    d.Add var, 0
End Sub

Public Function Exists(var As Variant) As Boolean
    Exists = d.Exists(var)
End Function

Public Sub Remove(var As Variant)
    d.Remove var
End Sub

And then you can use it like so:

mdlMain

Public Sub Main()
    Dim s As clsSet
    Set s = New clsSet

    Dim obj As Object

    s.Add "A"
    s.Add 3
    s.Add #1/19/2017#

    Debug.Print s.Exists("A")
    Debug.Print s.Exists("B")
    s.Remove #1/19/2017#
    Debug.Print s.Exists(#1/19/2017#)
End Sub

Which prints True, False and False as expected.

Discomposure answered 19/1, 2017 at 17:56 Comment(0)
C
0

The Collection is indexed if you provide a Key along with the Item when you add it. Note that while the item can be any kind of object, the key must be a string.

Private Sub testset()
    ' set up an array to hold source values
    Dim values()
    values = Array("Item1", "Item2", "Item1")
    ' declare a collection
    Dim col As New Collection
    ' loop through the array and add items with the key string
    For Each item In values
        On Error Resume Next
        ' ignore the error raised if the key is already present
        col.Add item:=item, Key:=item
        On Error GoTo 0
    Next
    
    ' loop through the set
    For Each item In col
        Debug.Print item
    Next
End Sub

Output:

Item1
Item2
Canister answered 16/1, 2023 at 3:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.