How to check when a VBA module was modified?
Asked Answered
R

3

6

I have written a version control module. The AutoExec macro launches it whenever I, or one of the other maintainers log in. It looks for database objects that have been created or modified since the previous update, and then adds an entry to the Versions table and then opens the table (filtered to the last record) so I can type in a summary of the changes I performed.

It is working great for Tables, Queries, Forms, Macros, etc but I cannot get it to work correctly for modules.

I have found two different properties that suggest a Last Modified date ...

CurrentDB.Containers("Modules").Documents("MyModule").Properties("LastUpdated").Value
CurrentProject.AllModules("MyModule").DateModified

The first one (CurrentDB) always shows "LastUpdated" as the Date it was created, unless you modify the description of the module or something in the interface. This tells me that this property is purely for the container object - not what's in it.

The second one works a lot better. It accurately shows the date when I modify and compile/save the module. The only problem is that when you save or compile a module, it saves / compiles ALL the modules again, and therefore sets the DateModified field to the same date across the board. It kind of defeats the purpose of having the DateModified property on the individual modules doesn't it?

So my next course of action is going to a bit more drastic. I am thinking I will need to maintain a list of all the modules, and count the lines of code in each module using VBA Extensions. Then, if the lines of code differs from what the list has recorded - then I know that the module has been modified - I just won't know when, other than "since the last time I checked"

Does anyone have a better approach? I'd rather not do my next course of action because I can see it noticeably affecting database performance (in the bad kind of way)

Reticular answered 22/11, 2016 at 19:6 Comment(5)
It is simple enough to export the modules as text files, then you can compare the full text. I would not recommend just checking for line count since someone could change code within a line and it would not get detected. Years ago I loved 'WinDiff.exe' that would show differences in two files (I see it can still be downloaded). I built a 'data dictionary' database that had all objects, controls, properties and values, then could compare old versus new to show changes. Also, if you used SourceSafe (or the modern version), that can track changes.Magnetograph
Shameless plug, but the Rubberduck add-in features source control.Derte
@Derte sure, but it doesn't handle any Access-specific components yet.Columbarium
This is very interesting as even DoCmd.Save acModule, "ModuleName" saves all modules opened or unopened. I am thinking there's some default admin setting.Nicolasanicolau
Wayne - I messed around with SaveAsText / LoadFromText in order to programmatically modify data macros. Its one of those things that seem simple enough to say, but in practice have too many gotchas. (encoding differences comes readily to mind). But still, the checksum approach is going to be the way to go. I too loved WinDiff.Reticular
B
4

Here's a simpler suggestion:

  1. Calculate the MD5 hash for each module.
  2. Store it in the Versions table.
  3. Recalculate it for each module during the AutoExec and compare it to the one in the Versions table. If it's different, you can assume it has been changed (while MD5 is bad for security, it's still solid for integrity).

To get the text from a module using VBE Extensibility, you can do

Dim oMod As CodeModule
Dim strMod As String
Set oMod = VBE.ActiveVBProject.VBComponents(1).CodeModule
strMod = oMod.Lines(1, oMod.CountOfLines)

And then you can use the following modified MD5 hash function from this answer as below, you can take the hash of each module to store it, then compare it in your AutoExec.

Public Function StringToMD5Hex(s As String) As String
    Dim enc
    Dim bytes() As Byte
    Dim outstr As String
    Dim pos As Integer
    Set enc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
    'Convert the string to a byte array and hash it
    bytes = StrConv(s, vbFromUnicode)
    bytes = enc.ComputeHash_2((bytes))
    'Convert the byte array to a hex string
    For pos = 0 To UBound(bytes)
        outstr = outstr & LCase(Right("0" & Hex(bytes(pos)), 2))
    Next
    StringToMD5Hex = outstr
    Set enc = Nothing
End Function
Baresark answered 23/11, 2016 at 18:21 Comment(6)
Simpler than what? That's quite literally the exact same answer I gave, except you're storing the data in a table and you're not mentioning the caveats of invalidating the stale records when a module is renamed or removed. Kudos for researching a way to late-bind to the .net hashing function, hadn't realized you could do that. +1 for that.Columbarium
@Mat'sMug lol! Sorry :P I meant simpler than the OP's paragraph "So my next course of action is going to a bit more drastic. I am thinking I will need to maintain a list of all the modules, and count the lines of code in each module using VBA Extensions. Then, if the lines of code differs from what the list has recorded - then I know that the module has been modified - I just won't know when, other than "since the last time I checked""Baresark
I too liked how you late binded to the .NET function - have been wondering for a way to do that. Going to look into that. I had actually thought of using MD5 or my own checksum algorithm (its not rocket science) but figured it would be more trouble than it is worth. As for where to store the MD5 hashes, I was thinking about storing it as a property on each module. Haven't tried it thought - but it would avoid using a table and solve the problem of renames.Reticular
@Reticular If you figure out how to store a module's content hash as a property of that module, please ping me - I need this!Columbarium
CurrentProject.AllModules("MyModule").Properties.Add "PropertyName", "PropertyValue"Reticular
This will be cleared if you export the module to another access database - but in what we're trying to do here - that's good ... we would want it to be resetReticular
C
4

You can't know when a module was modified. The VBIDE API doesn't even tell you whether a module was modified, so you have to figure that out yourself.


The VBIDE API makes it excruciatingly painful - as you've noticed.

Rubberduck doesn't deal with host-specific components yet (e.g. tables, queries, etc.), but its parser does a pretty good job at telling whether a module was modified since the last parse.

"Modified since last time I checked" is really all you need to know. You can't rely on line counts though, because this:

Option Explicit

Sub DoSomething
    'todo: implement
End Sub

Would be the same as this:

Option Explicit

Sub DoSomething
    DoSomethingElse 42
End Sub

And obviously you'd want that change to be picked up and tracked. Comparing every character on every single line of code would work, but there's a much faster way.

The general idea is to grab a CodeModule's contents, hash it, and then compare against the previous content hash - if anything was modified, we're looking at a "dirty" module. It's C#, and I don't know if there's a COM library that can readily hash a string from VBA, but worst-case you could compile a little utility DLL in .NET that exposes a COM-visible function that takes a String and returns a hash for it, shouldn't be too complicated.

Here's the relevant code from Rubberduck.VBEditor.SafeComWrappers.VBA.CodeModule, if it's any help:

private string _previousContentHash;
public string ContentHash()
{
    using (var hash = new SHA256Managed())
    using (var stream = Content().ToStream())
    {
        return _previousContentHash = new string(Encoding.Unicode.GetChars(hash.ComputeHash(stream)));
    }
}

public string Content()
{
    return Target.CountOfLines == 0 ? string.Empty : GetLines(1, CountOfLines);
}

public string GetLines(Selection selection)
{
    return GetLines(selection.StartLine, selection.LineCount);
}

public string GetLines(int startLine, int count)
{
    return Target.get_Lines(startLine, count);
}

Here Target is a Microsoft.Vbe.Interop.CodeModule object - if you're in VBA land then that's simply a CodeModule, from the VBA Extensibility library; something like this:

Public Function IsModified(ByVal target As CodeModule, ByVal previousHash As String) As Boolean

    Dim content As String
    If target.CountOfLines = 0 Then
        content = vbNullString
    Else
        content = target.GetLines(1, target.CountOfLines)
    End If

    Dim hash As String
    hash = MyHashingLibrary.MyHashingFunction(content)

    IsModified = (hash <> previousHash)

End Function

So yeah, your "drastic" solution is pretty much the only reliable way to go about it. Few things to keep in mind:

  • "Keeping a list of all modules" will work, but if you only store module names, and a module was renamed, your cache is stale and you need a way to invalidate it.
  • If you store the ObjPtr of each module object rather than their names, I'm not sure if it's reliable in VBA, but I can tell you that through COM interop, a COM object's hashcode isn't going to be consistently consistent between calls - so you'll have a stale cache and a way to invalidate it, that way too. Possibly not an issue with a 100% VBA solution though.

I'd go with a Dictionary that stores the modules' object pointer as a key, and their content hash as a value.


That said as the administrator of the Rubberduck project, I'd much rather see you join us and help us integrate full-featured source control (i.e. with host-specific features) directly into the VBE =)

Rubberduck's Source Control panel

Columbarium answered 22/11, 2016 at 19:56 Comment(3)
Mat, thanks for taking the time to put so much details into your answer. I'm not a huge fan of using third party APIs. They become more hassle than they are worth in my experience "We had to reinstall access and the database isn't working". Ideally I'd like to keep this completely within the core references, but if I need to use VBE I can bend to that since its pretty standard. Easy enough to do a late binding and avoid reference issues.Reticular
@Reticular whether you late-bind or early-bind a 3rd-party library makes no difference, if the library is going to fail it's going to fail anyway. If the library isn't present on the machine it's running, you'll only know at runtime with late-binding, vs. compile-time with early-binding. Your fear of early-binding libraries seems unjustified, especially when you're talking about the VBIDE API, standard on any computer that runs VBA (or even the scripting runtime - these libraries are standard on any Windows box).Columbarium
Unless its changed in later Access versions: Early Binding ties you in to a specific version of DLL. Late binding simply ties you in with any dll that has the object you are looking for. So consider the Microsoft Excel library as an example. If I early bind with version 15 - and the machine is older and only has version 14 .. there will be a missing references issue. If however, I late bind - it will see I am looking for the "Application.Excel" object and resolve it to whichever library is present.Reticular
B
4

Here's a simpler suggestion:

  1. Calculate the MD5 hash for each module.
  2. Store it in the Versions table.
  3. Recalculate it for each module during the AutoExec and compare it to the one in the Versions table. If it's different, you can assume it has been changed (while MD5 is bad for security, it's still solid for integrity).

To get the text from a module using VBE Extensibility, you can do

Dim oMod As CodeModule
Dim strMod As String
Set oMod = VBE.ActiveVBProject.VBComponents(1).CodeModule
strMod = oMod.Lines(1, oMod.CountOfLines)

And then you can use the following modified MD5 hash function from this answer as below, you can take the hash of each module to store it, then compare it in your AutoExec.

Public Function StringToMD5Hex(s As String) As String
    Dim enc
    Dim bytes() As Byte
    Dim outstr As String
    Dim pos As Integer
    Set enc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
    'Convert the string to a byte array and hash it
    bytes = StrConv(s, vbFromUnicode)
    bytes = enc.ComputeHash_2((bytes))
    'Convert the byte array to a hex string
    For pos = 0 To UBound(bytes)
        outstr = outstr & LCase(Right("0" & Hex(bytes(pos)), 2))
    Next
    StringToMD5Hex = outstr
    Set enc = Nothing
End Function
Baresark answered 23/11, 2016 at 18:21 Comment(6)
Simpler than what? That's quite literally the exact same answer I gave, except you're storing the data in a table and you're not mentioning the caveats of invalidating the stale records when a module is renamed or removed. Kudos for researching a way to late-bind to the .net hashing function, hadn't realized you could do that. +1 for that.Columbarium
@Mat'sMug lol! Sorry :P I meant simpler than the OP's paragraph "So my next course of action is going to a bit more drastic. I am thinking I will need to maintain a list of all the modules, and count the lines of code in each module using VBA Extensions. Then, if the lines of code differs from what the list has recorded - then I know that the module has been modified - I just won't know when, other than "since the last time I checked""Baresark
I too liked how you late binded to the .NET function - have been wondering for a way to do that. Going to look into that. I had actually thought of using MD5 or my own checksum algorithm (its not rocket science) but figured it would be more trouble than it is worth. As for where to store the MD5 hashes, I was thinking about storing it as a property on each module. Haven't tried it thought - but it would avoid using a table and solve the problem of renames.Reticular
@Reticular If you figure out how to store a module's content hash as a property of that module, please ping me - I need this!Columbarium
CurrentProject.AllModules("MyModule").Properties.Add "PropertyName", "PropertyValue"Reticular
This will be cleared if you export the module to another access database - but in what we're trying to do here - that's good ... we would want it to be resetReticular
R
2

I thought I would add the final code I came up with for a hash / checksum generation module, since that was really the piece I was missing. Credit to the @BlackHawk answer for filling in the gap by showing that you can late bind .NET classes - that's going to open up a lot of possibilities for me now.

I have finished writing my Version checker. There were a few caveats that I encountered that made it hard to rely on the LastUpdated date.

  1. Resizing the columns in a Table or Query changed the LastUpdated date.
  2. Compiling any Module compiled all modules, thus updated all module's LastUpdated date (as was already pointed out)
  3. Adding a filter to a form in View mode causes the form's Filter field to be updated,which in turn updates the LastUpdated date.
  4. When using SaveAsText on a Form or Report, changing a printer or display driver can affect the PrtDevMode encodings, so it is necessary to strip them out before calculating a checksum

For Tables I built a string that was a concatenation of the table name, all field names with their size and data types. I then computed the hash on those.

For Queries I simply computed the hash on the SQL.

For Modules, Macros, Forms, and Reports I used the Application.SaveAsText to save it to a temporary file. I then read that file in to a string and computed a hash on it. For Forms and Reports I didn't start adding to the string until the "Begin" line passed.

Seems to be working now and I haven't come across any situations where it would prompt for a version revision when something wasn't actually changed.

For calculating a checksum or hash, I built a Class Module named CryptoHash. Here is the full source below. I optimized the Bytes Array to Hex String conversion to be quicker.

Option Compare Database
Option Explicit

Private objProvider As Object          ' Late Bound object variable for MD5 Provider
Private objEncoder As Object           ' Late Bound object variable for Text Encoder
Private strArrHex(255) As String       ' Hexadecimal lookup table array

Public Enum hashServiceProviders
  MD5
  SHA1
  SHA256
  SHA384
  SHA512
End Enum

Private Sub Class_Initialize()
  Const C_HEX = "0123456789ABCDEF"
  Dim intIdx As Integer               ' Our Array Index Iteration variable

  ' Instantiate our two .NET class objects
  Set objEncoder = CreateObject("System.Text.UTF8Encoding")
  Set objProvider = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

  ' Initialize our Lookup Table (array)
  For intIdx = 0 To 255
    ' A byte is represented within two hexadecimal digits.
    ' When divided by 16, the whole number is the first hex character
    '                     the remainder is the second hex character
    ' Populate our Lookup table (array)
    strArrHex(intIdx) = Mid(C_HEX, (intIdx \ 16) + 1, 1) & Mid(C_HEX, (intIdx Mod 16) + 1, 1)
  Next

End Sub

Private Sub Class_Terminate()
  ' Explicity remove the references to our objects so Access can free memory
  Set objProvider = Nothing
  Set objEncoder = Nothing
End Sub

Public Property Let Provider(NewProvider As hashServiceProviders)

  ' Switch our Cryptographic hash provider
  Select Case NewProvider
    Case MD5:
      Set objProvider = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
    Case SHA1:
      Set objProvider = CreateObject("System.Security.Cryptography.SHA1CryptoServiceProvider")
    Case SHA256:
      Set objProvider = CreateObject("System.Security.Cryptography.SHA256Managed")
    Case SHA384:
      Set objProvider = CreateObject("System.Security.Cryptography.SHA384Managed")
    Case SHA512:
      Set objProvider = CreateObject("System.Security.Cryptography.SHA512Managed")
    Case Else:
      Err.Raise vbObjectError + 2029, "CryptoHash::Provider", "Invalid Provider Specified"
  End Select

End Property

' Converts an array of bytes into a hexadecimal string
Private Function Hash_BytesToHex(bytArr() As Byte) As String
  Dim lngArrayUBound As Long         ' The Upper Bound limit of our byte array
  Dim intIdx As Long                 ' Our Array Index Iteration variable

  ' Not sure if VBA re-evaluates the loop terminator with every iteration or not
  ' When speed matters, I usually put it in its own variable just to be safe
  lngArrayUBound = UBound(bytArr)

  ' For each element in our byte array, add a character to the return value
  For intIdx = 0 To lngArrayUBound
    Hash_BytesToHex = Hash_BytesToHex & strArrHex(bytArr(intIdx))
  Next
End Function

' Computes a Hash on the supplied string
Public Function Compute(SourceString As String) As String
  Dim BytArrData() As Byte           ' Byte Array produced from our SourceString
  Dim BytArrHash() As Byte           ' Byte Array returned from our MD5 Provider

  ' Note:
  ' Because some languages (including VBA) do not support method overloading,
  ' the COM system uses "name mangling" in order to allow the proper method
  ' to be called.  This name mangling appends a number at the end of the function.
  ' You can check the MSDN documentation to see how many overloaded variations exist

  ' Convert our Source String into an array of bytes.
  BytArrData = objEncoder.GetBytes_4(SourceString)

  ' Compute the MD5 hash and store in an array of bytes
  BytArrHash = objProvider.ComputeHash_2(BytArrData)

  ' Convert our Bytes into a hexadecimal representation
  Compute = Hash_BytesToHex(BytArrHash)

  ' Free up our dynamic array memory
  Erase BytArrData
  Erase BytArrHash

End Function
Reticular answered 24/11, 2016 at 19:12 Comment(5)
I don't know if this is appropriate to ask on the Code Review website, but I'm wondering if you would be willing to share the full source code for your solution as you described it here. If not in this answer, then perhaps as an external link. I am very interested in implementing source control using git for Access VBA projects and this would be a huge help. We don't have admin rights to try RubberDuck unfortunately so I'm left trying to build my own solution.Ralaigh
I don't mind sharing. I didn't share the full code for the versioning system because it was unrelated to the question. I've thrown it up on my Google Drive. The code is fully commented so it should be self explanatory. drive.google.com/file/d/0B_uoYoBD3vqMMVpuM0tyWEFmMEk If you end up using it, just give credit where credit is due. Thx for upvotes.Reticular
That is awesome! Thank you. Works like a charm. The code is well commented and the AutoExec macro is self-explanatory. Very cool! I am curious why you put the versioning check in the AutoExec when the user opens the database, instead of when the database is saved or about to close? I'm sure there is a valid reason, I'm just so used to doing code check-ins after I've made a bunch of changes, and before exiting, instead of when I first get into the code.Ralaigh
If you can give me a way to hook in reliably to Database closures - I can do it that way. I used Autoexec because usually the last thing I do is a compact / repair, which reloads the database. The Versioning code takes the latest LastUpdated / DateCreated value from all database objects where a change was detected, and uses that timestamp when adding the new version.Reticular
Ah, got it! Makes sense. I've never had to deal with a database close event in Access. From a quick Google search, looks like you're right. There's no such thing. My version control experience is mostly outside of Access. I'll make a habit of compacting/repairing before closing when developing in Access. Thanks again!Ralaigh

© 2022 - 2024 — McMap. All rights reserved.