Visit my Youtube Channel
Excel VBA: Compare Data [Before & After Versions]
Home
Note: This webpage is designed to accompany the Youtube video posted above. The video offers detailed explanation on all topics; while this webpage will serve as a repository for the code presented in the video.
Macro file can be downloaded from Github here.
In this Project, we will cover a useful way to compare two versions of data from the same source i.e. a before and after version; using Excel VBA. This would be applicable for conducting audits of master data sets such as Supplier Master, Customer Master, Price Data, Inventory SOH etc. It could also, be incorporated into data governance models or error reporting.
In our project example, we have two data sets representing Item Master data which lists attributes for each product code. One is a Previous version of the data while the other is the Current version. Our task is to identify what’s changed -> new products added, products deleted and any changes to attributes in existing products. The results will be listed in the New, Deleted and Change worksheets.
Product codes highlighted in Green are New, since they don’t exist in the previous data. Product codes highlighted in Red are Deleted, since they don’t exist in the current data. The fields highlighted in Yellow have changed values.
Any lookup logic involves comparison on a certain column or group of columns that are unique. In our case, this column is the product code which we will call Item Ids in the macro code.
We will look at different techniques to compare data starting with long-form way of using Nested For Loops. Then, we will perform the same task using Range.Find(). Later, we will replace Find with dictionaries, which is my preferred way. Lastly, we will modify the code to make it work on any dataset. All you’ll need to do is specify which column to lookup against.
Common Update Sheets macro
This macro will be used or called in every main sub procedure that we create below. It clears contents in the Change New and Deleted worksheets and adds in the Column headers.
Sub UpdateSheets()
wsChng.Cells.ClearContents
wsNew.Cells.ClearContents
wsDel.Cells.ClearContents
wsChng.Range("A1").Value = "Item"
wsChng.Range("B1").Value = "Changed Field"
wsChng.Range("C1").Value = "Previous Value"
wsChng.Range("D1").Value = "Current Value"
wsNew.Range("A1").Value = "Item"
wsDel.Range("A1").Value = "Item"
End Sub
Section One: Nested For Loop
This is the brute force technique where we will preform an outer loop on the Current data set, grab the product code (item id) and for each product code, loop over the Previous data on the product codes to find a match, and for each match loop across the columns to check whether a value has changed. Changes are logged in the first available row in the Change worksheet. If no match is found, then its a new product code and will be logged in the New worksheet.
Then, we will perform another loop over Previous data and check for the product code in the Current worksheet. If no match is found, then that product code has been deleted and would need to be logged in the Deleted worksheet.
Sub CompareData01()
Call UpdateSheets
Dim lrowPrev As Long, lrowCur As Long
Dim lcolCur As Long
lrowPrev = wsPrev.Range("A1").CurrentRegion.Rows.Count
lrowCur = wsCur.Range("A1").CurrentRegion.Rows.Count
lcolCur = wsCur.Range("A1").CurrentRegion.Columns.Count
Dim i As Long, j As Long, x As Long
Dim itemIdCur As Long, itemIdPrev As Long
Dim lrowNew As Long, lrowChng As Long
For i = 2 To lrowCur
itemIdCur = wsCur.Cells(i, 1).Value
For j = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(j, 1).Value
If itemIdCur = itemIdPrev Then
For x = 2 To lcolCur
If wsCur.Cells(i, x).Value <> wsPrev.Cells(j, x).Value Then
lrowChng = wsChng.Range("A1").CurrentRegion.Rows.Count + 1
wsChng.Cells(lrowChng, 1).Value = itemIdCur
wsChng.Cells(lrowChng, 2).Value = wsPrev.Cells(1, x).Value
wsChng.Cells(lrowChng, 3).Value = wsPrev.Cells(j, x).Value
wsChng.Cells(lrowChng, 4).Value = wsCur.Cells(i, x).Value
End If
Next x
GoTo LineNext
End If
Next j
' No Match was found
lrowNew = wsNew.Range("A1").CurrentRegion.Rows.Count + 1
wsNew.Cells(lrowNew, 1).Value = itemIdCur
LineNext:
Next i
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Find Deleted Lines
Dim lrowDel As Long
For i = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(i, 1).Value
For j = 2 To lrowCur
itemIdCur = wsCur.Cells(j, 1).Value
If itemIdCur = itemIdPrev Then
GoTo LineNext2
End If
Next j
' No Match was found
lrowDel = wsDel.Range("A1").CurrentRegion.Rows.Count + 1
wsDel.Cells(lrowDel, 1).Value = itemIdPrev
LineNext2:
Next i
End Sub
Section Two: Range.Find
This is much faster than the previous code. Here, we will only loop over the Current Data, grab the product code and search (Range.Find) each product code within the Previous Data. If nothing is found, log it in the New worksheet. For matches found, loop across the columns to check for changes and log any change in the Change worksheet.
Then, do the opposite and search each product code from the Previous data within the Current data column of product codes. If nothing is found, log it in the Deleted worksheet.
Sub CompareData02()
Call UpdateSheets
Dim lrowPrev As Long, lrowCur As Long
Dim lcolCur As Long
lrowPrev = wsPrev.Range("A1").CurrentRegion.Rows.Count
lrowCur = wsCur.Range("A1").CurrentRegion.Rows.Count
lcolCur = wsCur.Range("A1").CurrentRegion.Columns.Count
Dim i As Long, j As Long, x As Long
Dim itemIdCur As Long, itemIdPrev As Long
Dim lrowNew As Long, lrowChng As Long
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Declare Range for Find
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim rngPrev As Range
Set rngPrev = wsPrev.Range("A1").CurrentRegion.Columns(1)
Dim rngFound As Range
Dim itemRow As Long
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
For i = 2 To lrowCur
itemIdCur = wsCur.Cells(i, 1).Value
'Find New Items
Set rngFound = rngPrev.Find( _
what:=itemIdCur, _
lookat:=xlWhole)
If rngFound Is Nothing Then
lrowNew = wsNew.Range("A1").CurrentRegion.Rows.Count + 1
wsNew.Cells(lrowNew, 1).Value = itemIdCur
GoTo LineNext
End If
'Assign Item Row
itemRow = rngFound.Row
Set rngFound = Nothing
For x = 2 To lcolCur
If wsCur.Cells(i, x).Value <> wsPrev.Cells(itemRow, x).Value Then
lrowChng = wsChng.Range("A1").CurrentRegion.Rows.Count + 1
wsChng.Cells(lrowChng, 1).Value = itemIdCur
wsChng.Cells(lrowChng, 2).Value = wsPrev.Cells(1, x).Value
wsChng.Cells(lrowChng, 3).Value = wsPrev.Cells(itemRow, x).Value
wsChng.Cells(lrowChng, 4).Value = wsCur.Cells(i, x).Value
End If
Next x
LineNext:
Next i
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Find Deleted Lines
Dim lrowDel As Long
Dim rngCur As Range
Set rngCur = wsCur.Range("A1").CurrentRegion
For i = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(i, 1).Value
'Find New Items
Set rngFound = rngCur.Find( _
what:=itemIdPrev, _
lookat:=xlWhole)
If rngFound Is Nothing Then
lrowDel = wsDel.Range("A1").CurrentRegion.Rows.Count + 1
wsDel.Cells(lrowDel, 1).Value = itemIdPrev
End If
Set rngFound = Nothing
Next i
End Sub
Section Three: Dictionary
This would be my preferred way of conducting this operation. Dictionaries would be more efficient than Range.Find as the size of the data set increases.
Loop over Previous data and add product codes as key and row number as value into a dictionary.
Then, loop over the Current Data, grab the product code and check whether each product code exists within the dictionary of previous codes. If the current code doesn’t exists in the dictionary, log it in the New worksheet. if it does, grab the row number from the dictionary and loop across the columns to check for changes and log any change in the Change worksheet.
Then, do the opposite and search each product code from the Previous data in a dictionary of current product codes. If it doesn’t exists, log it in the Deleted worksheet.
Sub CompareData03()
Call UpdateSheets
Dim lrowPrev As Long, lrowCur As Long
Dim lcolCur As Long
lrowPrev = wsPrev.Range("A1").CurrentRegion.Rows.Count
lrowCur = wsCur.Range("A1").CurrentRegion.Rows.Count
lcolCur = wsCur.Range("A1").CurrentRegion.Columns.Count
Dim i As Long, j As Long, x As Long
Dim itemIdCur As Long, itemIdPrev As Long
Dim lrowNew As Long, lrowChng As Long
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Put Prev Item Ids into a Dictionary
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim dicPrev As Object
Set dicPrev = CreateObject("Scripting.Dictionary")
Dim sKey As Long
Dim lValue As Long
For i = 2 To lrowPrev
sKey = wsPrev.Cells(i, 1).Value
lValue = i
dicPrev(sKey) = lValue
Next i
Dim itemRow As Long
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
For i = 2 To lrowCur
itemIdCur = wsCur.Cells(i, 1).Value
If Not dicPrev.exists(itemIdCur) Then
lrowNew = wsNew.Range("A1").CurrentRegion.Rows.Count + 1
wsNew.Cells(lrowNew, 1).Value = itemIdCur
GoTo LineNext
End If
'Assign Item Row
itemRow = dicPrev(itemIdCur)
For x = 2 To lcolCur
If wsCur.Cells(i, x).Value <> wsPrev.Cells(itemRow, x).Value Then
lrowChng = wsChng.Range("A1").CurrentRegion.Rows.Count + 1
wsChng.Cells(lrowChng, 1).Value = itemIdCur
wsChng.Cells(lrowChng, 2).Value = wsPrev.Cells(1, x).Value
wsChng.Cells(lrowChng, 3).Value = wsPrev.Cells(itemRow, x).Value
wsChng.Cells(lrowChng, 4).Value = wsCur.Cells(i, x).Value
End If
Next x
LineNext:
Next i
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Find Deleted Lines
Dim dicCur As Object
Set dicCur = CreateObject("Scripting.Dictionary")
For i = 2 To lrowCur
sKey = wsCur.Cells(i, 1).Value
lValue = i
dicCur(sKey) = lValue
Next i
Dim lrowDel As Long
For i = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(i, 1).Value
'Find New Items
If Not dicCur.exists(itemIdPrev) Then
lrowDel = wsDel.Range("A1").CurrentRegion.Rows.Count + 1
wsDel.Cells(lrowDel, 1).Value = itemIdPrev
End If
Next i
End Sub
Section Four: Dynamic Code
Here, we will make our code dynamic i.e. it will work on any dataset. Till now we have hard-coded the column 1 (with product codes) as the column number to perform the lookup logic on. Now, we will create a constant at the top of the macro, which represents the column number. The user can change this to a new lookup column number for a different dataset with no further changes to the code.
Public Const compareID As Long = 1
Sub CompareData04()
Call UpdateSheets
Dim lrowPrev As Long, lrowCur As Long
Dim lcolCur As Long
lrowPrev = wsPrev.Range("A1").CurrentRegion.Rows.Count
lrowCur = wsCur.Range("A1").CurrentRegion.Rows.Count
lcolCur = wsCur.Range("A1").CurrentRegion.Columns.Count
Dim i As Long, j As Long, x As Long
'Change to Variant
Dim itemIdCur As Variant, itemIdPrev As Variant
Dim lrowNew As Long, lrowChng As Long
For i = 2 To lrowCur
itemIdCur = wsCur.Cells(i, compareID).Value
For j = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(j, compareID).Value
If itemIdCur = itemIdPrev Then
'Change to 1
For x = 1 To lcolCur
If wsCur.Cells(i, x).Value <> wsPrev.Cells(j, x).Value Then
lrowChng = wsChng.Range("A1").CurrentRegion.Rows.Count + 1
wsChng.Cells(lrowChng, 1).Value = itemIdCur
wsChng.Cells(lrowChng, 2).Value = wsPrev.Cells(1, x).Value
wsChng.Cells(lrowChng, 3).Value = wsPrev.Cells(j, x).Value
wsChng.Cells(lrowChng, 4).Value = wsCur.Cells(i, x).Value
End If
Next x
GoTo LineNext
End If
Next j
' No Match was found
lrowNew = wsNew.Range("A1").CurrentRegion.Rows.Count + 1
wsNew.Cells(lrowNew, 1).Value = itemIdCur
LineNext:
Next i
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Find Deleted Lines
Dim lrowDel As Long
For i = 2 To lrowPrev
itemIdPrev = wsPrev.Cells(i, compareID).Value
For j = 2 To lrowCur
itemIdCur = wsCur.Cells(j, compareID).Value
If itemIdCur = itemIdPrev Then
GoTo LineNext2
End If
Next j
' No Match was found
lrowDel = wsDel.Range("A1").CurrentRegion.Rows.Count + 1
wsDel.Cells(lrowDel, 1).Value = itemIdPrev
LineNext2:
Next i
End Sub