Cleaning Duplicate Data
Duplicates are one of the most frustrating parts of data cleaning. The same customer gets registered multiple times, or the same company name is entered in slightly different ways. Finding and deleting them one by one by hand is too much work.
And it's not just exact duplicates. There are also "similar duplicates" that differ only in spacing or capitalization. Excel's built-in Remove Duplicates feature doesn't catch all of these: a stray leading or trailing space, for example, is enough to make two values count as different.
With AI, you can easily find and organize even these complex duplicates. From exact duplicates to similar and logical duplicates, you can detect them all. You can even apply similarity analysis with criteria like "if 80% or more similar, treat as duplicate."
In this post, we'll learn various ways to find and clean duplicate data with AI. From simple duplicate removal to advanced fuzzy matching, we'll cover techniques you can apply right away at work.
1. Understanding Duplicate Types
AI Prompt
"Tell me how to find duplicates for these situations:
1. Completely identical rows
2. Only specific columns duplicated
3. Similar but slightly different data (e.g., spaces, capitalization)
4. Logical duplicates (different emails for same person)"
AI Response
1. Exact duplicates: All column values identical
   → Use the Remove Duplicates feature
2. Partial duplicates: Check only key values
   → Count with COUNTIFS for specific columns
3. Similar duplicates: Differences in spaces/capitalization
   → Normalize with TRIM and UPPER functions, then compare
4. Logical duplicates: Based on business rules
   → Identify with compound conditions
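One way to read the four types above is that they all use the same procedure and differ only in which "key" is compared. The following is a minimal sketch of that idea, written in Python rather than the VBA used in the rest of this post so that the logic can be checked outside Excel. The function name `find_duplicate_groups`, the sample rows, and the choice of name plus company as the "business rule" are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def find_duplicate_groups(rows, key):
    """Group row indices by key(row) and return only groups with 2+ rows."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[key(row)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]

# Hypothetical sample rows: (name, email, company)
rows = [
    ("Kim", "kim@example.com", "Samsung"),
    ("Kim", "kim@example.com", "Samsung"),        # exact copy of row 0
    ("Lee", "kim@example.com", "LG"),             # same email only
    ("Park", "park@example.com", " SAMSUNG "),    # same company after normalizing
    ("Park", "park.work@example.com", "Samsung"), # same person, different email
]

# 1. Exact duplicates: the whole row is the key
exact = find_duplicate_groups(rows, key=lambda r: r)
# 2. Partial duplicates: only the email column is the key
by_email = find_duplicate_groups(rows, key=lambda r: r[1])
# 3. Similar duplicates: trim and uppercase before comparing
by_company = find_duplicate_groups(rows, key=lambda r: r[2].strip().upper())
# 4. Logical duplicates: an assumed business rule (same name + same company)
logical = find_duplicate_groups(rows, key=lambda r: (r[0], r[2].strip().upper()))

print(exact, by_email, by_company, logical)
```

The row numbers are zero-based list positions, not Excel row numbers. Changing the `key` function is all it takes to move from one duplicate type to another.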
2. Basic Duplicate Removal
Removing Duplicates with Excel's Basic Feature
AI Prompt:
"Create VBA code to remove duplicates based on email (Column C)
from the A1:E100 range.
Keep the first item and delete the rest."
AI Generated VBA:
Sub RemoveEmailDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim deleteCount As Long
    Set ws = ThisWorkbook.Sheets("CustomerData")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ' Confirm before action
    If MsgBox("Delete rows with duplicate emails?" & vbCrLf & _
              "(First item will be kept)", _
              vbYesNo + vbQuestion, "Confirm Duplicate Removal") = vbNo Then
        Exit Sub
    End If
    ' Stop screen updates (speed improvement)
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    ' Execute duplicate removal using Column C (email), the 3rd column of the range.
    ' (VBA does not allow a comment after a line-continuation underscore,
    ' so the comment is placed here instead.)
    ws.Range("A1:E" & lastRow).RemoveDuplicates _
        Columns:=3, _
        Header:=xlYes
    ' Calculate deleted rows
    deleteCount = lastRow - ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ' Resume screen updates
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox deleteCount & " duplicate items removed.", vbInformation
End Sub
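To make the "keep the first item and delete the rest" rule concrete, here is a minimal Python sketch of the same logic (Python, not VBA, so it can be run and checked outside Excel). The function name `remove_duplicates_keep_first` and the sample customers are hypothetical. The sketch lowercases the key on the assumption that email addresses should match regardless of case; it is not a description of how Excel implements the feature internally.

```python
def remove_duplicates_keep_first(rows, key_index):
    """Keep the first row for each key value; drop later rows with the same key."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index].lower()  # assumption: compare emails case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append(row)
    removed_count = len(rows) - len(kept)
    return kept, removed_count

# Hypothetical sample data: (name, email)
customers = [
    ("Kim", "kim@example.com"),
    ("Lee", "lee@example.com"),
    ("Kim M.", "KIM@example.com"),   # duplicate email, different case
    ("Park", "park@example.com"),
    ("Lee J.", "lee@example.com"),   # duplicate email
]

kept, removed_count = remove_duplicates_keep_first(customers, key_index=1)
print(kept)
print(removed_count, "duplicate items removed.")
```

Because the first occurrence wins, the order of the rows matters: sort the data first if, for example, the most recent registration is the one that should survive.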
3. Highlighting Duplicates (Without Deletion)
Highlighting Duplicates with Conditional Formatting
AI Prompt:
"Highlight duplicate values in red without deleting them.
Find values appearing 2 or more times based on phone number (Column D)."
AI Generated VBA:
Sub HighlightDuplicates_ConditionalFormat()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim rng As Range
    Set ws = ThisWorkbook.Sheets("CustomerData")
    lastRow = ws.Cells(ws.Rows.Count, "D").End(xlUp).Row
    Set rng = ws.Range("D2:D" & lastRow)
    ' Delete existing conditional formatting
    rng.FormatConditions.Delete
    ' Add duplicate value highlight condition
    With rng.FormatConditions.Add( _
        Type:=xlExpression, _
        Formula1:="=COUNTIF($D$2:$D$" & lastRow & ",$D2)>1")
        .Interior.Color = RGB(255, 199, 206) ' Light red
        .Font.Color = RGB(156, 0, 6) ' Dark red
        .Font.Bold = True
    End With
    MsgBox "Duplicate phone numbers highlighted.", vbInformation
End Sub
Displaying Duplicate Count
AI Prompt:
"Show how many times each email appears next to each row."
AI Generated Formula:
// Display duplicate count in Column F
=COUNTIF($C$2:$C$1000, C2)
// Display only duplicate status
=IF(COUNTIF($C$2:$C$1000, C2)>1, "Duplicate", "")
// Show the occurrence number of each duplicate (#1, #2, ...)
=IF(COUNTIF($C$2:$C$1000, C2)>1,
    "#" & COUNTIF($C$2:C2, C2),
    "Unique")
4. Advanced Duplicate Handling
Checking Duplicates by Multiple Column Combination
AI Prompt:
"Mark a row as a duplicate only when both the name (Column A)
and the birth date (Column B) are the same."
AI Generated Formula:
// Create a helper column (Column E): combine name + birth date
=A2&"|"&TEXT(B2,"YYYY-MM-DD")
// Display duplicate status in Column F
=IF(COUNTIF($E$2:$E$1000, E2)>1, "Duplicate", "")
AI Generated VBA (without helper column):
Sub CheckCompoundKeyDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim dict As Object
    Dim key As String
    Dim dupCount As Long
    Set ws = ThisWorkbook.Sheets("EmployeeData")
    Set dict = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Application.ScreenUpdating = False
    ' First pass: Count how often each name + birth date key appears
    For i = 2 To lastRow
        key = ws.Cells(i, 1).Value & "|" & _
              Format(ws.Cells(i, 2).Value, "YYYY-MM-DD")
        If dict.Exists(key) Then
            dict(key) = dict(key) + 1
        Else
            dict.Add key, 1
        End If
    Next i
    ' Second pass: Mark duplicates in Column F and highlight the row
    For i = 2 To lastRow
        key = ws.Cells(i, 1).Value & "|" & _
              Format(ws.Cells(i, 2).Value, "YYYY-MM-DD")
        If dict(key) > 1 Then
            ws.Cells(i, 6).Value = "Duplicate (" & dict(key) & " cases)"
            ws.Rows(i).Interior.Color = RGB(255, 235, 156)
            dupCount = dupCount + 1
        Else
            ws.Cells(i, 6).Value = ""
            ws.Rows(i).Interior.ColorIndex = xlNone
        End If
    Next i
    Application.ScreenUpdating = True
    MsgBox dupCount & " duplicate items found.", vbInformation
End Sub
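The two-pass idea in the macro above (count every compound key first, then label each row) can be sketched compactly in Python. This is an illustration, not a replacement for the VBA: the function name `label_compound_duplicates`, the label wording, and the sample employees are hypothetical. Unlike the macro, this sketch also shows which occurrence each row is.

```python
from collections import Counter
from datetime import date

def label_compound_duplicates(rows):
    """Return one label per row, based on the (name, birth date) pair."""
    # Build the compound key, mirroring the name & "|" & date helper column
    keys = [name + "|" + birth.isoformat() for name, birth in rows]
    totals = Counter(keys)      # first pass: count each compound key
    seen = Counter()
    labels = []
    for key in keys:            # second pass: label every row
        seen[key] += 1
        if totals[key] > 1:
            labels.append(f"Duplicate ({seen[key]} of {totals[key]})")
        else:
            labels.append("")
    return labels

# Hypothetical sample data: (name, birth date)
employees = [
    ("Kim Minsu", date(1990, 5, 1)),
    ("Lee Jiwoo", date(1988, 3, 12)),
    ("Kim Minsu", date(1990, 5, 1)),   # same name and same birth date
    ("Kim Minsu", date(1992, 7, 9)),   # same name, different birth date
]

labels = label_compound_duplicates(employees)
for row, label in zip(employees, labels):
    print(row[0], row[1].isoformat(), label)
```

Only the first and third rows are labeled, because a matching name alone is not enough: both parts of the key have to match.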
5. Finding Similar Duplicates
Ignoring Spaces/Capitalization
AI Prompt:
"Find duplicates in company names that only differ in spaces and capitalization.
Example: 'Samsung', 'SAMSUNG', ' Samsung ' should all be treated as same."
AI Generated VBA:
Sub FindSimilarDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim normalizedDict As Object
    Dim originalValue As String
    Dim normalizedValue As String
    Set ws = ThisWorkbook.Sheets("CompanyList")
    Set normalizedDict = CreateObject("Scripting.Dictionary")
    normalizedDict.CompareMode = vbTextCompare ' Ignore case
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ' Add headers to Columns E and F
    ws.Cells(1, 5).Value = "Normalized Value"
    ws.Cells(1, 6).Value = "Duplicate Status"
    For i = 2 To lastRow
        originalValue = ws.Cells(i, 1).Value
        ' Normalize: Remove all spaces (leading, trailing, and inner) + convert to uppercase
        normalizedValue = UCase(Replace(originalValue, " ", ""))
        ws.Cells(i, 5).Value = normalizedValue
        ' Check duplicates
        If normalizedDict.Exists(normalizedValue) Then
            ws.Cells(i, 6).Value = "Duplicate (Original: " & _
                                   normalizedDict(normalizedValue) & ")"
            ws.Rows(i).Interior.Color = RGB(255, 199, 206)
        Else
            normalizedDict.Add normalizedValue, originalValue
            ws.Cells(i, 6).Value = "First"
        End If
    Next i
    MsgBox "Similar duplicate check complete.", vbInformation
End Sub
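Normalization only catches values that become identical once spaces and case are removed. For the "if 80% or more similar, treat as duplicate" rule mentioned in the introduction, a similarity score is needed. The following is a minimal sketch in Python (not VBA), using the standard library's `difflib.SequenceMatcher`. The 0.8 threshold comes from this post; the function names and sample company names are hypothetical, and `SequenceMatcher.ratio()` is only one of several possible similarity measures.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Remove all whitespace and convert to uppercase."""
    return "".join(text.split()).upper()

def similarity(a, b):
    """Similarity between 0.0 and 1.0, computed on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_similar_pairs(names, threshold=0.8):
    """Compare every pair of names; return pairs at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return pairs

# Hypothetical sample data
companies = [
    "Samsung Electronics",
    "SAMSUNG ELECTRONIC",      # missing the final "s"
    " Samsung  Electronics ",  # extra spaces only
    "LG Chem",
]

pairs = find_similar_pairs(companies)
for first, second, score in pairs:
    print(f"{first!r} ~ {second!r}: {score:.2f}")
```

Comparing every pair grows quickly with the number of rows, so this approach suits a few thousand names rather than very large lists. Matches near the threshold are best reviewed by a person before anything is deleted.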
Conclusion
Cleaning duplicate data with AI:
- ✅ Detect various types of duplicates
- ✅ Similarity-based fuzzy matching
- ✅ Merging and consolidating duplicates
- ✅ Data quality reports
In the next post, we'll learn how to automatically write IF statements and VLOOKUP.