💼 AI & Business - Excel

Creating Excel Charts with ChatGPT - Complete AI Data Visualization Guide

Can't tell from numbers alone? Learn how AI turns your data into stunning charts visible at a glance.

Cleaning Duplicate Data

Duplicates are one of the most frustrating parts of data cleaning: the same customer registered several times, or the same company name entered in slightly different ways. Finding and deleting them one by one by hand is impractical.

Moreover, the problem isn't limited to exact duplicates. There are also "similar duplicates" that differ only in spacing or capitalization. Excel's built-in Remove Duplicates feature misses many of these; for example, it treats values that differ only by extra spaces as distinct.

With AI-generated formulas and VBA, you can find and clean up these harder cases too, from exact duplicates to similar and logical duplicates. You can even apply similarity analysis with a rule like "treat values that are 80% or more similar as duplicates."

In this post, we'll learn various ways to find and clean duplicate data with AI. From simple duplicate removal to advanced fuzzy matching, we'll cover techniques you can apply right away at work.

1. Understanding Duplicate Types

AI Prompt:

"Tell me how to find duplicates for these situations:
1. Completely identical rows
2. Only specific columns duplicated
3. Similar but slightly different data (e.g., spaces, capitalization)
4. Logical duplicates (different emails for same person)"

AI Response:

1. Exact duplicates: All column values identical
   → Use Remove Duplicates feature

2. Partial duplicates: Check only key values
   → Count with COUNTIFS for specific columns

3. Similar duplicates: Differences in spaces/capitalization
   → Normalize with TRIM, UPPER functions then compare

4. Logical duplicates: Based on business rules
   → Identify with compound conditions
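
Type 4 is the least obvious of the four, so here is a minimal sketch of one possible "compound condition": rows with the same name and phone number but a different email. The column layout (A = name, C = email, D = phone), the output column G, and the sheet name "CustomerData" are assumptions made for illustration; the right business rule depends on your data.

```vba
' Sketch: flag "logical duplicates" = same name and phone, different email.
' Assumed layout (hypothetical): A = name, C = email, D = phone, G = output.
Sub FlagLogicalDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim firstEmail As Object   ' key: name|phone -> first email seen
    Dim key As String
    Dim email As String

    Set ws = ThisWorkbook.Sheets("CustomerData")
    Set firstEmail = CreateObject("Scripting.Dictionary")
    firstEmail.CompareMode = vbTextCompare   ' compare keys case-insensitively
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ws.Cells(1, 7).Value = "Logical Duplicate"

    For i = 2 To lastRow
        ' Build the person key from trimmed name and phone
        key = Trim(CStr(ws.Cells(i, 1).Value)) & "|" & _
              Trim(CStr(ws.Cells(i, 4).Value))
        email = LCase(Trim(CStr(ws.Cells(i, 3).Value)))

        If Not firstEmail.Exists(key) Then
            ' First time this person appears: remember the email
            firstEmail.Add key, email
            ws.Cells(i, 7).Value = ""
        ElseIf firstEmail(key) <> email Then
            ws.Cells(i, 7).Value = "Same person, different email"
        Else
            ws.Cells(i, 7).Value = "Repeated name, phone and email"
        End If
    Next i
End Sub
```

The macro only writes a label and deletes nothing, so you can review the flagged rows before deciding which record to keep.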

2. Basic Duplicate Removal

Removing Duplicates with Excel's Basic Feature

AI Prompt:

"Create VBA code to remove duplicates based on email (Column C)
from the A1:E100 range.
Keep the first item and delete the rest."

AI Generated VBA:

Sub RemoveEmailDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim deleteCount As Long

    Set ws = ThisWorkbook.Sheets("CustomerData")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Confirm before action
    If MsgBox("Delete rows with duplicate emails?" & vbCrLf & _
              "(First item will be kept)", _
              vbYesNo + vbQuestion, "Confirm Duplicate Removal") = vbNo Then
        Exit Sub
    End If

    ' Stop screen updates (speed improvement)
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    ' Execute duplicate removal based on Column C (email, 3rd column of the range)
    ws.Range("A1:E" & lastRow).RemoveDuplicates _
        Columns:=3, _
        Header:=xlYes

    ' Calculate deleted rows
    deleteCount = lastRow - ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Resume screen updates
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic

    MsgBox deleteCount & " duplicate items removed.", vbInformation
End Sub
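
Because RemoveDuplicates deletes rows in place, it can be worth keeping a copy of the sheet before running the macro above. The following is a small sketch of one way to do that; the "Backup_" naming convention is an assumption, not part of the original AI output.

```vba
' Sketch: copy the sheet before a destructive cleanup.
Sub BackupSheetBeforeCleanup()
    Dim ws As Worksheet
    Dim backup As Worksheet

    Set ws = ThisWorkbook.Sheets("CustomerData")

    ' Copy the sheet right after the original; the copy gets the next index
    ws.Copy After:=ws
    Set backup = ThisWorkbook.Sheets(ws.Index + 1)

    ' Timestamped name (hypothetical convention), within the 31-character limit
    backup.Name = "Backup_" & Format(Now, "yyyymmdd_hhnnss")

    ' The copy becomes the active sheet, so switch back to the original
    ws.Activate
End Sub
```

Run it once before RemoveEmailDuplicates; if the result is not what you expected, the untouched data is still on the backup sheet.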

3. Highlighting Duplicates (Without Deletion)

Highlighting Duplicates with Conditional Formatting

AI Prompt:

"Highlight duplicate values in red without deleting them.
Find values appearing 2 or more times based on phone number (Column D)."

AI Generated VBA:

Sub HighlightDuplicates_ConditionalFormat()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim rng As Range

    Set ws = ThisWorkbook.Sheets("CustomerData")
    lastRow = ws.Cells(ws.Rows.Count, "D").End(xlUp).Row
    Set rng = ws.Range("D2:D" & lastRow)

    ' Delete existing conditional formatting
    rng.FormatConditions.Delete

    ' Add duplicate value highlight condition
    With rng.FormatConditions.Add( _
        Type:=xlExpression, _
        Formula1:="=COUNTIF($D$2:$D$" & lastRow & ",$D2)>1")

        .Interior.Color = RGB(255, 199, 206)  ' Light red
        .Font.Color = RGB(156, 0, 6)          ' Dark red
        .Font.Bold = True
    End With

    MsgBox "Duplicate phone numbers highlighted.", vbInformation
End Sub
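
As an alternative sketch, Excel also has a built-in "duplicate values" rule that VBA exposes through FormatConditions.AddUniqueValues. It avoids writing a COUNTIF formula by hand, which may be convenient because relative references in a formula-based rule can depend on which cell is active when the rule is added. The macro name below is made up for this example; the sheet and column follow the example above.

```vba
' Sketch: highlight duplicates with the built-in duplicate-values rule.
Sub HighlightDuplicates_UniqueValuesRule()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim rng As Range

    Set ws = ThisWorkbook.Sheets("CustomerData")
    lastRow = ws.Cells(ws.Rows.Count, "D").End(xlUp).Row
    If lastRow < 2 Then Exit Sub          ' nothing below the header
    Set rng = ws.Range("D2:D" & lastRow)

    ' Delete existing conditional formatting on the range
    rng.FormatConditions.Delete

    ' Add the duplicate-values rule and set its format
    With rng.FormatConditions.AddUniqueValues
        .DupeUnique = xlDuplicate
        .Interior.Color = RGB(255, 199, 206)  ' Light red
        .Font.Color = RGB(156, 0, 6)          ' Dark red
    End With
End Sub
```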

Displaying Duplicate Count

AI Prompt:

"Show how many times each email appears next to each row."

AI Generated Formula:

// Display duplicate count in Column F
=COUNTIF($C$2:$C$1000, C2)

// Display only duplicate status
=IF(COUNTIF($C$2:$C$1000, C2)>1, "Duplicate", "")

// Show the occurrence number of each duplicate (#1, #2, ...)
=IF(COUNTIF($C$2:$C$1000, C2)>1,
   "#" & COUNTIF($C$2:C2, C2),
   "Unique")

4. Advanced Duplicate Handling

Checking Duplicates by Multiple Column Combination

AI Prompt:

"Mark a row as a duplicate only when both the name (Column A)
and the birth date (Column B) are the same."

AI Generated Formula:

// Helper column (Column E) creation: Combine name+birthdate
=A2&"|"&TEXT(B2,"YYYY-MM-DD")

// Display duplicate status in Column F
=IF(COUNTIF($E$2:$E$1000, E2)>1, "Duplicate", "")
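
If you would rather not add a helper column, COUNTIFS can test both columns directly. The sketch below writes such a formula into Column F from VBA; the macro name is hypothetical, and the sheet name "EmployeeData" and output column F are assumed to match the rest of this section.

```vba
' Sketch: mark name + birth date duplicates with COUNTIFS, no helper column.
Sub MarkNameBirthdateDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long

    Set ws = ThisWorkbook.Sheets("EmployeeData")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub   ' nothing below the header

    ' Relative references (A2, B2) adjust for each row of the target range
    ws.Range("F2:F" & lastRow).Formula = _
        "=IF(COUNTIFS($A$2:$A$" & lastRow & ",A2," & _
        "$B$2:$B$" & lastRow & ",B2)>1,""Duplicate"","""")"
End Sub
```

Unlike a static label, the formula recalculates when names or dates change, so the duplicate status stays current as the data is edited.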

AI Generated VBA (without helper column):

Sub CheckCompoundKeyDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim dict As Object
    Dim key As String
    Dim dupCount As Long

    Set ws = ThisWorkbook.Sheets("EmployeeData")
    Set dict = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Application.ScreenUpdating = False

    ' First pass: Find duplicates
    For i = 2 To lastRow
        key = ws.Cells(i, 1).Value & "|" & _
              Format(ws.Cells(i, 2).Value, "YYYY-MM-DD")

        If dict.Exists(key) Then
            dict(key) = dict(key) + 1
        Else
            dict.Add key, 1
        End If
    Next i

    ' Second pass: Mark duplicates
    For i = 2 To lastRow
        key = ws.Cells(i, 1).Value & "|" & _
              Format(ws.Cells(i, 2).Value, "YYYY-MM-DD")

        If dict(key) > 1 Then
            ws.Cells(i, 6).Value = "Duplicate (" & dict(key) & " cases)"
            ws.Rows(i).Interior.Color = RGB(255, 235, 156)
            dupCount = dupCount + 1
        Else
            ws.Cells(i, 6).Value = ""
            ws.Rows(i).Interior.ColorIndex = xlNone
        End If
    Next i

    Application.ScreenUpdating = True

    MsgBox dupCount & " duplicate items found.", vbInformation
End Sub

5. Finding Similar Duplicates

Ignoring Spaces/Capitalization

AI Prompt:

"Find duplicates in company names that only differ in spaces and capitalization.
Example: 'Samsung', 'SAMSUNG', ' Samsung ' should all be treated as same."

AI Generated VBA:

Sub FindSimilarDuplicates()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim normalizedDict As Object
    Dim originalValue As String
    Dim normalizedValue As String

    Set ws = ThisWorkbook.Sheets("CompanyList")
    Set normalizedDict = CreateObject("Scripting.Dictionary")
    normalizedDict.CompareMode = vbTextCompare ' Ignore case
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Add headers to Columns E and F
    ws.Cells(1, 5).Value = "Normalized Value"
    ws.Cells(1, 6).Value = "Duplicate Status"

    For i = 2 To lastRow
        originalValue = ws.Cells(i, 1).Value

        ' Normalize: Remove all spaces + Convert to uppercase
        normalizedValue = UCase(Replace(originalValue, " ", ""))
        ws.Cells(i, 5).Value = normalizedValue

        ' Check duplicates
        If normalizedDict.Exists(normalizedValue) Then
            ws.Cells(i, 6).Value = "Duplicate (Original: " & _
                normalizedDict(normalizedValue) & ")"
            ws.Rows(i).Interior.Color = RGB(255, 199, 206)
        Else
            normalizedDict.Add normalizedValue, originalValue
            ws.Cells(i, 6).Value = "First"
        End If
    Next i

    MsgBox "Similar duplicate check complete.", vbInformation
End Sub
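
Normalization only catches differences in spaces and capitalization. For a rule like "treat values that are 80% or more similar as duplicates," you need a similarity score. The sketch below uses Levenshtein edit distance, which is one common choice; the post does not name a specific measure, so treat the function name and the scoring formula as assumptions.

```vba
' Sketch: similarity in the range 0 to 1, based on Levenshtein edit distance.
' Score = 1 - (edit distance / length of the longer string).
Function Similarity(ByVal s1 As String, ByVal s2 As String) As Double
    Dim i As Long, j As Long
    Dim len1 As Long, len2 As Long
    Dim cost As Long
    Dim best As Long
    Dim d() As Long

    ' Ignore case and surrounding spaces
    s1 = UCase(Trim(s1))
    s2 = UCase(Trim(s2))
    len1 = Len(s1)
    len2 = Len(s2)

    ' Two empty strings are treated as identical
    If len1 = 0 And len2 = 0 Then
        Similarity = 1
        Exit Function
    End If

    ' d(i, j) = edit distance between the first i chars of s1 and first j of s2
    ReDim d(0 To len1, 0 To len2)
    For i = 0 To len1: d(i, 0) = i: Next i
    For j = 0 To len2: d(0, j) = j: Next j

    For i = 1 To len1
        For j = 1 To len2
            If Mid$(s1, i, 1) = Mid$(s2, j, 1) Then cost = 0 Else cost = 1
            best = d(i - 1, j) + 1                                   ' deletion
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1    ' insertion
            If d(i - 1, j - 1) + cost < best Then
                best = d(i - 1, j - 1) + cost                        ' substitution
            End If
            d(i, j) = best
        Next j
    Next i

    If len1 > len2 Then
        Similarity = 1 - d(len1, len2) / len1
    Else
        Similarity = 1 - d(len1, len2) / len2
    End If
End Function
```

Placed in a standard module, the function can be called from a worksheet cell, for example =Similarity(A2, A3)>=0.8. "Samsung" and "Samsong" differ by one substitution out of seven characters, so they score about 0.86 and would count as duplicates under the 80% rule. Comparing every pair of rows grows quadratically with the number of rows, so this approach suits lists of modest size.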

Conclusion

Cleaning duplicate data with AI:

  • ✅ Detect various types of duplicates
  • ✅ Similarity-based fuzzy matching
  • ✅ Merging and consolidating duplicates
  • ✅ Data quality reports

In the next post, we'll learn how to have AI write IF and VLOOKUP formulas for you.