How to Delete Duplicate Rows in MySQL
Delete duplicate rows safely while keeping the correct record instead of removing data blindly.
Before you start
Safe workflow before deletion
1. Identify the duplicate rule clearly
Do not start with the delete query. First decide which columns should be unique. For example, a duplicate might be defined by email alone, or by a combination such as user_id + product_id.
If you choose the wrong columns, you can delete valid rows that only look similar.
2. Preview duplicate groups first
Use a duplicate check query first so you can see which values are repeated.
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
You need to confirm the duplicate pattern before you decide which rows to keep and which rows to remove.
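The count query shows which values repeat, but not the rows behind them. A minimal sketch for listing the full rows in each duplicate group follows. It assumes the same users table with the id and email columns used elsewhere in this guide.

```sql
-- List every row that belongs to a duplicate email group,
-- ordered so the rows of each group sit next to each other.
SELECT u.*
FROM users AS u
JOIN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS d
    ON d.email = u.email
ORDER BY u.email, u.id;
```

Reading the groups side by side makes it easier to decide whether the lowest id, the highest id, or some other row is the one worth keeping.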
3. Back up the table before cleanup
If the table is important, make a backup copy before deleting anything.
CREATE TABLE users_backup_20260402 AS
SELECT *
FROM users;
A quick backup table can save you from painful recovery work if the delete logic is wrong.
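CREATE TABLE ... AS SELECT copies the rows but not the indexes, primary key, or AUTO_INCREMENT attribute of the source table. If the backup may need to stand in for the original, the following sketch is one alternative. The table name users_backup_full is a placeholder, and the example assumes users has no generated columns, which would make the plain INSERT ... SELECT * fail.

```sql
-- Copy the column definitions and indexes (foreign keys are not copied).
CREATE TABLE users_backup_full LIKE users;

-- Copy the data into the structurally identical table.
INSERT INTO users_backup_full
SELECT *
FROM users;

-- Sanity check: both counts should match before any cleanup starts.
SELECT
    (SELECT COUNT(*) FROM users)             AS source_rows,
    (SELECT COUNT(*) FROM users_backup_full) AS backup_rows;
```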
Delete duplicates and keep the oldest row
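This pattern keeps the row with the smallest id in each duplicate group and deletes the rest. It treats the smallest id as the oldest row, which holds when id is an AUTO_INCREMENT key.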
DELETE t1
FROM users t1
JOIN users t2
ON t1.email = t2.email
AND t1.id > t2.id;
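For each duplicate email, every row that has a match with a smaller id is removed, so only the row with the smallest id stays. Rows whose email is NULL never match each other in this join, so they are left untouched.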
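One way to make the delete reversible is to run it inside a transaction and check the affected row count before committing. This sketch assumes users is an InnoDB table, since MyISAM tables cannot be rolled back.

```sql
START TRANSACTION;

-- Same keep-the-smallest-id delete as above.
DELETE t1
FROM users t1
JOIN users t2
    ON t1.email = t2.email
   AND t1.id > t2.id;

-- Must run immediately after the DELETE: number of rows it removed.
SELECT ROW_COUNT() AS rows_deleted;

-- Undo the delete. Replace with COMMIT once rows_deleted
-- matches what the preview query led you to expect.
ROLLBACK;
```

Ending with ROLLBACK by default means a copy-pasted run changes nothing until you deliberately switch it to COMMIT.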
Delete duplicates and keep the newest row
If your business rule says the latest row is the one to keep, reverse the comparison.
DELETE t1
FROM users t1
JOIN users t2
ON t1.email = t2.email
AND t1.id < t2.id;
Use this version when the larger id represents the latest valid row and the older rows should be removed.
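When "newest" is defined by a timestamp rather than by id, a window function can rank the rows in each group instead. The sketch below assumes MySQL 8.0 or later and a hypothetical created_at column; neither is stated in this guide.

```sql
-- Rank rows within each email group, newest first.
-- id DESC breaks ties between rows with the same created_at.
DELETE u
FROM users AS u
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY email
               ORDER BY created_at DESC, id DESC
           ) AS rn
    FROM users
) AS ranked
    ON ranked.id = u.id
WHERE ranked.rn > 1;  -- rn = 1 is the row that is kept
```

Replacing DELETE u with SELECT u.* turns the same statement into a preview. Note that PARTITION BY places all NULL emails in one group, unlike the self-join versions, so add WHERE email IS NOT NULL inside the subquery if those rows should be left alone.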
Preview the exact rows before deleting
Before running DELETE, use the same logic in a SELECT query so you can inspect the rows that would be removed.
SELECT t1.*
FROM users t1
JOIN users t2
ON t1.email = t2.email
AND t1.id > t2.id;
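If the preview result looks wrong, do not run the delete query yet. Fix the duplicate condition first. A row with several lower-id duplicates appears once per match in this preview; use SELECT DISTINCT t1.* if you want each row listed only once.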
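For a large table, comparing two numbers can be quicker than reading every preview row. The sketch below counts the rows the delete would remove and, separately, the number of surplus rows implied by the duplicate groups. It assumes id is unique per row.

```sql
-- Rows the keep-the-smallest-id delete would remove.
SELECT COUNT(DISTINCT t1.id) AS rows_to_delete
FROM users t1
JOIN users t2
    ON t1.email = t2.email
   AND t1.id > t2.id;

-- Surplus rows according to the group counts: each group of n keeps 1.
-- NULL emails are excluded because the join above never matches them.
SELECT SUM(cnt - 1) AS expected_deletions
FROM (
    SELECT COUNT(*) AS cnt
    FROM users
    WHERE email IS NOT NULL
    GROUP BY email
    HAVING COUNT(*) > 1
) AS g;
```

If the two numbers differ, the join condition and the grouping rule do not describe the same set of duplicates, and the delete should wait.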
Delete duplicates based on multiple columns
If duplicates are defined by more than one column, include all relevant columns in the join condition.
DELETE t1
FROM orders t1
JOIN orders t2
ON t1.user_id = t2.user_id
AND t1.product_id = t2.product_id
AND t1.id > t2.id;
This is useful when a duplicate is not defined by a single field alone.
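The duplicate check adapts to a composite rule in the same way. A minimal sketch for the orders example follows, grouping on both columns that define a duplicate.

```sql
-- Find (user_id, product_id) pairs that occur more than once.
SELECT user_id, product_id, COUNT(*) AS cnt
FROM orders
GROUP BY user_id, product_id
HAVING COUNT(*) > 1;
```

Run it before the delete to preview the groups and again afterwards to confirm that no pair is left with a count above 1.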
Verify that duplicates are gone
After deletion, run the duplicate check query again to confirm cleanup succeeded.
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
If cleanup worked correctly, the query should return no duplicate groups for that condition.
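Row counts give a second check. This sketch assumes the backup table from step 3 exists and that nothing else wrote to users between the backup and the cleanup.

```sql
SELECT
    (SELECT COUNT(*) FROM users_backup_20260402)      AS rows_before,
    (SELECT COUNT(*) FROM users)                      AS rows_after,
    (SELECT COUNT(DISTINCT email) FROM users)         AS distinct_emails,
    (SELECT COUNT(*) FROM users WHERE email IS NULL)  AS null_emails;
```

After a successful cleanup, rows_after should equal distinct_emails plus null_emails, because COUNT(DISTINCT email) ignores NULL values. The difference between rows_before and rows_after is the number of rows removed.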
Common mistakes
- Deleting without a preview. Always convert the delete logic into a SELECT first and inspect the rows to be removed.
- Using an incomplete join condition. If the condition leaves out a column that defines the duplicate, you may delete rows that are not real duplicates.
- Not deciding which row to keep. Make sure you know whether you want to keep the oldest row, the newest row, or a row selected by another business rule.
- Skipping the backup. For production or business-critical tables, a backup before cleanup is the safer move.
About this guide
This guide shows safe and practical ways to delete duplicate rows in MySQL after you have already identified them. It focuses on real cleanup work such as keeping the oldest row, keeping the newest row, previewing what will be deleted first, and avoiding destructive mistakes.
How to follow this guide
- Confirm which columns define a duplicate row.
- Preview the duplicate rows before deleting anything.
- Create a backup table if the data matters.
- Use a delete query that keeps the correct row and removes the extra ones.
- Run a verification query after cleanup to confirm duplicates are gone.
Why use this method?
Deleting duplicate rows is not just about removing repeated data. The important part is keeping the correct row and removing only the extras. A safe workflow prevents accidental loss and makes cleanup repeatable in real projects.
Frequently Asked Questions
What is the safest way to delete duplicates in MySQL?
Preview the duplicates first, back up the table if needed, and then use a delete query that clearly keeps one row and removes the rest.
How do I decide which duplicate row to keep?
Most cleanup tasks keep either the row with the smallest id (the oldest, if id is auto-incremented) or the row with the largest id (the newest), depending on your data rules.
Should I delete duplicates directly from production data?
Only after previewing the affected rows and either taking a backup or confirming the logic on a copy of the table first.
Can I delete duplicates based on multiple columns?
Yes. The duplicate condition can use one column or a combination of columns that should be unique together.