SQL Challenge Problem #1 – Finding Duplicate Records

Welcome back to the SQL challenge series! In this post, we’re tackling a slightly more interesting problem: “Finding Duplicate Records.” This problem is a great way to explore how SQL handles data aggregation and filtering, and we’ll see how Presto SQL and MySQL can solve it differently.

If you missed the introductory post for this SQL challenge series, check it out here: [Mastering SQL: Solving 1 Query a Day].

Let’s get started!

Problem Statement: Finding Duplicate Records

Column	Type
id	int
name	varchar
salary	int
department	varchar

The task is to identify duplicate employee records based on their name and salary. You should return the name and salary of employees that appear more than once in the table.

Example Data:

id	name	salary	department
1	John	120000	IT
2	Jane	90000	HR
3	John	120000	Finance
4	Bob	95000	IT
5	Alice	130000	HR
6	Alice	130000	IT

Expected Output:

name	salary
John	120000
Alice	130000

Approach and Solution

We will solve this problem using Presto SQL and MySQL. While the final output is the same, the approach and syntax differ slightly.

Presto SQL Solution

Presto SQL uses the HAVING clause with the GROUP BY to detect duplicates. Here’s the query:

SELECT name, salary
FROM Employees
GROUP BY name, salary
HAVING COUNT(*) > 1;

Explanation:

The GROUP BY name, salary groups the records based on name and salary.
The COUNT(*) > 1 in the HAVING clause filters out the groups that have more than one occurrence.
The SELECT statement outputs the name and salary of these groups.

MySQL Solution

In MySQL, we solve the same problem, but the syntax and approach are slightly different. While MySQL supports similar logic, let’s explicitly use a subquery for clarity.

SELECT name, salary
FROM Employees e
WHERE (name, salary) IN (
    SELECT name, salary
    FROM Employees
    GROUP BY name, salary
    HAVING COUNT(*) > 1
);

Explanation:

The subquery identifies duplicate name and salary combinations using GROUP BY and HAVING COUNT(*) > 1.
The main query uses the WHERE clause to match those duplicate records and outputs them.
The (name, salary) syntax ensures that both fields are checked together.

Key Differences Between Presto SQL and MySQL

Direct Filtering: Presto SQL allows filtering duplicate records directly in the HAVING clause.
Subqueries in MySQL: MySQL often uses a subquery to achieve similar results for clarity and performance.
Portability: While both queries are valid, MySQL might require additional structures like subqueries depending on the complexity.

Key Takeaways

Group By + Having: This is a fundamental SQL concept for identifying duplicates.
SQL Syntax Differences: Systems like Presto SQL and MySQL have minor differences in how queries are written and optimized.
Practice with Real Examples: Solving problems with different tools helps you understand SQL deeply.

What’s Next?

In the next post, we’ll solve another SQL challenge—this time focusing on aggregations or joins to uncover more insights!

If you enjoyed this post, stay tuned for more SQL challenges in this series.

Let’s solve SQL problems, grow our skills, and learn together—one query at a time!

SQL, SQL Challenge

Spread the love