Welcome back to the SQL challenge series! In this post, we’re tackling a slightly more interesting problem: “Finding Duplicate Records.” This problem is a great way to explore how SQL handles data aggregation and filtering, and we’ll see how Presto SQL and MySQL can solve it differently.
If you missed the introductory post for this SQL challenge series, check it out here: [Mastering SQL: Solving 1 Query a Day].
Let’s get started!
Problem Statement: Finding Duplicate Records
Column | Type |
---|---|
id | int |
name | varchar |
salary | int |
department | varchar |
The task is to identify duplicate employee records based on their name
and salary
. You should return the name and salary of employees that appear more than once in the table.
Example Data:
id | name | salary | department |
1 | John | 120000 | IT |
2 | Jane | 90000 | HR |
3 | John | 120000 | Finance |
4 | Bob | 95000 | IT |
5 | Alice | 130000 | HR |
6 | Alice | 130000 | IT |
Expected Output:
name | salary |
John | 120000 |
Alice | 130000 |
Approach and Solution
We will solve this problem using Presto SQL and MySQL. While the final output is the same, the approach and syntax differ slightly.
Presto SQL Solution
Presto SQL uses the HAVING
clause with the GROUP BY
to detect duplicates. Here’s the query:
SELECT name, salary
FROM Employees
GROUP BY name, salary
HAVING COUNT(*) > 1;
Explanation:
- The
GROUP BY name, salary
groups the records based onname
andsalary
. - The
COUNT(*) > 1
in theHAVING
clause filters out the groups that have more than one occurrence. - The
SELECT
statement outputs thename
andsalary
of these groups.
MySQL Solution
In MySQL, we solve the same problem, but the syntax and approach are slightly different. While MySQL supports similar logic, let’s explicitly use a subquery for clarity.
SELECT name, salary
FROM Employees e
WHERE (name, salary) IN (
SELECT name, salary
FROM Employees
GROUP BY name, salary
HAVING COUNT(*) > 1
);
Explanation:
- The subquery identifies duplicate
name
andsalary
combinations usingGROUP BY
andHAVING COUNT(*) > 1
. - The main query uses the
WHERE
clause to match those duplicate records and outputs them. - The
(name, salary)
syntax ensures that both fields are checked together.
Key Differences Between Presto SQL and MySQL
- Direct Filtering: Presto SQL allows filtering duplicate records directly in the
HAVING
clause. - Subqueries in MySQL: MySQL often uses a subquery to achieve similar results for clarity and performance.
- Portability: While both queries are valid, MySQL might require additional structures like subqueries depending on the complexity.
Key Takeaways
- Group By + Having: This is a fundamental SQL concept for identifying duplicates.
- SQL Syntax Differences: Systems like Presto SQL and MySQL have minor differences in how queries are written and optimized.
- Practice with Real Examples: Solving problems with different tools helps you understand SQL deeply.
What’s Next?
In the next post, we’ll solve another SQL challenge—this time focusing on aggregations or joins to uncover more insights!
If you enjoyed this post, stay tuned for more SQL challenges in this series.
Let’s solve SQL problems, grow our skills, and learn together—one query at a time!