SQL Challenge Problem #1 – Finding Duplicate Records

Welcome back to the SQL challenge series! In this post, we’re tackling a slightly more interesting problem: “Finding Duplicate Records.” This problem is a great way to explore how SQL handles data aggregation and filtering, and we’ll see how Presto SQL and MySQL can solve it differently.

If you missed the introductory post for this SQL challenge series, check it out here: [Mastering SQL: Solving 1 Query a Day].

Let’s get started!

Problem Statement: Finding Duplicate Records

ColumnType
idint
namevarchar
salaryint
departmentvarchar

The task is to identify duplicate employee records based on their name and salary. You should return the name and salary of employees that appear more than once in the table.

Example Data:

idnamesalarydepartment
1John120000IT
2Jane90000HR
3John120000Finance
4Bob95000IT
5Alice130000HR
6Alice130000IT

Expected Output:

namesalary
John120000
Alice130000

Approach and Solution

We will solve this problem using Presto SQL and MySQL. While the final output is the same, the approach and syntax differ slightly.

Presto SQL Solution

Presto SQL uses the HAVING clause with the GROUP BY to detect duplicates. Here’s the query:

SELECT name, salary
FROM Employees
GROUP BY name, salary
HAVING COUNT(*) > 1;

Explanation:

  1. The GROUP BY name, salary groups the records based on name and salary.
  2. The COUNT(*) > 1 in the HAVING clause filters out the groups that have more than one occurrence.
  3. The SELECT statement outputs the name and salary of these groups.
MySQL Solution

In MySQL, we solve the same problem, but the syntax and approach are slightly different. While MySQL supports similar logic, let’s explicitly use a subquery for clarity.

SELECT name, salary
FROM Employees e
WHERE (name, salary) IN (
    SELECT name, salary
    FROM Employees
    GROUP BY name, salary
    HAVING COUNT(*) > 1
);

Explanation:

  1. The subquery identifies duplicate name and salary combinations using GROUP BY and HAVING COUNT(*) > 1.
  2. The main query uses the WHERE clause to match those duplicate records and outputs them.
  3. The (name, salary) syntax ensures that both fields are checked together.

Key Differences Between Presto SQL and MySQL

  1. Direct Filtering: Presto SQL allows filtering duplicate records directly in the HAVING clause.
  2. Subqueries in MySQL: MySQL often uses a subquery to achieve similar results for clarity and performance.
  3. Portability: While both queries are valid, MySQL might require additional structures like subqueries depending on the complexity.

Key Takeaways

  1. Group By + Having: This is a fundamental SQL concept for identifying duplicates.
  2. SQL Syntax Differences: Systems like Presto SQL and MySQL have minor differences in how queries are written and optimized.
  3. Practice with Real Examples: Solving problems with different tools helps you understand SQL deeply.

What’s Next?

In the next post, we’ll solve another SQL challenge—this time focusing on aggregations or joins to uncover more insights!

If you enjoyed this post, stay tuned for more SQL challenges in this series.

Let’s solve SQL problems, grow our skills, and learn together—one query at a time!

,

Spread the love

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.