Welcome back to another SQL challenge! Today, we will solve a problem using both Presto SQL and MySQL to calculate Average Revenue Per User (ARPU) for each product. This challenge will highlight differences between the two SQL dialects and their respective approaches.
The Problem
We have two tables:
- users: Contains user-level data:
- user_id: Unique identifier for each user
- region: Region of the user
- transactions: Contains transaction-level data:
- transaction_id: Unique identifier for each transaction
- user_id: Identifier of the user who made the transaction
- product_id: Product associated with the transaction
- amount: Revenue generated from the transaction
Goal: Calculate the Average Revenue Per User (ARPU) for each product
Presto SQL Solution
WITH product_metrics AS (
SELECT
product_id,
SUM(amount) AS total_revenue,
CARDINALITY(ARRAY_AGG(DISTINCT user_id)) AS unique_users
FROM transactions
)
SELECT
product_id,
total_revenue,
unique_users,
(total_revenue / unique_users) AS average_revenue_per_user
FROM product_metrics
Explanation:
ARRAY_AGG(DISTINCT user_id)
creates an array of unique user IDs for each productCARDINALITY
calculates the number of unique users from the array- The query calculates total revenue and ARPU directly in a single CTE
MySQL Solution
SELECT
product_id,
SUM(amount) AS total_revenue,
COUNT(DISTINCT user_id) AS unique_users,
(SUM(amount) / COUNT(DISTINCT user_id)) AS average_revenue_per_user
FROM transactions
GROUP BY product_id;
Explanation:
COUNT(DISTINCT user_id)
directly computes the unique user count per product.- total revenue and ARPU are calculated in a single query without additional CTEs.
Key Differences
- Handling Unique Users:
- Presto SQL uses
ARRAY_AGG
andCARDINALITY
for unique user counts. - MySQL uses
COUNT(DISTINCT)
directly for unique user counts.
- Presto SQL uses
- Query Structure:
- Presto SQL leverages advanced array functions for flexibility in aggregation.
- MySQL simplifies the aggregation without requiring additional constructs.
- Performance:
- Presto SQL is designed for distributed querying and handles large datasets more efficiently.
- MySQL, while simpler, may struggle with performance on massive datasets.
Key Takeaways
- Presto SQL excels in handling complex operations using functions like
ARRAY_AGG
andCARDINALITY
. - MySQL provides straightforward solutions for simpler aggregations using
COUNT(DISTINCT)
. - Choose the right tool based on your data size and complexity requirements.
What’s Next
In next post, we will be solving questions based on best approach instead of Presto or MySQL comparision, as both seems to have some advantages and disadvantages. Therefore, it is better to focus on 1 instead of comparing apple and oranges down the road.
Stay tuned many more challenges ahead!