SQL Challenge Problem #2 – Calculating Average Revenue Per User (ARPU)

Welcome back to another SQL challenge! Today, we will solve a problem using both Presto SQL and MySQL to calculate Average Revenue Per User (ARPU) for each product. This challenge will highlight differences between the two SQL dialects and their respective approaches.

The Problem

We have two tables:

  1. users: Contains user-level data:
    • user_id: Unique identifier for each user
    • region: Region of the user
  2. transactions: Contains transaction-level data:
    • transaction_id: Unique identifier for each transaction
    • user_id: Identifier of the user who made the transaction
    • product_id: Product associated with the transaction
    • amount: Revenue generated from the transaction

Goal: Calculate the Average Revenue Per User (ARPU) for each product

Presto SQL Solution

WITH product_metrics AS (
    SELECT
        product_id,
        SUM(amount) AS total_revenue,
        CARDINALITY(ARRAY_AGG(DISTINCT user_id)) AS unique_users
    FROM transactions
)
SELECT
    product_id,
    total_revenue,
    unique_users,
    (total_revenue / unique_users) AS average_revenue_per_user
FROM product_metrics
Explanation:
  1. ARRAY_AGG(DISTINCT user_id)creates an array of unique user IDs for each product
  2. CARDINALITY calculates the number of unique users from the array
  3. The query calculates total revenue and ARPU directly in a single CTE

MySQL Solution

SELECT
    product_id,
    SUM(amount) AS total_revenue,
    COUNT(DISTINCT user_id) AS unique_users,
    (SUM(amount) / COUNT(DISTINCT user_id)) AS average_revenue_per_user
FROM transactions
GROUP BY product_id;
Explanation:
  1. COUNT(DISTINCT user_id)directly computes the unique user count per product.
  2. total revenue and ARPU are calculated in a single query without additional CTEs.

Key Differences

  1. Handling Unique Users:
    • Presto SQL uses ARRAY_AGG and CARDINALITY for unique user counts.
    • MySQL uses COUNT(DISTINCT) directly for unique user counts.
  2. Query Structure:
    • Presto SQL leverages advanced array functions for flexibility in aggregation.
    • MySQL simplifies the aggregation without requiring additional constructs.
  3. Performance:
    • Presto SQL is designed for distributed querying and handles large datasets more efficiently.
    • MySQL, while simpler, may struggle with performance on massive datasets.

Key Takeaways

  • Presto SQL excels in handling complex operations using functions like ARRAY_AGG and CARDINALITY.
  • MySQL provides straightforward solutions for simpler aggregations using COUNT(DISTINCT).
  • Choose the right tool based on your data size and complexity requirements.

What’s Next

In next post, we will be solving questions based on best approach instead of Presto or MySQL comparision, as both seems to have some advantages and disadvantages. Therefore, it is better to focus on 1 instead of comparing apple and oranges down the road.

Stay tuned many more challenges ahead!

Spread the love

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.