DataPartner365

Jouw partner voor datagedreven groei en inzichten

SQL Window Functions: Complete Gids voor Geavanceerde Analytics

Laatst bijgewerkt: 20 december 2024
Leestijd: 35 minuten
SQL, Window Functions, Analytics, ROW_NUMBER, RANK, LEAD, LAG, SUM OVER, PARTITION BY

Leer SQL Window Functions voor krachtige data analyse zonder self-joins. Praktische gids met voorbeelden voor ranking, running totals, time-series analyse en complexe aggregaties.

Zoek je SQL Analytics Experts?

Vind ervaren SQL Developers gespecialiseerd in geavanceerde analytics en window functions

1. Inleiding tot Window Functions

Wat zijn Window Functions?

SQL Window Functions (ook wel Analytic Functions genoemd) zijn krachtige SQL functies die berekeningen uitvoeren over een set rijen die gerelateerd zijn aan de huidige rij, zonder de resultaten te groeperen tot een enkele output rij.

In tegenstelling tot aggregatie met GROUP BY, behouden window functions de individuele rijen terwijl ze berekeningen uitvoeren over gerelateerde rijen.

Ranking Functions

ROW_NUMBER, RANK, DENSE_RANK, NTILE - voor rangschikking en percentielen

Analytic Functions

LEAD, LAG, FIRST_VALUE, LAST_VALUE - voor time-series en sequential analyse

Aggregate Functions

SUM, AVG, COUNT, MIN, MAX OVER - voor running totals en moving averages

Statistical Functions

PERCENT_RANK, CUME_DIST, STDDEV, VARIANCE - voor statistische analyse

Feature Window Functions GROUP BY Aggregatie Self-Joins
Output Rijen Behoudt alle input rijen Groepen tot één rij per groep Behoudt alle rijen, maar complex
Performance Optimaal, één tabel scan Goed voor aggregaties Slecht, multiple scans/joins
Complexiteit Medium (nieuwe syntax) Laag (bekende syntax) Hoog (complexe joins)
Use Case Running totals, ranking, gaps Totaal per groep Historische vergelijkingen
Readability Hoog (declaratief) Hoog (simpel) Laag (proceduraal)

2. Basic Syntax en Concepts

Window Function Syntax

De basis syntax van window functions bestaat uit drie hoofdcomponenten: de window function zelf, PARTITION BY clause, en ORDER BY clause.

Basic Window Function Structuur

-- BASIS SYNTAX
SELECT
    column1,
    column2,
    window_function(column3) OVER (
        [PARTITION BY partition_expression]
        [ORDER BY order_expression [ASC | DESC]]
        [window_frame_clause]
    ) AS alias_name
FROM table_name
[WHERE conditions];

-- EENVOUDIG VOORBEELD: Running total zonder PARTITION BY
SELECT
    order_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date
    ) AS running_total
FROM orders
ORDER BY order_date;

-- MET PARTITION BY: Running total per klant
SELECT
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS customer_running_total
FROM orders
ORDER BY customer_id, order_date;

-- MET WINDOW FRAME: Moving average laatste 3 orders
SELECT
    customer_id,
    order_date,
    total_amount,
    AVG(total_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3_orders
FROM orders
ORDER BY customer_id, order_date;

Window Function Components

-- COMPONENTEN VAN EEN WINDOW FUNCTION

-- 1. WINDOW FUNCTION TYPE
-- Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
-- Analytic: LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()
-- Aggregate: SUM(), AVG(), COUNT(), MIN(), MAX()
-- Statistical: STDDEV(), VARIANCE(), PERCENT_RANK(), CUME_DIST()

-- 2. OVER() CLAUSE - Markeert het als window function
-- Zonder PARTITION BY: Hele result set is één window
-- Met PARTITION BY: Aparte windows per partition

-- 3. PARTITION BY - Verdeelt data in groepen (vergelijkbaar met GROUP BY)
-- Syntax: PARTITION BY column1, column2, ...
-- Elk partition heeft zijn eigen window

-- 4. ORDER BY - Bepaalt volgorde binnen window
-- Syntax: ORDER BY column [ASC|DESC], ...
-- Essentieel voor ranking en running calculations

-- 5. WINDOW FRAME - Specificeert welke rijen in het window
-- Syntax: ROWS|RANGE BETWEEN start AND end
-- Options: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING

-- COMPLETE VOORBEELD MET ALLE COMPONENTEN
SELECT
    employee_id,
    department_id,
    salary,
    hire_date,
    
    -- Ranking binnen department
    ROW_NUMBER() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS salary_rank_in_dept,
    
    -- Percentage van department total
    salary * 100.0 / SUM(salary) OVER (
        PARTITION BY department_id
    ) AS pct_of_dept_salary,
    
    -- Verschil met hoogste salaris in department
    MAX(salary) OVER (
        PARTITION BY department_id
    ) - salary AS diff_from_max_in_dept,
    
    -- Vorige salaris in department (op hire date)
    LAG(salary, 1) OVER (
        PARTITION BY department_id
        ORDER BY hire_date
    ) AS previous_salary_in_dept,
    
    -- Cumulative salary in department
    SUM(salary) OVER (
        PARTITION BY department_id
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_salary_in_dept
    
FROM employees
WHERE hire_date >= '2020-01-01'
ORDER BY department_id, salary DESC;

3. Ranking Functions (ROW_NUMBER, RANK, DENSE_RANK)

Ranking en Ordering Functions

Ranking functions wijzen rangnummers toe aan rijen binnen een window, gebaseerd op de ORDER BY clause. Ze zijn essentieel voor top-N queries, pagination en competitieve analyses.

ROW_NUMBER, RANK, DENSE_RANK Vergelijking

-- VOORBEELD DATA: Employee salaries
-- Stel we hebben deze data:
-- Employee | Department | Salary
-- ---------|------------|--------
-- Alice    | Sales      | 5000
-- Bob      | Sales      | 5000
-- Charlie  | Sales      | 4500
-- David    | Sales      | 4000
-- Eve      | IT         | 6000
-- Frank    | IT         | 5500

SELECT
    employee_name,
    department,
    salary,
    
    -- ROW_NUMBER: Unieke nummers, zelfs bij gelijke salarissen
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS row_num,
    
    -- RANK: Gelijke rank bij gelijke salarissen, gaps bij ties
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS rank,
    
    -- DENSE_RANK: Gelijke rank bij gelijke salarissen, geen gaps
    DENSE_RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dense_rank,
    
    -- NTILE: Verdeel in groepen (bijv. quartiles, percentiles)
    NTILE(4) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS quartile
    
FROM employees
ORDER BY department, salary DESC;

-- RESULTAAT:
-- Employee | Department | Salary | row_num | rank | dense_rank | quartile
-- ---------|------------|--------|---------|------|------------|---------
-- Alice    | Sales      | 5000   | 1       | 1    | 1          | 1
-- Bob      | Sales      | 5000   | 2       | 1    | 1          | 1
-- Charlie  | Sales      | 4500   | 3       | 3    | 2          | 2
-- David    | Sales      | 4000   | 4       | 4    | 3          | 2
-- Eve      | IT         | 6000   | 1       | 1    | 1          | 1
-- Frank    | IT         | 5500   | 2       | 2    | 2          | 2

-- PRAKTISCHE USE CASE: Top 3 medewerkers per department
WITH RankedEmployees AS (
    SELECT
        employee_name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT *
FROM RankedEmployees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;

-- PAGINATION: Fetch rows 11-20 (page 2 met 10 rows per page)
WITH PaginatedData AS (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY order_date DESC) AS row_num
    FROM orders
    WHERE order_date >= '2024-01-01'
)
SELECT *
FROM PaginatedData
WHERE row_num BETWEEN 11 AND 20;

Geavanceerde Ranking Use Cases

-- USE CASE 1: Find duplicate records (based on business key)
WITH DuplicateCheck AS (
    SELECT
        customer_email,
        customer_name,
        registration_date,
        ROW_NUMBER() OVER (
            PARTITION BY customer_email
            ORDER BY registration_date
        ) AS duplicate_num
    FROM customers
)
SELECT *
FROM DuplicateCheck
WHERE duplicate_num > 1  -- All duplicates except first occurrence
ORDER BY customer_email, duplicate_num;

-- USE CASE 2: Gap analysis in sequences
WITH NumberedOrders AS (
    SELECT
        order_id,
        order_date,
        ROW_NUMBER() OVER (ORDER BY order_date) AS sequence_num,
        ROW_NUMBER() OVER (ORDER BY order_id) AS expected_sequence
    FROM orders
    WHERE YEAR(order_date) = 2024
)
SELECT
    order_id,
    order_date,
    sequence_num,
    expected_sequence,
    expected_sequence - sequence_num AS gap_size
FROM NumberedOrders
WHERE expected_sequence - sequence_num > 0  -- Find gaps
ORDER BY gap_size DESC;

-- USE CASE 3: Percentile analysis with NTILE
SELECT
    customer_id,
    total_purchase_amount,
    
    -- Verdeel in 10 groepen (deciles)
    NTILE(10) OVER (ORDER BY total_purchase_amount DESC) AS decile,
    
    -- Verdeel in 4 groepen (quartiles)
    NTILE(4) OVER (ORDER BY total_purchase_amount DESC) AS quartile,
    
    -- Verdeel in 100 groepen (percentiles)
    NTILE(100) OVER (ORDER BY total_purchase_amount DESC) AS percentile,
    
    -- PERCENT_RANK: Relative rank (0-1)
    PERCENT_RANK() OVER (
        ORDER BY total_purchase_amount DESC
    ) AS percent_rank,
    
    -- CUME_DIST: Cumulative distribution (0-1)
    CUME_DIST() OVER (
        ORDER BY total_purchase_amount DESC
    ) AS cumulative_distribution
    
FROM customer_purchases
ORDER BY total_purchase_amount DESC;

-- USE CASE 4: Remove middle duplicates, keep first and last
WITH DuplicateProcessing AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY business_key
            ORDER BY created_at
        ) AS rn_asc,
        ROW_NUMBER() OVER (
            PARTITION BY business_key
            ORDER BY created_at DESC
        ) AS rn_desc
    FROM your_table
)
SELECT *
FROM DuplicateProcessing
WHERE rn_asc = 1  -- Keep first
    OR rn_desc = 1; -- Keep last

SQL Analytics Experts Nodig?

Vind ervaren SQL Developers gespecialiseerd in geavanceerde analytics en window functions

4. Analytic Functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE)

Time-Series en Sequential Analysis

Analytic functions geven toegang tot andere rijen in het window zonder self-joins. Ze zijn essentieel voor time-series analyse, sequential comparisons en gap detection.

LEAD en LAG Functions

-- LEAD: Kijk naar volgende rij
-- LAG: Kijk naar vorige rij

-- BASIS SYNTAX:
-- LEAD(column, offset, default_value) OVER (...)
-- LAG(column, offset, default_value) OVER (...)

-- PRAKTISCH VOORBEELD: Maand-over-maand sales analyse
SELECT
    year_month,
    total_sales,
    
    -- Vorige maand
    LAG(total_sales, 1) OVER (ORDER BY year_month) AS prev_month_sales,
    
    -- Volgende maand
    LEAD(total_sales, 1) OVER (ORDER BY year_month) AS next_month_sales,
    
    -- Verschil met vorige maand
    total_sales - LAG(total_sales, 1) OVER (ORDER BY year_month) 
        AS sales_diff_prev_month,
    
    -- Percentage verandering t.o.v. vorige maand
    ROUND(
        (total_sales - LAG(total_sales, 1) OVER (ORDER BY year_month)) * 100.0 / 
        LAG(total_sales, 1) OVER (ORDER BY year_month),
        2
    ) AS pct_change_prev_month,
    
    -- 3-maand moving average
    AVG(total_sales) OVER (
        ORDER BY year_month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3_month
    
FROM monthly_sales
ORDER BY year_month;

-- COMPLEX VOORBEELD: Customer order sequence analysis
SELECT
    customer_id,
    order_id,
    order_date,
    order_amount,
    
    -- Vorige order van deze klant
    LAG(order_date, 1) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS prev_order_date,
    
    -- Volgende order van deze klant
    LEAD(order_date, 1) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS next_order_date,
    
    -- Dagen sinds vorige order
    DATEDIFF(
        day,
        LAG(order_date, 1) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ),
        order_date
    ) AS days_since_prev_order,
    
    -- Running total per klant
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS customer_running_total,
    
    -- Gemiddelde order amount van klant tot nu toe
    AVG(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS customer_avg_to_date
    
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY customer_id, order_date;

-- MULTIPLE OFFSETS: Kijk meerdere rijen vooruit/achteruit
SELECT
    date,
    sales,
    
    -- Vorige dag
    LAG(sales, 1, 0) OVER (ORDER BY date) AS prev_day,
    
    -- 7 dagen geleden
    LAG(sales, 7, 0) OVER (ORDER BY date) AS week_ago,
    
    -- Week-over-week verschil
    sales - LAG(sales, 7, 0) OVER (ORDER BY date) AS wow_difference,
    
    -- 30 dagen geleden (maandoverzicht)
    LAG(sales, 30, 0) OVER (ORDER BY date) AS month_ago
    
FROM daily_sales
ORDER BY date;

FIRST_VALUE, LAST_VALUE en NTH_VALUE

-- FIRST_VALUE: Eerste waarde in het window
-- LAST_VALUE: Laatste waarde in het window
-- NTH_VALUE: N-de waarde in het window

-- PRAKTISCH VOORBEELD: Product price analysis
SELECT
    product_id,
    price_date,
    price,
    
    -- Eerste prijs in de reeks
    FIRST_VALUE(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_price,
    
    -- Laatste prijs in de reeks
    LAST_VALUE(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_price,
    
    -- Hoogste prijs tot nu toe
    MAX(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS max_price_to_date,
    
    -- Laagste prijs tot nu toe
    MIN(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS min_price_to_date,
    
    -- Verschil met startprijs
    price - FIRST_VALUE(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS diff_from_first,
    
    -- Percentage verandering vanaf start
    ROUND(
        (price - FIRST_VALUE(price) OVER (
            PARTITION BY product_id
            ORDER BY price_date
        )) * 100.0 / 
        FIRST_VALUE(price) OVER (
            PARTITION BY product_id
            ORDER BY price_date
        ),
        2
    ) AS pct_change_from_first
    
FROM product_prices
ORDER BY product_id, price_date;

-- NTH_VALUE: Find specific position in window
SELECT
    employee_id,
    department_id,
    salary,
    
    -- Tweede hoogste salaris in department
    NTH_VALUE(salary, 2) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_highest_salary,
    
    -- Derde hoogste salaris in department
    NTH_VALUE(salary, 3) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS third_highest_salary,
    
    -- Verschil met tweede hoogste
    salary - NTH_VALUE(salary, 2) OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_second_highest
    
FROM employees
ORDER BY department_id, salary DESC;

-- COMBINATIE VAN FUNCTIES: Complex business scenario
SELECT
    customer_id,
    order_date,
    order_amount,
    
    -- Eerste order amount
    FIRST_VALUE(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS first_order_amount,
    
    -- Meest recente order amount
    LAST_VALUE(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS latest_order_amount,
    
    -- Gemiddelde van eerste en laatste order
    (
        FIRST_VALUE(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) +
        LAST_VALUE(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        )
    ) / 2.0 AS avg_first_last,
    
    -- Trend: stijgend/dalend t.o.v. eerste order
    CASE
        WHEN order_amount > FIRST_VALUE(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) THEN 'Increasing'
        WHEN order_amount < FIRST_VALUE(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) THEN 'Decreasing'
        ELSE 'Stable'
    END AS trend_vs_first_order
    
FROM orders
ORDER BY customer_id, order_date;

5. Aggregate Window Functions (SUM, AVG, COUNT OVER)

Running Calculations en Cumulative Aggregates

Aggregate window functions laten je aggregaties uitvoeren over een window terwijl individuele rijen behouden blijven. Perfect voor running totals, moving averages en cumulative distributions.

SUM, AVG, COUNT OVER Examples

-- BASIS AGGREGATE WINDOW FUNCTIONS

-- 1. RUNNING TOTAL (Cumulative sum)
SELECT
    order_date,
    daily_sales,
    
    -- Running total (default: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    SUM(daily_sales) OVER (
        ORDER BY order_date
    ) AS running_total,
    
    -- Running total met PARTITION BY (per jaar)
    SUM(daily_sales) OVER (
        PARTITION BY YEAR(order_date)
        ORDER BY order_date
    ) AS ytd_running_total
    
FROM daily_sales
ORDER BY order_date;

-- 2. MOVING AVERAGE
SELECT
    date,
    sales,
    
    -- 7-day moving average
    AVG(sales) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d,
    
    -- 30-day moving average
    AVG(sales) OVER (
        ORDER BY date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS moving_avg_30d,
    
    -- Centered moving average (3 dagen voor en na)
    AVG(sales) OVER (
        ORDER BY date
        ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ) AS centered_moving_avg
    
FROM daily_sales
ORDER BY date;

-- 3. CUMULATIVE COUNT en PERCENTAGE
SELECT
    customer_id,
    order_date,
    order_amount,
    
    -- Aantal orders tot nu toe
    COUNT(*) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS order_number,
    
    -- Percentage van customer total
    order_amount * 100.0 / SUM(order_amount) OVER (
        PARTITION BY customer_id
    ) AS pct_of_customer_total,
    
    -- Cumulative percentage van customer total
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) * 100.0 / SUM(order_amount) OVER (
        PARTITION BY customer_id
    ) AS cumulative_pct_of_total
    
FROM orders
ORDER BY customer_id, order_date;

-- 4. MIN en MAX OVER TIME
SELECT
    product_id,
    price_date,
    price,
    
    -- Laagste prijs tot nu toe
    MIN(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS min_price_to_date,
    
    -- Hoogste prijs tot nu toe
    MAX(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS max_price_to_date,
    
    -- Verschil met laagste prijs tot nu toe
    price - MIN(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS diff_from_min,
    
    -- Percentage boven laagste prijs
    (price - MIN(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    )) * 100.0 / 
    MIN(price) OVER (
        PARTITION BY product_id
        ORDER BY price_date
    ) AS pct_above_min
    
FROM product_prices
ORDER BY product_id, price_date;

Advanced Aggregate Patterns

-- PATTERN 1: Expanding window vs Sliding window
SELECT
    date,
    revenue,
    
    -- Expanding window (cumulative)
    SUM(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_revenue,
    
    -- Sliding window (last 7 days)
    SUM(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS trailing_7d_revenue,
    
    -- Centered window (3 before, 3 after)
    AVG(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ) AS centered_7d_avg
    
FROM daily_metrics
ORDER BY date;

-- PATTERN 2: Percentage of total per group
SELECT
    department,
    employee_name,
    salary,
    
    -- Salary als percentage van department total
    salary * 100.0 / SUM(salary) OVER (
        PARTITION BY department
    ) AS pct_of_dept_total,
    
    -- Salary als percentage van company total
    salary * 100.0 / SUM(salary) OVER () AS pct_of_company_total,
    
    -- Department total als percentage van company
    SUM(salary) OVER (
        PARTITION BY department
    ) * 100.0 / SUM(salary) OVER () 
        AS dept_pct_of_company
    
FROM employees
ORDER BY department, salary DESC;

-- PATTERN 3: Running difference from average
SELECT
    month,
    actual_sales,
    forecast_sales,
    
    -- Running average van forecast
    AVG(forecast_sales) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS forecast_moving_avg,
    
    -- Verschil met running average
    actual_sales - AVG(forecast_sales) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS diff_from_forecast_avg,
    
    -- Cumulative forecast error
    SUM(actual_sales - forecast_sales) OVER (
        ORDER BY month
    ) AS cumulative_forecast_error
    
FROM sales_forecast
ORDER BY month;

-- PATTERN 4: Hierarchical cumulative calculations
SELECT
    region,
    country,
    city,
    sales_amount,
    
    -- Cumulative per city
    SUM(sales_amount) OVER (
        PARTITION BY region, country, city
        ORDER BY sales_date
    ) AS cumulative_city_sales,
    
    -- Cumulative per country
    SUM(sales_amount) OVER (
        PARTITION BY region, country
        ORDER BY sales_date
    ) AS cumulative_country_sales,
    
    -- Cumulative per region
    SUM(sales_amount) OVER (
        PARTITION BY region
        ORDER BY sales_date
    ) AS cumulative_region_sales,
    
    -- Percentage van region total
    sales_amount * 100.0 / SUM(sales_amount) OVER (
        PARTITION BY region
    ) AS pct_of_region
    
FROM regional_sales
ORDER BY region, country, city, sales_date;

-- PATTERN 5: Gap analysis with COUNT
SELECT
    date,
    
    -- Aantal dagen met sales
    COUNT(sales_amount) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS days_with_sales,
    
    -- Aantal opeenvolgende dagen met sales
    CASE
        WHEN sales_amount > 0 
        THEN ROW_NUMBER() OVER (
            PARTITION BY streak_group
            ORDER BY date
        )
        ELSE 0
    END AS consecutive_days_streak
    
FROM (
    SELECT
        date,
        sales_amount,
        SUM(CASE WHEN sales_amount = 0 THEN 1 ELSE 0 END) OVER (
            ORDER BY date
        ) AS streak_group
    FROM daily_sales
) streaks
ORDER BY date;

6. Window Frame Specificatie (ROWS vs RANGE)

Window Frame Concepts

Window frame specificatie bepaalt precies welke rijen in het window worden opgenomen voor berekeningen. Het onderscheid tussen ROWS en RANGE is cruciaal voor correcte resultaten.

ROWS vs RANGE Comparison

-- ROWS: Fysieke rijen tellen
-- RANGE: Logische waarden groeperen

-- VOORBEELD DATA:
-- date       | sales
-- 2024-01-01 | 100
-- 2024-01-01 | 150  (zelfde datum)
-- 2024-01-02 | 200
-- 2024-01-03 | 250

SELECT
    date,
    sales,
    
    -- ROWS: Telt 2 fysieke rijen terug
    SUM(sales) OVER (
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_rows_2_preceding,
    
    -- RANGE: Neemt alle rijen metzelfde of eerdere datum binnen range
    SUM(sales) OVER (
        ORDER BY date
        RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
    ) AS sum_range_2_days,
    
    -- UNBOUNDED PRECEDING: Alles vanaf start
    SUM(sales) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sum,
    
    -- BETWEEN n PRECEDING AND m FOLLOWING
    AVG(sales) OVER (
        ORDER BY date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS centered_avg
    
FROM daily_sales
ORDER BY date;

-- RESULTATEN:
-- date       | sales | sum_rows_2_preceding | sum_range_2_days | cumulative_sum | centered_avg
-- 2024-01-01 | 100   | 100                  | 250              | 100            | 125
-- 2024-01-01 | 150   | 250 (100+150)        | 250              | 250            | 150
-- 2024-01-02 | 200   | 450 (150+200+100)    | 450              | 450            | 150
-- 2024-01-03 | 250   | 600 (200+250+150)    | 700              | 700            | 225

-- ADVANCED FRAME PATTERNS
SELECT
    timestamp,
    metric_value,
    
    -- Rolling window van laatste 5 minuten (RANGE)
    AVG(metric_value) OVER (
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '5' MINUTE PRECEDING AND CURRENT ROW
    ) AS avg_last_5_minutes,
    
    -- Expanding window (standaard als ORDER BY zonder frame)
    SUM(metric_value) OVER (
        ORDER BY timestamp
    ) AS default_expanding_sum,
    
    -- Explicit hele window
    SUM(metric_value) OVER (
        ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS total_sum,
    
    -- Window zonder CURRENT ROW
    SUM(metric_value) OVER (
        ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS sum_excluding_current,
    
    -- Only following rows
    AVG(metric_value) OVER (
        ORDER BY timestamp
        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
    ) AS avg_next_3
    
FROM time_series_metrics
ORDER BY timestamp;

Frame Boundary Options

-- COMPLETE FRAME BOUNDARY REFERENCE

-- 1. UNBOUNDED PRECEDING: All rows from partition start
SUM(amount) OVER (
    ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) -- Default frame bij ORDER BY zonder frame specificatie

-- 2. n PRECEDING: n rows before current row
AVG(temperature) OVER (
    ORDER BY reading_time
    ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) -- Moving average laatste 6 readings

-- 3. CURRENT ROW: Current row only
ROW_NUMBER() OVER (
    ORDER BY score
    ROWS BETWEEN CURRENT ROW AND CURRENT ROW
) -- Niet praktisch, maar illustratief

-- 4. n FOLLOWING: n rows after current row
MIN(price) OVER (
    ORDER BY effective_date
    ROWS BETWEEN CURRENT ROW AND 30 FOLLOWING
) -- Minimum prijs in komende 30 dagen

-- 5. UNBOUNDED FOLLOWING: All rows to partition end
LAST_VALUE(amount) OVER (
    ORDER BY date
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) -- Laatste waarde vanaf huidige rij

-- 6. MIXED BOUNDARIES: Complex frames
AVG(value) OVER (
    ORDER BY timestamp
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
) -- Centered 5-point moving average

-- 7. RANGE met INTERVAL: Voor datetime columns
SUM(revenue) OVER (
    ORDER BY transaction_time
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
) -- Revenue laatste uur

-- 8. RANGE met numerieke offsets
COUNT(*) OVER (
    ORDER BY score
    RANGE BETWEEN 10 PRECEDING AND CURRENT ROW
) -- Aantal scores binnen 10 punten van huidige

-- PRACTICAL EXAMPLE: Complex business calculation
SELECT
    employee_id,
    evaluation_date,
    performance_score,
    
    -- Average van huidige en vorige 2 evaluaties
    AVG(performance_score) OVER (
        PARTITION BY employee_id
        ORDER BY evaluation_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3_evaluations,
    
    -- Trend: gemiddelde laatste 2 vs eerste 2
    AVG(performance_score) OVER (
        PARTITION BY employee_id
        ORDER BY evaluation_date
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) - 
    AVG(performance_score) OVER (
        PARTITION BY employee_id
        ORDER BY evaluation_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING
    ) AS trend_vs_early_performance,
    
    -- Percentage van beste score in laatste jaar
    performance_score * 100.0 / 
    MAX(performance_score) OVER (
        PARTITION BY employee_id
        ORDER BY evaluation_date
        RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW
    ) AS pct_of_best_last_year
    
FROM performance_reviews
ORDER BY employee_id, evaluation_date;

Klaar voor Geavanceerde SQL Projecten?

Vind de juiste SQL experts of plaats je vacature voor complexe data analytics projecten

7. PARTITION BY Deep Dive

Data Partitioning Strategieën

PARTITION BY verdeelt data in onafhankelijke windows voor parallelle berekeningen. Het is vergelijkbaar met GROUP BY maar behoudt individuele rijen.

Multi-Column Partitioning

-- MULTI-LEVEL PARTITIONING: Complex business scenarios

-- Scenario: Retail sales analysis per store, department, en product category
SELECT
    store_id,
    department_id,
    category_id,
    product_id,
    sales_date,
    daily_sales,
    
    -- Ranking binnen category per store
    RANK() OVER (
        PARTITION BY store_id, department_id, category_id
        ORDER BY daily_sales DESC
    ) AS sales_rank_in_category,
    
    -- Percentage van category total per store
    daily_sales * 100.0 / 
    SUM(daily_sales) OVER (
        PARTITION BY store_id, department_id, category_id
    ) AS pct_of_category_sales,
    
    -- Running total per store-department
    SUM(daily_sales) OVER (
        PARTITION BY store_id, department_id
        ORDER BY sales_date
    ) AS running_total_dept,
    
    -- Verschil met gemiddelde vanzelfde category in andere stores
    daily_sales - 
    AVG(daily_sales) OVER (
        PARTITION BY department_id, category_id, sales_date
    ) AS diff_from_category_avg_all_stores,
    
    -- Percentage boven/beneden store gemiddelde voor category
    (daily_sales - 
     AVG(daily_sales) OVER (
        PARTITION BY store_id, department_id, category_id
     )) * 100.0 / 
    AVG(daily_sales) OVER (
        PARTITION BY store_id, department_id, category_id
    ) AS pct_vs_store_category_avg
    
FROM retail_sales_daily
WHERE sales_date BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY store_id, department_id, category_id, sales_rank_in_category;

-- HIERARCHICAL PARTITIONING: Drill-down analysis
SELECT
    region,
    country,
    city,
    sales_person,
    sales_amount,
    sales_date,
    
    -- Percentage van region total
    sales_amount * 100.0 / 
    SUM(sales_amount) OVER (
        PARTITION BY region
    ) AS pct_of_region,
    
    -- Percentage van country total binnen region
    sales_amount * 100.0 / 
    SUM(sales_amount) OVER (
        PARTITION BY region, country
    ) AS pct_of_country,
    
    -- Percentage van city total binnen country
    sales_amount * 100.0 / 
    SUM(sales_amount) OVER (
        PARTITION BY region, country, city
    ) AS pct_of_city,
    
    -- Ranking binnen city
    ROW_NUMBER() OVER (
        PARTITION BY region, country, city
        ORDER BY sales_amount DESC
    ) AS rank_in_city,
    
    -- Cumulative per sales person in city
    SUM(sales_amount) OVER (
        PARTITION BY region, country, city, sales_person
        ORDER BY sales_date
    ) AS cumulative_per_person
    
FROM regional_sales
ORDER BY region, country, city, rank_in_city;

8. Geavanceerde Patterns en Use Cases

Complex Business Scenarios

Geavanceerde window function patterns lossen complexe business problemen op die anders multiple self-joins of procedural code zouden vereisen.

Advanced Business Patterns

-- PATTERN 1: Sessionization - Group sequential events
WITH UserSessions AS (
    SELECT
        user_id,
        event_timestamp,
        event_type,
        
        -- Bepaal of er een gap van > 30 minuten was
        CASE
            WHEN DATEDIFF(
                minute,
                LAG(event_timestamp) OVER (
                    PARTITION BY user_id
                    ORDER BY event_timestamp
                ),
                event_timestamp
            ) > 30
            THEN 1
            ELSE 0
        END AS is_new_session,
        
        -- Genereer session ID
        SUM(
            CASE
                WHEN DATEDIFF(
                    minute,
                    LAG(event_timestamp) OVER (
                        PARTITION BY user_id
                        ORDER BY event_timestamp
                    ),
                    event_timestamp
                ) > 30
                THEN 1
                ELSE 0
            END
        ) OVER (
            PARTITION BY user_id
            ORDER BY event_timestamp
        ) AS session_id
        
    FROM user_events
    WHERE event_date = '2024-12-20'
)
SELECT
    user_id,
    session_id,
    MIN(event_timestamp) AS session_start,
    MAX(event_timestamp) AS session_end,
    COUNT(*) AS event_count,
    DATEDIFF(minute, MIN(event_timestamp), MAX(event_timestamp)) AS session_duration_minutes
FROM UserSessions
GROUP BY user_id, session_id
ORDER BY user_id, session_start;

-- PATTERN 2: Gap and Island analysis
WITH NumberedStatus AS (
    SELECT
        equipment_id,
        status_date,
        status,
        
        -- Groep opeenvolgende dagen metzelfde status
        ROW_NUMBER() OVER (ORDER BY status_date) -
        ROW_NUMBER() OVER (
            PARTITION BY equipment_id, status
            ORDER BY status_date
        ) AS island_group
        
    FROM equipment_status
    WHERE equipment_id = 123
)
SELECT
    equipment_id,
    status,
    MIN(status_date) AS period_start,
    MAX(status_date) AS period_end,
    DATEDIFF(day, MIN(status_date), MAX(status_date)) + 1 AS days_in_status
FROM NumberedStatus
GROUP BY equipment_id, status, island_group
ORDER BY period_start;

-- PATTERN 3: Customer cohort analysis
WITH FirstPurchase AS (
    SELECT
        customer_id,
        MIN(order_date) AS first_purchase_date,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM orders
    GROUP BY customer_id
),
MonthlyActivity AS (
    SELECT
        fp.customer_id,
        fp.cohort_month,
        DATE_TRUNC('month', o.order_date) AS activity_month,
        COUNT(DISTINCT o.order_id) AS order_count,
        SUM(o.order_amount) AS total_spent
    FROM FirstPurchase fp
    JOIN orders o ON fp.customer_id = o.customer_id
    GROUP BY fp.customer_id, fp.cohort_month, DATE_TRUNC('month', o.order_date)
)
SELECT
    cohort_month,
    activity_month,
    DATEDIFF(month, cohort_month, activity_month) AS months_since_first_purchase,
    COUNT(DISTINCT customer_id) AS active_customers,
    AVG(order_count) AS avg_orders_per_customer,
    AVG(total_spent) AS avg_spent_per_customer,
    
    -- Retention rate vs first month
    COUNT(DISTINCT customer_id) * 100.0 / 
    FIRST_VALUE(COUNT(DISTINCT customer_id)) OVER (
        PARTITION BY cohort_month
        ORDER BY activity_month
    ) AS retention_rate_pct
    
FROM MonthlyActivity
GROUP BY cohort_month, activity_month
ORDER BY cohort_month, activity_month;

9. Performance Optimalisatie

Window Functions Performance Tips

Performance optimalisatie voor window functions is essentieel bij grote datasets. Goede praktijken kunnen queries 10-100x versnellen.

Performance Best Practices

-- PERFORMANCE OPTIMALISATIE TIPS

-- 1. Gebruik indexes op PARTITION BY en ORDER BY columns
CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date);

-- 2. Beperk window size waar mogelijk
-- SLECHT: Hele tabel in window
SUM(amount) OVER (PARTITION BY customer_id ORDER BY date)

-- BETER: Beperk tot laatste 90 dagen
SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY date
    ROWS BETWEEN 89 PRECEDING AND CURRENT ROW
)

-- 3. Vermijd nested window functions
-- SLECHT: Nested calculations
SELECT
    AVG(SUM(amount) OVER (PARTITION BY customer_id)) 
    OVER (PARTITION BY region)
FROM orders;

-- BETER: Gebruik CTEs
WITH CustomerTotals AS (
    SELECT
        customer_id,
        region,
        SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id, region
)
SELECT
    region,
    AVG(total_amount) OVER (PARTITION BY region)
FROM CustomerTotals;

-- 4. Filter resultaten buiten de window function
-- SLECHT: Filter binnen window function
SELECT
    SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END)
    OVER (PARTITION BY customer_id)
FROM orders;

-- BETER: Filter eerst, dan window function
WITH CompletedOrders AS (
    SELECT customer_id, amount
    FROM orders
    WHERE status = 'completed'
)
SELECT
    customer_id,
    SUM(amount) OVER (PARTITION BY customer_id)
FROM CompletedOrders;

-- 5. Gebruik materialized views voor vaak gebruikte windows
CREATE MATERIALIZED VIEW mv_customer_running_totals AS
SELECT
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders;

-- 6. Vermijd RANGE bij grote datasets
-- ROWS is meestal sneller dan RANGE
-- RANGE kan leiden tot onverwachte performance issues

-- 7. Partitionering optimalisatie
EXPLAIN ANALYZE
SELECT
    customer_id,
    SUM(amount) OVER (PARTITION BY customer_id)
FROM orders;

-- 8. Batch processing voor zeer grote datasets
CREATE PROCEDURE process_large_window_batches()
AS $$
DECLARE
    batch_size INT := 100000;
    offset_val INT := 0;
BEGIN
    WHILE TRUE LOOP
        INSERT INTO processed_results
        SELECT
            customer_id,
            order_date,
            SUM(amount) OVER (
                PARTITION BY customer_id
                ORDER BY order_date
            ) AS running_total
        FROM orders
        WHERE customer_id BETWEEN offset_val AND offset_val + batch_size
        ORDER BY customer_id, order_date;
        
        offset_val := offset_val + batch_size;
        
        EXIT WHEN offset_val > (SELECT MAX(customer_id) FROM orders);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

10. Real-World Business Examples

Praktische Business Use Cases

Real-world voorbeelden tonen hoe window functions worden gebruikt in verschillende industrieën voor kritieke business analyses.

E-commerce: Customer Behavior Analysis

-- COMPLETE CUSTOMER JOURNEY ANALYSIS
WITH CustomerJourney AS (
    SELECT
        customer_id,
        session_id,
        event_timestamp,
        event_type,
        page_url,
        
        -- Time since previous event
        DATEDIFF(
            second,
            LAG(event_timestamp) OVER (
                PARTITION BY customer_id, session_id
                ORDER BY event_timestamp
            ),
            event_timestamp
        ) AS seconds_since_prev_event,
        
        -- Session duration so far
        DATEDIFF(
            second,
            FIRST_VALUE(event_timestamp) OVER (
                PARTITION BY customer_id, session_id
                ORDER BY event_timestamp
            ),
            event_timestamp
        ) AS seconds_in_session,
        
        -- Page sequence number
        ROW_NUMBER() OVER (
            PARTITION BY customer_id, session_id
            ORDER BY event_timestamp
        ) AS page_sequence,
        
        -- Conversion flag (purchase event)
        MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) OVER (
            PARTITION BY customer_id, session_id
        ) AS session_converted
        
    FROM customer_events
    WHERE event_date = '2024-12-20'
)
SELECT
    customer_id,
    session_id,
    COUNT(*) AS total_events,
    MIN(event_timestamp) AS session_start,
    MAX(event_timestamp) AS session_end,
    MAX(seconds_in_session) AS session_duration_seconds,
    AVG(seconds_since_prev_event) AS avg_time_between_events,
    MAX(session_converted) AS converted,
    
    -- Customer's average session duration
    AVG(MAX(seconds_in_session)) OVER (
        PARTITION BY customer_id
    ) AS customer_avg_session_duration,
    
    -- Percentage of customer's total sessions
    100.0 / COUNT(*) OVER (
        PARTITION BY customer_id
    ) AS pct_of_customer_sessions
    
FROM CustomerJourney
GROUP BY customer_id, session_id
ORDER BY customer_id, session_start;

Financial Services: Fraud Detection

-- ADVANCED FRAUD DETECTION PATTERNS
WITH TransactionAnalysis AS (
    SELECT
        transaction_id,
        customer_id,
        transaction_timestamp,
        transaction_amount,
        merchant_id,
        location,
        
        -- Time since last transaction
        DATEDIFF(
            minute,
            LAG(transaction_timestamp) OVER (
                PARTITION BY customer_id
                ORDER BY transaction_timestamp
            ),
            transaction_timestamp
        ) AS minutes_since_last_tx,
        
        -- Amount difference from average
        transaction_amount - 
        AVG(transaction_amount) OVER (
            PARTITION BY customer_id
            ORDER BY transaction_timestamp
            ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING
        ) AS amount_diff_from_avg,
        
        -- Unusual location pattern
        CASE
            WHEN location != 
                 LAG(location, 1) OVER (
                     PARTITION BY customer_id
                     ORDER BY transaction_timestamp
                 )
                 AND DATEDIFF(
                     minute,
                     LAG(transaction_timestamp, 1) OVER (
                         PARTITION BY customer_id
                         ORDER BY transaction_timestamp
                     ),
                     transaction_timestamp
                 ) < 30
            THEN 1
            ELSE 0
        END AS rapid_location_change,
        
        -- Transaction velocity (last hour)
        COUNT(*) OVER (
            PARTITION BY customer_id
            ORDER BY transaction_timestamp
            RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
        ) AS tx_count_last_hour,
        
        -- Total amount last hour
        SUM(transaction_amount) OVER (
            PARTITION BY customer_id
            ORDER BY transaction_timestamp
            RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
        ) AS amount_last_hour
        
    FROM transactions
    WHERE transaction_date = '2024-12-20'
)
SELECT
    *,
    
    -- Fraud risk score (0-100)
    CASE
        WHEN minutes_since_last_tx < 1 THEN 20
        ELSE 0
    END +
    CASE
        WHEN ABS(amount_diff_from_avg) > 
             AVG(transaction_amount) OVER (
                 PARTITION BY customer_id
             ) * 3
        THEN 30
        ELSE 0
    END +
    CASE
        WHEN rapid_location_change = 1 THEN 25
        ELSE 0
    END +
    CASE
        WHEN tx_count_last_hour > 10 THEN 15
        WHEN tx_count_last_hour > 5 THEN 10
        ELSE 0
    END +
    CASE
        WHEN amount_last_hour > 10000 THEN 10
        ELSE 0
    END AS fraud_risk_score
    
FROM TransactionAnalysis
WHERE fraud_risk_score >= 50  -- High risk threshold
ORDER BY fraud_risk_score DESC;

11. Window Functions vs GROUP BY vs Self-Joins

Performance en Readability Comparison

Vergelijking van verschillende benaderingen voor complexe analytics helpt bij het kiezen van de juiste oplossing voor elk scenario.

Side-by-Side Comparison

-- USE CASE: Running total per customer

-- 1. WINDOW FUNCTION (BEST)
SELECT
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS running_total
FROM orders
ORDER BY customer_id, order_date;

-- 2. SELF-JOIN (INEFFICIENT)
SELECT
    o1.customer_id,
    o1.order_date,
    o1.order_amount,
    SUM(o2.order_amount) AS running_total
FROM orders o1
JOIN orders o2 ON o1.customer_id = o2.customer_id
    AND o2.order_date <= o1.order_date
GROUP BY o1.customer_id, o1.order_date, o1.order_amount
ORDER BY o1.customer_id, o1.order_date;

-- 3. CORRELATED SUBQUERY (WORST)
SELECT
    customer_id,
    order_date,
    order_amount,
    (
        SELECT SUM(order_amount)
        FROM orders o2
        WHERE o2.customer_id = o1.customer_id
            AND o2.order_date <= o1.order_date
    ) AS running_total
FROM orders o1
ORDER BY customer_id, order_date;

-- PERFORMANCE STATISTIEKEN (voorbeeld op 1M rijen):
-- Window Function: 2 seconden, 1 tabel scan
-- Self-Join: 45 seconden, O(n²) complexity
-- Correlated Subquery: 3+ minuten, n subqueries

-- USE CASE: Gap analysis (days between orders)

-- 1. WINDOW FUNCTION (CLEAN)
SELECT
    customer_id,
    order_date,
    LAG(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS previous_order_date,
    DATEDIFF(
        day,
        LAG(order_date) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ),
        order_date
    ) AS days_since_previous
FROM orders;

-- 2. ALTERNATIVE APPROACHES zijn veel complexer en minder efficient

-- KEY TAKEAWAYS:
-- 1. Window functions zijn meestal de beste keuze
-- 2. Ze verbeteren performance dramatisch
-- 3. Code is leesbaarder en onderhoudbaarder
-- 4. Minder kans op bugs en errors
-- 5. Betere optimizer support in moderne databases

Conclusie en Best Practices Checklist

Window Functions Cheat Sheet

Ranking Functions
  • ROW_NUMBER() - Unieke nummers
  • RANK() - Gelijk bij ties, met gaps
  • DENSE_RANK() - Gelijk bij ties, geen gaps
  • NTILE(n) - Verdeel in n groepen
Analytic Functions
  • LEAD(expr, n) - Volgende rij
  • LAG(expr, n) - Vorige rij
  • FIRST_VALUE(expr) - Eerste in window
  • LAST_VALUE(expr) - Laatste in window
Aggregate Functions
  • SUM(expr) OVER - Running total
  • AVG(expr) OVER - Moving average
  • COUNT(expr) OVER - Cumulative count
  • MIN/MAX(expr) OVER - Running min/max

Mastery Checklist

  • Begrijp het verschil tussen ROWS en RANGE
  • Weet wanneer PARTITION BY te gebruiken
  • Beheers window frame specificatie
  • Kan complexe business scenarios oplossen
  • Begrijp performance implicaties
  • Kan window functions combineren
  • Weet alternatieve benaderingen te vermijden
  • Kan query's debuggen en optimaliseren

Veelgestelde Vragen (FAQ)

Q: Zijn window functions beschikbaar in alle databases?

A: Ja, window functions worden ondersteund in: PostgreSQL, MySQL 8.0+, SQL Server 2005+, Oracle, SQLite 3.25+, Snowflake, Redshift, BigQuery. Alleen MySQL vóór versie 8.0 heeft beperkte ondersteuning.

Q: Kan ik window functions gebruiken in WHERE clause?

A: Nee, window functions kunnen niet direct in WHERE gebruikt worden. Gebruik een subquery of CTE: SELECT * FROM (SELECT ..., ROW_NUMBER() OVER() rn FROM table) WHERE rn = 1

Q: Wat is het verschil tussen ROWS en RANGE bij dubbele waarden?

A: Bij dubbele ORDER BY waarden: ROWS behandelt elke rij apart, RANGE groepeert rijen met gelijke waarden. RANGE kan leiden tot onverwachte resultaten bij ties.

Q: Hoe sorteer ik window function resultaten?

A: Window function ORDER BY bepaalt de window volgorde. Query ORDER BY bepaalt de uiteindelijke uitvoervolgorde. Ze kunnen verschillend zijn.