LAG() 和 LEAD() 的用法:
一、基本概念
这两个函数用于访问结果集中当前行之前或之后的行数据,而无需使用自连接。
1. LAG() - 获取前一行的值
LAG(column, offset, default_value) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
2. LEAD() - 获取后一行的值
LEAD(column, offset, default_value) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
二、参数详解
| 参数 |
说明 |
|---|
column |
要访问的列名 |
offset |
偏移量(默认1,可选) |
default_value |
当没有前/后行时的默认值(可选) |
PARTITION BY |
分区子句,在每个分区内独立计算 |
ORDER BY |
排序子句,决定行的前后顺序 |
三、使用示例
示例数据
CREATE TABLE sales (
sale_date DATE,
product VARCHAR(50),
amount DECIMAL(10,2)
);
INSERT INTO sales VALUES
('2024-01-01', 'A', 100),
('2024-01-02', 'A', 150),
('2024-01-03', 'A', 120),
('2024-01-01', 'B', 200),
('2024-01-02', 'B', 180),
('2024-01-03', 'B', 220);
1. 基本用法
-- 获取前一天的销售额
SELECT
sale_date,
product,
amount,
LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount,
-- 计算日环比增长率
ROUND((amount - LAG(amount) OVER (PARTITION BY product ORDER BY sale_date))
/ LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) * 100, 2) as growth_rate
FROM sales
ORDER BY product, sale_date;
2. 指定偏移量
-- 获取前两天(偏移量为2)的数据
SELECT
sale_date,
product,
amount,
LAG(amount, 2) OVER (PARTITION BY product ORDER BY sale_date) as amount_2days_ago,
LEAD(amount, 2, 0) OVER (PARTITION BY product ORDER BY sale_date) as amount_2days_later
FROM sales;
3. 设置默认值
SELECT
sale_date,
product,
amount,
-- 当没有前一天数据时,使用0作为默认值
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount,
-- 当没有后一天数据时,使用NULL作为默认值
LEAD(amount, 1, NULL) OVER (PARTITION BY product ORDER BY sale_date) as next_amount
FROM sales;
四、实际应用场景
1. 计算环比/同比
-- 计算月度环比
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', sale_date) as month,
SUM(amount) as total_amount
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
month,
total_amount,
LAG(total_amount) OVER (ORDER BY month) as prev_month_amount,
ROUND((total_amount - LAG(total_amount) OVER (ORDER BY month))
/ LAG(total_amount) OVER (ORDER BY month) * 100, 2) as mom_growth
FROM monthly_sales;
2. 查找连续变化
-- 查找销售额连续下降的产品
WITH sales_trend AS (
SELECT
sale_date,
product,
amount,
LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount
FROM sales
)
SELECT *
FROM sales_trend
WHERE amount < prev_amount; -- 当前销售额小于前一天
3. 计算时间间隔
-- 计算用户连续登录的时间间隔
SELECT
user_id,
login_date,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) as last_login,
DATEDIFF(day,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date),
login_date) as days_between_logins
FROM user_logins;
4. 数据填充(向前/向后填充)
-- 向前填充缺失值
SELECT
date,
COALESCE(value,
LAG(value) IGNORE NULLS OVER (ORDER BY date)) as filled_value
FROM data_with_nulls;
五、与相关函数的对比
| 函数 |
方向 |
是否必须排序 |
典型用途 |
|---|
| LAG() |
向前看(之前) |
是 |
访问前一行的值 |
| LEAD() |
向后看(之后) |
是 |
访问后一行的值 |
| FIRST_VALUE() |
分区第一行 |
可选 |
获取分区第一个值 |
| LAST_VALUE() |
分区最后一行 |
可选 |
获取分区最后一个值 |
六、性能优化建议
合理使用索引:确保
PARTITION BY 和
ORDER BY 的列有索引
避免多层嵌套:尽量减少窗口函数的嵌套层级
使用CTE或子查询:复杂计算分步骤进行
限制窗口范围:使用
ROWS BETWEEN 子句限制计算范围
七、常见注意事项
ORDER BY 是关键:必须指定 ORDER BY 才能确定行的前后顺序
默认值处理:注意 NULL 值的处理,使用
COALESCE() 或默认值参数
分区边界:
LAG() 和
LEAD() 不会跨越分区边界
性能影响:大量数据时,窗口函数可能有性能开销
八、复杂示例:计算移动平均
-- 计算3天的移动平均
SELECT
sale_date,
product,
amount,
AVG(amount) OVER (
PARTITION BY product
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3days,
-- 使用LAG和LEAD手动计算
(LAG(amount, 2, 0) OVER (PARTITION BY product ORDER BY sale_date) +
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY sale_date) +
amount) / 3.0 as manual_moving_avg
FROM sales;
这两个函数是时间序列分析、趋势计算和数据对比的强大工具,掌握它们能大大简化许多复杂的SQL查询。