1、收到问题
今天收到研发人员咨询一个单表查询sql在测试环境查不出来,咨询我有没有优化方案
需求大概是这样的
报表需要呈现每种不同类型单据,每天第一单交易中止数据
类似于统计每个班人总成绩排名第一的性别为女的数据这种需求
2、问题复现
测试环境数据如下
erp_bill_index 交易单据主表 表数据有 220w条
略做删减 研发提交的sql如下
select * from
(
select *,row_number() over(partition by profileid,billtype,billdate order by billid ) as rn
from erp_bill_index
where profileid = 200006141 and billtype in(501,504)
) t
where rn = 1 and status = 30
limit 20
上述语句流程为通过窗口函数找到每天交易的第一张单据,然后再通过status字段判断是否是交易中止状态
执行时间为56s
执行计划为
派生表查询行数较多,还有排序
看一下这个语句的逻辑读与物理读
先记录一下执行之前的
执行后语句后看一下执行之后的
通过对比得知逻辑读增加 100w,物理读也增加了20w
关于
innodb_buffer_pool_read_requests
innodb_buffer_pool_reads
innodb_buffer_pool_read_requests
the number of logical read requests.
innodb_buffer_pool_reads
the number of logical reads that innodb could not satisfy from the buffer pool, and had to read directly from disk.
3、分析解决
通过需求描述与研发sql,第一直觉是可以通过松散索引扫描解决
选建立符合松散索引扫描规则的索引
alter table erp_bill_index add index idx_test1 (profileid,billtype,billdate,billid);
然后进行等价改写
with cte1 as -- 找出每个类型,每天的第一单
(
select profileid,billtype,billdate,min(billid) as billid from erp_bill_index
where profileid = 200006141 and billtype in(501,504)
group by profileid,billtype,billdate
)
select * from erp_bill_index a
inner join cte1 b on a.billid = b.billid
where a.profileid = 200006141 and a.status = 30
limit 20;
执行时间只需要0.031s
执行计划
可以从 extra = using index for group-by 看到用到了松散索引扫描
性能从 56s到0.031s 提升 1800倍
再来看一下逻辑读与物理读
执行改进sql之前的
执行改进sql之后的
物理读为0
逻辑读也只有6773
和改进前区别很大
这么大的性能提升,最后来讲讲怎么才能符合松散索引描扫描
必须符合的5点要求
1、the query is over a single table.
2、the group by names only columns that form a leftmost prefix of the index and no other columns. (if, instead of group by, the query has a distinct clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) for example, if a table t1 has an index on (c1,c2,c3), loose index scan is applicable if the query has group by c1, c2. it is not applicable if the query has group by c2, c3 (the columns are not a leftmost prefix) or group by c1, c2, c4 (c4 is not in the index).
3、the only aggregate functions used in the select list (if any) are min() and max(), and all of them refer to the same column. the column must be in the index and must immediately follow the columns in the group by.
4、any other parts of the index than those from the group by referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of min() or max() functions.
5、for columns in the index, full column values must be indexed, not just a prefix. for example, with c1 varchar(20), index (c1(10)), the index uses only a prefix of c1 values and cannot be used for loose index scan.
1、必须是单表查询
2、group by 字段必须是最左匹配,且group by字段都在索引里
3、聚合函数只能是min或max,如果min和max都有必须是同一列,且列也需要在索引里
4、where过滤必须是常量过滤,过滤的列也需要在索引里
5、对于字符字段,索引不能是前缀索引
官方文档还有实际的应用举例。直接点击上面链接查看
这里需要注意的是,松散索引扫描和索引区分度刚好是反着的。区分度越低性能越高,区分度越高性能越低。