Optimization case study: rewriting SQL with CASE WHEN

Posted by 168主编 on 2017-7-4 10:34:20

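Introduction

Today I'd like to share a SQL optimization case that takes an unconventional route: rewriting the SQL.

The scenario to optimize

The following entry turned up in the SLOW QUERY LOG:

...
# Query_time: 59.503827  Lock_time: 0.000198  Rows_sent: 641227  Rows_examined: 13442472  Rows_affected: 0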
...
select uid,sum(power) powerup from t1 where
date>='2017-03-31' and
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801 and
aType in (1,6,9) group by uid;
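Honestly, seeing this SQL made me want to swear: who on earth designed this? Splitting the date and the hour of a timestamp into two separate columns, and then gluing them back together into a new condition at query time, leaves me speechless. Rant over; the work still has to be done. We are DBAs, after all, and SQL optimization is what we do best, right?

The road to SQL optimization

SQL optimization approach

At the risk of repeating myself, here is the approach once more. To optimize a SQL statement, you generally start with the execution plan: check whether indexes are used wherever possible, pay attention to the estimated number of rows scanned, and look for a temporary table (Using temporary) or a sort (Using filesort), then find a way to eliminate them.

Locating the SQL performance bottleneck

Without question, optimization starts with the table DDL and the execution plan:

CREATE TABLE `t1` (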
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`date` date NOT NULL DEFAULT '0000-00-00',
`hour` char(2) NOT NULL DEFAULT '00',
`kid` int(4) NOT NULL DEFAULT '0',
`uid` int(11) NOT NULL DEFAULT '0',
`aType` tinyint(2) NOT NULL DEFAULT '0',
`src` tinyint(2) NOT NULL DEFAULT '1',
`aid` int(11) NOT NULL DEFAULT '1',
`acount` int(11) NOT NULL DEFAULT '1',
`power` decimal(20,2) DEFAULT '0.00',
PRIMARY KEY (`id`,`date`),
UNIQUE KEY `did` (`date`,`hour`,`kid`,`uid`,`aType`,`src`,`aid`)
) ENGINE=InnoDB AUTO_INCREMENT=50486620 DEFAULT CHARSET=utf8mb4
/*!50500 PARTITION BY RANGE COLUMNS(`date`)
(PARTITION p20170316 VALUES LESS THAN ('2017-03-17') ENGINE = InnoDB,
PARTITION p20170317 VALUES LESS THAN ('2017-03-18') ENGINE = InnoDB
...

yejr@imysql.com> EXPLAIN select uid,sum(power) powerup from t1 where
date>='2017-03-31' and
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))>=1490965200 and
UNIX_TIMESTAMP(STR_TO_DATE(concat(date,' ',hour),'%Y-%m-%d %H'))<1492174801 and
aType in (1,6,9) group by uid\G
*************************** 1. row ***************************
      id: 1
select_type: SIMPLE
   table: t1
partitions: p20170324,p20170325,....all partition
      type: ALL
possible_keys: did
      key: NULL
   key_len: NULL
      ref: NULL
      rows: 25005577
filtered: 15.00
   Extra: Using where; Using temporary; Using filesort
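
The two integer bounds in the filter above are Unix timestamps, and they are easier to reason about as wall-clock values. A minimal sketch using MySQL's built-in FROM_UNIXTIME(); the DATETIME values it returns depend on the session time zone, so none are shown here:

```sql
-- Convert the two bounds from the WHERE clause into DATETIME values.
-- The values returned depend on the session time_zone setting.
SELECT FROM_UNIXTIME(1490965200) AS range_start,
       FROM_UNIXTIME(1492174801) AS range_end,
       1492174801 - 1490965200   AS range_seconds;  -- 1209601 = 14 days + 1 second
```

Both bounds are constants, so they only need to be converted once. The expensive part of the original query is the other side of the comparison, where UNIX_TIMESTAMP(STR_TO_DATE(CONCAT(...))) has to be evaluated for every row.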
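Clearly this SQL is very inefficient: a full table scan, no index used, a temporary table, and an extra sort. Everything that could go wrong did.

Thinking about the optimization

This SQL computes the sum of the power column over the rows that match the conditions. The date column is indexed, but the WHERE clause wraps date in functions, and the condition is built from the combination of the date and hour columns, so the index cannot be used for it.

Fortunately, a sharp colleague had a flash of inspiration (in fact, she has always been good at SQL optimization): the SQL can be rewritten with CASE WHEN, like this:

select uid,sum(powerup+powerup1) from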
(
select uid,
      case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
      case when concat(date,' ',hour) < '2017-03-25 13:00' then power else '0' end as powerup1
from t1
where date>='2017-03-24'
and date <'2017-03-25'
and aType in (1,6,9)
) a group by uid;
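Clever, isn't it? The condition that could not use an index has simply been moved into CASE WHEN expressions. Here is the execution plan of the new SQL:

*************************** 1. row ***************************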
      id: 1
select_type: SIMPLE
   table: t1
partitions: p20170324
      type: range
possible_keys: did
      key: idx2_date_addRedType
   key_len: 4
      ref: NULL
      rows: 876375
filtered: 30.00
   Extra: Using index condition; Using temporary; Using filesort
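
One caveat about the rewritten query above: a row that satisfies both CASE conditions contributes power twice to sum(powerup+powerup1), and a row that satisfies only one of them still contributes once. If the goal is to reproduce the original time filter, a single CASE carrying both bounds is closer to it. The following is only a sketch under stated assumptions: the intended window runs from 2017-03-24 13:00 up to, but not including, 2017-03-25 13:00; hour is always zero-padded ('00' to '23'); and the literals are written as 'YYYY-MM-DD HH' so that they match the string that CONCAT produces.

```sql
-- Sketch only: conditional aggregation with both bounds in one CASE.
-- Assumes `hour` is zero-padded, so string comparison orders correctly.
SELECT uid,
       SUM(CASE
             WHEN CONCAT(`date`, ' ', `hour`) >= '2017-03-24 13'
              AND CONCAT(`date`, ' ', `hour`) <  '2017-03-25 13'
             THEN power
             ELSE 0
           END) AS powerup
FROM t1
WHERE `date` >= '2017-03-24'   -- plain range on `date`: index and partition pruning apply
  AND `date` <= '2017-03-25'   -- includes the day on which the window ends
  AND aType IN (1, 6, 9)
GROUP BY uid;
```

Unlike a WHERE filter, this form still returns a row (with a sum of 0) for any uid whose rows all fall outside the window but inside the date range, so the result set can be larger than that of the original query.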
Here is the execution cost of this SQL:

+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1834590 |
| Handler_read_last     | 0       |
| Handler_read_next     | 1834589 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 232276  |
| Handler_read_rnd_next | 232277  |
+-----------------------+---------+

And the information recorded in the SLOW QUERY LOG:

# Query_time: 6.381254  Lock_time: 0.000166  Rows_sent: 232276  Rows_examined: 2299141  Rows_affected: 0
# Bytes_sent: 4237347  Tmp_tables: 1  Tmp_disk_tables: 0  Tmp_table_sizes: 4187168
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 9311
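
For readers who want to collect the same kind of counters, the Handler_read% values shown above can be gathered per session roughly as follows. This is a sketch rather than the author's exact procedure; FLUSH STATUS requires the RELOAD privilege, and the SHOW STATUS statements may themselves add to the temporary-table counter, so treat that one as approximate.

```sql
-- Reset the session status counters so they reflect only what runs next.
FLUSH STATUS;

-- Run the statement under test here (for example, the rewritten query above).

-- Row-access counters: index lookups, sequential reads, random reads.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Temporary tables created, and how many of them went to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- Sort activity (rows sorted, merge passes) caused by filesort.
SHOW SESSION STATUS LIKE 'Sort%';
```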
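This still does not look ideal. The full table scan is gone, but there is still a temporary table and an extra sort, so let's find a way to eliminate them and compare again.

One change you may have noticed: the new SLOW QUERY LOG entry carries a lot more information. That is available only because the Percona branch is in use. It is a genuinely nice feature that can even record detailed profiling information, and I strongly recommend it.

Let's create an index on the uid column and see what the cost looks like once the temporary table and the sort are eliminated, and whether the overhead is lower.

yejr@imysql.com> ALTER TABLE t1 ADD INDEX idx_uid(uid);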
yejr@imysql.com> EXPLAIN select uid,sum(powerup+powerup1) from
(
select uid,
      case when concat(date,' ',hour) >='2017-03-24 13:00' then power else '0' end as powerup,
      case when concat(date,' ',hour) < '2017-03-25 13:00' then power else '0' end as powerup1
from t1
where date>='2017-03-24'
and date <'2017-03-25'
and aType in (1,6,9)
) a group by uid\G

*************************** 1. row ***************************
      id: 1
select_type: SIMPLE
   table: if_date_hour_army_count
partitions: p20170331,p20170401...
      type: index
possible_keys: did,idx_uid
      key: idx_uid
   key_len: 4
      ref: NULL
      rows: 12701520
filtered: 15.00
   Extra: Using where
Here is the execution cost of the SQL after adding the index:

+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1       |
| Handler_read_last     | 0       |
| Handler_read_next     | 1834589 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 0       |
| Handler_read_rnd_next | 0       |
+-----------------------+---------+

And the information recorded in the SLOW QUERY LOG:

# Query_time: 5.772286  Lock_time: 0.000330  Rows_sent: 232276  Rows_examined: 1834589  Rows_affected: 0
# Bytes_sent: 4215071  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 11470
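Note that after adding the index on uid the SQL touches more data pages (InnoDB_pages_distinct rises from 9311 to 11470), yet it runs faster (Query_time drops from 6.38 s to 5.77 s), because the temporary table and the extra sort have been eliminated. The Handler_read% counters show the same thing: Handler_read_rnd falls from 232276 to 0 and Handler_read_key from 1834590 to 1, so there is clearly more sequential I/O and less random I/O. That is why the query ends up faster even though it scans more data pages.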
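
To compare the two plans on the same table without dropping the new index, MySQL index hints can switch it on and off per statement. A minimal sketch, using a simplified form of the query (plain SUM(power), without the CASE expressions) purely to keep the example short:

```sql
-- Plan with the uid index excluded, as before the ALTER TABLE.
EXPLAIN
SELECT uid, SUM(power)
FROM t1 IGNORE INDEX (idx_uid)
WHERE `date` >= '2017-03-24' AND `date` < '2017-03-25'
  AND aType IN (1, 6, 9)
GROUP BY uid;

-- Plan with the uid index forced, as after the ALTER TABLE.
EXPLAIN
SELECT uid, SUM(power)
FROM t1 FORCE INDEX (idx_uid)
WHERE `date` >= '2017-03-24' AND `date` < '2017-03-25'
  AND aType IN (1, 6, 9)
GROUP BY uid;
```

Whether the forced plan is actually cheaper depends on the data distribution, so it is worth checking the status counters and timings for both variants rather than relying on EXPLAIN alone.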
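Postscript

Is there still room to optimize this SQL? Obviously yes: redesign the table and merge the date and hour columns into one. Then there is no need to laboriously piece the condition together, and the index can be used as well.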
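
A rough sketch of what that redesign could look like. The column name stat_time and the index name idx_stat_time are hypothetical and not part of the original schema, and the backfill assumes that no row still carries the '0000-00-00' default date.

```sql
-- Hypothetical combined column and index; names are illustrative only.
ALTER TABLE t1
  ADD COLUMN stat_time DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00',
  ADD INDEX idx_stat_time (stat_time);

-- Backfill from the existing columns.
-- On a table of this size, run the update in batches rather than in one statement.
UPDATE t1
SET stat_time = STR_TO_DATE(CONCAT(`date`, ' ', `hour`), '%Y-%m-%d %H');

-- The time filter becomes a plain range on one indexed column.
-- The constant bounds are converted once instead of once per row.
SELECT uid, SUM(power) AS powerup
FROM t1
WHERE stat_time >= FROM_UNIXTIME(1490965200)
  AND stat_time <  FROM_UNIXTIME(1492174801)
  AND aType IN (1, 6, 9)
GROUP BY uid;
```

If the table stays partitioned by date, keeping a plain range predicate on date next to the stat_time range should let partition pruning continue to work. Alternatively, the table could be repartitioned on the new column.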
Source: 腾云阁
