高效使用索引的首要條件是知道什么樣的查詢會使用到索引,這個問題和B+Tree中的“最左前綴原理”有關,下面通過例子說明最左前綴原理。
這里先說一下聯合索引的概念。在上文中,我們都是假設索引只引用了單個的列,實際上,MySQL中的索引可以以一定順序引用多個列,這種索引叫做聯合索引,一般的,一個聯合索引是一個有序元組,其中各個元素均為數據表的一列,實際上要嚴格定義索引需要用到關系代數,但是這里我不想討論太多關系代數的話題,因為那樣會顯得很枯燥,所以這里就不再做嚴格定義。另外,單列索引可以看成聯合索引元素數為1的特例。
以employees.titles表為例,下面先查看其上都有哪些索引:
~~~
SHOW INDEX FROM employees.titles;
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Null | Index_type |
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+
| titles | 0 | PRIMARY | 1 | emp_no | A | NULL | | BTREE |
| titles | 0 | PRIMARY | 2 | title | A | NULL | | BTREE |
| titles | 0 | PRIMARY | 3 | from_date | A | 443308 | | BTREE |
| titles | 1 | emp_no | 1 | emp_no | A | 443308 | | BTREE |
+--------+------------+----------+--------------+-------------+-----------+-------------+------+------------+
~~~
從結果中可以到titles表的主索引為,還有一個輔助索引。為了避免多個索引使事情變復雜(MySQL的SQL優化器在多索引時行為比較復雜),這里我們將輔助索引drop掉:
~~~
ALTER TABLE employees.titles DROP INDEX emp_no;
~~~
這樣就可以專心分析索引PRIMARY的行為了。
## 情況一:全列匹配。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title='Senior Engineer' AND from_date='1986-06-26';
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | titles | const | PRIMARY | PRIMARY | 59 | const,const,const | 1 | |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
~~~
很明顯,當按照索引中所有列進行精確匹配(這里精確匹配指“=”或“IN”匹配)時,索引可以被用到。這里有一點需要注意,理論上索引對順序是敏感的,但是由于MySQL的查詢優化器會自動調整where子句的條件順序以使用適合的索引,例如我們將where中的條件順序顛倒:
~~~
EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26' AND emp_no='10001' AND title='Senior Engineer';
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | titles | const | PRIMARY | PRIMARY | 59 | const,const,const | 1 | |
+----+-------------+--------+-------+---------------+---------+---------+-------------------+------+-------+
~~~
效果是一樣的。
## 情況二:最左前綴匹配。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | titles | ref | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------+
~~~
當查詢條件精確匹配索引的左邊連續一個或幾個列時,如或,所以可以被用到,但是只能用到一部分,即條件所組成的最左前綴。上面的查詢從分析結果看用到了PRIMARY索引,但是key_len為4,說明只用到了索引的第一列前綴。
## 情況三:查詢條件用到了索引中列的精確匹配,但是中間某個條件未提供。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | titles | ref | PRIMARY | PRIMARY | 4 | const | 1 | Using where |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
~~~
此時索引使用情況和情況二相同,因為title未提供,所以查詢只用到了索引的第一列,而后面的from_date雖然也在索引中,但是由于title不存在而無法和左前綴連接,因此需要對結果進行掃描過濾from_date(這里由于emp_no唯一,所以不存在掃描)。如果想讓from_date也使用索引而不是where過濾,可以增加一個輔助索引,此時上面的查詢會使用這個索引。除此之外,還可以使用一種稱之為“隔離列”的優化方法,將emp_no與from_date之間的“坑”填上。
首先我們看下title一共有幾種不同的值:
~~~
SELECT DISTINCT(title) FROM employees.titles;
+--------------------+
| title |
+--------------------+
| Senior Engineer |
| Staff |
| Engineer |
| Senior Staff |
| Assistant Engineer |
| Technique Leader |
| Manager |
+--------------------+
~~~
只有7種。在這種成為“坑”的列值比較少的情況下,可以考慮用“IN”來填補這個“坑”從而形成最左前綴:
~~~
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no='10001'
AND title IN ('Senior Engineer', 'Staff', 'Engineer', 'Senior Staff', 'Assistant Engineer', 'Technique Leader', 'Manager')
AND from_date='1986-06-26';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 59 | NULL | 7 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
~~~
這次key_len為59,說明索引被用全了,但是從type和rows看出IN實際上執行了一個range查詢,這里檢查了7個key。看下兩種查詢的性能比較:
~~~
SHOW PROFILES;
+----------+------------+-------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+-------------------------------------------------------------------------------+
| 10 | 0.00058000 | SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26'|
| 11 | 0.00052500 | SELECT * FROM employees.titles WHERE emp_no='10001' AND title IN ... |
+----------+------------+-------------------------------------------------------------------------------+
~~~
“填坑”后性能提升了一點。如果經過emp_no篩選后余下很多數據,則后者性能優勢會更加明顯。當然,如果title的值很多,用填坑就不合適了,必須建立輔助索引。
## 情況四:查詢條件沒有指定索引第一列。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE from_date='1986-06-26';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | titles | ALL | NULL | NULL | NULL | NULL | 443308 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
~~~
由于不是最左前綴,索引這樣的查詢顯然用不到索引。
## 情況五:匹配某列的前綴字符串。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title LIKE 'Senior%';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 56 | NULL | 1 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
~~~
此時可以用到索引,~~但是如果通配符不是只出現在末尾,則無法使用索引。~~(原文表述有誤,如果通配符%不出現在開頭,則可以用到索引,但根據具體情況不同可能只會用其中一個前綴)
## 情況六:范圍查詢。
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no < '10010' and title='Senior Engineer';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 4 | NULL | 16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
~~~
范圍列可以用到索引(必須是最左前綴),但是范圍列后面的列無法用到索引。同時,索引最多用于一個范圍列,因此如果查詢條件中有兩個范圍列則無法全用到索引。
~~~
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no < '10010'
AND title='Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 4 | NULL | 16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
~~~
可以看到索引對第二個范圍索引無能為力。這里特別要說明MySQL一個有意思的地方,那就是僅用explain可能無法區分范圍索引和多值匹配,因為在type中這兩者都顯示為range。同時,用了“between”并不意味著就是范圍查詢,例如下面的查詢:
~~~
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no BETWEEN '10001' AND '10010'
AND title='Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 59 | NULL | 16 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
~~~
看起來是用了兩個范圍查詢,但作用于emp_no上的“BETWEEN”實際上相當于“IN”,也就是說emp_no實際是多值精確匹配。可以看到這個查詢用到了索引全部三個列。因此在MySQL中要謹慎地區分多值匹配和范圍匹配,否則會對MySQL的行為產生困惑。
## 情況七:查詢條件中含有函數或表達式。
很不幸,如果查詢條件中含有函數或表達式,則MySQL不會為這列使用索引(雖然某些在數學意義上可以使用)。例如:
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND left(title, 6)='Senior';
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | titles | ref | PRIMARY | PRIMARY | 4 | const | 1 | Using where |
+----+-------------+--------+------+---------------+---------+---------+-------+------+-------------+
~~~
雖然這個查詢和情況五中功能相同,但是由于使用了函數left,則無法為title列應用索引,而情況五中用LIKE則可以。再如:
~~~
EXPLAIN SELECT * FROM employees.titles WHERE emp_no - 1='10000';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | titles | ALL | NULL | NULL | NULL | NULL | 443308 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
~~~
顯然這個查詢等價于查詢emp_no為10001的函數,但是由于查詢條件是一個表達式,MySQL無法為其使用索引。看來MySQL還沒有智能到自動優化常量表達式的程度,因此在寫查詢語句時盡量避免表達式出現在查詢中,而是先手工私下代數運算,轉換為無表達式的查詢語句。