# Understanding EXPLAIN plans
> 原文:[https://docs.gitlab.com/ee/development/understanding_explain_plans.html](https://docs.gitlab.com/ee/development/understanding_explain_plans.html)
* [Nodes](#nodes)
* [Node statistics](#node-statistics)
* [Node types](#node-types)
* [Seq Scan](#seq-scan)
* [Index Only Scan](#index-only-scan)
* [Index Scan](#index-scan)
* [Bitmap Index Scan and Bitmap Heap scan](#bitmap-index-scan-and-bitmap-heap-scan)
* [Limit](#limit)
* [Sort](#sort)
* [Nested Loop](#nested-loop)
* [Optimising queries](#optimising-queries)
* [Queries that can’t be optimised](#queries-that-cant-be-optimised)
* [Cardinality and selectivity](#cardinality-and-selectivity)
* [Rewriting queries](#rewriting-queries)
* [What makes a bad plan](#what-makes-a-bad-plan)
* [Producing query plans](#producing-query-plans)
* [Rails console](#rails-console)
* [ChatOps](#chatops)
* [`#database-lab`](#database-lab)
* [Tips & Tricks](#tips--tricks)
* [Further reading](#further-reading)
# Understanding EXPLAIN plans[](#understanding-explain-plans "Permalink")
PostgreSQL 允許您使用`EXPLAIN`命令獲得查詢計劃. 嘗試確定查詢的執行方式時,此命令非常有用. 您可以在 SQL 查詢中直接使用此命令,只要查詢以它開頭即可:
```
EXPLAIN
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
在 GitLab.com 上運行時,將顯示以下輸出:
```
Aggregate (cost=922411.76..922411.77 rows=1 width=8)
-> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
```
當*僅*使用`EXPLAIN` ,PostgreSQL 實際上不會執行我們的查詢,而是會根據可用的統計信息生成一個*估計的*執行計劃. 這意味著實際計劃可能相差很大. 幸運的是,PostgreSQL 還為我們提供了執行查詢的選項. 為此,我們需要使用`EXPLAIN ANALYZE`而不是`EXPLAIN` :
```
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
這將產生:
```
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
如我們所見,該計劃是完全不同的,并且包含許多數據. 讓我們逐步討論一下.
由于`EXPLAIN ANALYZE`執行查詢,因此在使用會寫入數據或可能會超時的查詢時應`EXPLAIN ANALYZE`小心. 如果查詢修改了數據,請考慮將其包裝在自動回滾的事務中,如下所示:
```
BEGIN;
EXPLAIN ANALYZE
DELETE FROM users WHERE id = 1;
ROLLBACK;
```
`EXPLAIN`命令還包含其他選項,例如`BUFFERS` :
```
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
然后將產生:
```
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
Buffers: shared hit=208846
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
有關更多信息,請參閱官方的[`EXPLAIN`文檔](https://s0www0postgresql0org.icopy.site/docs/current/sql-explain.html)并[使用`EXPLAIN`指南](https://s0www0postgresql0org.icopy.site/docs/current/using-explain.html) .
## Nodes[](#nodes "Permalink")
每個查詢計劃都由節點組成. 節點可以嵌套,并且可以由內而外執行. 這意味著最內部的節點在外部節點之前執行. 最好將其視為嵌套函數調用,并在展開時返回其結果. 例如,一個計劃以`Aggregate`開頭,然后是`Nested Loop` ,然后是`Index Only scan`可以認為是以下 Ruby 代碼:
```
aggregate(
nested_loop(
index_only_scan()
index_only_scan()
)
)
```
節點使用`->`表示,后跟所采用的節點類型. 例如:
```
Aggregate (cost=922411.76..922411.77 rows=1 width=8)
-> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
```
這里執行的第一個節點是`Seq scan on projects` . `Filter:`是應用于節點結果的附加過濾器. 過濾器與 Ruby 的`Array#select`非常相似:它接受輸入行,應用過濾器,并生成新的行列表. 節點完成后,我們將在其上方執行`Aggregate` .
嵌套節點將如下所示:
```
Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
Buffers: shared hit=155
-> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
Buffers: shared hit=155
-> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
Index Cond: (id < 100)
Heap Fetches: 0
-> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
Index Cond: (id = users_1.id)
Heap Fetches: 0
Planning time: 2.585 ms
Execution time: 0.310 ms
```
在這里,我們首先執行兩次單獨的"僅索引"掃描,然后對這兩次掃描的結果執行"嵌套循環".
## Node statistics[](#node-statistics "Permalink")
計劃中的每個節點都有一組關聯的統計信息,例如成本,產生的行數,執行的循環數等等. 例如:
```
Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
```
在這里,我們可以看到我們的成本范圍為`0.00..908044.47` (稍后我們將對此進行介紹),并且我們估計(因為我們使用的是`EXPLAIN`而不是`EXPLAIN ANALYZE` ),因此此節點將產生總計 5,746,914 行. `width`統計信息描述了每行的估計寬度,以字節為單位.
`costs`字段指定節點的價格. 成本以查詢計劃者的成本參數確定的任意單位衡量. 影響成本的因素取決于各種設置,例如`seq_page_cost` , `cpu_tuple_cost`和其他各種設置. 費用字段的格式如下:
```
STARTUP COST..TOTAL COST
```
The startup cost states how expensive it was to start the node, with the total cost describing how expensive the entire node was. In general: the greater the values, the more expensive the node.
使用`EXPLAIN ANALYZE` ,這些統計信息還將包括實際花費的時間(以毫秒為單位)以及其他運行時統計信息(例如,產生的行的實際數量):
```
Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
```
在這里我們可以看到估計返回了 5,746,969 行,但實際上,我們返回了 5,746,940 行. 我們還可以看到, *僅*此順序掃描就花費了 2.98 秒.
使用`EXPLAIN (ANALYZE, BUFFERS)`還將為我們提供有關由過濾器刪除的行數,使用的緩沖區數等信息. 例如:
```
Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
```
在這里,我們可以看到我們的過濾器必須刪除 65,677 行,并使用 208,846 個緩沖區. PostgreSQL 中的每個緩沖區都是 8 KB(8192 字節),這意味著我們上面的節點使用*1.6 GB 的緩沖區* . 好多啊!
## Node types[](#node-types "Permalink")
有很多不同類型的節點,因此我們在這里僅介紹一些較常見的節點.
可以在[PostgreSQL 源文件`plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h)找到所有可用節點及其描述的完整列表.
### Seq Scan[](#seq-scan "Permalink")
對數據庫表(的一部分)進行順序掃描. 這類似于使用`Array#each` ,但是在數據庫表上. 檢索大量行時,順序掃描可能會非常慢,因此,對于大型表,最好避免使用這些掃描.
### Index Only Scan[](#index-only-scan "Permalink")
對不需要從表中獲取任何內容的索引進行的掃描. 在某些情況下,僅索引掃描仍可能從表中獲取數據,在這種情況下,節點將包含" `Heap Fetches:`統計信息.
### Index Scan[](#index-scan "Permalink")
對索引的掃描,該索引需要從表中檢索一些數據.
### Bitmap Index Scan and Bitmap Heap scan[](#bitmap-index-scan-and-bitmap-heap-scan "Permalink")
位圖掃描介于順序掃描和索引掃描之間. 當我們從索引掃描中讀取太多數據但執行順序掃描時讀取的數據太少時,通常會使用它們. 位圖掃描使用所謂的[位圖索引](https://en.wikipedia.org/wiki/Bitmap_index)來執行其工作.
[PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)的[源代碼在位](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)圖掃描中指出以下內容:
> 位圖索引掃描提供潛在元組位置的位圖; 它不訪問堆本身. 該位圖可能由祖先位圖堆掃描節點使用,可能是在經過中間位圖和和/或位圖或節點之后將其與其他位圖索引掃描的結果組合在一起的.
### Limit[](#limit "Permalink")
在輸入行上應用`LIMIT` .
### Sort[](#sort "Permalink")
使用`ORDER BY`語句對輸入行進行排序.
### Nested Loop[](#nested-loop "Permalink")
嵌套循環將針對其前面的節點產生的每一行執行其子節點. 例如:
```
-> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
Buffers: shared hit=155
-> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
Index Cond: (id < 100)
Heap Fetches: 0
-> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
Index Cond: (id = users_1.id)
Heap Fetches: 0
```
在這里,第一個子節點( `Index Only Scan using users_pkey on users users_1` )產生 36 行,并執行一次( `rows=36 loops=1` ). 下一個節點產生 1 行( `rows=1` ),但重復 36 次( `loops=36` ). 這是因為前一個節點產生了 36 行.
這意味著,如果各個子節點繼續產生許多行,則嵌套循環會迅速降低查詢速度.
## Optimising queries[](#optimising-queries "Permalink")
順便說一句,讓我們看看如何優化查詢. 讓我們以以下查詢為例:
```
SELECT COUNT(*)
FROM users
WHERE twitter != '';
```
該查詢僅計算設置了 Twitter 個人資料的用戶數. 讓我們使用`EXPLAIN (ANALYZE, BUFFERS)`運行它:
```
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM users
WHERE twitter != '';
```
這將產生以下計劃:
```
Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
Buffers: shared hit=202662
-> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487813
Buffers: shared hit=202662
Planning time: 0.390 ms
Execution time: 1271.180 ms
```
從該查詢計劃中,我們可以看到以下內容:
1. 我們需要對`users`表執行順序掃描.
2. 此順序掃描使用`Filter`過濾掉 2,487,813 行.
3. 我們使用 202,622 個緩沖區,相當于 1.58 GB 的內存.
4. 完成所有這些操作需要 1.2 秒.
考慮到我們只是在計算用戶,這是相當昂貴的!
在開始進行任何更改之前,讓我們看一下`users`表上是否有可以使用的現有索引. 我們可以通過在`psql`控制臺中運行`\d users` ,然后向下滾動至`Indexes:`部分來獲取此信息:
```
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"index_users_on_confirmation_token" UNIQUE, btree (confirmation_token)
"index_users_on_email" UNIQUE, btree (email)
"index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
"index_users_on_static_object_token" UNIQUE, btree (static_object_token)
"index_users_on_unlock_token" UNIQUE, btree (unlock_token)
"index_on_users_name_lower" btree (lower(name::text))
"index_users_on_accepted_term_id" btree (accepted_term_id)
"index_users_on_admin" btree (admin)
"index_users_on_created_at" btree (created_at)
"index_users_on_email_trigram" gin (email gin_trgm_ops)
"index_users_on_feed_token" btree (feed_token)
"index_users_on_group_view" btree (group_view)
"index_users_on_incoming_email_token" btree (incoming_email_token)
"index_users_on_managing_group_id" btree (managing_group_id)
"index_users_on_name" btree (name)
"index_users_on_name_trigram" gin (name gin_trgm_ops)
"index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text
"index_users_on_state" btree (state)
"index_users_on_state_and_user_type" btree (state, user_type)
"index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL
"index_users_on_user_type" btree (user_type)
"index_users_on_username" btree (username)
"index_users_on_username_trigram" gin (username gin_trgm_ops)
"tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text
```
在這里,我們可以看到`twitter`列上沒有索引,這意味著 PostgreSQL 在這種情況下必須執行順序掃描. 讓我們嘗試通過添加以下索引來解決此問題:
```
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
```
如果現在使用`EXPLAIN (ANALYZE, BUFFERS)`重新運行查詢`EXPLAIN (ANALYZE, BUFFERS)`得到以下計劃:
```
Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1)
Buffers: shared hit=51854 dirtied=19
-> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487830
Heap Fetches: 26037
Buffers: shared hit=51854 dirtied=19
Planning time: 0.191 ms
Execution time: 297.334 ms
```
現在獲取數據只需不到 300 毫秒,而不是 1.2 秒. 但是,我們仍然使用 51,854 個緩沖區,這大約是 400 MB 的內存. 對于這種簡單的查詢,300 毫秒的速度也很慢. 要了解為什么此查詢仍然昂貴,讓我們看一下以下內容:
```
Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487830
```
我們從僅對索引進行索引掃描開始,但是我們仍然以某種方式應用了`Filter` ,該過濾`Filter`可過濾掉 2,487,830 行. 這是為什么? 好,讓我們看一下如何創建索引:
```
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
```
我們只是簡單地告訴 PostgreSQL 索引`twitter`列的所有可能值,甚至是空字符串. 我們的查詢反過來使用`WHERE twitter != ''` . 這意味著索引確實可以改善事情,因為我們不需要進行順序掃描,但是我們仍然可能會遇到空字符串. 這意味著 PostgreSQL *必須*對索引結果應用過濾器以擺脫這些值.
幸運的是,我們可以使用"部分索引"來進一步改善它. 部分索引是在索引數據時具有`WHERE`條件的索引. 例如:
```
CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100
```
該索引僅索引與`WHERE id < 100`相匹配的行的`email`值. 我們可以使用部分索引將我們的 Twitter 索引更改為以下內容:
```
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
```
創建后,如果再次運行查詢,將得到以下計劃:
```
Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
Buffers: shared hit=44036
-> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
Heap Fetches: 1208
Buffers: shared hit=44036
Planning time: 0.123 ms
Execution time: 19.848 ms
```
*好多了* ! 現在僅需要 20 毫秒即可獲取數據,并且我們僅使用約 344 MB 的緩沖區(而不是原始的 1.58 GB). 之所以可行,是因為現在 PostgreSQL 不再需要應用`Filter` ,因為索引僅包含不為空的`twitter`值.
請記住,每次您要優化查詢時,不應該只添加部分索引. 每個索引都必須為每次寫入更新,并且它們可能需要相當多的空間,具體取決于索引數據的數量. 結果,首先檢查是否存在可以重用的現有索引. 如果沒有,請檢查是否可以略微更改現有查詢以適合現有查詢和新查詢. 僅當現有索引無法以任何方式使用時,才添加新索引.
## Queries that can’t be optimised[](#queries-that-cant-be-optimised "Permalink")
既然我們已經了解了如何優化查詢,讓我們看一下可能無法優化的另一個查詢:
```
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
`EXPLAIN (ANALYZE, BUFFERS)`的輸出如下:
```
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
Buffers: shared hit=208846
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
查看輸出,我們看到以下過濾器:
```
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
```
查看過濾器刪除的行數,我們可能會想在`projects.visibility_level`上添加索引,以某種方式將此順序掃描+過濾器轉換為僅索引掃描.
不幸的是,這樣做不可能改善任何事情. 與某些人的看法相反,存在索引*并不能保證* PostgreSQL 會實際使用它. 例如,在執行`SELECT * FROM projects`時,僅掃描整個表而不是使用索引然后從表中獲取數據要便宜得多. 在這種情況下,PostgreSQL 可能會決定不使用索引.
其次,讓我們考慮一下查詢的作用:它使所有項目的可見性級別為 0 或 20.在上面的計劃中,我們可以看到生成了很多行(5,745,940),但相對于總行數是多少? 通過運行以下查詢來找出答案:
```
SELECT visibility_level, count(*) AS amount
FROM projects
GROUP BY visibility_level
ORDER BY visibility_level ASC;
```
對于 GitLab.com,這將產生:
```
visibility_level | amount
------------------+---------
0 | 5071325
10 | 65678
20 | 674801
```
這里的項目總數為 5,811,804,其中 5,746,126 為 0 或 20 級.這是整個表的 98%!
因此,無論我們做什么,此查詢都將檢索整個表的 98%. 由于大部分時間都花在做到這一點上,因此除了*完全不*運行查詢之外,我們幾乎沒有其他方法可以改進此查詢.
這里重要的是,盡管有些人可能建議您在看到順序掃描時立即直接添加索引,但*更重要的*是首先了解查詢的功能,檢索的數據量等等. 畢竟,您無法優化您不了解的內容.
### Cardinality and selectivity[](#cardinality-and-selectivity "Permalink")
早先我們看到查詢必須檢索表中 98%的行. 數據庫通常使用兩個術語:基數和選擇性. 基數是指表中特定列中唯一值的數量.
選擇性是相對于總行數,操作(例如索引掃描或過濾器)產生的唯一值的數量. 選擇性越高,PostgreSQL 使用索引的可能性就越大.
In the above example, there are only 3 unique values: 0, 10, and 20\. This means the cardinality is 3\. The selectivity in turn is also very low: 0.0000003% (2 / 5,811,804), because our `Filter` only filters using two values (`0` and `20`). With such a low selectivity value it’s not surprising that PostgreSQL decides using an index is not worth it, because it would produce almost no unique rows.
## Rewriting queries[](#rewriting-queries "Permalink")
因此,上述查詢無法真正按原樣進行優化,或者至少沒有太多優化. 但是,如果我們稍微改變它的目的怎么辦? 如果不是檢索`visibility_level` 0 或 20 的所有項目,而是檢索用戶與之交互的那些項目,該怎么辦?
幸運的是,GitLab 對此有一個答案,它是一個名為`user_interacted_projects`的表. 該表具有以下架構:
```
Table "public.user_interacted_projects"
Column | Type | Modifiers
------------+---------+-----------
user_id | integer | not null
project_id | integer | not null
Indexes:
"index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
"index_user_interacted_projects_on_user_id" btree (user_id)
Foreign-key constraints:
"fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
"fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
```
讓我們重寫查詢以將該表加入到我們的項目中,并獲取特定用戶的項目:
```
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM projects
INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
WHERE projects.visibility_level IN (0, 20)
AND user_interacted_projects.user_id = 1;
```
我們在這里做的是以下幾點:
1. 獲得我們的項目.
2. INNER JOIN `user_interacted_projects` ,這意味著我們只剩下在`user_interacted_projects`中有對應行的`projects`中的行.
3. 將其限制為`visibility_level`為 0 或 20 的項目以及 ID 為 1 的用戶與之交互的項目.
如果運行此查詢,則會得到以下計劃:
```
Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
-> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
-> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
Index Cond: (id = user_interacted_projects.project_id)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 0
Planning time: 2.614 ms
Execution time: 9.809 ms
```
在這里,僅花了不到 10 毫秒的時間即可獲取數據. 我們還可以看到我們正在檢索的項目要少得多:
```
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
Index Cond: (id = user_interacted_projects.project_id)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 0
```
在這里,我們看到執行了 145 個循環( `loops=145` ),每個循環產生 1 行( `rows=1` ). 這比以前少了很多,我們的查詢執行得更好!
如果我們看一下計劃,我們還會發現我們的成本非常低:
```
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
```
在這里,我們的成本僅為 3.45,而僅需 0.050 毫秒即可完成. 下一次索引掃描會貴一些:
```
Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
```
此處的成本為 160.71( `cost=0.43..160.71` ),大約需要 2.5 毫秒(基于`actual time=....`的輸出`actual time=....` ).
這里最昂貴的部分是對這兩個索引掃描的結果起作用的"嵌套循環":
```
Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
```
在這里,我們必須對 203 行(9.748 毫秒)執行 870.52 磁盤頁讀取,在單個循環中產生 143 行.
這里的主要要點是,有時您必須重寫(部分)查詢以使其更好. 有時,這意味著必須稍微更改功能以適應更好的性能.
## What makes a bad plan[](#what-makes-a-bad-plan "Permalink")
這是一個很難回答的問題,因為"壞"的定義與您要解決的問題有關. 但是,在大多數情況下最好避免某些模式,例如:
* 大表上的順序掃描
* 刪除大量行的過濾器
* 執行某個步驟(例如索引掃描)需要*很多*緩沖區(例如,GitLab.com 大于 512 MB).
作為一般準則,請針對以下查詢:
1. 不超過 10 毫秒. 我們每個請求在 SQL 中花費的目標時間約為 100 毫秒,因此每個查詢應盡可能快.
2. 相對于工作負載,不使用過多的緩沖區. 例如,檢索十行應該不需要 1 GB 的緩沖區.
3. 不花費大量時間執行磁盤 IO 操作. 必須啟用設置`track_io_timing` ,此數據才能包含在`EXPLAIN ANALYZE`的輸出中.
4. 在檢索行而不對其進行匯總時應用`LIMIT` ,例如`SELECT * FROM users` .
5. 不使用`Filter`過濾掉過多的行,尤其是在查詢不使用`LIMIT`限制返回的行數的情況下. 通常可以通過添加(部分)索引來刪除過濾器.
這些是*準則* ,不是硬性要求,因為不同的需求可能需要不同的查詢. 唯一的*規則*是,您*必須始終*使用`EXPLAIN (ANALYZE, BUFFERS)`和相關工具*來衡量*您的查詢(最好使用類似生產的數據庫):
* [`explain.depesz.com`](https://explain.depesz.com/).
* [`explain.dalibo.com/`](https://explain.dalibo.com/).
## Producing query plans[](#producing-query-plans "Permalink")
有幾種獲取查詢計劃輸出的方法. 當然,您可以直接在`psql`控制臺中運行`EXPLAIN`查詢,或者可以遵循以下其他選項之一.
### Rails console[](#rails-console "Permalink")
Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze) you can directly generate the query plan from the Rails console:
```
pry(main)> require 'activerecord-explain-analyze'
=> true
pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
? (pry):12
=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
Filter: (projects.build_timeout > 3600)
Rows Removed by Filter: 14
Buffers: shared hit=2
Planning time: 0.411 ms
Execution time: 0.113 ms
```
### ChatOps[](#chatops "Permalink")
[GitLab 員工還可以使用`/chatops` slash 命令在 Slack 中使用我們的 ChatOps 解決方案](chatops_on_gitlabcom.html) . 您可以使用 ChatOps 通過運行以下命令來獲取查詢計劃:
```
/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
還支持使用[https://explain.depesz.com/](https://explain.depesz.com/)可視化計劃:
```
/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
不需要查詢.
有關可用選項的更多信息,請運行:
```
/chatops run explain --help
```
### `#database-lab`[](#database-lab "Permalink")
GitLab 員工可以使用的另一種工具是由[Joe](https://gitlab.com/postgres-ai/joe)支持的聊天機器人,該機器人使用[Database Lab](https://gitlab.com/postgres-ai/database-lab)立即為開發人員提供他們自己的生產數據庫克隆.
Joe 在 Slack 的[`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C)頻道中可用.
與 ChatOps 不同,它為您提供了一種執行 DDL 語句(如創建索引和表)并獲取查詢計劃的方法,該查詢計劃不僅適用于`SELECT`而且適用于`UPDATE`和`DELETE` .
例如,為了測試新索引,您可以執行以下操作:
創建索引:
```
exec CREATE INDEX index_projects_last_activity ON projects (last_activity_at) WHERE last_activity_at IS NOT NULL
```
分析表以更新其統計信息:
```
exec ANALYZE projects
```
獲取查詢計劃:
```
explain SELECT * FROM projects WHERE last_activity_at < CURRENT_DATE
```
完成后,您可以回滾您的更改:
```
reset
```
有關可用選項的更多信息,請運行:
```
help
```
#### Tips & Tricks[](#tips--tricks "Permalink")
現在,在整個會話期間都將維護數據庫連接,因此您可以對任何會話變量(例如`enable_seqscan`或`work_mem` )使用`exec set ...` 這些設置將應用于所有后續命令,直到您重置它們.
也可以使用事務. 當您處理修改數據的語句(例如 INSERT,UPDATE 和 DELETE)時,這可能很有用. `explain`命令將執行`EXPLAIN ANALYZE` ,該語句執行該語句. 為了從干凈狀態開始運行每個`explain` ,您可以將其包裝在事務中,例如:
```
exec BEGIN
explain UPDATE some_table SET some_column = TRUE
exec ROLLBACK
```
## Further reading[](#further-reading "Permalink")
在[Dalibo.org](https://www.dalibo.com/en/)的[演示中](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)可以找到關于理解查詢計劃的更廣泛的指南.
- GitLab Docs
- Installation
- Requirements
- GitLab cloud native Helm Chart
- Install GitLab with Docker
- Installation from source
- Install GitLab on Microsoft Azure
- Installing GitLab on Google Cloud Platform
- Installing GitLab on Amazon Web Services (AWS)
- Analytics
- Code Review Analytics
- Productivity Analytics
- Value Stream Analytics
- Kubernetes clusters
- Adding and removing Kubernetes clusters
- Adding EKS clusters
- Adding GKE clusters
- Group-level Kubernetes clusters
- Instance-level Kubernetes clusters
- Canary Deployments
- Cluster Environments
- Deploy Boards
- GitLab Managed Apps
- Crossplane configuration
- Cluster management project (alpha)
- Kubernetes Logs
- Runbooks
- Serverless
- Deploying AWS Lambda function using GitLab CI/CD
- Securing your deployed applications
- Groups
- Contribution Analytics
- Custom group-level project templates
- Epics
- Manage epics
- Group Import/Export
- Insights
- Issues Analytics
- Iterations
- Public access
- SAML SSO for GitLab.com groups
- SCIM provisioning using SAML SSO for GitLab.com groups
- Subgroups
- Roadmap
- Projects
- GitLab Secure
- Security Configuration
- Container Scanning
- Dependency Scanning
- Dependency List
- Static Application Security Testing (SAST)
- Secret Detection
- Dynamic Application Security Testing (DAST)
- GitLab Security Dashboard
- Offline environments
- Standalone Vulnerability pages
- Security scanner integration
- Badges
- Bulk editing issues and merge requests at the project level
- Code Owners
- Compliance
- License Compliance
- Compliance Dashboard
- Create a project
- Description templates
- Deploy Keys
- Deploy Tokens
- File finder
- Project integrations
- Integrations
- Atlassian Bamboo CI Service
- Bugzilla Service
- Custom Issue Tracker service
- Discord Notifications service
- Enabling emails on push
- GitHub project integration
- Hangouts Chat service
- Atlassian HipChat
- Irker IRC Gateway
- GitLab Jira integration
- Mattermost Notifications Service
- Mattermost slash commands
- Microsoft Teams service
- Mock CI Service
- Prometheus integration
- Redmine Service
- Slack Notifications Service
- Slack slash commands
- GitLab Slack application
- Webhooks
- YouTrack Service
- Insights
- Issues
- Crosslinking Issues
- Design Management
- Confidential issues
- Due dates
- Issue Boards
- Issue Data and Actions
- Labels
- Managing issues
- Milestones
- Multiple Assignees for Issues
- Related issues
- Service Desk
- Sorting and ordering issue lists
- Issue weight
- Associate a Zoom meeting with an issue
- Merge requests
- Allow collaboration on merge requests across forks
- Merge Request Approvals
- Browser Performance Testing
- How to create a merge request
- Cherry-pick changes
- Code Quality
- Load Performance Testing
- Merge Request dependencies
- Fast-forward merge requests
- Merge when pipeline succeeds
- Merge request conflict resolution
- Reverting changes
- Reviewing and managing merge requests
- Squash and merge
- Merge requests versions
- Draft merge requests
- Members of a project
- Migrating projects to a GitLab instance
- Import your project from Bitbucket Cloud to GitLab
- Import your project from Bitbucket Server to GitLab
- Migrating from ClearCase
- Migrating from CVS
- Import your project from FogBugz to GitLab
- Gemnasium
- Import your project from GitHub to GitLab
- Project importing from GitLab.com to your private GitLab instance
- Import your project from Gitea to GitLab
- Import your Jira project issues to GitLab
- Migrating from Perforce Helix
- Import Phabricator tasks into a GitLab project
- Import multiple repositories by uploading a manifest file
- Import project from repo by URL
- Migrating from SVN to GitLab
- Migrating from TFVC to Git
- Push Options
- Releases
- Repository
- Branches
- Git Attributes
- File Locking
- Git file blame
- Git file history
- Repository mirroring
- Protected branches
- Protected tags
- Push Rules
- Reduce repository size
- Signing commits with GPG
- Syntax Highlighting
- GitLab Web Editor
- Web IDE
- Requirements Management
- Project settings
- Project import/export
- Project access tokens (Alpha)
- Share Projects with other Groups
- Snippets
- Static Site Editor
- Wiki
- Project operations
- Monitor metrics for your CI/CD environment
- Set up alerts for Prometheus metrics
- Embedding metric charts within GitLab-flavored Markdown
- Embedding Grafana charts
- Using the Metrics Dashboard
- Dashboard YAML properties
- Metrics dashboard settings
- Panel types for dashboards
- Using Variables
- Templating variables for metrics dashboards
- Prometheus Metrics library
- Monitoring AWS Resources
- Monitoring HAProxy
- Monitoring Kubernetes
- Monitoring NGINX
- Monitoring NGINX Ingress Controller
- Monitoring NGINX Ingress Controller with VTS metrics
- Alert Management
- Error Tracking
- Tracing
- Incident Management
- GitLab Status Page
- Feature Flags
- GitLab CI/CD
- GitLab CI/CD pipeline configuration reference
- GitLab CI/CD include examples
- Introduction to CI/CD with GitLab
- Getting started with GitLab CI/CD
- How to enable or disable GitLab CI/CD
- Using SSH keys with GitLab CI/CD
- Migrating from CircleCI
- Migrating from Jenkins
- Auto DevOps
- Getting started with Auto DevOps
- Requirements for Auto DevOps
- Customizing Auto DevOps
- Stages of Auto DevOps
- Upgrading PostgreSQL for Auto DevOps
- Cache dependencies in GitLab CI/CD
- GitLab ChatOps
- Cloud deployment
- Docker integration
- Building Docker images with GitLab CI/CD
- Using Docker images
- Building images with kaniko and GitLab CI/CD
- GitLab CI/CD environment variables
- Predefined environment variables reference
- Where variables can be used
- Deprecated GitLab CI/CD variables
- Environments and deployments
- Protected Environments
- GitLab CI/CD Examples
- Test a Clojure application with GitLab CI/CD
- Using Dpl as deployment tool
- Testing a Phoenix application with GitLab CI/CD
- End-to-end testing with GitLab CI/CD and WebdriverIO
- DevOps and Game Dev with GitLab CI/CD
- Deploy a Spring Boot application to Cloud Foundry with GitLab CI/CD
- How to deploy Maven projects to Artifactory with GitLab CI/CD
- Testing PHP projects
- Running Composer and NPM scripts with deployment via SCP in GitLab CI/CD
- Test and deploy Laravel applications with GitLab CI/CD and Envoy
- Test and deploy a Python application with GitLab CI/CD
- Test and deploy a Ruby application with GitLab CI/CD
- Test and deploy a Scala application to Heroku
- GitLab CI/CD for external repositories
- Using GitLab CI/CD with a Bitbucket Cloud repository
- Using GitLab CI/CD with a GitHub repository
- GitLab Pages
- GitLab Pages
- GitLab Pages domain names, URLs, and baseurls
- Create a GitLab Pages website from scratch
- Custom domains and SSL/TLS Certificates
- GitLab Pages integration with Let's Encrypt
- GitLab Pages Access Control
- Exploring GitLab Pages
- Incremental Rollouts with GitLab CI/CD
- Interactive Web Terminals
- Optimizing GitLab for large repositories
- Metrics Reports
- CI/CD pipelines
- Pipeline Architecture
- Directed Acyclic Graph
- Multi-project pipelines
- Parent-child pipelines
- Pipelines for Merge Requests
- Pipelines for Merged Results
- Merge Trains
- Job artifacts
- Pipeline schedules
- Pipeline settings
- Triggering pipelines through the API
- Review Apps
- Configuring GitLab Runners
- GitLab CI services examples
- Using MySQL
- Using PostgreSQL
- Using Redis
- Troubleshooting CI/CD
- GitLab Package Registry
- GitLab Container Registry
- Dependency Proxy
- GitLab Composer Repository
- GitLab Conan Repository
- GitLab Maven Repository
- GitLab NPM Registry
- GitLab NuGet Repository
- GitLab PyPi Repository
- API Docs
- API resources
- .gitignore API
- GitLab CI YMLs API
- Group and project access requests API
- Appearance API
- Applications API
- Audit Events API
- Avatar API
- Award Emoji API
- Project badges API
- Group badges API
- Branches API
- Broadcast Messages API
- Project clusters API
- Group clusters API
- Instance clusters API
- Commits API
- Container Registry API
- Custom Attributes API
- Dashboard annotations API
- Dependencies API
- Deploy Keys API
- Deployments API
- Discussions API
- Dockerfiles API
- Environments API
- Epics API
- Events
- Feature Flags API
- Feature flag user lists API
- Freeze Periods API
- Geo Nodes API
- Group Activity Analytics API
- Groups API
- Import API
- Issue Boards API
- Group Issue Boards API
- Issues API
- Epic Issues API
- Issues Statistics API
- Jobs API
- Keys API
- Labels API
- Group Labels API
- License
- Licenses API
- Issue links API
- Epic Links API
- Managed Licenses API
- Markdown API
- Group and project members API
- Merge request approvals API
- Merge requests API
- Project milestones API
- Group milestones API
- Namespaces API
- Notes API
- Notification settings API
- Packages API
- Pages domains API
- Pipeline schedules API
- Pipeline triggers API
- Pipelines API
- Project Aliases API
- Project import/export API
- Project repository storage moves API
- Project statistics API
- Project templates API
- Projects API
- Protected branches API
- Protected tags API
- Releases API
- Release links API
- Repositories API
- Repository files API
- Repository submodules API
- Resource label events API
- Resource milestone events API
- Resource weight events API
- Runners API
- SCIM API
- Search API
- Services API
- Application settings API
- Sidekiq Metrics API
- Snippets API
- Project snippets
- Application statistics API
- Suggest Changes API
- System hooks API
- Tags API
- Todos API
- Users API
- Project-level Variables API
- Group-level Variables API
- Version API
- Vulnerabilities API
- Vulnerability Findings API
- Wikis API
- GraphQL API
- Getting started with GitLab GraphQL API
- GraphQL API Resources
- API V3 to API V4
- Validate the .gitlab-ci.yml (API)
- User Docs
- Abuse reports
- User account
- Active sessions
- Deleting a User account
- Permissions
- Personal access tokens
- Profile preferences
- Threads
- GitLab and SSH keys
- GitLab integrations
- Git
- GitLab.com settings
- Infrastructure as code with Terraform and GitLab
- GitLab keyboard shortcuts
- GitLab Markdown
- AsciiDoc
- GitLab Notification Emails
- GitLab Quick Actions
- Autocomplete characters
- Reserved project and group names
- Search through GitLab
- Advanced Global Search
- Advanced Syntax Search
- Time Tracking
- GitLab To-Do List
- Administrator Docs
- Reference architectures
- Reference architecture: up to 1,000 users
- Reference architecture: up to 2,000 users
- Reference architecture: up to 3,000 users
- Reference architecture: up to 5,000 users
- Reference architecture: up to 10,000 users
- Reference architecture: up to 25,000 users
- Reference architecture: up to 50,000 users
- Troubleshooting a reference architecture set up
- Working with the bundled Consul service
- Configuring PostgreSQL for scaling
- Configuring GitLab application (Rails)
- Load Balancer for multi-node GitLab
- Configuring a Monitoring node for Scaling and High Availability
- NFS
- Working with the bundled PgBouncer service
- Configuring Redis for scaling
- Configuring Sidekiq
- Admin Area settings
- Continuous Integration and Deployment Admin settings
- Custom instance-level project templates
- Diff limits administration
- Enable and disable GitLab features deployed behind feature flags
- Geo nodes Admin Area
- GitLab Pages administration
- Health Check
- Job logs
- Labels administration
- Log system
- PlantUML & GitLab
- Repository checks
- Repository storage paths
- Repository storage types
- Account and limit settings
- Service templates
- System hooks
- Changing your time zone
- Uploads administration
- Abuse reports
- Activating and deactivating users
- Audit Events
- Blocking and unblocking users
- Broadcast Messages
- Elasticsearch integration
- Gitaly
- Gitaly Cluster
- Gitaly reference
- Monitoring GitLab
- Monitoring GitLab with Prometheus
- Performance Bar
- Usage statistics
- Object Storage
- Performing Operations in GitLab
- Cleaning up stale Redis sessions
- Fast lookup of authorized SSH keys in the database
- Filesystem Performance Benchmarking
- Moving repositories managed by GitLab
- Run multiple Sidekiq processes
- Sidekiq MemoryKiller
- Switching to Puma
- Understanding Unicorn and unicorn-worker-killer
- User lookup via OpenSSH's AuthorizedPrincipalsCommand
- GitLab Package Registry administration
- GitLab Container Registry administration
- Replication (Geo)
- Geo database replication
- Geo with external PostgreSQL instances
- Geo configuration
- Using a Geo Server
- Updating the Geo nodes
- Geo with Object storage
- Docker Registry for a secondary node
- Geo for multiple nodes
- Geo security review (Q&A)
- Location-aware Git remote URL with AWS Route53
- Tuning Geo
- Removing secondary Geo nodes
- Geo data types support
- Geo Frequently Asked Questions
- Geo Troubleshooting
- Geo validation tests
- Disaster Recovery (Geo)
- Disaster recovery for planned failover
- Bring a demoted primary node back online
- Automatic background verification
- Rake tasks
- Back up and restore GitLab
- Clean up
- Namespaces
- Maintenance Rake tasks
- Geo Rake Tasks
- GitHub import
- Import bare repositories
- Integrity check Rake task
- LDAP Rake tasks
- Listing repository directories
- Praefect Rake tasks
- Project import/export administration
- Repository storage Rake tasks
- Generate sample Prometheus data
- Uploads migrate Rake tasks
- Uploads sanitize Rake tasks
- User management
- Webhooks administration
- X.509 signatures
- Server hooks
- Static objects external storage
- Updating GitLab
- GitLab release and maintenance policy
- Security
- Password Storage
- Custom password length limits
- Restrict allowed SSH key technologies and minimum length
- Rate limits
- Webhooks and insecure internal web services
- Information exclusivity
- How to reset your root password
- How to unlock a locked user from the command line
- User File Uploads
- How we manage the TLS protocol CRIME vulnerability
- User email confirmation at sign-up
- Security of running jobs
- Proxying assets
- CI/CD Environment Variables
- Contributor and Development Docs
- Contribute to GitLab
- Community members & roles
- Implement design & UI elements
- Issues workflow
- Merge requests workflow
- Code Review Guidelines
- Style guides
- GitLab Architecture Overview
- CI/CD development documentation
- Database guides
- Database Review Guidelines
- Database Review Guidelines
- Migration Style Guide
- What requires downtime?
- Understanding EXPLAIN plans
- Rake tasks for developers
- Mass inserting Rails models
- GitLab Documentation guidelines
- Documentation Style Guide
- Documentation structure and template
- Documentation process
- Documentation site architecture
- Global navigation
- GitLab Docs monthly release process
- Telemetry Guide
- Usage Ping Guide
- Snowplow Guide
- Experiment Guide
- Feature flags in development of GitLab
- Feature flags process
- Developing with feature flags
- Feature flag controls
- Document features deployed behind feature flags
- Frontend Development Guidelines
- Accessibility & Readability
- Ajax
- Architecture
- Axios
- Design Patterns
- Frontend Development Process
- DropLab
- Emojis
- Filter
- Frontend FAQ
- GraphQL
- Icons and SVG Illustrations
- InputSetter
- Performance
- Principles
- Security
- Tooling
- Vuex
- Vue
- Geo (development)
- Geo self-service framework (alpha)
- Gitaly developers guide
- GitLab development style guides
- API style guide
- Go standards and style guidelines
- GraphQL API style guide
- Guidelines for shell commands in the GitLab codebase
- HTML style guide
- JavaScript style guide
- Migration Style Guide
- Newlines style guide
- Python Development Guidelines
- SCSS style guide
- Shell scripting standards and style guidelines
- Sidekiq debugging
- Sidekiq Style Guide
- SQL Query Guidelines
- Vue.js style guide
- Instrumenting Ruby code
- Testing standards and style guidelines
- Flaky tests
- Frontend testing standards and style guidelines
- GitLab tests in the Continuous Integration (CI) context
- Review Apps
- Smoke Tests
- Testing best practices
- Testing levels
- Testing Rails migrations at GitLab
- Testing Rake tasks
- End-to-end Testing
- Beginner's guide to writing end-to-end tests
- End-to-end testing Best Practices
- Dynamic Element Validation
- Flows in GitLab QA
- Page objects in GitLab QA
- Resource class in GitLab QA
- Style guide for writing end-to-end tests
- Testing with feature flags
- Translate GitLab to your language
- Internationalization for GitLab
- Translating GitLab
- Proofread Translations
- Merging translations from CrowdIn
- Value Stream Analytics development guide
- GitLab subscription
- Activate GitLab EE with a license