<ruby id="bdb3f"></ruby>

    <p id="bdb3f"><cite id="bdb3f"></cite></p>

      <p id="bdb3f"><cite id="bdb3f"><th id="bdb3f"></th></cite></p><p id="bdb3f"></p>
        <p id="bdb3f"><cite id="bdb3f"></cite></p>

          <pre id="bdb3f"></pre>
          <pre id="bdb3f"><del id="bdb3f"><thead id="bdb3f"></thead></del></pre>

          <ruby id="bdb3f"><mark id="bdb3f"></mark></ruby><ruby id="bdb3f"></ruby>
          <pre id="bdb3f"><pre id="bdb3f"><mark id="bdb3f"></mark></pre></pre><output id="bdb3f"></output><p id="bdb3f"></p><p id="bdb3f"></p>

          <pre id="bdb3f"><del id="bdb3f"><progress id="bdb3f"></progress></del></pre>

                <ruby id="bdb3f"></ruby>

                企業??AI智能體構建引擎,智能編排和調試,一鍵部署,支持知識庫和私有化部署方案 廣告
                ## 業務背景 按分組取出TOP值,是非常常見的業務需求。 比如提取每位歌手的下載量TOP 10的曲目、提取每個城市納稅前10的人或企業。 ## 傳統方法 傳統的方法是使用窗口查詢,PostgreSQL是支持窗口查詢的。 例子 測試表和測試數據,生成10000個分組,1000萬條記錄。 ~~~ postgres=# create table tbl(c1 int, c2 int, c3 int); CREATE TABLE postgres=# create index idx1 on tbl(c1,c2); CREATE INDEX postgres=# insert into tbl select mod(trunc(random()*10000)::int, 10000), trunc(random()*10000000) from generate_series(1,10000000); INSERT 0 10000000 ~~~ 使用窗口查詢的執行計劃 ~~~ postgres=# explain select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10; QUERY PLAN ---------------------------------------------------------------------------------------- Subquery Scan on t (cost=0.43..770563.03 rows=3333326 width=20) Filter: (t.rn <= 10) -> WindowAgg (cost=0.43..645563.31 rows=9999977 width=12) -> Index Scan using idx1 on tbl (cost=0.43..470563.72 rows=9999977 width=12) (4 rows) ~~~ 使用窗口查詢的結果舉例 ~~~ postgres=# select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10; rn | c1 | c2 | c3 ----+------+--------+---- 1 | 0 | 1657 | 2 | 0 | 3351 | 3 | 0 | 6347 | 4 | 0 | 12688 | 5 | 0 | 16991 | 6 | 0 | 19584 | 7 | 0 | 24694 | 8 | 0 | 36646 | 9 | 0 | 40882 | 10 | 0 | 41599 | 1 | 1 | 14465 | 2 | 1 | 29032 | 3 | 1 | 39969 | 4 | 1 | 41094 | 5 | 1 | 69481 | 6 | 1 | 70919 | 7 | 1 | 75575 | 8 | 1 | 81102 | 9 | 1 | 87496 | 10 | 1 | 90603 | ...... ~~~ 使用窗口查詢的效率,20.1秒 ~~~ postgres=# explain (analyze,verbose,costs,timing,buffers) select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on t (cost=0.43..770563.03 rows=3333326 width=20) (actual time=0.040..20813.469 rows=100000 loops=1) Output: t.rn, t.c1, t.c2, t.c3 Filter: (t.rn <= 10) Rows Removed by Filter: 9900000 Buffers: shared hit=10035535 -> WindowAgg (cost=0.43..645563.31 rows=9999977 width=12) (actual time=0.035..18268.027 rows=10000000 loops=1) Output: row_number() OVER (?), tbl.c1, tbl.c2, tbl.c3 Buffers: shared hit=10035535 -> Index Scan using idx1 on public.tbl (cost=0.43..470563.72 rows=9999977 width=12) (actual time=0.026..11913.677 rows=10000000 loops=1) Output: tbl.c1, tbl.c2, tbl.c3 Buffers: shared hit=10035535 Planning time: 0.110 ms Execution time: 20833.747 ms (13 rows) ~~~ ## 雕蟲小技 如何優化? 可以參考我之前寫的,[使用遞歸查詢,優化count distinct的方法](https://yq.aliyun.com/articles/39689)。 本文同樣需要用到遞歸查詢,獲得分組ID ~~~ postgres=# with recursive t1 as ( postgres(# (select min(c1) c1 from tbl ) postgres(# union all postgres(# (select (select min(tbl.c1) c1 from tbl where tbl.c1>t.c1) c1 from t1 t where t.c1 is not null) postgres(# ) postgres-# select * from t1; ~~~ 寫成SRF函數,如下 ~~~ postgres=# create or replace function f() returns setof tbl as $$ postgres$# declare postgres$# v int; postgres$# begin postgres$# for v in with recursive t1 as ( postgres$# (select min(c1) c1 from tbl ) postgres$# union all postgres$# (select (select min(tbl.c1) c1 from tbl where tbl.c1>t.c1) c1 from t1 t where t.c1 is not null) postgres$# ) postgres$# select * from t1 postgres$# LOOP postgres$# return query select * from tbl where c1=v order by c2 limit 10; postgres$# END LOOP; postgres$# return; postgres$# postgres$# end; postgres$# $$ language plpgsql strict; CREATE FUNCTION ~~~ 優化后的查詢結果例子 ~~~ postgres=# select * from f(); c1 | c2 | c3 ------+--------+---- 0 | 1657 | 0 | 3351 | 0 | 6347 | 0 | 12688 | 0 | 16991 | 0 | 19584 | 0 | 24694 | 0 | 36646 | 0 | 40882 | 0 | 41599 | 1 | 14465 | 1 | 29032 | 1 | 39969 | 1 | 41094 | 1 | 69481 | 1 | 70919 | 1 | 75575 | 1 | 81102 | 1 | 87496 | 1 | 90603 | ...... ~~~ 優化后,只需要464毫秒返回10000個分組的TOP 10。 ~~~ postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f(); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Function Scan on public.f (cost=0.25..10.25 rows=1000 width=12) (actual time=419.218..444.810 rows=100000 loops=1) Output: c1, c2, c3 Function Call: f() Buffers: shared hit=170407, temp read=221 written=220 Planning time: 0.037 ms Execution time: 464.257 ms (6 rows) ~~~ ## 小結 1. 傳統的方法使用窗口查詢,輸出多個每個分組的TOP 10,需要掃描所有的記錄。效率較低。 2. 由于分組不是非常多,只有10000個,所以可以選擇使用遞歸的方法,用上索引取TOP 10,速度非常快。 3. 目前PostgreSQL的遞歸語法不支持遞歸的啟動表寫在subquery里面,也不支持啟動表在遞歸查詢中使用order by,所以不能直接使用遞歸得出結果,目前需要套一層函數。
                  <ruby id="bdb3f"></ruby>

                  <p id="bdb3f"><cite id="bdb3f"></cite></p>

                    <p id="bdb3f"><cite id="bdb3f"><th id="bdb3f"></th></cite></p><p id="bdb3f"></p>
                      <p id="bdb3f"><cite id="bdb3f"></cite></p>

                        <pre id="bdb3f"></pre>
                        <pre id="bdb3f"><del id="bdb3f"><thead id="bdb3f"></thead></del></pre>

                        <ruby id="bdb3f"><mark id="bdb3f"></mark></ruby><ruby id="bdb3f"></ruby>
                        <pre id="bdb3f"><pre id="bdb3f"><mark id="bdb3f"></mark></pre></pre><output id="bdb3f"></output><p id="bdb3f"></p><p id="bdb3f"></p>

                        <pre id="bdb3f"><del id="bdb3f"><progress id="bdb3f"></progress></del></pre>

                              <ruby id="bdb3f"></ruby>

                              哎呀哎呀视频在线观看