<div id="article_content" class="article_content clearfix">
<link rel="stylesheet" href="https://csdnimg.cn/release/blogv2/dist/mdeditor/css/editerView/kdoc_html_views-1a98987dfd.css">
<link rel="stylesheet" href="https://csdnimg.cn/release/blogv2/dist/mdeditor/css/editerView/ck_htmledit_views-25cebea3f9.css">
<div id="content_views" class="htmledit_views">
<h1><a name="t0"></a>1、ClicHouse語法優化規則</h1>
<p></p>
<p> <a href="https://so.csdn.net/so/search?q=ClickHouse&spm=1001.2101.3001.7020" target="_blank" class="hl hl-1" data-report-click="{"spm":"1001.2101.3001.7020","dest":"https://so.csdn.net/so/search?q=ClickHouse&spm=1001.2101.3001.7020","extra":"{\"searchword\":\"ClickHouse\"}"}" data-tit="ClickHouse" data-pretit="clickhouse">ClickHouse</a>的SQL優化規則是基于RBO(Rule Based Optimization),下面是一些優化規則。</p>
<h2><a name="t1"></a>1.1、COUNT優化</h2>
<p> 在調用count函數時,如果使用的是count()或者count(*),且沒有where條件,則會直接使用system.tables的total_rows,例如:</p>
<p style="text-align:center;"><img alt="" height="507" src="https://img-blog.csdnimg.cn/95c125b1508c47899eac0609d6e278df.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_18,color_FFFFFF,t_70,g_se,x_16" width="576"></p>
<p></p>
<p></p>
<p> 注意:Optimized trivial count,這是對count的優化。</p>
<h2><a name="t2"></a>1.2、消除子查詢重復字段</h2>
<p> 語句子查詢中有兩個重復的id字段,會被去重。</p>
<h2><a name="t3"></a>1.3、謂詞下推</h2>
<p> 當group by有having子句,但是沒有with cube、with rollup或者with totals修飾的時候,having會下推到where提前過濾。</p>
<pre data-index="0"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> UserID <span class="hljs-keyword">FROM</span> hits_v1 <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> UserID <span class="hljs-keyword">HAVING</span> UserID<span class="hljs-operator">=</span><span class="hljs-string">'123456789'</span>;</code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p style="text-align:center;"><img alt="" height="281" src="https://img-blog.csdnimg.cn/94db83a9631c4386baf1a76e9f58ff09.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="673"></p>
<h2><a name="t4"></a>1.4、聚合計算外推</h2>
<p> 聚合函數內的計算,會外推,如下:</p>
<pre data-index="1"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> <span class="hljs-built_in">sum</span>(UserID <span class="hljs-operator">*</span> <span class="hljs-number">2</span>) <span class="hljs-keyword">FROM</span> visits_v1;</code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p style="text-align:center;"><img alt="" height="223" src="https://img-blog.csdnimg.cn/655b2e0198994b2e911e70efb7b44753.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_14,color_FFFFFF,t_70,g_se,x_16" width="463"></p>
<h2><a name="t5"></a>1.5、聚合函數消除</h2>
<p> 如果對聚合鍵,也就是group by key使用min、max等聚合函數,則會將函數消除。</p>
<pre data-index="2"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> <span class="hljs-built_in">sum</span>(UserID <span class="hljs-operator">*</span> <span class="hljs-number">2</span>),<span class="hljs-built_in">max</span>(VisitID),<span class="hljs-built_in">max</span>(UserID) <span class="hljs-keyword">FROM</span> visits_v1 <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> UserID;</code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p><img alt="" height="340" src="https://img-blog.csdnimg.cn/027a2031177141d599fed92891122536.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="745"></p>
<h2><a name="t6"></a>1.6、刪除重復的order by key、刪除重復的limit by key</h2>
<p> 當order by、limit后有跟相同的key,SYNTAX會提示可進行優化。</p>
<h2><a name="t7"></a>1.7、三元運算優化</h2>
<p> set optimize_if_chain_to_multiif = 1;</p>
<p> 或者直接使用如下SQL:</p>
<pre data-index="3"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 查看語法優化</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> number <span class="hljs-operator">=</span><span class="hljs-number">1</span>? <span class="hljs-string">'hello'</span>:(number<span class="hljs-operator">=</span><span class="hljs-number">2</span>?<span class="hljs-string">'world'</span>:<span class="hljs-string">'abcdef'</span>) <span class="hljs-keyword">from</span> numbers(<span class="hljs-number">10</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">settings optimize_if_chain_to_multiif <span class="hljs-operator">=</span> <span class="hljs-number">1</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<h1><a name="t8"></a>2、查詢優化</h1>
<h2><a name="t9"></a>2.1、單表查詢</h2>
<h3><a name="t10"></a>2.1.1、Prewhere替代where</h3>
<p> Prewhere和where語句的作用相同,用來過濾數據。不同之處在于prewhere只支持*MergeTree族系列引擎的表,首先會讀取指定的列數據,來判斷數據過濾,等待數據過濾之后再讀取select聲明的列字段來補全其余屬性。</p>
<p> 當查詢列明顯多于篩選列時使用PreWhere可十倍提升查詢性能,Prewhere會自動優化執行過濾階段的數據讀取方式,降低io操作。參數optimize_move_to_prewhere=1為開啟狀態,可以設置為0對其關閉。</p>
<p> 在某些場合下,prewhere語句比where語句處理的數據量更少性能更高。</p>
<p> 默認情況,我們肯定不會關閉where自動優化成prewhere,但是某些場景即使開啟優化,也不會自動轉換成prewhere,需要手動指定prewhere:</p>
<ul><li>使用常量表達式</li><li>使用默認值為alias類型的字段</li><li>包含了arrayJOIN,globalIn,globalNotIn或者IndexHint的查詢</li><li>select查詢的列字段和where的謂詞相同</li><li>使用了主鍵字段</li></ul>
<h3><a name="t11"></a>2.1.2、數據采樣</h3>
<p> 通過采樣運算可極大提升數據分析的性能</p>
<pre data-index="4"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">select</span> Title,<span class="hljs-built_in">count</span>(<span class="hljs-operator">*</span>) <span class="hljs-keyword">As</span> PageViews </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> hits_v1</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">SAMPLE <span class="hljs-number">01.</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WHERE</span> CounterID <span class="hljs-operator">=</span> <span class="hljs-number">57</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> Title</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> PageViews <span class="hljs-keyword">DESC</span> LIMIT <span class="hljs-number">1000</span></div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 采樣修飾符只有在MergeTree engine表中才有效,且在創建表時需要制定采樣策略。</p>
<h3><a name="t12"></a>2.1.3、列裁剪與分區裁剪</h3>
<p> 數據量太大應避免使用select * 操作,查詢的性能會與查詢的字段大小和數量成線性關系,字段越少,消耗的io資源越少,性能就會越高。</p>
<ul><li>列裁剪表示只選取所需要的列。</li><li>分區裁剪就是只讀取需要的分區,在過濾條件中指定。</li></ul>
<h3><a name="t13"></a>2.1.4、order by 結合where、limit使用</h3>
<p> 千萬以上數據集進行order by 查詢時需要配合where條件和limit語句一起使用。</p>
<h3><a name="t14"></a>2.1.5、避免構建虛擬列</h3>
<p> 如非必須,不要在結果集上構建虛擬列,虛擬列非常消耗資源浪費性能,可以考慮在前端處理,或者在表中構造實際字段進行額外存儲。</p>
<h3><a name="t15"></a>2.1.6、uniqueCombined替代distinct</h3>
<p> 性能可提升10倍以上,uniqCombined底層采用類似HyperLogLog算法實現,能接收2%左右的數據誤差,可直接使用這種去重方式提升查詢性能。Count(distinct)會使用uniqExact精確去重。</p>
<p> 不建議在千萬級不同數據上執行distinct去重查詢,改為近似去重uniqCombined。</p>
<h2><a name="t16"></a>2.2、 多表關聯</h2>
<h3><a name="t17"></a>2.2.1、用in代替join</h3>
<p> 當多表聯查時,查詢的數據僅從其中一張表出事,可以考慮用in操作而不是join</p>
<h3><a name="t18"></a>2.2.2、大小表join</h3>
<p> 多表join時要滿足小表在右的原則,右表關聯時被加載到內存中與左表進行比較,ClickHouse中無論是Left join、Right Join還是Inner join永遠都是拿著右表中的每一條記錄到左表中查找該記錄是否存在,所以右表必須是小表。</p>
<h3><a name="t19"></a>2.2.3、注意謂詞下推(版本差異)</h3>
<p> ClikHouses在join查詢時不會主動發起謂詞下推的操作,需要每個子查詢提前完成過濾操作,需要注意的是,是否執行謂詞下推,對性能影響差別很大(新版本中已經不存在此問題,但是需要注意謂詞的位置不用依然有性能的差異)。</p>
<h3><a name="t20"></a>2.2.4、分布式表使用GLOBAL</h3>
<p> 兩張分布式表的in和join之前必須加上GLOBAL關鍵字,右表只會在接收查詢請求的那個節點查詢一次,并將其分發到其它節點上,如果不加GLOBAL關鍵字的話,每個節點都會單獨發起一次對右表的查詢,而右表又是分布式表,就導致右表一共會被查詢次(N是該分布式表的分片數量),這就是查詢放大,會帶來很大開銷。</p>
<h3><a name="t21"></a>2.2.5、使用字典表</h3>
<p> 將一些需要關聯分析的業務創建成字典表進行join操作,前提是字典表不宜太大,因為字典表會常駐內存。</p>
<h3><a name="t22"></a>2.2.6、提前過濾</h3>
<p> 通過增加邏輯過濾可以減少數據掃描,達到提供執行速度及降低內存消息的目的。</p>
<h1><a name="t23"></a>3、數據一致性</h1>
<p> ReplacingMergeeTree:數據的去重只會在數據合并期間進行。合并會在后臺一個不確定的時間進行,因此你無法預先作出計劃。有一些數據可能仍未被處理。集暖你可以調用optimize語句發起計劃外的合并,但請不要依靠它,因為OPTIMIZE語句會引發對護具的大量讀寫。</p>
<p> 因此,ReplacingMergeTree適用于在后臺清除重復的數據以節省空間,但是它不保證沒有重復的數據出現。</p>
<h2><a name="t24"></a>3.1、案例</h2>
<p> 1、創建表</p>
<pre data-index="5"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> test_a(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id UInt64,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> score String,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> deleted UInt8 <span class="hljs-keyword">DEFAULT</span> <span class="hljs-number">0</span>,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> create_time DateTime <span class="hljs-keyword">DEFAULT</span> toDateTime(<span class="hljs-number">0</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)ENGINE <span class="hljs-operator">=</span> ReplacingMergeTree(create_time)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> user_id;</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 其中:</p>
<ul><li>user_id是數據去重更新的標識;</li><li>create_time是版本號字段,每組數據中create_time最大的一行表示最新的數據;</li><li>delete是自定義的一個標記位,比如0代表為刪除,1代表刪除數據。</li></ul>
<p> 2、寫入與修改數據</p>
<pre data-index="6"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WITH</span>(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-keyword">SELECT</span> [<span class="hljs-string">'A'</span>,<span class="hljs-string">'B'</span>,<span class="hljs-string">'C'</span>,<span class="hljs-string">'D'</span>,<span class="hljs-string">'E'</span>,<span class="hljs-string">'F'</span>,<span class="hljs-string">'G'</span>]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)<span class="hljs-keyword">AS</span> dict</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> number <span class="hljs-keyword">AS</span> user_id,dict[number<span class="hljs-operator">%</span><span class="hljs-number">7</span><span class="hljs-operator">+</span><span class="hljs-number">1</span>] <span class="hljs-keyword">FROM</span> numbers(<span class="hljs-number">10000</span>);</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,create_time)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WITH</span>(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="10"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-keyword">SELECT</span> [<span class="hljs-string">'AA'</span>,<span class="hljs-string">'BB'</span>,<span class="hljs-string">'CC'</span>,<span class="hljs-string">'DD'</span>,<span class="hljs-string">'EE'</span>,<span class="hljs-string">'FF'</span>,<span class="hljs-string">'GG'</span>]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="11"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)<span class="hljs-keyword">AS</span> dict</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="12"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> number <span class="hljs-keyword">AS</span> user_id,dict[number<span class="hljs-operator">%</span><span class="hljs-number">7</span><span class="hljs-operator">+</span><span class="hljs-number">1</span>],now() <span class="hljs-keyword">as</span> create_time <span class="hljs-keyword">FROM</span> numbers(<span class="hljs-number">5000</span>);</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 3、此時尚未觸發分區合并,所以還未去重。</p>
<h2><a name="t25"></a>3.2、手動OPTIMIZE</h2>
<p> 在寫入數據后,立刻執行OPTIMIZE強制觸發新寫入分區的合并動作</p>
<p> 語法:OPTIMIZE TABLE [db.]name [ON CLUSTER cluster] [PARTITION partition | PARTITION ID 'partition_id'] [FINAL] [DEDUPLICATE [BY expression]]</p>
<pre data-index="7"><code class="language-sql hljs">OPTIMIZE <span class="hljs-keyword">TABLE</span> test_a <span class="hljs-keyword">FINAL</span>;</code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<h2><a name="t26"></a>3.3、通過Group by 去重</h2>
<p> 1、執行去重的查詢</p>
<pre data-index="8"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(score,create_time) <span class="hljs-keyword">AS</span> score,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(deleted,create_time) <span class="hljs-keyword">AS</span> deleted,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-built_in">max</span>(create_time) <span class="hljs-keyword">AS</span> ctime</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> test_a</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> user_id</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">HAVING</span> deleted <span class="hljs-operator">=</span> <span class="hljs-number">0</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 函數說明:</p>
<ul><li>argMax(field1,field2):按照field2的最大值取field1的值。</li></ul>
<p> 當我們更新數據時,會寫入一行新的數據,例如上面語句中,通過查詢最大的create_time得到修改后的score字段值。</p>
<p> 2、創建視圖</p>
<pre data-index="9"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">VIEW</span> view_test_a <span class="hljs-keyword">AS</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(score,create_time) <span class="hljs-keyword">AS</span> score,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(deleted,create_time) <span class="hljs-keyword">AS</span> deleted,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-built_in">max</span>(create_time) <span class="hljs-keyword">AS</span> ctime</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> test_a</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> user_id</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">HAVING</span> deleted <span class="hljs-operator">=</span> <span class="hljs-number">0</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 3、刪除數據測試</p>
<pre data-index="10"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 原有1W條數據</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">select</span> <span class="hljs-built_in">count</span>(<span class="hljs-operator">*</span>) <span class="hljs-keyword">from</span> test_a;</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 插入并查詢,原表有10001條數據,視圖表有10000條數據</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,create_time) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">0</span>,<span class="hljs-string">'AAAA'</span>,now());</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 插入并查詢,原表有10002條數據,視圖表有9999條數據</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,deleted,create_time) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">0</span>,<span class="hljs-string">'AAAA'</span>,<span class="hljs-number">1</span>,now());</div></div></li></ol></code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p style="text-align:center;"><img alt="" height="547" src="https://img-blog.csdnimg.cn/cf2faafcd5824ac4bc01b87a8f207433.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="829"></p>
<p> 這行數據并沒有真正的被刪除,而是被過濾掉了。在一些合適場景下,可以結合表級別的TTL最終將物理數據刪除。</p>
<h2><a name="t27"></a>3.4、通過FINAL查詢</h2>
<p> 在查詢語句后增加FINAL修飾符,這樣在查詢的過程中將會執行Merge的特殊邏輯(例如數據去重,預聚合等)。</p>
<p> 但是這種方法在早期版本基本沒有人使用,因為在增加FINAL之后,我們的查詢將會變成一個單線程的執行過程,查詢速度非常慢。</p>
<p> <span style="color:#fe2c24;"><strong>在v20.5.2.7-stable版本中,FINAL查詢是支持多線程執行</strong>,</span><span style="color:#0d0016;">并且可以通過</span><span style="color:#fe2c24;">max_final_threads參數</span><span style="color:#0d0016;">控制單個查詢的線程數。但是目前讀取part部分的動作依然是串行的。</span></p>
<p><span style="color:#0d0016;"> FIANL查詢最終的性能和很多因素相關,列字段的大小、分區的數量等等都會影響到最終的查詢時間,所以還要結合實際場景取舍。</span></p>
<p><span style="color:#0d0016;"> 參考鏈接:https://github.com/ClickHouse/ClickHouse/pull/10463</span></p>
<h1><a name="t28"></a><span style="color:#0d0016;">4、物化視圖</span></h1>
<p><span style="color:#0d0016;"> ClickHouse的物化視圖是一種查詢結果的持久化,它確實是給我們帶來了查詢效率的提升。用戶查起來跟表沒有quiet,它就會一張表,它也像是一張時刻在預計算的表,創建的過程它是用了一個特殊引擎,加上后來as select,就是create一個table as select的寫法。</span></p>
<p><span style="color:#0d0016;"> “查詢結果集”的范圍很寬泛,可以是基礎表中部分數據的一份簡單拷貝,也可以使多表Join之后產生的結果或其子集,或者原始數據的聚合指標等等。所以,物化視圖不會隨著基礎表的變化而變化,所以它也稱為快照(snapshot)。</span></p>
<h2><a name="t29"></a><span style="color:#0d0016;">4.1、概述</span></h2>
<h3><a name="t30"></a><span style="color:#0d0016;">4.1.1、物化視圖與普通視圖的區別</span></h3>
<p><span style="color:#0d0016;"> </span><span style="color:#fe2c24;"> 普通視圖不保存數據,保存的僅僅是查詢語句,</span>查詢的時候還是從原表讀取數據,可以將普通視圖理解為是個子查詢。<span style="color:#fe2c24;">物化視圖是把查詢的結果根據相應的引擎存入到了磁盤或內存中,</span>對數據重新進行了組織,你可以理解物化視圖是完全的一張新表。</p>
<h3><a name="t31"></a>4.1.2、優缺點</h3>
<p> 優點:查詢速度快,要是把物化視圖這些規則全部寫好, 它比元數據查詢快樂很多,總的行數少了,因為都預計算好了。</p>
<p> 缺點:它的本質是一個流式數據的使用場景,是累加式的技術,所以要用歷史數據做去重,去核這樣的分析,在物化視圖里面是不太好用的。在某些場景的使用也是有限的。而且如果一戰表加了很多物化視圖,在寫這張表的時候,就會消耗很多機器的資源,比如護具帶寬占滿、存儲一下子增加了很多。</p>
<h3><a name="t32"></a>4.1.3、基本語法</h3>
<p> 也是create語法,會創建一個隱藏的目標表來保存視圖數據。也可以TO表名,保存到一張顯示的表。沒有加TO表名,表名默認就會.inner.物化視圖名。</p>
<pre data-index="11"><code class="language-sql hljs"><span class="hljs-keyword">CREATE</span> [MATERIALIZED] <span class="hljs-keyword">VIEW</span> [IF <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">EXISTS</span>] [db.]table_name [<span class="hljs-keyword">TO</span>[db.]name] [ENGINE <span class="hljs-operator">=</span> engine] [POPULATE] <span class="hljs-keyword">AS</span> <span class="hljs-keyword">SELECT</span> ...</code><div class="hljs-button {2}" data-title="復制" onclick="hljs.copyCode(event)"></div></pre>
<p> 1、創建物化視圖的限制</p>
<ol><li>必須指定物化視圖的engine用于數據存儲</li><li>TO [db].[table]語法的使用,不得使用POPULATE。</li><li>查詢語句(select)可以包含下面的子句:DISTINCT,GROUP BY,ORDER BY,LIMIT...</li><li>物化視圖的alter操作有些限制,操作起來不大方便。</li><li>如物化視圖的定義使用了 TO [db.]name子語句,則可以將目標表的視圖卸載DETACH再裝載ATTACH。</li></ol>
<p> 2、物化視圖的數據更新</p>
<ol><li>物化視圖創建好之后,若源表被寫入新數據則物化視圖也會同步更新。</li><li>POPULATE關鍵字決定了物化視圖的更新策略:
<ol><li>若有POPULATE則在創建視圖的過程會將源表已經存在的數據一并導入,類似于create table ... as</li><li>若無POPULATE則物化視圖在創建之后沒有數據,只會在創建只有同步之后寫入源表的數據。</li><li>clickhouse官方并不推薦使用POPULATE,因為在創建物化視圖的過程中同時寫入的數據不能被插入物化視圖。</li></ol></li><li>物化視圖不支持同步刪除,若源表的數據不存在(刪除了)則物化視圖的數據仍然保留。</li><li>物化視圖是一種特殊的數據表,可以用show table查看。</li></ol>
<h2><a name="t33"></a></h2>
<p></p>
<p></p>
<p></p>
</div><div data-report-view="{"mod":"1585297308_001","spm":"1001.2101.3001.6548","dest":"https://blog.csdn.net/Yuan_CSDF/article/details/122293843","extend1":"pc","ab":"new"}"><div></div></div>
</div>