[TOC]

# Custom Functions

When Hive's built-in functions cannot meet your business needs, you can turn to user-defined functions (`UDF: user-defined function`).

# Categories of Custom Functions

* UDF: one row in, one row out. Operates on a single data row and produces a single row as output (math functions, string functions).
* UDAF (user-defined aggregate function): many rows in, one row out. Accepts multiple input rows and produces a single output row. Aggregate functions, similar to count/max/min.
* UDTF: one row in, many rows out.

# UDF Development Example

## Development Steps

1. Extend org.apache.hadoop.hive.ql.exec.UDF
2. Implement the evaluate method; evaluate supports overloading
3. In the Hive CLI, create the function:
    a. Add the jar: `add jar linux_jar_path`
    b. Create the function: `create [temporary] function [dbname.]function_name AS class_name;`
4. In the Hive CLI, drop the function: `drop [temporary] function [if exists] [dbname.]function_name;`

## A Simple UDF Example

1. First write a Java class that extends UDF and overloads the evaluate method:

~~~
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
    }
}
~~~

2. Package it as a jar and upload it to the server.

3. Add the jar to Hive's classpath:

~~~
hive> add JAR /home/hadoop/udf.jar;
~~~

4. Create a temporary function tied to the compiled Java class:

~~~
hive> create temporary function tolowercase as 'cn.bigdata.Lower';
~~~

5. The custom function tolowercase can now be used in HQL:

~~~
select tolowercase(name), age from t_test;
~~~

**A custom function created this way is only valid in the current session.**

## Developing a JSON-Parsing UDF

Given raw JSON data like this:

~~~
{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
~~~

the data needs to be loaded into the Hive warehouse:

~~~
create table rat_json(line string) row format delimited;
~~~

~~~
load data local inpath '/root/rating.json' into table rat_json;
~~~

No matter how many intermediate tables are used, the final result table must look like this:

~~~
movie   rate    timestamp   uid
1197    3       978302268   1
~~~

Note: everything is done inside Hive; a custom function may be used.

~~~
//{"movie":"1721","rate":"3","timeStamp":"965440048","uid":"5114"}
public class MovieRateBean {

    private String movie;
    private String rate;
    private String timeStamp;
    private String uid;

    public String getMovie() {
        return movie;
    }
    public void setMovie(String movie) {
        this.movie = movie;
    }
    public String getRate() {
        return rate;
    }
    public void setRate(String rate) {
        this.rate = rate;
    }
    public String getTimeStamp() {
        return timeStamp;
    }
    public void setTimeStamp(String timeStamp) {
        this.timeStamp = timeStamp;
    }
    public String getUid() {
        return uid;
    }
    public void setUid(String uid) {
        this.uid = uid;
    }

    @Override
    public String toString() {
        return movie + "\t" + rate + "\t" + timeStamp + "\t" + uid;
    }
}
~~~

~~~
import org.apache.hadoop.hive.ql.exec.UDF;
import parquet.org.codehaus.jackson.map.ObjectMapper;

public class JsonParser extends UDF {

    public String evaluate(String jsonLine) {
        // ObjectMapper does the JSON parsing for us
        ObjectMapper objectMapper = new ObjectMapper();
        try {
            MovieRateBean bean = objectMapper.readValue(jsonLine, MovieRateBean.class);
            return bean.toString();
        } catch (Exception e) {
            // malformed line: fall through and return an empty string
        }
        return "";
    }
}
~~~

Package this function as a jar, upload it, and add the jar:

~~~
add JAR /root/hiveStudy/jsonParser.jar;
~~~

Create the temporary function:

~~~
create temporary function parsejson as 'com.hive.JsonParser';
~~~

~~~
select parsejson(line) from rat_json limit 10;
~~~

If the following error appears, the function is written incorrectly or the jar was packaged incorrectly:

~~~
SemanticException [Error 10014]: Line 1:7 Wrong arguments 'line': No matching method for class com.hive.ParseJson with (string). Possible choices:
~~~
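Since parsejson returns the four fields as a single tab-delimited string, one more step is needed to reach the result table above. A minimal HQL sketch, assuming the parsejson function registered above; the target table name t_rating is illustrative:

~~~
-- split() turns parsejson's tab-delimited output into an array,
-- which is then indexed into the four target columns
create table t_rating as
select split(parsejson(line), '\t')[0] as movie,
       split(parsejson(line), '\t')[1] as rate,
       split(parsejson(line), '\t')[2] as `timestamp`,
       split(parsejson(line), '\t')[3] as uid
from rat_json;
~~~

The backticks around `timestamp` avoid a clash with the Hive keyword of the same name.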
## Built-in JSON Parsing: `get_json_object`

~~~
hive> select * from rat_json limit 10;
OK
{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}
{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}
{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}
{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}
{"movie":"2355","rate":"5","timeStamp":"978824291","uid":"1"}
{"movie":"1197","rate":"3","timeStamp":"978302268","uid":"1"}
{"movie":"1287","rate":"5","timeStamp":"978302039","uid":"1"}
{"movie":"2804","rate":"5","timeStamp":"978300719","uid":"1"}
{"movie":"594","rate":"4","timeStamp":"978302268","uid":"1"}
{"movie":"919","rate":"4","timeStamp":"978301368","uid":"1"}
Time taken: 0.131 seconds, Fetched: 10 row(s)
hive> select get_json_object(line,'$.movie') as movie, get_json_object(line,'$.rate') as rate from rat_json limit 10;
OK
1193    5
661     3
914     3
3408    4
2355    5
1197    3
1287    5
2804    5
594     4
919     4
Time taken: 0.108 seconds, Fetched: 10 row(s)
~~~

# Function Scope

## Temporary Functions

1. A custom UDF extends org.apache.hadoop.hive.ql.exec.UDF.
2. Implement the evaluate method; evaluate supports overloading. (Note: a UDF created in one database cannot be used in another database.)
3. Package the program and copy it to the target machine.
4. In the Hive client, add the jar: `hive> add jar /run/jar/udf_test.jar;`
5. Create the temporary function: `hive> create temporary function add_example AS 'hive.udf.Add';` (without temporary, the function is permanent)
6. Query with HQL:

~~~
SELECT add_example(8, 9) FROM scores;
SELECT add_example(scores.math, scores.art) FROM scores;
SELECT add_example(6, 7, 8, 6.8) FROM scores;
~~~

7. Drop the temporary function: `hive> DROP TEMPORARY FUNCTION add_example;`

Note: **a UDF can only implement one-in-one-out processing; for many-in-one-out, implement a UDAF instead.**

## Permanent Functions

1. A custom UDF extends org.apache.hadoop.hive.ql.exec.UDF. (Note: for this approach the class's package must be org.apache.hadoop.hive.ql.udf.)
2. Implement the evaluate method; evaluate supports overloading.
3. Modify the FunctionRegistry class to register the new UDF.
4. Compile the UDF into class files and place them in hive-exec-0.12.0-cdh5.0.0.jar under the org\apache\hadoop\hive\ql\udf path.
5. Replace the FunctionRegistry class file under org.apache.hadoop.hive.ql.exec in hive-exec-0.12.0-cdh5.0.0.jar with the newly compiled one.

# Viewing Functions

show functions lists the newly created function:

~~~
hive> show functions;
~~~

View a custom function's description:

~~~
hive> desc function cz;
~~~

If there is no description, it is because none was written when the function was defined.

In the source jar you can read the source of Hive's built-in year function:

![](https://box.kancloud.cn/a1bbe6e1f403e930f3b90d182a75d033_706x308.png)

When writing a custom Hive function you can imitate this built-in function and define a description for it.

The year function's description in Hive:

![](https://box.kancloud.cn/b31955fd2fb07af9446e2ad5668d8526_716x270.png)
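Following that pattern, here is a sketch of what adding a description to the earlier Lower UDF could look like, using Hive's org.apache.hadoop.hive.ql.exec.Description annotation (the name/value/extended strings are illustrative):

~~~
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// "desc function tolowercase" prints the "value" text;
// "desc function extended tolowercase" also prints "extended".
// _FUNC_ is replaced with the name the function was registered under.
@Description(
        name = "tolowercase",
        value = "_FUNC_(str) - returns str with all characters converted to lower case",
        extended = "Example:\n  > SELECT _FUNC_('Facebook') FROM src LIMIT 1;\n  'facebook'")
public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
    }
}
~~~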