[TOC]

> * What retrieval offers:
> * Data destined for full-text search is first tokenized by an analyzer into an inverted index and stored as documents, so that it can be searched in full text.

1. Structured query: exact, structured matching on a specific field (similar to a relational database)
2. Full-text search: match a keyword against all fields of the documents, then return results ranked by how well they match
3. A combination of the two

> * Search APIs

1. Lightweight style: a query string (`query string`) passes all parameters in the URL
2. Structured query language (DSL): the full request body expressed as JSON

> * Elasticsearch data types

1. Exact values: definite, unique words or phrases such as names or dates (numbers, dates, ...)
2. Full text: text written in human language, such as articles or e-mail bodies; to analyze full text, Elasticsearch tokenizes it and builds an inverted index (strings)

### 1. Chinese analyzer

1. Install the IK plugin:

~~~
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.2/elasticsearch-analysis-ik-5.5.2.zip
~~~

2. Create an index:

`curl -XPUT http://192.168.56.130:9200/index`

3. Set the mapping:

~~~
curl -XPOST http://192.168.56.130:9200/index/fulltext/_mapping -d'
{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word"
        }
    }
}'
~~~

4. Insert data (the documents are kept in Chinese on purpose, to exercise the Chinese analyzer):

~~~
curl -XPOST http://192.168.56.130:9200/index/fulltext/1 -d'
{"content":"美國留給伊拉克的是個爛攤子嗎"}'

curl -XPOST http://192.168.56.130:9200/index/fulltext/2 -d'
{"content":"公安部:各地校車將享最高路權"}'

curl -XPOST http://192.168.56.130:9200/index/fulltext/3 -d'
{"content":"中韓漁警沖突調查:韓警平均每天扣1艘中國漁船"}'

curl -XPOST http://192.168.56.130:9200/index/fulltext/4 -d'
{"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}'
~~~

5. Query with highlighting:

~~~
curl -XPOST http://192.168.56.130:9200/index/fulltext/_search?pretty -d'
{
    "query" : { "match" : { "content" : "中國" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}'
~~~

6. Query types:

![](https://box.kancloud.cn/415c688031f65288c4fc27664550697b_1482x609.png)

### 2. Java client
1. pom.xml:

~~~
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.aixin.elasticsearch</groupId>
    <artifactId>elasticearchclient</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>5.5.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>5.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>x-pack-transport</artifactId>
            <version>5.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>rest</artifactId>
            <version>5.5.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.9.1</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.5.5</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-source-plugin</artifactId>
                <executions>
                    <execution>
                        <id>attach-sources</id>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
~~~
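As an aside before the client code: the analyzer/inverted-index idea from section 1 can be sketched without a cluster. The toy class below (whitespace split plus lowercasing stands in for a real analyzer such as ik_max_word; the class and method names are made up for illustration) builds a term → posting-list map and answers full-text lookups from it:

```java
import java.util.*;

// Toy inverted index: maps each term to the sorted set of doc ids containing it.
// A real analysis chain does far more (CJK segmentation, stop words, stemming...);
// lowercase + whitespace split merely stands in for it here.
public class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Analyze" the text and add this doc id to each term's posting list.
    public void index(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Full-text lookup: the doc ids whose text contained the term.
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.index(1, "quick brown fox");
        idx.index(2, "quick red fox");
        System.out.println(idx.search("fox"));   // [1, 2]
        System.out.println(idx.search("red"));   // [2]
    }
}
```

This is why full-text search is fast: a query term is a single map lookup, and relevance ranking then orders the matching doc ids.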
### 2.1 Node auto-discovery

1. Client cluster auto-discovery

> 1. By default the client round-robins requests across the nodes we specified by hand, as in the code below, where several nodes are added manually:

~~~
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost1"), 9300))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost2"), 9300))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost3"), 9300));
~~~

> 2. But what if the cluster has hundreds or thousands of nodes? Do we really have to add them all by hand just so client requests are load-balanced?
> 3. The es client offers cluster-node auto-discovery (sniffing). Once it is enabled, the client connects through the few nodes we specified, uses the cluster state to `automatically fetch` `all data nodes` in the cluster, and replaces its internal request node list with that complete list. By default the node list is refreshed every 5 seconds.
> Note that the es client never puts master nodes into the node list, to avoid sending search and similar requests to a master node.
> So in practice we can specify just a few nodes, or even a single one; the client discovers every node in the cluster by itself and re-sniffs every 5 seconds. Very convenient.

~~~
Settings settings = Settings.builder()
        .put("client.transport.sniff", true)   // enable auto-discovery (sniffing)
        .build();
TransportClient client = new PreBuiltTransportClient(settings);
~~~

* With the settings above, setting client.transport.sniff to true turns on cluster-node auto-discovery.
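The rotate-then-refresh behaviour described above can be sketched in plain Java. The `NodeList` class and its method names are invented for illustration; the real TransportClient keeps this bookkeeping internally:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: round-robin selection over a refreshable node list,
// mimicking what the transport client does between 5-second sniff rounds.
public class NodeList {
    private volatile List<String> nodes;                 // current target nodes
    private final AtomicInteger cursor = new AtomicInteger();

    public NodeList(List<String> seedNodes) {
        this.nodes = new CopyOnWriteArrayList<>(seedNodes);
    }

    // What a sniff round would do: swap in the freshly discovered data nodes.
    public void refresh(List<String> discoveredDataNodes) {
        this.nodes = new CopyOnWriteArrayList<>(discoveredDataNodes);
    }

    // Pick the next node for a request, wrapping around the list.
    public String next() {
        List<String> current = nodes;
        int i = Math.floorMod(cursor.getAndIncrement(), current.size());
        return current.get(i);
    }
}
```

Swapping the whole list on refresh (rather than mutating it) is what lets in-flight `next()` calls keep working while a sniff round replaces the nodes.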
2. Client code:

~~~
package com.aixin.elasticsearch.client;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.action.termvectors.TermVectorsRequest;
import org.elasticsearch.action.termvectors.TermVectorsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
// imports required by doubleAggSearch (missing in the original listing)
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.elasticsearch.search.aggregations.bucket.terms.StringTerms;
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.xpack.client.PreBuiltXPackTransportClient;

import java.io.IOException;
import java.net.InetAddress;
import java.util.*;

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

/**
 * Created by dailin on 2017/8/31.
 */
public class ElasticsearchClient {

    private static ElasticsearchClient elasticsearchClient = null;
    private static TransportClient client = null;

    private ElasticsearchClient(String host, String user, String password) {
        try {
            Settings settings = Settings.builder()
                    .put("xpack.security.user", user + ":" + password)
                    .put("cluster.name", "e-cluster").build();
            client = new PreBuiltXPackTransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host), 9300));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static ElasticsearchClient getInstance(String host, String user, String password) {
        if (elasticsearchClient == null) {
            synchronized (ElasticsearchClient.class) {
                if (elasticsearchClient == null) {
                    elasticsearchClient = new ElasticsearchClient(host, user, password);
                }
            }
        }
        return elasticsearchClient;
    }

    /**
     * Get a document.
     * @param index index
     * @param type  type
     * @param id    document id
     */
    public Document get(String index, String type, String id) {
        Document document = new Document(index, type, id);
        GetResponse response = client.prepareGet(index, type, id).get();
        Map<String, Object> source = response.getSource();
        for (Map.Entry<String, Object> map : source.entrySet()) {
            document.puToFile(map.getKey(), map.getValue().toString());
        }
        return document;
    }

    /**
     * Delete a document by id.
     * @param index index
     * @param type  type
     * @param id    document id
     */
    public void deleteById(String index, String type, String id) {
        DeleteResponse dr = client.prepareDelete(index, type, id).get();
    }

    public List<String> matchs(String index, String field, String text) {
        List<String> result = new ArrayList<String>();
        QueryBuilder qb = matchQuery(field, text);
        SearchResponse response = client.prepareSearch(index)
                .setSearchType(SearchType.QUERY_THEN_FETCH)
                .setQuery(qb)   // set field and value
                .get();
        SearchHits hits = response.getHits();
        SearchHit[] hits1 = hits.getHits();
        for (SearchHit sh : hits1) {
            for (Map.Entry<String, Object> map : sh.getSource().entrySet()) {
                result.add(map.getValue().toString());
            }
        }
        return result;
    }

    /**
     * Term-frequency statistics for a document.
     * @return Map<term, frequency>
     */
    public Map<String, Integer> termVectos(String index, String type, String id, String field) throws IOException {
        TermVectorsRequest.FilterSettings filterSettings = new TermVectorsRequest.FilterSettings();
        filterSettings.minWordLength = 2;
        TermVectorsResponse resp = client.prepareTermVectors(index, type, id)
                .setFilterSettings(filterSettings)
                .setSelectedFields(field)
                .execute().actionGet();
        Fields fields = resp.getFields();   // fields of the term-vector response
        Iterator<String> iterator = fields.iterator();
        Map<String, Integer> result = new HashMap<String, Integer>();
        while (iterator.hasNext()) {
            String dfield = iterator.next();
            Terms terms = fields.terms(dfield);        // terms of this field
            TermsEnum termsEnum = terms.iterator();    // termsEnum carries the per-term statistics
            while (termsEnum.next() != null) {
                String word = termsEnum.term().utf8ToString();
                int freq = termsEnum.postings(null, 120).freq();   // 120 == PostingsEnum.ALL
                result.put(word, freq);
            }
        }
        return result;
    }

    /**
     * Exact (term) query on a field.
     */
    public List<String> termQuery(String index, String type, String field, String text) {
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery(field, text);
        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(index)
                .setTypes(type)
                .setQuery(termQueryBuilder);
        SearchResponse searchResponse = searchRequestBuilder.get();
        SearchHit[] hits = searchResponse.getHits().getHits();
        List<String> values = new ArrayList<String>();
        for (SearchHit hit : hits) {
            for (Map.Entry<String, Object> map : hit.getSource().entrySet()) {
                values.add(map.getValue().toString());
            }
        }
        return values;
    }

    /**
     * Nested aggregation (grouping twice); corresponds to "7. aggregation query" in the operations chapter.
     */
    public void doubleAggSearch() {
        SearchResponse searchResponse = client.prepareSearch("company")
                .addAggregation(AggregationBuilders.terms("group_by_country").field("country")
                        .subAggregation(AggregationBuilders
                                .dateHistogram("group_by_join_date")
                                .field("join_date")
                                .dateHistogramInterval(DateHistogramInterval.YEAR)
                                .subAggregation(AggregationBuilders.avg("avg_age").field("age"))))
                .execute().actionGet();
        Map<String, Aggregation> aggrMap = searchResponse.getAggregations().asMap();
        StringTerms groupByCountry = (StringTerms) aggrMap.get("group_by_country");
        // country buckets
        Iterator<StringTerms.Bucket> groupByCountryBucketIterator = groupByCountry.getBuckets().iterator();
        while (groupByCountryBucketIterator.hasNext()) {
            StringTerms.Bucket groupByCountryBucket = groupByCountryBucketIterator.next();
            System.out.println(groupByCountryBucket.getKey() + ":" + groupByCountryBucket.getDocCount());
            // join-date buckets inside each country bucket
            Histogram groupByJoinDate = (Histogram) groupByCountryBucket.getAggregations().asMap().get("group_by_join_date");
            Iterator<? extends Histogram.Bucket> groupByJoinDateBucketIterator = groupByJoinDate.getBuckets().iterator();
            while (groupByJoinDateBucketIterator.hasNext()) {
                Histogram.Bucket groupByJoinDateBucket = groupByJoinDateBucketIterator.next();
                System.out.println(groupByJoinDateBucket.getKey() + ":" + groupByJoinDateBucket.getDocCount());
                // average age within each date bucket
                Avg avg = (Avg) groupByJoinDateBucket.getAggregations().asMap().get("avg_age");
                System.out.println(avg.getValue());
            }
        }
    }

    /**
     * Full-text search.
     * @param index index
     * @param field field
     * @param text  query text
     * @return Document objects
     */
    public List<Document> match(String index, String field, String text) {
        List<Document> result = new ArrayList<Document>();
        QueryBuilder qb = matchQuery(field, text);
        SearchResponse response = client.prepareSearch(index)
                .setSearchType(SearchType.QUERY_THEN_FETCH)
                .setQuery(qb)   // set field and value
                .get();
        SearchHits hits = response.getHits();
        SearchHit[] searchHits = hits.getHits();
        for (SearchHit hist : searchHits) {
            Document document = new Document(hist.getIndex(), hist.getType(), hist.getId());
            result.add(document);
            Map<String, Object> source = hist.getSource();
            for (Map.Entry<String, Object> v : source.entrySet()) {
                document.puToFile(v.getKey(), v.getValue().toString());
            }
        }
        return result;
    }

    /**
     * Mark the terms that match the query text.
     * @param index index
     * @param field field
     * @param text  query text
     * @return Document objects
     */
    public List<Document> matchHighLight(String index, String field, String text) {
        List<Document> result = new ArrayList<Document>();
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.preTags("<mark>");    // opening tag (pre/post were swapped in the original listing)
        highlightBuilder.postTags("</mark>");  // closing tag
        highlightBuilder.field(field);
        QueryBuilder qb = matchQuery(field, text);
        SearchResponse response = client.prepareSearch(index)
                .highlighter(highlightBuilder)
                .setSearchType(SearchType.QUERY_THEN_FETCH)
                .setQuery(qb)   // set field and value
                .get();
        SearchHits hits = response.getHits();
        SearchHit[] searchHits = hits.getHits();
        for (SearchHit hist : searchHits) {
            Map<String, HighlightField> highlightFields = hist.getHighlightFields();
            for (Map.Entry<String, HighlightField> v : highlightFields.entrySet()) {
                Document document = new Document(hist.getIndex(), hist.getType(), hist.getId());
                document.puToFile(v.getKey(), v.getValue().toString());
                result.add(document);
            }
        }
        return result;
    }
}
~~~

### 2.2 Updating fields

~~~
package com.roncoo.es.senior;

import java.net.InetAddress;

import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class UpsertCarInfoApp {

    @SuppressWarnings({ "unchecked", "resource" })
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "e-cluster")
                .put("client.transport.sniff", true)
                .build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("192.168.56.130"), 9300));

        IndexRequest indexRequest = new IndexRequest("car_shop", "cars", "1")
                .source(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("brand", "寶馬")
                        .field("name", "寶馬320")
                        .field("price", 310000)
                        .field("produce_date", "2017-01-01")
                        .endObject());
        UpdateRequest updateRequest = new UpdateRequest("car_shop", "cars", "1")
                .doc(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("price", 320000)
                        .endObject())
                .upsert(indexRequest);   // if the document exists, update it; otherwise insert it

        UpdateResponse updateResponse = client.update(updateRequest).get();
        System.out.println(updateResponse.getVersion());

        client.close();
    }
}
~~~
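The upsert semantics of section 2.2 (apply a partial update if the document exists, otherwise index the full upsert document) can be mimicked without a cluster. The `DocStore` class below is a hypothetical stand-in, not an Elasticsearch API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: update-or-insert over an in-memory "index",
// mirroring UpdateRequest.doc(...).upsert(indexRequest) semantics.
public class DocStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    /**
     * If id exists, merge the partial doc into it (the .doc(...) path);
     * otherwise insert the full upsert document (the .upsert(...) path).
     */
    public void upsert(String id, Map<String, Object> partialDoc, Map<String, Object> upsertDoc) {
        Map<String, Object> existing = docs.get(id);
        if (existing != null) {
            existing.putAll(partialDoc);             // partial-update path
        } else {
            docs.put(id, new HashMap<>(upsertDoc));  // insert path
        }
    }

    public Map<String, Object> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        DocStore store = new DocStore();
        Map<String, Object> full = new HashMap<>();
        full.put("brand", "BMW");
        full.put("price", 310000);
        Map<String, Object> partial = new HashMap<>();
        partial.put("price", 320000);

        store.upsert("1", partial, full);   // doc missing -> full insert
        System.out.println(store.get("1").get("price"));   // 310000
        store.upsert("1", partial, full);   // doc exists -> partial update
        System.out.println(store.get("1").get("price"));   // 320000
    }
}
```

Note the asymmetry this makes visible: the first upsert stores the price from the full document (310000), and only a second call applies the partial update (320000), exactly as `UpsertCarInfoApp` behaves against a fresh index.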