Full-Body Search

The Search DSL is sent as the body of a GET request:

GET /_search
{} 

GET /index_2014*/type1,type2/_search
{}

GET /_search
{
  "from": 30,
  "size": 10
}

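A GET request with a body may look strange, and the HTTP libraries of some languages (notably JavaScript) don't even support one. RFC 7231 does not forbid it, though. It only states that a payload within a GET request has no defined semantics. For that reason some HTTP servers accept such requests and some reject them, proxy servers in particular. The ES developers consider GET more semantically correct than POST for retrieving data, but because some libraries can't send a body with GET, search and some other APIs also accept POST. For example: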

POST /_search
{
  "from": 30,
  "size": 10
}
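Here is the GET-versus-POST point from the client side. This is a minimal sketch using Python's standard urllib, which lets you attach a body to a GET request when you pass method="GET" explicitly. The address http://localhost:9200 is an assumption. The code only builds the two requests and does not send them.

```python
import json
import urllib.request

# Assumed address of a local Elasticsearch node; adjust for your cluster.
SEARCH_URL = "http://localhost:9200/_search"

# The paging body from the examples above: skip 30 hits, return the next 10.
BODY = json.dumps({"from": 30, "size": 10}).encode("utf-8")


def search_request(method: str) -> urllib.request.Request:
    """Build (but do not send) a search request carrying a JSON body."""
    return urllib.request.Request(
        SEARCH_URL,
        data=BODY,
        headers={"Content-Type": "application/json"},
        method=method,  # without this, urllib defaults to POST whenever data is set
    )


get_req = search_request("GET")    # the form ES prefers semantically
post_req = search_request("POST")  # fallback for clients that cannot send a GET body

# Sending is a separate step, e.g. urllib.request.urlopen(get_req).
```

Both requests carry the same body. Only the method differs, and that is exactly the choice ES leaves to the client.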

Query DSL

To use the Query DSL, pass a query clause in the query parameter of the request body:

GET /_search
{
    "query": YOUR_QUERY_HERE
}

An empty search ({}) is functionally equivalent to a match_all query:

GET /_search
{
    "query": {
        "match_all": {}
    }
}
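A client that builds search bodies in code can mirror this equivalence. This is a sketch: search_body is a hypothetical helper, not part of any Elasticsearch client. It falls back to match_all when no query is given, so the body it produces means the same thing as the empty search.

```python
from typing import Any, Optional


def search_body(query: Optional[dict] = None, **params: Any) -> dict:
    """Build a search request body (hypothetical helper, not an ES client API).

    A missing or empty query is replaced by match_all, mirroring the
    equivalence between the empty search {} and an explicit match_all.
    """
    body = dict(params)  # other top-level parameters, e.g. from/size
    body["query"] = query if query else {"match_all": {}}
    return body
```

For example, search_body(size=10) yields {"size": 10, "query": {"match_all": {}}}.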

Query Structure

A query clause typically has the following structure:

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

If it refers to one particular field, it has this structure instead:

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

For example, a match query clause can find tweets that mention elasticsearch in the tweet field:

{
    "match": {
        "tweet": "elasticsearch"
    }
}

The complete search request looks like this:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}
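The two clause shapes can be written as small builder functions. This is a sketch: clause and field_clause are hypothetical helpers, not part of any Elasticsearch client. It also relies on match accepting two forms: the short form used above, and a longer form in which the text goes under a query argument.

```python
import json
from typing import Any


def clause(query_name: str, **args: Any) -> dict:
    """General shape: {QUERY_NAME: {ARGUMENT: VALUE, ...}}."""
    return {query_name: dict(args)}


def field_clause(query_name: str, field_name: str, **args: Any) -> dict:
    """Field shape: {QUERY_NAME: {FIELD_NAME: {ARGUMENT: VALUE, ...}}}."""
    return {query_name: {field_name: dict(args)}}


# General shape, e.g. a multi_match clause over two fields.
multi = clause("multi_match", query="full text search", fields=["title", "body"])

# Field shape: the long form of the match example, with the text under "query".
match_long = field_clause("match", "tweet", query="elasticsearch")

# The short form used in the text puts the value directly under the field name.
match_short = {"match": {"tweet": "elasticsearch"}}

# Wrapping a clause in "query" gives the complete search request body.
print(json.dumps({"query": match_short}))
# {"query": {"match": {"tweet": "elasticsearch"}}}
```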

Combining Multiple Clauses

Query clauses are simple building blocks that can be combined into complex queries. Clauses come in two kinds:

  • Leaf clauses (like the match clause) that are used to compare a field (or fields) to a query string.
  • Compound clauses that are used to combine other query clauses. For instance, a bool clause allows you to combine other clauses that either must match, must_not match, or should match if possible. They can also include non-scoring filters for structured search:
    {
      "bool": {
          "must":     { "match": { "tweet": "elasticsearch" }},
          "must_not": { "match": { "name":  "mary" }},
          "should":   { "match": { "tweet": "full text" }},
          "filter":   { "range": { "age" : { "gt" : 30 }} }
      }
    }
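The leaf/compound distinction shows up as a tree: compound clauses hold other clauses, and leaves do the actual field comparisons. The sketch below walks such a tree and collects its leaves. It assumes bool is the only compound query. Elasticsearch has others, and this sketch would simply treat them as leaves.

```python
from typing import Iterator

# The bool sections that hold child clauses.
BOOL_SECTIONS = ("must", "must_not", "should", "filter")


def leaf_clauses(query: dict) -> Iterator[dict]:
    """Yield every leaf clause in a query, descending into bool compounds.

    Only bool is treated as compound here; other compound queries would be
    treated as leaves by this sketch.
    """
    # Each clause object has exactly one key: the query name.
    name, body = next(iter(query.items()))
    if name != "bool":
        yield query  # leaf clause, e.g. match or range
        return
    for section in BOOL_SECTIONS:
        children = body.get(section, [])
        # A section may hold a single clause object or a list of them.
        if isinstance(children, dict):
            children = [children]
        for child in children:
            yield from leaf_clauses(child)


example = {
    "bool": {
        "must":     {"match": {"tweet": "elasticsearch"}},
        "must_not": {"match": {"name": "mary"}},
        "should":   {"match": {"tweet": "full text"}},
        "filter":   {"range": {"age": {"gt": 30}}},
    }
}

print([next(iter(leaf)) for leaf in leaf_clauses(example)])
# ['match', 'match', 'match', 'range']
```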
    

Queries and Filters

The ES Query DSL is a single set of query clauses, and each clause can be used in one of two contexts: filtering context or query context.

In filtering context, a query is a non-scoring, filtering query. It asks a question like "Does this document match?", and the answer is always a simple, binary yes or no.

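In query context, a query is a scoring query. Like a filter, it determines whether the document matches, but it also determines how well the document matches, expressed as its relevance _score.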

As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filters for everything else.
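The two contexts differ mainly in what they return. The toy sketch below makes two simplifying assumptions for illustration: the analyzer just lowercases and splits on non-alphanumerics, and the score is the fraction of query terms found. Real Elasticsearch scoring (TF/IDF or BM25, depending on the version) is far more involved.

```python
import re


def tokens(text: str) -> list[str]:
    """Toy analyzer: lowercase, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_in_filter_context(doc: dict, field: str, text: str) -> bool:
    """Filtering context: answers only 'does this document match?'."""
    return bool(set(tokens(text)) & set(tokens(doc.get(field, ""))))


def match_in_query_context(doc: dict, field: str, text: str) -> float:
    """Query context: 0.0 means no match; a higher value means a better match."""
    terms = tokens(text)
    if not terms:
        return 0.0
    present = set(tokens(doc.get(field, "")))
    return sum(t in present for t in terms) / len(terms)


doc = {"tweet": "Full text search with Elasticsearch"}
print(match_in_filter_context(doc, "tweet", "full stack"))  # True
print(match_in_query_context(doc, "tweet", "full stack"))   # 0.5
print(match_in_query_context(doc, "tweet", "full text"))    # 1.0
```

Both versions accept the document for "full stack". Only the query-context version can tell that "full text" is the better match, and that is why full-text conditions belong in query context.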

Some commonly used queries:

{ "match_all": {}}

{ "match": { "age":    26           }}
{ "match": { "date":   "2014-09-01" }}
{ "match": { "public": true         }}
{ "match": { "tag":    "full_text"  }}

{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}

{
    "exists":   {
        "field":    "title"
    }
}

The term query is used to search by exact values, be they numbers, dates, Booleans, or not_analyzed exact-value string fields.
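A toy model of the inverted index shows why term suits exact values. In this sketch the analyzer is a simplified assumption (lowercase plus word split), not Elasticsearch's actual standard analyzer. Only the lookup logic is illustrated, not scoring.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy stand-in for an analyzer: lowercase and split into words."""
    return re.findall(r"[a-z0-9_]+", text.lower())


# What the inverted index holds for the same value in two kinds of field.
analyzed_field = set(analyze("Full Text Search"))  # {'full', 'text', 'search'}
exact_field = {"Full Text Search"}                  # not_analyzed: one exact term


def match_lookup(indexed: set[str], value: str) -> bool:
    """match analyzes the query string first, then looks up each term."""
    return any(term in indexed for term in analyze(value))


def term_lookup(indexed: set[str], value: str) -> bool:
    """term looks the value up exactly as given, with no analysis."""
    return value in indexed


print(match_lookup(analyzed_field, "Full Text"))     # True
print(term_lookup(analyzed_field, "Full Text"))      # False: no such term in the index
print(term_lookup(exact_field, "Full Text Search"))  # True
```

A term query against an analyzed field only finds the individual lowercase tokens, which is usually not what the caller intended.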

Combining Queries

The bool query combines multiple query clauses. It accepts the following parameters:

  • must: Clauses that must match for the document to be included.
  • must_not: Clauses that must not match for the document to be included.
  • should: If these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
  • filter: Clauses that must match, but are run in non-scoring, filtering mode. These clauses do not contribute to the score, instead they simply include/exclude documents based on their criteria.

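Each clause calculates a relevance score for each document independently. The bool query then merges the scores of its scoring clauses into a single score for the document. The filter and must_not clauses do not contribute to it.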

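The following query finds documents whose title field matches the query string how to make millions and that are not marked as spam. Documents that are starred, or dated 2014 or later, rank higher than documents that are not. Documents that satisfy both conditions rank higher still: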

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}

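TIP: If there are no must clauses, at least one should clause has to match. However, if there is at least one must clause, none of the should clauses are required to match. (In more recent Elasticsearch versions, a filter clause has the same effect as a must clause here, and the minimum_should_match parameter can override this default.)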

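If we don't want the date to affect the score, we can move the range clause into a filter clause, which only includes or excludes documents. Note that this also changes its role. As a filter, the date range becomes a hard requirement rather than an optional boost: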

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}

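Any query can be turned into a non-scoring query in this way: simply move it into the filter clause, and it automatically runs in non-scoring, filtering mode.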
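The bool semantics above can be modelled with a toy evaluator. It covers which clauses gate inclusion, which contribute to the score, the should rule from the TIP, and what changes when a clause moves into filter. This is a sketch under stated assumptions. Clauses are plain Python functions that return a score. match_clause is a naive word-overlap score. Scores are merged by simple addition. filter_ is spelled with an underscore to avoid shadowing Python's built-in filter. None of this is Elasticsearch's actual scoring.

```python
from typing import Callable, Optional, Sequence

# A clause maps a document to a score: 0.0 means "does not match".
Clause = Callable[[dict], float]


def match_clause(field: str, text: str) -> Clause:
    """Toy match: fraction of the query's words found in the field."""
    words = text.lower().split()

    def score(doc: dict) -> float:
        present = doc.get(field, "").lower().split()
        return sum(w in present for w in words) / len(words) if words else 0.0

    return score


def date_gte(field: str, lower: str) -> Clause:
    """Toy range clause; ISO-8601 date strings compare correctly as text."""
    return lambda doc: 1.0 if doc.get(field, "") >= lower else 0.0


def bool_score(doc: dict, must: Sequence[Clause] = (), must_not: Sequence[Clause] = (),
               should: Sequence[Clause] = (), filter_: Sequence[Clause] = ()) -> Optional[float]:
    """Return None if doc is excluded, otherwise its combined score."""
    must_scores = [c(doc) for c in must]
    if any(s == 0 for s in must_scores):
        return None
    if any(c(doc) > 0 for c in must_not):  # excludes, never scores
        return None
    if any(c(doc) == 0 for c in filter_):  # includes/excludes, never scores
        return None
    should_scores = [s for s in (c(doc) for c in should) if s > 0]
    # Rule from the TIP: without a must clause, at least one should must match.
    # (Newer ES versions also make should optional when a filter is present.)
    if should and not must and not should_scores:
        return None
    # Toy merge: add up the scores of the matching scoring clauses.
    return sum(must_scores) + sum(should_scores)


docs = {
    "a": {"title": "How to make millions", "tag": "starred", "date": "2014-05-01"},
    "b": {"title": "How to make millions", "date": "2014-02-01"},
    "c": {"title": "how to make millions", "date": "2013-01-01"},
    "d": {"title": "make millions fast", "tag": "spam", "date": "2015-01-01"},
}

title = match_clause("title", "how to make millions")
spam = match_clause("tag", "spam")
starred = match_clause("tag", "starred")
recent = date_gte("date", "2014-01-01")


def rank(**query) -> list[tuple[str, float]]:
    """Score every document and return the hits, best first."""
    scored = {doc_id: bool_score(doc, **query) for doc_id, doc in docs.items()}
    hits = [(doc_id, s) for doc_id, s in scored.items() if s is not None]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Date as a should clause: it boosts recent documents but is optional.
print(rank(must=[title], must_not=[spam], should=[starred, recent]))
# [('a', 3.0), ('b', 2.0), ('c', 1.0)]

# Date moved into filter: it no longer scores, and older documents drop out.
print(rank(must=[title], must_not=[spam], should=[starred], filter_=[recent]))
# [('a', 2.0), ('b', 1.0)]
```

Document d matches the title partially but is removed by must_not. In the filtered variant, c disappears because the date is now required, and the remaining scores shrink because the date no longer counts.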

If you need to filter on several different criteria, the bool query itself can be placed inside the filter clause as a non-scoring query:

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "bool": { 
              "must": [
                  { "range": { "date": { "gte": "2014-01-01" }}},
                  { "range": { "price": { "lte": 29.99 }}}
              ],
              "must_not": [
                  { "term": { "category": "ebooks" }}
              ]
          }
        }
    }
}
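A bool inside filter only answers yes or no, so its logic can be sketched as a small interpreter over the JSON itself. This toy handles only the term, range, and bool pieces used in this example, which is an assumption. Real Elasticsearch supports far more and does not evaluate filters by looping over documents like this.

```python
import operator
from typing import Any

# Range parameters mapped to comparisons.
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def as_list(value: Any) -> list:
    """A bool section may hold one clause object or a list of clauses."""
    return value if isinstance(value, list) else [value]


def matches(doc: dict, clause: dict) -> bool:
    """Evaluate a clause in filtering context: a plain yes/no, no score.

    Toy subset: term, range, and bool with must / must_not / filter only.
    """
    name, body = next(iter(clause.items()))
    if name == "term":
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if name == "range":
        field, bounds = next(iter(body.items()))
        return field in doc and all(RANGE_OPS[op](doc[field], limit)
                                    for op, limit in bounds.items())
    if name == "bool":
        if "should" in body:
            raise ValueError("should is not supported by this sketch")
        required = as_list(body.get("must", [])) + as_list(body.get("filter", []))
        excluded = as_list(body.get("must_not", []))
        return (all(matches(doc, c) for c in required)
                and not any(matches(doc, c) for c in excluded))
    raise ValueError(f"clause not supported by this sketch: {name}")


# The nested filter from the example above, as plain data.
price_filter = {
    "bool": {
        "must": [
            {"range": {"date": {"gte": "2014-01-01"}}},
            {"range": {"price": {"lte": 29.99}}},
        ],
        "must_not": [
            {"term": {"category": "ebooks"}},
        ],
    }
}

print(matches({"date": "2014-06-01", "price": 19.99, "category": "books"}, price_filter))   # True
print(matches({"date": "2014-06-01", "price": 19.99, "category": "ebooks"}, price_filter))  # False
```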

Validating Queries

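The validate-query API checks whether a query is valid. For example, the following query is invalid because the query name (match) and the field name (tweet) are swapped: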

GET /gb/tweet/_validate/query
{
   "query": {
      "tweet" : {
         "match" : "really powerful"
      }
   }
}

Response:

{
  "valid" :         false,
  "_shards" : {
    "total" :       1,
    "successful" :  1,
    "failed" :      0
  }
}

To find out why the query is invalid, add the explain parameter to the query string:

GET /gb/tweet/_validate/query?explain 
{
   "query": {
      "tweet" : {
         "match" : "really powerful"
      }
   }
}

Response:

{
  "valid" :     false,
  "_shards" :   { ... },
  "explanations" : [ {
    "index" :   "gb",
    "valid" :   false,
    "error" :   "org.elasticsearch.index.query.QueryParsingException:
                 [gb] No query registered for [tweet]"
  } ]
}

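explain is also useful for understanding how ES interprets a valid query. In the output below, the match query for really powerful has been rewritten into two single-term queries on the tweet field, one per word. The terms differ per index because each index analyzes the tweet field differently (in gb the words are stemmed to realli and power):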

GET /_validate/query?explain
{
   "query": {
      "match" : {
         "tweet" : "really powerful"
      }
   }
}

Response:

{
  "valid" :         true,
  "_shards" :       { ... },
  "explanations" : [ {
    "index" :       "us",
    "valid" :       true,
    "explanation" : "tweet:really tweet:powerful"
  }, {
    "index" :       "gb",
    "valid" :       true,
    "explanation" : "tweet:realli tweet:power"
  } ]
}
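When checking queries from a script, validate responses like the ones above can be condensed into one line per index. This sketch only parses response bodies. The sample responses are copied from the examples with _shards left out and the error message joined onto one line. summarize is a hypothetical helper, and fetching the response from a cluster is not shown.

```python
def summarize(resp: dict) -> list[str]:
    """Condense a _validate/query response into one readable line per index."""
    if "explanations" not in resp:
        # Without ?explain the response only says whether the query is valid.
        return ["valid" if resp["valid"] else "invalid (add ?explain for details)"]
    lines = []
    for item in resp["explanations"]:
        if item["valid"]:
            lines.append(f'{item["index"]}: valid: {item["explanation"]}')
        else:
            lines.append(f'{item["index"]}: invalid: {item["error"]}')
    return lines


# The responses shown above, with "_shards" left out.
plain = {"valid": False}
invalid = {
    "valid": False,
    "explanations": [{
        "index": "gb",
        "valid": False,
        "error": "org.elasticsearch.index.query.QueryParsingException: "
                 "[gb] No query registered for [tweet]",
    }],
}
valid = {
    "valid": True,
    "explanations": [
        {"index": "us", "valid": True, "explanation": "tweet:really tweet:powerful"},
        {"index": "gb", "valid": True, "explanation": "tweet:realli tweet:power"},
    ],
}

for line in summarize(invalid) + summarize(valid):
    print(line)
```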
