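Full-Text Search

The match Query

The match query is the go-to query: it is the first query to reach for whenever you need to search a field. It is a high-level query that looks at which field is being queried before doing anything else, which means it can handle both full-text and exact-value searches.
Indexing the Data
First create the index with a single primary shard, then insert the documents with the _bulk API.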

PUT /my_index
{ "settings": { "number_of_shards": 1 }}

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }
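
The _bulk body above is newline-delimited JSON: one action line followed by one source line per document. As a minimal sketch, the same body could be generated in Python with only the standard library. The helper name build_bulk_body is made up for illustration, and actually sending the body to Elasticsearch over HTTP is left out.

```python
import json

# The four example documents, keyed by _id.
docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}

def build_bulk_body(documents):
    """Return an NDJSON string: an action line, then a source line, per document.

    Hypothetical helper for illustration; it only builds the request body.
    """
    lines = []
    for doc_id, title in documents.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({"title": title}))
    # Every line of a bulk body, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"

bulk_body = build_bulk_body(docs)
print(bulk_body, end="")
```

Generating the body from a dictionary keeps the action and source lines paired, which is easy to get wrong when editing the NDJSON by hand.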

The query

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "QUICK!"
    }
  }
}

Result

{
  "hits": {
    "total": 3,
    "max_score": 0.5,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5,
        "_source": {
          "title": "The quick brown fox"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.44194174,
        "_source": {
          "title": "The quick brown fox jumps over the quick dog"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.3125,
        "_source": {
          "title": "The quick brown fox jumps over the lazy dog"
        }
      }
    ]
  }
}

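How the match Query Executes

1. Check the field type. The title field is a full-text (analyzed) string field, which means the query string should be analyzed too.
2. Analyze the query string. QUICK! is passed through the standard analyzer, which produces the single term quick.
3. Find the matching documents, that is, the documents whose title field contains the term quick.
4. Score each document. The score combines term frequency (TF), inverse document frequency (IDF), and the length of each field (shorter fields are considered more relevant). See What Is Relevance? for details.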
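
The steps above can be sketched in plain Python. This is a simplified model, not what Elasticsearch actually runs: analyze is a rough stand-in for the standard analyzer, the field-type check of step 1 is skipped, and the score uses the classic TF/IDF shape (square root of the term frequency, times IDF, times one over the square root of the field length). Lucene stores the field-length norm with reduced precision, so the numbers here are not expected to match the listing above exactly; only the ordering should.

```python
import math
import re

docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def search(query_string, documents):
    # Step 2: analyze the query string. This sketch supports a single term only.
    terms = analyze(query_string)
    if len(terms) != 1:
        raise ValueError("this sketch only supports single-term queries")
    term = terms[0]

    # Step 3: find the documents whose analyzed title contains the term.
    analyzed = {doc_id: analyze(title) for doc_id, title in documents.items()}
    matching = {doc_id: toks for doc_id, toks in analyzed.items() if term in toks}

    # Step 4: score with a simplified classic TF/IDF formula (an assumption).
    idf = 1 + math.log(len(documents) / (len(matching) + 1))
    scored = []
    for doc_id, toks in matching.items():
        tf = math.sqrt(toks.count(term))   # more occurrences, higher score
        norm = 1 / math.sqrt(len(toks))    # shorter field, higher score
        scored.append((doc_id, tf * idf * norm))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = search("QUICK!", docs)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

Document 1 comes first because its title is the shortest, and document 3 beats document 2 because quick appears in it twice.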

Improving Precision

The original query, which matches documents containing either term:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": "BROWN DOG!"
    }
  }
}

The new query, which requires both terms to match:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query": "BROWN DOG!",
        "operator": "and"
      }
    }
  }
}

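The match query accepts an operator parameter. The default is or; setting it to and requires every term to match.
You can also set the precision yourself with the minimum_should_match parameter, which specifies the minimum share of the terms that must match, here given as a percentage. For example: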

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown dog",
        "minimum_should_match": "75%"
      }
    }
  }
}

This parameter is very flexible: different rules can apply depending on how many terms the user enters. See Minimum Should Match for details.
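
To make the effect of operator and minimum_should_match concrete, here is a small Python sketch that decides which of the four example documents match. It is an illustration only: analyze is a simplified stand-in for the standard analyzer, required_matches and match are made-up helper names, and only plain integers and positive percentages are handled (the negative and combined forms of minimum_should_match are not).

```python
import re

docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def required_matches(num_terms, operator="or", minimum_should_match=None):
    """Return how many query terms a document must contain in order to match."""
    if minimum_should_match is not None:
        if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
            percent = int(minimum_should_match[:-1])
            # A percentage is rounded down to a whole number of terms.
            return (num_terms * percent) // 100
        return int(minimum_should_match)
    return num_terms if operator == "and" else 1

def match(query_string, documents, operator="or", minimum_should_match=None):
    terms = set(analyze(query_string))
    needed = required_matches(len(terms), operator, minimum_should_match)
    hits = []
    for doc_id, title in documents.items():
        found = len(terms & set(analyze(title)))
        if found >= needed:
            hits.append(doc_id)
    return hits

print(match("BROWN DOG!", docs))                      # default or operator
print(match("BROWN DOG!", docs, operator="and"))      # both terms required
print(match("quick brown dog", docs, minimum_should_match="75%"))
```

With three query terms, 75% works out to 2.25, which rounds down to two required terms. In this small data set every document contains at least two of quick, brown, and dog, so all four match.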

Combining Queries

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}

Result

{
  "hits": [
     {
        "_id":      "3",
        "_score":   0.70134366, 
        "_source": {
           "title": "The quick brown fox jumps over the quick dog"
        }
     },
     {
        "_id":      "1",
        "_score":   0.3312608,
        "_source": {
           "title": "The quick brown fox"
        }
     }
  ]
}
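
The bool query above can be sketched in Python as a filter plus a ranking step. This is a toy model: bool_search is a made-up helper, analyze is a simplified stand-in for the standard analyzer, and the count of matching should clauses is used in place of a real _score.

```python
import re

docs = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}

def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def bool_search(documents, must=(), must_not=(), should=()):
    hits = []
    for doc_id, title in documents.items():
        tokens = set(analyze(title))
        # Every must term has to be present.
        if not all(term in tokens for term in must):
            continue
        # No must_not term may be present.
        if any(term in tokens for term in must_not):
            continue
        # should terms are optional here; they only improve the ranking.
        should_hits = sum(1 for term in should if term in tokens)
        hits.append((doc_id, should_hits))
    # Rank by the number of matching should clauses (a stand-in for _score).
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

results = bool_search(docs, must=["quick"], must_not=["lazy"], should=["brown", "dog"])
print(results)
```

Document 2 is excluded because it contains lazy, document 4 because it lacks quick, and document 3 ranks above document 1 because it matches both should clauses.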

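None of the should clauses is required to match: a document that matches no should clause can still be returned, but it scores lower. The one exception is a bool query with no must clauses, in which case at least one should clause has to match. Precision can be controlled here too, again with the minimum_should_match parameter: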

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 
    }
  }
}

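How match Uses bool

A multi-word match query wraps the term queries generated for each term in a bool query. With the default or operator, each term query becomes a should clause, so the following two queries are equivalent: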

{
    "match": { "title": "brown fox"}
}

{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }}
    ]
  }
}

With the and operator, every term query becomes a must clause, so these two queries are equivalent:

{
    "match": {
        "title": {
            "query":    "brown fox",
            "operator": "and"
        }
    }
}

{
  "bool": {
    "must": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }}
    ]
  }
}

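If minimum_should_match is specified, it is passed straight through to the bool query, so these two queries are equivalent as well (75% of three terms rounds down to 2):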

{
    "match": {
        "title": {
            "query":                "quick brown fox",
            "minimum_should_match": "75%"
        }
    }
}

{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }},
      { "term": { "title": "quick" }}
    ],
    "minimum_should_match": 2 
  }
}
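
The equivalences in this section can be summarized as a small rewrite function. This Python sketch is an illustration of the idea, not Elasticsearch's implementation: rewrite_match is a made-up name, analyze is a simplified stand-in for the standard analyzer, and minimum_should_match is only handled when given as a positive percentage string.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def rewrite_match(field, query_string, operator="or", minimum_should_match=None):
    """Rewrite a multi-word match query as a bool query of term clauses."""
    clauses = [{"term": {field: term}} for term in analyze(query_string)]
    if operator == "and":
        # and: every term is required, so each clause goes into must.
        return {"bool": {"must": clauses}}
    # or: each term is optional, so each clause goes into should.
    bool_query = {"should": clauses}
    if minimum_should_match is not None:
        percent = int(minimum_should_match.rstrip("%"))
        # The percentage is converted to a whole number of clauses, rounded down.
        bool_query["minimum_should_match"] = (len(clauses) * percent) // 100
    return {"bool": bool_query}

print(rewrite_match("title", "brown fox"))
print(rewrite_match("title", "brown fox", operator="and"))
print(rewrite_match("title", "quick brown fox", minimum_should_match="75%"))
```

The clauses come out in query order rather than in the order used in the listings above; the order of clauses inside should or must does not change which documents match.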

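Boosting Query Clauses

The goal is to raise the weight of particular query clauses so that they have more influence on the ranking of the results. This is done with the boost parameter: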

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {  
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3 
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2 
                    }
                }}
            ]
        }
    }
}

Result

{
  "hits": {
    "total": 4,
    "max_score": 0.783625,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": 0.783625,
        "_source": {
          "content": "Full text search with Lucene and Elasticsearch"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.40938914,
        "_source": {
          "content": "Full text search with Elasticsearch"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.30934063,
        "_source": {
          "content": "Full text search with Lucene"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.05462181,
        "_source": {
          "content": "Full text search is great"
        }
      }
    ]
  }
}

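NOTE: boost can be greater than 1 or less than 1, but its effect is not linear: a boost of 2 does not double the _score. All that can be said in general is that a higher boost gives a higher _score. The details of the algorithm are beyond the scope of these notes.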
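
To see why the results come out in that order, here is a deliberately simplified Python model of the boosted query. It is not how Elasticsearch computes _score: toy_score is a made-up function that adds a fixed 1.0 for the must clause and the boost value for each matching should clause, ignoring TF/IDF, field length, and normalization. The numbers it prints are therefore not real scores; only the ordering is meant to be comparable.

```python
import re

docs = {
    1: "Full text search is great",
    2: "Full text search with Lucene",
    3: "Full text search with Elasticsearch",
    4: "Full text search with Lucene and Elasticsearch",
}

def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

def toy_score(content, must_terms, boosted_should):
    """Additive toy score; returns None if the must clause does not match."""
    tokens = set(analyze(content))
    if not all(term in tokens for term in must_terms):
        return None
    score = 1.0  # contribution of the must clause (assumed constant)
    for term, boost in boosted_should.items():
        if term in tokens:
            score += boost  # a boosted should clause counts for more
    return score

must_terms = analyze("full text search")
boosted_should = {"elasticsearch": 3, "lucene": 2}

scores = {doc_id: toy_score(text, must_terms, boosted_should) for doc_id, text in docs.items()}
ranked = sorted((d for d in scores if scores[d] is not None), key=lambda d: scores[d], reverse=True)
print(ranked)
```

Because Elasticsearch carries the larger boost, the document that mentions only Elasticsearch ranks above the one that mentions only Lucene, and the document that mentions both ranks first.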

Controlling Analysis

Which analyzer is used by default differs depending on the context. See Controlling Analysis for details.
