
[ElasticSearch] Vector plugin support for ES 5.6.15

Reference:
https://github.com/lior-k/fast-elasticsearch-vector-scoring

  1. Download the plugin

  2. Install the plugin
    Plugin directory:
    elasticsearch/plugins
    The directory layout after installation is as follows

     plugins
     └── vector
         ├── elasticsearch-binary-vector-scoring-5.6.9.jar
         └── plugin-descriptor.properties
    

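    Change elasticsearch.version in plugin-descriptor.properties to 5.6.15. The jar was built for 5.6.9, but ES 5.6.15 is used here, and ES refuses to load a plugin whose declared version differs from its own. Once the plugin is installed, restart ES.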
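The version edit described above can also be scripted. The following is a minimal sketch, assuming the descriptor contains a line of the form `elasticsearch.version=...`; the function names `patch_descriptor_text` and `patch_descriptor_file` are made up for this illustration and are not part of the plugin or of Elasticsearch.

```python
import re
from pathlib import Path

# Matches the whole "elasticsearch.version=..." line, keeping the key part in group 1.
_VERSION_LINE = re.compile(r"(?m)^([ \t]*elasticsearch\.version[ \t]*=[ \t]*)[^\r\n]*")

def patch_descriptor_text(text, es_version):
    """Return the descriptor text with elasticsearch.version set to es_version."""
    new_text, count = _VERSION_LINE.subn(lambda m: m.group(1) + es_version, text)
    if count == 0:
        raise ValueError("no elasticsearch.version entry found in descriptor")
    return new_text

def patch_descriptor_file(path, es_version):
    """Rewrite a plugin-descriptor.properties file in place (hypothetical helper)."""
    descriptor = Path(path)
    original = descriptor.read_text(encoding="utf-8")
    descriptor.write_text(patch_descriptor_text(original, es_version), encoding="utf-8")

if __name__ == "__main__":
    # Path assumed from the directory layout shown above.
    patch_descriptor_file("elasticsearch/plugins/vector/plugin-descriptor.properties", "5.6.15")
```

Changing the declared version only gets past the version check at startup. It does not guarantee that a jar built for 5.6.9 behaves correctly on 5.6.15, so the query test later in this post is still worth running.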

  3. Create the test index

    PUT /vector_test
    {
      "settings": {
        "index": {
          "number_of_shards": 3,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "resume": {
          "dynamic": "strict",
          "properties": {
            "file_hash": {
              "type": "keyword"
            },
            "embedding_vector": {
              "type": "binary",
              "doc_values": true
            },
            "doc": {
              "type": "text"
            }
          }
        }
      }
    }
    
  4. Build the test data

Use the following Python code to generate the base64 string for a vector

import base64
import numpy as np

# Big-endian 32-bit float, the byte layout used for the stored vectors
dfloat32 = np.dtype('>f4')

def decode_float_list(base64_string):
    # Decode a base64 string back into a list of floats
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dfloat32).tolist()

def encode_array(arr):
    # Convert to big-endian float32, then base64-encode the raw bytes
    raw = np.array(arr).astype(dfloat32).tobytes()
    return base64.b64encode(raw).decode("utf-8")

print(encode_array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]))
print(encode_array([0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.010]))
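As a cross-check that does not depend on NumPy, the same big-endian float32 encoding can be sketched with the standard library's `struct` module. The names `encode_vector` and `decode_vector` are illustrative only.

```python
import base64
import struct

def encode_vector(values):
    """Pack floats as big-endian float32 and return a base64 string."""
    packed = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(packed).decode("utf-8")

def decode_vector(base64_string):
    """Inverse of encode_vector: base64 string back to a list of floats."""
    raw = base64.b64decode(base64_string)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(">%df" % count, raw))

encoded = encode_vector([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
print(encoded)
print(decode_vector(encoded))
```

The first print should produce the same string that is stored as `embedding_vector` for document 1. The decoded values come back as float32 approximations (for example roughly 0.1 rather than exactly 0.1), which is expected.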

Put the resulting strings into the embedding_vector field of the documents below. The embedding_vector field must be a base64 string generated as shown above.

PUT /vector_test/resume/1
{
  "file_hash": "hash1",
  "embedding_vector": "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==",
  "doc": "This is the content of the first document."
}

PUT /vector_test/resume/2
{
  "file_hash": "hash2",
  "embedding_vector": "OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==",
  "doc": "This is the content of the second document."
}
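With more than a handful of documents, indexing them one PUT at a time becomes tedious. The following is a minimal sketch of building a request body for the Elasticsearch bulk API (`POST /vector_test/resume/_bulk`) from the two documents above. The helper name `build_bulk_body` is an assumption made for this example, and sending the body over HTTP is left out.

```python
import json

def build_bulk_body(docs):
    """Return a newline-delimited JSON body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    ("1", {
        "file_hash": "hash1",
        "embedding_vector": "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==",
        "doc": "This is the content of the first document.",
    }),
    ("2", {
        "file_hash": "hash2",
        "embedding_vector": "OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==",
        "doc": "This is the content of the second document.",
    }),
]

body = build_bulk_body(docs)
print(body)
```

The printed body can be sent with any HTTP client, using the `application/x-ndjson` content type.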
  5. Query test

    POST /vector_test/resume/_search
    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "script_score": {
            "script": {
              "source": "binary_vector_score",
              "lang": "knn",
              "params": {
                "cosine": true,
                "field": "embedding_vector",
                "vector": [
                  1.0,
                  0.8,
                  0.2223,
                  0.7,
                  0.6,
                  0.5,
                  0.4,
                  0.3,
                  0.2,
                  0.1
                ]
              }
            }
          }
        }
      },
      "size": 2,
      "_source": [
        "file_hash"
      ]
    }
    

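    Query result (this sample output appears to come from an index with 5 shards and 4 documents, so the shard count and the document with _id 4 do not correspond to the index and the two documents created above)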

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 4,
        "max_score": 0.998783,
        "hits": [
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "4",
            "_score": 0.998783,
            "_source": {
              "file_hash": "hash4"
            }
          },
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "1",
            "_score": 0.5818508,
            "_source": {
              "file_hash": "hash1"
            }
          }
        ]
      }
    }
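To sanity-check the scores returned by the plugin, the cosine similarity can be recomputed locally. The sketch below uses plain Python and the unencoded vectors from step 4; `cosine_similarity` is a helper written for this example, not a plugin function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Query vector from the search request above
query = [1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
# Vectors that were encoded for documents 1 and 2
doc1 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
doc2 = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.010]

print(cosine_similarity(query, doc1))
print(cosine_similarity(query, doc2))
```

The value for document 1 comes out at about 0.58185, which agrees with the `_score` of 0.5818508 reported for `_id` 1. Document 2's vector is document 1's vector scaled by 0.01, and cosine similarity ignores magnitude, so it gets essentially the same score.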
    

Original article: https://blog.csdn.net/qq_20623849/article/details/140326854
