[ElasticSearch] Vector plugin support for ES 5.6.15
Reference:
https://github.com/lior-k/fast-elasticsearch-vector-scoring
-
Download the plugin
-
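Install the plugin
Plugin directory: elasticsearch/plugins. After installation, the directory looks like this:
plugins
└── vector
    ├── elasticsearch-binary-vector-scoring-5.6.9.jar
    └── plugin-descriptor.properties
In plugin-descriptor.properties, change elasticsearch.version to 5.6.15, because the ES version used here is 5.6.15 and Elasticsearch will not load a plugin whose declared version differs from its own. Restart ES once the installation is complete.
-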
Build the test index
PUT /vector_test
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "resume": {
      "dynamic": "strict",
      "properties": {
        "file_hash": {
          "type": "keyword"
        },
        "embedding_vector": {
          "type": "binary",
          "doc_values": true
        },
        "doc": {
          "type": "text"
        }
      }
    }
  }
}
-
Build the test data
Use the following functions to generate the base64 string for a vector:
import base64
import numpy as np

# Big-endian 32-bit floats: the byte order used by this encoding.
dfloat32 = np.dtype('>f4')

def decode_float_list(base64_string):
    # Decode a base64 string back into a list of floats.
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dfloat32).tolist()

def encode_array(arr):
    # Convert to big-endian float32 bytes, then base64-encode them.
    raw = np.array(arr).astype(dfloat32).tobytes()
    return base64.b64encode(raw).decode("utf-8")

print(encode_array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]))
print(encode_array([0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.010]))
Put the two strings printed above into the embedding_vector field of the documents below. The embedding_vector field must contain a base64 string generated as shown above.
PUT /vector_test/resume/1
{
"file_hash": "hash1",
"embedding_vector": "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==",
"doc": "This is the content of the first document."
}
PUT /vector_test/resume/2
{
"file_hash": "hash2",
"embedding_vector": "OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==",
"doc": "This is the content of the second document."
}
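The two document bodies above can also be generated in code. The following is a minimal sketch that uses only the Python standard library (struct in place of numpy) to produce the same big-endian float32 base64 strings. The helper names encode_vector and build_doc are made up for this illustration and are not part of the plugin.

```python
import base64
import json
import struct

def encode_vector(values):
    """Pack floats as big-endian float32 and return the base64 text."""
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("utf-8")

def build_doc(file_hash, vector, doc):
    """Build a document body matching the vector_test mapping (hypothetical helper)."""
    return {
        "file_hash": file_hash,
        "embedding_vector": encode_vector(vector),
        "doc": doc,
    }

doc1 = build_doc(
    "hash1",
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "This is the content of the first document.",
)
# Print the JSON body that would be sent with PUT /vector_test/resume/1
print(json.dumps(doc1, indent=2))
```

The embedding_vector value printed here should match the string used in document 1 above, since both encodings round each value to the nearest float32.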
-
Query test
POST /vector_test/resume/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": {
          "source": "binary_vector_score",
          "lang": "knn",
          "params": {
            "cosine": true,
            "field": "embedding_vector",
            "vector": [
              1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1
            ]
          }
        }
      }
    }
  },
  "size": 2,
  "_source": [
    "file_hash"
  ]
}
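For repeated tests it can be convenient to build this request body from a Python list. A minimal sketch, assuming the same script name, language and parameters as the request above; build_vector_query is a hypothetical helper, not part of the plugin, and sending the body to ES is left out.

```python
import json

def build_vector_query(vector, field="embedding_vector", size=2, cosine=True):
    """Return a function_score request body for the given query vector."""
    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "binary_vector_score",
                        "lang": "knn",
                        "params": {
                            "cosine": cosine,
                            "field": field,
                            # The query vector is passed as plain floats, not base64.
                            "vector": list(vector),
                        },
                    }
                },
            }
        },
        "size": size,
        "_source": ["file_hash"],
    }

body = build_vector_query([1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
print(json.dumps(body))
```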
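Query result (this sample output appears to come from a larger test run: it reports 5 shards and 4 hits, including a document with ID 4 that is not created above):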
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.998783,
    "hits": [
      {
        "_index": "vector_test",
        "_type": "resume",
        "_id": "4",
        "_score": 0.998783,
        "_source": {
          "file_hash": "hash4"
        }
      },
      {
        "_index": "vector_test",
        "_type": "resume",
        "_id": "1",
        "_score": 0.5818508,
        "_source": {
          "file_hash": "hash1"
        }
      }
    ]
  }
}
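As a sanity check, the score of document 1 can be reproduced offline. The sketch below assumes that, with "cosine": true, the plugin returns the plain cosine similarity between the query vector and the stored vector. It decodes the base64 string of document 1 and computes that value using only the standard library; the function names are illustrative.

```python
import base64
import math
import struct

def decode_vector(b64_text):
    """Decode a base64 string of big-endian float32 values into a list of floats."""
    raw = base64.b64decode(b64_text)
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))

def cosine_similarity(a, b):
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Query vector from the search request and stored vector from document 1.
query = [1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
stored = decode_vector("PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==")

score = cosine_similarity(query, stored)
print(score)  # should be close to the 0.5818508 reported for document 1
```

Document 2 holds the same vector scaled by 0.01, so under cosine scoring it should receive essentially the same score as document 1.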
Original article: https://blog.csdn.net/qq_20623849/article/details/140326854