Use the Synonyms APIs to Update Synonyms Conveniently in Elasticsearch
The synonyms feature of Elasticsearch is very powerful and can significantly improve your search results when used properly. A common pain point with this feature, however, is updating a synonyms set.
Synonyms defined inline in the settings of an index cannot be updated directly: we need to close the index, update the settings, and re-open the index to make the changes effective. Another way is to use a synonyms file, which can be updated by reloading the index. However, a synonyms file is difficult to manage when the Elasticsearch server is distributed or hosted in the cloud, because the file has to be put on all cluster nodes.
The good news is that there is now a third way, which is much more convenient than the previous two: we can use the synonyms APIs to manage synonyms. Even though it's still a beta functionality of Elasticsearch at the time of writing, I think it will be adopted soon because it is in high demand among developers and solves the tricky problem of updating synonyms sets very conveniently. We will explore the common usage of the synonyms APIs in this post.
Preparation
We will use the following docker-compose.yaml file to start Elasticsearch and Kibana locally for demonstration.
version: "3.9"
services:
  elasticsearch:
    image: elasticsearch:8.11.1
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - xpack.security.enabled=false
    ports:
      - target: 9200
        published: 9200
    networks:
      - elastic
  kibana:
    image: kibana:8.11.1
    ports:
      - target: 5601
        published: 5601
    depends_on:
      - elasticsearch
    networks:
      - elastic
networks:
  elastic:
    name: elastic
    driver: bridge
Note that you need Elasticsearch 8.10.0 or later to use the synonyms APIs. Prefer the latest available version, since the feature should be more mature there.
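As a small guard, the version requirement above can be checked in a script before any synonyms API is called. The following is only a sketch: parse_version and supports_synonyms_api are hypothetical helper names, and the version string is assumed to look like the version.number field returned by GET / (for example "8.11.1").

```python
from typing import Tuple

# Minimum Elasticsearch version that ships the synonyms APIs (per this post).
MIN_SYNONYMS_API_VERSION = (8, 10, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '8.11.1' (or '8.11.1-SNAPSHOT') into (8, 11, 1)."""
    core = version.split("-", 1)[0]  # drop a qualifier such as -SNAPSHOT
    return tuple(int(part) for part in core.split("."))


def supports_synonyms_api(version: str) -> bool:
    """Return True if the given server version is 8.10.0 or later."""
    return parse_version(version) >= MIN_SYNONYMS_API_VERSION
```

With the Python client, the version string could be read with es_client.info()["version"]["number"] and passed to supports_synonyms_api.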
Create a synonyms set
When Elasticsearch and Kibana are started with the above docker-compose.yaml file, we can go to http://localhost:5601 to manage Elasticsearch indexes and synonyms.
To manage synonyms with the synonyms APIs, we first need to create a synonyms set, which can then be used in an Elasticsearch index.
We can use the _synonyms endpoint to create or update a synonyms set with a PUT request:
PUT _synonyms/inventory-synonyms-set
{
  "synonyms_set": [
    {
      "id": "synonym-1",
      "synonyms": "ps => playstation"
    },
    {
      "synonyms": "javascript,ecmascript,js"
    }
  ]
}
- inventory-synonyms-set is a user-defined name of the synonyms set.
- synonyms_set is the required key of the request body and holds an array of synonym rules.
- Each synonym rule is an object with an optional id key and a mandatory synonyms key. If the id is not provided, an identifier will be created by Elasticsearch. The value for synonyms is a rule defined in the Solr format, as used in the previous post.
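To make the two rule shapes used above more concrete, here is a minimal sketch of how a Solr-style rule can be read. parse_rule is a hypothetical helper written for illustration only; it is not how Elasticsearch parses rules internally, and it ignores escaping and other details of the real format.

```python
from typing import List, Tuple


def parse_rule(rule: str) -> Tuple[List[str], List[str]]:
    """Split a Solr-style synonym rule into (inputs, outputs).

    Explicit mapping:   "a, b => c"  -> inputs [a, b] are replaced by [c].
    Equivalent synonyms: "a, b, c"   -> every term maps to all terms.
    """
    if "=>" in rule:
        left, right = rule.split("=>", 1)
    else:
        # No arrow: the terms are equivalent, so inputs and outputs coincide.
        left = right = rule

    def terms(side: str) -> List[str]:
        return [term.strip() for term in side.split(",") if term.strip()]

    return terms(left), terms(right)
```

For example, parse_rule("ps => playstation") separates the input "ps" from its replacement "playstation", while parse_rule("javascript,ecmascript,js") returns the same three terms on both sides.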
Create an Elasticsearch index using the synonyms set
Once the synonyms set is created, it can be used in the synonym or synonym_graph token filters when an index is created:
PUT /inventory
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "index_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          },
          "search_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym_filter"
            ]
          }
        },
        "filter": {
          "synonym_filter": {
            "type": "synonym_graph",
            "synonyms_set": "inventory-synonyms-set",
            "updateable": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      }
    }
  }
}
As we can see, the configuration for using a synonyms set is very similar to that for a synonyms file, as demonstrated in great detail in this post. We just need to change synonyms_path to synonyms_set.
Test the synonyms set
We can use the _analyze endpoint to analyze some text and test the synonyms added in the previous step using the API:
GET /inventory/_analyze
{
  "analyzer": "search_analyzer",
  "text": "PS"
}
{
  "tokens": [
    {
      "token": "playstation",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
GET /inventory/_analyze
{
  "analyzer": "search_analyzer",
  "text": "JS"
}
{
  "tokens": [
    {
      "token": "javascript",
      ......
    },
    {
      "token": "ecmascript",
      ......
    },
    {
      "token": "js",
      ......
    }
  ]
}
It shows that the synonyms added by the API are working properly.
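The manual checks above could also be turned into a small regression check. The sketch below assumes a client object that behaves like the official Python client, whose indices.analyze method accepts index, analyzer, and text; the helper names analyzed_tokens and failed_expectations are made up for this example.

```python
from typing import Dict, List


def analyzed_tokens(client, index: str, analyzer: str, text: str) -> List[str]:
    """Run the _analyze API and return only the token strings."""
    response = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [token["token"] for token in response["tokens"]]


def failed_expectations(
    client, index: str, analyzer: str, expected: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return {text: actual_tokens} for every text whose tokens differ from the expectation."""
    failures = {}
    for text, wanted in expected.items():
        actual = analyzed_tokens(client, index, analyzer, text)
        # Compare as sorted lists so the token order does not matter.
        if sorted(actual) != sorted(wanted):
            failures[text] = actual
    return failures
```

Called with the index inventory, the analyzer search_analyzer, and the expectations {"PS": ["playstation"], "JS": ["javascript", "ecmascript", "js"]}, an empty result would mean the synonyms behave as shown above.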
Update a synonyms set
Let's now use the synonyms API to update the synonyms set. This is where the synonyms API really shines because we don't need to close, open, or reload the corresponding Elasticsearch indexes, which alleviates the pain dramatically for developers.
We can use the PUT method to update the synonyms set as a whole. Be extremely careful here: the request replaces the entire set, so any existing rule that is missing from the request body is removed, which can be very destructive in production.
Let's add a new synonym rule to inventory-synonyms-set:
PUT _synonyms/inventory-synonyms-set
{
  "synonyms_set": [
    {
      "id": "synonym-1",
      "synonyms": "ps => playstation"
    },
    {
      "synonyms": "javascript,ecmascript,js"
    },
    {
      "synonyms": "py => python"
    }
  ]
}
Note that the existing synonym rules must be included in the request as well, otherwise they are removed.
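One way to reduce the risk of dropping rules with a full PUT is to read the current rules first and merge the new ones into them. The sketch below assumes a client that behaves like the official Python client (synonyms.get_synonym and synonyms.put_synonym); merge_rules, fetch_all_rules, and add_rules are hypothetical helper names. As far as I know, the GET endpoint is paginated and returns only a limited number of rules per page by default, so the rules are fetched page by page.

```python
from typing import Dict, List

Rule = Dict[str, str]


def merge_rules(existing: List[Rule], new: List[Rule]) -> List[Rule]:
    """Keep all existing rules; new rules with a known id replace the old version."""
    by_id = {rule["id"]: rule for rule in existing if "id" in rule}
    without_id = [rule for rule in existing if "id" not in rule]
    for rule in new:
        if "id" in rule:
            by_id[rule["id"]] = rule
        else:
            without_id.append(rule)  # Elasticsearch will generate an id
    return list(by_id.values()) + without_id


def fetch_all_rules(client, set_id: str, page_size: int = 100) -> List[Rule]:
    """Page through the synonyms set until every rule has been read."""
    rules: List[Rule] = []
    while True:
        page = client.synonyms.get_synonym(id=set_id, from_=len(rules), size=page_size)
        rules.extend(page["synonyms_set"])
        if len(rules) >= page["count"] or not page["synonyms_set"]:
            return rules


def add_rules(client, set_id: str, new_rules: List[Rule]):
    """Read, merge, and write back the whole set (not atomic across concurrent writers)."""
    merged = merge_rules(fetch_all_rules(client, set_id), new_rules)
    return client.synonyms.put_synonym(id=set_id, synonyms_set=merged)
```

With this approach, only the new rule would have to be passed, for example add_rules(es_client, "inventory-synonyms-set", [{"synonyms": "py => python"}]). Note that the read and the write are two separate requests, so two scripts running at the same time could still overwrite each other.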
When an existing synonyms set is updated, the search analyzers that use the synonyms set are reloaded automatically for all indices, which can be seen in the response of the PUT request above:
{
  "result": "updated",
  "reload_analyzers_details": {
    "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
    },
    "reload_details": [
      {
        "index": "inventory",
        "reloaded_analyzers": [
          "search_analyzer"
        ],
        "reloaded_node_ids": [
          "MD0GUTvcQAOvsHdUIBunNw"
        ]
      }
    ]
  }
}
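In a script, the response above can be checked to confirm that the reload actually happened. The helper below is a minimal sketch that only reads the fields shown in the response; reload_summary is a hypothetical name, and the response is treated as a plain dictionary.

```python
from typing import Any, Dict


def reload_summary(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the failed shard count and the reloaded analyzers per index."""
    details = response.get("reload_analyzers_details", {})
    shards = details.get("_shards", {})
    return {
        "failed_shards": shards.get("failed", 0),
        "reloaded": {
            item["index"]: item["reloaded_analyzers"]
            for item in details.get("reload_details", [])
        },
    }
```

For the response shown above, the summary reports no failed shards and search_analyzer reloaded for the inventory index. The total of 2 shards with only 1 successful is presumably because the replica shard cannot be allocated on a single-node cluster, so it is not a failure.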
We can use the _analyze endpoint to test the newly added synonym:
GET /inventory/_analyze
{
  "analyzer": "search_analyzer",
  "text": "py"
}
{
  "tokens": [
    {
      "token": "python",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
Yes, the synonyms set is updated successfully and it's effective in real time with no need to close, open, or reload the corresponding index.
Besides, we can also perform incremental updates by adding a rule directly. In this case, we need to specify an id for the rule in the path:
PUT _synonyms/inventory-synonyms-set/synonym-ipod
{
  "synonyms": "i-pod, i pod => ipod"
}
You can also delete a synonym rule by its id:
DELETE _synonyms/inventory-synonyms-set/synonym-ipod
Updating the synonym rules individually works in the same way as updating the synonyms set as a whole: the affected analyzers are reloaded automatically.
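The same incremental changes could be scripted. The sketch below assumes a client that behaves like the official Python client, which to my knowledge exposes synonyms.put_synonym_rule and synonyms.delete_synonym_rule with set_id and rule_id arguments; apply_rule_changes is a hypothetical helper name.

```python
from typing import Dict, Iterable


def apply_rule_changes(
    client, set_id: str, upserts: Dict[str, str], deletions: Iterable[str] = ()
) -> None:
    """Create or replace the given rules by id, then delete the listed rule ids."""
    for rule_id, synonyms in upserts.items():
        client.synonyms.put_synonym_rule(set_id=set_id, rule_id=rule_id, synonyms=synonyms)
    for rule_id in deletions:
        client.synonyms.delete_synonym_rule(set_id=set_id, rule_id=rule_id)
```

For example, apply_rule_changes(es_client, "inventory-synonyms-set", {"synonym-ipod": "i-pod, i pod => ipod"}) would add the rule from above. Since each request triggers its own analyzer reload, a single full update may be the better choice when many rules change at once.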
Monitor synonyms sets
We can use the GET method to retrieve the content of a synonyms set directly:
GET _synonyms/inventory-synonyms-set
{
  "count": 3,
  "synonyms_set": [
    {
      "id": "qP8w_osBr1bbhRIfbz1c",
      "synonyms": "javascript,ecmascript,js"
    },
    {
      "id": "qf8w_osBr1bbhRIfbz1c",
      "synonyms": "py => python"
    },
    {
      "id": "synonym-1",
      "synonyms": "ps => playstation"
    }
  ]
}
In practice, we would want a script that checks the number of rules in each synonyms set, to make sure rules are not removed accidentally by a full update as discussed above. The following Python snippet lists all synonyms sets together with their rule counts:
from elasticsearch import Elasticsearch
# Connect to the local Elasticsearch started above (security is disabled).
es_client = Elasticsearch("http://localhost:9200")
# List every synonyms set with the number of rules it contains.
es_client.synonyms.get_synonyms_sets()
# ObjectApiResponse({'count': 1, 'results': [{'synonyms_set': 'inventory-synonyms-set', 'count': 3}]})
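Building on the response shape shown above, a monitoring script could compare the counts against expected minimums and raise an alert when a set has shrunk. This is only a sketch: shrunken_sets is a hypothetical helper, and the expected minimums are values you would maintain yourself.

```python
from typing import Any, Dict


def shrunken_sets(listing: Dict[str, Any], expected_minimums: Dict[str, int]) -> Dict[str, int]:
    """Return {set_name: actual_count} for sets that are missing or below their minimum."""
    actual = {item["synonyms_set"]: item["count"] for item in listing["results"]}
    return {
        name: actual.get(name, 0)  # a missing set counts as 0 rules
        for name, minimum in expected_minimums.items()
        if actual.get(name, 0) < minimum
    }
```

Passing the response of get_synonyms_sets together with {"inventory-synonyms-set": 3} would return an empty dictionary as long as the set still has at least three rules.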
In this post, we introduced the basics of the Elasticsearch synonyms APIs and demonstrated how to use them to manage synonyms conveniently. With this functionality, we don't need to close and open an index as with inline synonyms, or reload the index manually as with a synonyms file. It can make our search engines more stable and our work as developers much easier.