You can add, update, replace, and delete your indexed data in several ways provided by Manticore. Manticore supports working with external storages such as databases, XML, CSV, and TSV documents. A transaction mechanism is supported for insert and delete operations.
Also, for insert and replace queries, Manticore supports an Elasticsearch-like request format along with its own format. For details, see the corresponding examples in the Adding documents to a real-time table and REPLACE sections.
If you're looking for information on adding documents to a plain table, please refer to the section on adding data from external storages.
Adding documents in real time is supported only for real-time and percolate tables. The corresponding SQL command, HTTP endpoint, or client functions insert new rows (documents) into a table with the provided field values. The table doesn't have to exist before you add documents to it: if it doesn't exist, Manticore will attempt to create it automatically. For more information, see Auto schema.
You can insert a single document or multiple documents, with values for all fields of the table or just a portion of them. In the latter case, the remaining fields are filled with their default values (0 for scalar types, an empty string for text types).
Expressions are not currently supported in INSERT, so values must be explicitly specified.
The ID field/value can be omitted, as RT and PQ tables support auto-id functionality. You can also use 0 as the id value to force automatic ID generation. Rows with duplicate IDs will not be overwritten by INSERT. Instead, you can use REPLACE for that purpose.
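For instance, to overwrite an existing document by its ID, a REPLACE statement of the following form can be used (a minimal sketch; see the REPLACE section for the full syntax):
REPLACE INTO products(id, title, price) VALUES (1, 'Crossbody Bag with Tassel', 21.50);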
When using the HTTP JSON protocol, you have two different request formats to choose from: a common Manticore format and an Elasticsearch-like format. Both formats are demonstrated in the examples below.
Additionally, when using the Manticore JSON request format, keep in mind that the doc node is required, and all the values should be provided within it.
INSERT INTO <table name> [(column, ...)]
VALUES (value, ...)
[, (...)]
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "Crossbody Bag with Tassel",
"price" : 19.85
}
}
POST /insert
{
"index":"products",
"id":2,
"doc":
{
"title" : "Crossbody Bag with Tassel"
}
}
POST /insert
{
"index":"products",
"id":0,
"doc":
{
"title" : "Yellow bag"
}
}
{
"_index": "products",
"_id": 1,
"created": true,
"result": "created",
"status": 201
}
{
"_index": "products",
"_id": 2,
"created": true,
"result": "created",
"status": 201
}
{
"_index": "products",
"_id": 0,
"created": true,
"result": "created",
"status": 201
}
POST /products/_create/3
{
"title": "Yellow Bag with Tassel",
"price": 19.85
}
POST /products/_create/
{
"title": "Red Bag with Tassel",
"price": 19.85
}
{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":2235747273424240642,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
$index->addDocuments([
['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
['id' => 0, 'title' => 'Yellow bag']
]);
indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}})
res = await indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}});
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Crossbody Bag with Tassel");
put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
newdoc = new InsertDocumentRequest(index: "products", id: 2, doc: doc);
sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
sqlresult = indexApi.Insert(newdoc);
Manticore features an automatic table creation mechanism, which activates when a specified table in the insert query doesn't yet exist. This mechanism is enabled by default. To disable it, set auto_schema = 0 in the searchd section of your Manticore config file.
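For example, turning it off would look like this in the config (a minimal sketch of the relevant section):
searchd {
...
auto_schema = 0
}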
By default, all text values in the VALUES clause are considered to be of the text type, except for values representing valid email addresses, which are treated as the string type.
If you attempt to INSERT multiple rows with different, incompatible value types for the same field, auto table creation will be canceled, and an error message will be returned. However, if the different value types are compatible, the resulting field type will be the one that accommodates all the values, as Manticore applies the necessary automatic data type conversions.
Keep in mind that the /bulk HTTP endpoint does not support automatic table creation (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.
MySQL [(none)]> drop table if exists t; insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777)); desc t; select * from t;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.42 sec)
--------------
insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777))
--------------
Query OK, 1 row affected (0.00 sec)
--------------
desc t
--------------
+-------+--------+----------------+
| Field | Type | Properties |
+-------+--------+----------------+
| id | bigint | |
| t | text | indexed stored |
| s | string | |
| j | json | |
| i | uint | |
| b | bigint | |
| f | float | |
| m | mva | |
| mb | mva64 | |
+-------+--------+----------------+
9 rows in set (0.00 sec)
--------------
select * from t
--------------
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| id | i | b | f | m | mb | t | s | j |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| 5045949922868723723 | 123 | 1099511627776 | 1.200000 | 1,2 | 1099511627776,1099511627777 | text here | test@mail.com | {"a": 123} |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
1 row in set (0.00 sec)
POST /insert -d
{
"index":"t",
"id": 2,
"doc":
{
"i" : 123,
"f" : 1.23,
"t": "text here",
"s": "test@mail.com",
"j": {"a": 123},
"b": 1099511627776,
"m": [1,2],
"mb": [1099511627776,1099511627777]
}
}
{"_index":"t","_id":2,"created":true,"result":"created","status":201}
Manticore provides an auto ID generation functionality for the column ID of documents inserted or replaced into a real-time or Percolate table. The generator produces a unique ID for a document with some guarantees, but it should not be considered an auto-incremented ID.
The generated ID value is guaranteed to be unique under the following conditions:
The auto ID generator creates a 64-bit integer for a document ID and uses the following schema:
This schema ensures that the generated ID is unique among all nodes in the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.
As a result, the first ID produced by the auto ID generator is NOT 1 but a larger number. Additionally, the document stream inserted into a table might have non-sequential ID values if inserts into other tables occur between the calls, since there is a single ID generator in the server, shared among all its tables.
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yellow bag', 4.95);
select * from products;
+---------------------+-----------+---------------------------+
| id | price | title |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 | 4.950000  | Yellow bag                |
+---------------------+-----------+---------------------------+
POST /insert
{
"index":"products",
"id":0,
"doc":
{
"title" : "Yellow bag"
}
}
GET /search
{
"index":"products",
"query":{
"query_string":""
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "1657860156022587406",
"_score": 1,
"_source": {
"price": 0,
"title": "Yellow bag"
}
}
]
}
}
$index->addDocuments([
['id' => 0, 'title' => 'Yellow bag']
]);
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
You can insert not just a single document into a real-time table, but as many as you'd like. It's perfectly fine to insert batches of tens of thousands of documents into a real-time table. However, it's important to keep the following points in mind:
Note that the /bulk HTTP endpoint does not support automatic creation of tables (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.
The /bulk (Manticore mode) endpoint supports Chunked transfer encoding. You can use it to transmit large batches, including batches bigger than max_packet_size (128MB), for example, 1GB at a time.
INSERT INTO <table name>[(column1, column2, ...)] VALUES (value1[, value2, ...])[, (...)]
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"index":"products", "id":1, "doc": {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"index":"products", "id":2, "doc": {"title":"microfiber sheet set","price" : 19.99}}}
'
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test1","id":21,"doc":{"int_col":1,"price":1.1,"title":"bulk doc one"}}}
{"insert":{"index":"test1","id":22,"doc":{"int_col":2,"price":2.2,"title":"bulk doc two"}}}
{"insert":{"index":"test1","id":23,"doc":{"int_col":3,"price":3.3,"title":"bulk doc three"}}}
{"insert":{"index":"test2","id":24,"doc":{"int_col":4,"price":4.4,"title":"bulk doc four"}}}
{"insert":{"index":"test2","id":25,"doc":{"int_col":5,"price":5.5,"title":"bulk doc five"}}}
'
{
"items": [
{
"bulk": {
"_index": "products",
"_id": 2,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
}
],
"current_line": 4,
"skipped_lines": 0,
"errors": false,
"error": ""
}
{
"items": [
{
"bulk": {
"_index": "test1",
"_id": 22,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
},
{
"bulk": {
"_index": "test1",
"_id": 23,
"created": 1,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
},
{
"bulk": {
"_index": "test2",
"_id": 25,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
}
],
"current_line": 8,
"skipped_lines": 0,
"errors": false,
"error": ""
}
POST /_bulk
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "products" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'
{
"items": [
{
"index": {
"_index": "products",
"_type": "doc",
"_id": "0",
"_version": 1,
"result": "created",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
},
{
"create": {
"_index": "products",
"_type": "doc",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
}
],
"errors": false,
"took": 1
}
$index->addDocuments([
['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);
docs = [ \
{"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}, \
{"insert": {"index" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}, \
{"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
]
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))
let docs = [
{"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
{"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
{"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.bulk(body);
string body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.Bulk(string.Join("\n", docs));
Multi-value attributes (MVA) are inserted as arrays of numbers.
INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"sizes" : [40, 41, 42, 43]
}
}
POST /products/_create/1
{
"title": "shoes",
"sizes" : [40, 41, 42, 43]
}
POST /products/_doc/
{
"title": "shoes",
"sizes" : [40, 41, 42, 43]
}
$index->addDocument(
['title' => 'shoes', 'sizes' => [40,41,42,43]],
1
);
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
put("sizes",new int[]{40,41,42,43});
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("sizes", new List<Object> {40,41,42,43});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
A JSON value can be inserted either as an escaped string (via SQL or JSON) or as a JSON object (via the JSON interface).
INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
}
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"meta" : "{\"size\": 41, \"color\": \"red\"}"
}
}
POST /products/_create/1
{
"title": "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
POST /products/_doc/
{
"title": "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
$index->addDocument(
['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
1
);
indexApi = manticoresearch.IndexApi(client)
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
put("meta",
new HashMap<String,Object>(){{
put("size",41);
put("color","red");
}});
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("size", 41);
meta.Add("color", "red");
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
In a percolate table, the stored documents are percolate query rules and must follow the exact schema of four fields:
field | type | description |
---|---|---|
id | bigint | PQ rule identifier (if omitted, it will be assigned automatically) |
query | string | Full-text query (can be empty) compatible with the percolate table |
filters | string | Additional filters by non-full-text fields (can be empty) compatible with the percolate table |
tags | string | A string with one or many comma-separated tags, which may be used to selectively show/delete saved queries |
Any other field names are not supported and will trigger an error.
Warning: Inserting/replacing JSON-formatted PQ rules via SQL will not work. In other words, the JSON-specific operators (match, etc.) will be considered just parts of the rule's text that should match with documents. If you prefer JSON syntax, use the HTTP endpoint instead of INSERT/REPLACE.
INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;
+------+--------------+---------------+---------+
| id | query | tags | filters |
+------+--------------+---------------+---------+
| 1 | @title shoes | | price>5 |
| 2 | @title bag | Louis Vuitton | |
+------+--------------+---------------+---------+
PUT /pq/pq_table/doc/1
{
"query": {
"match": {
"title": "shoes"
},
"range": {
"price": {
"gt": 5
}
}
},
"tags": ["Loius Vuitton"]
}
PUT /pq/pq_table/doc/2
{
"query": {
"ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}
$newstoredquery = [
'index' => 'test_pq',
'body' => [
'query' => [
'match' => [
'title' => 'shoes'
]
],
'range' => [
'price' => [
'gt' => 5
]
]
],
'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(newstoredquery)
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
indexApi.insert(newstoredquery);
newstoredquery = new HashMap<String,Object>(){{
put("query",new HashMap<String,Object >(){{
put("ql","@title shoes");
}});
put("filters","price>5");
put("tags",new String[] {"Louis Vuitton"});
}};
newdoc.index("test_pq").id(2L).setDoc(newstoredquery);
indexApi.insert(newdoc);
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", id: 2, doc: newstoredquery);
indexApi.Insert(newdoc);
If you don't specify an ID, it will be assigned automatically. You can read more about auto-ID here.
INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;
+---------------------+-----------+------+---------+
| id | query | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband | | price>5 |
+---------------------+-----------+------+---------+
PUT /pq/pq_table/doc
{
"query": {
"match": {
"title": "shoes"
},
"range": {
"price": {
"gt": 5
}
}
},
"tags": ["Loius Vuitton"]
}
PUT /pq/pq_table/doc
{
"query": {
"ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}
{
"index": "pq_table",
"type": "doc",
"_id": "1657843905795719196",
"result": "created"
}
{
"index": "pq_table",
"type": "doc",
"_id": "1657843905795719198",
"result": "created"
}
$newstoredquery = [
'index' => 'pq_table',
'body' => [
'query' => [
'match' => [
'title' => 'shoes'
]
],
'range' => [
'price' => [
'gt' => 5
]
]
],
'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);
Array(
[index] => pq_table
[type] => doc
[_id] => 1657843905795719198
[result] => created
)
indexApi = manticoresearch.IndexApi(client)
newstoredquery = {"index" : "test_pq", "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)
{'created': True,
'found': None,
'id': 1657843905795719198,
'index': 'test_pq',
'result': 'created'}
newstoredquery ={"index" : "test_pq", "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
res = await indexApi.insert(store_query);
{"_index":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}
newstoredquery = new HashMap<String,Object>(){{
put("query",new HashMap<String,Object >(){{
put("ql","@title shoes");
}});
put("filters","price>5");
put("tags",new String[] {"Louis Vuitton"});
}};
newdoc.index("test_pq").setDoc(newstoredquery);
indexApi.insert(newdoc);
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", doc: newstoredquery);
indexApi.Insert(newdoc);
In case of an omitted schema in the SQL INSERT command, the following parameters are expected:
1. ID. You can use 0 as the ID to trigger auto-ID generation.
2. Query - Full-text query.
3. Tags - PQ rule tags string.
4. Filters - Additional filters by attributes.
INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;
+---------------------+--------------+---------------+---------+
| id | query | tags | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes | | |
| 2810855531667783689 | @title shoes | Louis Vuitton | |
+---------------------+--------------+---------------+---------+
To replace an existing PQ rule with a new one in SQL, just use a regular REPLACE command. There's a special syntax ?refresh=1 to replace a PQ rule defined in JSON mode via the HTTP JSON interface.
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id | query | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes | | |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id | query | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots | | |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
GET /pq/pq/doc/2810823411335430149
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "2810823411335430149",
"_score": 1,
"_source": {
"query": {
"match": {
"title": "shoes"
}
},
"tags": "",
"filters": ""
}
}
]
}
}
PUT /pq/pq/doc/2810823411335430149?refresh=1 -d '{
"query": {
"match": {
"title": "boots"
}
}
}'
GET /pq/pq/doc/2810823411335430149
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "2810823411335430149",
"_score": 1,
"_source": {
"query": {
"match": {
"title": "boots"
}
},
"tags": "",
"filters": ""
}
}
]
}
}
Plain tables are tables that are created once, by fetching data from one or several sources at creation time. A plain table is immutable, as documents cannot be added or deleted during its lifespan. It is only possible to update values of numeric attributes (including MVA). Refreshing the data is only possible by recreating the whole table.
Plain tables are available only in the Plain mode, and their definition is made up of a table declaration and one or several source declarations. The data gathering and table creation are performed not by the searchd server, but by the auxiliary tool indexer.
Indexer is a command-line tool that can be called directly from the command line or from shell scripts.
It can accept a number of arguments when called, but there are also several settings of its own in the Manticore configuration file.
In the typical scenario, indexer does the following:
The indexer tool is used to create plain tables in Manticore Search. It has the following general syntax:
indexer [OPTIONS] [table_name1 [table_name2 [...]]]
When creating tables with indexer, the generated table files must be made with permissions that allow searchd to read, write, and delete them. In the case of the official Linux packages, searchd runs under the manticore user. Therefore, indexer must also run under the manticore user:
sudo -u manticore indexer ...
If you are running searchd differently, you might need to omit sudo -u manticore. Just make sure that the user under which your searchd instance is running has read/write permissions to the tables generated using indexer.
To create a plain table, you need to list the table(s) you want to process. For example, if your manticore.conf file contains details on two tables, mybigindex and mysmallindex, you could run:
sudo -u manticore indexer mysmallindex mybigindex
You can also use wildcard tokens to match table names:
- ? matches any single character
- * matches any count of any characters
- % matches none or any single character
sudo -u manticore indexer indexpart*main --rotate
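As an illustration, assuming hypothetical tables named part1_main, part2_main, and part_delta are defined in the config, the wildcards would behave roughly like this:
sudo -u manticore indexer part?_main --rotate    # matches part1_main and part2_main
sudo -u manticore indexer part* --rotate         # matches all three tables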
The exit codes for indexer are as follows:
- 0: everything went fine
- 1: there was a problem while indexing (and if --rotate was specified, it was skipped) or an operation emitted a warning
- 2: indexing went fine, but the --rotate attempt failed
Also, you can run indexer via a systemctl unit file:
systemctl start --no-block manticore-indexer
Or, in case you want to build a specific table:
systemctl start --no-block manticore-indexer@specific-table-name
Find more information about scheduling indexer via systemd below.
--config <file> (-c <file> for short) tells indexer to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. /etc/manticoresearch/manticore.conf), followed by the current directory you are in when calling indexer from the shell. This is most useful in shared environments where the binary files are installed in a global folder, e.g. /usr/bin/, but you want to provide users with the ability to make their own custom Manticore set-ups, or if you want to run multiple instances on a single server. In cases like those, you could allow them to create their own manticore.conf files and pass them to indexer with this option. For example:
sudo -u manticore indexer --config /home/myuser/manticore.conf mytable
--all tells indexer to update every table listed in manticore.conf instead of listing individual tables. This would be useful in small configurations or cron-kind or maintenance jobs where the entire table set will get rebuilt each day, week, or whatever period is best. Please note that since --all tries to update all the tables found in the configuration, it will issue a warning if it encounters real-time tables, and the exit code of the command will be 1, not 0, even if the plain tables finished without issue. Example usage:
sudo -u manticore indexer --config /home/myuser/manticore.conf --all
--rotate is used for rotating tables. Unless you have the situation where you can take the search function offline without troubling users, you will almost certainly need to keep search running while indexing new documents. --rotate creates a second table, parallel to the first (in the same place, simply including .new in the filenames). Once complete, indexer notifies searchd by sending the SIGHUP signal, and searchd will attempt to rename the tables (renaming the existing ones to include .old and renaming the .new ones to replace them), and then will start serving from the newer files. Depending on the setting of seamless_rotate, there may be a slight delay in being able to search the newer tables. In case multiple tables chained by killlist_target relations are rotated at once, rotation will start with the tables that are not targets and finish with the ones at the end of the target chain. Example usage:
sudo -u manticore indexer --rotate --all
--quiet tells indexer not to output anything, unless there is an error. This is mostly used for cron-type or other scripted jobs where the output is irrelevant or unnecessary, except in the event of some kind of error. Example usage:
sudo -u manticore indexer --rotate --all --quiet
--noprogress does not display progress details as they occur. Instead, the final status details (such as documents indexed, speed of indexing, and so on) are only reported at the completion of indexing. In instances where the script is not being run on a console (or 'tty'), this will be on by default. Example usage:
sudo -u manticore indexer --rotate --all --noprogress
--buildstops <outputfile.txt> <N> reviews the table source, as if it were indexing the data, and produces a list of the terms that are being indexed. In other words, it produces a list of all the searchable terms that are becoming part of the table. Note, it does not update the table in question; it simply processes the data as if it were indexing, including running queries defined with sql_query_pre or sql_query_post. outputfile.txt will contain the list of words, one per line, sorted by frequency with the most frequent first, and N specifies the maximum number of words that will be listed; if it's sufficiently large to encompass every word in the table, all of them will be returned. Such a dictionary list could be used for client application features around "Did you mean…" functionality, usually in conjunction with --buildfreqs, below. Example:
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000
This would produce a document in the current directory, word_freq.txt, with the 1,000 most common words in 'mytable', ordered by most common first. Note that the file will pertain to the last table indexed when specified with multiple tables or --all (i.e. the last one listed in the configuration file).
--buildfreqs works with --buildstops (and is ignored if --buildstops is not specified). As --buildstops provides the list of words used within the table, --buildfreqs adds the quantity present in the table, which would be useful in establishing whether certain words should be considered stopwords if they are too prevalent. It will also help with developing "Did you mean…" features where you need to know how much more common a given word is compared to another, similar one. For example:
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000 --buildfreqs
This would produce the word_freq.txt as above, however after each word would be the number of times it occurred in the table in question.
--merge <dst-table> <src-table> is used for physically merging tables together, for example, if you have a main+delta scheme, where the main table rarely changes, but the delta table is rebuilt frequently, and --merge would be used to combine the two. The operation moves from right to left: the contents of src-table get examined and physically combined with the contents of dst-table, and the result is left in dst-table. In pseudo-code, it might be expressed as: dst-table += src-table. An example:
sudo -u manticore indexer --merge main delta --rotate
In the above example, where main is the master, rarely modified table, and delta is the more frequently modified one, you might use the above to call indexer to combine the contents of the delta into the main table and rotate the tables.
--merge-dst-range <attr> <min> <max> applies the given range filter upon merging. Specifically, as the merge is applied to the destination table (as part of --merge, and is ignored if --merge is not specified), indexer will also filter the documents ending up in the destination table, and only documents that pass the given filter will end up in the final table. This could be used, for example, in a table where there is a 'deleted' attribute, where 0 means 'not deleted'. Such a table could be merged with:
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
Any documents marked as deleted (value 1) will be removed from the newly-merged destination table. This option can be added several times to the command line, to add successive filters to the merge, all of which must be met in order for a document to become part of the final table.
--merge-killlists (and its shorter alias --merge-klists) changes the way kill lists are processed when merging tables. By default, both kill lists get discarded after a merge. That supports the most typical main+delta merge scenario. With this option enabled, however, kill lists from both tables get concatenated and stored in the destination table. Note that a source (delta) table kill list will be used to suppress rows from a destination (main) table at all times.
--keep-attrs allows reusing existing attributes on reindexing. Whenever the table is rebuilt, each new document id is checked for presence in the "old" table, and if it already exists, its attributes are transferred to the "new" table; if not found, attributes from the new table are used. If the user has updated attributes in the table, but not in the actual source used for the table, all updates will be lost when reindexing; using --keep-attrs enables saving the updated attribute values from the previous table. It is possible to specify a path for table files to be used instead of the reference path from the config:
sudo -u manticore indexer mytable --keep-attrs=/path/to/index/files
--keep-attrs-names=<attributes list> allows you to specify attributes to reuse from an existing table on reindexing. By default, all attributes from the existing table are reused in the new table:
sudo -u manticore indexer mytable --keep-attrs=/path/to/table/files --keep-attrs-names=update,state
--dump-rows <FILE> dumps rows fetched by SQL source(s) into the specified file, in a MySQL compatible syntax. The resulting dumps are the exact representation of data as received by indexer and can help repeat indexing-time issues. The command performs fetching from the source and creates both the table files and the dump file.
--print-rt <rt_index> <table> outputs fetched data from the source as INSERTs for a real-time table. The first lines of the dump will contain the real-time fields and attributes (as a reflection of the plain table fields and attributes). The command performs fetching from the source and creates both the table files and the dump output. The command can be used as sudo -u manticore indexer -c manticore.conf --print-rt indexrt indexplain > dump.sql. Only SQL-based sources are supported. MVAs are not supported.
--sighup-each is useful when you are rebuilding many big tables and want each one rotated into searchd as soon as possible. With --sighup-each, indexer will send the SIGHUP signal to searchd after successfully completing work on each table. (The default behavior is to send a single SIGHUP after all the tables are built.)
--nohup is useful when you want to check your table with indextool before actually rotating it. indexer won't send the SIGHUP if this option is on. Table files are renamed to .tmp. Use indextool to rename the table files to .new and rotate them. Example usage:
sudo -u manticore indexer --rotate --nohup mytable
sudo -u manticore indextool --rotate --check mytable
--print-queries prints out the SQL queries that indexer sends to the database, along with SQL connection and disconnection events. This is useful for diagnosing and fixing problems with SQL sources.
--help (-h for short) lists all the parameters that can be called in indexer.
-v shows the indexer version.
You can also configure indexer behavior in the Manticore configuration file, in the indexer section:
indexer {
...
}
lemmatizer_cache = 256M
Lemmatizer cache size. Optional, default is 256K.
Our lemmatizer implementation uses a compressed dictionary format that enables a space/speed tradeoff. It can either perform lemmatization off the compressed data, using more CPU but less RAM, or it can decompress and precache the dictionary either partially or fully, thus using less CPU but more RAM. The lemmatizer_cache directive lets you control how much RAM exactly can be spent for that uncompressed dictionary cache.
Currently, the only available dictionaries are ru.pak, en.pak, and de.pak. These are the Russian, English, and German dictionaries. The compressed dictionary is approximately 2 to 10 MB in size. Note that the dictionary stays in memory at all times, too. The default cache size is 256 KB. The accepted cache sizes are 0 to 2047 MB. It's safe to raise the cache size too high; the lemmatizer will only use the needed memory. For example, the entire Russian dictionary decompresses to approximately 110 MB; thus, setting lemmatizer_cache higher than that will not affect the memory use. Even when 1024 MB is allowed for the cache, if only 110 MB is needed, it will only use those 110 MB.
max_file_field_buffer = 128M
Maximum file field adaptive buffer size in bytes. Optional, default is 8MB, minimum is 1MB.
The file field buffer is used to load files referred to from sql_file_field columns. This buffer is adaptive, starting at 1 MB at first allocation, and growing in 2x steps until either the file contents can be loaded or the maximum buffer size, specified by the max_file_field_buffer directive, is reached.
Thus, if no file fields are specified, no buffer is allocated at all. If all files loaded during indexing are under (for example) 2 MB in size, but the max_file_field_buffer value is 128 MB, the peak buffer usage would still be only 2 MB. However, files over 128 MB would be entirely skipped.
max_iops = 40
Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).
I/O throttling related option. It limits the maximum count of I/O operations (reads or writes) per any given second. A value of 0 means that no limit is imposed.
indexer can cause bursts of intensive disk I/O while building a table, and it might be desirable to limit its disk activity (and reserve something for other programs running on the same machine, such as searchd). I/O throttling helps to do that. It works by enforcing a minimum guaranteed delay between subsequent disk I/O operations performed by indexer. Throttling I/O can help reduce search performance degradation caused by building. This setting is not effective for other kinds of data ingestion, e.g. inserting data into a real-time table.
max_iosize = 1048576
Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).
I/O throttling related option. It limits the maximum file I/O operation (read or write) size for all operations performed by indexer. A value of 0 means that no limit is imposed. Reads or writes that are bigger than the limit will be split into several smaller operations, and counted as several operations by the max_iops setting. At the time of this writing, all I/O calls should be under 256 KB (the default internal buffer size) anyway, so max_iosize values higher than 256 KB should not have any effect.
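Putting the two throttling directives together, a conservative setup might look like this (the values are illustrative, not recommendations):
indexer {
max_iops = 40
max_iosize = 1048576
}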
max_xmlpipe2_field = 8M
Maximum allowed field size for XMLpipe2 source type, in bytes. Optional, default is 2 MB.
mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes
Plain table building RAM usage limit. Optional, default is 128 MB. Enforced memory usage limit that the indexer will not go above. Can be specified in bytes, kilobytes (using the K postfix), or megabytes (using the M postfix); see the example. This limit will be automatically raised if set to an extremely low value causing I/O buffers to be less than 8 KB; the exact lower bound for that depends on the built data size. If the buffers are less than 256 KB, a warning will be produced.
The maximum possible limit is 2047M. Too low values can hurt plain table building speed, but 256M to 1024M should be enough for most, if not all, datasets. Setting this value too high can cause SQL server timeouts. During the document collection phase, there will be periods when the memory buffer is partially sorted and no communication with the database is performed, and the database server can time out. You can resolve that either by raising timeouts on the SQL server side or by lowering mem_limit.
on_file_field_error = skip_document
How to handle IO errors in file fields. Optional, default is ignore_field.
When there is a problem indexing a file referenced by a file field (sql_file_field), indexer can either process the document, assuming empty content in this particular field, skip the document, or fail indexing entirely. The on_file_field_error directive controls that behavior. The values it takes are:
- ignore_field: process the current document without the field;
- skip_document: skip the current document but continue indexing;
- fail_index: fail indexing with an error message.
The problems that can arise are: open error, size error (file too big), and data read error. Warning messages on any problem will be given at all times, regardless of the phase and the on_file_field_error setting.
Note that with on_file_field_error = skip_document, documents will only be ignored if problems are detected during the early check phase, and not during the actual file parsing phase. indexer will open every referenced file and check its size before doing any work, and then open it again when doing the actual parsing work. So in case a file goes away between these two open attempts, the document will still be indexed.
write_buffer = 4M
Write buffer size, bytes. Optional, default is 1MB. Write buffers are used to write both temporary and final table files when indexing. Larger buffers reduce the number of required disk writes. Memory for the buffers is allocated in addition to mem_limit. Note that several (currently up to 4) buffers for different files will be allocated, proportionally increasing the RAM usage.
ignore_non_plain = 1
ignore_non_plain allows you to completely ignore warnings about skipping non-plain tables. The default is 0 (not ignoring).
There are two approaches to scheduling indexer runs. The first way is the classical method of using crontab. The second way is using a systemd timer with a user-defined schedule. To create the timer unit files, you should place them in the appropriate directory where systemd looks for such unit files. On most Linux distributions, this directory is typically /etc/systemd/system. Here's how to do it:
cat << EOF > /etc/systemd/system/manticore-indexer@.timer
[Unit]
Description=Run Manticore Search's indexer on schedule
[Timer]
OnCalendar=minutely
RandomizedDelaySec=5m
Unit=manticore-indexer@%i.service
[Install]
WantedBy=timers.target
EOF
More on the OnCalendar syntax and examples can be found here.
systemctl enable manticore-indexer@idx1.timer
systemctl start manticore-indexer@idx1.timer
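For instance, to rebuild a hypothetical table idx1 once a night instead of every minute, you could adjust the OnCalendar setting in the timer above (a sketch; pick a schedule that suits your data volume):
# run once a day at 02:00 local time
OnCalendar=*-*-* 02:00:00
# or simply once a day at midnight
OnCalendar=daily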
Manticore Search allows fetching data from databases using specialized drivers or ODBC. Current drivers include:
- mysql - for MySQL/MariaDB/Percona MySQL databases
- pgsql - for PostgreSQL databases
- mssql - for Microsoft SQL databases
- odbc - for any database that accepts connections using ODBC
To fetch data from the database, a source must be configured with type set to one of the above. The source requires information about how to connect to the database and the query that will be used to fetch the data. Additional pre- and post-queries can also be set - either to configure session settings or to perform pre/post fetch tasks. The source also must contain definitions of data types for the columns that are fetched.
The source definition must contain the settings of the connection; this includes the host, port, user credentials, and any driver-specific settings.
The database server host to connect to. Note that the MySQL client library chooses whether to connect over TCP/IP or over UNIX socket based on the host name. Specifically, "localhost" will force it to use UNIX socket (this is the default and generally recommended mode) and "127.0.0.1" will force TCP/IP usage.
The server IP port to connect to.
For mysql the default is 3306, and for pgsql it is 5432.
The SQL database to use after the connection is established and perform further queries within.
The username used for connecting.
The user password to use when connecting. If the password includes # (which can be used to add comments in the configuration file), you can escape it with \.
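For example, a password containing # could be written like this (hypothetical value):
sql_pass = my\#secret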
UNIX socket name to connect to for local database servers. Note that whether this value will actually be used depends on the sql_host setting.
sql_sock = /var/lib/mysql/mysql.sock
MySQL client connection flags. Optional, the default value is 0 (do not set any flags).
This option must contain an integer value with the sum of the flags. The value will be passed to mysql_real_connect() verbatim. The flags are enumerated in mysql_com.h include file. Flags that are especially interesting in regard to indexing, with their respective values, are as follows:
mysql_connect_flags = 32 # enable compression
- mysql_ssl_cert - path to the SSL certificate
- mysql_ssl_key - path to the SSL key file
- mysql_ssl_ca - path to the CA certificate
unpack_mysqlcompress_maxsize = 1M
Columns to unpack using the MySQL UNCOMPRESS() algorithm. Multi-value, optional, default value is an empty list of columns.
Columns specified using this directive will be unpacked by indexer using the modified zlib algorithm used by the MySQL COMPRESS() and UNCOMPRESS() functions. When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.
unpack_mysqlcompress = body_compressed
unpack_mysqlcompress = description_compressed
By default, a buffer of 16M is used for uncompressing the data. This can be changed by setting unpack_mysqlcompress_maxsize.
When using unpack_mysqlcompress, due to implementation intricacies, it is not possible to deduce the required buffer size from the compressed data. So, the buffer must be preallocated in advance, and the unpacked data cannot go over the buffer size.
unpack_zlib = col1
unpack_zlib = col2
Columns to unpack using zlib (aka deflate, aka gunzip). Multi-value, optional, default value is an empty list of columns. Applies to source types mysql and pgsql only.
Columns specified using this directive will be unpacked by indexer using the standard zlib algorithm (called deflate and also implemented by gunzip). When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.
MS SQL Windows authentication flag. Whether to use currently logged-in Windows account credentials for authentication when connecting to MS SQL Server.
mssql_winauth = 1
Sources using ODBC require the presence of a DSN (Data Source Name) string, which can be set with odbc_dsn.
odbc_dsn = Driver={Oracle ODBC Driver};Dbq=myDBName;Uid=myUsername;Pwd=myPassword
Please note that the format depends on the specific ODBC driver used.
With all the SQL drivers, building a plain table generally works as follows.
- sql_query_pre_all queries are executed to perform any necessary initial setup, such as setting per-connection encoding with MySQL. These queries run before the entire indexing process, and also after a reconnect for indexing MVA attributes and joined fields.
- sql_query_pre pre-queries are executed to perform any necessary initial setup, such as setting up temporary tables or maintaining counter tables. These queries run once for the entire indexing process.
- The main query sql_query is executed, and the rows it returns are processed.
- sql_query_post is executed to perform some necessary cleanup.
- sql_query_post_index is executed to perform some necessary final cleanup.
Example of a source fetching data from MySQL:
source mysource {
type = mysql
sql_host = localhost
sql_user = myuser
sql_pass = mypass
sql_db = mydb
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, title, description, category_id FROM mytable
sql_query_post = DROP TABLE view_table
sql_query_post_index = REPLACE INTO counters ( id, val ) \
VALUES ( 'max_indexed_id', $maxid )
sql_attr_uint = category_id
sql_field_string = title
}
table mytable {
type = plain
source = mysource
path = /path/to/mytable
...
}
This is the query used to retrieve documents from a SQL server. There can be only one sql_query declared, and it's mandatory to have one. See also Processing fetched data
Pre-fetch query or pre-query. This is a multi-value, optional setting, with the default being an empty list of queries. The pre-queries are executed before the sql_query in the order they appear in the configuration file. The results of the pre-queries are ignored.
Pre-queries are useful in many ways. They can be used to set up encoding, mark records that are going to be indexed, update internal counters, set various per-connection SQL server options and variables, and so on.
Perhaps the most frequent use of pre-query is to specify the encoding that the server will use for the rows it returns. Note that Manticore accepts only UTF-8 text. Two MySQL specific examples of setting the encoding are:
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
Also, specific to MySQL sources, it is useful to disable query cache (for indexer connection only) in pre-query, because indexing queries are not going to be re-run frequently anyway, and there's no sense in caching their results.
That could be achieved with:
sql_query_pre = SET SESSION query_cache_type=OFF
Post-fetch query. This is an optional setting, with the default value being empty.
This query is executed immediately after sql_query completes successfully. When the post-fetch query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. Note that indexing is not yet completed at the point when this query gets executed, and further indexing may still fail. Therefore, any permanent updates should not be done from here. For instance, updates on a helper table that permanently change the last successfully indexed ID should not be run from the sql_query_post query; they should be run from the sql_query_post_index query instead.
Post-processing query. This is an optional setting, with the default value being empty.
This query is executed when indexing is fully and successfully completed. If this query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. The $maxid macro can be used in its text; it will be expanded to the maximum document ID that was actually fetched from the database during indexing. If no documents were indexed, $maxid will be expanded to 0.
Example:
sql_query_post_index = REPLACE INTO counters ( id, val ) \
VALUES ( 'max_indexed_id', $maxid )
The difference between sql_query_post and sql_query_post_index is that sql_query_post is run immediately when Manticore receives all the documents, but further indexing may still fail for some other reason. On the contrary, by the time the sql_query_post_index query gets executed, it is guaranteed that the table was created successfully. The database connection is dropped and re-established because the sorting phase can be very lengthy and would just time out otherwise.
By default, the first column from the result set of sql_query is indexed as the document id.
Document ID MUST be the very first field, and it MUST BE UNIQUE SIGNED (NON-ZERO) INTEGER NUMBER from -9223372036854775808 to 9223372036854775807.
You can specify up to 256 full-text fields and an arbitrary amount of attributes. All the columns that are neither document ID (the first one) nor attributes will be indexed as full-text fields.
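A minimal sketch of how this maps to a source definition (the column names are hypothetical): the first column is the document id, category_id is declared as an attribute, and the remaining columns become full-text fields:
sql_query = SELECT id, title, content, category_id FROM documents
sql_attr_uint = category_id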
Declares a 64-bit signed integer.
Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.
Declares a floating point attribute.
The values will be stored in single precision, 32-bit IEEE 754 format. Represented range is approximately from 1e-38 to 1e+38. The amount of decimal digits that can be stored precisely is approximately 7.
One important usage of float attributes is storing latitude and longitude values (in radians), for further usage in query-time geosphere distance calculations.
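For example, latitude and longitude columns (hypothetical names, with the values already converted to radians) would be declared like this:
sql_attr_float = lat_radians
sql_attr_float = lon_radians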
Declares a JSON attribute.
When indexing JSON attributes, Manticore expects a text field with JSON formatted data. JSON attributes support arbitrary JSON data with no limitation in nested levels or types.
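A minimal declaration sketch, assuming the source query returns a column named metadata containing JSON text:
sql_attr_json = metadata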
Declares a multi-value attribute.
Plain attributes only allow attaching 1 value per each document. However, there are cases (such as tags or categories) when it is desired to attach multiple values of the same attribute and be able to apply filtering or grouping to value lists.
The MVA can take the values from a column (like the rest of the data types) - in this case, the column in the result set must provide a string with multiple integer values separated by commas - or by running a separate query to get the values.
When executing the query, the engine runs it, groups the results by document IDs, and assigns the values to their corresponding documents in the table. Values with an ID not found in the table are discarded. Before executing the query, any defined sql_query_pre_all will be run.
The declaration format for sql_attr_multi is as follows:
sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE \
[;QUERY] \
[;RANGED-QUERY]
where
- ATTR-TYPE is one of uint, bigint, or timestamp
- SOURCE-TYPE is one of field, query, ranged-query, or ranged-main-query
- QUERY is the SQL query used to fetch all (docid, attrvalue) pairs
- RANGED-QUERY is the optional SQL query used to fetch the min and max ID values, similar to sql_query_range
The RANGED-QUERY is used with the ranged-query SOURCE-TYPE. If using the ranged-main-query SOURCE-TYPE, omit the RANGED-QUERY, and it will automatically use the same query from sql_query_range (a useful option in complex inheritance setups to save having to manually duplicate the same query many times).
sql_attr_multi = uint tag from field
sql_attr_multi = uint tag from query; SELECT id, tag FROM tags
sql_attr_multi = bigint tag from ranged-query; \
SELECT id, tag FROM tags WHERE id>=$start AND id<=$end; \
SELECT MIN(id), MAX(id) FROM tags
Declares a string attribute. The maximum size of each value is fixed at 4GB.
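For example (the attribute name is illustrative):
sql_attr_string = author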
Declares a UNIX timestamp.
Timestamps can store dates and times in the range of January 01, 1970, to January 19, 2038, with a precision of one second. The expected column value should be a timestamp in UNIX format, which is a 32-bit unsigned integer number of seconds elapsed since midnight on January 01, 1970, GMT. Timestamps are internally stored and handled as integers everywhere. In addition to working with timestamps as integers, you can also use them with different date-based functions, such as time segments sorting mode or day/week/month/year extraction for GROUP BY.
Note that DATE or DATETIME column types in MySQL cannot be directly used as timestamp attributes in Manticore; you need to explicitly convert such columns using UNIX_TIMESTAMP function (if the data is in range).
Note that timestamps cannot represent dates before January 01, 1970, and UNIX_TIMESTAMP() in MySQL will not return anything expected for them. If you only need to work with dates, not times, consider using the TO_DAYS() function in MySQL instead.
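For example, a sketch of converting a MySQL DATETIME column into a timestamp attribute (table and column names are illustrative):
sql_query = SELECT id, title, UNIX_TIMESTAMP(added) AS added_ts FROM my_documents
sql_attr_timestamp = added_ts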
Declares an unsigned integer attribute.
You can specify the bit count for integer attributes by appending ':BITCOUNT' to the attribute name (see the example below). Attributes smaller than the default 32-bit size (bitfields) perform slower.
sql_attr_uint = group_id
sql_attr_uint = forum_id:9 # 9 bits for forum_id
Declares a combo string attribute/text field. The values will be indexed as a full-text field, but also stored in a string attribute with the same name. Note that it should only be used when you are sure you want the field to be searchable both in a full-text manner and as an attribute (with the ability to sort and group by it). If you just want to be able to fetch the original value of the field, you don't need to do anything for it unless you explicitly removed the field from the stored fields list via stored_fields.
sql_field_string = name
Declares a file based field.
This directive makes the indexer interpret the field contents as a file name, and load and process the referred file. Files larger than max_file_field_buffer are skipped. Any errors during file loading (I/O errors, exceeded limits, etc.) will be reported as indexing warnings and will not terminate indexing early. No content will be indexed for such files.
sql_file_field = field_name
Joined/payload field fetch query. Multi-value, optional, the default is an empty list of queries.
sql_joined_field
lets you use two different features: joined fields and payloads (payload fields). Its syntax is as follows:
sql_joined_field = FIELD-NAME 'from' ( 'query' | 'payload-query' | 'ranged-query' | 'ranged-main-query' ); \
QUERY [ ; RANGE-QUERY ]
where
- FIELD-NAME is the name of the joined/payload field
- QUERY is an SQL query that fetches the values to index
- RANGE-QUERY is an optional ranged query, similar to sql_query_range
Joined fields let you avoid JOIN and/or GROUP_CONCAT statements in the main document fetch query (sql_query). This can be useful when the SQL-side JOIN is slow, or needs to be offloaded on the Manticore side, or simply to emulate MySQL-specific GROUP_CONCAT
functionality in case your database server does not support it.
The query must return exactly 2 columns: document ID, and text to append to a joined field. Document IDs can be duplicate, but they must be in ascending order. All the text rows fetched for a given ID will be concatenated together, and the concatenation result will be indexed as the entire contents of a joined field. Rows will be concatenated in the order returned from the query, and separating whitespace will be inserted between them. For instance, if the joined field query returns the following rows:
( 1, 'red' )
( 1, 'right' )
( 1, 'hand' )
( 2, 'mysql' )
( 2, 'manticore' )
then the indexing results would be equivalent to adding a new text field with a value of 'red right hand' to document 1 and 'mysql manticore' to document 2, including the keyword positions inside the field in the order they come from the query. If the rows need to be in a specific order, that needs to be explicitly defined in the query.
Joined fields are only indexed differently. There are no other differences between joined fields and regular text fields.
Before executing the joined fields query, any set of sql_query_pre_all
will be run, if any exist. This allows you to set the desired encoding, etc., within the joined fields' context.
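For instance, a simple joined field declaration might look like this (table and column names are illustrative):
sql_joined_field = tags from query; SELECT docid, tagname FROM document_tags ORDER BY docid ASC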
When a single query is not efficient enough or does not work because of the database driver limitations, ranged queries can be used. It works similarly to the ranged queries in the main indexing loop. The range will be queried for and fetched upfront once, then multiple queries with different $start
and $end
substitutions will be run to fetch the actual data.
When using ranged-main-query
query, omit the ranged-query
, and it will automatically use the same query from sql_query_range
(a useful option in complex inheritance setups to save having to manually duplicate the same query many times).
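A ranged variant of the same idea could look like this sketch (again, the table and column names are illustrative):
sql_joined_field = tags from ranged-query; \
SELECT docid, tagname FROM document_tags WHERE docid>=$start AND docid<=$end ORDER BY docid ASC; \
SELECT MIN(docid), MAX(docid) FROM document_tags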
Payloads let you create a special field in which, instead of keyword positions, so-called user payloads are stored. Payloads are custom integer values attached to every keyword. They can then be used at search time to affect the ranking.
The payload query must return exactly 3 columns:
- document ID
- keyword
- and integer payload value.
Document IDs can be duplicate, but they must be in ascending order. Payloads must be unsigned integers within the 24-bit range, i.e., from 0 to 16777215.
The only ranker that accounts for payloads is proximity_bm25
(the default ranker). On tables with payload fields, it will automatically switch to a variant that matches keywords in those fields, computes a sum of matched payloads multiplied by field weights, and adds that sum to the final rank.
Please note that the payload field is ignored for full-text queries containing complex operators. It only works for simple bag-of-words queries.
Configuration example:
source min {
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_query = select 1, 'Nike bag' f \
UNION select 2, 'Adidas bag' f \
UNION select 3, 'Reebok bag' f \
UNION select 4, 'Nike belt' f
sql_joined_field = tag from payload-query; select 1 id, 'nike' tag, 10 weight \
UNION select 4 id, 'nike' tag, 10 weight;
}
index idx {
path = idx
source = min
}
mysql> select * from idx;
+------+------------+------+
| id | f | tag |
+------+------------+------+
| 1 | Nike bag | nike |
| 2 | Adidas bag | |
| 3 | Reebok bag | |
| 4 | Nike belt | nike |
+------+------------+------+
4 rows in set (0.00 sec)
mysql> select *, weight() from idx where match('nike|adidas');
+------+------------+------+----------+
| id | f | tag | weight() |
+------+------------+------+----------+
| 1 | Nike bag | nike | 11539 |
| 4 | Nike belt | nike | 11539 |
| 2 | Adidas bag | | 1597 |
+------+------------+------+----------+
3 rows in set (0.01 sec)
mysql> select *, weight() from idx where match('"nike bag"|"adidas bag"');
+------+------------+------+----------+
| id | f | tag | weight() |
+------+------------+------+----------+
| 2 | Adidas bag | | 2565 |
| 1 | Nike bag | nike | 2507 |
+------+------------+------+----------+
2 rows in set (0.00 sec)
sql_column_buffers = <colname>=<size>[K|M] [, ...]
Per-column buffer sizes. Optional, default is empty (deduce the sizes automatically). Applies to the odbc and mssql source types only.
ODBC and MS SQL drivers sometimes cannot return the maximum actual column size to be expected. For instance, NVARCHAR(MAX)
columns always report their length as 2147483647 bytes to indexer
even though the actually used length is likely considerably less. However, the receiving buffers still need to be allocated upfront, and their sizes have to be determined. When the driver does not report the column length at all, Manticore allocates default 1 KB buffers for each non-char column, and 1 MB buffers for each char column. Driver-reported column length also gets clamped by an upper limit of 8 MB, so in case the driver reports (almost) a 2 GB column length, it will be clamped and an 8 MB buffer will be allocated instead for that column. These hard-coded limits can be overridden using the sql_column_buffers
directive, either in order to save memory on actually shorter columns or to overcome the 8 MB limit on actually longer columns. The directive values must be a comma-separated list of selected column names and sizes:
Example:
sql_query = SELECT id, mytitle, mycontent FROM documents
sql_column_buffers = mytitle=64K, mycontent=10M
The main query, which needs to fetch all the documents, can impose a read lock on the whole table and stall concurrent queries (e.g., INSERTs to a MyISAM table), waste a lot of memory on the result set, and so on. To avoid this, Manticore supports so-called ranged queries. With ranged queries, Manticore first fetches the min and max document IDs from the table, then substitutes different ID intervals into the main query text and runs the modified query to fetch another chunk of documents. Here's an example.
Ranged query usage example:
sql_query_range = SELECT MIN(id),MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT * FROM documents WHERE id>=$start AND id<=$end
If the table contains document IDs from 1 to, say, 2345, then sql_query would be run three times:
1. with $start replaced with 1 and $end replaced with 1000;
2. with $start replaced with 1001 and $end replaced with 2000;
3. with $start replaced with 2001 and $end replaced with 2345.
Obviously, that's not much of a difference for a 2000-row table, but when it comes to indexing a 10-million-row table, ranged queries might be of some help.
Defines the range query. The query specified in this option must fetch min and max document IDs that will be used as range boundaries. It must return exactly two integer fields, min ID first and max ID second; the field names are ignored. When enabled, sql_query
will be required to contain $start and $end macros. Note that the intervals specified by $start..$end will not overlap, so you should not remove document IDs that are exactly equal to $start or $end from your query.
This directive defines the range query step. The default value is 1024.
This directive can be used to throttle the ranged query. By default, there is no throttling. Values for sql_ranged_throttle should be specified in milliseconds.
Throttling can be useful when the indexer imposes too much load on the database server. It causes the indexer to sleep for a given amount of time once per each ranged query step. This sleep is unconditional and is performed before the fetch query.
sql_ranged_throttle = 1000 # sleep for 1 sec before each query step
The xmlpipe2
source type allows for passing custom full-text and attribute data to Manticore in a custom XML format, with the schema (i.e., set of fields and attributes) specified in either the XML stream itself or in the source settings.
To declare the XML stream, the xmlpipe_command
directive is mandatory and contains the shell command that produces the XML stream to be indexed. This can be a file, but it can also be a program that generates XML content on-the-fly.
When indexing an xmlpipe2 source, the indexer runs the specified command, opens a pipe to its stdout, and expects a well-formed XML stream.
Here's an example of what the XML stream data might look like:
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>
<sphinx:document id="1234">
<content>this is the main content <![CDATA[and this <cdata> entry
must be handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>
<sphinx:document id="1235">
<subject>another subject</subject>
<content>here comes another document, and i am given to understand,
that in-document field order must not matter, sir</content>
<published>1012325467</published>
</sphinx:document>
<!-- ... even more sphinx:document entries here ... -->
<sphinx:killlist>
<id>1234</id>
<id>4567</id>
</sphinx:killlist>
</sphinx:docset>
Arbitrary fields and attributes are allowed. They can also occur in the stream in arbitrary order within each document; the order is ignored. There is a restriction on the maximum field length; fields longer than 2 MB will be truncated to 2 MB (this limit can be changed in the source).
The schema, i.e., complete fields and attributes list, must be declared before any document can be parsed. This can be done either in the configuration file by using xmlpipe_field
and xmlpipe_attr_XXX
settings, or right in the stream using <sphinx:schema>
element. <sphinx:schema>
is optional. It is only allowed to occur as the very first sub-element in <sphinx:docset>
. If there is no in-stream schema definition, settings from the configuration file will be used. Otherwise, stream settings take precedence. Note that the document id should be specified as a property id
of tag <sphinx:document>
(e.g. <sphinx:document id="1235">
) and is expected to be a unique, positive, non-zero, signed 64-bit integer.
Unknown tags (declared neither as fields nor as attributes) will be ignored with a warning. In the example above, <misc>
will be ignored. All embedded tags and their attributes (such as <b>
in <subject>
in the example above) will be silently ignored.
Support for incoming stream encodings depends on whether iconv
is installed on the system. xmlpipe2 is parsed using the libexpat
parser, which understands US-ASCII, ISO-8859-1, UTF-8, and a few UTF-16 variants natively. Manticore's configure
script will also check for libiconv
presence and utilize it to handle other encodings. libexpat
also enforces the requirement to use the UTF-8 charset on the Manticore side because the parsed data it returns is always in UTF-8.
XML elements (tags) recognized by xmlpipe2 (and their attributes where applicable) are:
- sphinx:docset - Mandatory top-level element; denotes and contains the xmlpipe2 document set.
- sphinx:schema - Optional element; must either occur as the very first child of sphinx:docset or never occur at all. Declares the document schema and contains field and attribute declarations. If present, it overrides per-source settings from the configuration file.
- sphinx:field - Optional element, child of sphinx:schema. Declares a full-text field.
- sphinx:attr - Optional element, child of sphinx:schema. Declares an attribute.
- sphinx:document - Mandatory element, must be a child of sphinx:docset. Contains arbitrary other elements with field and attribute values to be indexed, as declared either using sphinx:field and sphinx:attr elements or in the configuration file. The only known attribute is "id", which must contain the unique integer document ID.
- sphinx:killlist - Optional element, child of sphinx:docset. Contains a number of "id" elements whose contents are document IDs to be put into a kill-list of the table. The kill-list is used in multi-table searches to suppress documents found in other tables of the search.
If the XML doesn't define a schema, the data types of the table elements must be defined in the source configuration.
- xmlpipe_field - declares a text field.
- xmlpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- xmlpipe_attr_uint - declares an integer attribute.
- xmlpipe_attr_timestamp - declares a timestamp attribute.
- xmlpipe_attr_bool - declares a boolean attribute.
- xmlpipe_attr_float - declares a float attribute.
- xmlpipe_attr_bigint - declares a big integer attribute.
- xmlpipe_attr_multi - declares a multi-value attribute with integers.
- xmlpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- xmlpipe_attr_string - declares a string attribute.
- xmlpipe_attr_json - declares a JSON attribute.
If xmlpipe_fixup_utf8 is set, it enables Manticore-side UTF-8 validation and filtering to prevent the XML parser from choking on non-UTF-8 documents. By default, this option is disabled.
In some situations it might be hard or even impossible to guarantee that the incoming XMLpipe2 document bodies are in perfectly valid and conforming UTF-8 encoding. For instance, documents with national single-byte encodings could sneak into the stream. The libexpat XML parser is fragile, meaning that it will stop processing in such cases. The UTF-8 fixup feature lets you avoid that. When fixup is enabled, Manticore will preprocess the incoming stream before passing it to the XML parser and replace invalid UTF-8 sequences with spaces.
xmlpipe_fixup_utf8 = 1
Example of XML source without schema in configuration:
source xml_test_1
{
type = xmlpipe2
xmlpipe_command = cat /tmp/products_today.xml
}
Example of XML source with schema in configuration:
source xml_test_2
{
type = xmlpipe2
xmlpipe_command = cat /tmp/products_today.xml
xmlpipe_field = subject
xmlpipe_field = content
xmlpipe_attr_timestamp = published
xmlpipe_attr_uint = author_id:16
}
TSV/CSV is the simplest way to pass data to the Manticore indexer. This method was created due to the limitations of xmlpipe2. In xmlpipe2, the indexer must map each attribute and field tag in the XML file to a corresponding schema element. This mapping requires time, and it increases with the number of fields and attributes in the schema. TSV/CSV has no such issue, as each field and attribute corresponds to a particular column in the TSV/CSV file. In some cases, TSV/CSV could work slightly faster than xmlpipe2.
The first column in a TSV/CSV file must be the document ID. The remaining columns must mirror the declaration of fields and attributes in the schema definition. Note that you don't need to declare the document ID in the schema, since it's always considered to be present; it should be in the 1st column and must be a unique, positive, non-zero, signed 64-bit integer.
The difference between tsvpipe and csvpipe is the delimiter and the quoting rules. tsvpipe uses the tab character as a hardcoded delimiter and has no quoting rules. csvpipe has the csvpipe_delimiter
option for the delimiter, with a default value of ',', and also has quoting rules, such as:
- any field may be quoted
- fields containing a line-break, double-quote or comma should be quoted
- a double quote character in a field must be represented by two double quote characters
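For example, to switch the delimiter to a semicolon (an illustrative choice):
csvpipe_delimiter = ;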
The tsvpipe_command
directive is mandatory and contains the shell command invoked to produce the TSV stream that gets indexed. The command can read a TSV file, but it can also be a program that generates tab-delimited content on the fly.
The following directives can be used to declare the types of the indexed columns:
- tsvpipe_field - declares a text field.
- tsvpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- tsvpipe_attr_uint - declares an integer attribute.
- tsvpipe_attr_timestamp - declares a timestamp attribute.
- tsvpipe_attr_bool - declares a boolean attribute.
- tsvpipe_attr_float - declares a float attribute.
- tsvpipe_attr_bigint - declares a big integer attribute.
- tsvpipe_attr_multi - declares a multi-value attribute with integers.
- tsvpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- tsvpipe_attr_string - declares a string attribute.
- tsvpipe_attr_json - declares a JSON attribute.
Example of a source using a TSV file:
source tsv_test
{
type = tsvpipe
tsvpipe_command = cat /tmp/rock_bands.tsv
tsvpipe_field = name
tsvpipe_attr_multi = genre_tags
}
1 Led Zeppelin 35,23,16
2 Deep Purple 35,92
3 Frank Zappa 35,23,16,92,33,24
The csvpipe_command
directive is mandatory and contains the shell command invoked to produce the CSV stream that gets indexed. The command can read a CSV file, but it can also be a program that generates comma-delimited content on the fly.
The following directives can be used to declare the types of the indexed columns:
- csvpipe_field - declares a text field.
- csvpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- csvpipe_attr_uint - declares an integer attribute.
- csvpipe_attr_timestamp - declares a timestamp attribute.
- csvpipe_attr_bool - declares a boolean attribute.
- csvpipe_attr_float - declares a float attribute.
- csvpipe_attr_bigint - declares a big integer attribute.
- csvpipe_attr_multi - declares a multi-value attribute with integers.
- csvpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- csvpipe_attr_string - declares a string attribute.
- csvpipe_attr_json - declares a JSON attribute.
Example of a source using a CSV file:
source csv_test
{
type = csvpipe
csvpipe_command = cat /tmp/rock_bands.csv
csvpipe_field = name
csvpipe_attr_multi = genre_tags
}
1,"Led Zeppelin","35,23,16"
2,"Deep Purple","35,92"
3,"Frank Zappa","35,23,16,92,33,24"
In many situations, the total dataset is too large to be frequently rebuilt from scratch, while the number of new records remains relatively small. For example, a forum may have 1,000,000 archived posts but only receive 1,000 new posts per day.
In such cases, implementing "live" (nearly real-time) table updates can be achieved using a "main+delta" scheme.
The concept involves setting up two sources and two tables, with one "main" table for data that rarely changes (if ever), and one "delta" table for new documents. In the example, the 1,000,000 archived posts would be stored in the main table, while the 1,000 new daily posts would be placed in the delta table. The delta table can then be rebuilt frequently, making the documents available for searching within seconds or minutes. Determining which documents belong to which table and rebuilding the main table can be fully automated. One approach is to create a counter table that tracks the ID used to split the documents and update it whenever the main table is rebuilt.
Using a timestamp column as the split variable is more effective than using the ID since timestamps can track not only new documents but also modified ones.
For datasets that may contain modified or deleted documents, the delta table should provide a list of affected documents, ensuring they are suppressed and excluded from search queries. This is accomplished using a feature called Kill Lists. The document IDs to be killed can be specified in an auxiliary query defined by sql_query_killlist. The delta table must indicate the target tables for which the kill lists will be applied using the killlist_target directive. The impact of kill lists is permanent on the target table, meaning that even if a search is performed without the delta table, the suppressed documents will not appear in the search results.
Notice how we're overriding sql_query_pre
in the delta source. We must explicitly include this override. If we don't, the REPLACE
query would be executed during the delta source's build as well, effectively rendering it useless.
# in MySQL
CREATE TABLE deltabreaker (
index_name VARCHAR(50) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (index_name)
);
# in manticore.conf
source main {
...
sql_query_pre = REPLACE INTO deltabreaker SET index_name = 'main', created_at = NOW()
sql_query = SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated FROM documents WHERE deleted=0 AND updated_at >=FROM_UNIXTIME($start) AND updated_at <=FROM_UNIXTIME($end)
sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(MIN(updated_at)) FROM documents) min, ( SELECT UNIX_TIMESTAMP(created_at)-1 FROM deltabreaker WHERE index_name='main') max
sql_query_post_index = REPLACE INTO deltabreaker set index_name = 'delta', created_at = (SELECT created_at FROM deltabreaker t WHERE index_name='main')
...
sql_attr_timestamp = updated
}
source delta : main {
sql_query_pre =
sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(created_at) FROM deltabreaker WHERE index_name='delta') min, UNIX_TIMESTAMP() max
sql_query_killlist = SELECT id FROM documents WHERE updated_at >= (SELECT created_at FROM deltabreaker WHERE index_name='delta')
}
table main {
path = /var/lib/manticore/main
source = main
}
table delta {
path = /var/lib/manticore/delta
source = delta
killlist_target = main:kl
}
Merging two existing plain tables can be more efficient than indexing the data from scratch and might be desired in some cases (such as merging 'main' and 'delta' tables instead of simply rebuilding 'main' in the 'main+delta' partitioning scheme). Thus, indexer
provides an option to do that. Merging tables is typically faster than rebuilding, but still not instant for huge tables. Essentially, it needs to read the contents of both tables once and write the result once. Merging a 100 GB and 1 GB table, for example, will result in 202 GB of I/O (but that's still likely less than indexing from scratch requires).
The basic command syntax is as follows:
sudo -u manticore indexer --merge DSTINDEX SRCINDEX [--rotate] [--drop-src]
Unless --drop-src
is specified, only the DSTINDEX table will be affected: the contents of SRCINDEX will be merged into it.
The --rotate
switch is required if DSTINDEX is already being served by searchd
.
The typical usage pattern is to merge a smaller update from SRCINDEX into DSTINDEX. Thus, when merging attributes, the values from SRCINDEX will take precedence if duplicate document IDs are encountered. However, note that the "old" keywords will not be automatically removed in such cases. For example, if there's a keyword "old" associated with document 123 in DSTINDEX, and a keyword "new" associated with it in SRCINDEX, document 123 will be found by both keywords after the merge. You can supply an explicit condition to remove documents from DSTINDEX to mitigate this; the relevant switch is --merge-dst-range
:
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
This switch allows you to apply filters to the destination table along with merging. There can be several filters; all of their conditions must be met in order to include the document in the resulting merged table. In the example above, the filter passes only those records where 'deleted' is 0, eliminating all records that were flagged as deleted.
--drop-src
enables dropping SRCINDEX after the merge and before rotating the tables, which is important if you specify DSTINDEX in killlist_target
of DSTINDEX. Otherwise, when rotating the tables, the documents that have been merged into DSTINDEX may be suppressed by SRCINDEX.
When using plain tables, there's a challenge arising from the need to have the data in the table as fresh as possible.
In this case, one or more secondary (also known as delta) tables are used to capture the modified data between the time the main table was created and the current time. The modified data can include new, updated, or deleted documents. The search becomes a search over the main table and the delta table. This works seamlessly when you just add new documents to the delta table, but when it comes to updated or deleted documents, there remains the following issue.
If a document is present in both the main and delta tables, it can cause issues during searching, as the engine will see two versions of a document and won't know how to pick the right one. So, the delta needs to somehow inform the search that there are deleted documents in the main table that should be disregarded. This is where kill lists come in.
A table can maintain a list of document IDs that can be used to suppress records in other tables. This feature is available for plain tables using database sources or plain tables using XML sources. In the case of database sources, the source needs to provide an additional query defined by sql_query_killlist
. It will store in the table a list of documents that can be used by the server to remove documents from other plain tables.
This query is expected to return a number of 1-column rows, each containing just the document ID.
In many cases, the query is a union between a query that retrieves a list of updated documents and a list of deleted documents, e.g.:
sql_query_killlist = \
SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex
A plain table can contain a directive called killlist_target that will tell the server it can provide a list of document IDs that should be removed from certain existing tables. The table can use either its document IDs as the source for this list or provide a separate list.
Sets the table(s) that the kill-list will be applied to. Optional, default value is empty.
When you use plain tables, you often need to maintain not just a single table, but a set of them to be able to add/update/delete new documents sooner (read about delta table updates). In order to suppress matches in the previous (main) table that were updated or deleted in the next (delta) table, you need to specify killlist_target in the delta table settings:
table products {
killlist_target = main:kl
path = products
source = src_base
}
When killlist_target
is specified, the kill-list is applied to all the tables listed in it on searchd
startup. If any of the tables from killlist_target
are rotated, the kill-list is reapplied to these tables. When the kill-list is applied, tables that were affected save these changes to disk.
killlist_target
has 3 modes of operation:
- killlist_target = main:kl. Document IDs from the kill-list of the delta table are suppressed in the main table (see sql_query_killlist).
- killlist_target = main:id. All document IDs from the delta table are suppressed in the main table. The kill-list is ignored.
- killlist_target = main. Both document IDs from the delta table and its kill-list are suppressed in the main table.
Multiple targets can be specified, separated by commas like:
killlist_target = table_one:kl,table_two:kl
You can change the killlist_target
settings for a table without rebuilding it by using ALTER
.
However, since the 'old' main table has already written the changes to disk, the documents that were deleted in it will remain deleted even if it is no longer in the killlist_target
of the delta table.
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'
POST /cli -d "
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'"
A plain table can be converted into a real-time table or added to an existing real-time table.
The first case is useful when you need to regenerate a real-time table completely, which may be needed, for example, if tokenization settings need an update. In this situation, preparing a plain table and converting it into a real-time table may be easier than preparing a batch job to perform INSERTs for adding all the data into a real-time table.
In the second case, you normally want to add a large bulk of new data to a real-time table, and again, creating a plain table with that data is easier than populating the existing real-time table.
The ATTACH
statement allows you to attach a plain table to an existing real-time table. It also enables you to attach the content of one real-time table to another real-time table.
ATTACH TABLE plain_or_rt_table TO TABLE rt_table [WITH TRUNCATE]
ATTACH TABLE
statement lets you move data from a plain table or a RT table to an RT table.
After a successful ATTACH
the data originally stored in the source plain table becomes a part of the target RT table, and the source plain table becomes unavailable (until the next rebuild). If the source table is an RT table, its content is moved into the destination RT table, and the source RT table remains empty. ATTACH
does not result in any table data changes. Essentially, it just renames the files (making the source table a new disk chunk of the target RT table) and updates the metadata. So it is generally a quick operation that might (frequently) complete as fast as under a second.
Note that when a table is attached to an empty RT table, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source table are copied over and take effect. The respective parts of the RT table definition from the configuration file will be ignored.
When the TRUNCATE
option is used, the RT table gets truncated prior to attaching the source plain table. This allows the operation to be atomic or ensures that the attached source plain table will be the only data in the target RT table.
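For instance, using the table names from the example below:
ATTACH TABLE plain TO TABLE rt WITH TRUNCATE;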
ATTACH TABLE
comes with a number of restrictions. Most notably, the target RT table is currently required to be either empty or have the same settings as the source table. In case the source table gets attached to a non-empty RT table, the RT table data collected so far gets stored as a regular disk chunk, and the table being attached becomes the newest disk chunk, with documents having the same IDs getting killed. The complete list of restrictions is as follows:
mysql> DESC rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM rt;
Empty set (0.00 sec)
mysql> SELECT * FROM plain WHERE MATCH('test');
+------+--------+----------+------------+
| id | weight | group_id | date_added |
+------+--------+----------+------------+
| 1 | 1304 | 1 | 1313643256 |
| 2 | 1304 | 1 | 1313643256 |
| 3 | 1304 | 1 | 1313643256 |
| 4 | 1304 | 1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
mysql> ATTACH TABLE plain TO TABLE rt;
Query OK, 0 rows affected (0.00 sec)
mysql> DESC rt;
+------------+-----------+
| Field | Type |
+------------+-----------+
| id | integer |
| title | field |
| content | field |
| group_id | uint |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)
mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id | weight | group_id | date_added |
+------+--------+----------+------------+
| 1 | 1304 | 1 | 1313643256 |
| 2 | 1304 | 1 | 1313643256 |
| 3 | 1304 | 1 | 1313643256 |
| 4 | 1304 | 1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM plain WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search
If you decide to migrate from Plain mode to RT mode or in some other cases, real-time and percolate tables built in Plain mode can be imported to Manticore running in RT mode using the IMPORT TABLE
statement. The general syntax is as follows:
IMPORT TABLE table_name FROM 'path'
where the 'path' parameter must be set as: /your_backup_folder/your_backup_name/data/your_table_name/your_table_name
mysql -P9306 -h0 -e 'create table t(f text)'
mysql -P9306 -h0 -e "backup table t to /tmp/"
mysql -P9306 -h0 -e "drop table t"
BACKUP_NAME=$(ls /tmp | grep 'backup-' | tail -n 1)
mysql -P9306 -h0 -e "import table t from '/tmp/$BACKUP_NAME/data/t/t'
mysql -P9306 -h0 -e "show tables"
Executing this command copies all the files of the specified table to data_dir. All the external table files, such as wordforms, exceptions, and stopwords, are also copied to the same data_dir
.
IMPORT TABLE
has the following limitations:
If the above method for migrating a plain table to an RT table is not possible, you may use indexer --print-rt
to dump data from a plain table directly without the need to convert it to an RT type table and then import the dump into an RT table right from the command line.
This method has a few limitations though:
/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql
mysql -P 9306 -h0 -e "truncate table my_rt_index"
mysql -P 9306 -h0 < /tmp/dump_regular.sql
rm /tmp/dump_regular.sql
Table rotation is a procedure in which the searchd server looks for new versions of defined tables in the configuration. Rotation is supported only in Plain mode of operation.
There can be two cases:
In the first case, the indexer cannot put the new version of the table online as the running copy is locked and loaded by searchd
. In this case indexer
needs to be called with the --rotate parameter. If rotate is used, indexer
creates new table files with .new.
in their names and sends a HUP signal to searchd
informing it about the new version. The searchd
will perform a lookup and will put the new version of the table in place and discard the old one. In some cases, it might be desired to create the new version of the table but not perform rotate as soon as possible. For example, it might be desired to first check the health of the new table versions. In this case, indexer
can accept the --nohup
parameter which will forbid sending the HUP signal to the server.
New tables can be loaded by rotation; however, the regular handling of the HUP signal is to check for new tables only if the configuration has changed since server startup. If the table was already defined in the configuration, the table should first be created by running indexer
without rotation, and then the RELOAD TABLES statement should be performed instead.
There are also two specialized statements that can be used to perform rotations on tables:
RELOAD TABLE tbl [ FROM '/path/to/table_files' [ OPTION switchover=1 ] ];
The RELOAD TABLE
command enables table rotation via SQL.
This command functions in three modes. In the first mode, without specifying a path, the Manticore server checks for new table files in the directory specified by the table's path. New table files must be named as tbl.new.sp?
.
If you specify a path, the server searches for table files in that directory, relocates them to the table path, renames them from tbl.sp?
to tbl.new.sp?
, and rotates them.
The third mode, activated by OPTION switchover=1
, switches the index to the new path. Here, the daemon tries to load the table directly from the new path without moving the files. If loading is successful, this new index supersedes the old one.
Also, the daemon writes a unique link file (tbl.link
) in the directory specified by path, maintaining persistent redirection.
If you revert a redirected index to the path specified in the configuration, the daemon will detect this and delete the link file.
Once redirected, the daemon retrieves the table from the newly linked path. When rotating, it looks for new table versions at the newly redirected path. Bear in mind, the daemon checks the configuration for common errors, like duplicate paths across different tables. However, it won't identify if multiple tables point to the same path via redirection. Under normal operations, tables are locked with the .spl
file, but disabling the lock may cause problems. If there's an error (e.g., the path is inaccessible for any reason), you should manually correct (or simply delete) the link file.
indextool
follows the link file, but other tools (indexer
, index_converter
, etc.) do not recognize the link file and consistently use the path defined in the configuration file, ignoring any redirection. Thus, you can inspect the index with indextool
, and it will read from the new location. However, more complex operations like merging will not acknowledge any link file.
mysql> RELOAD TABLE plain_table;
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new_table_files';
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new/place/for/table/table_files' OPTION switchover=1;
RELOAD TABLES;
This command functions similarly to the HUP system signal, triggering a rotation of tables. Nevertheless, it doesn't exactly mirror the typical HUP signal (which can come from a kill -HUP
command or indexer --rotate
). This command actively searches for any tables needing rotation and is capable of re-reading the configuration. Suppose you launch Manticore in plain mode with a config file that points to a nonexistent plain table. If you then attempt to indexer --rotate
the table, the new table won't be recognized by the server until you execute RELOAD TABLES
or restart the server.
Depending on the value of the seamless_rotate setting, new queries might be shortly stalled, and clients will receive temporary errors.
mysql> RELOAD TABLES;
Query OK, 0 rows affected (0.01 sec)
Rotation assumes the old table version is discarded and the new table version is loaded to replace the existing one. During this swap, the server must also serve incoming queries made on the table that is being updated. To avoid query stalls, the server performs a seamless rotation of the table by default, as described below.
Tables may contain data that needs to be precached in RAM. At the moment, .spa
, .spb
, .spi
and .spm
files are fully precached (they contain attribute data, blob attribute data, keyword table, and killed row map, respectively). Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:
1. searchd waits for all currently running queries to finish;
2. the old table copy is deallocated, and its files are renamed;
3. the new table files are renamed, and the required RAM is allocated;
4. the new table attribute and dictionary data are preloaded into RAM;
5. searchd resumes serving queries from the new table.
However, if there's a lot of attribute or dictionary data, then the preloading step could take noticeable time - up to several minutes in case of preloading 1-5+ GB files.
With seamless rotate enabled, rotation works as follows:
Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm
data need to be in RAM while preloading the new copy). However, average usage stays the same.
Example:
seamless_rotate = 1
You can modify existing data in an RT or PQ table by either updating or replacing it.
UPDATE replaces row-wise attribute values of existing documents with new values. Full-text fields and columnar attributes cannot be updated. If you need to change the content of a full-text field or columnar attributes, use REPLACE.
REPLACE works similarly to INSERT except that if an old document has the same ID as the new document, the old document is marked as deleted before the new document is inserted. Note that the old document does not get physically deleted from the table. The deletion can only happen when chunks are merged in a table, e.g., as a result of an OPTIMIZE.
REPLACE
works similarly to INSERT, but it marks the previous document with the same ID as deleted before inserting a new one.
If you are looking for in-place updates, please see this section.
The syntax of the SQL REPLACE
statement is as follows:
To replace the whole document:
REPLACE INTO table [(column1, column2, ...)]
VALUES (value1, value2, ...)
[, (...)]
To replace only selected fields:
REPLACE INTO table
SET field1=value1[, ..., fieldN=valueN]
WHERE id = <id>
Note that in this mode, you can only filter by id.
See the examples for more details.
/replace
:
POST /replace
{
"index": "<table name>",
"id": <document id>,
"doc":
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
}
/index
is an alias endpoint and works the same.
Elasticsearch-like endpoint <table>/_doc/<id>
:
PUT/POST /<table name>/_doc/<id>
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
Partial replace:
POST /<table name>/_update/<id>
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
See the examples for more details.
REPLACE INTO products VALUES(1, "document one", 10);
Query OK, 1 row affected (0.00 sec)
REPLACE INTO products SET description='HUAWEI Matebook 15', price=10 WHERE id = 55;
Query OK, 1 row affected (0.00 sec)
POST /replace
-H "Content-Type: application/x-ndjson" -d '
{
"index":"products",
"id":1,
"doc":
{
"title":"product one",
"price":10
}
}
'
{
"_index":"products",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
PUT /products/_doc/2
{
"title": "product two",
"price": 20
}
POST /products/_doc/3
{
"title": "product three",
"price": 10
}
{
"_id":2,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
POST /products/_update/55
{
"doc": {
"description": "HUAWEI Matebook 15",
"price": 10
}
}
{
"_index":"products",
"updated":1
}
$index->replaceDocument([
'title' => 'document one',
'price' => 10
],1);
Array(
[_index] => products
[_id] => 1
[created] => false
[result] => updated
[status] => 200
)
indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}})
{'created': False,
'found': None,
'id': 1,
'index': 'products',
'result': 'updated'}
res = await indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}});
{"_index":"products","_id":1,"result":"updated"}
docRequest = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","document one");
put("price",10);
}};
docRequest.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
index: products
id: 1
created: false
result: updated
found: null
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "document one");
doc.Add("price", 10);
InsertDocumentRequest docRequest = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
index: products
id: 1
created: false
result: updated
found: null
}
res = await indexApi.replace({
index: 'test',
id: 1,
doc: { content: 'Text 11', name: 'Doc 11', cat: 3 },
});
{
"_index":"test",
"_id":1,
"created":false
"result":"updated"
"status":200
}
replaceDoc := map[string]interface{} {"content": "Text 11", "name": "Doc 11", "cat": 3}
replaceRequest := manticoreclient.NewInsertDocumentRequest("test", replaceDoc)
replaceRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Replace(context.Background()).InsertDocumentRequest(*replaceRequest).Execute()
{
"_index":"test",
"_id":1,
"created":false
"result":"updated"
"status":200
}
REPLACE
is available for real-time and percolate tables. You can't replace data in a plain table.
When you run a REPLACE
, the previous document is not removed, but it's marked as deleted, so the table size grows until chunk merging happens. To force a chunk merge, use the OPTIMIZE statement.
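For instance, assuming the statement takes the OPTIMIZE TABLE <table name> form:
OPTIMIZE TABLE products;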
You can replace multiple documents at once. Check bulk adding documents for more information.
REPLACE INTO products(id,title,tag) VALUES (1, 'doc one', 10), (2, 'doc two', 20);
Query OK, 2 rows affected (0.00 sec)
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{ "replace" : { "index" : "products", "id":1, "doc": { "title": "doc one", "tag" : 10 } } }
{ "replace" : { "index" : "products", "id":2, "doc": { "title": "doc two", "tag" : 20 } } }
'
{
"items":
[
{
"replace":
{
"_index":"products",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"products",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
$index->replaceDocuments([
[
'id' => 1,
'title' => 'document one',
'tag' => 10
],
[
'id' => 2,
'title' => 'document two',
'tag' => 20
]
]);
Array(
[items] =>
Array(
Array(
[_index] => products
[_id] => 1
[created] => false
[result] => updated
[status] => 200
)
Array(
[_index] => products
[_id] => 2
[created] => false
[result] => updated
[status] => 200
)
)
[errors] => false
)
indexApi = manticoresearch.IndexApi(client)
docs = [ \
{"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}}, \
{"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ]
api_resp = indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
'items': [{u'replace': {u'_id': 1,
u'_index': u'products',
u'created': False,
u'result': u'updated',
u'status': 200}},
{u'replace': {u'_id': 2,
u'_index': u'products',
u'created': False,
u'result': u'updated',
u'status': 200}}]}
docs = [
{"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}},
{"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"replace":{"_index":"products","_id":1,"created":false,"result":"updated","status":200}},{"replace":{"_index":"products","_id":2,"created":false,"result":"updated","status":200}}],"errors":false}
body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
"{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;
indexApi.bulk(body);
class BulkResponse {
items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
"{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;
indexApi.Bulk(body);
class BulkResponse {
items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
replaceDocs = [
{
replace: {
index: 'test',
id: 1,
doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
},
},
{
replace: {
index: 'test',
id: 2,
doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
},
},
];
res = await indexApi.bulk(
replaceDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
"items":
[
{
"replace":
{
"_index":"test",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"test",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
body := "{\"replace\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
"{\"replace\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
"items":
[
{
"replace":
{
"_index":"test",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"test",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
UPDATE changes row-wise attribute values of existing documents in a specified table with new values. Note that you can't update the contents of a fulltext field or a columnar attribute. If there's such a need, use REPLACE.
Attribute updates are supported for RT, PQ, and plain tables. All attribute types can be updated as long as they are stored in the traditional row-wise storage.
Note that the document ID cannot be updated.
Note that when you update an attribute, its secondary index gets disabled, so consider replacing the document instead.
UPDATE products SET enabled=0 WHERE id=10;
Query OK, 1 row affected (0.00 sec)
POST /update
{
"index":"products",
"id":10,
"doc":
{
"enabled":0
}
}
{
"_index":"products",
"updated":1
}
$index->updateDocument([
'enabled'=>0
],10);
Array(
[_index] => products
[_id] => 10
[result] => updated
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("price",10);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("price", 10);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: "test", id: 1, doc: { cat: 10 } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"cat":10}
updateRequest = openapiclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
Multiple attributes can be updated in a single statement. Example:
UPDATE products
SET price=100000000000,
coeff=3465.23,
tags1=(3,6,4),
tags2=()
WHERE MATCH('phone') AND enabled=1;
Query OK, 148 rows affected (0.0 sec)
POST /update
{
"index":"products",
"doc":
{
"price":100000000000,
"coeff":3465.23,
"tags1":[3,6,4],
"tags2":[]
},
"query":
{
"match": { "*": "phone" },
"equals": { "enabled": 1 }
}
}
{
"_index":"products",
"updated":148
}
$query= new BoolQuery();
$query->must(new Match('phone','*'));
$query->must(new Equals('enabled',1));
$index->updateDocuments([
'price' => 100000000000,
'coeff' => 3465.23,
'tags1' => [3,6,4],
'tags2' => []
],
$query
);
Array(
[_index] => products
[updated] => 148
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
"price": 100000000000,
"coeff": 3465.23,
"tags1": [3,6,4],
"tags2": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
"price": 100000000000,
"coeff": 3465.23,
"tags1": [3,6,4],
"tags2": []}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("price",10);
put("coeff",3465.23);
put("tags1",new int[]{3,6,4});
put("tags2",new int[]{});
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("price", 10);
doc.Add("coeff", 3465.23);
doc.Add("tags1", new List<int> {3,6,4});
doc.Add("tags2", new List<int> {});
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: "test", id: 1, doc: { name: "Doc 21", cat: "10" } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"name":"Doc 21", "cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
When assigning out-of-range values to 32-bit attributes, they will be trimmed to their lower 32 bits without a prompt. For example, if you try to update the 32-bit unsigned int with a value of 4294967297, the value of 1 will actually be stored, because the lower 32 bits of 4294967297 (0x100000001 in hex) amount to 1 (0x00000001 in hex).
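For instance, assuming counter is a 32-bit unsigned integer attribute (a hypothetical name):
UPDATE products SET counter=4294967297 WHERE id=1;
SELECT counter FROM products WHERE id=1;
The SELECT would then return 1, since only the lower 32 bits of the assigned value are kept.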
UPDATE
can be used to perform partial JSON updates on numeric data types or arrays of numeric data types. Just make sure you don't update an integer value with a float value as it will be rounded off.
insert into products (id, title, meta) values (1,'title','{"tags":[1,2,3]}');
update products set meta.tags[0]=100 where id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /insert
{
"index":"products",
"id":100,
"doc":
{
"title":"title",
"meta": {
"tags":[1,2,3]
}
}
}
POST /update
{
"index":"products",
"id":100,
"doc":
{
"meta.tags[0]":100
}
}
{
"_index":"products",
"_id":100,
"created":true,
"result":"created",
"status":201
}
{
"_index":"products",
"updated":1
}
$index->insertDocument([
'title' => 'title',
'meta' => ['tags' => [1,2,3]]
],1);
$index->updateDocument([
'meta.tags[0]' => 100
],1);
Array(
[_index] => products
[_id] => 1
[created] => true
[result] => created
)
Array(
[_index] => products
[updated] => 1
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
"meta.tags[0]": 100}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
"meta.tags[0]": 100}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("meta.tags[0]",100);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("meta.tags[0]", 100);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({"index" : "test", "id" : 1, "doc" : { "meta.tags[0]": 100} });
{"_index":"test","_id":1,"result":"updated"}
updateDoc = map[string]interface{} {"meta.tags[0]":100}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
Updating other data types or changing property type in a JSON attribute requires a full JSON update.
insert into products values (1,'title','{"tags":[1,2,3]}');
update products set data='{"tags":["one","two","three"]}' where id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title":"title",
"data":"{\"tags\":[1,2,3]}"
}
}
POST /update
{
"index":"products",
"id":1,
"doc":
{
"data":"{\"tags\":[\"one\",\"two\",\"three\"]}"
}
}
{
"_index":"products",
"updated":1
}
$index->insertDocument([
'title'=> 'title',
'data' => [
'tags' => [1,2,3]
]
],1);
$index->updateDocument([
'data' => [
'one', 'two', 'three'
]
],1);
Array(
[_index] => products
[_id] => 1
[created] => true
[result] => created
)
Array(
[_index] => products
[updated] => 1
)
indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}})
indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}})
{'created': True,
'found': None,
'id': 100,
'index': 'products',
'result': 'created'}
{'id': 100, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}});
res = await indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}});
{"_index":"products","_id":100,"created":true,"result":"created"}
{"_index":"products","_id":100,"result":"updated"}
InsertDocumentRequest newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","title");
put("meta",
new HashMap<String,Object>(){{
put("tags",new int[]{1,2,3});
}});
}};
newdoc.index("products").id(100L).setDoc(doc);
indexApi.insert(newdoc);
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("meta",
new HashMap<String,Object>(){{
put("tags",new String[]{"one","two","three"});
}});
}};
updatedoc.index("products").id(100L).setDoc(doc);
indexApi.update(updatedoc);
class SuccessResponse {
index: products
id: 100
created: true
result: created
found: null
}
class UpdateResponse {
index: products
updated: null
id: 100
result: updated
}
Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("tags", new List<int> {1,2,3});
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "title");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Insert(newdoc);
meta = new Dictionary<string, Object>();
meta.Add("tags", new List<string> {"one","two","three"});
doc = new Dictionary<string, Object>();
doc.Add("meta", meta);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Update(updatedoc);
class SuccessResponse {
index: products
id: 100
created: true
result: created
found: null
}
class UpdateResponse {
index: products
updated: null
id: 100
result: updated
}
res = await indexApi.insert({
index: 'test',
id: 1,
doc: { content: 'Text 1', name: 'Doc 1', meta: { tags:[1,2,3] } }
})
res = await indexApi.update({ index: 'test', id: 1, doc: { meta: { tags:['one','two','three'] } } });
{
"_index":"test",
"_id":1,
"created":true,
"result":"created"
}
{
"_index":"test",
"_id":1,
"result":"updated"
}
metaField := map[string]interface{} {"tags": []int{1, 2, 3}}
insertDoc := map[string]interface{} {"name": "Doc 1", "meta": metaField}
insertRequest := manticoreclient.NewInsertDocumentRequest("test", insertDoc)
insertRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*insertRequest).Execute();
metaField = map[string]interface{} {"tags": []string{"one", "two", "three"}}
updateDoc := map[string]interface{} {"meta": metaField}
updateRequest := manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"created":true,
"result":"created"
}
{
"_index":"test",
"_id":1,
"result":"updated"
}
When using replication, the table name should be prepended with cluster_name:
(in SQL) so that updates will be propagated to all nodes in the cluster. For queries via HTTP, you should set a cluster
property. See setting up replication for more information.
{
"cluster":"nodes4",
"index":"test",
"id":1,
"doc":
{
"gid" : 100,
"price" : 1000
}
}
update weekly:posts set enabled=0 where id=1;
POST /update
{
"cluster":"weekly",
"index":"products",
"id":1,
"doc":
{
"enabled":0
}
}
$index->setName('products')->setCluster('weekly');
$index->updateDocument(['enabled'=>0],1);
indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}})
res = wait indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}});
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("enabled",0);
}};
updatedoc.index("products").cluster("weekly").id(1L).setDoc(doc);
indexApi.update(updatedoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("enabled", 0);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", cluster: "weekly", id: 1, doc: doc);
indexApi.Update(updatedoc);
res = await indexApi.update( {cluster: 'test_cluster', index : 'test', id : 1, doc : {name : 'Doc 11'}} );
updateDoc = map[string]interface{} {"name":"Doc 11"}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetCluster("test_cluster")
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
Here is the syntax for the SQL UPDATE
statement:
UPDATE table SET col1 = newval1 [, ...] WHERE where_condition [OPTION opt_name = opt_value [, ...]] [FORCE|IGNORE INDEX(id)]
where_condition
has the same syntax as in the SELECT statement.
Multi-value attribute value sets must be specified as comma-separated lists in parentheses. To remove all values from a multi-value attribute, just assign ()
to it.
UPDATE products SET tags1=(3,6,4) WHERE id=1;
UPDATE products SET tags1=() WHERE id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /update
{
"index":"products",
"_id":1,
"doc":
{
"tags1": []
}
}
{
"_index":"products",
"updated":1
}
$index->updateDocument(['tags1'=>[]],1);
Array(
[_index] => products
[updated] => 1
)
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{"_index":"products","_id":1,"result":"updated"}
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("tags1",new int[]{});
}};
updatedoc.index("products").id(1L).setDoc(doc);
indexApi.update(updatedoc);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("tags1", new List<int> {});
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updatedoc);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: 'test', id: 1, doc: { cat: 10 } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
The OPTION clause is a Manticore-specific extension that lets you control a number of per-update options. The syntax is:
OPTION <optionname>=<value> [ , ... ]
The options are the same as for the SELECT statement. Specifically for the UPDATE statement, you can use options such as strict, which applies to partial JSON attribute updates. By default (strict=1), the UPDATE will result in an error if the query tries to perform an update on non-numeric properties. With strict=0, if multiple properties are updated and some are not allowed, the UPDATE will not result in an error and will perform the changes only on the allowed properties (the rest being ignored). If none of the SET changes of the UPDATE are permitted, the command will result in an error even with strict=0.
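For example, assuming the meta JSON from the earlier examples also contained a string property name (a hypothetical field), the following statement would apply the numeric change and silently skip the string one with strict=0, while with strict=1 it would fail:
UPDATE products SET meta.tags[0]=100, meta.name='Pink bag' WHERE id=1 OPTION strict=0;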
In rare cases, Manticore's built-in query analyzer may misinterpret a query and fail to detect that the lookup by document ID should be used, which can result in poor performance for queries like UPDATE ... WHERE id = 123. For information on how to force the optimizer to use the docid index, see Query optimizer hints.
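Following the UPDATE syntax shown above, forcing the docid index for such a query would look like this (a sketch against the products table):
UPDATE products SET price=19.85 WHERE id=123 FORCE INDEX(id);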
Updates using HTTP JSON protocol are performed via the /update
endpoint. The syntax is similar to the /insert endpoint, but this time the doc
property is mandatory.
The server will respond with a JSON object stating if the operation was successful or not.
POST /update
{
"index":"test",
"id":1,
"doc":
{
"gid" : 100,
"price" : 1000
}
}
{
"_index": "test",
"_id": 1,
"result": "updated"
}
The ID of the document that needs to be updated can be set directly using the id
property, as shown in the previous example, or you can update documents by query and apply the update to all the documents that match the query:
POST /update
{
"index":"test",
"doc":
{
"price" : 1000
},
"query":
{
"match": { "*": "apple" }
}
}
{
"_index":"products",
"updated":1
}
The query syntax is the same as in the /search endpoint. Note that you can't specify id
and query
at the same time.
FLUSH ATTRIBUTES
The FLUSH ATTRIBUTES command flushes all in-memory attribute updates in all the active tables to disk. It returns a tag that identifies the result on-disk state, which represents the number of actual disk attribute saves performed since the server startup.
mysql> UPDATE testindex SET channel_id=1107025 WHERE id=1;
Query OK, 1 row affected (0.04 sec)
mysql> FLUSH ATTRIBUTES;
+------+
| tag |
+------+
| 1 |
+------+
1 row in set (0.19 sec)
See also attr_flush_period setting.
You can perform multiple update operations in a single call using the /bulk
endpoint. This endpoint only works with data that has Content-Type
set to application/x-ndjson
. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n
and, possibly, a \r
.
POST /bulk
{ "update" : { "index" : "products", "id" : 1, "doc": { "price" : 10 } } }
{ "update" : { "index" : "products", "id" : 2, "doc": { "price" : 20 } } }
{
"items":
[
{
"update":
{
"_index":"products",
"_id":1,
"result":"updated"
}
},
{
"update":
{
"_index":"products",
"_id":2,
"result":"updated"
}
}
],
"errors":false
}
The /bulk endpoint supports inserts, replaces, and deletes. Each statement begins with an action type (in this case, update). Here's a list of the supported actions:
insert: Inserts a document. The syntax is the same as in the /insert endpoint.
create: a synonym for insert.
replace: Replaces a document. The syntax is the same as in the /replace endpoint.
index: a synonym for replace.
update: Updates a document. The syntax is the same as in the /update endpoint.
delete: Deletes a document. The syntax is the same as in the /delete endpoint.
Updates by query and deletes by query are also supported.
POST /bulk
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } }
{
"items":
[
{
"update":
{
"_index":"products",
"updated":1
}
},
{
"update":
{
"_index":"products",
"updated":3
}
}
],
"errors":false
}
$client->bulk([
['update'=>[
'index' => 'products',
'doc' => [
'coeff' => 100
],
'query' => [
'range' => ['price'=>['gte'=>1000]]
]
]
],
['update'=>[
'index' => 'products',
'doc' => [
'coeff' => 0
],
'query' => [
'range' => ['price'=>['lt'=>1000]]
]
]
]
]);
Array(
[items] => Array (
Array(
[update] => Array(
[_index] => products
[updated] => 1
)
)
Array(
[update] => Array(
[_index] => products
[updated] => 3
)
)
)
docs = [ \
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }, \
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
'items': [{u'update': {u'_index': u'products', u'updated': 1}},
{u'update': {u'_index': u'products', u'updated': 3}}]}
docs = [
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } },
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"update":{"_index":"products","updated":1}},{"update":{"_index":"products","updated":3}}],"errors":false}
String body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
"{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";
indexApi.bulk(body);
class BulkResponse {
items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
"{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";
indexApi.Bulk(body);
class BulkResponse {
items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
updateDocs = [
{
update: {
index: 'test',
id: 1,
doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
},
},
{
update: {
index: 'test',
id: 2,
doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
},
},
];
res = await indexApi.bulk(
updateDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
"items":
[
{
"update":
{
"_index":"test",
"updated":1
}
},
{
"update":
{
"_index":"test",
"updated":1
}
}
],
"errors":false
}
body := "{\"update\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
"{\"update\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
"items":
[
{
"update":
{
"_index":"test",
"updated":1
}
},
{
"update":
{
"_index":"test",
"updated":1
}
}
],
"errors":false
}
Keep in mind that the bulk operation stops at the first query that results in an error.
attr_update_reserve=size
attr_update_reserve
is a per-table setting that determines the space reserved for blob attribute updates. This setting is optional, with a default value of 128k.
When blob attributes (MVAs, strings, JSON) are updated, their length may change. If the updated string (or MVA, or JSON) is shorter than the old one, it overwrites the old one in the .spb
file. However, if the updated string is longer, updates are written to the end of the .spb
file. This file is memory-mapped, which means resizing it may be a rather slow process, depending on the OS implementation of memory-mapped files.
To avoid frequent resizes, you can specify the extra space to be reserved at the end of the .spb
file using this option.
create table products(title text, price float) attr_update_reserve = '1M'
POST /cli -d "
create table products(title text, price float) attr_update_reserve = '1M'"
$params = [
'body' => [
'settings' => [
'attr_update_reserve' => '1M'
],
'columns' => [
'title'=>['type'=>'text'],
'price'=>['type'=>'float']
]
],
'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'')
res = await utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'');
utilsApi.sql("create table products(title text, price float) attr_update_reserve = '1M'");
utilsApi.Sql("create table products(title text, price float) attr_update_reserve = '1M'");
utilsApi.sql("create table test(content text, name string, cat int) attr_update_reserve = '1M'");
apiClient.UtilsAPI.Sql(context.Background()).Body("create table test(content text, name string, cat int) attr_update_reserve = '1M'").Execute()
table products {
attr_update_reserve = 1M
type = rt
path = tbl
rt_field = title
rt_attr_float = price
}
attr_flush_period = 900 # persist updates to disk every 15 minutes
When updating attributes, the changes are first written to an in-memory copy of the attributes. This setting lets you define the interval between flushing these updates to disk. It defaults to 0, which disables periodic flushing, but flushing will still occur at normal shutdown.
Deleting documents is only supported in RT mode for the following table types: real-time tables, percolate tables, and distributed tables composed of such tables.
You can delete existing documents from a table based on either their ID or certain conditions.
Also, bulk deletion is available to delete multiple documents.
Deletion of documents can be accomplished via both SQL and JSON interfaces.
For SQL, the response for a successful operation will indicate the number of rows deleted.
For JSON, the json/delete
endpoint is used. The server will respond with a JSON object indicating whether the operation was successful and the number of rows deleted.
It is recommended to use table truncation instead of deletion to delete all documents from a table, as it is a much faster operation.
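For instance, to remove every document from a table named test in one go, you could run the following (a minimal sketch; see the table truncation section for details):
TRUNCATE TABLE test;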
In this example we delete all documents that match full-text query test document
from the table named test
:
mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 100 | 1000 | 100,201 | 100 |
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 103 | 1003 | 103,204 | 103 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
| 106 | 1006 | 106,207 | 106 |
| 107 | 1007 | 107,208 | 107 |
+------+------+-------------+------+
8 rows in set (0.00 sec)
mysql> DELETE FROM TEST WHERE MATCH ('test document');
Query OK, 2 rows affected (0.00 sec)
mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 100 | 1000 | 100,201 | 100 |
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 103 | 1003 | 103,204 | 103 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
+------+------+-------------+------+
6 rows in set (0.00 sec)
POST /delete -d '
{
"index":"test",
"query":
{
"match": { "*": "test document" }
}
}'
{
"_index":"test",
"deleted":2,
}
$index->deleteDocuments(new MatchPhrase('test document','*'));
Array(
[_index] => test
[deleted] => 2
)
indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}})
{'deleted': 5, 'id': None, 'index': 'test', 'result': None}
res = await indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}});
{"_index":"test","deleted":5}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
put("*","test document");
}});
deleteRequest.index("test").setQuery(query);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
deleted: 5
id: null
result: null
}
Dictionary<string, Object> match = new Dictionary<string, Object>();
match.Add("*", "test document");
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("match", match);
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", query: query);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
deleted: 5
id: null
result: null
}
res = await indexApi.delete({
index: 'test',
query: { match: { '*': 'test document' } },
});
{"_index":"test","deleted":5}
deleteRequest := manticoreclient.NewDeleteDocumentRequest("test")
matchExpr := map[string]interface{} {"*": "test document"}
deleteQuery := map[string]interface{} {"match": matchExpr }
deleteRequest.SetQuery(deleteQuery)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","deleted":5}
Here we delete a document with id equal to 1 from the table named test:
mysql> DELETE FROM TEST WHERE id=1;
Query OK, 1 rows affected (0.00 sec)
POST /delete -d '
{
"index": "test",
"id": 1
}'
{
"_index": "test",
"_id": 1,
"found": true,
"result": "deleted"
}
$index->deleteDocument(1);
Array(
[_index] => test
[_id] => 1
[found] => true
[result] => deleted
)
indexApi.delete({"index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
res = await indexApi.delete({"index" : "test", "id" : 1});
{"_index":"test","_id":1,"result":"deleted"}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
res = await indexApi.delete({ index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetId(1)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","_id":1,"result":"deleted"}
Here, documents with id matching the listed values are deleted from the table named test:
Note that the delete forms with id=N
or id IN (X,Y)
are the fastest, as they delete documents without performing a search.
Also note that the response contains only the id of the first deleted document in the corresponding _id
field.
DELETE FROM TEST WHERE id IN (1,2);
Query OK, 2 rows affected (0.00 sec)
POST /delete -d '
{
"index":"test",
"id": [1,2]
}'
{
"_index":"test",
"_id":1,
"found":true,
"result":"deleted"
}
$index->deleteDocumentsByIds([1,2]);
Array(
[_index] => test
[_id] => 1
[found] => true
[result] => deleted
)
Manticore SQL allows you to use complex conditions for the DELETE
statement.
For example, here we delete documents that match the full-text query test document and have the attribute mva1 with a value greater than 206, or with mva1 values 100 or 103, from the table named test:
DELETE FROM TEST WHERE MATCH ('test document') AND ( mva1>206 or mva1 in (100, 103) );
SELECT * FROM TEST;
Query OK, 4 rows affected (0.00 sec)
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
+------+------+-------------+------+
4 rows in set (0.00 sec)
Here is an example of deleting documents in cluster cluster
's table test
. Note that we must provide the cluster name property along with the table property to delete a row from a table within a replication cluster:
delete from cluster:test where id=100;
POST /delete -d '
{
"cluster": "cluster",
"index": "test",
"id": 100
}'
$index->setCluster('cluster');
$index->deleteDocument(100);
Array(
[_index] => test
[_id] => 100
[found] => true
[result] => deleted
)
indexApi.delete({"cluster":"cluster","index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
indexApi.delete({"cluster":"cluster_1","index" : "test", "id" : 1})
{"_index":"test","_id":1,"result":"deleted"}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.cluster("cluster").index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", cluster: "cluster", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
res = await indexApi.delete({ cluster: 'cluster_1', index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetCluster("cluster_1")
deleteRequest.SetId(1)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","_id":1,"result":"deleted"}
You can also perform multiple delete operations in a single call using the /bulk
endpoint. This endpoint only works with data that has Content-Type
set to application/x-ndjson
. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n
and, possibly, a \r
.
POST /bulk
{ "delete" : { "index" : "test", "id" : 1 } }
{ "delete" : { "index" : "test", "query": { "equals": { "int_data" : 20 } } } }
{
"items":
[
{
"bulk":
{
"_index":"test",
"_id":0,
"created":0,
"deleted":2,
"updated":0,
"result":"created",
"status":201
}
}
],
"errors":false
}
$client->bulk([
['delete' => [
'index' => 'test',
'id' => 1
]
],
['delete'=>[
'index' => 'test',
'query' => [
'equals' => ['int_data' => 20]
]
]
]
]);
Array(
[items] => Array
(
[0] => Array
(
[bulk] => Array
(
[_index] => test
[_id] => 0
[created] => 0
[deleted] => 2
[updated] => 0
[result] => created
[status] => 201
)
)
)
[current_line] => 3
[skipped_lines] => 0
[errors] =>
[error] =>
)
docs = [ \
{ "delete" : { "index" : "test", "id": 1 } }, \
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{
'error': None,
'items': [{u'delete': {u'_index': u'test', u'deleted': 2}}]
}
docs = [
{ "delete" : { "index" : "test", "id": 1 } },
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
String body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
"{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";
indexApi.bulk(body);
class BulkResponse {
items: [{delete={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
"{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";
indexApi.Bulk(body);
class BulkResponse {
items: [{delete={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
error: null
additionalProperties: {errors=false}
}
docs = [
{ "delete" : { "index" : "test", "id": 1 } },
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }
];
res = await indexApi.bulk(
docs.map((e) => JSON.stringify(e)).join("\n")
);
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
docs := []string {
`{ "delete" : { "index" : "test", "id": 1 } }`,
`{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }`,
}
body := strings.Join(docs, "\n")
resp, httpRes, err := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
Manticore supports basic transactions for deleting and inserting data into real-time and percolate tables, except when attempting to write to a distributed table which includes a real-time or percolate table. Each change to a table is first saved in an internal changeset and then actually committed to the table. By default, each command is wrapped in an individual automatic transaction, making it transparent: you simply 'insert' something and can see the inserted result after it completes, without worrying about transactions. However, this behavior can be explicitly managed by starting and committing transactions manually.
Transactions are supported for the following commands:
Transactions are not supported for:
Please note that transactions in Manticore do not aim to provide isolation. The purpose of transactions in Manticore is to allow you to accumulate multiple writes and execute them all at once upon commit, or to roll them all back if necessary. Transactions are integrated with binary log for durability and consistency.
SET AUTOCOMMIT = {0 | 1}
SET AUTOCOMMIT
controls the autocommit mode in the active session. AUTOCOMMIT
is set to 1 by default. With the default setting, you don't have to worry about transactions, as every statement that makes any changes to any table is implicitly wrapped in a separate transaction. Setting it to 0 allows you to manage transactions manually, meaning they will not be visible until you explicitly commit them.
Transactions are limited to a single real-time or percolate table and are also limited in size. They are atomic, consistent, overly isolated, and durable. Overly isolated means that the changes are not only invisible to concurrent transactions but even to the current session itself.
START TRANSACTION | BEGIN
COMMIT
ROLLBACK
The BEGIN
statement (or its START TRANSACTION
alias) forcibly commits any pending transaction, if present, and starts a new one.
The COMMIT
statement commits the current transaction, making all its changes permanent.
The ROLLBACK
statement rolls back the current transaction, canceling all its changes.
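For instance, a rolled-back insert never becomes visible (a minimal sketch, assuming the same indexrt table as in the examples below):
begin;
insert into indexrt (id, content, title, channel_id, published) values (5, 'aa', 'bb', 1, 1);
rollback;
select * from indexrt where id=5;
Empty set (0.00 sec)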
When using one of the /bulk
JSON endpoints ( bulk insert, bulk replace, bulk delete ), you can force a batch of documents to be committed by adding an empty line after them.
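Here is a sketch of such a request (with hypothetical documents): the empty line after the first two statements forces them to be committed as one batch before the third statement is processed.
POST /bulk
{ "insert" : { "index" : "products", "id" : 21, "doc": { "title" : "doc one" } } }
{ "insert" : { "index" : "products", "id" : 22, "doc": { "title" : "doc two" } } }

{ "insert" : { "index" : "products", "id" : 23, "doc": { "title" : "doc three" } } }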
insert into indexrt (id, content, title, channel_id, published) values (1, 'aa', 'blabla', 1, 10);
Query OK, 1 rows affected (0.00 sec)
select * from indexrt where id=1;
+------+------------+-----------+--------+
| id | channel_id | published | title |
+------+------------+-----------+--------+
| 1 | 1 | 10 | blabla |
+------+------------+-----------+--------+
1 row in set (0.00 sec)
The inserted value is immediately visible in the following 'select' statement.
set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (3, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (4, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
select * from indexrt where id=3;
Empty set (0.01 sec)
select * from indexrt where id=4;
Empty set (0.00 sec)
In this case, changes are NOT automatically committed. As a result, the insertions are not visible, even in the same session, since they have not been committed. Also, despite the absence of a BEGIN
statement, a transaction is implicitly started.
To make the changes visible, you need to commit the transaction:
commit;
Query OK, 0 rows affected (0.00 sec)
select * from indexrt where id=4;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 4 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.00 sec)
select * from indexrt where id=3;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 3 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.00 sec)
After the commit statement, the insertions are visible in the table.
By using BEGIN
and COMMIT
, you can define the bounds of a transaction explicitly, so there's no need to worry about autocommit in this case.
begin;
Query OK, 0 rows affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (2, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
select * from indexrt where id=2;
Empty set (0.01 sec)
commit;
Query OK, 0 rows affected (0.01 sec)
select * from indexrt where id=2;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 2 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.01 sec)