You can add, update, replace, and delete your indexed data in several ways provided by Manticore. Manticore supports working with external storages such as databases, XML, CSV, and TSV documents. A transaction mechanism is supported for insert and delete operations.
Also, for insert and replace queries, Manticore supports an Elasticsearch-like request format along with its own format. For details, see the corresponding examples in the Adding documents to a real-time table and REPLACE sections.
If you're looking for information on adding documents to a plain table, please refer to the section on adding data from external storages.
Adding documents in real time is supported only for real-time and percolate tables. The corresponding SQL command, HTTP endpoint, or client functions insert new rows (documents) into a table with the provided field values. The table doesn't have to exist before you add documents to it: if it doesn't exist, Manticore will attempt to create it automatically. For more information, see Auto schema.
You can insert a single document or multiple documents, with values for all fields of the table or just a portion of them. In the latter case, the remaining fields are filled with their default values (0 for scalar types, an empty string for text types).
Expressions are not currently supported in INSERT, so values must be explicitly specified.
The ID field/value can be omitted, as RT and PQ tables support auto-id functionality. You can also use 0 as the id value to force automatic ID generation. Rows with duplicate IDs will not be overwritten by INSERT. Instead, you can use REPLACE for that purpose.
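For instance, to overwrite an existing document by its ID, a REPLACE statement of the following form can be used (a minimal sketch; see the REPLACE section for the full syntax):
REPLACE INTO products(id, title, price) VALUES (1, 'Crossbody Bag with Tassel', 21.50);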
When using the HTTP JSON protocol, you have two different request formats to choose from: a common Manticore format and an Elasticsearch-like format. Both formats are demonstrated in the examples below.
Additionally, when using the Manticore JSON request format, keep in mind that the doc node is required, and all the values should be provided within it.
INSERT INTO <table name> [(column, ...)]
VALUES (value, ...)
[, (...)]
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "Crossbody Bag with Tassel",
"price" : 19.85
}
}
POST /insert
{
"index":"products",
"id":2,
"doc":
{
"title" : "Crossbody Bag with Tassel"
}
}
POST /insert
{
"index":"products",
"id":0,
"doc":
{
"title" : "Yellow bag"
}
}
{
"_index": "products",
"_id": 1,
"created": true,
"result": "created",
"status": 201
}
{
"_index": "products",
"_id": 2,
"created": true,
"result": "created",
"status": 201
}
{
"_index": "products",
"_id": 0,
"created": true,
"result": "created",
"status": 201
}
POST /products/_create/3
{
"title": "Yellow Bag with Tassel",
"price": 19.85
}
POST /products/_create/
{
"title": "Red Bag with Tassel",
"price": 19.85
}
{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":2235747273424240642,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
$index->addDocuments([
['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
['id' => 0, 'title' => 'Yellow bag']
]);
indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}})
res = await indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}});
InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Crossbody Bag with Tassel");
put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
newdoc = new InsertDocumentRequest(index: "products", id: 2, doc: doc);
sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
sqlresult = indexApi.Insert(newdoc);
Manticore features an automatic table creation mechanism, which activates when a specified table in the insert query doesn't yet exist. This mechanism is enabled by default. To disable it, set auto_schema = 0 in the searchd section of your Manticore config file.
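For example, turning it off would look like this in the config (a minimal sketch of the relevant section):
searchd {
...
auto_schema = 0
}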
By default, all text values in the VALUES clause are considered to be of the text type, except for values representing valid email addresses, which are treated as the string type.
If you attempt to INSERT multiple rows with different, incompatible value types for the same field, auto table creation will be canceled, and an error message will be returned. However, if the different value types are compatible, the resulting field type will be the one that accommodates all the values, as Manticore applies the necessary automatic data type conversions.
Keep in mind that the /bulk HTTP endpoint does not support automatic table creation (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.
MySQL [(none)]> drop table if exists t; insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777)); desc t; select * from t;
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.42 sec)
--------------
insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777))
--------------
Query OK, 1 row affected (0.00 sec)
--------------
desc t
--------------
+-------+--------+----------------+
| Field | Type | Properties |
+-------+--------+----------------+
| id | bigint | |
| t | text | indexed stored |
| s | string | |
| j | json | |
| i | uint | |
| b | bigint | |
| f | float | |
| m | mva | |
| mb | mva64 | |
+-------+--------+----------------+
9 rows in set (0.00 sec)
--------------
select * from t
--------------
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| id | i | b | f | m | mb | t | s | j |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| 5045949922868723723 | 123 | 1099511627776 | 1.200000 | 1,2 | 1099511627776,1099511627777 | text here | test@mail.com | {"a": 123} |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
1 row in set (0.00 sec)
POST /insert -d
{
"index":"t",
"id": 2,
"doc":
{
"i" : 123,
"f" : 1.23,
"t": "text here",
"s": "test@mail.com",
"j": {"a": 123},
"b": 1099511627776,
"m": [1,2],
"mb": [1099511627776,1099511627777]
}
}
{"_index":"t","_id":2,"created":true,"result":"created","status":201}
Manticore provides an auto ID generation functionality for the column ID of documents inserted or replaced into a real-time or Percolate table. The generator produces a unique ID for a document with some guarantees, but it should not be considered an auto-incremented ID.
The generated ID value is guaranteed to be unique under the following conditions:
The auto ID generator creates a 64-bit integer for a document ID and uses the following schema:
This schema ensures that the generated ID is unique among all nodes in the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.
As a result, the first ID produced by the auto ID generator is NOT 1 but a larger number. Additionally, the document stream inserted into a table might have non-sequential ID values if inserts into other tables occur between the calls, since there is a single ID generator in the server, shared among all its tables.
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yellow bag', 4.95);
select * from products;
+---------------------+-----------+---------------------------+
| id | price | title |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 | 4.950000  | Yellow bag                |
+---------------------+-----------+---------------------------+
POST /insert
{
"index":"products",
"id":0,
"doc":
{
"title" : "Yellow bag"
}
}
GET /search
{
"index":"products",
"query":{
"query_string":""
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "1657860156022587406",
"_score": 1,
"_source": {
"price": 0,
"title": "Yellow bag"
}
}
]
}
}
$index->addDocuments([
['id' => 0, 'title' => 'Yellow bag']
]);
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
You can insert not just a single document into a real-time table, but as many as you'd like. It's perfectly fine to insert batches of tens of thousands of documents into a real-time table. However, it's important to keep the following points in mind:
Note that the /bulk HTTP endpoint does not support automatic creation of tables (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.
The /bulk (Manticore mode) endpoint supports Chunked transfer encoding. You can use it to transmit large batches, including batches bigger than max_packet_size (128MB), for example, 1GB at a time.
INSERT INTO <table name>[(column1, column2, ...)] VALUES (value1[, value2, ...])[, (...)]
INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"index":"products", "id":1, "doc": {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"index":"products", "id":2, "doc": {"title":"microfiber sheet set","price" : 19.99}}}
'
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert":{"index":"test1","id":21,"doc":{"int_col":1,"price":1.1,"title":"bulk doc one"}}}
{"insert":{"index":"test1","id":22,"doc":{"int_col":2,"price":2.2,"title":"bulk doc two"}}}
{"insert":{"index":"test1","id":23,"doc":{"int_col":3,"price":3.3,"title":"bulk doc three"}}}
{"insert":{"index":"test2","id":24,"doc":{"int_col":4,"price":4.4,"title":"bulk doc four"}}}
{"insert":{"index":"test2","id":25,"doc":{"int_col":5,"price":5.5,"title":"bulk doc five"}}}
'
{
"items": [
{
"bulk": {
"_index": "products",
"_id": 2,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
}
],
"current_line": 4,
"skipped_lines": 0,
"errors": false,
"error": ""
}
{
"items": [
{
"bulk": {
"_index": "test1",
"_id": 22,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
},
{
"bulk": {
"_index": "test1",
"_id": 23,
"created": 1,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
},
{
"bulk": {
"_index": "test2",
"_id": 25,
"created": 2,
"deleted": 0,
"updated": 0,
"result": "created",
"status": 201
}
}
],
"current_line": 8,
"skipped_lines": 0,
"errors": false,
"error": ""
}
POST /_bulk
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "products" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "_index" : "products" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'
{
"items": [
{
"index": {
"_index": "products",
"_type": "doc",
"_id": "0",
"_version": 1,
"result": "created",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
},
{
"create": {
"_index": "products",
"_type": "doc",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
}
],
"errors": false,
"took": 1
}
$index->addDocuments([
['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);
docs = [ \
{"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}, \
{"insert": {"index" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}, \
{"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
]
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))
let docs = [
{"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
{"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
{"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.bulk(body);
string body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
"{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.Bulk(string.Join("\n", docs));
Multi-value attributes (MVA) are inserted as arrays of numbers.
INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"sizes" : [40, 41, 42, 43]
}
}
POST /products/_create/1
{
"title": "shoes",
"sizes" : [40, 41, 42, 43]
}
POST /products/_doc/
{
"title": "shoes",
"sizes" : [40, 41, 42, 43]
}
$index->addDocument(
['title' => 'shoes', 'sizes' => [40,41,42,43]],
1
);
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
put("sizes",new int[]{40,41,42,43});
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("sizes", new List<Object> {40,41,42,43});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
A JSON value can be inserted either as an escaped string (via SQL or JSON) or as a JSON object (via the JSON interface).
INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
}
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title" : "shoes",
"meta" : "{\"size\": 41, \"color\": \"red\"}"
}
}
POST /products/_create/1
{
"title": "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
POST /products/_doc/
{
"title": "shoes",
"meta" : {
"size": 41,
"color": "red"
}
}
$index->addDocument(
['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
1
);
indexApi = manticoresearch.IndexApi(client)
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})
res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","Yellow bag");
put("meta",
new HashMap<String,Object>(){{
put("size",41);
put("color","red");
}});
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("size", 41);
meta.Add("color", "red");
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
In a percolate table, the stored documents are percolate query rules and must follow the exact schema of four fields:
field | type | description |
---|---|---|
id | bigint | PQ rule identifier (if omitted, it will be assigned automatically) |
query | string | Full-text query (can be empty) compatible with the percolate table |
filters | string | Additional filters by non-full-text fields (can be empty) compatible with the percolate table |
tags | string | A string with one or many comma-separated tags, which may be used to selectively show/delete saved queries |
Any other field names are not supported and will trigger an error.
Warning: Inserting/replacing JSON-formatted PQ rules via SQL will not work. In other words, the JSON-specific operators (match, etc.) will be considered just parts of the rule's text that should match with documents. If you prefer JSON syntax, use the HTTP endpoint instead of INSERT/REPLACE.
INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;
+------+--------------+---------------+---------+
| id | query | tags | filters |
+------+--------------+---------------+---------+
| 1 | @title shoes | | price>5 |
| 2 | @title bag | Louis Vuitton | |
+------+--------------+---------------+---------+
PUT /pq/pq_table/doc/1
{
"query": {
"match": {
"title": "shoes"
},
"range": {
"price": {
"gt": 5
}
}
},
"tags": ["Loius Vuitton"]
}
PUT /pq/pq_table/doc/2
{
"query": {
"ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}
$newstoredquery = [
'index' => 'test_pq',
'body' => [
'query' => [
'match' => [
'title' => 'shoes'
]
],
'range' => [
'price' => [
'gt' => 5
]
]
],
'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(newstoredquery)
newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
indexApi.insert(newstoredquery);
newstoredquery = new HashMap<String,Object>(){{
put("query",new HashMap<String,Object >(){{
put("ql","@title shoes");
}});
put("filters","price>5");
put("tags",new String[] {"Louis Vuitton"});
}};
newdoc.index("test_pq").id(2L).setDoc(newstoredquery);
indexApi.insert(newdoc);
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", id: 2, doc: newstoredquery);
indexApi.Insert(newdoc);
If you don't specify an ID, it will be assigned automatically. You can read more about auto-ID here.
INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;
+---------------------+-----------+------+---------+
| id | query | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband | | price>5 |
+---------------------+-----------+------+---------+
PUT /pq/pq_table/doc
{
"query": {
"match": {
"title": "shoes"
},
"range": {
"price": {
"gt": 5
}
}
},
"tags": ["Loius Vuitton"]
}
PUT /pq/pq_table/doc
{
"query": {
"ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}
{
"index": "pq_table",
"type": "doc",
"_id": "1657843905795719196",
"result": "created"
}
{
"index": "pq_table",
"type": "doc",
"_id": "1657843905795719198",
"result": "created"
}
$newstoredquery = [
'index' => 'pq_table',
'body' => [
'query' => [
'match' => [
'title' => 'shoes'
]
],
'range' => [
'price' => [
'gt' => 5
]
]
],
'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);
Array(
[index] => pq_table
[type] => doc
[_id] => 1657843905795719198
[result] => created
)
indexApi = manticoresearch.IndexApi(client)
newstoredquery = {"index" : "test_pq", "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)
{'created': True,
'found': None,
'id': 1657843905795719198,
'index': 'test_pq',
'result': 'created'}
newstoredquery ={"index" : "test_pq", "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
res = await indexApi.insert(store_query);
{"_index":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}
newstoredquery = new HashMap<String,Object>(){{
put("query",new HashMap<String,Object >(){{
put("ql","@title shoes");
}});
put("filters","price>5");
put("tags",new String[] {"Louis Vuitton"});
}};
newdoc.index("test_pq").setDoc(newstoredquery);
indexApi.insert(newdoc);
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", doc: newstoredquery);
indexApi.Insert(newdoc);
In case of an omitted schema in the SQL INSERT command, the following parameters are expected:
1. ID. You can use 0 as the ID to trigger auto-ID generation.
2. Query - Full-text query.
3. Tags - PQ rule tags string.
4. Filters - Additional filters by attributes.
INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;
+---------------------+--------------+---------------+---------+
| id | query | tags | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes | | |
| 2810855531667783689 | @title shoes | Louis Vuitton | |
+---------------------+--------------+---------------+---------+
To replace an existing PQ rule with a new one in SQL, just use a regular REPLACE command. There's a special syntax ?refresh=1 to replace a PQ rule defined in JSON mode via the HTTP JSON interface.
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id | query | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes | | |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id | query | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots | | |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
GET /pq/pq/doc/2810823411335430149
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "2810823411335430149",
"_score": 1,
"_source": {
"query": {
"match": {
"title": "shoes"
}
},
"tags": "",
"filters": ""
}
}
]
}
}
PUT /pq/pq/doc/2810823411335430149?refresh=1 -d '{
"query": {
"match": {
"title": "boots"
}
}
}'
GET /pq/pq/doc/2810823411335430149
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{
"_id": "2810823411335430149",
"_score": 1,
"_source": {
"query": {
"match": {
"title": "boots"
}
},
"tags": "",
"filters": ""
}
}
]
}
}
Plain tables are tables that are created once, by fetching data from one or several sources at creation time. A plain table is immutable, as documents cannot be added or deleted during its lifespan. It is only possible to update values of numeric attributes (including MVA). Refreshing the data is only possible by recreating the whole table.
Plain tables are available only in the Plain mode, and their definition is made up of a table declaration and one or several source declarations. The data gathering and table creation are performed not by the searchd server, but by the auxiliary tool indexer.
Indexer is a command-line tool that can be called directly from the command line or from shell scripts.
It can accept a number of arguments when called, but there are also several settings of its own in the Manticore configuration file.
In the typical scenario, indexer does the following:
The indexer tool is used to create plain tables in Manticore Search. It has the following general syntax:
indexer [OPTIONS] [table_name1 [table_name2 [...]]]
When creating tables with indexer, the generated table files must be made with permissions that allow searchd to read, write, and delete them. In the case of the official Linux packages, searchd runs under the manticore user. Therefore, indexer must also run under the manticore user:
sudo -u manticore indexer ...
If you are running searchd differently, you might need to omit sudo -u manticore. Just make sure that the user under which your searchd instance is running has read/write permissions to the tables generated using indexer.
To create a plain table, you need to list the table(s) you want to process. For example, if your manticore.conf file contains details on two tables, mybigindex and mysmallindex, you could run:
sudo -u manticore indexer mysmallindex mybigindex
You can also use wildcard tokens to match table names:
- ? matches any single character
- * matches any count of any characters
- % matches none or any single character
sudo -u manticore indexer indexpart*main --rotate
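As an illustration, assuming hypothetical tables named part1_main, part2_main, and part_delta are defined in the config, the wildcards would behave roughly like this:
sudo -u manticore indexer part?_main --rotate    # matches part1_main and part2_main
sudo -u manticore indexer part* --rotate         # matches all three tables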
The exit codes for indexer are as follows:
- 0: everything went fine
- 1: there was a problem while indexing (and if --rotate was specified, it was skipped) or an operation emitted a warning
- 2: indexing went fine, but the --rotate attempt failed
Also, you can run indexer via a systemctl unit file:
systemctl start --no-block manticore-indexer
Or, in case you want to build a specific table:
systemctl start --no-block manticore-indexer@specific-table-name
Find more information about scheduling indexer via systemd below.
--config <file> (-c <file> for short) tells indexer to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. /etc/manticoresearch/manticore.conf), followed by the current directory you are in when calling indexer from the shell. This is most useful in shared environments where the binary files are installed in a global folder, e.g. /usr/bin/, but you want to provide users with the ability to make their own custom Manticore set-ups, or if you want to run multiple instances on a single server. In cases like those, you could allow them to create their own manticore.conf files and pass them to indexer with this option. For example:
sudo -u manticore indexer --config /home/myuser/manticore.conf mytable
--all tells indexer to update every table listed in manticore.conf instead of listing individual tables. This would be useful in small configurations or cron-kind or maintenance jobs where the entire table set will get rebuilt each day, week, or whatever period is best. Please note that since --all tries to update all the tables found in the configuration, it will issue a warning if it encounters real-time tables, and the exit code of the command will be 1, not 0, even if the plain tables finished without issue. Example usage:
sudo -u manticore indexer --config /home/myuser/manticore.conf --all
--rotate is used for rotating tables. Unless you have the situation where you can take the search function offline without troubling users, you will almost certainly need to keep search running while indexing new documents. --rotate creates a second table, parallel to the first (in the same place, simply including .new in the filenames). Once complete, indexer notifies searchd by sending the SIGHUP signal, and searchd will attempt to rename the tables (renaming the existing ones to include .old and renaming the .new ones to replace them), and then will start serving from the newer files. Depending on the setting of seamless_rotate, there may be a slight delay in being able to search the newer tables. In case multiple tables chained by killlist_target relations are rotated at once, rotation will start with the tables that are not targets and finish with the ones at the end of the target chain. Example usage:
sudo -u manticore indexer --rotate --all
--quiet tells indexer not to output anything, unless there is an error. This is mostly used for cron-type or other scripted jobs where the output is irrelevant or unnecessary, except in the event of some kind of error. Example usage:
sudo -u manticore indexer --rotate --all --quiet
--noprogress does not display progress details as they occur. Instead, the final status details (such as documents indexed, speed of indexing, and so on) are only reported at the completion of indexing. In instances where the script is not being run on a console (or 'tty'), this will be on by default. Example usage:
sudo -u manticore indexer --rotate --all --noprogress
--buildstops <outputfile.txt> <N> reviews the table source, as if it were indexing the data, and produces a list of the terms that are being indexed. In other words, it produces a list of all the searchable terms that are becoming part of the table. Note, it does not update the table in question; it simply processes the data as if it were indexing, including running queries defined with sql_query_pre or sql_query_post. outputfile.txt will contain the list of words, one per line, sorted by frequency with the most frequent first, and N specifies the maximum number of words that will be listed; if it's sufficiently large to encompass every word in the table, all of them will be returned. Such a dictionary list could be used for client application features around "Did you mean…" functionality, usually in conjunction with --buildfreqs, below. Example:
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000
This would produce a document in the current directory, word_freq.txt, with the 1,000 most common words in 'mytable', ordered by most common first. Note that the file will pertain to the last table indexed when specified with multiple tables or --all (i.e. the last one listed in the configuration file).
--buildfreqs works with --buildstops (and is ignored if --buildstops is not specified). As --buildstops provides the list of words used within the table, --buildfreqs adds the quantity present in the table, which would be useful in establishing whether certain words should be considered stopwords if they are too prevalent. It will also help with developing "Did you mean…" features where you need to know how much more common a given word is compared to another, similar one. For example:
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000 --buildfreqs
This would produce the word_freq.txt as above, however after each word would be the number of times it occurred in the table in question.
--merge <dst-table> <src-table> is used for physically merging tables together, for example, if you have a main+delta scheme, where the main table rarely changes, but the delta table is rebuilt frequently, and --merge would be used to combine the two. The operation moves from right to left: the contents of src-table get examined and physically combined with the contents of dst-table, and the result is left in dst-table. In pseudo-code, it might be expressed as: dst-table += src-table. An example:
sudo -u manticore indexer --merge main delta --rotate
In the above example, where main is the master, rarely modified table, and delta is the more frequently modified one, you might use the above to call indexer to combine the contents of the delta into the main table and rotate the tables.
--merge-dst-range <attr> <min> <max> applies the given range filter upon merging. Specifically, as the merge is applied to the destination table (as part of --merge, and is ignored if --merge is not specified), indexer will also filter the documents ending up in the destination table, and only documents that pass the given filter will end up in the final table. This could be used, for example, in a table where there is a 'deleted' attribute, where 0 means 'not deleted'. Such a table could be merged with:
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
Any documents marked as deleted (value 1) will be removed from the newly-merged destination table. This option can be added several times to the command line, to add successive filters to the merge, all of which must be met in order for a document to become part of the final table.
--merge-killlists (and its shorter alias --merge-klists) changes the way kill lists are processed when merging tables. By default, both kill lists get discarded after a merge. That supports the most typical main+delta merge scenario. With this option enabled, however, kill lists from both tables get concatenated and stored in the destination table. Note that a source (delta) table kill list will be used to suppress rows from a destination (main) table at all times.
--keep-attrs allows reusing existing attributes on reindexing. Whenever the table is rebuilt, each new document id is checked for presence in the "old" table, and if it already exists, its attributes are transferred to the "new" table; if not found, attributes from the new table are used. If the user has updated attributes in the table, but not in the actual source used for the table, all updates will be lost when reindexing; using --keep-attrs enables saving the updated attribute values from the previous table. It is possible to specify a path for table files to be used instead of the reference path from the config:
sudo -u manticore indexer mytable --keep-attrs=/path/to/index/files
--keep-attrs-names=<attributes list> allows you to specify attributes to reuse from an existing table on reindexing. By default, all attributes from the existing table are reused in the new table:
sudo -u manticore indexer mytable --keep-attrs=/path/to/table/files --keep-attrs-names=update,state
--dump-rows <FILE> dumps rows fetched by SQL source(s) into the specified file, in a MySQL compatible syntax. The resulting dumps are the exact representation of data as received by indexer and can help repeat indexing-time issues. The command performs fetching from the source and creates both the table files and the dump file.
--print-rt <rt_index> <table> outputs fetched data from the source as INSERTs for a real-time table. The first lines of the dump will contain the real-time fields and attributes (as a reflection of the plain table fields and attributes). The command performs fetching from the source and creates both the table files and the dump output. The command can be used as sudo -u manticore indexer -c manticore.conf --print-rt indexrt indexplain > dump.sql. Only SQL-based sources are supported. MVAs are not supported.
--sighup-each is useful when you are rebuilding many big tables and want each one rotated into searchd as soon as possible. With --sighup-each, indexer will send the SIGHUP signal to searchd after successfully completing work on each table. (The default behavior is to send a single SIGHUP after all the tables are built.)
--nohup is useful when you want to check your table with indextool before actually rotating it. indexer won't send the SIGHUP if this option is on. Table files are renamed to .tmp. Use indextool to rename the table files to .new and rotate them. Example usage:
sudo -u manticore indexer --rotate --nohup mytable
sudo -u manticore indextool --rotate --check mytable
--print-queries prints out the SQL queries that indexer sends to the database, along with SQL connection and disconnection events. This is useful for diagnosing and fixing problems with SQL sources.
--help (-h for short) lists all the parameters that can be called in indexer.
-v shows the indexer version.
You can also configure indexer behavior in the Manticore configuration file, in the indexer section:
indexer {
...
}
lemmatizer_cache = 256M
Lemmatizer cache size. Optional, default is 256K.
Our lemmatizer implementation uses a compressed dictionary format that enables a space/speed tradeoff. It can either perform lemmatization off the compressed data, using more CPU but less RAM, or it can decompress and precache the dictionary either partially or fully, thus using less CPU but more RAM. The lemmatizer_cache directive lets you control how much RAM exactly can be spent for that uncompressed dictionary cache.
Currently, the only available dictionaries are ru.pak, en.pak, and de.pak. These are the Russian, English, and German dictionaries. The compressed dictionary is approximately 2 to 10 MB in size. Note that the dictionary stays in memory at all times, too. The default cache size is 256 KB. The accepted cache sizes are 0 to 2047 MB. It's safe to raise the cache size too high; the lemmatizer will only use the needed memory. For example, the entire Russian dictionary decompresses to approximately 110 MB; thus, setting lemmatizer_cache higher than that will not affect the memory use. Even when 1024 MB is allowed for the cache, if only 110 MB is needed, it will only use those 110 MB.
max_file_field_buffer = 128M
Maximum file field adaptive buffer size in bytes. Optional, default is 8MB, minimum is 1MB.
The file field buffer is used to load files referred to from sql_file_field columns. This buffer is adaptive, starting at 1 MB at first allocation, and growing in 2x steps until either the file contents can be loaded or the maximum buffer size, specified by the max_file_field_buffer directive, is reached.
Thus, if no file fields are specified, no buffer is allocated at all. If all files loaded during indexing are under (for example) 2 MB in size, but the max_file_field_buffer value is 128 MB, the peak buffer usage would still be only 2 MB. However, files over 128 MB would be entirely skipped.
max_iops = 40
Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).
I/O throttling related option. It limits the maximum count of I/O operations (reads or writes) per any given second. A value of 0 means that no limit is imposed.
indexer can cause bursts of intensive disk I/O while building a table, and it might be desirable to limit its disk activity (and reserve something for other programs running on the same machine, such as searchd). I/O throttling helps to do that. It works by enforcing a minimum guaranteed delay between subsequent disk I/O operations performed by indexer. Throttling I/O can help reduce search performance degradation caused by building. This setting is not effective for other kinds of data ingestion, e.g. inserting data into a real-time table.
max_iosize = 1048576
Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).
I/O throttling related option. It limits the maximum file I/O operation (read or write) size for all operations performed by indexer. A value of 0 means that no limit is imposed. Reads or writes that are bigger than the limit will be split into several smaller operations, and counted as several operations by the max_iops setting. At the time of this writing, all I/O calls should be under 256 KB (the default internal buffer size) anyway, so max_iosize values higher than 256 KB should not have any effect.
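Putting the two throttling directives together, a conservative setup might look like this (the values are illustrative, not recommendations):
indexer {
max_iops = 40
max_iosize = 1048576
}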
max_xmlpipe2_field = 8M
Maximum allowed field size for XMLpipe2 source type, in bytes. Optional, default is 2 MB.
mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes
Plain table building RAM usage limit. Optional, default is 128 MB. Enforced memory usage limit that the indexer will not go above. Can be specified in bytes, kilobytes (using the K postfix), or megabytes (using the M postfix); see the example. This limit will be automatically raised if set to an extremely low value causing I/O buffers to be less than 8 KB; the exact lower bound for that depends on the built data size. If the buffers are less than 256 KB, a warning will be produced.
The maximum possible limit is 2047M. Too low values can hurt plain table building speed, but 256M to 1024M should be enough for most, if not all, datasets. Setting this value too high can cause SQL server timeouts. During the document collection phase, there will be periods when the memory buffer is partially sorted and no communication with the database is performed, and the database server can time out. You can resolve that either by raising timeouts on the SQL server side or by lowering mem_limit.
on_file_field_error = skip_document
How to handle IO errors in file fields. Optional, default is ignore_field.
When there is a problem indexing a file referenced by a file field (sql_file_field), indexer can either process the document, assuming empty content in this particular field, skip the document, or fail indexing entirely. The on_file_field_error directive controls that behavior. The values it takes are:
- ignore_field: process the current document without the field;
- skip_document: skip the current document but continue indexing;
- fail_index: fail indexing with an error message.
The problems that can arise are: open error, size error (file too big), and data read error. Warning messages on any problem will be given at all times, regardless of the phase and the on_file_field_error setting.
Note that with on_file_field_error = skip_document, documents will only be ignored if problems are detected during the early check phase, and not during the actual file parsing phase. indexer will open every referenced file and check its size before doing any work, and then open it again when doing the actual parsing work. So in case a file goes away between these two open attempts, the document will still be indexed.
write_buffer = 4M
Write buffer size, bytes. Optional, default is 1MB. Write buffers are used to write both temporary and final table files when indexing. Larger buffers reduce the number of required disk writes. Memory for the buffers is allocated in addition to mem_limit. Note that several (currently up to 4) buffers for different files will be allocated, proportionally increasing the RAM usage.
ignore_non_plain = 1
ignore_non_plain allows you to completely ignore warnings about skipping non-plain tables. The default is 0 (not ignoring).
There are two approaches to scheduling indexer runs. The first way is the classical method of using crontab. The second way is using a systemd timer with a user-defined schedule. To create the timer unit files, you should place them in the appropriate directory where systemd looks for such unit files. On most Linux distributions, this directory is typically /etc/systemd/system. Here's how to do it:
cat << EOF > /etc/systemd/system/manticore-indexer@.timer
[Unit]
Description=Run Manticore Search's indexer on schedule
[Timer]
OnCalendar=minutely
RandomizedDelaySec=5m
Unit=manticore-indexer@%i.service
[Install]
WantedBy=timers.target
EOF
More on the OnCalendar syntax and examples can be found here.
systemctl enable manticore-indexer@idx1.timer
systemctl start manticore-indexer@idx1.timer
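For instance, to rebuild a hypothetical table idx1 once a night instead of every minute, you could adjust the OnCalendar setting in the timer above (a sketch; pick a schedule that suits your data volume):
# run once a day at 02:00 local time
OnCalendar=*-*-* 02:00:00
# or simply once a day at midnight
OnCalendar=daily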
Manticore Search allows fetching data from databases using specialized drivers or ODBC. Current drivers include:
- mysql - for MySQL/MariaDB/Percona MySQL databases
- pgsql - for PostgreSQL databases
- mssql - for Microsoft SQL databases
- odbc - for any database that accepts connections using ODBC
To fetch data from the database, a source must be configured with type set to one of the above. The source requires information about how to connect to the database and the query that will be used to fetch the data. Additional pre- and post-queries can also be set - either to configure session settings or to perform pre/post fetch tasks. The source also must contain definitions of data types for the columns that are fetched.
The source definition must contain the settings of the connection; this includes the host, port, user credentials, and any driver-specific settings.
The database server host to connect to. Note that the MySQL client library chooses whether to connect over TCP/IP or over UNIX socket based on the host name. Specifically, "localhost" will force it to use UNIX socket (this is the default and generally recommended mode) and "127.0.0.1" will force TCP/IP usage.
The server IP port to connect to.
For mysql the default is 3306, and for pgsql it is 5432.
The SQL database to use after the connection is established and perform further queries within.
The username used for connecting.
The user password to use when connecting. If the password includes # (which can be used to add comments in the configuration file), you can escape it with \.
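For example, a password containing # could be written like this (hypothetical value):
sql_pass = my\#secret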
UNIX socket name to connect to for local database servers. Note that whether this value will actually be used depends on the sql_host setting.
sql_sock = /var/lib/mysql/mysql.sock
MySQL client connection flags. Optional, the default value is 0 (do not set any flags).
This option must contain an integer value with the sum of the flags. The value will be passed to mysql_real_connect() verbatim. The flags are enumerated in mysql_com.h include file. Flags that are especially interesting in regard to indexing, with their respective values, are as follows:
mysql_connect_flags = 32 # enable compression
- mysql_ssl_cert - path to the SSL certificate
- mysql_ssl_key - path to the SSL key file
- mysql_ssl_ca - path to the CA certificate
unpack_mysqlcompress_maxsize = 1M
Columns to unpack using the MySQL UNCOMPRESS() algorithm. Multi-value, optional, default value is an empty list of columns.
Columns specified using this directive will be unpacked by indexer using the modified zlib algorithm used by the MySQL COMPRESS() and UNCOMPRESS() functions. When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.
unpack_mysqlcompress = body_compressed
unpack_mysqlcompress = description_compressed
By default, a buffer of 16M is used for uncompressing the data. This can be changed by setting unpack_mysqlcompress_maxsize.
When using unpack_mysqlcompress, due to implementation intricacies, it is not possible to deduce the required buffer size from the compressed data. So, the buffer must be preallocated in advance, and the unpacked data cannot go over the buffer size.
unpack_zlib = col1
unpack_zlib = col2
Columns to unpack using zlib (aka deflate, aka gunzip). Multi-value, optional, default value is an empty list of columns. Applies to source types mysql and pgsql only.
Columns specified using this directive will be unpacked by indexer using the standard zlib algorithm (called deflate and also implemented by gunzip). When indexing on a different box than the database, this lets you offload the database and save on network traffic. This feature is only available if zlib and zlib-devel were both available during build time.
MS SQL Windows authentication flag. Whether to use currently logged-in Windows account credentials for authentication when connecting to MS SQL Server.
mssql_winauth = 1
Sources using ODBC require the presence of a DSN (Data Source Name) string, which can be set with odbc_dsn.
odbc_dsn = Driver={Oracle ODBC Driver};Dbq=myDBName;Uid=myUsername;Pwd=myPassword
Please note that the format depends on the specific ODBC driver used.
With all the SQL drivers, building a plain table generally works as follows.
- sql_query_pre_all queries are executed to perform any necessary initial setup, such as setting per-connection encoding with MySQL. These queries run before the entire indexing process, and also after a reconnect for indexing MVA attributes and joined fields.
- sql_query_pre pre-queries are executed to perform any necessary initial setup, such as setting up temporary tables or maintaining counter tables. These queries run once for the entire indexing process.
- The main query sql_query is executed, and the rows it returns are processed.
- sql_query_post is executed to perform some necessary cleanup.
- sql_query_post_index is executed to perform some necessary final cleanup.
Example of a source fetching data from MySQL:
source mysource {
type = mysql
sql_host = localhost
sql_user = myuser
sql_pass = mypass
sql_db = mydb
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, title, description, category_id FROM mytable
sql_query_post = DROP TABLE view_table
sql_query_post_index = REPLACE INTO counters ( id, val ) \
VALUES ( 'max_indexed_id', $maxid )
sql_attr_uint = category_id
sql_field_string = title
}
table mytable {
type = plain
source = mysource
path = /path/to/mytable
...
}
This is the query used to retrieve documents from a SQL server. There can be only one sql_query declared, and it's mandatory to have one. See also Processing fetched data
Pre-fetch query or pre-query. This is a multi-value, optional setting, with the default being an empty list of queries. The pre-queries are executed before the sql_query in the order they appear in the configuration file. The results of the pre-queries are ignored.
Pre-queries are useful in many ways. They can be used to set up encoding, mark records that are going to be indexed, update internal counters, set various per-connection SQL server options and variables, and so on.
Perhaps the most frequent use of pre-query is to specify the encoding that the server will use for the rows it returns. Note that Manticore accepts only UTF-8 text. Two MySQL specific examples of setting the encoding are:
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
Also, specific to MySQL sources, it is useful to disable query cache (for indexer connection only) in pre-query, because indexing queries are not going to be re-run frequently anyway, and there's no sense in caching their results.
That could be achieved with:
sql_query_pre = SET SESSION query_cache_type=OFF
Post-fetch query. This is an optional setting, with the default value being empty.
This query is executed immediately after sql_query completes successfully. When the post-fetch query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. Note that indexing is not yet completed at the point when this query gets executed, and further indexing may still fail. Therefore, any permanent updates should not be done from here. For instance, updates on a helper table that permanently change the last successfully indexed ID should not be run from the sql_query_post query; they should be run from the sql_query_post_index query instead.
Post-processing query. This is an optional setting, with the default value being empty.
This query is executed when indexing is fully and successfully completed. If this query produces errors, they are reported as warnings, but indexing is not terminated. Its result set is ignored. The $maxid macro can be used in its text; it will be expanded to the maximum document ID that was actually fetched from the database during indexing. If no documents were indexed, $maxid will be expanded to 0.
Example:
sql_query_post_index = REPLACE INTO counters ( id, val ) \
VALUES ( 'max_indexed_id', $maxid )
The difference between sql_query_post and sql_query_post_index is that sql_query_post is run immediately when Manticore receives all the documents, but further indexing may still fail for some other reason. On the contrary, by the time the sql_query_post_index query gets executed, it is guaranteed that the table was created successfully. The database connection is dropped and re-established because the sorting phase can be very lengthy and would just time out otherwise.
By default, the first column from the result set of sql_query is indexed as the document id.
Document ID MUST be the very first field, and it MUST BE UNIQUE SIGNED (NON-ZERO) INTEGER NUMBER from -9223372036854775808 to 9223372036854775807.
You can specify up to 256 full-text fields and an arbitrary amount of attributes. All the columns that are neither document ID (the first one) nor attributes will be indexed as full-text fields.
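A minimal sketch of how this maps to a source definition (the column names are hypothetical): the first column is the document id, category_id is declared as an attribute, and the remaining columns become full-text fields:
sql_query = SELECT id, title, content, category_id FROM documents
sql_attr_uint = category_id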
Declares a 64-bit signed integer.
Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.
Declares a floating point attribute.
The values will be stored in single precision, 32-bit IEEE 754 format. Represented range is approximately from 1e-38 to 1e+38. The amount of decimal digits that can be stored precisely is approximately 7.
One important usage of float attributes is storing latitude and longitude values (in radians), for further usage in query-time geosphere distance calculations.
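For example, latitude and longitude columns (hypothetical names, with the values already converted to radians) would be declared like this:
sql_attr_float = lat_radians
sql_attr_float = lon_radians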
Declares a JSON attribute.
When indexing JSON attributes, Manticore expects a text field with JSON formatted data. JSON attributes support arbitrary JSON data with no limitation in nested levels or types.
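A minimal declaration sketch, assuming the source query returns a column named metadata containing JSON text:
sql_attr_json = metadata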
Declares a multi-value attribute.
Plain attributes only allow attaching 1 value per each document. However, there are cases (such as tags or categories) when it is desired to attach multiple values of the same attribute and be able to apply filtering or grouping to value lists.
The MVA can take the values from a column (like the rest of the data types) - in this case, the column in the result set must provide a string with multiple integer values separated by commas - or by running a separate query to get the values.
When executing the query, the engine runs it, groups the results by document IDs, and assigns the values to their corresponding documents in the table. Values with an ID not found in the table are discarded. Before executing the query, any defined sql_query_pre_all will be run.
The declaration format for sql_attr_multi is as follows:
sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE \
[;QUERY] \
[;RANGED-QUERY]
where
- ATTR-TYPE is one of uint, bigint, or timestamp
- SOURCE-TYPE is one of field, query, ranged-query, or ranged-main-query
- QUERY is the SQL query used to fetch all (docid, attrvalue) pairs
- RANGED-QUERY is the optional SQL query used to fetch the min and max ID values, similar to sql_query_range
The RANGED-QUERY is used with the ranged-query SOURCE-TYPE. If using the ranged-main-query SOURCE-TYPE, omit the RANGED-QUERY, and it will automatically use the same query from sql_query_range (a useful option in complex inheritance setups to save having to manually duplicate the same query many times).
sql_attr_multi = uint tag from field
sql_attr_multi = uint tag from query; SELECT id, tag FROM tags
sql_attr_multi = bigint tag from ranged-query; \
SELECT id, tag FROM tags WHERE id>=$start AND id<=$end; \
SELECT MIN(id), MAX(id) FROM tags
Declares a string attribute. The maximum size of each value is fixed at 4GB.
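For example (the attribute name is illustrative):
sql_attr_string = author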
Declares a UNIX timestamp.
Timestamps can store dates and times in the range of January 01, 1970, to January 19, 2038, with a precision of one second. The expected column value should be a timestamp in UNIX format, which is a 32-bit unsigned integer number of seconds elapsed since midnight on January 01, 1970, GMT. Timestamps are internally stored and handled as integers everywhere. In addition to working with timestamps as integers, you can also use them with different date-based functions, such as time segments sorting mode or day/week/month/year extraction for GROUP BY.
Note that DATE or DATETIME column types in MySQL cannot be directly used as timestamp attributes in Manticore; you need to explicitly convert such columns using UNIX_TIMESTAMP function (if the data is in range).
Note that timestamps cannot represent dates before January 01, 1970, and UNIX_TIMESTAMP() in MySQL will not return anything expected for them. If you only need to work with dates, not times, consider using the TO_DAYS() function in MySQL instead.
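For example, a sketch of converting a MySQL DATETIME column into a timestamp attribute (table and column names are illustrative):
sql_query = SELECT id, title, UNIX_TIMESTAMP(added) AS added_ts FROM my_documents
sql_attr_timestamp = added_ts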
Declares an unsigned integer attribute.
You can specify the bit count for integer attributes by appending ':BITCOUNT' to the attribute name (see the example below). Attributes smaller than the default 32-bit size (bitfields) perform slower.
sql_attr_uint = group_id
sql_attr_uint = forum_id:9 # 9 bits for forum_id
Declares a combo string attribute/text field. The values will be indexed as a full-text field, but also stored in a string attribute with the same name. Note that it should only be used when you are sure you want the field to be searchable both in a full-text manner and as an attribute (with the ability to sort and group by it). If you just want to be able to fetch the original value of the field, you don't need to do anything for it unless you explicitly removed the field from the stored fields list via stored_fields.
sql_field_string = name
Declares a file based field.
This directive makes the indexer interpret the field contents as a file name, and load and process the referred file. Files larger than max_file_field_buffer are skipped. Any errors during file loading (I/O errors, exceeded limits, etc.) will be reported as indexing warnings and will not terminate indexing early. No content will be indexed for such files.
sql_file_field = field_name
Joined/payload field fetch query. Multi-value, optional, the default is an empty list of queries.
sql_joined_field
lets you use two different features: joined fields and payloads (payload fields). Its syntax is as follows:
sql_joined_field = FIELD-NAME 'from' ( 'query' | 'payload-query' | 'ranged-query' | 'ranged-main-query' ); \
QUERY [ ; RANGE-QUERY ]
where
- FIELD-NAME is the name of the joined/payload field
- QUERY is an SQL query that fetches the values to index
- RANGE-QUERY is an optional ranged query, similar to sql_query_range
Joined fields let you avoid JOIN and/or GROUP_CONCAT statements in the main document fetch query (sql_query). This can be useful when the SQL-side JOIN is slow, or needs to be offloaded on the Manticore side, or simply to emulate MySQL-specific GROUP_CONCAT
functionality in case your database server does not support it.
The query must return exactly 2 columns: document ID, and text to append to a joined field. Document IDs can be duplicate, but they must be in ascending order. All the text rows fetched for a given ID will be concatenated together, and the concatenation result will be indexed as the entire contents of a joined field. Rows will be concatenated in the order returned from the query, and separating whitespace will be inserted between them. For instance, if the joined field query returns the following rows:
( 1, 'red' )
( 1, 'right' )
( 1, 'hand' )
( 2, 'mysql' )
( 2, 'manticore' )
then the indexing results would be equivalent to adding a new text field with a value of 'red right hand' to document 1 and 'mysql manticore' to document 2, including the keyword positions inside the field in the order they come from the query. If the rows need to be in a specific order, that needs to be explicitly defined in the query.
Joined fields are only indexed differently. There are no other differences between joined fields and regular text fields.
Before executing the joined fields query, any set of sql_query_pre_all
will be run, if any exist. This allows you to set the desired encoding, etc., within the joined fields' context.
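For instance, a simple joined field declaration might look like this (table and column names are illustrative):
sql_joined_field = tags from query; SELECT docid, tagname FROM document_tags ORDER BY docid ASC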
When a single query is not efficient enough or does not work because of the database driver limitations, ranged queries can be used. It works similarly to the ranged queries in the main indexing loop. The range will be queried for and fetched upfront once, then multiple queries with different $start
and $end
substitutions will be run to fetch the actual data.
When using ranged-main-query
query, omit the ranged-query
, and it will automatically use the same query from sql_query_range
(a useful option in complex inheritance setups to save having to manually duplicate the same query many times).
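A ranged variant of the same idea could look like this sketch (again, the table and column names are illustrative):
sql_joined_field = tags from ranged-query; \
SELECT docid, tagname FROM document_tags WHERE docid>=$start AND docid<=$end ORDER BY docid ASC; \
SELECT MIN(docid), MAX(docid) FROM document_tags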
Payloads let you create a special field in which, instead of keyword positions, so-called user payloads are stored. Payloads are custom integer values attached to every keyword. They can then be used at search time to affect the ranking.
The payload query must return exactly 3 columns:
- document ID
- keyword
- and integer payload value.
Document IDs can be duplicate, but they must be in ascending order. Payloads must be unsigned integers within the 24-bit range, i.e., from 0 to 16777215.
The only ranker that accounts for payloads is proximity_bm25
(the default ranker). On tables with payload fields, it will automatically switch to a variant that matches keywords in those fields, computes a sum of matched payloads multiplied by field weights, and adds that sum to the final rank.
Please note that the payload field is ignored for full-text queries containing complex operators. It only works for simple bag-of-words queries.
Configuration example:
source min {
type = mysql
sql_host = localhost
sql_user = test
sql_pass =
sql_db = test
sql_query = select 1, 'Nike bag' f \
UNION select 2, 'Adidas bag' f \
UNION select 3, 'Reebok bag' f \
UNION select 4, 'Nike belt' f
sql_joined_field = tag from payload-query; select 1 id, 'nike' tag, 10 weight \
UNION select 4 id, 'nike' tag, 10 weight;
}
index idx {
path = idx
source = min
}
mysql> select * from idx;
+------+------------+------+
| id | f | tag |
+------+------------+------+
| 1 | Nike bag | nike |
| 2 | Adidas bag | |
| 3 | Reebok bag | |
| 4 | Nike belt | nike |
+------+------------+------+
4 rows in set (0.00 sec)
mysql> select *, weight() from idx where match('nike|adidas');
+------+------------+------+----------+
| id | f | tag | weight() |
+------+------------+------+----------+
| 1 | Nike bag | nike | 11539 |
| 4 | Nike belt | nike | 11539 |
| 2 | Adidas bag | | 1597 |
+------+------------+------+----------+
3 rows in set (0.01 sec)
mysql> select *, weight() from idx where match('"nike bag"|"adidas bag"');
+------+------------+------+----------+
| id | f | tag | weight() |
+------+------------+------+----------+
| 2 | Adidas bag | | 2565 |
| 1 | Nike bag | nike | 2507 |
+------+------------+------+----------+
2 rows in set (0.00 sec)
sql_column_buffers = <colname>=<size>[K|M] [, ...]
Per-column buffer sizes. Optional, default is empty (deduce the sizes automatically). Applies to the odbc and mssql source types only.
ODBC and MS SQL drivers sometimes cannot return the maximum actual column size to be expected. For instance, NVARCHAR(MAX)
columns always report their length as 2147483647 bytes to indexer
even though the actually used length is likely considerably less. However, the receiving buffers still need to be allocated upfront, and their sizes have to be determined. When the driver does not report the column length at all, Manticore allocates default 1 KB buffers for each non-char column, and 1 MB buffers for each char column. Driver-reported column length also gets clamped by an upper limit of 8 MB, so in case the driver reports (almost) a 2 GB column length, it will be clamped and an 8 MB buffer will be allocated instead for that column. These hard-coded limits can be overridden using the sql_column_buffers
directive, either in order to save memory on actually shorter columns or to overcome the 8 MB limit on actually longer columns. The directive values must be a comma-separated list of selected column names and sizes:
Example:
sql_query = SELECT id, mytitle, mycontent FROM documents
sql_column_buffers = mytitle=64K, mycontent=10M
The main query, which needs to fetch all the documents, can impose a read lock on the whole table and stall concurrent queries (e.g., INSERTs to a MyISAM table), waste a lot of memory on the result set, and so on. To avoid this, Manticore supports so-called ranged queries. With ranged queries, Manticore first fetches the min and max document IDs from the table, then substitutes different ID intervals into the main query text and runs the modified query to fetch another chunk of documents. Here's an example.
Ranged query usage example:
sql_query_range = SELECT MIN(id),MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT * FROM documents WHERE id>=$start AND id<=$end
If the table contains document IDs from 1 to, say, 2345, then sql_query would be run three times:
1. with $start replaced with 1 and $end replaced with 1000;
2. with $start replaced with 1001 and $end replaced with 2000;
3. with $start replaced with 2001 and $end replaced with 2345.
Obviously, that's not much of a difference for a 2000-row table, but when it comes to indexing a 10-million-row table, ranged queries might be of some help.
Defines the range query. The query specified in this option must fetch min and max document IDs that will be used as range boundaries. It must return exactly two integer fields, min ID first and max ID second; the field names are ignored. When enabled, sql_query
will be required to contain $start and $end macros. Note that the intervals specified by $start..$end will not overlap, so you should not remove document IDs that are exactly equal to $start or $end from your query.
This directive defines the range query step. The default value is 1024.
This directive can be used to throttle the ranged query. By default, there is no throttling. Values for sql_ranged_throttle should be specified in milliseconds.
Throttling can be useful when the indexer imposes too much load on the database server. It causes the indexer to sleep for a given amount of time once per each ranged query step. This sleep is unconditional and is performed before the fetch query.
sql_ranged_throttle = 1000 # sleep for 1 sec before each query step
The xmlpipe2
source type allows for passing custom full-text and attribute data to Manticore in a custom XML format, with the schema (i.e., set of fields and attributes) specified in either the XML stream itself or in the source settings.
To declare the XML stream, the xmlpipe_command
directive is mandatory and contains the shell command that produces the XML stream to be indexed. This can be a file, but it can also be a program that generates XML content on-the-fly.
When indexing an xmlpipe2 source, the indexer runs the specified command, opens a pipe to its stdout, and expects a well-formed XML stream.
Here's an example of what the XML stream data might look like:
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>
<sphinx:document id="1234">
<content>this is the main content <![CDATA[and this <cdata> entry
must be handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>
<sphinx:document id="1235">
<subject>another subject</subject>
<content>here comes another document, and i am given to understand,
that in-document field order must not matter, sir</content>
<published>1012325467</published>
</sphinx:document>
<!-- ... even more sphinx:document entries here ... -->
<sphinx:killlist>
<id>1234</id>
<id>4567</id>
</sphinx:killlist>
</sphinx:docset>
Arbitrary fields and attributes are allowed. They can also occur in the stream in arbitrary order within each document; the order is ignored. There is a restriction on the maximum field length; fields longer than 2 MB will be truncated to 2 MB (this limit can be changed in the source).
The schema, i.e., complete fields and attributes list, must be declared before any document can be parsed. This can be done either in the configuration file by using xmlpipe_field
and xmlpipe_attr_XXX
settings, or right in the stream using <sphinx:schema>
element. <sphinx:schema>
is optional. It is only allowed to occur as the very first sub-element in <sphinx:docset>
. If there is no in-stream schema definition, settings from the configuration file will be used. Otherwise, stream settings take precedence. Note that the document id should be specified as a property id
of tag <sphinx:document>
(e.g. <sphinx:document id="1235">
) and is expected to be a unique, positive, non-zero, signed 64-bit integer.
Unknown tags (declared neither as fields nor as attributes) will be ignored with a warning. In the example above, <misc>
will be ignored. All embedded tags and their attributes (such as <b>
in <subject>
in the example above) will be silently ignored.
Support for incoming stream encodings depends on whether iconv
is installed on the system. xmlpipe2 is parsed using the libexpat
parser, which understands US-ASCII, ISO-8859-1, UTF-8, and a few UTF-16 variants natively. Manticore's configure
script will also check for libiconv
presence and utilize it to handle other encodings. libexpat
also enforces the requirement to use the UTF-8 charset on the Manticore side because the parsed data it returns is always in UTF-8.
XML elements (tags) recognized by xmlpipe2 (and their attributes where applicable) are:
- sphinx:docset - Mandatory top-level element; denotes and contains the xmlpipe2 document set.
- sphinx:schema - Optional element; must either occur as the very first child of sphinx:docset or never occur at all. Declares the document schema and contains field and attribute declarations. If present, it overrides per-source settings from the configuration file.
- sphinx:field - Optional element, child of sphinx:schema. Declares a full-text field.
- sphinx:attr - Optional element, child of sphinx:schema. Declares an attribute.
- sphinx:document - Mandatory element, must be a child of sphinx:docset. Contains arbitrary other elements with field and attribute values to be indexed, as declared either using sphinx:field and sphinx:attr elements or in the configuration file. The only known attribute is "id", which must contain the unique integer document ID.
- sphinx:killlist - Optional element, child of sphinx:docset. Contains a number of "id" elements whose contents are document IDs to be put into a kill-list of the table. The kill-list is used in multi-table searches to suppress documents found in other tables of the search.
If the XML doesn't define a schema, the data types of the table elements must be defined in the source configuration.
- xmlpipe_field - declares a text field.
- xmlpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- xmlpipe_attr_uint - declares an integer attribute.
- xmlpipe_attr_timestamp - declares a timestamp attribute.
- xmlpipe_attr_bool - declares a boolean attribute.
- xmlpipe_attr_float - declares a float attribute.
- xmlpipe_attr_bigint - declares a big integer attribute.
- xmlpipe_attr_multi - declares a multi-value attribute with integers.
- xmlpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- xmlpipe_attr_string - declares a string attribute.
- xmlpipe_attr_json - declares a JSON attribute.
If xmlpipe_fixup_utf8 is set, it enables Manticore-side UTF-8 validation and filtering to prevent the XML parser from choking on non-UTF-8 documents. By default, this option is disabled.
In some situations it might be hard or even impossible to guarantee that the incoming XMLpipe2 document bodies are in perfectly valid and conforming UTF-8 encoding. For instance, documents with national single-byte encodings could sneak into the stream. The libexpat XML parser is fragile, meaning that it will stop processing in such cases. The UTF-8 fixup feature lets you avoid that. When fixup is enabled, Manticore will preprocess the incoming stream before passing it to the XML parser and replace invalid UTF-8 sequences with spaces.
xmlpipe_fixup_utf8 = 1
Example of XML source without schema in configuration:
source xml_test_1
{
type = xmlpipe2
xmlpipe_command = cat /tmp/products_today.xml
}
Example of XML source with schema in configuration:
source xml_test_2
{
type = xmlpipe2
xmlpipe_command = cat /tmp/products_today.xml
xmlpipe_field = subject
xmlpipe_field = content
xmlpipe_attr_timestamp = published
xmlpipe_attr_uint = author_id:16
}
TSV/CSV is the simplest way to pass data to the Manticore indexer. This method was created due to the limitations of xmlpipe2. In xmlpipe2, the indexer must map each attribute and field tag in the XML file to a corresponding schema element. This mapping requires time, and it increases with the number of fields and attributes in the schema. TSV/CSV has no such issue, as each field and attribute corresponds to a particular column in the TSV/CSV file. In some cases, TSV/CSV could work slightly faster than xmlpipe2.
The first column in a TSV/CSV file must be the document ID. The remaining columns must mirror the declaration of fields and attributes in the schema definition. Note that you don't need to declare the document ID in the schema, since it's always considered to be present; it should be in the 1st column and must be a unique, positive, non-zero, signed 64-bit integer.
The difference between tsvpipe and csvpipe is the delimiter and the quoting rules. tsvpipe uses the tab character as a hardcoded delimiter and has no quoting rules. csvpipe has the csvpipe_delimiter
option for the delimiter, with a default value of ',', and also has quoting rules, such as:
- any field may be quoted
- fields containing a line-break, double-quote or comma should be quoted
- a double quote character in a field must be represented by two double quote characters
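For example, to switch the delimiter to a semicolon (an illustrative choice):
csvpipe_delimiter = ;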
The tsvpipe_command
directive is mandatory and contains the shell command invoked to produce the TSV stream that gets indexed. The command can read a TSV file, but it can also be a program that generates tab-delimited content on the fly.
The following directives can be used to declare the types of the indexed columns:
- tsvpipe_field - declares a text field.
- tsvpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- tsvpipe_attr_uint - declares an integer attribute.
- tsvpipe_attr_timestamp - declares a timestamp attribute.
- tsvpipe_attr_bool - declares a boolean attribute.
- tsvpipe_attr_float - declares a float attribute.
- tsvpipe_attr_bigint - declares a big integer attribute.
- tsvpipe_attr_multi - declares a multi-value attribute with integers.
- tsvpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- tsvpipe_attr_string - declares a string attribute.
- tsvpipe_attr_json - declares a JSON attribute.
Example of a source using a TSV file:
source tsv_test
{
type = tsvpipe
tsvpipe_command = cat /tmp/rock_bands.tsv
tsvpipe_field = name
tsvpipe_attr_multi = genre_tags
}
1 Led Zeppelin 35,23,16
2 Deep Purple 35,92
3 Frank Zappa 35,23,16,92,33,24
The csvpipe_command
directive is mandatory and contains the shell command invoked to produce the CSV stream that gets indexed. The command can read a CSV file, but it can also be a program that generates comma-delimited content on the fly.
The following directives can be used to declare the types of the indexed columns:
- csvpipe_field - declares a text field.
- csvpipe_field_string - declares a text field/string attribute. The column will be indexed as a text field and also stored as a string attribute.
- csvpipe_attr_uint - declares an integer attribute.
- csvpipe_attr_timestamp - declares a timestamp attribute.
- csvpipe_attr_bool - declares a boolean attribute.
- csvpipe_attr_float - declares a float attribute.
- csvpipe_attr_bigint - declares a big integer attribute.
- csvpipe_attr_multi - declares a multi-value attribute with integers.
- csvpipe_attr_multi_64 - declares a multi-value attribute with 64-bit integers.
- csvpipe_attr_string - declares a string attribute.
- csvpipe_attr_json - declares a JSON attribute.
Example of a source using a CSV file:
source csv_test
{
type = csvpipe
csvpipe_command = cat /tmp/rock_bands.csv
csvpipe_field = name
csvpipe_attr_multi = genre_tags
}
1,"Led Zeppelin","35,23,16"
2,"Deep Purple","35,92"
3,"Frank Zappa","35,23,16,92,33,24"
In many situations, the total dataset is too large to be frequently rebuilt from scratch, while the number of new records remains relatively small. For example, a forum may have 1,000,000 archived posts but only receive 1,000 new posts per day.
In such cases, implementing "live" (nearly real-time) table updates can be achieved using a "main+delta" scheme.
The concept involves setting up two sources and two tables, with one "main" table for data that rarely changes (if ever), and one "delta" table for new documents. In the example, the 1,000,000 archived posts would be stored in the main table, while the 1,000 new daily posts would be placed in the delta table. The delta table can then be rebuilt frequently, making the documents available for searching within seconds or minutes. Determining which documents belong to which table and rebuilding the main table can be fully automated. One approach is to create a counter table that tracks the ID used to split the documents and update it whenever the main table is rebuilt.
Using a timestamp column as the split variable is more effective than using the ID since timestamps can track not only new documents but also modified ones.
For datasets that may contain modified or deleted documents, the delta table should provide a list of affected documents, ensuring they are suppressed and excluded from search queries. This is accomplished using a feature called Kill Lists. The document IDs to be killed can be specified in an auxiliary query defined by sql_query_killlist. The delta table must indicate the target tables for which the kill lists will be applied using the killlist_target directive. The impact of kill lists is permanent on the target table, meaning that even if a search is performed without the delta table, the suppressed documents will not appear in the search results.
Notice how we're overriding sql_query_pre
in the delta source. We must explicitly include this override. If we don't, the REPLACE
query would be executed during the delta source's build as well, effectively rendering it useless.
# in MySQL
CREATE TABLE deltabreaker (
index_name VARCHAR(50) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (index_name)
);
# in manticore.conf
source main {
...
sql_query_pre = REPLACE INTO deltabreaker SET index_name = 'main', created_at = NOW()
sql_query = SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated FROM documents WHERE deleted=0 AND updated_at >=FROM_UNIXTIME($start) AND updated_at <=FROM_UNIXTIME($end)
sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(MIN(updated_at)) FROM documents) min, ( SELECT UNIX_TIMESTAMP(created_at)-1 FROM deltabreaker WHERE index_name='main') max
sql_query_post_index = REPLACE INTO deltabreaker set index_name = 'delta', created_at = (SELECT created_at FROM deltabreaker t WHERE index_name='main')
...
sql_attr_timestamp = updated
}
source delta : main {
sql_query_pre =
sql_query_range = SELECT ( SELECT UNIX_TIMESTAMP(created_at) FROM deltabreaker WHERE index_name='delta') min, UNIX_TIMESTAMP() max
sql_query_killlist = SELECT id FROM documents WHERE updated_at >= (SELECT created_at FROM deltabreaker WHERE index_name='delta')
}
table main {
path = /var/lib/manticore/main
source = main
}
table delta {
path = /var/lib/manticore/delta
source = delta
killlist_target = main:kl
}
Merging two existing plain tables can be more efficient than indexing the data from scratch and might be desired in some cases (such as merging 'main' and 'delta' tables instead of simply rebuilding 'main' in the 'main+delta' partitioning scheme). Thus, indexer
provides an option to do that. Merging tables is typically faster than rebuilding, but still not instant for huge tables. Essentially, it needs to read the contents of both tables once and write the result once. Merging a 100 GB and 1 GB table, for example, will result in 202 GB of I/O (but that's still likely less than indexing from scratch requires).
The basic command syntax is as follows:
sudo -u manticore indexer --merge DSTINDEX SRCINDEX [--rotate] [--drop-src]
Unless --drop-src
is specified, only the DSTINDEX table will be affected: the contents of SRCINDEX will be merged into it.
The --rotate
switch is required if DSTINDEX is already being served by searchd
.
The typical usage pattern is to merge a smaller update from SRCINDEX into DSTINDEX. Thus, when merging attributes, the values from SRCINDEX will take precedence if duplicate document IDs are encountered. However, note that the "old" keywords will not be automatically removed in such cases. For example, if there's a keyword "old" associated with document 123 in DSTINDEX, and a keyword "new" associated with it in SRCINDEX, document 123 will be found by both keywords after the merge. You can supply an explicit condition to remove documents from DSTINDEX to mitigate this; the relevant switch is --merge-dst-range
:
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
This switch allows you to apply filters to the destination table along with merging. There can be several filters; all of their conditions must be met in order to include the document in the resulting merged table. In the example above, the filter passes only those records where 'deleted' is 0, eliminating all records that were flagged as deleted.
--drop-src
enables dropping SRCINDEX after the merge and before rotating the tables, which is important if you specify DSTINDEX in killlist_target
of DSTINDEX. Otherwise, when rotating the tables, the documents that have been merged into DSTINDEX may be suppressed by SRCINDEX.
When using plain tables, there's a challenge arising from the need to have the data in the table as fresh as possible.
In this case, one or more secondary (also known as delta) tables are used to capture the modified data between the time the main table was created and the current time. The modified data can include new, updated, or deleted documents. The search becomes a search over the main table and the delta table. This works seamlessly when you just add new documents to the delta table, but when it comes to updated or deleted documents, there remains the following issue.
If a document is present in both the main and delta tables, it can cause issues during searching, as the engine will see two versions of a document and won't know how to pick the right one. So, the delta needs to somehow inform the search that there are deleted documents in the main table that should be disregarded. This is where kill lists come in.
A table can maintain a list of document IDs that can be used to suppress records in other tables. This feature is available for plain tables using database sources or plain tables using XML sources. In the case of database sources, the source needs to provide an additional query defined by sql_query_killlist
. It will store in the table a list of documents that can be used by the server to remove documents from other plain tables.
This query is expected to return a number of 1-column rows, each containing just the document ID.
In many cases, the query is a union between a query that retrieves a list of updated documents and a list of deleted documents, e.g.:
sql_query_killlist = \
SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex
A plain table can contain a directive called killlist_target that will tell the server it can provide a list of document IDs that should be removed from certain existing tables. The table can use either its document IDs as the source for this list or provide a separate list.
Sets the table(s) that the kill-list will be applied to. Optional, default value is empty.
When you use plain tables, you often need to maintain not just a single table, but a set of them to be able to add/update/delete new documents sooner (read about delta table updates). In order to suppress matches in the previous (main) table that were updated or deleted in the next (delta) table, you need to specify killlist_target in the delta table settings:
table products {
killlist_target = main:kl
path = products
source = src_base
}
When killlist_target
is specified, the kill-list is applied to all the tables listed in it on searchd
startup. If any of the tables from killlist_target
are rotated, the kill-list is reapplied to these tables. When the kill-list is applied, tables that were affected save these changes to disk.
killlist_target
has 3 modes of operation:
- killlist_target = main:kl. Document IDs from the kill-list of the delta table are suppressed in the main table (see sql_query_killlist).
- killlist_target = main:id. All document IDs from the delta table are suppressed in the main table. The kill-list is ignored.
- killlist_target = main. Both document IDs from the delta table and its kill-list are suppressed in the main table.
Multiple targets can be specified, separated by commas like:
killlist_target = table_one:kl,table_two:kl
You can change the killlist_target
settings for a table without rebuilding it by using ALTER
.
However, since the 'old' main table has already written the changes to disk, the documents that were deleted in it will remain deleted even if it is no longer in the killlist_target
of the delta table.
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'
POST /cli -d "
ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'"
A plain table can be converted into a real-time table or added to an existing real-time table.
The first case is useful when you need to regenerate a real-time table completely, which may be needed, for example, if tokenization settings need an update. In this situation, preparing a plain table and converting it into a real-time table may be easier than preparing a batch job to perform INSERTs for adding all the data into a real-time table.
In the second case, you normally want to add a large bulk of new data to a real-time table, and again, creating a plain table with that data is easier than populating the existing real-time table.
The ATTACH
statement allows you to attach a plain table to an existing real-time table. It also enables you to attach the content of one real-time table to another real-time table.
ATTACH TABLE plain_or_rt_table TO TABLE rt_table [WITH TRUNCATE]
ATTACH TABLE
statement lets you move data from a plain table or a RT table to an RT table.
After a successful ATTACH
the data originally stored in the source plain table becomes a part of the target RT table, and the source plain table becomes unavailable (until the next rebuild). If the source table is an RT table, its content is moved into the destination RT table, and the source RT table remains empty. ATTACH
does not result in any table data changes. Essentially, it just renames the files (making the source table a new disk chunk of the target RT table) and updates the metadata. So it is generally a quick operation that might (frequently) complete as fast as under a second.
Note that when a table is attached to an empty RT table, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source table are copied over and take effect. The respective parts of the RT table definition from the configuration file will be ignored.
When the TRUNCATE
option is used, the RT table gets truncated prior to attaching the source plain table. This allows the operation to be atomic or ensures that the attached source plain table will be the only data in the target RT table.
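For instance, using the table names from the example below:
ATTACH TABLE plain TO TABLE rt WITH TRUNCATE;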
ATTACH TABLE
comes with a number of restrictions. Most notably, the target RT table is currently required to be either empty or have the same settings as the source table. In case the source table gets attached to a non-empty RT table, the RT table data collected so far gets stored as a regular disk chunk, and the table being attached becomes the newest disk chunk, with documents having the same IDs getting killed. The complete list of restrictions is as follows:
mysql> DESC rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM rt;
Empty set (0.00 sec)
mysql> SELECT * FROM plain WHERE MATCH('test');
+------+--------+----------+------------+
| id | weight | group_id | date_added |
+------+--------+----------+------------+
| 1 | 1304 | 1 | 1313643256 |
| 2 | 1304 | 1 | 1313643256 |
| 3 | 1304 | 1 | 1313643256 |
| 4 | 1304 | 1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
mysql> ATTACH TABLE plain TO TABLE rt;
Query OK, 0 rows affected (0.00 sec)
mysql> DESC rt;
+------------+-----------+
| Field | Type |
+------------+-----------+
| id | integer |
| title | field |
| content | field |
| group_id | uint |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)
mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id | weight | group_id | date_added |
+------+--------+----------+------------+
| 1 | 1304 | 1 | 1313643256 |
| 2 | 1304 | 1 | 1313643256 |
| 3 | 1304 | 1 | 1313643256 |
| 4 | 1304 | 1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM plain WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search
If you decide to migrate from Plain mode to RT mode or in some other cases, real-time and percolate tables built in Plain mode can be imported to Manticore running in RT mode using the IMPORT TABLE
statement. The general syntax is as follows:
IMPORT TABLE table_name FROM 'path'
where the 'path' parameter must be set as: /your_backup_folder/your_backup_name/data/your_table_name/your_table_name
mysql -P9306 -h0 -e 'create table t(f text)'
mysql -P9306 -h0 -e "backup table t to /tmp/"
mysql -P9306 -h0 -e "drop table t"
BACKUP_NAME=$(ls /tmp | grep 'backup-' | tail -n 1)
mysql -P9306 -h0 -e "import table t from '/tmp/$BACKUP_NAME/data/t/t'
mysql -P9306 -h0 -e "show tables"
Executing this command copies all the files of the specified table to data_dir. All the external table files, such as wordforms, exceptions, and stopwords, are also copied to the same data_dir
.
IMPORT TABLE
has the following limitations:
If the above method for migrating a plain table to an RT table is not possible, you may use indexer --print-rt
to dump data from a plain table directly without the need to convert it to an RT type table and then import the dump into an RT table right from the command line.
This method has a few limitations though:
/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql
mysql -P 9306 -h0 -e "truncate table my_rt_index"
mysql -P 9306 -h0 < /tmp/dump_regular.sql
rm /tmp/dump_regular.sql
Table rotation is a procedure in which the searchd server looks for new versions of defined tables in the configuration. Rotation is supported only in Plain mode of operation.
There can be two cases:
In the first case, the indexer cannot put the new version of the table online as the running copy is locked and loaded by searchd
. In this case indexer
needs to be called with the --rotate parameter. If rotate is used, indexer
creates new table files with .new.
in their names and sends a HUP signal to searchd
informing it about the new version. The searchd
will perform a lookup and will put the new version of the table in place and discard the old one. In some cases, it might be desired to create the new version of the table but not perform rotate as soon as possible. For example, it might be desired to first check the health of the new table versions. In this case, indexer
can accept the --nohup
parameter which will forbid sending the HUP signal to the server.
New tables can be loaded by rotation; however, the regular handling of the HUP signal is to check for new tables only if the configuration has changed since server startup. If the table was already defined in the configuration, the table should first be created by running indexer
without rotation, and then the RELOAD TABLES statement should be performed instead.
There are also two specialized statements that can be used to perform rotations on tables:
RELOAD TABLE tbl [ FROM '/path/to/table_files' [ OPTION switchover=1 ] ];
The RELOAD TABLE
command enables table rotation via SQL.
This command functions in three modes. In the first mode, without specifying a path, the Manticore server checks for new table files in the directory specified by the table's path. New table files must be named as tbl.new.sp?
.
If you specify a path, the server searches for table files in that directory, relocates them to the table path, renames them from tbl.sp?
to tbl.new.sp?
, and rotates them.
The third mode, activated by OPTION switchover=1
, switches the index to the new path. Here, the daemon tries to load the table directly from the new path without moving the files. If loading is successful, this new index supersedes the old one.
Also, the daemon writes a unique link file (tbl.link
) in the directory specified by path, maintaining persistent redirection.
If you revert a redirected index to the path specified in the configuration, the daemon will detect this and delete the link file.
Once redirected, the daemon retrieves the table from the newly linked path. When rotating, it looks for new table versions at the newly redirected path. Bear in mind, the daemon checks the configuration for common errors, like duplicate paths across different tables. However, it won't identify if multiple tables point to the same path via redirection. Under normal operations, tables are locked with the .spl
file, but disabling the lock may cause problems. If there's an error (e.g., the path is inaccessible for any reason), you should manually correct (or simply delete) the link file.
indextool
follows the link file, but other tools (indexer
, index_converter
, etc.) do not recognize the link file and consistently use the path defined in the configuration file, ignoring any redirection. Thus, you can inspect the index with indextool
, and it will read from the new location. However, more complex operations like merging will not acknowledge any link file.
mysql> RELOAD TABLE plain_table;
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new_table_files';
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new/place/for/table/table_files' OPTION switchover=1;
RELOAD TABLES;
This command functions similarly to the HUP system signal, triggering a rotation of tables. Nevertheless, it doesn't exactly mirror the typical HUP signal (which can come from a kill -HUP
command or indexer --rotate
). This command actively searches for any tables needing rotation and is capable of re-reading the configuration. Suppose you launch Manticore in plain mode with a config file that points to a nonexistent plain table. If you then attempt to indexer --rotate
the table, the new table won't be recognized by the server until you execute RELOAD TABLES
or restart the server.
Depending on the value of the seamless_rotate setting, new queries might be shortly stalled, and clients will receive temporary errors.
mysql> RELOAD TABLES;
Query OK, 0 rows affected (0.01 sec)
Rotation assumes the old table version is discarded and the new table version is loaded to replace the existing one. During this swap, the server must also serve incoming queries made on the table that is being updated. To avoid query stalls, the server performs a seamless rotation of the table by default, as described below.
Tables may contain data that needs to be precached in RAM. At the moment, .spa
, .spb
, .spi
and .spm
files are fully precached (they contain attribute data, blob attribute data, keyword table, and killed row map, respectively). Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:
1. searchd waits for all currently running queries to finish;
2. the old table copy is deallocated, and its files are renamed;
3. the new table files are renamed, and the required RAM is allocated;
4. the new table attribute and dictionary data are preloaded into RAM;
5. searchd resumes serving queries from the new table.
However, if there's a lot of attribute or dictionary data, then the preloading step could take noticeable time - up to several minutes in case of preloading 1-5+ GB files.
With seamless rotate enabled, rotation works as follows:
Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm
data need to be in RAM while preloading the new copy). However, average usage stays the same.
Example:
seamless_rotate = 1
You can modify existing data in an RT or PQ table by either updating or replacing it.
UPDATE replaces row-wise attribute values of existing documents with new values. Full-text fields and columnar attributes cannot be updated. If you need to change the content of a full-text field or columnar attributes, use REPLACE.
REPLACE works similarly to INSERT except that if an old document has the same ID as the new document, the old document is marked as deleted before the new document is inserted. Note that the old document does not get physically deleted from the table. The deletion can only happen when chunks are merged in a table, e.g., as a result of an OPTIMIZE.
REPLACE
works similarly to INSERT, but it marks the previous document with the same ID as deleted before inserting a new one.
If you are looking for in-place updates, please see this section.
The syntax of the SQL REPLACE
statement is as follows:
To replace the whole document:
REPLACE INTO table [(column1, column2, ...)]
VALUES (value1, value2, ...)
[, (...)]
To replace only selected fields:
REPLACE INTO table
SET field1=value1[, ..., fieldN=valueN]
WHERE id = <id>
Note that in this mode, you can only filter by id.
See the examples for more details.
/replace
:
POST /replace
{
"index": "<table name>",
"id": <document id>,
"doc":
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
}
/index
is an alias endpoint and works the same.
Elasticsearch-like endpoint <table>/_doc/<id>
:
PUT/POST /<table name>/_doc/<id>
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
Partial replace:
POST /<table name>/_update/<id>
{
"<field1>": <value1>,
...
"<fieldN>": <valueN>
}
See the examples for more details.
REPLACE INTO products VALUES(1, "document one", 10);
Query OK, 1 row affected (0.00 sec)
REPLACE INTO products SET description='HUAWEI Matebook 15', price=10 WHERE id = 55;
Query OK, 1 row affected (0.00 sec)
POST /replace
-H "Content-Type: application/x-ndjson" -d '
{
"index":"products",
"id":1,
"doc":
{
"title":"product one",
"price":10
}
}
'
{
"_index":"products",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
PUT /products/_doc/2
{
"title": "product two",
"price": 20
}
POST /products/_doc/3
{
"title": "product three",
"price": 10
}
{
"_id":2,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":3,
"_index":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
"failed":0,
"successful":1,
"total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
POST /products/_update/55
{
"doc": {
"description": "HUAWEI Matebook 15",
"price": 10
}
}
{
"_index":"products",
"updated":1
}
$index->replaceDocument([
'title' => 'document one',
'price' => 10
],1);
Array(
[_index] => products
[_id] => 1
[created] => false
[result] => updated
[status] => 200
)
indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}})
{'created': False,
'found': None,
'id': 1,
'index': 'products',
'result': 'updated'}
res = await indexApi.replace({"index" : "products", "id" : 1, "doc" : {"title" : "document one","price":10}});
{"_index":"products","_id":1,"result":"updated"}
docRequest = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
put("title","document one");
put("price",10);
}};
docRequest.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
index: products
id: 1
created: false
result: updated
found: null
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "document one");
doc.Add("price", 10);
InsertDocumentRequest docRequest = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.replace(docRequest);
class SuccessResponse {
index: products
id: 1
created: false
result: updated
found: null
}
res = await indexApi.replace({
index: 'test',
id: 1,
doc: { content: 'Text 11', name: 'Doc 11', cat: 3 },
});
{
"_index":"test",
"_id":1,
"created":false
"result":"updated"
"status":200
}
replaceDoc := map[string]interface{} {"content": "Text 11", "name": "Doc 11", "cat": 3}
replaceRequest := manticoreclient.NewInsertDocumentRequest("test", replaceDoc)
replaceRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Replace(context.Background()).InsertDocumentRequest(*replaceRequest).Execute()
{
"_index":"test",
"_id":1,
"created":false
"result":"updated"
"status":200
}
REPLACE
is available for real-time and percolate tables. You can't replace data in a plain table.
When you run a REPLACE
, the previous document is not removed, but it's marked as deleted, so the table size grows until chunk merging happens. To force a chunk merge, use the OPTIMIZE statement.
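For instance, assuming the statement takes the OPTIMIZE TABLE <table name> form:
OPTIMIZE TABLE products;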
You can replace multiple documents at once. Check bulk adding documents for more information.
REPLACE INTO products(id,title,tag) VALUES (1, 'doc one', 10), (2, 'doc two', 20);
Query OK, 2 rows affected (0.00 sec)
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{ "replace" : { "index" : "products", "id":1, "doc": { "title": "doc one", "tag" : 10 } } }
{ "replace" : { "index" : "products", "id":2, "doc": { "title": "doc two", "tag" : 20 } } }
'
{
"items":
[
{
"replace":
{
"_index":"products",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"products",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
$index->replaceDocuments([
[
'id' => 1,
'title' => 'document one',
'tag' => 10
],
[
'id' => 2,
'title' => 'document two',
'tag' => 20
]
]);
Array(
[items] =>
Array(
Array(
[_index] => products
[_id] => 1
[created] => false
[result] => updated
[status] => 200
)
Array(
[_index] => products
[_id] => 2
[created] => false
[result] => updated
[status] => 200
)
)
[errors] => false
)
indexApi = manticoresearch.IndexApi(client)
docs = [ \
{"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}}, \
{"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ]
api_resp = indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
'items': [{u'replace': {u'_id': 1,
u'_index': u'products',
u'created': False,
u'result': u'updated',
u'status': 200}},
{u'replace': {u'_id': 2,
u'_index': u'products',
u'created': False,
u'result': u'updated',
u'status': 200}}]}
docs = [
{"replace": {"index" : "products", "id" : 1, "doc" : {"title" : "document one"}}},
{"replace": {"index" : "products", "id" : 2, "doc" : {"title" : "document two"}}} ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"replace":{"_index":"products","_id":1,"created":false,"result":"updated","status":200}},{"replace":{"_index":"products","_id":2,"created":false,"result":"updated","status":200}}],"errors":false}
body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
"{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;
indexApi.bulk(body);
class BulkResponse {
items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{\"replace\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"document one\"}}}" +"\n"+
"{\"replace\": {\"index\" : \"products\", \"id\" : 2, \"doc\" : {\"title\" : \"document two\"}}}"+"\n" ;
indexApi.Bulk(body);
class BulkResponse {
items: [{replace={_index=products, _id=1, created=false, result=updated, status=200}}, {replace={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
replaceDocs = [
{
replace: {
index: 'test',
id: 1,
doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
},
},
{
replace: {
index: 'test',
id: 2,
doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
},
},
];
res = await indexApi.bulk(
replaceDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
"items":
[
{
"replace":
{
"_index":"test",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"test",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
body := "{\"replace\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
"{\"replace\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
"items":
[
{
"replace":
{
"_index":"test",
"_id":1,
"created":false,
"result":"updated",
"status":200
}
},
{
"replace":
{
"_index":"test",
"_id":2,
"created":false,
"result":"updated",
"status":200
}
}
],
"errors":false
}
UPDATE changes row-wise attribute values of existing documents in a specified table with new values. Note that you can't update the contents of a fulltext field or a columnar attribute. If there's such a need, use REPLACE.
Attribute updates are supported for RT, PQ, and plain tables. All attribute types can be updated as long as they are stored in the traditional row-wise storage.
Note that the document ID cannot be updated.
Note that when you update an attribute, its secondary index gets disabled, so consider replacing the document instead.
UPDATE products SET enabled=0 WHERE id=10;
Query OK, 1 row affected (0.00 sec)
POST /update
{
"index":"products",
"id":10,
"doc":
{
"enabled":0
}
}
{
"_index":"products",
"updated":1
}
$index->updateDocument([
'enabled'=>0
],10);
Array(
[_index] => products
[_id] => 10
[result] => updated
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {"price":10}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("price",10);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("price", 10);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: "test", id: 1, doc: { cat: 10 } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"cat":10}
updateRequest = openapiclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
Multiple attributes can be updated in a single statement. Example:
UPDATE products
SET price=100000000000,
coeff=3465.23,
tags1=(3,6,4),
tags2=()
WHERE MATCH('phone') AND enabled=1;
Query OK, 148 rows affected (0.0 sec)
POST /update
{
"index":"products",
"doc":
{
"price":100000000000,
"coeff":3465.23,
"tags1":[3,6,4],
"tags2":[]
},
"query":
{
"match": { "*": "phone" },
"equals": { "enabled": 1 }
}
}
{
"_index":"products",
"updated":148
}
$query= new BoolQuery();
$query->must(new Match('phone','*'));
$query->must(new Equals('enabled',1));
$index->updateDocuments([
'price' => 100000000000,
'coeff' => 3465.23,
'tags1' => [3,6,4],
'tags2' => []
],
$query
);
Array(
[_index] => products
[updated] => 148
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
"price": 100000000000,
"coeff": 3465.23,
"tags1": [3,6,4],
"tags2": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
"price": 100000000000,
"coeff": 3465.23,
"tags1": [3,6,4],
"tags2": []}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("price",10);
put("coeff",3465.23);
put("tags1",new int[]{3,6,4});
put("tags2",new int[]{});
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("price", 10);
doc.Add("coeff", 3465.23);
doc.Add("tags1", new List<int> {3,6,4});
doc.Add("tags2", new List<int> {});
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: "test", id: 1, doc: { name: "Doc 21", cat: "10" } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"name":"Doc 21", "cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
When assigning out-of-range values to 32-bit attributes, they will be trimmed to their lower 32 bits without a prompt. For example, if you try to update the 32-bit unsigned int with a value of 4294967297, the value of 1 will actually be stored, because the lower 32 bits of 4294967297 (0x100000001 in hex) amount to 1 (0x00000001 in hex).
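For instance, assuming counter is a 32-bit unsigned integer attribute (a hypothetical name):
UPDATE products SET counter=4294967297 WHERE id=1;
SELECT counter FROM products WHERE id=1;
The SELECT would then return 1, since only the lower 32 bits of the assigned value are kept.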
UPDATE
can be used to perform partial JSON updates on numeric data types or arrays of numeric data types. Just make sure you don't update an integer value with a float value as it will be rounded off.
insert into products (id, title, meta) values (1,'title','{"tags":[1,2,3]}');
update products set meta.tags[0]=100 where id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /insert
{
"index":"products",
"id":100,
"doc":
{
"title":"title",
"meta": {
"tags":[1,2,3]
}
}
}
POST /update
{
"index":"products",
"id":100,
"doc":
{
"meta.tags[0]":100
}
}
{
"_index":"products",
"_id":100,
"created":true,
"result":"created",
"status":201
}
{
"_index":"products",
"updated":1
}
$index->insertDocument([
'title' => 'title',
'meta' => ['tags' => [1,2,3]]
],1);
$index->updateDocument([
'meta.tags[0]' => 100
],1);
Array(
[_index] => products
[_id] => 1
[created] => true
[result] => created
)
Array(
[_index] => products
[updated] => 1
)
indexApi = api = manticoresearch.IndexApi(client)
indexApi.update({"index" : "products", "id" : 1, "doc" : {
"meta.tags[0]": 100}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.update({"index" : "products", "id" : 1, "doc" : {
"meta.tags[0]": 100}});
{"_index":"products","_id":1,"result":"updated"}
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("meta.tags[0]",100);
}};
updateRequest.index("products").id(1L).setDoc(doc);
indexApi.update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("meta.tags[0]", 100);
UpdateDocumentRequest updateRequest = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updateRequest);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({"index" : "test", "id" : 1, "doc" : { "meta.tags[0]": 100} });
{"_index":"test","_id":1,"result":"updated"}
updateDoc = map[string]interface{} {"meta.tags[0]":100}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
Updating other data types or changing property type in a JSON attribute requires a full JSON update.
insert into products values (1,'title','{"tags":[1,2,3]}');
update products set data='{"tags":["one","two","three"]}' where id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /insert
{
"index":"products",
"id":1,
"doc":
{
"title":"title",
"data":"{\"tags\":[1,2,3]}"
}
}
POST /update
{
"index":"products",
"id":1,
"doc":
{
"data":"{\"tags\":[\"one\",\"two\",\"three\"]}"
}
}
{
"_index":"products",
"updated":1
}
$index->insertDocument([
'title'=> 'title',
'data' => [
'tags' => [1,2,3]
]
],1);
$index->updateDocument([
'data' => [
'one', 'two', 'three'
]
],1);
Array(
[_index] => products
[_id] => 1
[created] => true
[result] => created
)
Array(
[_index] => products
[updated] => 1
)
indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}})
indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}})
{'created': True,
'found': None,
'id': 100,
'index': 'products',
'result': 'created'}
{'id': 100, 'index': 'products', 'result': 'updated', 'updated': None}
res = await indexApi.insert({"index" : "products", "id" : 100, "doc" : {"title" : "title", "meta" : {"tags":[1,2,3]}}});
res = await indexApi.update({"index" : "products", "id" : 100, "doc" : {"meta" : {"tags":['one','two','three']}}});
{"_index":"products","_id":100,"created":true,"result":"created"}
{"_index":"products","_id":100,"result":"updated"}
InsertDocumentRequest newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
put("title","title");
put("meta",
new HashMap<String,Object>(){{
put("tags",new int[]{1,2,3});
}});
}};
newdoc.index("products").id(100L).setDoc(doc);
indexApi.insert(newdoc);
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("meta",
new HashMap<String,Object>(){{
put("tags",new String[]{"one","two","three"});
}});
}};
updatedoc.index("products").id(100L).setDoc(doc);
indexApi.update(updatedoc);
class SuccessResponse {
index: products
id: 100
created: true
result: created
found: null
}
class UpdateResponse {
index: products
updated: null
id: 100
result: updated
}
Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("tags", new List<int> {1,2,3});
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "title");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Insert(newdoc);
meta = new Dictionary<string, Object>();
meta.Add("tags", new List<string> {"one","two","three"});
doc = new Dictionary<string, Object>();
doc.Add("meta", meta);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 100, doc: doc);
indexApi.Update(updatedoc);
class SuccessResponse {
index: products
id: 100
created: true
result: created
found: null
}
class UpdateResponse {
index: products
updated: null
id: 100
result: updated
}
res = await indexApi.insert({
index: 'test',
id: 1,
doc: { content: 'Text 1', name: 'Doc 1', meta: { tags:[1,2,3] } }
})
res = await indexApi.update({ index: 'test', id: 1, doc: { meta: { tags:['one','two','three'] } } });
{
"_index":"test",
"_id":1,
"created":true,
"result":"created"
}
{
"_index":"test",
"_id":1,
"result":"updated"
}
metaField := map[string]interface{} {"tags": []int{1, 2, 3}}
insertDoc := map[string]interface{} {"name": "Doc 1", "meta": metaField}
insertRequest := manticoreclient.NewInsertDocumentRequest("test", insertDoc)
insertRequest.SetId(1)
res, _, _ := apiClient.IndexAPI.Insert(context.Background()).InsertDocumentRequest(*insertRequest).Execute();
metaField = map[string]interface{} {"tags": []string{"one", "two", "three"}}
updateDoc := map[string]interface{} {"meta": metaField}
updateRequest := manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"created":true,
"result":"created"
}
{
"_index":"test",
"_id":1,
"result":"updated"
}
When using replication, the table name should be prepended with cluster_name:
(in SQL) so that updates will be propagated to all nodes in the cluster. For queries via HTTP, you should set a cluster
property. See setting up replication for more information.
{
"cluster":"nodes4",
"index":"test",
"id":1,
"doc":
{
"gid" : 100,
"price" : 1000
}
}
update weekly:posts set enabled=0 where id=1;
POST /update
{
"cluster":"weekly",
"index":"products",
"id":1,
"doc":
{
"enabled":0
}
}
$index->setName('products')->setCluster('weekly');
$index->updateDocument(['enabled'=>0],1);
indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}})
res = wait indexApi.update({"cluster":"weekly", "index" : "products", "id" : 1, "doc" : {"enabled" : 0}});
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("enabled",0);
}};
updatedoc.index("products").cluster("weekly").id(1L).setDoc(doc);
indexApi.update(updatedoc);
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("enabled", 0);
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", cluster: "weekly", id: 1, doc: doc);
indexApi.Update(updatedoc);
res = await indexApi.update( {cluster: 'test_cluster', index : 'test', id : 1, doc : {name : 'Doc 11'}} );
updateDoc = map[string]interface{} {"name":"Doc 11"}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetCluster("test_cluster")
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
Here is the syntax for the SQL UPDATE
statement:
UPDATE table SET col1 = newval1 [, ...] WHERE where_condition [OPTION opt_name = opt_value [, ...]] [FORCE|IGNORE INDEX(id)]
where_condition
has the same syntax as in the SELECT statement.
Multi-value attribute value sets must be specified as comma-separated lists in parentheses. To remove all values from a multi-value attribute, just assign ()
to it.
UPDATE products SET tags1=(3,6,4) WHERE id=1;
UPDATE products SET tags1=() WHERE id=1;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
POST /update
{
"index":"products",
"_id":1,
"doc":
{
"tags1": []
}
}
{
"_index":"products",
"updated":1
}
$index->updateDocument(['tags1'=>[]],1);
Array(
[_index] => products
[updated] => 1
)
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{'id': 1, 'index': 'products', 'result': 'updated', 'updated': None}
indexApi.update({"index" : "products", "id" : 1, "doc" : {"tags1": []}})
{"_index":"products","_id":1,"result":"updated"}
updatedoc = new UpdateDocumentRequest();
doc = new HashMap<String,Object >(){{
put("tags1",new int[]{});
}};
updatedoc.index("products").id(1L).setDoc(doc);
indexApi.update(updatedoc);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("tags1", new List<int> {});
UpdateDocumentRequest updatedoc = new UpdateDocumentRequest(index: "products", id: 1, doc: doc);
indexApi.Update(updatedoc);
class UpdateResponse {
index: products
updated: null
id: 1
result: updated
}
res = await indexApi.update({ index: 'test', id: 1, doc: { cat: 10 } });
{
"_index":"test",
"_id":1,
"result":"updated"
}
updateDoc = map[string]interface{} {"cat":10}
updateRequest = manticoreclient.NewUpdateDocumentRequest("test", updateDoc)
updateRequest.SetId(1)
res, _, _ = apiClient.IndexAPI.Update(context.Background()).UpdateDocumentRequest(*updateRequest).Execute()
{
"_index":"test",
"_id":1,
"result":"updated"
}
The OPTION clause is a Manticore-specific extension that lets you control a number of per-update options. The syntax is:
OPTION <optionname>=<value> [ , ... ]
The options are the same as for the SELECT statement. Specifically for the UPDATE statement, you can use options such as strict, which applies to partial JSON attribute updates. By default (strict=1), the UPDATE will result in an error if the query tries to perform an update on non-numeric properties. With strict=0, if multiple properties are updated and some are not allowed, the UPDATE will not result in an error and will perform the changes only on the allowed properties (the rest being ignored). If none of the SET changes of the UPDATE are permitted, the command will result in an error even with strict=0.
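For example, assuming the meta JSON from the earlier examples also contained a string property name (a hypothetical field), the following statement would apply the numeric change and silently skip the string one with strict=0, while with strict=1 it would fail:
UPDATE products SET meta.tags[0]=100, meta.name='Pink bag' WHERE id=1 OPTION strict=0;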
In rare cases, Manticore's built-in query analyzer may misinterpret a query and fail to detect that the lookup by document ID should be used, which can result in poor performance for queries like UPDATE ... WHERE id = 123. For information on how to force the optimizer to use the docid index, see Query optimizer hints.
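Following the UPDATE syntax shown above, forcing the docid index for such a query would look like this (a sketch against the products table):
UPDATE products SET price=19.85 WHERE id=123 FORCE INDEX(id);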
Updates using HTTP JSON protocol are performed via the /update
endpoint. The syntax is similar to the /insert endpoint, but this time the doc
property is mandatory.
The server will respond with a JSON object stating if the operation was successful or not.
POST /update
{
"index":"test",
"id":1,
"doc":
{
"gid" : 100,
"price" : 1000
}
}
{
"_index": "test",
"_id": 1,
"result": "updated"
}
The ID of the document that needs to be updated can be set directly using the id
property, as shown in the previous example, or you can update documents by query and apply the update to all the documents that match the query:
POST /update
{
"index":"test",
"doc":
{
"price" : 1000
},
"query":
{
"match": { "*": "apple" }
}
}
{
"_index":"products",
"updated":1
}
The query syntax is the same as in the /search endpoint. Note that you can't specify id
and query
at the same time.
FLUSH ATTRIBUTES
The FLUSH ATTRIBUTES command flushes all in-memory attribute updates in all the active tables to disk. It returns a tag that identifies the result on-disk state, which represents the number of actual disk attribute saves performed since the server startup.
mysql> UPDATE testindex SET channel_id=1107025 WHERE id=1;
Query OK, 1 row affected (0.04 sec)
mysql> FLUSH ATTRIBUTES;
+------+
| tag |
+------+
| 1 |
+------+
1 row in set (0.19 sec)
See also attr_flush_period setting.
You can perform multiple update operations in a single call using the /bulk
endpoint. This endpoint only works with data that has Content-Type
set to application/x-ndjson
. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n
and, possibly, a \r
.
POST /bulk
{ "update" : { "index" : "products", "id" : 1, "doc": { "price" : 10 } } }
{ "update" : { "index" : "products", "id" : 2, "doc": { "price" : 20 } } }
{
"items":
[
{
"update":
{
"_index":"products",
"_id":1,
"result":"updated"
}
},
{
"update":
{
"_index":"products",
"_id":2,
"result":"updated"
}
}
],
"errors":false
}
The /bulk endpoint supports inserts, replaces, and deletes. Each statement begins with an action type (in this case, update). Here's a list of the supported actions:
insert: Inserts a document. The syntax is the same as in the /insert endpoint.
create: a synonym for insert.
replace: Replaces a document. The syntax is the same as in the /replace endpoint.
index: a synonym for replace.
update: Updates a document. The syntax is the same as in the /update endpoint.
delete: Deletes a document. The syntax is the same as in the /delete endpoint.
Updates by query and deletes by query are also supported.
POST /bulk
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } }
{
"items":
[
{
"update":
{
"_index":"products",
"updated":1
}
},
{
"update":
{
"_index":"products",
"updated":3
}
}
],
"errors":false
}
$client->bulk([
['update'=>[
'index' => 'products',
'doc' => [
'coeff' => 100
],
'query' => [
'range' => ['price'=>['gte'=>1000]]
]
]
],
['update'=>[
'index' => 'products',
'doc' => [
'coeff' => 0
],
'query' => [
'range' => ['price'=>['lt'=>1000]]
]
]
]
]);
Array(
[items] => Array (
Array(
[update] => Array(
[_index] => products
[updated] => 1
)
)
Array(
[update] => Array(
[_index] => products
[updated] => 3
)
)
)
docs = [ \
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } }, \
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{'error': None,
'items': [{u'update': {u'_index': u'products', u'updated': 1}},
{u'update': {u'_index': u'products', u'updated': 3}}]}
docs = [
{ "update" : { "index" : "products", "doc": { "coeff" : 1000 }, "query": { "range": { "price": { "gte": 1000 } } } } },
{ "update" : { "index" : "products", "doc": { "coeff" : 0 }, "query": { "range": { "price": { "lt": 1000 } } } } } ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"update":{"_index":"products","updated":1}},{"update":{"_index":"products","updated":3}}],"errors":false}
String body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
"{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";
indexApi.bulk(body);
class BulkResponse {
items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 1000 }, \"query\": { \"range\": { \"price\": { \"gte\": 1000 } } } }} "+"\n"+
"{ \"update\" : { \"index\" : \"products\", \"doc\": { \"coeff\" : 0 }, \"query\": { \"range\": { \"price\": { \"lt\": 1000 } } } } }"+"\n";
indexApi.Bulk(body);
class BulkResponse {
items: [{update={_index=products, _id=1, created=false, result=updated, status=200}}, {update={_index=products, _id=2, created=false, result=updated, status=200}}]
error: null
additionalProperties: {errors=false}
}
updateDocs = [
{
update: {
index: 'test',
id: 1,
doc: { content: 'Text 11', cat: 1, name: 'Doc 11' },
},
},
{
update: {
index: 'test',
id: 2,
doc: { content: 'Text 22', cat: 9, name: 'Doc 22' },
},
},
];
res = await indexApi.bulk(
updateDocs.map((e) => JSON.stringify(e)).join("\n")
);
{
"items":
[
{
"update":
{
"_index":"test",
"updated":1
}
},
{
"update":
{
"_index":"test",
"updated":1
}
}
],
"errors":false
}
body := "{\"update\": {\"index\": \"test\", \"id\": 1, \"doc\": {\"content\": \"Text 11\", \"name\": \"Doc 11\", \"cat\": 1 }}}" + "\n" +
"{\"update\": {\"index\": \"test\", \"id\": 2, \"doc\": {\"content\": \"Text 22\", \"name\": \"Doc 22\", \"cat\": 9 }}}" +"\n";
res, _, _ := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{
"items":
[
{
"update":
{
"_index":"test",
"updated":1
}
},
{
"update":
{
"_index":"test",
"updated":1
}
}
],
"errors":false
}
Keep in mind that the bulk operation stops at the first query that results in an error.
attr_update_reserve=size
attr_update_reserve
is a per-table setting that determines the space reserved for blob attribute updates. This setting is optional, with a default value of 128k.
When blob attributes (MVAs, strings, JSON) are updated, their length may change. If the updated string (or MVA, or JSON) is shorter than the old one, it overwrites the old one in the .spb
file. However, if the updated string is longer, updates are written to the end of the .spb
file. This file is memory-mapped, which means resizing it may be a rather slow process, depending on the OS implementation of memory-mapped files.
To avoid frequent resizes, you can specify the extra space to be reserved at the end of the .spb
file using this option.
create table products(title text, price float) attr_update_reserve = '1M'
POST /cli -d "
create table products(title text, price float) attr_update_reserve = '1M'"
$params = [
'body' => [
'settings' => [
'attr_update_reserve' => '1M'
],
'columns' => [
'title'=>['type'=>'text'],
'price'=>['type'=>'float']
]
],
'index' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);
utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'')
res = await utilsApi.sql('create table products(title text, price float) attr_update_reserve = \'1M\'');
utilsApi.sql("create table products(title text, price float) attr_update_reserve = '1M'");
utilsApi.Sql("create table products(title text, price float) attr_update_reserve = '1M'");
utilsApi.sql("create table test(content text, name string, cat int) attr_update_reserve = '1M'");
apiClient.UtilsAPI.Sql(context.Background()).Body("create table test(content text, name string, cat int) attr_update_reserve = '1M'").Execute()
table products {
attr_update_reserve = 1M
type = rt
path = tbl
rt_field = title
rt_attr_float = price
}
attr_flush_period = 900 # persist updates to disk every 15 minutes
When updating attributes, the changes are first written to an in-memory copy of the attributes. This setting lets you define the interval between flushing these updates to disk. It defaults to 0, which disables periodic flushing, but flushing will still occur at normal shutdown.
Deleting documents is only supported in RT mode for the following table types: real-time tables, percolate tables, and distributed tables composed of such tables.
You can delete existing documents from a table based on either their ID or certain conditions.
Also, bulk deletion is available to delete multiple documents.
Deletion of documents can be accomplished via both SQL and JSON interfaces.
For SQL, the response for a successful operation will indicate the number of rows deleted.
For JSON, the json/delete
endpoint is used. The server will respond with a JSON object indicating whether the operation was successful and the number of rows deleted.
It is recommended to use table truncation instead of deletion to delete all documents from a table, as it is a much faster operation.
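For instance, to remove every document from a table named test in one go, you could run the following (a minimal sketch; see the table truncation section for details):
TRUNCATE TABLE test;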
In this example we delete all documents that match full-text query test document
from the table named test
:
mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 100 | 1000 | 100,201 | 100 |
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 103 | 1003 | 103,204 | 103 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
| 106 | 1006 | 106,207 | 106 |
| 107 | 1007 | 107,208 | 107 |
+------+------+-------------+------+
8 rows in set (0.00 sec)
mysql> DELETE FROM TEST WHERE MATCH ('test document');
Query OK, 2 rows affected (0.00 sec)
mysql> SELECT * FROM TEST;
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 100 | 1000 | 100,201 | 100 |
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 103 | 1003 | 103,204 | 103 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
+------+------+-------------+------+
6 rows in set (0.00 sec)
POST /delete -d '
{
"index":"test",
"query":
{
"match": { "*": "test document" }
}
}'
{
"_index":"test",
"deleted":2,
}
$index->deleteDocuments(new MatchPhrase('test document','*'));
Array(
[_index] => test
[deleted] => 2
)
indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}})
{'deleted': 5, 'id': None, 'index': 'test', 'result': None}
res = await indexApi.delete({"index" : "test", "query": { "match": { "*": "test document" }}});
{"_index":"test","deleted":5}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
query = new HashMap<String,Object>();
query.put("match",new HashMap<String,Object>(){{
put("*","test document");
}});
deleteRequest.index("test").setQuery(query);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
deleted: 5
id: null
result: null
}
Dictionary<string, Object> match = new Dictionary<string, Object>();
match.Add("*", "test document");
Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("match", match);
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", query: query);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
deleted: 5
id: null
result: null
}
res = await indexApi.delete({
index: 'test',
query: { match: { '*': 'test document' } },
});
{"_index":"test","deleted":5}
deleteRequest := manticoreclient.NewDeleteDocumentRequest("test")
matchExpr := map[string]interface{} {"*": "test document"}
deleteQuery := map[string]interface{} {"match": matchExpr }
deleteRequest.SetQuery(deleteQuery)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","deleted":5}
Here we delete a document with id equal to 1 from the table named test:
mysql> DELETE FROM TEST WHERE id=1;
Query OK, 1 rows affected (0.00 sec)
POST /delete -d '
{
"index": "test",
"id": 1
}'
{
"_index": "test",
"_id": 1,
"found": true,
"result": "deleted"
}
$index->deleteDocument(1);
Array(
[_index] => test
[_id] => 1
[found] => true
[result] => deleted
)
indexApi.delete({"index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
res = await indexApi.delete({"index" : "test", "id" : 1});
{"_index":"test","_id":1,"result":"deleted"}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
res = await indexApi.delete({ index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetId(1)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","_id":1,"result":"deleted"}
Here, documents with id matching the listed values are deleted from the table named test:
Note that the delete forms with id=N
or id IN (X,Y)
are the fastest, as they delete documents without performing a search.
Also note that the response contains only the id of the first deleted document in the corresponding _id
field.
DELETE FROM TEST WHERE id IN (1,2);
Query OK, 2 rows affected (0.00 sec)
POST /delete -d '
{
"index":"test",
"id": [1,2]
}'
{
"_index":"test",
"_id":1,
"found":true,
"result":"deleted"
}
$index->deleteDocumentsByIds([1,2]);
Array(
[_index] => test
[_id] => 1
[found] => true
[result] => deleted
)
Manticore SQL allows you to use complex conditions for the DELETE
statement.
For example, here we delete documents that match the full-text query test document and have the attribute mva1 with a value greater than 206, or with mva1 values 100 or 103, from the table named test:
DELETE FROM TEST WHERE MATCH ('test document') AND ( mva1>206 or mva1 in (100, 103) );
SELECT * FROM TEST;
Query OK, 4 rows affected (0.00 sec)
+------+------+-------------+------+
| id | gid | mva1 | mva2 |
+------+------+-------------+------+
| 101 | 1001 | 101,202 | 101 |
| 102 | 1002 | 102,203 | 102 |
| 104 | 1004 | 104,204,205 | 104 |
| 105 | 1005 | 105,206 | 105 |
+------+------+-------------+------+
4 rows in set (0.00 sec)
Here is an example of deleting documents in cluster cluster
's table test
. Note that we must provide the cluster name property along with the table property to delete a row from a table within a replication cluster:
delete from cluster:test where id=100;
POST /delete -d '
{
"cluster": "cluster",
"index": "test",
"id": 100
}'
$index->setCluster('cluster');
$index->deleteDocument(100);
Array(
[_index] => test
[_id] => 100
[found] => true
[result] => deleted
)
indexApi.delete({"cluster":"cluster","index" : "test", "id" : 1})
{'deleted': None, 'id': 1, 'index': 'test', 'result': 'deleted'}
indexApi.delete({"cluster":"cluster_1","index" : "test", "id" : 1})
{"_index":"test","_id":1,"result":"deleted"}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest();
deleteRequest.cluster("cluster").index("test").setId(1L);
indexApi.delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
DeleteDocumentRequest deleteRequest = new DeleteDocumentRequest(index: "test", cluster: "cluster", id: 1);
indexApi.Delete(deleteRequest);
class DeleteResponse {
index: test
_id: 1
result: deleted
}
res = await indexApi.delete({ cluster: 'cluster_1', index: 'test', id: 1 });
{"_index":"test","_id":1,"result":"deleted"}
deleteRequest := manticoresearch.NewDeleteDocumentRequest("test")
deleteRequest.SetCluster("cluster_1")
deleteRequest.SetId(1)
// assuming the same Execute pattern as the other IndexAPI calls shown above
res, _, _ := apiClient.IndexAPI.Delete(context.Background()).DeleteDocumentRequest(*deleteRequest).Execute()
{"_index":"test","_id":1,"result":"deleted"}
You can also perform multiple delete operations in a single call using the /bulk
endpoint. This endpoint only works with data that has Content-Type
set to application/x-ndjson
. The data should be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line should contain exactly one JSON statement and end with a newline \n
and, possibly, a \r
.
POST /bulk
{ "delete" : { "index" : "test", "id" : 1 } }
{ "delete" : { "index" : "test", "query": { "equals": { "int_data" : 20 } } } }
{
"items":
[
{
"bulk":
{
"_index":"test",
"_id":0,
"created":0,
"deleted":2,
"updated":0,
"result":"created",
"status":201
}
}
],
"errors":false
}
$client->bulk([
['delete' => [
'index' => 'test',
'id' => 1
]
],
['delete'=>[
'index' => 'test',
'query' => [
'equals' => ['int_data' => 20]
]
]
]
]);
Array(
[items] => Array
(
[0] => Array
(
[bulk] => Array
(
[_index] => test
[_id] => 0
[created] => 0
[deleted] => 2
[updated] => 0
[result] => created
[status] => 201
)
)
)
[current_line] => 3
[skipped_lines] => 0
[errors] =>
[error] =>
)
docs = [ \
{ "delete" : { "index" : "test", "id": 1 } }, \
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ]
indexApi.bulk('\n'.join(map(json.dumps,docs)))
{
'error': None,
'items': [{u'delete': {u'_index': u'test', u'deleted': 2}}]
}
docs = [
{ "delete" : { "index" : "test", "id": 1 } },
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } } ];
res = await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
String body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
"{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";
indexApi.bulk(body);
class BulkResponse {
items: [{delete={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
error: null
additionalProperties: {errors=false}
}
string body = "{ "delete" : { "index" : "test", "id": 1 } } "+"\n"+
"{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }"+"\n";
indexApi.Bulk(body);
class BulkResponse {
items: [{delete={_index=test, _id=0, created=false, deleted=2, result=created, status=200}}]
error: null
additionalProperties: {errors=false}
}
docs = [
{ "delete" : { "index" : "test", "id": 1 } },
{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }
];
res = await indexApi.bulk(
docs.map((e) => JSON.stringify(e)).join("\n")
);
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
docs := []string {
`{ "delete" : { "index" : "test", "id": 1 } }`,
`{ "delete" : { "index" : "test", "query": { "equals": { "int_data": 20 } } } }`,
}
body := strings.Join(docs, "\n")
resp, httpRes, err := apiClient.IndexAPI.Bulk(context.Background()).Body(body).Execute()
{"items":[{"delete":{"_index":"test","deleted":2}}],"errors":false}
Manticore supports basic transactions for deleting and inserting data into real-time and percolate tables, except when attempting to write to a distributed table which includes a real-time or percolate table. Each change to a table is first saved in an internal changeset and then actually committed to the table. By default, each command is wrapped in an individual automatic transaction, making it transparent: you simply 'insert' something and can see the inserted result after it completes, without worrying about transactions. However, this behavior can be explicitly managed by starting and committing transactions manually.
Transactions are supported for the following commands:
Transactions are not supported for:
Please note that transactions in Manticore do not aim to provide isolation. The purpose of transactions in Manticore is to allow you to accumulate multiple writes and execute them all at once upon commit, or to roll them all back if necessary. Transactions are integrated with binary log for durability and consistency.
SET AUTOCOMMIT = {0 | 1}
SET AUTOCOMMIT
controls the autocommit mode in the active session. AUTOCOMMIT
is set to 1 by default. With the default setting, you don't have to worry about transactions, as every statement that makes any changes to any table is implicitly wrapped in a separate transaction. Setting it to 0 allows you to manage transactions manually, meaning they will not be visible until you explicitly commit them.
Transactions are limited to a single real-time or percolate table and are also limited in size. They are atomic, consistent, overly isolated, and durable. Overly isolated means that the changes are not only invisible to concurrent transactions but even to the current session itself.
START TRANSACTION | BEGIN
COMMIT
ROLLBACK
The BEGIN
statement (or its START TRANSACTION
alias) forcibly commits any pending transaction, if present, and starts a new one.
The COMMIT
statement commits the current transaction, making all its changes permanent.
The ROLLBACK
statement rolls back the current transaction, canceling all its changes.
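For instance, a rolled-back insert never becomes visible (a minimal sketch, assuming the same indexrt table as in the examples below):
begin;
insert into indexrt (id, content, title, channel_id, published) values (5, 'aa', 'bb', 1, 1);
rollback;
select * from indexrt where id=5;
Empty set (0.00 sec)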
When using one of the /bulk
JSON endpoints ( bulk insert, bulk replace, bulk delete ), you can force a batch of documents to be committed by adding an empty line after them.
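Here is a sketch of such a request (with hypothetical documents): the empty line after the first two statements forces them to be committed as one batch before the third statement is processed.
POST /bulk
{ "insert" : { "index" : "products", "id" : 21, "doc": { "title" : "doc one" } } }
{ "insert" : { "index" : "products", "id" : 22, "doc": { "title" : "doc two" } } }

{ "insert" : { "index" : "products", "id" : 23, "doc": { "title" : "doc three" } } }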
insert into indexrt (id, content, title, channel_id, published) values (1, 'aa', 'blabla', 1, 10);
Query OK, 1 rows affected (0.00 sec)
select * from indexrt where id=1;
+------+------------+-----------+--------+
| id | channel_id | published | title |
+------+------------+-----------+--------+
| 1 | 1 | 10 | blabla |
+------+------------+-----------+--------+
1 row in set (0.00 sec)
The inserted value is immediately visible in the following 'select' statement.
set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (3, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (4, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
select * from indexrt where id=3;
Empty set (0.01 sec)
select * from indexrt where id=4;
Empty set (0.00 sec)
In this case, changes are NOT automatically committed. As a result, the insertions are not visible, even in the same session, since they have not been committed. Also, despite the absence of a BEGIN
statement, a transaction is implicitly started.
To make the changes visible, you need to commit the transaction:
commit;
Query OK, 0 rows affected (0.00 sec)
select * from indexrt where id=4;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 4 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.00 sec)
select * from indexrt where id=3;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 3 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.00 sec)
After the commit statement, the insertions are visible in the table.
By using BEGIN
and COMMIT
, you can define the bounds of a transaction explicitly, so there's no need to worry about autocommit in this case.
begin;
Query OK, 0 rows affected (0.00 sec)
insert into indexrt (id, content, title, channel_id, published) values (2, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)
select * from indexrt where id=2;
Empty set (0.01 sec)
commit;
Query OK, 0 rows affected (0.01 sec)
select * from indexrt where id=2;
+------+------------+-----------+-------+
| id | channel_id | published | title |
+------+------------+-----------+-------+
| 2 | 1 | 1 | bb |
+------+------------+-----------+-------+
1 row in set (0.01 sec)