¶ 15

Functions

¶ 15.1

Mathematical functions

ABS()

Returns the absolute value of the argument.

ATAN2()

Returns the arctangent function of two arguments, expressed in radians.

BITDOT()

BITDOT(mask, w0, w1, ...) returns the sum of products of each bit of a mask multiplied by its weight. bit0*w0 + bit1*w1 + ...

CEIL()

Returns the smallest integer value greater than or equal to the argument.

COS()

Returns the cosine of the argument.

CRC32()

Returns the CRC32 value of a string argument.

EXP()

Returns the exponent of the argument (e=2.718... to the power of the argument).

FIBONACCI()

Returns the N-th Fibonacci number, where N is the integer argument. That is, arguments of 0 and up will generate the values 0, 1, 1, 2, 3, 5, 8, 13 and so on. Note that the computations are done using 32-bit integer math and thus numbers 48th and up will be returned modulo 2^32.

FLOOR()

Returns the largest integer value lesser than or equal to the argument.

GREATEST()

GREATEST(attr_json.some_array) function takes a JSON array as the argument, and returns the greatest value in that array. Also works for MVA.

IDIV()

Returns the result of an integer division of the first argument by the second argument. Both arguments must be of an integer type.

LEAST()

LEAST(attr_json.some_array) function takes a JSON array as the argument, and returns the least value in that array. Also works for MVA.

LN()

Returns the natural logarithm of the argument (with the base of e=2.718...).

LOG10()

Returns the common logarithm of the argument (with the base of 10).

LOG2()

Returns the binary logarithm of the argument (with the base of 2).

MAX()

Returns the larger of two arguments.

MIN()

Returns the smaller of two arguments.

POW()

Returns the first argument raised to the power of the second argument.

RAND()

Returns a random float between 0 and 1. It can optionally accept a seed, which can be a constant integer or an integer attribute's name.

If you use a seed, keep in mind that it resets rand()'s starting point separately for each plain table, RT disk, RAM chunk, or pseudo shard. Therefore, queries to a distributed table in any form can return multiple identical random values.

SIN()

Returns the sine of the argument.

SQRT()

Returns the square root of the argument.

¶ 15.2

Searching and ranking functions

BM25A()

BM25A(k1,b) returns the exact BM25A() value. Requires the expr ranker and enabled index_field_lengths. Parameters k1 and b must be floats.

BM25F()

BM25F(k1, b, {field=weight, ...}) returns the exact BM25F() value and requires index_field_lengths to be enabled. The expr ranker is also necessary. Parameters k1 and b must be floats.

EXIST()

Substitutes non-existent columns with default values. It returns either the value of an attribute specified by 'attr-name', or the 'default-value' if that attribute does not exist. STRING or MVA attributes are not supported. This function is useful when searching through multiple tables with different schemas.

SELECT *, EXIST('gid', 6) as cnd FROM i1, i2 WHERE cnd>5

MIN_TOP_SORTVAL()

Returns the sort key value of the worst-ranked element in the current top-N matches if the sort key is a float, and 0 otherwise.

MIN_TOP_WEIGHT()

Returns the weight of the worst-ranked element in the current top-N matches.

PACKEDFACTORS()

PACKEDFACTORS() can be used in queries to display all calculated weighting factors during matching or to provide a binary attribute for creating a custom ranking UDF. This function only works if the expression ranker is specified and the query is not a full scan; otherwise, it returns an error. PACKEDFACTORS() can take an optional argument that disables ATC ranking factor calculation: PACKEDFACTORS({no_atc=1}). Calculating ATC significantly slows down query processing, so this option can be useful if you need to see the ranking factors but don't require ATC. PACKEDFACTORS() can also output in JSON format: PACKEDFACTORS({json=1}). The respective outputs in either key-value pair or JSON format are shown below. (Note that the examples below are wrapped for readability; actual returned values would be single-line.)

mysql> SELECT id, PACKEDFACTORS() FROM test1
    -> WHERE MATCH('test one') OPTION ranker=expr('1') \G

*************************** 1\. row ***************************
             id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
    field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
        min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
        min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
        exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
    word0=(tf=1, idf=-0.062982),
    word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)

mysql> SELECT id, PACKEDFACTORS({json=1}) FROM test1
    -> WHERE MATCH('test one') OPTION ranker=expr('1') \G

*************************** 1\. row ***************************
                     id: 1
packedfactors({json=1}):
{

    "bm25": 569,
    "bm25a": 0.617197,
    "field_mask": 2,
    "doc_word_count": 2,
    "fields": [
        {
            "lcs": 1,
            "hit_count": 2,
            "word_count": 2,
            "tf_idf": 0.152356,
            "min_idf": -0.062982,
            "max_idf": 0.215338,
            "sum_idf": 0.152356,
            "min_hit_pos": 4,
            "min_best_span_pos": 4,
            "exact_hit": 0,
            "max_window_hits": 1,
            "min_gaps": 2,
            "exact_order": 1,
            "lccs": 1,
            "wlccs": 0.215338,
            "atc": -0.003974
        }
    ],
    "words": [
        {
            "tf": 1,
            "idf": -0.062982
        },
        {
            "tf": 1,
            "idf": 0.215338
        }
    ]

}
1 row in set (0.01 sec)

This function can be used to implement custom ranking functions in UDFs, as in:

SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
FROM my_index
WHERE match('hello')
ORDER BY r DESC
OPTION ranker=expr('1');

Where CUSTOM_RANK() is a function implemented in a UDF. It should declare a SPH_UDF_FACTORS structure (defined in sphinxudf.h), initialize this structure, unpack the factors into it before usage, and deinitialize it afterwards, as follows:

SPH_UDF_FACTORS factors;
sphinx_factors_init(&factors);
sphinx_factors_unpack((DWORD*)args->arg_values[0], &factors);
// ... can use the contents of factors variable here ...
sphinx_factors_deinit(&factors);

PACKEDFACTORS() data is available at all query stages, not just during the initial matching and ranking pass. This enables another particularly interesting application of PACKEDFACTORS(): re-ranking.

In the example above, we used an expression-based ranker with a dummy expression and sorted the result set by the value computed by our UDF. In other words, we used the UDF to rank all our results. Now, let's assume for the sake of an example that our UDF is extremely expensive to compute, with a throughput of only 10,000 calls per second. If our query matches 1,000,000 documents, we would want to use a much simpler expression to do most of our ranking in order to maintain reasonable performance. Then, we would apply the expensive UDF to only a few top results, say, the top 100 results. In other words, we would build the top 100 results using a simpler ranking function and then re-rank those with a more complex one. This can be done with subselects:

SELECT * FROM (
    SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
    FROM my_index WHERE match('hello')
    OPTION ranker=expr('sum(lcs)*1000+bm25')
    ORDER BY WEIGHT() DESC
    LIMIT 100
) ORDER BY r DESC LIMIT 10

In this example, the expression-based ranker is called for every matched document to compute WEIGHT(), so it gets called 1,000,000 times. However, the UDF computation can be postponed until the outer sort, and it will only be performed for the top 100 matches by WEIGHT(), according to the inner limit. This means the UDF will only be called 100 times. Finally, the top 10 matches by UDF value are selected and returned to the application.

For reference, in a distributed setup, the PACKEDFACTORS() data is sent from the agents to the master node in binary format. This makes it technically feasible to implement additional re-ranking passes on the master node if needed.

When used in SQL but not called from any UDFs, the result of PACKEDFACTORS() is formatted as plain text, which can be used to manually assess the ranking factors. Note that this feature is not currently supported by the Manticore API.

REMOVE_REPEATS()

REMOVE_REPEATS ( result_set, column, offset, limit ) - removes repeated adjusted rows with the same 'column' value.

SELECT REMOVE_REPEATS((SELECT * FROM dist1), gid, 0, 10)

WEIGHT()

The WEIGHT() function returns the calculated matching score. If no ordering is specified, the result is sorted in descending order by the score provided by WEIGHT(). In this example, we order first by weight and then by an integer attribute.

The search above performs a simple matching, where all words need to be present. However, we can do more (and this is just a simple example):

mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
+------+------+-------------------------------------+---------------------------+----------+
| id   | gid  | title                               | content                   | weight() |
+------+------+-------------------------------------+---------------------------+----------+
|    1 |   10 | List of HP business laptops         | Elitebook Probook         |     2397 |
|    2 |   10 | List of Dell business laptops       | Latitude Precision Vostro |     2397 |
|    3 |   20 | List of Dell gaming laptops         | Inspirion Alienware       |     2375 |
|    5 |   30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook          |     2375 |
+------+------+-------------------------------------+---------------------------+----------+
4 rows in set (0.00 sec)


mysql> SHOW META;
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| total          | 4        |
| total_found    | 4        |
| total_relation | eq       |
| time           | 0.000    |
| keyword[0]     | list     |
| docs[0]        | 5        |
| hits[0]        | 5        |
| keyword[1]     | of       |
| docs[1]        | 4        |
| hits[1]        | 4        |
| keyword[2]     | business |
| docs[2]        | 2        |
| hits[2]        | 2        |
| keyword[3]     | laptops  |
| docs[3]        | 5        |
| hits[3]        | 5        |
+----------------+----------+
16 rows in set (0.00 sec)

Here, we search for four words, but a match can occur even if only three of the four words are found. The search will rank documents containing all words higher.

ZONESPANLIST()

The ZONESPANLIST() function returns pairs of matched zone spans. Each pair contains the matched zone span identifier, a colon, and the order number of the matched zone span. For example, if a document reads <emphasis role="bold"><i>text</i> the <i>text</i></emphasis>, and you query for 'ZONESPAN:(i,b) text', then ZONESPANLIST() will return the string "1:1 1:2 2:1", meaning that the first zone span matched "text" in spans 1 and 2, and the second zone span in span 1 only.

QUERY()

QUERY() returns the current search query. QUERY() is a postlimit expression and is intended to be used with SNIPPET().

Table functions are a mechanism for post-query result set processing. Table functions take an arbitrary result set as input and return a new, processed set as output. The first argument should be the input result set, but a table function can optionally take and handle more arguments. Table functions can completely change the result set, including the schema. Currently, only built-in table functions are supported. Table functions work for both outer SELECT and nested SELECT.

¶ 15.3

Type Casting Functions

Type casting comprises three principal actions: conversion, reinterpretation, and promotion.

Conversion. This refers to the process of changing the data type of a value to another data type. This involves additional computations and is exclusively performed by the TO_STRING() function.
Reinterpretation. This involves treating the binary data representing a value as if it were of a different data type, without actually changing the underlying data. This is handled by SINT(), doesn't involve extra computations; instead, it merely reinterprets existing data.
Promotion. This refers to the process of converting a value to a "larger" or more precise data type. It doesn't require extra computation either; it merely requests the argument to deliver a value of a different type. Only JSON fields and a few other functions can promote their values to integers. If an argument cannot yield a value of a different type, the promotion will fail. For instance, the TIMEDIFF() function usually returns a string, but can also return a number. So, BIGINT(TIMEDIFF(1,2)) will execute successfully, compelling TIMEDIFF() to supply an integer value. Conversely, DATE_FORMAT() solely returns strings and can't yield a number, meaning that BIGINT(DATE_FORMAT(...)) will fail.

BIGINT()

This function promotes an integer argument to a 64-bit type, leaving floating-point arguments untouched. It's designed to ensure the evaluation of specific expressions (such as a*b) in 64-bit mode, even if all arguments are 32-bit.

DOUBLE()

The DOUBLE() function promotes its argument to a floating-point type. This is designed to help enforce the evaluation of numeric JSON fields.

INTEGER()

The INTEGER() function promotes its argument to a 64-bit signed type. This is designed to enforce the evaluation of numeric JSON fields.

TO_STRING()

This function forcefully converts its argument to a string type.

UINT()

The UINT() function promotes its argument to a 32-bit unsigned integer type.

UINT64()

The UINT64() function promotes its argument to a 64-bit unsigned integer type.

SINT()

The SINT() function forcefully reinterprets its 32-bit unsigned integer argument as signed and extends it to a 64-bit type (since the 32-bit type is unsigned). For instance, 1-2 ordinarily evaluates to 4294967295, but SINT(1-2) evaluates to -1.

¶ 15.4

Arrays and conditions functions

ALL()

ALL(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for all elements in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

ALL() with json

select *, ALL(x>0 AND x<4 FOR x IN j.ar) from tbl

+------+--------------+--------------------------------+
| id   | j            | all(x>0 and x<4 for x in j.ar) |
+------+--------------+--------------------------------+
|    1 | {"ar":[1,3]} |                              1 |
|    2 | {"ar":[3,7]} |                              0 |
+------+--------------+--------------------------------+
2 rows in set (0.00 sec)

ALL() with json ex. 2

select *, ALL(x>0 AND x<4 FOR x IN j.ar) cond from tbl where cond=1

+------+--------------+------+
| id   | j            | cond |
+------+--------------+------+
|    1 | {"ar":[1,3]} |    1 |
+------+--------------+------+
1 row in set (0.00 sec)

ALL(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if all values from the MVA attribute are found among the compared values.

ALL() with MVA

select * from tbl where all(m) >= 1

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

ALL() with MVA and IN()

select * from tbl where all(m) in (1, 3, 7, 10)

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ALL(); use ALL(<mva>) NOT IN() instead.

ALL() with MVA and NOT IN()

select * from tbl where all(m) not in (2, 4)

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

ALL(string list) is a special operation for filtering string tags.

If all of the words enumerated as arguments of ALL() are present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

ALL() with strings

select * from tbl where tags all('bug', 'release')

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

ALL() with strings and NOT

mysql> select * from tbl

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

mysql> select * from tbl where tags not all('bug')

Empty set (0.00 sec)

ANY()

ANY(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for any element in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

ANY() with json

select *, ANY(x>5 AND x<10 FOR x IN j.ar) from tbl

+------+--------------+---------------------------------+
| id   | j            | any(x>5 and x<10 for x in j.ar) |
+------+--------------+---------------------------------+
|    1 | {"ar":[1,3]} |                               0 |
|    2 | {"ar":[3,7]} |                               1 |
+------+--------------+---------------------------------+
2 rows in set (0.00 sec)

ANY() with json ex. 2

select *, ANY(x>5 AND x<10 FOR x IN j.ar) cond from tbl where cond=1

+------+--------------+------+
| id   | j            | cond |
+------+--------------+------+
|    2 | {"ar":[3,7]} |    1 |
+------+--------------+------+
1 row in set (0.00 sec)

ANY(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if any of the MVA values is found among the compared values.

When comparing an array using IN(), ANY() is assumed by default if not otherwise specified, but a warning will be issued regarding the missing constructor.

ANY() with MVA

mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.01 sec)

mysql> select * from tbl where any(m) > 5

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)

ANY() with MVA and IN()

select * from tbl where any(m) in (1, 7, 10)

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ANY(); use <mva> NOT IN() instead or ANY(<mva>) NOT IN().

ANY() with MVA and NOT IN()

mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from tbl where any(m) not in (1, 3, 5)

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)

ANY(string list) is a special operation for filtering string tags.

If any of the words enumerated as arguments of ANY() is present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

ANY() with strings

select * from tbl where tags any('bug', 'feature')

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

ANY() with strings and NOT

select * from tbl
--------------

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

--------------
select * from tbl where tags not any('feature', 'priority_low')
--------------

+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
+------+---------------------------+
1 row in set (0.01 sec)

CONTAINS()

CONTAINS(polygon, x, y) checks whether the (x,y) point is within the given polygon, and returns 1 if true, or 0 if false. The polygon has to be specified using either the POLY2D() function. The former function is intended for "small" polygons, meaning less than 500 km (300 miles) a side, and it doesn't take into account the Earth's curvature for speed. For larger distances, you should use GEOPOLY2D, which tessellates the given polygon in smaller parts, accounting for the Earth's curvature.

IF()

The behavior of IF() is slightly different from its MySQL counterpart. It takes 3 arguments, checks whether the 1st argument is equal to 0.0, returns the 2nd argument if it is not zero, or the 3rd one when it is. Note that unlike comparison operators, IF() does not use a threshold! Therefore, it's safe to use comparison results as its 1st argument, but arithmetic operators might produce unexpected results. For instance, the following two calls will produce different results even though they are logically equivalent:

IF()

IF ( sqrt(3)*sqrt(3)-3<>0, a, b )
IF ( sqrt(3)*sqrt(3)-3, a, b )

In the first case, the comparison operator <> will return 0.0 (false) due to a threshold, and IF() will always return ** as a result. In the second case, the same sqrt(3)*sqrt(3)-3 expression will be compared with zero without a threshold by the IF() function itself. However, its value will be slightly different from zero due to limited floating-point calculation precision. Because of this, the comparison with 0.0 done by IF() will not pass, and the second variant will return 'a' as a result.

HISTOGRAM()

HISTOGRAM(expr, {hist_interval=size, hist_offset=value}) takes a bucket size and returns the bucket number for the value. The key function is:

key_of_the_bucket = interval + offset * floor ( ( value - offset ) / interval )

The histogram argument interval must be positive. The histogram argument offset must be positive and less than interval. It is used in aggregation, FACET, and grouping.

Example:

HISTOGRAM()

SELECT COUNT(*),
HISTOGRAM(price, {hist_interval=100}) as price_range
FROM facets
GROUP BY price_range ORDER BY price_range ASC;

IN()

IN(expr,val1,val2,...) takes 2 or more arguments and returns 1 if the 1st argument (expr) is equal to any of the other arguments (val1..valN), or 0 otherwise. Currently, all the checked values (but not the expression itself) are required to be constant. The constants are pre-sorted, and binary search is used, so IN() even against a large arbitrary list of constants will be very quick. The first argument can also be an MVA attribute. In that case, IN() will return 1 if any of the MVA values are equal to any of the other arguments. IN() also supports IN(expr,@uservar) syntax to check whether the value belongs to the list in the given global user variable. The first argument can be a JSON attribute.

INDEXOF()

INDEXOF(cond FOR var IN json.array) function iterates through all elements in the array and returns the index of the first element for which 'cond' is true, and -1 if 'cond' is false for every element in the array.

INTERVAL()

INTERVAL(expr,point1,point2,point3,...) takes 2 or more arguments and returns the index of the argument that is less than the first argument: it returns 0 if expr<point1, 1 if point1<=expr<point2, and so on. It is required that point1<point2<...<pointN for this function to work correctly.

LENGTH()

LENGTH(attr_mva) function returns the number of elements in an MVA set. It works with both 32-bit and 64-bit MVA attributes. LENGTH(attr_json) returns the length of a field in JSON. The return value depends on the type of field. For example, LENGTH(json_attr.some_int) always returns 1, and LENGTH(json_attr.some_array) returns the number of elements in the array. LENGTH(string_expr) function returns the length of the string resulting from an expression.
TO_STRING() must enclose the expression, regardless of whether the expression returns a non-string or it's simply a string attribute.

RANGE()

RANGE(expr, {range_from=value,range_to=value}) takes a set of ranges and returns the bucket number for the value.
This expression includes the range_from value and excludes the range_to value for each range. A range can be open - having only the range_from or only the range_to value. It is used in aggregation, FACET, and grouping.

Example:

RANGE()

SELECT COUNT(*),
RANGE(price, {range_to=150},{range_from=150,range_to=300},{range_from=300}) price_range
FROM facets
GROUP BY price_range ORDER BY price_range ASC;

REMAP()

REMAP(condition, expression, (cond1, cond2, ...), (expr1, expr2, ...)) function allows you to make some exceptions to expression values depending on condition values. The condition expression should always result in an integer, while the expression can result in an integer or float.

Example:

REMAP()

SELECT id, size, REMAP(size, 15, (5,6,7,8), (1,1,2,2)) s
FROM products
ORDER BY s ASC;

Another example

SELECT REMAP(userid, karmapoints, (1, 67), (999, 0)) FROM users;
SELECT REMAP(id%10, salary, (0), (0.0)) FROM employes;

This will put documents with sizes 5 and 6 first, followed by sizes 7 and 8. In case there's an original value not listed in the array (e.g. size 10), it will default to 15, and in this case, will be placed at the end.

¶ 15.5

Date and time functions

Note, that CURTIME(), UTC_TIME(), UTC_TIMESTAMP(), and TIMEDIFF() can be promoted to numeric types using arbitrary conversion functions such as BIGINT(), DOUBLE(), etc.

NOW()

Returns the current timestamp as an INTEGER.

SQL

select NOW();

+------------+
| NOW()      |
+------------+
| 1615788407 |
+------------+

CURTIME()

Returns the current time in the local timezone in hh:ii:ss format.

SQL

select CURTIME();

+-----------+
| CURTIME() |
+-----------+
| 07:06:30  |
+-----------+

CURDATE()

Returns the current date in the local timezone in YYYY-MM-DD format.

SQL

select curdate();

+------------+
| curdate()  |
+------------+
| 2023-08-02 |
+------------+

UTC_TIME()

Returns the current time in UTC timezone in hh:ii:ss format.

SQL

select UTC_TIME();

+------------+
| UTC_TIME() |
+------------+
| 06:06:18   |
+------------+

UTC_TIMESTAMP()

Returns the current time in UTC timezone in YYYY-MM-DD hh:ii:ss format.

SQL

select UTC_TIMESTAMP();

+---------------------+
| UTC_TIMESTAMP()     |
+---------------------+
| 2021-03-15 06:06:03 |
+---------------------+

SECOND()

Returns the integer second (in 0..59 range) from a timestamp argument, according to the current timezone.

SQL

select second(now());

+---------------+
| second(now()) |
+---------------+
| 52            |
+---------------+

MINUTE()

Returns the integer minute (in 0..59 range) from a timestamp argument, according to the current timezone.

SQL

select minute(now());

+---------------+
| minute(now()) |
+---------------+
| 5             |
+---------------+

HOUR()

Returns the integer hour (in 0..23 range) from a timestamp argument, according to the current timezone.

SQL

select hour(now());

+-------------+
| hour(now()) |
+-------------+
| 7           |
+-------------+

DAY()

Returns the integer day of the month (in 1..31 range) from a timestamp argument, according to the current timezone.

SQL

select day(now());

+------------+
| day(now()) |
+------------+
| 15         |
+------------+

MONTH()

Returns the integer month (in 1..12 range) from a timestamp argument, according to the current timezone.

SQL

select month(now());

+--------------+
| month(now()) |
+--------------+
| 3            |
+--------------+

QUARTER()

Returns the integer quarter of the year (in 1..4 range) from a timestamp argument, according to the current timezone.

SQL

select quarter(now());

+----------------+
| quarter(now()) |
+----------------+
| 2              |
+----------------+

YEAR()

Returns the integer year (in 1969..2038 range) from a timestamp argument, according to the current timezone.

SQL

select year(now());

+-------------+
| year(now()) |
+-------------+
| 2024        |
+-------------+

DAYNAME()

Returns the weekday name for a given timestamp argument, according to the current timezone.

SQL

select dayname(now());

+----------------+
| dayname(now()) |
+----------------+
| Wednesday      |
+----------------+

MONTHNAME()

Returns the name of the month for a given timestamp argument, according to the current timezone.

SQL

select monthname(now());

+------------------+
| monthname(now()) |
+------------------+
| August           |
+------------------+

DAYOFWEEK()

Returns the integer weekday index (in 1..7 range) for a given timestamp argument, according to the current timezone.
Note that the week starts on Sunday.

SQL

select dayofweek(now());

+------------------+
| dayofweek(now()) |
+------------------+
| 5                |
+------------------+

DAYOFYEAR()

Returns the integer day of the year (in 1..366 range) for a given timestamp argument, according to the current timezone.

SQL

select dayofyear(now());

+------------------+
| dayofyear(now()) |
+------------------+
|              214 |
+------------------+

YEARWEEK()

Returns the integer year and the day code of the first day of current week (in 1969001..2038366 range) for a given timestamp argument, according to the current timezone.

SQL

select yearweek(now());

+-----------------+
| yearweek(now()) |
+-----------------+
|         2023211 |
+-----------------+

YEARMONTH()

Returns the integer year and month code (in 196912..203801 range) from a timestamp argument, according to the current timezone.

SQL

select yearmonth(now());

+------------------+
| yearmonth(now()) |
+------------------+
| 202103           |
+------------------+

YEARMONTHDAY()

Returns the integer year, month, and date code (ranging from 19691231 to 20380119) based on the current timezone.

SQL

select yearmonthday(now());

+---------------------+
| yearmonthday(now()) |
+---------------------+
| 20210315            |
+---------------------+

TIMEDIFF()

Calculates the difference between two timestamps in the format hh:ii:ss.

SQL

select timediff(1615787586, 1613787583);

+----------------------------------+
| timediff(1615787586, 1613787583) |
+----------------------------------+
| 555:33:23                        |
+----------------------------------+

DATEDIFF()

Calculates the number of days between two given timestamps.

SQL

select datediff(1615787586, 1613787583);

+----------------------------------+
| datediff(1615787586, 1613787583) |
+----------------------------------+
|                               23 |
+----------------------------------+

DATE()

Formats the date part from a timestamp argument as a string in YYYY-MM-DD format.

SQL

select date(now());

+-------------+
| date(now()) |
+-------------+
| 2023-08-02  |
+-------------+

TIME()

Formats the time part from a timestamp argument as a string in HH:MM:SS format.

SQL

select time(now());

+-------------+
| time(now()) |
+-------------+
| 15:21:27    |
+-------------+

DATE_FORMAT()

Returns a formatted string based on the provided date and format arguments. The format argument uses the same specifiers as the strftime function. For convenience, here are some common format specifiers:

%Y - Four-digit year
%m - Two-digit month (01-12)
%d - Two-digit day of the month (01-31)
%H - Two-digit hour (00-23)
%M - Two-digit minute (00-59)
%S - Two-digit second (00-59)
%T - Time in 24-hour format (%H:%M:%S)

Note that this is not a complete list of the specifiers. Please consult the documentation for strftime() for your operating system to get the full list.

SQL

SELECT DATE_FORMAT(NOW(), 'year %Y and time %T');

+------------------------------------------+
| DATE_FORMAT(NOW(), 'year %Y and time %T') |
+------------------------------------------+
| year 2023 and time 11:54:52              |
+------------------------------------------+

This example formats the current date and time, displaying the four-digit year and the time in 24-hour format.

DATE_HISTOGRAM()

DATE_HISTOGRAM(expr, {calendar_interval='unit_name'}) Takes a bucket size as a unit name and returns the bucket number for the value. Values are rounded to the closest bucket. The key function is:

key_of_the_bucket = interval * floor ( value / interval )

Intervals are specified using the unit name, such as week or as a single unit like 1M. Multiple units such as 2w are not supported.

The valid intervals are:

minute, 1m
hour, 1h
day, 1d
week, 1w (a week is the interval between the start day of the week, hour, minute, second and the next week but the same day and time of the week)
month, 1M
year, 1y (a year is the interval between the start day of the month, time and the next year but the same day of the month, time)

Used in aggregation, FACET, and grouping.

Example:

SELECT COUNT(*),
DATE_HISTOGRAM(tm, {calendar_interval='month'}) AS months
FROM facets
GROUP BY months ORDER BY months ASC;

DATE_RANGE()

DATE_RANGE(expr, {range_from='date_math', range_to='date_math'}) takes a set of ranges and returns the bucket number for the value.
The expression includes the range_from value and excludes the range_to value for each range. The range can be open - having only the range_from or only the range_to value.
The difference between this and the RANGE() function is that the range_from and range_to values can be expressed in Date math expressions.

Used in aggregation, FACET, and grouping.

Example:

SELECT COUNT(*),
DATE_RANGE(tm, {range_to='2017||+2M/M'},{range_from='2017||+2M/M',range_to='2017||+5M/M'},{range_from='2017||+5M/M'}) AS points
FROM idx_dates
GROUP BY points ORDER BY points ASC;

Date math lets you work with dates and times directly in your searches. It's especially useful for handling data that changes over time. With date math, you can easily do things like find entries from a certain period, analyze data trends, or manage when information should be removed. It simplifies working with dates by letting you add or subtract time from a given date, round dates to the nearest time unit, and more, all within your search queries.

To use date math, you start with a base date, which can be:
- now for the current date and time,
- or a specific date string ending with ||.

Then, you can modify this date with operations like:
- +1y to add one year,
- -1h to subtract one hour,
- /m to round to the nearest month.

You can use these units in your operations:
- s for seconds,
- m for minutes,
- h (or H) for hours,
- d for days,
- w for weeks,
- M for months,
- y for years.

Here are some examples of how you might use date math:
- now+4h means four hours from now.
- now-2d/d is the time two days ago, rounded to the nearest day.
- 2010-04-20||+2M/d is June 20, 2010, rounded to the nearest day.

¶ 15.6

Geo spatial functions

GEODIST()

GEODIST(lat1, lon1, lat2, lon2, [...]) function calculates the geosphere distance between two points specified by their coordinates. Note that by default, both latitudes and longitudes must be in radians, and the result will be in meters. You can use arbitrary expressions for any of the four coordinates. An optimized path will be chosen when one pair of arguments directly refers to a pair of attributes, and the other one is constant.

GEODIST() also accepts an optional 5th argument, allowing you to easily convert between input and output units and select the specific geodistance formula to use. The complete syntax and a few examples are as follows:

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

GEODIST(40.7643929, -73.9997683, 40.7642578, -73.9994565, {in=degrees, out=feet})

GEODIST(51.50, -0.12, 29.98, 31.13, {in=deg, out=mi})

The known options and their values are:

in = {deg | degrees | rad | radians}, specifies the input units;
out = {m | meters | km | kilometers | ft | feet | mi | miles}, specifies the output units;
method = {adaptive | haversine}, specifies the geodistance calculation method.

The default method is "adaptive". It is a well-optimized implementation that is both more precise and much faster at all times than "haversine".

GEOPOLY2D()

GEOPOLY2D(lat1,lon1,lat2,lon2,lat3,lon3...) creates a polygon to be used with the CONTAINS() function. This function takes into account the Earth's curvature by tessellating the polygon into smaller ones, and should be used for larger areas. For small areas, the POLY2D() function can be used instead. The function expects coordinates to be pairs of latitude/longitude coordinates in degrees; if radians are used, it will give the same result as POLY2D().

POLY2D()

POLY2D(x1,y1,x2,y2,x3,y3...) creates a polygon to be used with the CONTAINS() function. This polygon assumes a flat Earth, so it should not be too large; for large areas, the GEOPOLY2D() function, which takes Earth's curvature into consideration, should be used.

¶ 15.7

String functions

CONCAT()

Concatenates two or more strings into one. Non-string arguments must be explicitly converted to string using the TO_STRING() function.

CONCAT(TO_STRING(float_attr), ',', TO_STRING(int_attr), ',', title)

LEVENSHTEIN()

LEVENSHTEIN ( pattern, source, {normalize=0, length_delta=0}) returns number (Levenshtein distance) of single-character edits (insertions, deletions or substitutions) between pattern and source strings required to make in pattern to make it source.

pattern, source - constant string, string field name, JSON field name, or any expression that produces a string (like e.g., SUBSTRING_INDEX())
normalize - option to return the distance as a float number in the range [0.0 - 1.0], where 0.0 is an exact match, and 1.0 is the maximum difference. The default value is 0, meaning not to normalize and provide the result as an integer.
length_delta - skips Levenshtein distance calculation and returns max(strlen(pattern), strlen(source)) if the option is set and the lengths of the strings differ by more than the length_delta value. The default value is 0, meaning to calculate Levenshtein distance for any input strings. This option can be useful when checking mostly similar strings.

SELECT LEVENSHTEIN('gily', attr1) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC, dist ASC;
SELECT LEVENSHTEIN('gily', j.name, {length_delta=6}) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC;
SELECT LEVENSHTEIN(title, j.name, {normalize=1}) AS dist, WEIGHT() AS w FROM test WHERE MATCH ('test') ORDER BY w DESC, dist ASC;

REGEX()

The REGEX(attr,expr) function returns 1 if a regular expression matches the attribute's string, and 0 otherwise. It works with both string and JSON attributes.

SELECT REGEX(content, 'box?') FROM test;
SELECT REGEX(j.color, 'red | pink') FROM test;

Expressions should adhere to the RE2 syntax. To perform a case-insensitive search, for instance, you can use:

SELECT REGEX(content, '(?i)box') FROM test;

SNIPPET()

The SNIPPET() function can be used to highlight search results within a given text. The first two arguments are: the text to be highlighted, and a query. Options can be passed to the function as the third, fourth, and so on arguments. SNIPPET() can obtain the text for highlighting directly from the table. In this case, the first argument should be the field name:

SELECT SNIPPET(body, QUERY()) FROM myIndex WHERE MATCH('my.query')

In this example, the QUERY() expression returns the current full-text query. SNIPPET() can also highlight non-indexed text:

mysql  SELECT id, SNIPPET('text to highlight', 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

Additionally, it can be used to highlight text fetched from other sources using a User-Defined Function (UDF):

SELECT id, SNIPPET(myUdf(id), 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

In this context, myUdf() is a User-Defined Function (UDF) that retrieves a document by its ID from an external storage source. The SNIPPET() function is considered a "post limit" function, which means that the computation of snippets is delayed until the entire final result set is prepared, and even after the LIMIT clause has been applied. For instance, if a LIMIT 20,10 clause is used, SNIPPET() will be called no more than 10 times.

It is important to note that SNIPPET() does not support field-based limitations. For this functionality, use HIGHLIGHT() instead.

SUBSTRING_INDEX()

SUBSTRING_INDEX(string, delimiter, number) returns a substring of the original string, based on a specified number of delimiter occurrences:

string - The original string, which can be a constant string or a string from a string/JSON attribute.
delimiter - The delimiter to search for.
number - The number of times to search for the delimiter. This can be either a positive or negative number. If it is a positive number, the function will return everything to the left of the delimiter. If it is a negative number, the function will return everything to the right of the delimiter.

SUBSTRING_INDEX() by default returns a string, but it can also be coerced into other types (such as integer or float) if necessary. Numeric values can be converted using specific functions (such as BIGINT(), DOUBLE(), etc.).

SQL

SELECT SUBSTRING_INDEX('www.w3schools.com', '.', 2) FROM test;
SELECT SUBSTRING_INDEX(j.coord, ' ', 1) FROM test;
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ',  1);  /* '1.2' */
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ', -1);  /* '3.4' */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1 */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1.200000 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3.400000 */

UPPER() and LOWER()

UPPER(string) convert argument to upper case, LOWER(string) convert argument to lower case.

Result also can be promoted to numeric, but only if string argument is convertible to a number. Numeric values could be promoted with arbitrary functions (BITINT, DOUBLE, etc.).

SELECT upper('www.w3schools.com', '.', 2); /* WWW.W3SCHOOLS.COM  */
SELECT double (upper ('1.2e3')); /* 1200.000000 */
SELECT integer (lower ('12345')); /* 12345 */

¶ 15.8

Other functions

LAST_INSERT_ID()

Returns the IDs of documents that were inserted or replaced by the last statement in the current session.

The same value can also be obtained via the @@session.last_insert_id variable.

mysql> select @@session.last_insert_id;
+--------------------------+
| @@session.last_insert_id |
+--------------------------+
| 11,32                    |
+--------------------------+
1 rows in set

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 25,26,29         |
+------------------+
1 rows in set

CONNECTION_ID()

Returns the current connection ID.

mysql> select CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
| 6               |
+-----------------+
1 row in set (0.00 sec)

KNN_DIST()

Returns KNN vector search distance.

mysql> select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926) ) and match('white') and id < 10;
+------+------------+
| id   | knn_dist() |
+------+------------+
|    2 | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)