Elasticsearch range query datemath and timezones
22/12/2020
TLDR: if you use the time_zone
parameter in a range query, do not set date values with a timezone offset.
When using range queries in elasticsearch, the API allows a time_zone
to be set. Elasticsearch will ignore the time_zone
parameter if the value of the range query (gt, lt etc) contains a timezone offset.
For example the following time_zone
+02:00
is ignored because the gte
value contains a zulu time offset (Z
).
"query": {
"bool": {
"must": [
{
"range": {
"date_of_birth": {
"gte": "2020-01-01T00:00:00Z",
"time_zone": "+02:00"
}
}
}
]
}
}
As far as I can tell, this undocumented in the API, but I think it’s understandable given that queries shouldn’t be passing multiple timezones. However the situation becomes weirder if you use datemath in combination with an offset and a time_zone
.
"range": {
"date_of_birth": {
"gte": "2020-01-01T00:00:00+02:00||/d",
"time_zone": "+02:00"
}
}
There is a small note in the documentation which says:
The time_zone parameter does not affect the date math value of now. now is always the current system time in UTC.
However, the time_zone parameter does convert dates calculated using now and date math rounding. For example, the time_zone parameter will convert a value of now/d.
So there’s a hint what ES will do, but we can determine exactly what ES will do using the Validation API:
Example validation query:
curl -XGET "https://<ES_INSTANCE>/<INDEX>/_validate/query?explain=true" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"range": {
"date_of_birth": {
"gte": "2020-01-01T00:00:00+02:00||/d",
"time_zone": "+02:00"
}
}
}
]
}
}
}'
-------------
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"valid" : true,
"explanations" : [
{
"index" : "xxx",
"valid" : true,
"explanation" : "+date_of_birth:[1577829600000 TO 9223372036854775807]"
}
]
}
ES converts the range query to a lower-bound of 1577916000000
and an upper-bound of 9223372036854775807
. We can ignore the upper-bound since it is a hard limit, the lower-bound converts to 2019-12-31T22:00:00Z
.
With some experimentation we can guess what ES is up to. First ES will convert 2020-01-01T00:00:00+0200
to UTC, 2019-12-31T22:00:00Z
. It will then apply the rounding logic which, since we have gte
, rounds down to the first millisecond, 2019-12-31T00:00:00Z
. Finally, and most surprisingly to me, ES will use the time_zone
value to convert the date to 2019-12-30T22:00:00Z
.
So if you are going to send the time_zone
parameter, save yourself a lot of confusion and send date strings without timezone offsets:
"query": {
"bool": {
"must": [
{
"range": {
"date_of_birth": {
"gte": "2020-01-01T00:00:00",
"time_zone": "+02:00"
}
}
}
]
}
}
"query": {
"bool": {
"must": [
{
"range": {
"date_of_birth": {
"gte": "2020-01-01T00:00:00||/d",
"time_zone": "+02:00"
}
}
}
]
}
}
That way time_zone
is never ignored and when using datemath, ES will not convert your dates twice.