_fetch


Overview

_fetch is a query directive used to retrieve data from your data store. It lets you filter, group, label, and limit your search results. You can also write multiple fetch query functions in a pipeline to create an inclusive result set which includes results of all these fetch query functions. _fetch is the only query directive that is designed to directly query the data store.

Example:

A quick example is given below:

_fetch * from event where $LogType=FIREWALL group count_unique $SrcIP limit 100

This query retrieves all fields for each event in the event index where $LogType is FIREWALL. The result set is grouped by unique values of $SrcIP along with a count (count_unique) for each group. The result set is sorted in the descending order of count_unique (by default). It is then limited to 100 rows.

result set of a sample fetch query

The _fetch directive allows you to:

  • choose the different indexes that can be queried,
  • filter data using direct or fuzzy search,
  • group data and perform certain operations on each group,
  • label data for further logical operations, and
  • limit data to avert a big data flood from an erroneous query.

Syntax

The generic syntax of the _fetch directive is as given below:

_fetch [* | $field1, $field2]
from [event | usage | module | incident | device | metrics | dnifevents | [event store] ]
where $field1=value [AND | NOT] $field2=value*
group [ [sum | min | max | avg] $field] | [ count_unique $field1, $field2,...$field4 ] | [timeslice <integer>[m|d|h|M] ]
as #labelname
limit <integer> | first <integer> | last <integer>

Here:

  • from: Lets you choose the index to query.
  • where: Lets you filter rows in a result set.
  • group: Lets you perform aggregation on the result set to calculate sum, max, min, and avg of a specified field in the entire result set. You can also group unique values of a specified field and count events in each of these groups. Lastly, you can group (bucket) events based on time (timeslice).
  • as: Lets you assign a single label to events in a result set retrieved by a query function.
  • limit | first | last: Lets you limit the rows in a result set and choose whether to display it in default or reverse order.

Field selection

The _fetch directive can be used to include only certain fields in a result set by mentioning the list of field names separated by a comma immediately after the _fetch keyword.

Take a look at the example given below: _fetch $DevSrcIP, $LogEvent, $CNAMTime from event limit 10

As seen in the image below, this query retrieves three fields: $DevSrcIP, $LogEvent and $CNAMTime for each event in the event index. The result set is limited to the 10 latest events.

The output is as shown below

Using fetch directive to filter fields

NOTE: The outputs shown as screenshots may only be partial.

NOTE: To retrieve all available fields, use * (asterisk) instead of the fields list in the query.

NOTE: The number of events fetched by the _fetch directive is limited by the duration specified in the Date Selector Using fetch directive to filter fields on the Search page. Only events whose $CNAMTime falls within this duration are retrieved by the fetch query. By default, the duration is the last 24 hours (from the time of opening the Search page). The date selector uses your local timezone. $CNAMTime is internally stored in UTC (time standard). In a fetch query, it is displayed in your local timezone as configured in the console (after conversion). The console allows you to normalize time zones across deployments.

The from clause

Use the from clause with the _fetch directive to specify the index to query. Every deployment has the following indexes that can be queried:

  • event: All devices that are integrated with DNIF send data i.e. logs which are stored in the event index. This is the only index that has a real-time scrolling functionality where new data is received by the index at a rapid scale.
  • module: Events (from the event index) are analyzed and correlated together. These outcomes are assigned a module name and stored in the module index. This index typically contains “threats” or “alerts” based on correlations on the event index. For example, a large group of failed logins from the same IP address and user may be grouped as a module and this threat is marked as a potential brute force attack.
  • incident: Correlations between module index and event index give rise to incidents which are stored in the incident index. An incident is akin to a ticket raised in a system requiring attention. It is a system verified progressive attack. The incident index could also be used to group several modules together.
  • usage: Every 5 minutes an entry is made into the usage index indicating the amount of data received by the DNIF adapter for each scope in that time interval. The usage index is primarily used for accounting and finding anomalies in data collection.
  • device: The device index contains a list of devices that have been integrated with DNIF.
  • metrics: Shows the current health status of the services of the DNIF components: the adapter, datastore and correlator. The metrics index also exposes the internals of the platform, for example, whether an emailer service of a datastore is currently up (functioning) or not.
  • dnifevents: This index stores user activities for audit purposes. For example, if a user has created or deleted a workbook it would be recorded by the dnifevents index.
  • custom event store: A user may create a custom event store by uploading data (for example, a .csv file) to create an event store. This index contains user defined structure and data and can be deleted or recreated at will.

NOTE: The _fetch directive is used to retrieve data from all the above indexes. While explaining the functions and clauses of the _fetch directive, the word event generically refers to all these indexes.

Take a look at the example given below: _fetch * from event limit 100

As seen in the image below, this query retrieves all the fields, for each event, in the event index. The result set is limited to the 100 latest events.

NOTE: By default, the fetch query when used with the limit keyword sorts the result set such that the latest events appear first. The integer after the limit keyword specifies the number of rows to limit the result set to. First the result set is ordered and only then is it limited to the number of rows specified.

sample query to retrieve data from event index

The where Clause

The where clause filters data returned by the query based on the conditions you specify immediately after the where keyword. Hence, you can use this clause to search for events based on the values of one or more fields in various permutations as desired.

Take a look at the example given below:

_fetch $CNAMTime, $LogEvent, $LogType from event where $LogType=FIREWALL limit 4

This query retrieves 3 fields: $CNAMTime, $LogEvent, and $LogType, for each event in the event index, where $LogType is FIREWALL. The result set is limited to the 4 latest events satisfying the where clause. The output is as shown below.

sample query to filter data from event index

NOTE: The search is case-sensitive. FIREWALL is not the same as firewall .

Fuzzy search is used to look for a specific word or phrase in the log event ($LogEvent field).

This type of search lets you use a wildcard to look for specific words. A wildcard character is a special character that represents one or more other characters (or even zero characters). An asterisk ( * ) is an example of a wildcard character.

A * before a word (for example *beat) simply means that the $LogEvent field contains a word ending with beat. Similarly a * after a word (for example beat*) means that the $LogEvent field contains a word beginning with beat. Finally, a * before and after a word (for example *beat*) implies that the $LogEvent field contains the word beat.

NOTE: A word refers to continuous string of characters[A-Z,a-z] and digits[0-9] and dot/period [.] only. Space, comma, backslash(/) and hyphen mark the end of words (word delimiters).

Example 1

Take a look at the example given below:

_fetch * from event where $LogEvent=Mob* limit 10

This query retrieves all the fields for each event in the event index where $LogEvent contains a word that begins with Mob (for example Mobs, Mobile and so on). The asterisk in Mob* is the wildcard character and is used to search for words beginning with Mob. The output is as shown below:

The result set is limited to the 10 latest events satisfying the where clause.

sample query to filter data from event index

Here $LogEvent contains a word Mobile that begins with Mob. Hence this event is included in the result set.

Example 2

Take a look at another example given below:

_fetch * from event where $LogEvent=*bot limit 10

This query retrieves all the fields for each event in the event index where $LogEvent contains a word that ends with bot (for example robot, Googlebot and so on). The asterisk in *bot is the wildcard character and is used to search for words ending with bot. The result set is limited to the 10 latest events satisfying the where clause. The outputs are as shown below:

sample query to filter data from event index

Here $LogEvent contains a word Googlebot that ends with bot. Hence this event is included in the result set.

sample query to filter data from event index

Here $LogEvent contains a word Mediatoolkitbot that ends with bot. Hence this event is included in the result set.

NOTE: A fuzzy search will only work on the $LogEvent field because this is the only field where the raw event is available; no other field will return similar results.

To search for a phrase instead of a single word in the $LogEvent field, use single quotes (‘) before and after the phrase. Wildcard search is not permitted here. The exact phrase needs to be entered.

NOTE: A phrase here refers to a group of words.

This options lets you search for a group of words (with spaces in between them). However if you search for a phrase server is up, events having the phrase webserver is up will not be included in the result set.

Example 1:

Take a look at the example given below:

_fetch * from event where $LogEvent='Mobile Safari' limit 2

This query retrieves all the fields for each event in the event index where $LogEvent contains the phrase Mobile Safari. The result set is limited to the 2 latest events satisfying the where clause. The output is as shown below:

sample query to filter data from event index

Here $LogEvent contains the phrase Mobile Safari. Hence this event is included in the result set.

Example 2

Take a look at the example given below:

_fetch * from event where $LogEvent='Mobile Saf' limit 2

This query is the same as the one in Example 1 above; only the search phrase has been modified. The output is as given below:

sample query to filter data from event index

The event in the previous result set has not been included in this one.

Boolean Expressions

Expressions like AND/NOT in the where clause let you filter your result set.

Example 1

Take a look at the example given below:

_fetch * from event where NOT $LogType=FIREWALL limit 100

This query retrieves all the fields for each event in the event index where $LogType is NOT FIREWALL. Hence it will not retrieve any firewall events in the result set. The result set is limited to the 100 latest events satisfying the where clause.

NOTE: A filter is governed by the operator leading it, in the example where NOT $LogType=FIREWALL the operator NOT is applied to the filter, also if an operator is not applied to the first filter after where an implied AND is used.

Example 2

You can also chain multiple filters together with Boolean expressions. Take a look at the example given below:

_fetch * from event where $LogType=WEBSERVER NOT $EventID=200 limit 100

This query retrieves all fields for each event in the event index where $LogType is WEBSERVER and $EventID is NOT 200. The result set is limited to the 100 latest events satisfying the where clause. The output is as shown below:

sample query to filter data from event index

The events retrieved, are of log type WEBSERVER and none of these retrieved events have an event ID of 200.

NOTE: If you have multiple AND operators in the where clause (where field1=value1 AND field2=value2…), the AND works like an OR. It is not a true AND. Hence, the is no separate OR boolean operator available for use. As DNIF is a big datastore, it is different from traditional SQL used by RDBMS engines.

Duration

This option lets you specify the start and end timestamp for retrieving events 1 from the data store. On the Search page, by default, we have a duration of 24 hours (based on the time you opened the search page) which can be modified using the GUI. However, this can be overridden using the two options below:

Example 1

Take a look at the example given below:

_fetch * from usage where $Duration=2d first 10

This query retrieves all fields for each event in the usage index which have been created between Now-2 days and Now ($Duration is 2 days) where Now refers to the time of executing the query. The result set is sorted in the ascending order of $CNAMTime (oldest/earliest event first) using the first keyword. (The first keyword in explained in detail later on in this article.) The integer value after the first keyword is used to limit the number of rows in the result set. We discuss the first keyword in detail later in this article. The output is as shown below:

sample query to filter data from event index

This query was executed at 5:10pm on 28th June, 2018. This is the end time while 5:10pm on 26th June, 2018 is the start time of our duration. As we use the first keyword, we see the rows for 26th June (oldest first) in the above image.

Example 2:

Take a look at the example given below:

_fetch * from event where $StartTime=2018-06-25T16:42:15 AND $EndTime=2018-06-26T12:36:19 limit 3

This query retrieves all fields for each event in the event index which have been received by DNIF between the $StartTime (start time) and $EndTime (end time) specified in the query. The result set is limited to the 10 latest (newest) events (in that duration) satisfying the where clause. The output is as shown below:

sample query to filter data from event index

The group Clause

The group clause enables you to aggregate data within the result set.

NOTE: The group clause works like the _agg query directive. However, while the group clause performs the aggregation natively on the datastore, the _agg directive performs the aggregation on the result set of a previous query function in the pipeline.

Arithmetic aggregate

The group clause in the _fetch directive lets you perform simple arithmetic functions on a specified field in the entire result set. These operations include sum, min, max and avg. Take a look at the example given below:

_fetch * from event where $Duration=24h group sum $EvtLen limit 10

This query retrieves all fields for each event in the event index which have been received (stored) in the last 24 hours. The result set is then grouped by the sum of values of $EvtLen (event length). The output is as shown below:

sample query to filter data from event index

NOTE: The limit clause has no significance here. It does NOT mean that a summation of event length of only the last 10 events received is displayed. The sum displayed is the summation of event lengths of all events received in the last 24 hours. The output would always have one row. However it is mandatory to include a limiting clause as per the generic syntax of the _fetch directive.

Count aggregate

The group clause in the _fetch directive also lets you count unique occurrences of values of a specified field (or a combination of fields) in the result set. It groups rows based on unique values of a specified field (or combination of fields) and displays the count of each group. This is done using the count_unique keyword.

Example 1:

Take a look at the example given below:

_fetch * from event where $LogType=WEBSERVER group count_unique $URL limit 5

This query retrieves all fields for each event in the event index where $LogType is WEBSERVER. The result set is grouped by unique URLs present in it along with their individual row count (count_unique). The result set is sorted in descending order of count_unique (by default). It is then limited to 5 rows. The output is as shown below:

sample query to filter data from event index

As shown, the highlighted URL has been referenced 879 times.

NOTE: The result set of a count_unique function will always be sorted in descending order based on the values in count_unique field.

Example 2:

Take a look at the example given below:

_fetch * from event where $LogType=WEBSERVER group count_unique $URL, $EvtLen limit 3

This query retrieves all fields for each event in the event index where $LogType is WEBSERVER. The result set is grouped by unique combinations of $URL and $EvtLen present in it along with their individual row count (count_unique). The result set is first sorted (by default) in descending order of sum of count_unique for each $URL and within each group (for each $URL) it is sorted in descending order of $EvtLen. It is then limited to a maximum of 9 groups - (a maximum of) 3 unique $URL(s) and (a maximum of) 3 unique $EvtLen for each of these $URL(s). The output is as shown below:

sample query to filter data from event index

As highlighted in red, 3 unique URLs have been included and for each of these URLs, 3 unique events lengths have been included.

You can compare this with the output of the previous example.

NOTE: Based on the value of limit (for example limit N ), a maximum of N x N rows will be present in the output. In the above example (limit 3), if the first URL (Here /) had only 2 unique event lengths, there would have been only 8 rows in the result set.

NOTE: DQL supports upto four levels of aggregations as shown below:
group count_unique $SrcCN, $SrcISP, $DstIP, $User . Here, $User will be aggregated by $DstIP which will further be aggregated into $SrcISP and $SrcCN.

Timeslice aggregate
Using $CNAMTime

The timeslice keyword groups events based on the time they were stored in DNIF. It also gives a count of events in each bucket (time interval). It is widely used in queries written to create charts (for example, line charts and bar charts) which involve timelines/time intervals. By default, the timeslice function creates these time intervals using the $CNAMTime field. The duration (length) of these time intervals is specified along with the timeslice keyword.

NOTE: $CNAMTime is the time (timestamp) when an event was received and stored by DNIF.

Take a look at the example given below:

_fetch * from event where $LogType=WEBSERVER group timeslice 1h limit 1000

This query retrieves all fields for each event in the event index where $LogType is WEBSERVER. The result set is grouped into buckets of 1 hour intervals on $CNAMTime along with the count for each interval. By default, the result set is sorted such that the oldest (earliest) time interval appears first. The output is as shown below:

sample query to filter data from event index

The Date Selecter on the Search page is set at its default value of 24 hours (This happens when you open the Search page. Hence the 24 hour duration is based on the time of opening the Search page rather than the time of executing the query). Hence, this query picks events in the last 24 hours. The search page was opened (and query was executed) at 3:00 pm on 21st June, 2018. Hence groups were formed starting from 1:30 pm (1:30pm to 2:30 pm) on 20th June (which is the last completed 1 hour time interval) as shown in the image above. The highlighted part indicates that during the 1:30 pm to 2:30 pm time interval on 20th June, 2018 there were 59 events received by DNIF.

The latest (newest) time interval starts at 1:30 pm and ends at 2:30 pm on 21st June 2018 (not shown in the image).

NOTE: The limit keyword has no significance here. The number of output rows are not dependant on it. However it is mandatory to include a limiting clause as per the generic syntax of the _fetch directive.

NOTE: Along with the timeslice keyword, use ‘m’ for minute, ‘h’ for hour and ‘M’ for month.

Example: Use timeslice 1h to create intervals of 1 hour each. Use timeslice 1m to create intervals of 1 minute each. Use timeslice 1M to create intervals of 1 month each.

NOTE: The starting point of the time intervals (hh:15 or hh:30 and so on) are based on the value (interval length) specified along with the timeslice keyword. Also, if any row (time interval) is missing in the result set, it means that no events were received during that time interval.

Using SystemTstamp

By default, time intervals are formed on the $CNAMTime field. To explicitly specify a timestamp field other than this.

Take a look at the example below:

_fetch * from event where $DateTimeField=SystemTstamp group timeslice 1h limit 10

This query is similar to the earlier example. Instead of using $CNAMTime to form intervals, we use the $SystemTstamp field to form them. The output is as shown below:

sample query to filter data from event index

The Search page was opened and query was executed at 6:04 pm 28th June 2018. While $CNAMTime is stored in UTC and converted to the local timezone for calculation and display, the $SystemTstamp is stored as received in the logs and not converted for calculations. Hence the logic for formation of the starting interval is based on timezone.

NOTE: $SystemTstamp carries the timestamp of the origin system and represents the time the event was generated. Instead of $SystemTstamp , any timestamp field can be used while making time intervals in a similar manner.

Labelling

Labelling lets you add (assign) a label to events in the result set using the as keyword. The same label would be assigned to all events retrieved by one fetch query function. When executing several fetch query functions in one pipeline, a label (for example, A) can be added to the events fetched by the first fetch query function while a different label (for example B) can be added to the result set of the next fetch query function. These assigned labels can be used in subsequent query functions using the _checkif or _agg directive to write more complex queries in a simple manner.

Take a look at the example given below:

_fetch * from event as #A limit 5
>> _fetch * from event where $DstIP=202.XX.XX.247 as #B limit 5
>>_agg count_unique $Label

Here:

1. Here: The fetch query function (f1) retrieves all fields for each event in the event index and adds a field $Label with the value #A (in the result set) for each of the rows. As we limit the result set to the 5 latest events, these labels are assigned to only the 5 events fetched. The output is as shown below:

sample query for using labels

The labels and destination IP addresses for these five events are as shown below:

sample query for using labels

A label #A has been added to these 5 events.

2. In the pipelined query function (f2), the _fetch directive retrieves all fields for each event in the event index where $DstIP (destination IP) is as specified in the query function and adds a field $Label with the value #B (in the result set) for each of the rows. As we limit the result set to the latest 5 events satisfying the where clause, these labels are assigned to only the 5 events fetched. The output is as shown below:

sample query for using labels

The destination port and labels after the second query function are as shown below:

Identifying ip address sets based on labels

A label #B has been added to the five events having destination IP as 202.XX.XX.247

3. In the pipelined query function (f3), the _agg directive using the count_unique keyword, groups events in the result set based on $Label and gives the count of each. The output is as shown below:

Identifying ip address sets based on labels

NOTE: When using labels in a fetch query function, the _fetch directive defaults to a _fetch * . Even if you specify certain field names, instead of the asterisk, it would still retrieve all the fields.

Limiting and choosing

limit is used for limiting the number of rows in the result set of a fetch query function. It is a mandatory clause. It also decides which rows (oldest or newest) are chosen to be a part of the final result set.

NOTE: By default, _fetch sorts the result set in the descending order of $CNAMTime i.e. latest/newest events first. Limiting happens on this result set.

NOTE: Every _fetch query function should have one of the three keywords: limit/first/last .

limit

The limit keyword picks the newest or latest events to be a part of the result set. Hence, the latest/newest events appear first in the result set. The result set is then limited to the number of rows specified in the query after the limit keyword.

Take a look at the example given below:

_fetch $CNAMTime, $LogEvent, $DstPort from event where $DstPort=23 limit 2

This query retrieves three fields: $CNAMTime, $LogEvent and $DstPort for each event in the event index where $DstPort is 23. However, we limit the result set (limit 2) to only 2 latest events which satisfy the where condition. The result set is sorted in the descending order of $CNAMTime (new events first). The output is as shown below:

an example of limit directive

The search page was opened and query executed at 6:30 pm on 28th June, 2018.

NOTE: The limit keyword is to be understood in context to the query. For example, if a fetch query function (on the event index) uses aggregation ( group keyword) along with count_unique , the result set does not contain events. Instead, it contains groups and counts for each group. The result set is first sorted in the descending order of counts ( count_unique ) and then limit is applied to limit the number of rows in the result set.

first

The first keyword picks the oldest or earliest events to be a part of the result set. Hence, the oldest/earliest events appear first in the result set. The result set is then limited to the number of events specified in the query after the first keyword.

Take a look at the example given below:

_fetch $CNAMTime, $LogEvent, $DstPort from event where $DstPort=23 first 2

Here, remember that the Date Selector on the Search page has been set to 24 hours. This query retrieves three fields: $CNAMTime, $LogEvent and $DstPort for each event in the event index where $DstPort is 23. However, we limit the result set (first 2) to only 2 oldest events which satisfy the where condition. The result set is sorted in the ascending order of $CNAMTime (oldest events first). The output is as shown below:

an example of first keyword

The search window was opened and query executed at 6:30 pm on 28th June, 2018. Notice that the events displayed in the image above were received on 27th June, 2018. These are the oldest two events satisfying the where clause. Also notice that the oldest event has appeared first in the result set.

last

This works in similar to the limit keyword. This is because the the last keyword also picks the newest/latest events first and then limits the number of events to be included in the final result set.

Take a look at the example given below:

_fetch $CNAMTime, $LogEvent, $DstPort from event where $DstPort=23 last 2

The output is as shown below:

an example of first keyword

The search page was opened and query executed at 9:23 am on 28th June, 2018.

NOTE: Do not use the first or last keywords along with the group sclause.

As you see the timeslice keyword is followed by 1h which means one hour, you could also use ‘m’ for minute, ‘h’ for hour and ‘M’ for month.

1:Every deployment has the following indexes that can be queried, event, module, incident, usage, device, metrics, dnifevents, custom event store.