DNIF Query Language (DQL)


What is DQL?

Data that has volume (ever increasing data), velocity (more data every minute) and variety (unstructured data or data with varying structures) is called Big Data. As DNIF receives data every second from multiple devices and in various formats, it is a big data platform. The DNIF data store stores this big data. DNIF uses its own querying language, the DNIF Query Language (DQL), for querying and analyzing this data.

Although DNIF uses JSON for its internal processing (querying), DQL was created to avoid the hassles associated with writing queries in JSON. Instead, DQL is modeled on SQL, to give it a sense of familiarity and a shorter learning curve. However, DQL queries do not work exactly as traditional SQL queries, the underlying difference being that SQL queries were written for RDBMS systems while DQL works on big data.

DQL queries are used to communicate, retrieve and analyze information from a data store. They can also be used to:

  • Power up a widget
  • Create a customized report
  • Create a module

DQL queries can be executed from the Search interface of the DNIF web console or they can be scheduled to run at fixed time intervals using workbooks.

Understanding the DQL query pipeline

A sample DQL query as written on the Search page of the DNIF web console is shown below:

Sample query in DNIF

Let us try to understand the different parts of this query using the diagram below:

Understanding query functions

There are three query functions here (as shown) that together form the DQL query. Each query function begins with a DQL directive. All query functions form a part of the pipeline. To add another query function to the pipeline (to pipeline another query function), use Shift+Enter. This brings up the pipeline operator ».

Query functions are labeled as f1, f2, f3, and so on, as shown below:

Output of first query functions
Output of second query functions
Output of third query functions

DQL Directives

Let us now take a look at various DQL directives and understand what each is used for.

Directive Description
_fetch
  • Only directive for directly querying and retrieving data from a data store.
  • Lets you filter, group, label and limit your search results.
_sort
  • Sorts the retrieved data in the ascending/descending order based on a specified field.
  • The retrieved data may be of type integer or string (strings will be sorted in alphabetical order).
_agg
  • Counts rows in a result set.
  • Groups rows in a result set and performs arithmetic operations (on a field) in each group.
  • Groups rows in a result set and counts the rows in each group.
  • Performs arithmetic operations on a field in the entire result set.
_checkif Applies conditional logic to the data in a result set.
_buffer
  • Filters rows from a result set based on the time of the previous occurrence of a specified field (value).
  • The eliminated (filtered) rows, thus, will not be passed to the next query function.
_limit Limits the number of rows in the result set.
_export
  • Extracts and represents a result set in XLSX, CSV, or JSON formats.
  • The result set is sent via email.
_store Writes the result set of a query in a profile.
_retrieve Retrieves the result set stored in a profile.
_trigger
  • Triggers an action in response to the occurrence of an event.
  • It can be used to:
  • send email notifications and customized reports, or
  • call a plugin from the trigger_framework to trigger an API
  • Uses trigger plugins to make these API calls.
_field Adds a new field to the result set.
_raise
  • Raises modules created in DNIF.
  • Raises incidents against one or more modules.
  • Sends an email notification/alert by either specifying the email addresses directly or by specifying the name of a DNIF notification group.
_lookup
  • Makes API calls to third party databases to cross-validate security information about a domain, URL, IP address, file and so on.
  • Uses lookup plugins to make these API calls.
_table Restricts the result set to specific fields only for the purpose of display.

Result sets

Outputs (results) of both, a query or individual query functions, are called result sets. The first query function retrieves data from the data store. This data/output is called a result set. The output (result set) of the first query function acts as the input for the second one (in the same pipeline). The output (result set) of the second query function acts as the input for the third one (in the same pipeline) and so on till we arrive at the last (final) result set. This result set is the output of the last query function as well as that of the entire query.

To explain result sets, we use the example given below:

_fetch $CNAMTime, $LogType, $EvtLen from event where $LogType=FIREWALL limit 5
>>_sort by $EvtLen DESC
>>_limit 2
  • The _fetch query directive (in the query function f1) retrieves data from the data store. Its result set is as shown below:
  • Result set for _fetch query directives

  • The pipelined _sort query directive (in the function f2) sorts the result set returned by the fetch query function. The output (result set) is as shown below:
  • Resultset of _sort query directive

  • The pipelined _limit query directive (in the function f3) limits the number of rows in the result set returned by the sort query function. Its result set, which is also the result set of the entire query, is as shown below:
  • Resultset of _limit directive


NOTE: The terms result set and data stack can be used interchangeably to refer to the output of the query or individual query functions. However, to ensure consistency across all content, we will use the term result set .

Types of DQL directives

DQL directives can be classified as:

  • Pass through directives
  • One-way/Two-way directives

Pass through directives

Some directives, for example _store, _raise and _trigger do not alter the result set at all. For these directives, the input result set is simply passed though as the output result set without any changes. Instead of working on the result set, they perform other functions like calling APIs, raising modules or incidents or storing result sets permanently on hard disk. There are currently 5 pass through directives: _store, _raise, _trigger, _export and _table.

One-way/Two-way directives

Two-way directives are those directives, that make a request and get a response. This request may be made to a third party or to the data store. One-way directives, on the other hand, are those directives which do not need to make a request and wait for a response. They operate on the result set of the previous query function. From the directives explained above, _lookup and _fetch are the only two-way directives; all other directives are one-way. The _lookup directive makes a request to a third party, while the _fetch directive makes a request to the data store.