DNIF Query Language (DQL)
- What is DQL?
- Understanding the DQL query pipeline
- DQL Directives
- Result sets
- Types of DQL directives
What is DQL?
Data that has volume (ever increasing data), velocity (more data every minute) and variety (unstructured data or data with varying structures) is called Big Data. As DNIF receives data every second from multiple devices and in various formats, it is a big data platform. The DNIF data store stores this big data. DNIF uses its own querying language, the DNIF Query Language (DQL), for querying and analyzing this data.
Although DNIF uses JSON for its internal processing (querying), DQL was created to avoid the hassles associated with writing queries in JSON. Instead, DQL is modeled on SQL, to give it a sense of familiarity and a shorter learning curve. However, DQL queries do not work exactly as traditional SQL queries, the underlying difference being that SQL queries were written for RDBMS systems while DQL works on big data.
DQL queries are used to communicate, retrieve and analyze information from a data store. They can also be used to:
- Power up a widget
- Create a customized report
- Create a module
DQL queries can be executed from the Search interface of the DNIF web console or they can be scheduled to run at fixed time intervals using workbooks.
Understanding the DQL query pipeline
A sample DQL query as written on the Search page of the DNIF web console is shown below:
Let us try to understand the different parts of this query using the diagram below:
There are three query functions here (as shown) that together form the DQL query. Each query function begins with a DQL directive. All query functions form a part of the pipeline. To add another query function to the pipeline (to pipeline another query function), use Shift+Enter. This brings up the pipeline operator ».
Query functions are labeled as f1, f2, f3, and so on, as shown below:
Let us now take a look at various DQL directives and understand what each is used for.
|_checkif||Applies conditional logic to the data in a result set.|
|_limit||Limits the number of rows in the result set.|
|_store||Writes the result set of a query in a profile.|
|_retrieve||Retrieves the result set stored in a profile.|
|_field||Adds a new field to the result set.|
|_table||Restricts the result set to specific fields only for the purpose of display.|
Outputs (results) of both, a query or individual query functions, are called result sets. The first query function retrieves data from the data store. This data/output is called a result set. The output (result set) of the first query function acts as the input for the second one (in the same pipeline). The output (result set) of the second query function acts as the input for the third one (in the same pipeline) and so on till we arrive at the last (final) result set. This result set is the output of the last query function as well as that of the entire query.
To explain result sets, we use the example given below:
_fetch $CNAMTime, $LogType, $EvtLen from event where $LogType=FIREWALL limit 5 >>_sort by $EvtLen DESC >>_limit 2
- The _fetch query directive (in the query function f1) retrieves data from the data store. Its result set is as shown below:
- The pipelined _sort query directive (in the function f2) sorts the result set returned by the fetch query function. The output (result set) is as shown below:
- The pipelined _limit query directive (in the function f3) limits the number of rows in the result set returned by the sort query function. Its result set, which is also the result set of the entire query, is as shown below:
NOTE: The terms result set and data stack can be used interchangeably to refer to the output of the query or individual query functions. However, to ensure consistency across all content, we will use the term result set .
Types of DQL directives
DQL directives can be classified as:
- Pass through directives
- One-way/Two-way directives
Pass through directives
Some directives, for example _store, _raise and _trigger do not alter the result set at all. For these directives, the input result set is simply passed though as the output result set without any changes. Instead of working on the result set, they perform other functions like calling APIs, raising modules or incidents or storing result sets permanently on hard disk. There are currently 5 pass through directives: _store, _raise, _trigger, _export and _table.
Two-way directives are those directives, that make a request and get a response. This request may be made to a third party or to the data store. One-way directives, on the other hand, are those directives which do not need to make a request and wait for a response. They operate on the result set of the previous query function. From the directives explained above, _lookup and _fetch are the only two-way directives; all other directives are one-way. The _lookup directive makes a request to a third party, while the _fetch directive makes a request to the data store.