Your First Real Data hunt
SDQL (Simple Data Query Language) is very similar to SQL, here is a quick tour to get you started with examples.
In this guide, we will introduce you to query directives and the basic concept of pipelining.
For these examples we are looking at events from an inbound firewall and an Apache Webserver but you may apply the same concepts to any data you have in your repository.
Your First Query
Let’s start with our first query:
_fetch * from event limit 100
The query returns all fields from the events repository to the maximum of 100 rows.
Here you have a data stack that contains a hundred rows with both firewall and webserver events. You can notice that the events marked in red are from the firewall and have a
$DstIP (destination IP). Based on this
$DstIP, DNIF adds layers to enrich the event.
Using Simple Query Directives
Let’s narrow down our result set to webserver logs only.
_fetch * from event where $LogType=WEBSERVER limit 100
This query is similar to the previous one using the
query directive where all fields are returned from the events repository. However, here, we only ask for webserver logs using the $LogType` field.
As expected, only webserver logs have been returned.
Finding Error Conditions
Now let’s try to find only error conditions. For the sake of this example, we’ll focus on events where the
$EventID = 404.
Web scanners are usually used by attackers to identify known vulnerabilities in applications. One key technique is trying to look for URLs of known applications. So, the scanner actually sends requests for URLs that might not exist on the server at all, therefore resulting in a “pageNotFound” or what we call a 404.
_fetch * from event where $LogType=WEBSERVER AND $EventID=404 limit 100
Nice! So we are now looking at all webserver logs with
pageNotFound (404) errors.
Now that we have webserver events with the “pageNotFound” error, let’s turn our attention towards IP addresses causing this chaos.
_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP limit 100
We can add another layer of groupings to make the data stack more interesting.
_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100
Here we have added
$SrcCN (source country) to the aggregation.
Notice how we see multiple source addresses from the same countries. It would be nice if we could set a threshold and identify all IP addresses that have breached the threshold.
Comparing Grouped Data
It’s time for a logic statement. We can write a conditional query directive
_checkif to measure each event and display only the events that have breached the set threshold.
_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100 >>_checkif int_compare count_unique > 100 include
There are two query functions in the query viz.
_checkif. Both the functions execute sequentially using a pipeline. Read more about pipelines. The output of the
_fetch query directive is sent to
_checkif query directive as input. You can inspect the output of each query function by clicking on the header (marked in red).
Emailing or Exporting result sets
Next, let’s get the result set emailed to us.
_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100 >>_checkif int_compare count_unique > 100 include >>_raise notify_email [email protected]
We could trigger one of the many third party application integrations and get this sent to my Slack messenger or add the result set to a Google Spreadsheet.
Creating Cron Jobs using Workbooks
We can also run this query every five minutes turning this into a continuous alert.
Create a workbook by clicking on more options icon on the result header.
The query is automatically added to the workbook, assign a name and use a cron notation to schedule the query run interval. So now, we have an email every time a web scanner is run on your infrastructure.