Your First Real Data Hunt

SDQL (Simple Data Query Language) is very similar to SQL. Here is a quick tour, with examples, to get you started.

In this guide, we will introduce you to query directives and the basic concept of pipelining.

For these examples, we are looking at events from an inbound firewall and an Apache webserver, but you can apply the same concepts to any data in your repository.

Your First Query

Let’s start with our first query:

_fetch * from event limit 100

The query returns all fields from the event repository, up to a maximum of 100 rows.

[screenshot: result set of the first query]

Here you have a data stack that contains a hundred rows with both firewall and webserver events. Notice that the events marked in red are from the firewall and carry a $DstIP (destination IP) field. Based on this $DstIP, DNIF adds enrichment layers to the event.
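As an aside, _fetch does not have to return every field. A minimal sketch, assuming _fetch accepts a comma-separated field list in place of the * wildcard:

_fetch $SrcIP, $DstIP from event limit 100

This would return only the source and destination IP columns, which can make wide event stacks easier to scan.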

Using Simple Query Directives

Let’s narrow down our result set to webserver logs only.

_fetch * from event where $LogType=WEBSERVER limit 100

This query is similar to the previous one: the _fetch query directive returns all fields from the event repository. However, this time we ask for webserver logs only, by filtering on the $LogType field.

Read more about query directives

[screenshot: webserver logs only]

As expected, only webserver logs have been returned.
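The same pattern works for the firewall events in our stack. A sketch, assuming the connector labels those events with $LogType=FIREWALL (check the $LogType values in your own repository for the exact string):

_fetch * from event where $LogType=FIREWALL limit 100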

Finding Error Conditions

Now let’s try to find only error conditions. For the sake of this example, we’ll focus on events where $EventID=404.

Attackers commonly use web scanners to identify known vulnerabilities in applications. One key technique is probing for the URLs of known applications: the scanner sends requests for URLs that may not exist on the server at all, resulting in a “pageNotFound” error, better known as a 404.

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 limit 100

[screenshot: webserver logs with 404 errors]

Nice! So we are now looking at all webserver logs with pageNotFound (404) errors.

Grouping Data

Now that we have webserver events with the “pageNotFound” error, let’s turn our attention to the IP addresses causing this chaos.

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP limit 100

[screenshot: unique source IP counts]

We can add another layer of groupings to make the data stack more interesting.

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100

Here we have added $SrcCN (source country) to the aggregation.

[screenshot: counts grouped by source IP and country]

Notice how we see multiple source addresses from the same countries. It would be nice if we could set a threshold and identify all IP addresses that have breached it.

Comparing Grouped Data

It’s time for a logic statement. We can use the conditional query directive _checkif to evaluate each row and display only those that have breached the set threshold.

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100
>>_checkif int_compare count_unique > 100 include

There are two query functions in this query, _fetch and _checkif, and they execute sequentially in a pipeline: the output of the _fetch query directive is sent to the _checkif query directive as input. Read more about pipelines. You can inspect the output of each query function by clicking on its header (marked in red).
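The threshold is simply the integer in the _checkif directive, so it is easy to tune. For example, to catch slower scanners you could lower the bar (the value 10 below is purely illustrative):

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100
>>_checkif int_compare count_unique > 10 include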

Emailing or Exporting Result Sets

Next, let’s get the result set emailed to us.

_fetch * from event where $LogType=WEBSERVER AND $EventID=404 group count_unique $SrcIP, $SrcCN limit 100
>>_checkif int_compare count_unique > 100 include
>>_raise notify_email [email protected]

We could also trigger one of the many third-party application integrations to send this to Slack or add the result set to a Google spreadsheet.

Creating Cron Jobs Using Workbooks

We can also run this query every five minutes, turning it into a continuous alert.

Create a workbook by clicking the more-options icon on the result header.

[screenshot: creating a workbook from the result header]

The query is automatically added to the workbook; assign it a name and use cron notation to schedule the run interval. Now you get an email every time a web scanner is run against your infrastructure.
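If the scheduler uses standard cron notation (an assumption worth verifying against your DNIF version), a five-minute interval looks like this:

*/5 * * * *

The five fields are minute, hour, day of month, month, and day of week; */5 in the minute field means “every fifth minute”.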