Talks at RSAC 2019 — Rise of the Machines — AI and ML for Attackers
RSAC 2019 is a wrap, and let me tell you, it was a #swag fest out here, with almost everyone moving around with an assortment of shiny badges, caps, T-shirts, bags and whatnot. Shiny objects aside, today I was able to attend a most interesting talk by Étienne Greeff from SecureData, entitled “The Rise of the Machines: AI- and ML-Based Attacks Demonstrated.” Here are some key takeaways from the session.
The current scenario
If you’ve been in the cybersecurity industry for long enough, you’ve certainly heard about ML and AI, either from blue team folks or business executives, all talking about their usefulness in detecting threats and their analytical capabilities that surpass those of humans. However, sometimes we also hear things like:
ML- and AI-based cyberattacks are the next big thing.
So, where are we when it comes to these attacks? Do such attack vectors exist? If so, where are they? These were the very questions that crossed my mind during the session. Here is what I learned:
Challenges with availability of training data sets
As we probably all know, machine learning models need to be trained with annotated data sets, so that when these models are deployed, they are able to identify similar patterns or labels and flag them accordingly (“known knowns”). This is called “supervised learning.”
To highlight the problem: many kinds of attacks are plausible, and depending on the environment being targeted, the number of ways in which they can be carried out is enormous. Do we have labeled data sets covering all of them? No.
“What about unsupervised learning?” you might wonder. While unsupervised learning is great for outlier detection, these models tend to produce unacceptable numbers of false positives in threat detection applications, making them impractical there.
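To make the false-positive problem concrete, here is a minimal sketch of a very simple unsupervised detector, a z-score check on daily upload volume. The numbers, the `threshold` value and the function name are all made up for illustration; the talk did not show this code. The point is that the detector flags anything unusual, whether or not it is malicious.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily upload volume in MB for one host.
# Index 8 is a scheduled (benign) backup, not an attack.
daily_upload_mb = [50, 52, 48, 51, 49, 53, 47, 50, 400, 51]

flagged = zscore_outliers(daily_upload_mb)
print(flagged)  # [8]
```

The backup day is flagged even though nothing bad happened. Multiply that by every host and every metric in a real network and the alert volume quickly becomes unmanageable.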
Where are these offensive applications?
There are a few applications of ML/AI for adversarial purposes:
- DeepPhish, which uses deep learning to bypass AI-based phishing detection
- Fooling deep learning-based image recognition
- Web application attacks using reinforcement learning
- Exploit hunting in libraries using deep learning
Introduction to topic modeling
Topic modeling is an interesting concept. Put simply, it is a means of automatically identifying topics present in a text document. It is an unsupervised approach used for finding and observing the sets of words (called “topics”) in large clusters of text.
In this context, a topic can be defined as a repeating pattern of terms that occur together in a document. For example, the topic “healthcare” might be associated with the terms “health,” “doctor,” “patient” and “hospital”. Meanwhile, “farming” might be associated with “farm,” “crops” and “wheat”.
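The idea of “terms that occur together” can be illustrated with a toy sketch that counts how often pairs of terms appear in the same document. This is plain co-occurrence counting, not a full topic model such as LDA, and the corpus, stopword list and function names are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration.
DOCS = [
    "the doctor saw the patient at the hospital",
    "the patient thanked the doctor for good health advice",
    "the farm grows wheat and other crops",
    "wheat crops failed on the farm this year",
]

STOPWORDS = {"the", "at", "for", "and", "on", "this", "other", "a", "saw"}

def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def cooccurrence_counts(docs):
    """Count in how many documents each unordered pair of terms appears together."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))  # sorted so each pair has one canonical order
        counts.update(combinations(terms, 2))
    return counts

counts = cooccurrence_counts(DOCS)
print(counts[("doctor", "patient")])  # 2
print(counts[("crops", "wheat")])     # 2
print(counts[("doctor", "wheat")])    # 0
```

Pairs like “doctor” and “patient” keep showing up together, while “doctor” and “wheat” never do. A real topic model generalizes this intuition by learning groups of terms and how strongly each document belongs to each group.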
The New York Times uses topic models to improve the output of article recommendation engines. Topic modeling is also used to organize large datasets of emails, customer reviews, and social media profiles.
How, then, is topic modeling being used by attackers?
Here is a use case that leverages this modeling technique to help attackers exfiltrate selected documents.
Selective data exfiltration with topic modeling
Let’s consider a scenario where, after exploitation, an attacker has been able to successfully infect a machine and incorporate it into a C&C network. Having done this, the attacker now wants to exfiltrate data from this system.
One approach involves the use of topic modeling to find files that match topics like “network,” “configuration,” “hash” and “password”. This helps the attacker target specific files of interest, rather than downloading all available files and sorting through them later.
Selecting these topics helps the attacker understand more about the system, its users and the data it has access to. The attacker can use this information to plan lateral movement attacks as well, using information gained in the initial attack to identify other potential targets in the network.
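The selection step boils down to scoring each document against a topic and keeping only the ones that score high enough. Below is a rough sketch of that scoring on in-memory strings, assuming a fixed term list in place of a trained topic model. The file names, contents, `threshold` and function names are hypothetical and not taken from the talk's demo. Defenders can run the same kind of scoring over their own shares to see which documents would stand out.

```python
import re

# Topic terms taken from the example above; a trained model would learn these.
TOPIC_TERMS = {"network", "configuration", "hash", "password"}

def topic_score(text, topic_terms=TOPIC_TERMS):
    """Return the fraction of tokens in `text` that belong to the topic."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_terms)
    return hits / len(tokens)

def rank_documents(docs, threshold=0.1):
    """Return (name, score) pairs scoring at least `threshold`, highest first."""
    scored = [(name, topic_score(text)) for name, text in docs.items()]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents, invented for illustration.
docs = {
    "router_notes.txt": "network configuration backup with admin password hash",
    "lunch_menu.txt": "soup salad and sandwich options for friday",
    "wifi.txt": "guest network password is posted at reception",
}

ranked = rank_documents(docs)
print([name for name, _ in ranked])  # ['router_notes.txt', 'wifi.txt']
```

Only two of the three documents pass the threshold, which is the whole appeal of the approach: far less data to move, and far less to sort through afterwards.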
This slide from the talk summarizes the method very well:
“Cobalt Strike” is a commercial adversary simulation tool that includes a C&C server. It was used in the demo to show documents exfiltrated from an infected machine.
Detecting ML- and AI-based attacks
The adversarial use of ML and AI is still in its infancy, but the examples discussed above, such as automated web application attacks, are enough to demonstrate the offensive potential of these technologies.
When it comes to detecting such attacks, there are some key points to remember while devising detection strategies or rules in SIEM platforms:
- Never ignore endpoint telemetry data. (I have personally seen many SIEM platforms with no endpoint telemetry at all. Security teams justify the gap either by insisting they don’t have permission to collect logs from these devices, or by arguing that the antivirus software and other endpoint security solutions in use make monitoring them unimportant, which is sad.)
- It is important for security teams to know where critical assets and data reside so that adequate monitoring measures can be put in place.
- Use topic modeling defensively to identify documents of interest within your network.
- Monitor built-in libraries and Windows system processes. Most of these attacks rely on existing libraries and system resources for recon or to execute processes. Using profiling to develop and maintain adequate baselines is a solid approach.
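The baselining idea from the last point can be sketched as follows: record which parent/child process pairs appear in known-good telemetry, then flag pairs that were never seen before. The event format, process names and function names here are assumptions for illustration only. Real endpoint telemetry is far richer, and a real baseline needs much more data than this.

```python
from collections import Counter

def build_baseline(events):
    """Count how often each (parent, child) process pair appears in known-good telemetry."""
    return Counter((e["parent"], e["child"]) for e in events)

def unusual_events(events, baseline, min_seen=1):
    """Return events whose (parent, child) pair appears fewer than `min_seen` times in the baseline."""
    return [e for e in events if baseline[(e["parent"], e["child"])] < min_seen]

# Hypothetical known-good process launches collected during normal operation.
known_good = [
    {"parent": "explorer.exe", "child": "winword.exe"},
    {"parent": "explorer.exe", "child": "winword.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
]

# Hypothetical new telemetry to check against the baseline.
today = [
    {"parent": "explorer.exe", "child": "winword.exe"},
    {"parent": "winword.exe", "child": "powershell.exe"},
]

baseline = build_baseline(known_good)
alerts = unusual_events(today, baseline)
print(alerts)  # [{'parent': 'winword.exe', 'child': 'powershell.exe'}]
```

A word processor launching a shell was never part of normal behavior on this host, so it gets flagged even though both programs are legitimate system software.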
This amazing talk offered a unique glimpse into the future of cyberattacks. The introduction of concepts like topic modeling makes it easy to see how offensive applications of machine learning are highly feasible.
Have you heard of any other ML-based adversarial applications? Do you have any strong feelings about the future of ML and AI in general? Let me know in the comments below. It’s been a really tiring day, so it’s time to sign off — until next time. So long!