Threat Feeds Based on Architecture, Not Indicators

A decade ago, we were presented with a CSV list of indicators and told "here's some data, go put it in your firewall". The feed usually had very little context and didn't always fit the architecture we had in place. If we had a list of domains, we simply resolved them and tossed them into the firewall; in fact, some firewalls did that for us. When everything looks like a nail, all you end up buying is hammers. What could go wrong?

Ten years later, most of us have these same problems. Except now, we have tens or hundreds of isolated scripts that bring us the data. We also have tens (or hundreds) of scripts that pore over it to build a feed for each of our different devices (firewalls, mail-servers, dns-clusters, IDSs, etc). Instead of looking at the problem from the device perspective, we tend to look at the problem from an intelligence perspective first.

We say to ourselves: "self: I have thousands of IP addresses and I have a firewall, I should get all those IP addresses into that firewall". We rarely think: "self: what kinds of IPs would the firewall benefit from? Which IPs make zero sense being in a firewall?". If we think in the latter context, it changes how we think about managing and deploying threat intelligence.

If we're concerned about incoming traffic (e.g., suspicious logins), why would we try to match a threat feed against outgoing traffic? Similarly, if we're looking for compromised IP traffic (e.g., an infected customer beaconing to a C2), why would we match it against inbound SYN traffic? If we're using a SEM that correlates IP addresses, why would we feed it a list of URLs?

This is a relatively easy problem to solve, and hardly worth writing about at first glance. Simply conjure a set of scripts that merges various types of queries and unifies the result based on the architecture you're tossing your data into, right? Right! But if you adjust your thinking a bit, it changes your implementation of the underlying architecture.
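The device-first view above can be sketched as a small policy layer that filters a feed down to what each device can actually act on. This is a minimal illustration, not any real platform's API; the device names, policy fields, and indicator shape are all hypothetical:

```python
# Sketch: route indicators to devices based on what each device can act on,
# rather than dumping every indicator everywhere. All names are hypothetical.

# Each device declares the indicator types and traffic direction it handles.
DEVICE_POLICIES = {
    # a border firewall acts on inbound IP traffic (e.g., suspicious logins)
    'firewall': {'itype': {'ipv4'}, 'direction': {'inbound'}},
    # a DNS resolver blocks outbound lookups (e.g., infected hosts resolving C2s)
    'dns':      {'itype': {'fqdn'}, 'direction': {'outbound'}},
    # a SEM correlates IPs in both directions -- but shouldn't be fed URLs
    'sem':      {'itype': {'ipv4'}, 'direction': {'inbound', 'outbound'}},
}

def feed_for(device, indicators):
    """Return only the indicators this device would benefit from."""
    policy = DEVICE_POLICIES[device]
    return [i for i in indicators
            if i['itype'] in policy['itype']
            and i['direction'] in policy['direction']]

indicators = [
    {'indicator': '192.0.2.1',    'itype': 'ipv4', 'direction': 'inbound'},   # ssh scanner
    {'indicator': 'bad.example',  'itype': 'fqdn', 'direction': 'outbound'},  # C2 domain
    {'indicator': '198.51.100.7', 'itype': 'ipv4', 'direction': 'outbound'},  # C2 address
]

print([i['indicator'] for i in feed_for('firewall', indicators)])
```

The point isn't the ten lines of filtering; it's that the device policies, not the indicator store, become the organizing principle of the platform.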

For years we've been building security platforms that organize around the data rather than the active defense platforms the data is meant for. We figure out the newest, fastest way to normalize, de-duplicate, store and search indicator data, and leave the last mile (i.e., the most important part of the indicator lifecycle) as an afterthought. "Oh, right... this data has to go somewhere to be useful!"

The history of CIF is not much different. Earlier incarnations were the result of a set of SNORT-based (Perl) scripts; the intelligence was absolutely secondary. Somewhere along the way the focus became indicator storage mechanics and their (in-)efficiencies. This is not a bad thing, but over time it became an odd thing to get completely wrapped up in.

The thing that makes any threat intelligence platform useful is what it does with those indicators. Your platform can be the most amazing piece of database magic the world has ever seen, but if it's still hard to convert your STIX to CSV to IDS rules, users will eventually find something that avoids you and your product. On the flip side, if your database magic is a hot pile of dumpster fire, but it makes getting indicators into my firewall feel like magic? I'll take the dumpster fire.

Think about the platforms your users are trying to augment, and build from there.

Did you learn something new?