How to build smarter IOCs using Python3

https://github.com/csirtgadgets/silver-meme

Working with indicators has always been a challenge. About a decade ago, I started messing with things like CSV and JSON, then IDMEF and IODEF, which then led to things like CYBOX and STIX. After all those odd (and somewhat irritating) experiences, we've really not solved many of the problems each of those technologies set out to solve: making indicators easier to use, manipulate and share with our friends. If anyone tells you these "standards" make it easier to do these things, they're lying, or have never shared lots of data, or both.

If you're using CSV/JSON, you're likely parsing it from someone else. If you're using IODEF or STIX or some other hacked-up version of monopolistic XML/JSON, you're likely using it from someone else. At least these days, you do have some kind of library that will help you pull structure back out of the data. However, once you get the data, you then have to process it, manipulate it, add context, etc. DO SOMETHING WITH IT.

We used IODEF in the early days of CIF, if for no other reason than it was there. As we grew the user base, we learned that all users really wanted was the pure, simple indicator: an IP address, a URL, a little bit of context and a timestamp. Sure, some TTPs and maybe some attribution would be nice, but for the 90% of us who are just keeping our heads above water, a simple indicator is all we really need. The rest is just noise to us.

In the early days of CIF, I failed at this by trying to be too clever rather than simple. I paid for those sins in terms of complexity, rigidity, excess doc, performance and most importantly… users. Instead of focusing on the simplicity of "the IOC", I focused on the complexity of what I thought was 'the problem'. 50% of the code in the codebase was designed to take a format and distill it down into the basic parts: an indicator, a timestamp, a bit of geo data and some tags. If I'm having to do this, there must be others with this problem too.

Taking a Step Back

Sometimes to solve a really hard problem, you need to take a step back and get some perspective. I [accidentally] did this by building CSIRTG, which is a Ruby on Rails application. While Python is mostly object oriented (or can be), Ruby really shines when you embrace the Object. You can still write semi-object-oriented Python and have elegant code; you can't say the same for Ruby, where functional code (vs Objects) just feels odd. After building function after function that took an IP address and gathered its ASN, Country Code, Longitude, Latitude, City, etc., Ruby forced me to re-think the problem a bit.

This got me thinking: all those functions I had written to manipulate an indicator, why aren't they just functions of "an Indicator" object? I should be able to just load up an indicator in Python and call its corresponding ".cc" or ".asn" or ".asn_desc" function. The less complex MY application is, the easier it is to maintain and the faster I can improve it. The easier it is to read, the more people will use it. The more people that use it, the faster it's improved and the more people we are able to teach and protect.
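To make that concrete, here is a minimal sketch of what such an object could look like. It is not the actual silver-meme or CIF code: the class layout, the `FAKE_GEO_DB` table and every value in it are assumptions made up for illustration, and the table stands in for a real GeoIP/ASN database so the example runs without any external data.

```python
import ipaddress
from datetime import datetime, timezone
from functools import cached_property

# Hypothetical stand-in for a real GeoIP/ASN database. The networks are
# documentation-only ranges and the records are invented for this sketch.
FAKE_GEO_DB = {
    ipaddress.ip_network("192.0.2.0/24"): {
        "cc": "US", "asn": 64496, "asn_desc": "EXAMPLE-NET-1"},
    ipaddress.ip_network("198.51.100.0/24"): {
        "cc": "DE", "asn": 64497, "asn_desc": "EXAMPLE-NET-2"},
}


class Indicator:
    """An indicator that knows how to look up its own context."""

    def __init__(self, indicator, tags=None, reported_at=None):
        self.indicator = indicator
        self.tags = list(tags or [])
        self.reported_at = reported_at or datetime.now(timezone.utc)

    @cached_property
    def _geo(self):
        # Looked up once, on first access, then cached on the instance.
        try:
            addr = ipaddress.ip_address(self.indicator)
        except ValueError:
            return {}  # not an IP address, so no geo record
        for network, record in FAKE_GEO_DB.items():
            if addr in network:
                return record
        return {}

    @property
    def cc(self):
        return self._geo.get("cc")

    @property
    def asn(self):
        return self._geo.get("asn")

    @property
    def asn_desc(self):
        return self._geo.get("asn_desc")


i = Indicator("192.0.2.10", tags=["scanner"])
print(i.indicator, i.cc, i.asn, i.asn_desc)
```

The calling code never sees the lookup function; it just asks the indicator for `.cc` or `.asn`, and anything that is not an IP address quietly returns `None`.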

Now that we can "get maxmind geo information for free", what other things can we do to re-use this 'preprocessing' pattern? DNS Resolution? What about URL resolution? What about Passive DNS data? Threat Intel data? Seriously, did we just flip the whole paradigm of CIF on its head? In what started out as a simple effort to reduce the complexity of our analytics, did we just experience a paradigm shift in the problem as a whole?

Do I really care what format you use when you communicate this data to others? Nope. I have my own opinions about formats and standards, but I've also learned that if you want adoption, keep things small and simple and let people build on your work. If your framework is the simplest, it can be packaged up and sent by just about anything else.

Obviously the more complexity you add, the less elegant a library becomes, but what if you create a plugin structure to help abstract all that away? If it's all about the indicator, have we just been thinking about this problem backwards the whole time? I've managed to abstract PAGES of CIFv4 code away from the main application. It's almost as if the Indicator itself IS the platform and CIF is just another way to implement that basic pattern...
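One way such a plugin structure could be sketched, purely as an illustration: enrichment functions register themselves under an attribute name, and the Indicator hands attribute lookups to whichever plugin claims that name. The `plugin` decorator, the `PLUGINS` registry and the two example plugins are hypothetical names invented here, not the layout of any real library.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical registry: attribute name -> function(indicator string).
PLUGINS = {}


def plugin(name):
    """Register a function as the provider of Indicator.<name>."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


class Indicator:
    def __init__(self, indicator):
        self.indicator = indicator
        self._cache = {}

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_") or name not in PLUGINS:
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = PLUGINS[name](self.indicator)
        return self._cache[name]


@plugin("itype")
def itype(value):
    # Classify the indicator as an IP address, a URL, or (by default) an FQDN.
    try:
        return f"ipv{ipaddress.ip_address(value).version}"
    except ValueError:
        pass
    if urlsplit(value).scheme in ("http", "https"):
        return "url"
    return "fqdn"


@plugin("hostname")
def hostname(value):
    # Pull the host part out of a URL; None for anything else.
    return urlsplit(value).hostname if "://" in value else None


i = Indicator("http://example.com/a.exe")
print(i.itype, i.hostname)
```

Adding DNS resolution, passive DNS or threat intel lookups would then mean writing one more decorated function, with no change to the Indicator class or to the application that uses it.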

Did you learn something new?