Prototyping CIFv4 - Engineering a Realtime Intelligence Platform

In the first part of this series, I was about 12 hours into a prototype and had "pings" flowing back and forth. Since then, I'm about 85 hours into CIFv4 and have data flowing end to end. Not terribly complex data, mostly ssh scanners and phishing urls, but enough to test indicator types as well as their respective data enrichment processes. Prototyping is a wonderful thing when you can manage to find little bits of time here and there. Within a month or two, you've solved another piece of the puzzle.

In this cycle, I was able to knock out most of the goals on the original list. The current features are far from perfect, but most of the outward facing interfaces are nearly complete. This means less chance for scope creep and more room for regression testing. Newer features that don't match an existing interface will have a harder time making their way into the code base, but an easier time defining what v5 looks like. It also means CIFv4 will ship "production ready" in months, not years. The internals still need a bit of work, but that's just iteration, not innovation. This is important because, when you've got kids, well.. you get the idea.

Feeds, Indicators and APIs

The main focus of the last ~60 hours has been APIs, feeds and real-time streaming. This includes the HTTP REST API as well as real-time ZeroMQ streams and, to some extent, WebHooks. Earlier versions of CIF had some ambiguity when it came to "indicator queries" vs "feed queries". Most new users didn't understand the difference, and to be fair, we didn't articulate it well. In CIFv3 we added a RESTful /feeds endpoint, but it didn't come across as clearly as it probably should have.

New users are really looking for "I can't shoot myself in the foot feeds" first, and indicator queries second. In CIFv4 the REST API is driven that way: simplicity first, complexity as you read more of the doc. There is a single /indicators endpoint that, when given very few parameters, returns an aggregated, whitelist enhanced, highly confident and recent feed in either JSON or CSV [depending on your 'Accept' header]. If you start asking it more specific questions, such as `?q=example.com`, it'll then switch into "indicator query mode" and give you more un-filtered results. If you use that as a feed in your firewall, that's on you. This seems like common sense, it just took a few iterations to flesh out.
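To make the two modes concrete, here is a minimal sketch of what the two kinds of request might look like from a client. It only builds the requests and does not send them. The host name and the helper function are made up for illustration; only the /indicators path, the 'Accept' header and the `?q=` parameter come from the description above, and authentication is left out entirely.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host: the post does not say where the API lives.
BASE_URL = "https://cif.example.org"


def build_indicators_request(accept="application/json", **params):
    """Build (but do not send) a GET request for the /indicators endpoint."""
    url = f"{BASE_URL}/indicators"
    if params:
        # Sort the parameters so the resulting URL is stable and easy to compare
        url += "?" + urlencode(sorted(params.items()))
    return Request(url, headers={"Accept": accept})


# Very few parameters: this is the "can't shoot myself in the foot" feed, as CSV
feed_csv = build_indicators_request(accept="text/csv")

# A specific question (?q=...) is what triggers "indicator query mode"
query = build_indicators_request(q="example.com")

print(feed_csv.full_url)
print(query.full_url)
```

The same endpoint serves both cases; the only difference on the client side is how specific the question is.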

Additionally, we're moving towards the Swagger style of documenting APIs. We tried this a little in v2 and differently in v3, but never quite got it right. However, recent improvements to the Swagger framework [and more importantly their business model] make me feel a bit more comfortable betting on it as a future API technology. Not only that, but its architecture may help offload some of the client building exercises in different languages, thus making CIF easier to use for non-native python users. The less friction there is, the better the experience should be.

Docker The Things.

Docker has proven to be both a blessing and a challenge to this project. While in v2 and v3 we've worked really hard to create a great "easy button" experience, there is no denying the power of `docker pull …`. Ventz has done some great work adapting the v3 easy button for his docker-cif repo, but it feels like every time we make changes to v3, it causes a bit of pain and frustration downstream. The challenge between CIF and Docker isn't so much that they're incompatible; CIF just has some complexity to it that's better suited for multiple docker instances, rather than an all-in-one. New users, however, use things like docker as an "all in one tester" platform, so we've been stuck between a rock and a hard place.

There is also a bit of customization to it (multiple API keys, pulling feeds, etc) that requires a bit of nuance to get right if you want to leverage `docker pull && docker run` rather than trying to fiddle with docker-compose to simply test-drive the system. None of this is significantly hard, we just hadn't put a lot of thought into it when developing v3 (~2014). Docker was still relatively new, and while we kept a close eye on it, we didn't dive straight into it.

Now that it has a few years of community support under it, we're building v4 around the idea that "it has to install in docker with as little hacking as possible", using docker AND the easybutton as the benchmarks rather than the easybutton alone. There are just as many `docker pull` folks out there as there are `bash easybutton.sh` Ansible folks, and there's less reason in 2018 to not natively support both. Our goal is to slowly chip away at the time it takes to test-drive the framework and create a near frictionless experience for the community.

Drugs are bad ummmmk?

Real-time Threat Streaming

CIF has always had a deeply hidden ZeroMQ streaming interface, but it was never quite ready for prime time. We've focused most of our time and attention on the RESTful bits, mostly because that's what our core audience was expecting. Even when we wrote clients to help do the heavy lifting, most users simply wanted a nice easy HTTP style interface they could curl against and do what they wanted with the data [see Swagger comments above]. With that in mind, spending time on the streaming parts of CIF didn't make a lot of sense because the user community just wasn't there yet.

Recently, as folks get used to pulling and using feeds, it's as if some/most of them are now expecting some kind of real-time stream of data. They realized, hey- I don't want to pull this feed every hour for up to date data, I just want to stream it from CIF into Blah! Let CIF do the heavy lifting so my other tools can focus on what they do best. This scenario is also one of the reasons you don't see us focusing on some shiny UI and more so on the "move data" API.

We're not your hunting tool, we're your threat streaming platform. CIF has always been about streaming indicators; it's the reason we've built it on top of a framework like ZeroMQ. Similar to how Netflix started by delivering DVDs in the mail (HTTP/Fetch), at some point they knew they'd be streaming movies directly into your eyeballs. They just needed time for users [and technology] to adopt [and/or catch up to] their vision of the content biz. In v4, we're making the real-time streaming actors more prevalent and first class. They won't be on by default, but they'll be better documented, with examples to show both how they work and how you can take advantage of them.

At their core, they act very similarly to some of the existing enrichment interfaces using ZeroMQ's "PUB/SUB" architecture. This does present some odd scaling challenges that still need to be overcome. As you distribute cif-routers sideways (e.g. autoscale, other..), if you subscribe to one, the others need to make sure their messages get to YOUR subscription. While most frameworks handle this problem using a more centralized broker approach (kafka, rabbit, redis, etc), we're trying to stay away from that. In many larger CIF deployments there is no "central router"; they are distributed and, with the power of things like AWS, temporary. Centralized brokers have a tendency to cost more [in the long run] and be more fragile, traits we try to avoid in our designs.
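The scaling problem above is easier to see in a toy model. The sketch below is plain in-memory Python, not ZeroMQ and not how cif-router is actually implemented: the Router class, the mesh helper and the sample indicators are all made up for illustration. It shows the property we're after, which is that a subscriber attached to one router still sees messages published at any other router, without a central broker in the middle.

```python
class Router:
    """Toy stand-in for a cif-router: local subscribers plus peer routers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.subscribers = []

    def subscribe(self, callback):
        # A subscriber attaches to exactly one router
        self.subscribers.append(callback)

    def publish(self, message):
        # A message from a client is delivered locally, then relayed once to each peer
        self._deliver(message)
        for peer in self.peers:
            peer._deliver(message)

    def _deliver(self, message):
        # Relayed messages are delivered locally only and never re-forwarded,
        # so a full mesh cannot create forwarding loops
        for callback in self.subscribers:
            callback(message)


def mesh(*routers):
    """Connect every router to every other router (no central broker)."""
    for router in routers:
        router.peers = [other for other in routers if other is not router]


a, b, c = Router("a"), Router("b"), Router("c")
mesh(a, b, c)

received = []
a.subscribe(received.append)    # the subscriber only knows about router "a"
b.publish("198.51.100.7")       # indicators arrive at the other routers
c.publish("phish.example.com")

print(received)
```

In a real deployment the hard parts are exactly what this model skips: routers appearing and disappearing, peer discovery, and doing the relay over the network.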

It may be that we use something like Redis for now, but the ultimate goal is to use something more distributed like ZYRE, ZeroMQ's P2P framework. This would make "the mesh" into "the bus", rather than relying on yet another middle piece that could break or slow down the system as you scale out. I've been a contributor to the ZYRE framework over the years, and while it's a little more complex than some of its more centralized counterparts, the leverage we may get from it should exceed the work we put in [at least that's the bet we're making and using to differentiate from other platforms / services].

WebHooks

As more web based services begin to interoperate, the concept of WebHooks has gained more and more mainstream traction. WebHooks are "the HTTP way" of event triggering, meaning that when your system wants to trigger an event somewhere else, it simply makes an HTTP POST call to another web-service that's expecting it. While not very efficient, it's a quick and dirty way to make slower event calls to other services using simple HTTP methodology. It's meant for slower and less frequent types of communication, which might require things like common authentication, etc. It's trivial to implement and the protocol is obviously mainstream, so it just made sense.
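For anyone who hasn't built one, a webhook call really is this small. The sketch below builds the HTTP POST with Python's standard library and stops short of sending it. The receiving URL and the shape of the JSON payload are assumptions for illustration, not the format CIFv4 uses.

```python
import json
from urllib.request import Request


def build_webhook(url, event):
    """Build (but do not send) an HTTP POST carrying a JSON-encoded event."""
    body = json.dumps(event).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical receiver URL and event payload
hook = build_webhook(
    "https://hooks.example.org/cif",
    {"event": "search", "q": "example.com"},
)

# To actually fire it, pass the request to urllib.request.urlopen(hook, timeout=5)
print(hook.get_method(), hook.full_url)
```

The receiving service only needs an HTTP endpoint that accepts the POST, which is why this pattern works with so many existing tools.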

In CIFv4 we're introducing the concept for things like searches and Slack, albeit very slowly. This won't be a mainstream feature, but I wanted to "start" by example, if only to show what searches users are making in Slack. Hopefully others will build this platform into other areas of their workflow and build some neat stuff on it. It's not limited to 'search' traffic, but that seemed like the right place to start.

I don't think you want your Slackbots chirping every time there's a new indicator flowing through the system, but you do if a human asks a non-feed related question. It's something that tips users off to the fact that someone is looking for something others may have noticed too. Things like this help generate interest and conversation, which is helpful when it feels like you're hunting alone.

Intelligence Profiles

When we talk about The Last Mile, we've mostly covered how to adapt an intelligence feed to your end point devices. The caveat is that it assumes you know what kind of feeds are in the system, what they're typically used for and, more importantly, IT REQUIRES YOU TO THINK… It's 2018, I don't have time to think, I just want magic. Some of the most frequently asked questions we get these days are:

  • What are all the tags in the system?

  • What confidence SHOULD I pick [if I don't want to shoot myself in the foot]?

  • Which indicator types should I pick?

  • WHAT DO YOU MEAN I SHOULDN'T SHOVE ALL THE DATA INTO MY FIREWALL!?

Again, all reasonably obvious questions, but when you're knee deep in the guts of this type of system, not something you readily think through… well. From my perspective, you should just know.. which is one of my character flaws (something my wife is helping me work on :)).

So, in v4 we're introducing the concept of end-point profiles. If anything they'll give you a quick and dirty feed that "should go in a domain sinkhole" or "a bro sensor". They'll let you test-drive a feed, but also help you learn how feeds work in the system with very little friction. The profiles automatically configure things like, confidence, how many days to go back based on the sensor type and what kind of indicators you're probably looking for. This is something we should have had from day 2, you just don't know what problems are worth solving until you release. Then, you FAQ the answer, then you get sick of pointing at the FAQ.. and realize you were just wrong in your initial assumptions and fix it. Four versions later.. (free.. as in beer).
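As a rough illustration of the idea, a profile can be thought of as a named bundle of query defaults. Everything in the sketch below is hypothetical: the profile names, the parameter names and the specific values are placeholders chosen for the example, not what CIFv4 ships with.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Profile:
    """A named bundle of feed defaults for one kind of end point."""
    itype: str        # what kind of indicator the sensor consumes
    confidence: int   # minimum confidence to include
    days: int         # how many days to go back


# Hypothetical profiles and values, for illustration only
PROFILES = {
    "domain-sinkhole": Profile(itype="fqdn", confidence=8, days=30),
    "bro-sensor": Profile(itype="ipv4", confidence=7, days=7),
}


def feed_params(profile_name):
    """Expand a profile name into the query parameters it stands for."""
    return asdict(PROFILES[profile_name])


print(feed_params("domain-sinkhole"))
```

The point is that the user picks "domain-sinkhole" and never has to answer the confidence, tags or indicator type questions themselves.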

For the past 10 years I've had this vision of CIF being like a powered exoskeleton, where it wraps around your work and gives you both power and leverage. Almost like your own personal AI API when it comes to threat intel. CIFv4 is really starting to realize that vision, in terms of its machine-learning, streaming and webhooks capability. However, it doesn't belong to someone else- it belongs to you...
 

And you thought there were no more original ideas...
