In a few previous posts, I demonstrated how to make threat predictions using the CSIRTG API. I realize that for most programmers, integrating something like this is easy, but many of us still need a bridge to help wrap our heads around it. In a lot of cases (CIF or otherwise), it always astonishes me: we write all the tools you'd need, and someone inevitably asks, "how do I get that with curl?"
I've spent a lot of time and effort building out console commands to make things easier. As a user of other software, that's exactly one of the first things I think about. Why it's always an afterthought makes me laugh, but it's something I'm working to fix the more software I build. We all get so caught up in the "elegance" of our API that we sometimes forget what it's like to be on the other side of the interface. We forget to ask the most important question: "How can I prove the value of this feature in 10 seconds or less?"
One of the problems I've run into when working with curl and various REST APIs is that ugly JSON string you get back. Most people might suggest that the default response should just be CSV when you're pulling with curl, wget, or the like. The problem with this is that not everything fits in a CSV, and usually there's a better way. One of the neat tools I've found over the years is called `jq`. It enables you to pipe a messy JSON blob in and dump a colorful, well-structured picture out. This makes it incredibly easy to visualize the output of something without writing a single line of code.
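For instance, here's what that looks like with a made-up JSON blob (the field names are purely for illustration, not the actual CSIRTG response shape):

```shell
# The kind of messy, single-line JSON blob a REST API hands back
BLOB='{"indicator":"example.com","predictions":[{"probability":0.87},{"probability":0.42}]}'

# Pretty-print (and colorize, on a terminal) the whole structure
echo "$BLOB" | jq '.'

# Or pull out just the pieces you care about
echo "$BLOB" | jq '[.predictions[].probability]'
```

In practice you'd replace the `echo` with a `curl -s https://...` call and pipe that straight into `jq`.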
Statistics are easy to wrap your head around, as long as you hide them from other people.
The results you get back from the API are simple. The first is a binary 1|0: a prediction greater than 84% (statistically significant, meaning "yes, probably") comes back as a 1, anything less as a 0. The second is more granular, letting you make more refined decisions based on a floating-point representation of the prediction (e.g., 0.65, or 65%). This enables you to get quick, binary answers when you're testing the API, leaving the harder comparison math to the API, while at the same time giving you better context when you're ready to do more of the decision making yourself.
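A rough sketch of consuming both values from the shell. Note: the endpoint, auth header, and field names (`prediction`, `probability`) here are assumptions for illustration, not the documented CSIRTG schema; I've canned the response so the jq pipeline runs as-is.

```shell
# Hypothetical request; substitute the real CSIRTG endpoint and your token:
# RESPONSE=$(curl -s -H "Authorization: Token token=$CSIRTG_TOKEN" \
#     "https://csirtg.io/api/predict?q=example.com")

# Canned response in the assumed shape
RESPONSE='{"prediction": 1, "probability": 0.87}'

# The quick, binary answer: 1 means "likely bad" (> 84%), 0 means "not likely bad"
echo "$RESPONSE" | jq '.prediction'

# The granular answer, for when you want to do the thresholding yourself
echo "$RESPONSE" | jq 'if .probability > 0.84 then 1 else 0 end'
```

Both commands print `1` for this response: the first is the API's pre-baked verdict, the second recomputes it from the raw probability so you can pick your own threshold later.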
The theory is that most people only care whether something is "likely bad" or "not likely bad". That grey, squishy middle area isn't worth thinking about the very first time you integrate something. You want to know that, if you tie this output into a production tool, it will only fire a warning shot when something is LIKELY to be STATISTICALLY bad. The rest you can figure out once you've proven the concept. If it takes you more than 10 seconds to think through, it's more LIKELY to find its way further down the TODO list.
The further something is down your TODO list, the LESS LIKELY (e.g., < 84%) you are to spend time proving its value and actually using it.
(See what I did right there?)