I've seen presentations that prove this, and the AI does a better job at crafting phishing urls with a higher success rates than most humans do. This is where we start thinking of the larger AI frameworks as layers..
The real problem we're trying to solve here is context. We're lifting a bunch of "tokens", that usually have more than 3 characters, surrounding them with context and applying a probability value to them. All this with the express purpose of taking the high value indicators and applying them to our defenses in real-time. Not trivial, but not hard either. I'm not an SKLearn or NLTK expert- but I do know what it feels like to block accidentally netflix.com at the border….
Pretty soon, you find yourself back, staring at this "snort signatures" pattern problem. A small, elegant mathematical formula representing something your sensors should be detecting. All it's missing is a little normalization and a bit of an ever evolving data model behind it, representing the current state of the Internet…
The less noise your hunters have to weed through, the more focused they become. The more focused they are, the more likely they'll find that needle. Often times, as is the case with most breaches, enough positive edge is all it takes….
The problem wasn't trying to manage and automate the code deployment, as much as it became managing the playbooks that deployed the application(s). We could have kept those playbooks in with the core code, but that's more over-head in the repo and more people touching the core code that didn't need to....
...without ANY machine learning or NLTK magic, you have a very basic and generalized pattern (or "algo" in hipster speak) that can parse and normalize, most types of feeds.
If we are to succeed at making YOUR Internet a better place, we need that information to federate out among our peers. We need each of our models to be predictably influenced by our friends to help protect ourselves against threats we do not yet know about. Those models need to be transparent in order for us to gain confidence in them...
Randomly start talking to people at a conference. Head out to a bar, have a few beers, decide to build a mailing list and take down a botnet together. Create professional life partnerships. One of the more successful patterns, because you believe you can do anything when you get a few beers in you. All other (successful) patterns usually have origins in this pattern, or something like this- could be a bar, could be a game night at a coffee house, beer helps, but isn't always required.
It's one thing to think of "statistics" in the general sense. For instance,
"100 unique IPs scanned my darknet today".
This doesn't really tell me anything useful, other than (assuming DHCP churn is nil in a given 24 hour period) there's a bit of noise on the line. 100 by itself isn't a really useful number, it's probably not even statistically relevant, is it a holiday? was part of the Internet down today? was it the same device behind a series of NATs?....
Applied research, content and tools to help you solve real problems.
Did you learn something new? How much is that worth to you?