F your formats, just show me the data.
...without ANY machine learning or NLTK magic, you have a very basic and generalized pattern (or "algo" in hipster speak) that can parse and normalize most types of feeds.
If we are to succeed at making YOUR Internet a better place, we need that information to federate out among our peers. We need each of our models to be predictably influenced by our friends to help protect ourselves against threats we do not yet know about. Those models need to be transparent in order for us to gain confidence in them...
I've spent about a year thinking about v4 and about 12 hours writing it (most of which has been refactoring older code and wondering how drunk I was when I wrote it). If you look at the repo today, most of it looks and feels like v3 but with most of the complexity removed (e.g., lots of refactoring for performance and readability). Last night, I was able to get "pings" flowing back and forth between the client and the storage thread, which is a good sign...
What good is threat intel if you have to spend time thinking about it?
If you treated every suspicious domain as a coin flip, in a normally distributed sample, over time you'd have a 50/50 chance at being right. If you filter out the top 1000 domains from Alexa, you're probably at 70/30. If you weed out domains that have more than 3 dots in them, 75/25; 3 or more hyphens might get you to 80/20; and if the domain is greater than 15 chars, it's probably not worth your time....
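Here's a minimal sketch of that coin-flip math. It's one possible reading, not the actual implementation. The function name, the flag names and the tiny POPULAR_DOMAINS set (a stand-in for a real top-1000 list) are all made up for illustration. It also assumes each later check simply bumps the estimate up to the ballpark figure quoted above. The 15-character check has no number attached, so it's only recorded as a flag.

```python
# Hypothetical stand-in for a real popularity list such as the Alexa top 1000.
POPULAR_DOMAINS = {"google.com", "facebook.com", "wikipedia.org"}


def suspicion_estimate(domain, popular=POPULAR_DOMAINS):
    """Return (estimate, flags) for a suspicious domain.

    estimate is a rough 0..1 guess built from the ballpark odds in the text,
    or None when the domain is on the popular list and gets filtered out.
    """
    domain = domain.lower().rstrip(".")

    # Exact-match lookup only; subdomains of popular sites are not handled.
    if domain in popular:
        return None, ["popular"]

    estimate = 0.70  # 70/30 once the popular domains are filtered out
    flags = []

    if domain.count(".") > 3:  # more than 3 dots
        estimate = max(estimate, 0.75)
        flags.append("many_dots")

    if domain.count("-") >= 3:  # 3 or more hyphens
        estimate = max(estimate, 0.80)
        flags.append("many_hyphens")

    if len(domain) > 15:  # no odds quoted for this one, so flag it only
        flags.append("long_name")

    return estimate, flags
```

Calling `suspicion_estimate("my-bad-pay-pal.com")` trips the hyphen and length checks. Nothing here is tuned against real data; the numbers are the gut-feel odds from the paragraph above.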
After a few cycles of just looking at the data, a funny thing happens: you start making choices a bit differently, if only because there's data staring you in the face....
Stop. Seriously, just stop this nonsense. If you find a platform that "also does ticketing", run, don't walk....