Hunting for suspicious domains with Python and SKLearn

If you treated every suspicious domain as a coin flip, then in a normally distributed sample you'd have, over time, a 50/50 chance of being right. If you filter out the top 1000 domains from Alexa, you're probably at 70/30; if you weed out domains that have more than 3 dots in them, 75/25; 3 or more hyphens might get you to 80/20; and if the domain is greater than 15 chars, it's probably not worth your time....
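The checks named above can be sketched as a small feature extractor. This is only an illustration, not the code from the post: the function name `domain_features`, the feature names, and the `popular` set (standing in for a top-1000 list you would load yourself) are assumptions, and how to weigh each flag is left to you.

```python
def domain_features(domain, popular=frozenset()):
    """Return the simple heuristics from the text as boolean flags.

    `popular` is a hypothetical set of well-known domains (for example,
    a top-1000 list loaded elsewhere); it is empty by default.
    """
    d = domain.strip().lower().rstrip(".")
    return {
        "popular": d in popular,              # appears in the popular list
        "many_dots": d.count(".") > 3,        # more than 3 dots
        "many_hyphens": d.count("-") >= 3,    # 3 or more hyphens
        "long": len(d) > 15,                  # greater than 15 chars
    }


if __name__ == "__main__":
    top = frozenset({"google.com"})
    for name in ("google.com", "a.b.c.d.example.com", "secure-login-my-bank.example"):
        print(name, domain_features(name, popular=top))
```

Flags like these could also serve as input columns for a classifier, which is presumably where SKLearn comes in, though the excerpt does not say so.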

Where are YOU in ten years?

After a few cycles of just looking at the data, a funny thing happens: you start making choices a bit differently, if only because there's data staring you in the face....

CIF, is not a car-boat.

Stop. Seriously, just stop this nonsense. If you find a platform that "also does ticketing": run, don't walk....

Beer, Squirrels and other Vetting Patterns.

Randomly start talking to people at a conference. Head out to a bar, have a few beers, decide to build a mailing list and take down a botnet together. Create professional life partnerships. This is one of the more successful patterns, because you believe you can do anything when you get a few beers in you. All other (successful) patterns usually have their origins in this pattern, or something like it: it could be a bar, it could be a game night at a coffee house. Beer helps, but isn't always required.

Drinking from the FireHose.

... and in less than 5 minutes you're streaming all the public data from within CSIRTG (which, at the time of this writing, consists mostly of various types of scanning activity from honeypots, as well as oddball spam/phishing URLs, email addresses, email attachment hashes, etc.).

There is also an example feed and correlation tool to help get you started, maybe even generate an idea or two. The correlation tool looks at all the scanners coming across all the feeds in real time and simply produces a correlated indicator when it finds an indicator created by 3 different users within a 24 hour period. Crazy simple, yet it produces a list of highly suspect actors that can be confidently acted on in your security infrastructure.
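The correlation rule is simple enough to sketch in a few lines. What follows is a minimal illustration of the "3 different users within 24 hours" idea, not the actual tool: the `Correlator` class, its `observe` method, and the sample user names are hypothetical, and it assumes sightings arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class Correlator:
    """Flag an indicator once enough distinct users report it within a window."""

    def __init__(self, window=timedelta(hours=24), min_users=3):
        self.window = window
        self.min_users = min_users
        # indicator -> deque of (timestamp, user), oldest first
        self._seen = defaultdict(deque)

    def observe(self, indicator, user, ts):
        """Record one sighting; return True if the indicator is now correlated."""
        sightings = self._seen[indicator]
        sightings.append((ts, user))
        # Drop sightings older than the window (assumes in-order timestamps).
        while ts - sightings[0][0] > self.window:
            sightings.popleft()
        return len({u for _, u in sightings}) >= self.min_users


if __name__ == "__main__":
    c = Correlator()
    t0 = datetime(2020, 1, 1)
    for hours, user in ((0, "alice"), (1, "bob"), (3, "carol")):
        hit = c.observe("192.0.2.10", user, t0 + timedelta(hours=hours))
        print(user, hit)
```

Keeping a per-indicator deque means old sightings fall off as new ones arrive, so memory stays bounded for indicators that keep showing up. Indicators that go quiet would need a separate cleanup pass, which is left out here.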

Not sure if chat bot...

...with chatbots, we can hyper-focus those contexts and interactions to generate a more meaningful experience. If we're in a chatroom talking about an indicator, the subtlety of a bot PM'ing us mid-conversation and suggesting "hey! i know about that- here are some links.." can be quite useful. You obviously don't want the bot to be too spammy, but with the right combination of query-ability and common sense, it can be the subtle difference between finding that breach you've been hunting for and not...

Exploding Woodchucks...

A buddy of mine and I were talking one day about businesses: working with them, partnering with them and, more importantly, starting them. There's a famous saying, "ideas are a dime a dozen, everyone's got one and none of them are of any value". Finally, after years of watching fads come and go, I get it. Something like 90% of new businesses fail in the first few years, not because their ideas were bad, but because of three things: market timing, money and execution.