Hunting for suspicious domains with Python and SKLearn

If you treated every suspicious domain as a coin flip, in a normally distributed sample, over time you'd have a 50/50 chance at being right.If you filter out the top 1000 domains from Alexa, you're probably at 70/30, if you weed out domains that have more than 3 dots in them, 75/25, 3 or more hyphens might get you to 80/20 and if the domain is greater than 15 chars, it's probably not worth your time....