Mining BitCo^H^H^H^H^HSpam...

Mining for BitCoin is Easy.

Well, relatively easy. The formula is simple arbitrage: Electricity + Silicon + Time = Maybe((N / 21,000,000) * $5... no, $5,000... err $19,000... no! $13,500!). The current implied volatility makes it harder to track day to day, but you get the idea.

But what about mining spam? How do you arb what's already in your inbox- for real dollars?

Transporting Email is Easy...

As anyone who's ever tried knows, there's no 'one way' to parse email. It's one of those long-standing protocols that was developed during a different period of time, is extremely resilient, can carry just about anything, works across different encodings and systems, and will do just about anything you want it to. The very thing that makes it so versatile is the very thing that makes it extremely difficult to parse well. Transporting email is easy; the RFCs define most of the headers and other implementation details pretty well. It's what's IN the messages that's important (and hard).

Mining [and parsing] Email is Hard

Things you need to consider:

  1. Encoding- what encoding is the message in?
  2. Is the encoding correct?
  3. Are there any headers that appear abused? (eg: spam adding headers that try to fool your filters?)
  4. Is it HTML or Plain Text?
  5. Does it have any attachments?
  6. Does it have both content (HTML/Plain) AND attachments?
  7. Is it a forward or reply?
  8. Are any of its attachments a forward? (eg: is it recursive, forwards of forwards? where do you stop?)
  9. Are the attachments encoded or tagged appropriately?
  10. etc
  11. etc..
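
A few of these questions can be sketched with nothing but Python's standard `email` package. This is a minimal illustration, not how pyzmail or csirtg-mail do it; the `survey` function and its report keys are made up for this example, and the sample message is a trimmed copy of the spam shown later in this post.

```python
from email import message_from_string, policy

# Trimmed sample message (headers and body shortened for the example).
RAW = """\
From: "HENRY" <advertisebz09ua@gmail.com>
To: <phish@csirtgadgets.org>
Subject: Boost Social Presence with FB posts likes
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Order today, cheap and guaranteed service:
http://www.socialservices.cn/detail.php?id=9
"""


def survey(raw):
    """Answer a few of the checklist questions for one raw message (hypothetical helper)."""
    msg = message_from_string(raw, policy=policy.default)
    report = {
        "charsets": set(),        # 1. which encodings are declared?
        "bad_charset_parts": 0,   # 2. do the bytes actually decode with them?
        "has_plain": False,       # 4. plain text body present?
        "has_html": False,        # 4. HTML body present?
        "attachments": [],        # 5/6. attachment filenames
        "nested_messages": 0,     # 8. forwarded messages carried as attachments
    }
    for part in msg.walk():       # walk() also descends into nested messages
        ctype = part.get_content_type()
        if ctype == "message/rfc822":
            report["nested_messages"] += 1
            continue
        if part.is_multipart():
            continue              # containers carry no content of their own
        if part.is_attachment():
            report["attachments"].append(part.get_filename())
            continue
        charset = part.get_content_charset()
        if charset:
            report["charsets"].add(charset)
        payload = part.get_payload(decode=True) or b""
        try:
            payload.decode(charset or "ascii")
        except (UnicodeDecodeError, LookupError):
            report["bad_charset_parts"] += 1
        if ctype == "text/plain":
            report["has_plain"] = True
        elif ctype == "text/html":
            report["has_html"] = True
    return report


if __name__ == "__main__":
    print(survey(RAW))
```

Even this small sketch has to make judgment calls (what counts as an attachment, what to do with an unknown charset), which is a fair preview of why the list above keeps going.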

Email isn't going away anytime soon, and if the growth of services like MailChimp is any indication, no matter how many Facebags, tweeders or insta-who-gives-a-crap are built, email is [for the foreseeable future] the main hub of a person's personal and professional life. They may only check it a few times a day, but that's where their life's business is processed... and why phishers, scammers and spammers will always try to take advantage of it.

csirtg-mail-py

What we've found over time is that there are a lot of decent "HowTo's" out there, especially around Python, when it comes to parsing email. What's missing are the higher-level functions: how to mine content from the body, how to get some context around it, and how to get it into a feed. What we built is a higher-level library that sits on top of pyzmail (one of the better Python mail parsing libraries we've found thus far) and abstracts out some of the more common functions, such as:

  • Give me the list of URLs from the body (HTML or PlainText)
  • Give me a list of the headers
  • Give me a list of any email addresses found in the message
  • Give me a list of any attachments
  • Is this a forward?
  • Is this a reply?
  • GDI JUST TAKE CARE OF THE STUPID PYTHONv2 vs PYTHONv3 UNICODE GARBAGE IM SICK OF THINKING ABOUT IT
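
To give a feel for what a few of these helpers involve, here is a rough standard-library sketch. It is not csirtg-mail's actual implementation: the function names and regular expressions are assumptions made for illustration, and the reply/forward checks are naive subject-prefix heuristics that real mail will happily defeat.

```python
import re

# Deliberately simple patterns; real-world URLs and addresses are messier.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_urls(text):
    """Return unique URLs in order of first appearance, trailing punctuation trimmed."""
    urls = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?)")
        if url not in urls:
            urls.append(url)
    return urls


def extract_emails(text):
    """Return the sorted, lowercased set of email addresses found in the text."""
    return sorted({m.lower() for m in EMAIL_RE.findall(text)})


def looks_like_reply(subject):
    """Heuristic: the subject starts with 'Re:'."""
    return re.match(r"\s*re\s*:", subject, re.IGNORECASE) is not None


def looks_like_forward(subject):
    """Heuristic: the subject starts with 'Fw:' or 'Fwd:'."""
    return re.match(r"\s*fwd?\s*:", subject, re.IGNORECASE) is not None


if __name__ == "__main__":
    body = (
        "Order today, cheap and guaranteed service:\n"
        "http://www.socialservices.cn/detail.php?id=9\n\nRegards\nHENRY\n"
    )
    print(extract_urls(body))
    print(extract_emails(body))
    print(looks_like_reply("Boost Social Presence with FB posts likes"))
```

The body text here is decoded plain text; pulling links out of HTML parts properly would need an HTML parser rather than a regex.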

We've built something that takes this:

Delivered-To: phish@csirtgadgets.org
Received: by 10.112.40.50 with SMTP id u18csp916705lbk;
        Sun, 19 Apr 2015 05:50:04 -0700 (PDT)
X-Received: by 10.42.151.4 with SMTP id c4mr13784232icw.77.1429447803846;
        Sun, 19 Apr 2015 05:50:03 -0700 (PDT)
Return-Path: <advertisebz09ua@gmail.com>
Received: from gmail.com ([61.72.137.254])
        by mx.google.com with SMTP id s93si13575887ioe.52.2015.04.19.05.50.00
        for <phish@csirtgadgets.org>;
        Sun, 19 Apr 2015 05:50:03 -0700 (PDT)
Received-SPF: softfail (google.com: domain of transitioning advertisebz09ua@gmail.com does not designate 61.72.137.254 as permitted sender) client-ip=61.72.137.254;
Authentication-Results: mx.google.com;
       spf=softfail (google.com: domain of transitioning advertisebz09ua@gmail.com does not designate 61.72.137.254 as permitted sender) smtp.mail=advertisebz09ua@gmail.com;
       dmarc=fail (p=NONE dis=NONE) header.from=gmail.com
Message-ID: <BE5B7E8D.883B43A2@gmail.com>
Date: Sun, 19 Apr 2015 05:24:33 -0700
Reply-To: "HENRY" <advertisebz09ua@gmail.com>
From: "HENRY" <advertisebz09ua@gmail.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.19) Gecko/20081209 Thunderbird/2.0.0.19
MIME-Version: 1.0
To: <phish@csirtgadgets.org>
Subject: Boost Social Presence with FB posts likes
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello,
Boost your Facebook posts with a massive promotion 
and gain over 10.000 likes in total towards all your posts. 

We can promote up to 20 posts links at a time. 

Increase exposure with guaranteed promotion service.

Use this coupon and get another 10% discount on your purchase

==================
10% Coupon = EB2CA
==================

Order today, cheap and guaranteed service:
http://www.socialservices.cn/detail.php?id=9

Regards
HENRY
 

Unsubscribe option is available on the footer of our website

and transforms it into this:

[
    {
        "mail_parts": [
            {
                "sanitized_filename": null,
                "filename": null,
                "is_body": "text/plain",
                "disposition": null,
                "charset": "us-ascii",
                "content_id": null,
                "description": null,
                "type": "text/plain",
                "decoded_body": "Hello,\nBoost your Facebook posts with a massive promotion \nand gain over 10.000 likes in total towards all your posts. \n\nWe can promote up to 20 posts links at a time. \n\nIncrease exposure with guaranteed promotion service.\n\nUse this coupon and get another 10% discount on your purchase\n\n==================\n10% Coupon = EB2CA\n==================\n\nOrder today, cheap and guaranteed service:\nhttp://www.socialservices.cn/detail.php?id=9\n\nRegards\nHENRY\n\u00c2\u00a0\n\n\n\n\n\n\nUnsubscribe option is available on the footer of our website\n\n\n\n",
                "base64_encoded_payload": null
            }
        ],
        "body_email_addresses": [],
        "headers": {
            "received-spf": [
                "softfail (google.com: domain of transitioning advertisebz09ua@gmail.com does not designate 61.72.137.254 as permitted sender) client-ip=61.72.137.254;"
            ],
            "reply-to": [
                "\"HENRY\" <advertisebz09ua@gmail.com>"
            ],
            "mime-version": [
                "1.0"
            ],
            "user-agent": [
                "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.19) Gecko/20081209 Thunderbird/2.0.0.19"
            ],
            "authentication-results": [
                "mx.google.com;\n       spf=softfail (google.com: domain of transitioning advertisebz09ua@gmail.com does not designate 61.72.137.254 as permitted sender) smtp.mail=advertisebz09ua@gmail.com;\n       dmarc=fail (p=NONE dis=NONE) header.from=gmail.com"
            ],
            "delivered-to": [
                "phish@csirtgadgets.org"
            ],
            "date": [
                "Sun, 19 Apr 2015 05:24:33 -0700"
            ],
            "return-path": [
                "<advertisebz09ua@gmail.com>"
            ],
            "message-id": [
                "<BE5B7E8D.883B43A2@gmail.com>"
            ],
            "content-transfer-encoding": [
                "7bit"
            ],
            "subject": [
                "Boost Social Presence with FB posts likes"
            ],
            "from": [
                "\"HENRY\" <advertisebz09ua@gmail.com>"
            ],
            "to": [
                "<phish@csirtgadgets.org>"
            ],
            "x-received": [
                "by 10.42.151.4 with SMTP id c4mr13784232icw.77.1429447803846;\n        Sun, 19 Apr 2015 05:50:03 -0700 (PDT)"
            ],
            "received": [
                "by 10.112.40.50 with SMTP id u18csp916705lbk;\n        Sun, 19 Apr 2015 05:50:04 -0700 (PDT)",
                "from gmail.com ([61.72.137.254])\n        by mx.google.com with SMTP id s93si13575887ioe.52.2015.04.19.05.50.00\n        for <phish@csirtgadgets.org>;\n        Sun, 19 Apr 2015 05:50:03 -0700 (PDT)"
            ],
            "content-type": [
                "text/plain;\n    charset=\"us-ascii\""
            ]
        },
        "urls": [
            "http://www.socialservices.cn/detail.php?id=9"
        ]
    }
]

by just doing this:

$ pip install csirtg_mail
$ cat spam.eml | csirtg-mail -d
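
Once you have JSON shaped like the output above, turning it into feed-ready indicators takes only a few lines. The sketch below assumes that JSON has been captured as a string; the `indicators` function is a hypothetical helper, not part of csirtg-mail, and the sample is a trimmed copy of the output shown earlier.

```python
import json

# Trimmed copy of the JSON shown above (most headers omitted).
SAMPLE = """
[
    {
        "body_email_addresses": [],
        "headers": {
            "return-path": ["<advertisebz09ua@gmail.com>"],
            "subject": ["Boost Social Presence with FB posts likes"]
        },
        "urls": ["http://www.socialservices.cn/detail.php?id=9"]
    }
]
"""


def indicators(report_json):
    """Flatten parsed-mail JSON into (type, value) pairs (hypothetical helper)."""
    rows = []
    for message in json.loads(report_json):
        for url in message.get("urls", []):
            rows.append(("url", url))
        for addr in message.get("body_email_addresses", []):
            rows.append(("email", addr))
        # Return-Path is wrapped in angle brackets in the parsed headers.
        for value in message.get("headers", {}).get("return-path", []):
            rows.append(("email", value.strip("<>")))
    return rows


if __name__ == "__main__":
    for kind, value in indicators(SAMPLE):
        print(kind, value)
```

Keeping the output as plain (type, value) pairs is just one convenient choice; it maps easily onto CSV or whatever feed format you already use.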

This is going to be a multipart series over the coming months, covering the different ways we can help you parse your inbox to generate threat intel from it. We're going to introduce the concept of a Sinkhole API within https://csirtg.io and what you can use it for. We'll explore how we can parse email together, how we can generate feeds, and how to apply machine learning to the URLs, email addresses and domains. We'll also explore how we can generate feeds of suspicious attachments for use with malware sandboxes and in turn generate more feeds of threat intel, and how to natively integrate these concepts into SMRT as well as your normal IR processes... all from the simple parsing of an email.

In the meantime-

Check out some of our Unsolicited Commercial Email or "UCE" feeds. You'll get an idea of how we're using these open-source libraries in production and generating new feeds of intelligence by simply redirecting spam to the sinkhole. They're not perfect, and they probably never will be. They do, however, provide some really interesting threads to tug on, especially if you're just getting started in the business. It's free intel... and it's already in your inbox.

more to come...