I built CSIRTG for a very simple reason: publishing and accessing threat intel is hard. It's trivial to stand up an AWS instance and publish text files; it's much harder to stand up a living ecosystem that scales. It's even harder if you want to manage different types of data. Publishing a list of URLs is one thing; threading that together with honeynet hits, suspicious email addresses, and correlations is another.
What makes this different from things like DShield? Simple: the scope of your participation is unlimited. That's not to say you shouldn't participate in DShield; it has been, and continues to be, a massively beneficial project for the Internet itself. However, when you want to control YOUR data, or improve upon your data, projects like these fall short.
In the early days of CSIRTG we made a few very interesting plays: we contributed pull requests to the Cowrie honeypot, which made it trivial to push honeynet data from Cowrie out of the box, and we designed some lower-interaction Docker bits to complement Cowrie. To be fair, I wrote this "for me"; what I didn't expect was how many people would take advantage of it and where it would lead. What we've observed over the last few years is a number of like-minded people (they may be robots, I'm never sure) beginning to publish their Cowrie data to CSIRTG.
What's neat about that: you not only have your own honeynet data, but access to many other Cowrie instances across the Internet. You can subscribe to each of these feeds individually, or even subscribe to the Firehose and get all this honeynet data in real time. With real-time access you can do things like correlation across honeynets. CSIRTG alone has multiple honeypots across the world, all pushing data into it; imagine that data combined with data from many other CSIRTG users.
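To make "correlation across honeynets" concrete, here is a minimal sketch of the idea. It assumes each event arriving from a feed can be reduced to a dict with a feed name and a source indicator; those field names are my own illustration, not the CSIRTG schema.

```python
# Hypothetical sketch: find indicators reported by multiple independent feeds.
# The "feed"/"indicator" keys are illustrative assumptions, not CSIRTG fields.
from collections import defaultdict

def correlate(events, min_feeds=2):
    """Return indicators reported by at least `min_feeds` distinct feeds."""
    seen = defaultdict(set)  # indicator -> set of feeds that reported it
    for event in events:
        seen[event["indicator"]].add(event["feed"])
    return {i: feeds for i, feeds in seen.items() if len(feeds) >= min_feeds}

events = [
    {"feed": "cowrie-us-east", "indicator": "203.0.113.5"},
    {"feed": "cowrie-eu-west", "indicator": "203.0.113.5"},
    {"feed": "cowrie-us-east", "indicator": "198.51.100.9"},
]

print(correlate(events))  # only 203.0.113.5 was seen by two distinct feeds
```

An indicator hitting honeypots in several unrelated networks at roughly the same time is a much stronger signal than the same hit count on a single box, which is the whole point of subscribing to more than your own feed.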
I’ve heard a number of arguments over the years downplaying the value of this kind of scale:
You can't trust data from users you don't know
What if state actors are poorly influencing the network?
I need full control of the honeynet end to end to be able to trust it
You can't capture value unless you have known honeypots in different networks
Most of these arguments come from well-respected security operators who oftentimes don't appreciate the value of statistics and scale. While most of these concerns are completely valid, they ignore what you can do with scale. They ignore the value you capture from simply having the data and then accurately accounting for these problems.
It may not be prudent to take the raw data from every feed and block on it, especially if you operate a large ISP. However, comparing and contrasting the data between known, trusted feeds and unknown feeds will absolutely provide you value if you model it properly. This lets you gain valuable insights from a much larger and more diverse honeynet than you could ever operate on your own.
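One way to "model it properly" is to weight each feed by how much you trust it and only act when enough weighted evidence accumulates. The weights, threshold, and feed names below are illustrative assumptions, not CSIRTG features:

```python
# Hedged sketch: trust-weighted scoring of an indicator across feeds.
# All weights and the blocking threshold are made-up tuning knobs.
FEED_TRUST = {
    "my-honeynet": 1.0,    # a feed I operate end to end
    "known-peer": 0.7,     # operated by someone I know and trust
    "unknown-user": 0.2,   # anonymous contributor, low individual weight
}

def score(reporting_feeds, trust=FEED_TRUST, default=0.1):
    """Sum the trust weights of every feed reporting an indicator."""
    return sum(trust.get(feed, default) for feed in reporting_feeds)

# One trusted hit plus several low-trust corroborations can cross the bar
# that no single unknown feed could cross on its own.
feeds = ["my-honeynet", "unknown-user", "unknown-user-2"]
s = score(feeds)
print(f"score={s:.1f}", "block" if s >= 1.0 else "watch")
```

The design choice here is that unknown feeds never carry enough weight individually to trigger a block, but in aggregate they still add signal, which is exactly where the scale of a shared network pays off.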
This scale, in and of itself, when leveraged properly not only reduces the cost of the data, it SIGNIFICANTLY improves the value of the data because of its diversity. Not only are you not responsible for maintaining the network, the value is in the sum of its parts. As new users churn through the system, the resulting data flow becomes ever more diverse, and with that diversity comes more value: something you'd never be able to obtain on your own as the network scales.