An IPv6 Regex is Slow, Here's a Better Way

For the last few years, I've found myself developing the same types of functions over and over again: functions that manipulate an indicator, functions that communicate over a network, functions that manipulate the network itself. Over time, many of these functions get copied, pasted and augmented through the life cycle of their respective applications, if only because the larger "what is this?" pattern hasn't presented itself yet.

In my post about "the indicator as a platform", I began to describe the "ah-ha!" moment when that higher level (err lower level?) pattern emerges. When this happens, abstracting away all the common code into a package (eg: csirtg-indicator) is easy. It turns out Python has a lot of useful, lower level libraries when it comes to network "stuff". What it lacks are well known higher level libraries that abstract away that low level network "stuff" and answer questions like "What's my default gateway?", "What's my default IP address?" or "Is this an IP address? Is it an IPv6 address?".

These are non-trivial questions, and if you search for "IPv6 regex" you'll likely find more than one answer. But did you know that, instead of regexing all the different ways to write an IPv6 address, you could simply ask your platform? "Hey, this is a v6 address, right?" and it'll either raise an error or it won't. As it turns out, this is far more future proof, simpler and FASTER than a regex when you're trying to churn through billions of indicators every day. Instead of stepping through a pattern in the interpreter, you're going straight to the source and handing the string to the platform's own address parser: "hey platform, is this a well-formed address?".
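As a rough illustration of "asking the platform", here is a minimal sketch built on Python's standard socket.inet_pton call. The function name is_ipv6 is made up for this example and is not necessarily how csirtg-network does it. It also assumes your Python build has IPv6 support (socket.AF_INET6 is defined), which is true on most modern systems.

```python
import socket


def is_ipv6(value):
    """Return True if the platform's parser accepts value as an IPv6 address.

    Hypothetical helper for illustration only.
    """
    try:
        # inet_pton converts the text form to packed bytes and raises
        # OSError when the string is not a valid address for the family.
        socket.inet_pton(socket.AF_INET6, value)
    except (OSError, ValueError, TypeError):
        # ValueError covers embedded null characters,
        # TypeError covers non-string input such as None.
        return False
    return True


if __name__ == "__main__":
    for candidate in ("2001:db8::1", "::1", "192.168.1.1", "not-an-address"):
        print(candidate, is_ipv6(candidate))
```

There is no pattern to maintain here: whatever the platform considers a valid textual IPv6 address is what passes. If you would rather stay in pure Python, the standard library's ipaddress.IPv6Address raises ValueError on bad input and can be wrapped the same way.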

Over the years I've found that many people like to prototype by writing their own code first. I do this myself all the time, and it is a great way to solve problems. However, that prototyped code eventually becomes production code and all those functions get lost in the app. One day all hell breaks loose and you end up blocking netflix.com (true story), all because one of your super complicated (it worked at the time!) regexes is off by a digit. You didn't take the time to abstract out the generic stuff, write some tests and try to get more eyeballs on the problem, competitors or otherwise.

Often, it's simply a matter of discovery. You don't know what you don't know, and it's hard to find rock solid libraries that just get the job done. You're busy, you have deadlines, and it often feels better to solve the problem in front of you than to research what's already out there. Or maybe you've found the library, but it doesn't quite do what you want, and the repository owners are complete boneheads who never merge pull requests, over-critique your contribution and have an outstanding queue that reaches into the hundreds.

Do some research before you try re-writing a piece of code

Chances are good someone has written SOME of the functionality you need. Augment that; there may be things in there you haven't thought of yet. Try to contribute back. If they are boneheads about it, fork and improve, but keep an eye on the upstream, because they might solve problems you just haven't reached yet. Spend some time researching what's already been solved. It may cost you some hours up front, but it may save you months in the long run.

Open vs Closed

There's a lot to be said for both proprietary code and open-source code. One isn't better than the other, and they have different use-cases. For lower level libraries (eg: common functions), it's always better to have more eyes on the problem. This will offset both your development costs and your QA costs in the long run. Your code will attract other like minded pioneers, maybe even a competitor or two. Those people were never going to pay you for your work, but you can take their lessons learned and sell them to your customers. Be vigilant about this: learning how and where to strike that balance will keep you far ahead of your competitors.

Explain Context

Write about your lessons learned. In security operations, we're often far too busy to sit down and explain WHY we wrote what we wrote or how to use it. A lot of published libraries have READMEs and some API docs, but many of them lack context. A lot of times I find myself looking at a simple set of functions and wondering "how did you get here?". Sometimes, that's the difference between leveraging something and not. If I can better understand what the library author was thinking, I might be more willing to invest in their vision of the problem rather than my own, possibly more flawed, view.

I see a lot of SOC code [poorly] written because of these things. In the short run, things work OK. In the long run these great ideas, "written in a bubble", usually become harder to scale and maintain. One set of functions leads to another, and five years later it becomes a monolith only the original author understands. Worse yet, 100 other SOCs have their own version of the exact same functions (eg: is this an IPv6 address?). Instead of having the resources to invest in fighting the bad guys, they're spending the bulk of their resources trying to adapt and evolve their poorly written tools.

Competing tools and ideas aren't a bad thing. In fact, competing ideas are a GREAT thing; they just come at a cost that should be identified early on. Similar to the indicator library I published as a common way to manipulate IOCs, I'm pushing a lot of the common, lower level network logic found in CIF, as well as in other network based tools I've built, into a common library called csirtg-network.

Learn how to outrun the competition, even when you give away some of your ideas for free.


