It's 2018 and you're STILL writing SQL? Why?

The security tools you used early in your career were mysql or postgres based. You're used to manually hammering out SQL statements to see what the data looked like, maybe even a custom update or two? You're earlier code automated some of these processes and the next logical step involved connecting to a database and processing the SQL statements.

Not a lot of thought went into this prototype- it was just that, a simple prototype. Something to help you with your daily security operations. But, one simple SQL statement becomes two and two soon becomes twenty. Pretty soon that well intentioned script is littered with custom SQL. Much like our previous post in automation, things become brittle, confusing and hard to adapt to new challenges.

You start asking yourself questions like:

If I trigger this statement, what happens?
How do I recreate the schema so I can test it?
What changes to the schema did I make last month?

There are a few problems here, the first being treating your data like objects and classes instead of raw SQL. The second being version control against both the objects and the schema. As you write more SQL your application becomes larger and complex. The more you have to keep track of, the better the chances of introducing a bug, or worse- destroying your data when you run the changes. You become paralyzed, your application suffers and so does your operations center.

If you're still writing SQL in your application, chances are you're not the type of person who's testing the application very well either. Think about why you're using SQL in the first place, what are you doing with the results? You're taking the list of records, transforming them into a list of dicts and treating them more or less like the objects. The only value your custom statements add is complexity, and complexity comes at a massive cost.

This is where Object Relational Mapping comes in handy, or "ORM". ORM is the design pattern that- when used properly maps your data schema to your objects in your programming language. It effectively abstracts away the SQL from your application and leaves you with a set of standardized interfaces to the data. This not only means cleaner interactions with the data- it can also now be more easily tested [with automatically generated data]. Love or hate the term 'ORM', if "being a developer" is not your full time job, you should be using them until you understand WHY you shouldn't.

SQLAlchemy is one of the more popular Python based ORMs, but there are others. This doesn't mean you never write SQL, it just means 99% of the time, if you try to, there's usually a better way to do it. Often times; by using pre-existing ORMs you get to leverage lessons learned (performance, etc) from others too. It's like having a second set of really smart programmers helping you build your application. If you find yourself trying to be too clever, you're probably doing something wrong.

We've been heavy users of SQLAlchemy in both CIF v3 and v4. It's certainly less than perfect, but if you read the search and upsert functions, they're outstanding examples of how to both leverage the abstractive power of an ORM with custom logic that can make things fast. The upsert function is incredibly hard to read- but stupid fast. The search functions are incredibly generic, mostly efficient but easy to read. Pretty soon we'll be able to take some of these upsert lessons learned, push them down the stack a bit (into an SQLAlchemy plugin?), making them easier to read, use and leverage- even if your own custom applications.

Alembic

The second part of this problem is tracking your database migrations. Add a table and want to track that add as a diff? Tools like Alembic may be worth considering. These tools enable you to treat changes to your schema like git commits. Want to roll back from a change? Want to treat schema changes like pull requests? Want to branch changes and test in a sandbox without having to manually make all those CREATE/DROP statements?

Traditional web frameworks like Rails have had this sort of thing baked in for a long time. That's not to suggest other frameworks like Alembic are inferior, just that some of the non web based frameworks have a little less traction. This makes the doc (and the tools) a little less user friendly and examples just a little bit harder to find. For example- https://csirtg.io being a Ruby on Rails application, I use database migrations LITERALLY ALL THE TIME. In my Python applications (CIF, etc).. I have yet to introduce the concept.

Actually, the reason I'm even writing about Alembic is to force myself to use it. While I've used SQLAlchemy pretty heavily in the past two versions of CIF things like Alembic hasn't yet been introduced. I hope to change that in the future..

(Unless someone sends me a pull-request first…)

Combat Automation in OPSEC - How to Stop Writing SQL

Alembic

Did you learn something new?

Related stories

The Cutting Edge of eCrime Research is...

Threat Feeds Based on Architecture- Not Indicators

Finding The Needle in the Needle Stack using 6th Grade Math