In what I'll call the first and second precursors to this post, I talked a bit about productionized deployment of CIFv3 and showed how anyone can deploy a new CIF instance in ~10 minutes or less. That said, I effectively glossed over the part where we test each release build. I'm not talking about the simple QA pipeline stuff, but how we test the framework end to end and make sure all the parts are doing what they're supposed to be doing.
What does that mean? It means that even with great test coverage over individual functions, are we testing for the odd situations that crop up when we pass all sorts of odd data through? Functions are simple, platforms are complex, and while it's trivial to develop many different function tests, it's harder to test a system end to end before cutting each release. It's even harder if you're trying to test an open-source platform across different environments. For example, your tests may pass and the platform may appear functional on Ubuntu 16.04 LTS, but it might fail on RHEL because RHEL ships a different version of Python.
This doesn't even get into more modern environments like Amazon Linux and Docker, where your platform may simply perform differently under different circumstances. The simple answer to this is just: "well, stick to the platforms your customers use, e.g. the ones that pay the bills." Short term, listening to your market is important, but so is future scale. If you orient your build tests around a specific set of customers, you might alienate an entire audience of new customers in the process. As with most things in life, there's a balance, and building open source seems to help illuminate where the edges really are.
Back in the day
In the early days of CIFv0, v1 and v2 we did everything manually, even the install (well, v2 had a crude easy-button, but that was it). That meant, when we were ready to release, we:
Spun up a VMware VM of Debian (even though we were developing on OSX)
Manually ticked all the boxes and ran through a script (not a bash script, a poorly written "oh this got us last time, does that search still work?" text file).
Spent the better part of an hour trying to remember what things we tested last time
Oh crud, we found something broken
Lather, Rinse… Repeat.
This process took… a few hours. If something didn't work quite right, we had to fix it… and start over.
Releases took months, not minutes, which meant that if you gave us feedback (or a bug report), it took us MONTHS to get it fixed. If the fix was wrong, well, it's free software, what do you want?
The easy solution to any poorly written 'script' is to WRITE A [SHELL] SCRIPT! Something you can easily run against a freshly deployed system (there's a rough sketch of one just after this list). This enforces a few good things:
You run the same end-to-end platform tests the same way, every time, across each architecture
It gives you a "cheap" way to toss more "gotchas" on the pile
It's accessible to traditional non-programmers (e.g. sysadmins, security engineers, etc.)
It's easy to understand and portable
When the script gets too big, it helps you identify which tests are important, group them and push them down the stack to the traditional QA pipeline (e.g. function tests, CI, etc.)
It gives you peace of mind that, if at least all of these things pass, you can release more often.
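To make that concrete, here's the flavor of script I'm talking about. It's a minimal sketch rather than the real thing, and the cif / cif-tokens invocations are what I'd reach for on a CIFv3 box; treat the exact flags as assumptions and double-check them against the client version you actually ship:

```bash
#!/bin/bash
#
# smoke.sh -- a minimal end-to-end smoke test, run against a freshly
# deployed instance. The cif/cif-tokens flags below are illustrative;
# verify them against your own client before leaning on this.
set -e

echo '[*] is the router answering at all?'
cif --ping

echo '[*] can we mint a token for a test user?'
cif-tokens --create --user smoke-test

echo '[*] does a basic search return without blowing up?'
cif --search example.com

echo '[*] are the feeds populating with *something*?'
test -n "$(cif --itype ipv4 --tags phishing)"

echo '[OK] end-to-end smoke test passed'
```

The point isn't the specific flags; it's that every "oh, this got us last time" moment becomes one more line in this file instead of a note in a text document.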
There are a few modern ways to go about this, usually with Docker or Vagrant. While I'm not going to get into which is better for you, lately I've started building for Docker when I'm in prototype mode. When I hit late-stage alpha and beta builds, I'm more apt to start building out a simple Vagrantfile. This helps me triangulate what kinds of things I'm going to need in the final stages when I start Ansiblizing what has become 'the easy button'.
If you try to accommodate too many platforms (or builds) too early, you'll spend more time trying to maintain them across releases and less time solving the real problems. In the earlier stages, pick one release type and stick with it. Documentation helps too (so does a good FAQ); it engages people and sets expectations appropriately. The focus should be on releasing prototypes [and writing about them], not on whether your release works on "RHEL v5". If your prototypes are good, you'll build excitement and engagement, and either someone will want to pay you to support RHEL or they'll contribute the support back.
Whether you use Docker or Vagrant is really up to you. Lately I've used Docker to prototype, because the simplicity of "docker run and magic happens" has slightly less friction than fiddling around with a Vagrantfile. That said, when I'm trying to debug something, fix a serious issue with a platform and/or test a platform end to end, I almost always use some pretty interesting Vagrantfile setups to test multiple distros. A Vagrant VM is about as close to bare metal (or a plain VM) as you're going to get, which is probably what your target audience is using in production these days anyway.
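For reference, the prototype loop looks roughly like this. The image name and port are placeholders for whatever your own build produces, not a published image:

```bash
# Prototype mode: one throwaway container, poke at it, throw it away.
# "yourorg/cif" and port 5000 are placeholders, not a published image.
docker run -d --name cif-proto -p 5000:5000 yourorg/cif:latest
docker exec cif-proto cif --ping   # poke the running instance
docker rm -f cif-proto             # toss it when you're done
```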
Assuming each pull request is tested in some form of CI, by the time you get to a release you'll have some simple helper scripts that configure your Vagrantfile, boot the correct platform and execute your test shell script (something like the sketch below). That both deploys the correct system services and starts throwing data at them to see if they respond correctly. Are the Hunters resolving the URLs? Are IP addresses getting geo-tagged appropriately? Am I able to create tokens? Does the VM run out of memory during these tests?
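Here's a rough sketch of that kind of helper, assuming a multi-machine Vagrantfile that defines boxes named ubuntu1604 and centos7 and syncs the repo to /vagrant (Vagrant's default shared folder):

```bash
#!/bin/bash
#
# release-test.sh -- boot each target distro, run the end-to-end smoke
# test inside it, then tear it back down. The box names and the smoke.sh
# path are assumptions about your own Vagrantfile and repo layout.
set -e

for box in ubuntu1604 centos7; do
    echo "[*] testing release on ${box}"
    vagrant up "${box}"
    vagrant ssh "${box}" -c 'sudo bash /vagrant/smoke.sh'
    vagrant destroy -f "${box}"
done

echo '[OK] all platforms passed'
```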
Next Steps
Today I still eyeball the output of the tests, and I sort of know what to look for. The next obvious step is to programmatically check the output of each test (e.g. was there output? was the output what I expected?). This is a bit trickier with live data, given that the process is actually pulling a LIVE openphish feed with effectively random data (e.g. we can't test for specific values, but we can test that data was inserted and that it has a similar look and feel to what we expect). Something simple would start by writing the output to a file and making sure the file size is greater than 0 bytes.
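The simplest version of that check is only a few lines of shell. The search terms and the 'indicator' field here are illustrative, but the shape is the point:

```bash
# Capture the output, make sure we actually got bytes back, then look for
# a field we'd expect to see in real CIF data.
cif --itype url --tags phishing > /tmp/feed-test.out

test -s /tmp/feed-test.out || { echo 'FAIL: no output'; exit 1; }
grep -q 'indicator' /tmp/feed-test.out || { echo 'FAIL: output does not look like indicator data'; exit 1; }
echo 'OK: feed returned data that looks roughly right'
```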
Why should you invest in these ideas? The more often you're able to release something, with confidence, the more engagement you'll get. The more engagement you get, the more customers you'll be able to target. If you're able to teach them these types of ideas, they'll run faster WITH you rather than creating legacy technical debt.
If you don't, your competition already is.