unModified()

Break stuff. Now.

Explosive Workflows

When people don't fully understand what's going on

May 8, 2016

This week was... crazy, I believe, is the right word for it. I was dropped into two projects, both of which are in the middle of their sprints. Taking out a day to set up for both left me with just a few days to work on them. But that's cool because we have project managers who can move worlds and make room for stuff. One thing they can't do, however, is deal with problems related to development workflow. It's something only we developers can do. Here are a few that keep biting back and may need to be dealt with when I get back to work.

Explosive partial deploys

We use the deployment module strategy in Drupal to run developer-orchestrated operations during a deploy. In a nutshell, the procedure uses a dummy module's update hook to run operations under the guise of a module update.

Let's say you need to introduce a content type called Blog Post to the system. You create the content type, export its configuration as a feature, and enable the feature via a new update hook on the deployment module. The CI takes care of the deployment, and off the code goes to the QA environment.

Update | DB state
-------+---------
xxx0   | SoR state # state before update
xxx1   | Blog Post # state after update

Now let's say QA found something missing, saying the Blog Post content type needs to have a field for a summary. Simple enough. You just need to add the field, re-export the updated configuration and introduce it in another update. It gets deployed, the QA environment is updated and we live happily ever after. Right? Right???

Update | DB state
-------+---------
xxx0   | SoR state
xxx1   | Blog Post              # state before update
xxx2   | Blog Post with summary # state after update

Quick recap: We now have code with updates xxx1 and xxx2 waiting to be deployed. But there's a problem here. Update xxx2 was written under the assumption that the target environment, like QA, has already run update xxx1. The source of record, however, is still at xxx0. If the code were deployed to the source of record, update xxx1 would execute against the latest exported configuration, introducing the Blog Post content type with the summary field already bolted on. This makes update xxx2 erroneous, causing the deploy to blow up in spectacular ways.

# Deploy to QA
Update | DB state
-------+---------
xxx0   | SoR state
xxx1   | Blog Post              # state before update
xxx2   | Blog Post with summary # state after update

# Deploy to source of record
Update | DB state
-------+---------
xxx0   | SoR state              # state before update
xxx1   | Blog Post with summary # Introduces Blog Post with all fields
xxx2   | Exploded               # Erroneous

To get around this, you should always sync your environment's state with the source of record before testing, to ensure the updates work on a real deploy. It's worth noting, however, that this procedure is actually a huge hack. Update hooks are meant to update modules, not entire systems. We're essentially using something outside of its intended design. While it does work, it's something to keep in mind.
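To make the failure concrete, here is a minimal Python sketch of the scenario above. It is a toy model, not Drupal code: the names `update_1`, `update_2`, `deploy`, the `title` and `body` fields, and the rule that the second update fails when the summary field already exists are all assumptions made for illustration.

```python
class DeployError(Exception):
    """Raised when an update cannot be applied to the current DB state."""


# What the exported feature in the codebase describes at each commit.
# The field names other than "summary" are hypothetical.
FEATURE_V1 = {"blog_post": ["title", "body"]}
FEATURE_V2 = {"blog_post": ["title", "body", "summary"]}


def update_1(db, feature):
    """Enable the feature: import whatever the code exports right now."""
    for content_type, fields in feature.items():
        db[content_type] = list(fields)


def update_2(db, feature):
    """Add the summary field, assuming update_1 created Blog Post without it."""
    if "summary" in db["blog_post"]:
        raise DeployError("field 'summary' already exists on blog_post")
    db["blog_post"].append("summary")


def deploy(db, schema_version, updates, feature):
    """Run pending updates in order, like numbered update hooks."""
    for number, update in enumerate(updates, start=1):
        if number > schema_version:
            update(db, feature)
            schema_version = number
    return schema_version


# QA receives the updates one deploy at a time, so each one sees the
# state it was written for.
qa_db = {}
qa_version = deploy(qa_db, 0, [update_1], FEATURE_V1)
qa_version = deploy(qa_db, qa_version, [update_1, update_2], FEATURE_V2)

# The source of record receives both updates at once, with the latest export.
sor_db = {}
try:
    deploy(sor_db, 0, [update_1, update_2], FEATURE_V2)
    sor_result = "ok"
except DeployError as error:
    sor_result = f"exploded: {error}"
```

In this model the QA database ends up at version 2 with the summary field, while the source of record fails inside `update_2` because `update_1` already imported the newer export.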

Explosive identical setups

The thing I really like about functional programming is the concept of stateless, side-effect-free systems. Given the same input run through a side-effect-free system, you should always get the same output. I'm pretty sure any system can be modeled this way to some extent, be it code, math, electronics or even infrastructure. That means we can build deployment workflows that produce the same output all the time given the same code, configuration, data and deploy instructions.

However, one thing that perplexes me is that someone will always cry out saying their deploy or sync broke in a certain environment. But how could that be? We all have the same setup. We all have the same code. We all have the same configuration. We all have the same data. We all have the same deploy procedures. How can yours be different from ours? How did you do that? It's so mind-boggling. In the end, after a few hours of debugging, we found out that something was different after all: the update started off with different data, the scenario mentioned earlier.
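If identical inputs should give identical output, then comparing the inputs is a cheap first check before hours of debugging. The sketch below is only an illustration: `fingerprint`, `differing_inputs`, the revision string and the sample values are all made up, and a real setup would have to decide what counts as "data".

```python
import hashlib
import json


def fingerprint(code_revision, config, data):
    """Hash every deploy input into one short, comparable string."""
    payload = json.dumps(
        {"code": code_revision, "config": config, "data": data},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def differing_inputs(mine, theirs):
    """Return the names of the inputs that differ between two environments."""
    return sorted(key for key in mine if mine[key] != theirs.get(key))


# Hypothetical inputs for two environments that "have the same setup".
mine = {
    "code_revision": "abc123",
    "config": {"content_types": ["blog_post"]},
    "data": {"schema_version": 0},
}
theirs = {
    "code_revision": "abc123",
    "config": {"content_types": ["blog_post"]},
    "data": {"schema_version": 1},
}

same_setup = fingerprint(**mine) == fingerprint(**theirs)
culprits = differing_inputs(mine, theirs)  # ['data']
```

Here the fingerprints differ and `culprits` points at the data, which is the same kind of drift that took us hours to find.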

This is why I'm advocating bringing containerized setups into the team's workflow. The idea is to work on the same setup across environments, across different machines, and across different platforms. This way, code, config, data and infrastructure will all be the same. Anything that breaks can be blamed on a defective keyboard controller. It may be harsh, but at least it focuses debugging efforts on user errors rather than setup-related issues.

Explosive tasks

Drupal is great. It lets you build websites without touching any code at all. To put this into perspective, I set up a small website for my parents where they can track the books they've read. Setting up the host took a weekend, but setting up the site and importing existing data from an older version of the site took just a Saturday night and a few cans of margarita.

However, the same can't be said when you're working with Drupal in large projects, where you have to version-control changes, follow deployment procedures, work with multiple developers and follow really tight deadlines. The workflow becomes necessarily complex to deal with the situation, but it also consumes enormous chunks of time - something we don't have a lot of.

Again, to put this into perspective, let's ask this management question: How long does it take to create a page that displays a list of things? This is how I would go about doing it:

  1. Know what the content contains.
  2. Create the content type in the admin interface.
  3. Export the content type configuration so that it's versioned in code.
  4. Create and configure the view that displays the content type.
  5. Export the view configuration so that it's also versioned in code.
  6. Create a deploy update to import the configuration.
  7. Re-sync the development environment with production.
  8. Import the exported configurations.
  9. Check if everything was configured well.

That's a long list of things to do. If all goes well, you stop at #9. But that's rarely the case. You'll probably forget a piece of configuration, or some code from upstream will conflict with your changes, or someone from the business will update the requirements. You'll probably be redoing at least half, if not all, of this list, which can take hours.
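A rough way to picture the rework, as a small Python sketch. It assumes that a change at one step invalidates every step after it, which is a simplification of mine rather than a rule of the workflow; `STEPS` and `steps_to_redo` are illustrative names.

```python
# The nine steps from the list above, in order.
STEPS = [
    "Know what the content contains",
    "Create the content type in the admin interface",
    "Export the content type configuration",
    "Create and configure the view",
    "Export the view configuration",
    "Create a deploy update to import the configuration",
    "Re-sync the development environment with production",
    "Import the exported configurations",
    "Check if everything was configured well",
]


def steps_to_redo(first_affected_step):
    """Return every step from the affected one onward (1-based numbering)."""
    if not 1 <= first_affected_step <= len(STEPS):
        raise ValueError(f"step must be between 1 and {len(STEPS)}")
    return STEPS[first_affected_step - 1:]


# A forgotten view setting sends you back to step 4.
forgot_view_setting = steps_to_redo(4)

# A changed requirement sends you back to step 1.
requirements_changed = steps_to_redo(1)
```

Under that assumption a forgotten view setting costs six of the nine steps, and a requirements change costs all nine.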

Conclusion

Tight timelines and scarce resources are things developers cannot do much about. Everything else, especially painful development workflows, is something developers can deal with and should address ASAP. Nothing can cripple a developer more than a setup that just explodes unexpectedly during crunch time. Don't let that happen. It's time for a change. If Time Warner can change, then so can you.