How We XP: Develop & Deploy

1 November 2015

Originally written (and posted to Medium) for a friend who while around software for a while, and seen some agile teams, he was just starting to manage one directly. So I wrote up this bit of a braindump.

–dwf


I’m not going to go into planning. That’s another post.

I’m going to assume that stories — the work that is to be done — are broken into small, individually-acceptable pieces of work. “Acceptable” means that a Product Manager (PM) can verify that developers did the agreed work completely. Keeping the stories small means it is easier to make progress, show progress, and validate progress.

I’m also assuming that the team is using git for version control, the developers are test-driving their code (Test-driven Development or TDD), the current test suite is green on a Continuous Integration system (CI), and that we are talking about a web app.

Development

  • All new development is done locally
  • Developers pull up to latest code and run the test suite. It should be green. If not, fix it and push back to origin before continuing.
  • Developers test drive the new feature.
  • When developers think the story is done based on the acceptance criteria agreed-upon with the PM, all tests should be green and the new functionality should be working as expected.
  • This is a great time for developers to find refactoring or dead code and clean those up as well. Tests should stay green.
  • Developers now pull up to latest code from origin in case another part of the team has pushed new code. If there are changes, merge and run tests again. If there are broken tests, fix them before continuing.
  • Optional: some teams like feature branches. They should be short-lived and merged/re-based with master often. The longer your changes stay away from master, the more tech debt you create. Merge early, merge often. Git-flow is a pretty good pattern here. = Once all tests are green locally and all changes are ahead of origin master, push to origin.

Continuous Integration (CI)

  • Your git repo should have a post-commit hook that starts a CI build. On every push to master, no exceptions.
  • This is a full test run on a server that looks as close to your Production deploy configuration as possible.
  • This can be a long pipeline. Run unit tests, run integrations and functional tests and regression tests - whatever. Often your pipeline will have later stages that integrate multiple apps/services and cross-browser tests. Make sure you are getting feedback at all stages of the pipeline.
  • When CI is green all the way, you should have a CI step that deploys to Staging. This should happen several times per day. And if it’s automated, even better.

Staging (aka Acceptance)

  • Staging is a deployment (or environment) that looks exactly like Production, with the same versions of OS, database, etc.
  • Staging doesn’t need to serve the same capacity — so maybe just a primary database (no replication or secondaries ). Since there will be fewer active users, you probably only need one app server.
  • Optional: If your system has feature that needs acceptance that requires master/slave DB or app server failover, then by all means install/configure Staging to match that configuration.
  • Everyone should know when a Staging deploy is done and what new work it contains. Notify the team via email or an information radiator.
  • PM should verify work is done as expected on Staging. Validating that work is complete is PM’s role, Not QA.
  • If work is not satisfactory, Reject the work. Rejected work should be started before any new work is started.
  • If something is broken that used to work, PM writes a bug. PM should prioritize bugs against other work. That is, a given bug may not be as important to fix as some new feature stories.
  • QA should be looking at Staging to find bugs, problems, regressions, or any inconsistencies. They should be using the product as a user would and should be looking for behavior that engineering and product didn’t find. This is often called exploratory QA. You should have your QA lead read the book on Exploratory QA, Explore It! written by my good friend and colleague, Elisabeth Hendrickson

Production

  • Production is where the shipping product lives.
  • When all stories on Staging have been accepted, PM should consider pushing that version of the code to Production.
  • If users will have a bad experience due to known bugs or an incomplete user experience, then don’t push to Production.
  • Push to Production has often as possible.

Other Deploys

This list is by no means exhaustive. But small-to-medium-sized projects tend to have one or more of these deploys.

Patch Staging

A copy of Staging, meant as a short-lived deploy just to verify a hot-fix. Using a second Staging means mainline work doesn’t slow down while a patch is developed and shipped. Same rules as Staging apply, but tear this deploy down as soon as the fix is deployed to Production.

Demo

A copy of Staging, meant as a short-lived deploy for specific set of users for a short amount of time. Like a PatchGood use cases for a Demo deploy include demoing to potential investors and user research. Tear this deploy down as soon as the demo is over or when feedback is no longer required.

Integration/User Acceptance Test

A copy of Staging that can be used for more real-world testing. This usually stays in line with the normal Staging and is used for CI pipelines. Some examples are performance testing, smoke testing, and long user scenario testing.

Not Scary

Continuous Delivery

What’s described above is Continuous Delivery; insist that code always move in one direction: towards Production. Also, set up the whole team so that it’s easier to move towards Production and release often.

Production Patches

Here’s how you should deal with hot-fixes:

  • Developers branch from the last Production deploy version.
  • Developers fix the bug.
  • CI runs against this version of the code to ensure no regressions.
  • Developers deploy this version to a temporary Staging environment.
  • PM accepts the fix.
  • Developers push this branch to Production.
  • Developers merge the fix back to master.

Sometimes Scary

Continuous Deployment

Some teams are able to push every green build to Production. I’m not going to cover this in this post. But there is additional process and code needed to support this.

Scary

These are the things you should truly worry about.

Long duration test suites

When your test suite or CI pipeline is too long, you run the risk of completing work faster than you can verify it. If you keep locally-run suites fast - say, under 10–20 minutes - then your developers can get features to staging multiple times a day. Long CI pipelines are less awful. For example, a web app can have a multi-stage pipeline that puts an all-browser compatibility suite near the end. Main development can continue based on success of unit tests, collaboration tests, and one browser stage of tests. Then the multi-browser suite can run once or twice a day, and browser-based problems from that stage can be bugs that get prioritized against other work.

Long-lived branches

Like long test suites, this can keep your feedback loops long. In this case you have bigger risks around the rest of your app changing faster than the branch’s work can keep up. If it’s taking too long to merge something back to master, then consider that the work should be broken up into smaller bits or should be started when its dependencies are more stable.

Complex merges

This usually comes from long-lived branches. Keeping all development on master helps to minimize this.

Manual deployment

…either manual configurations and/or inconsistent deployments from Product & Staging. These are often related. You want Staging, Production, and any other deploys to have consistent configuration. Bugs due to different code interpreters, databases or OS libraries are very subtle and guaranteed time wasters. Invest in DevOps or a Platform-as-a-Service (PaaS) in order to keep your deploys consistent and repeatable.

Breaking the Stream

Code should always move towards Production. It increases the confidence in the code and keeps regressions low. When a fix is made downstream of development, then it’s harder than normal development to integrate that fix back in development.