My journey to DevOps has been long and storied. I’ll stop short of saying it was troubled, but there was definitely trouble along the way. Let me explain…
In my first software development job, circa 1992, there were product managers (who were business focused), development managers (focused on the development process), and implementation managers (IMs), who were focused on the deployment and maintenance of the developed software. They managed schedules, servers and environments, and the deployment of software. After months of development (often six to nine months or longer), IMs managed at least three months of additional testing, rollouts, and sign-offs.
This was not efficient. And although I am describing a situation that took place decades ago, the problems we encountered still persist in many organizations today in various forms.
Software Deployment Problems Before DevOps
Here are some specific examples of the problems we dealt with in my first developer jobs, before moving to DevOps. The problems differ in their specifics, but share a common theme: complex, costly, slow, and error-prone manual deployment processes that excluded developers. Management had made a conscious decision to put up a wall between development and operations staff.
Big Iron Deployments Meant Iron Walls
The first example involved software deployed on PCs and DEC VAX systems (ranging from VAXstations to mainframe-class machines). The tools to deploy to, monitor, and manage these systems were mostly custom, with some DEC-specific tools. On the PC side, custom software configuration management tools were mandated until PVCS and SourceSafe (prior to the “Visual” prefix) were allowed. In all cases, developers were hands-off in terms of software deployments, with employees kept on staff specifically to handle them.
Releases often reached users more than a year after we kicked off a project. By then, everyone was on to the next release, with little time or energy available to focus on customer feedback provided on previous releases, so users felt they were being ignored. Developers became frustrated with processes that were increasingly unnecessary, and architectural changes took years to implement. All of this resulted in long-suffering users, slow uptake and migration to improved technology, and limited business growth.
The second example (at the same company, in 2002) involved Java servlets and JavaServer Pages, all of which are straightforward to deploy. By this time, implementation managers had been phased out, staff with specialized mainframe skills had been reassigned, and a modern datacenter was in place. However, an old-fashioned, conservative IT approach remained. Specialized deployment tools, rigorous processes, and dedicated release staff were still in place, even for simple Java deployments. The staff would get annoyed when I called them, asked questions, and offered to help or provide tools and guidance to make their lives easier. They insisted on being left alone, and would tell me when things were deployed (days later, all for a servlet-based application).
The Glass Wall
At a more recent job, in 2011, things were better, but not ideal. A wall between development and operations staff still existed, but I would call it a glass wall because there was open communication. For example, developers were not allowed to deploy software or “touch” production machines, but we were required to verify success. As a result, we often collaborated through instant messaging tools or by phone during a deployment. This was an improvement, as the collaboration was open and friendly, and the tools used were more standardized.
For instance, we used Git and Jenkins to manage code and builds. But our automated deployments stopped short of production due to corporate policy. The operations staff effectively served as developers’ hands, typing the commands to do the actual deployments. Although the collaboration and tools were improved, a near-100% manual production deployment process remained.
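That policy boundary can be sketched in a few lines. The following is a minimal illustration (not our actual tooling; the stage names and the boundary name are hypothetical) of a pipeline where automation runs every stage up to production and then halts so operations can take over by hand:

```python
# Illustrative sketch of the policy described above: automation executes
# each stage until it reaches the manual boundary, then stops.
# Stage names and the boundary label are hypothetical examples.

def run_pipeline(stages, manual_boundary="deploy-production"):
    """Run stages in order, halting before the manual boundary."""
    completed = []
    for stage in stages:
        if stage == manual_boundary:
            # Corporate policy: a developer dictates the commands,
            # and an operator types them on the production machine.
            return completed, f"halted: '{stage}' requires operations staff"
        completed.append(stage)  # automated stage runs here
    return completed, "fully automated"

stages = ["checkout", "build", "unit-tests", "deploy-staging", "deploy-production"]
done, status = run_pipeline(stages)
print(done)    # every stage short of production
print(status)
```

The point of the sketch is the hard stop: everything before the boundary was scripted and repeatable, while the final, riskiest step stayed manual.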
Stability to the Rescue—Not!
In all cases, the wall between development and IT was a mistake. The reasons it remains are historical: systems used to be shared and more fragile, and a full staff was once required to manage and maintain the Big Iron of the past. I recall IT avoiding reboots of servers and mainframes out of fear they wouldn’t boot back up properly, or at all. Even as hardware, operating systems, and environments have improved and stability has grown, that old, conservative mindset persists.
Today, we’re comfortable shutting down computers, whether they’re in a datacenter or on our laps. In fact, with virtualization, thousands of OS instances may be spun up and down per minute in a moderately sized cloud environment. So surely all of this newfound stability is the reason we have DevOps, right? Not the case.
I recently took part in a project where a DevOps practice was the goal. However, due to a perceived lack of time and resources, it was difficult to put in place. With the pressure of meeting deadlines, automating the mundane tasks was always postponed. “I’ll just do it manually this one time and automate it for the next release,” we would tell ourselves. Unfortunately, that time never comes, and the processes tend to stay manual. The IT systems and our software were fine; it was stability that suffered, due to rushed, error-prone manual processes.
The DevOps rescue began when we moved to Git and cloud services. We’ve added continuous integration and deployment pipelines that automatically keep our target systems up-to-date with the same set of software. These deployments are triggered as part of an orchestrated workflow that we’ve defined. So although they’re automated, we remain in control of when and where our software gets delivered. We continuously create additional sets of automated tests that are part of the CI/CD pipeline and let us verify as much as possible without human intervention. The only time someone needs to get involved manually is due to an issue or failure (reported automatically), or when a customer signoff is needed.
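The verification step described above can be sketched simply. The following is a hedged illustration (the check names and the notifier are stand-ins, not our actual pipeline code) of running automated post-deploy checks and reporting failures automatically, so a human is only pulled in when something breaks:

```python
# Illustrative sketch of an automated verification gate: run each smoke
# check after a deploy and report failures via a notifier. The checks
# and notifier here are hypothetical examples.

def verify_deployment(checks, notify):
    """Run each (name, check_fn) pair; report failures via notify()."""
    failures = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:      # a crashed check counts as a failure
            ok = False
            notify(f"{name} raised {exc!r}")
        if not ok:
            failures.append(name)
            notify(f"{name} failed")  # automatic failure report
    return failures                   # empty list means no human needed

# Example: two passing checks and one failing one.
alerts = []
checks = [
    ("health-endpoint", lambda: True),
    ("schema-version", lambda: True),
    ("login-smoke-test", lambda: False),
]
failed = verify_deployment(checks, alerts.append)
print(failed)   # ['login-smoke-test']
```

In a real pipeline the checks would hit live endpoints and the notifier would post to a channel or ticketing system, but the shape is the same: automation verifies, and people only intervene on failure or for sign-off.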
Containers have helped us experiment more, and using the cloud to host them has increased our agility. We’ve started taking steps to integrate Slack into our workflow, which is effectively the start of a ChatOps implementation. With tools and solutions available that integrate into our existing pipelines, we anticipate completing this effort sooner, and with less effort, than a typical sprint. Although our sprint development velocity has remained the same, the increase in stability (and the reduction in rework that comes with it) means our overall velocity has improved, and continues to improve.
DevOps Delivers Stability
It doesn’t matter that our schools, computer hardware, operating environments, development tools, and even our languages and platforms have delivered increased stability. Software stability comes more from process and practice than design.
Practicing the deployment process continually leads to perfection, by ironing out the kinks and bottlenecks. This also reduces risk, since releases involve smaller amounts of change deployed more often. To many, this is counterintuitive, and is part of what I call “the deployment paradox.” Limiting deployments to once per year due to their perceived complexity and risk only increases their complexity (due to their larger size), and unnecessarily increases risk. To lower risk and complexity, we need to involve more people in the deployment of smaller chunks of new software to more users, more often. We also need to equip ourselves with the right tools and automation to achieve success.