Managing Infrastructure
In the last go around I covered the advent of the public cloud, which is effectively a hosting resource for the web. Before the cloud, developers could pay specialised companies for hosting services. There were a few problems with this, and one of them was reliability at scale. Irrespective of whether a device plays the role of a client or a server, it has resource limits, and if you impose enough stress on a computer system you can find that out first hand.
Large companies took their destiny into their own hands and managed their own hardware and networking infrastructure, an arrangement colloquially referred to as "on-prem". There are a number of reasons to own infrastructure today: data sovereignty (see society), trade secrecy for certain technological projects (a market consideration), and, in some circumstances, economics - on-prem can be more favourable than what an equivalent arrangement in the public cloud can offer.
Infrastructure in a software development context isn't restricted to hardware. Today there is "infrastructure as code", which is used to reliably reproduce equivalent deployments. For many software projects it is quite common to have multi-stage pipelines that build and deploy software from code, and for one or more of these stages to directly reference primitives associated with the project's infrastructure.
To put this in more practical terms, it is common for mobile software projects to utilise build pipelines - a collection of steps where a remote machine (either in the cloud or on-prem) accesses a project's code, performs a sequence of predefined actions that ultimately produces executable software, and then uploads that build to one or more places, e.g. the mobile storefronts. There are many companies that offer such a service, with varying levels of customisation and integrations with other software platforms on offer. Generally you are able to select aspects of the computer system that will perform the work, in terms of both hardware and software. Choosing a provider for continuous deployment, and making decisions within the framework that provider offers, is a subject with a clear connection to previous posts on technology choice and the supply chain.
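To make the shape of such a pipeline concrete, here is a toy sketch in Python. The step names and the in-memory "artifacts" list are invented for illustration; a real pipeline would run tools such as compilers and uploaders as separate processes on a remote machine.

```python
from typing import Callable

def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> None:
    # Run each named step in order; like most pipeline runners,
    # stop at the first failure (an uncaught exception here).
    for name, step in steps:
        print(f"-- {name}")
        step()

# Hypothetical stages for a mobile build: fetch code, build, upload.
artifacts: list[str] = []
run_pipeline([
    ("fetch source", lambda: artifacts.append("project code")),
    ("build",        lambda: artifacts.append("signed app binary")),
    ("upload",       lambda: artifacts.append("storefront submission")),
])
print(artifacts)
```

The point of the sketch is the ordering and the stop-on-failure behaviour, which is what the pipeline products above offer as a managed service.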
But why use an external process at all? Why not perform these steps on the machines that developers already use? In many cases it is possible, and maybe even the right thing to do. One reason comes back to device limitations: build pipelines can be resource-intensive tasks that may run for a very long time, which can inhibit the ability to do anything else with the machine performing the work. Another relates to sources of truth.
The transformation process that takes code and produces executable software involves intermediary files that are used to inform subsequent steps. It is not uncommon for small differences to emerge in these files between different machines, a phenomenon that produces a scenario where two colleagues with seemingly identical setups have different outcomes - one can produce working software and the other cannot, for no discernible technical reason. It is also a relatively common practice for certain pieces of information to be stored in a different place to the code. These could be large assets that are omitted for performance reasons when a developer is building and running a software project locally, or plain text secrets - pieces of information that are deliberately separated from the code for security reasons.
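The secrets point can be made concrete with a small sketch: rather than writing a key into the codebase, the code asks its environment for it at run time. The variable name `STOREFRONT_API_KEY` is made up for illustration.

```python
import os

def load_secret(name: str) -> str:
    # Secrets are injected by the pipeline (or a local shell) as environment
    # variables, so they never need to appear in source control.
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; configure it in the pipeline, not in code")
    return value

# For this demo only, pretend the pipeline has already set the variable.
os.environ.setdefault("STOREFRONT_API_KEY", "demo-value")
print(load_secret("STOREFRONT_API_KEY"))
```

A build that forgets to supply the secret fails loudly at the pipeline stage, rather than shipping with a missing or hard-coded key.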
In the post about versions there was a discussion about how evolving software over time can introduce defects in the realised digital product, but there was no explanation of how versions are managed from a code perspective. If a codebase is a set of text files, how can we examine the files associated with versions 1 and 2 of a unit of software effectively? You might imagine that we copy the original files to a new place that becomes our new working directory. Certainly there are projects where this does happen, but the professional way is to use a piece of project infrastructure that is important to millions of projects around the world - source control.
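At its core, comparing version 1 and version 2 of a text file is a line-by-line diff. A minimal sketch using Python's standard difflib module - the file contents and names here are made up:

```python
import difflib

# Two hypothetical versions of the same source file.
v1 = ["greeting = 'hello'\n", "print(greeting)\n"]
v2 = ["greeting = 'hello, world'\n", "print(greeting)\n"]

# unified_diff produces the familiar -/+ view that source control tools show.
for line in difflib.unified_diff(v1, v2, fromfile="v1/main.py", tofile="v2/main.py"):
    print(line, end="")
```

Only the changed line appears with a `-`/`+` pair; the unchanged line is shown as context. Source control tools build this kind of comparison into every part of their workflow.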
Source control allows developers to efficiently navigate past, present and possible "future" states of a codebase, something that is often of interest during the investigation of issues. Git is the most popular tool for this today, and it is designed to work with text files. Git helps to highlight changes in content so that humans can easily observe them. Many pieces of software have large non-text content (e.g. high definition images or video), and these are generally stored through other means. The alternative is to transmit significantly more information over the network for no reason other than having all assets live in a single place, which does carry a certain attractiveness, but this is outweighed by the reality that a single video file could be more than 100x the size of all other files in a software project combined.
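Under the hood, git identifies every piece of content by a hash of that content, so even a one-character change yields a new version identifier. A sketch of how git computes the identifier for a file (a "blob"), which can be checked against the real `git hash-object` command:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes a blob as SHA-1 over a small header ("blob <size>\0")
    # followed by the raw file content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Same result as: echo 'test content' | git hash-object --stdin
print(git_blob_hash(b"test content\n"))
```

Because the identifier is derived from the content itself, identical files are stored once, and any difference - however small - is immediately detectable.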
This decoupling of components introduces a way for the software that developers experience to diverge from the resulting software downstream that is experienced by others, as there is no guarantee that non-code components are up to date at all times. It is possible to introduce automations (another piece of project infrastructure) to solve this class of problem, but much like the production of code there are trade-offs to consider. If a number of expensive operations run on a regular basis on a development machine, it may impact the ability to perform other tasks (e.g. "the actual job") efficiently. Much like the design and maintenance of the digital factory, the design and maintenance of project infrastructure involves a balance of considerations, and the "right" configuration at any one time may be wrong when evaluated later.
Until next time.