Why does Software Suck? Part II
Sunday, January 10, 2010 at 2:27 pm
Last week I went over some of the reasons why modern-day computers are what they are. Today I plan to go over some reasons why, regardless of what it’s running on, writing correct software is hard – one of the hardest engineering feats out there. Not in terms of requiring lots of intelligence, but diligence.
When I read The Mythical Man-Month a few months back, I was struck by how dead-on accurate it was about the pitfalls of software engineering, even though it was written back in 1975, when the craft of software was so much younger. But here we are, more than thirty years later, and although we’ve built systems up higher and higher on top of yesterday’s systems, and we have the internet and dual core processors and the Playstation 3 and Photoshop, most of Brooks’ critiques are just as valid today as they were then. He begins his book by comparing software engineering to the tar pits of bygone eras, which trapped powerful dinosaurs and sabre-tooth tigers and pulled them under even as they struggled with all their awesome might. If you want to understand software – or even how to manage extremely complex projects – I can’t recommend the book to you strongly enough. What follows are some things I got out of both the book and my own experience, pulled from my often-inaccurate memory.
Here, in a nutshell, are the problems with software engineering:
1. You must be perfect. You cannot be almost-perfect or leave a few things ambiguous. Nothing is ambiguous because everything must become some series of zeros and ones for the processor to run. Every line you write, every bit that is compiled, every flip of some switch deep inside the computer’s memory bank must be perfect. If it isn’t perfect, maybe it’ll work right most of the time. And maybe sometimes it’ll crash terribly and destroy all your data. Human beings are not accustomed to working perfectly and without flaw. In fact, sometimes it is the imperfections – the noticeable paintbrush strokes, the symmetrical dimple, the beauty spot, the awkward laugh – that we find charming. We adapt to imperfections and interpret them. You do not have such wiggle room in programming a machine. A computer does not interpret; it is a dumb machine that does exactly what you tell it to. A slight mistake can have significant consequences.
2. You must be perfect in continually unique tasks. When you are putting up a building, you have simple repetitive tasks that must be done. All must be done well – laying the foundation, constructing support beams, laying bricks – but these are a handful of tasks repeated thousands of times. Laying the second brick is not a different task from laying the first brick; it is just another brick. After several hundred, you get better at it and become a bricklaying expert. There is no such analogue in computer programming. If a programmer finds himself writing the same piece of code twice, he separates that task into its own subroutine, and whenever he needs it done, he makes a call to that one subroutine. This was the whole point of a computer – if you define how to do something once, you don’t have to define it ever again, and the computer will do it over and over again for you. What this means is that when you are making a program, you don’t have the repetition of laying a thousand bricks; once you’ve figured out how to lay a brick, you define the steps needed to lay one brick and then just make a subroutine call every time you find yourself needing to lay one. You don’t see the problem of bricklaying ever again (unless you find out you did it wrong and need to modify it). This means that when a programmer is writing a set of tasks, almost everything is unique. There are generally no repetitious programming tasks which must be done over and over again; everything is approached afresh, defined, and then submitted to some library of common tasks. And each task must be done perfectly.
3. Reading code is much more difficult than writing it. It is very difficult to explain this to someone without the experience of working on a software project. Programming and coding are not easily-visualized disciplines. In fact, there is nothing inherently visual about them at all, regardless of how many flow-charts you may want to make. A programmer goes from a pure (or vague) algorithm in his head straight to a list of concrete instructions. These are not lists like “Pick up milk at the grocery store” – but rather explicit instructions about memory structures and how to process those memory structures. Again, these are not visual and beyond a certain level of complexity cannot be described comprehensively with any two-dimensional visual aid. When it’s all at the front of your mind, and you’re seeing the math of how it works, it’s relatively straightforward to define. However, unless you are extremely strict about writing down why you’re doing everything as you do it, you can go back to these mathematical definitions of how to move memory around and ask yourself, “What on earth did I do?” And if it is hard, a month or two down the line, to interpret what you yourself did, it is far more difficult to interpret what someone else did. And if you are on a large software project, you will have to look at and fix problems in other people’s code. If you fail to interpret precisely what they were trying to do, you are likely to introduce further problems. I assure you there are lines in Windows code whose purpose no one knows any longer. But if you remove them, the product breaks. This is why software projects tend to get larger and larger, and never smaller – no one knows what the “legacy code” does or how to fix it (“legacy code” is what we call old code that nobody understands anymore but that is somehow still necessary).
4. On large-scale projects, you have many external dependencies. It doesn’t sound so bad if you have to rely on someone else to do their job, but remember from 1) and 2) above that all these jobs must be done absolutely perfectly. I promise you, no matter how great a company is, not everyone there will write perfect code. Any given software engineer writes code that other people rely on and he has to rely on code written by other people. Consider Jim, who’s on a team of people writing the task that renders images when you double-click on an image file. Jim has to rely on code written by people working in the file system, code which takes something like a filename and gives him back the series of zeros and ones which he will eventually make into an image. If there’s anything wrong in the file system code, Jim’s code will not work. Jim’s code also relies on the code that makes a window with the little ‘x’ in the corner and file drop-down menu, and if there’s anything wrong there, Jim’s code will not work. And so on for other tasks which determine things like the monitor size, what kind of monitor it is, what the color scheme on the computer is, and so forth. And this is all before Jim even gets down to brass tacks. If those teams have failed, Jim is going to be behind schedule (and quite possibly harassed by upper management for being behind). After that, Jim has to figure out his part of the code – determining what kind of image file it is, then processing it, then displaying it. Once Jim’s written this code, it may be called by other people – the file system folks may in turn re-use his code to display a preview image, or another program may want to show an image in the same way and re-use Jim’s code to do that. And if those people find problems in Jim’s code (or if they try to use it in a way Jim didn’t anticipate), then their code will fail and Jim will have to fix what he did.
Every single one of these literally dozens of dependencies for something as simple as displaying an image on-screen is an opportunity for something to go wrong, for a bug to creep in, or for communication to fail between people and between teams. And if the product ships with any problem left unfound or unfixed, it is left for people who come along later trying to use the product as a start point for a bigger project to discover a work-around for the less-than-perfect product.
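To make Jim's situation concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the function names, the toy "IMG:3x2" file format, and the in-memory FAKE_DISK are my assumptions, not how any real file system or image viewer works. It shows both halves of the problem: Jim's routine is defined once and re-used by someone else (point 2), and it is only as correct as the code it depends on (point 4).

```python
# Hypothetical stand-ins for the teams in the example above.
# The "IMG:<width>x<height>" format is invented for this sketch.
FAKE_DISK = {"cat.img": b"IMG:3x2"}


def read_file(filename):
    """File-system team's code: filename in, raw bytes out."""
    return FAKE_DISK[filename]


def buggy_read_file(filename):
    """The same routine with an off-by-one slip: it drops the first byte."""
    return FAKE_DISK[filename][1:]


def decode_image(data):
    """Jim's code: turn raw bytes into a (width, height) pair."""
    if not data.startswith(b"IMG:"):
        raise ValueError("not an image this decoder understands")
    width, height = data[4:].decode("ascii").split("x")
    return int(width), int(height)


def render_image(filename, reader=read_file):
    """Jim's entry point. It is correct only if the reader it is given is correct."""
    width, height = decode_image(reader(filename))
    return f"window showing {filename} at {width}x{height}"


def show_preview(filename, reader=read_file):
    """Another team re-using Jim's code instead of rewriting it."""
    return "preview of " + render_image(filename, reader)


print(render_image("cat.img"))  # window showing cat.img at 3x2
print(show_preview("cat.img"))  # preview of window showing cat.img at 3x2

# Jim's code has not changed, but one wrong byte upstream breaks him
# and everyone who re-uses him.
try:
    show_preview("cat.img", reader=buggy_read_file)
except ValueError as error:
    print("preview failed:", error)
```

Nothing in render_image or show_preview is wrong in the failing case; the failure simply surfaces there, which is exactly why Jim is the one who gets the bug report.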
Issues 1 & 2 (and to some extent, 3) above are about programming anything – whether in a group or solo. Because perfection is required, fixing a problem in code – or as we say, fixing a bug – is subject to a law of diminishing returns. Every time you try to fix an imperfect piece of code (and remember, it may be imperfect because something you are depending on is imperfect), you have some probability of introducing another imperfection, and possibly a devastating one. The larger and more incomprehensible a programming project becomes, the more difficult it is not to introduce a new bug. Although this is true for individual projects, it is especially true when more than a handful of people are working on the same product. This is why large-scale programming projects begin limiting the number of fixes they will make before the product ships – because every time you “fix” something you have some probability (dependent upon the complexity of the code and the thoroughness of your engineers) of breaking something else.
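A back-of-the-envelope way to see the diminishing returns, under a deliberately crude assumption of my own: suppose every fix removes one bug and, independently, introduces exactly one new bug with probability p. Then each fix only removes 1 - p bugs on average, so clearing N bugs takes about N / (1 - p) fixes. Real projects are messier than this (p is not constant, and new bugs are not all equal), so treat the sketch below as an illustration of the trend rather than a measurement. The function name is hypothetical.

```python
def expected_fixes(initial_bugs, p_new_bug):
    """Expected number of fixes needed to clear `initial_bugs` bugs,
    assuming each fix introduces one new bug with probability `p_new_bug`
    (a simplifying assumption, not a claim about any real project)."""
    if not 0 <= p_new_bug < 1:
        raise ValueError("p_new_bug must be in [0, 1)")
    return initial_bugs / (1 - p_new_bug)


for p in (0.0, 0.1, 0.25, 0.5, 0.9):
    print(f"p = {p:.2f}: about {expected_fixes(100, p):.0f} fixes to clear 100 bugs")
```

With p = 0.5 the model says 200 fixes for 100 bugs, and as p creeps toward 1 the number of fixes grows without bound. Under this model, a team that believes its p is high late in the cycle is being rational, not lazy, when it stops taking fixes.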
Issues 3 & 4 are specifically about large-scale team projects. Issue 3 – the difficulty of interpreting code – is why once you have a product, parts of it remain unchanged for very long periods of time, even if everyone recognizes that they are buggy or need to be changed. It is just too difficult to interpret exactly what something is doing and why it is there. And 4 simply exponentiates the problems of 1, 2, and 3, because every new dependency is an opportunity for a schedule to fall behind, communication or interpretation to break down, or for a bug to be introduced.
Although all these problems are, I think, part of the nature of software, they can be mitigated with good practices. I have not seen very many good practices put into practice, but in theory they could be. To avoid the problems of imperfection, rigorous testing can be demanded for every task in a program, on top of rigorously-defined functionality for each task. In most places I have been, a lot of code has been written before the programmer had a clear idea of what it was needed for. Although planning for the product as a whole is always undertaken, planning for each step and each piece is needed as well. Up-front planning is expensive, but in the end it will create better software, and make it easier to read code (if each piece has a rigorous definition). Likewise, testing is usually done from a high-level perspective, but if every task – every entry and exit point of every function – were tested for completion and correctness, this could cut down substantially on imperfections that creep into software. Again, the reason this is not done is that doing so is very time-expensive, but a failure to do so just increases end-of-cycle testing and the scope and number of bugs in a product. And the final, and I think one of the most significant issues – cross-dependencies on large-scale products – can only be gotten around by clearly defining interchangeable parts for a programming product. The industrial revolution turned on the concept of interchangeable parts – the firing piece of one musket was the same as another’s, because all the pieces that touched other bits of a musket were built to a particular specification. Computing has yet to catch up with this concept. I have yet to work on a project where low-level internal interfaces were clearly defined. On the level of the product as a whole, inputs and outputs to a program are clearly and rigorously defined.
However, inputs and outputs from one programmer’s code to another programmer’s code are not defined at all but rather vaguely and sloppily hashed out as we go along. This is why the guts of software often look to me like a plate of spaghetti; if there were a more clearly architected inside to a product, I think this would help tremendously with all of the problems of software – bugginess, late ship schedules, difficult maintenance, and so on.
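As a rough sketch of what a clearly defined internal interface with checked entry and exit points might look like, here is a small Python example. All names here (ImageSource, InMemorySource, image_size) and the toy "IMG:<width>x<height>" format are hypothetical, invented for illustration. The only real library feature used is typing.Protocol from the standard library, which lets one programmer state what he expects from another programmer's code without caring who implements it.

```python
from typing import Protocol


class ImageSource(Protocol):
    """The agreed 'interchangeable part': anything that can hand back a file's bytes."""

    def read(self, filename: str) -> bytes:
        """Return the full contents of the file; raise FileNotFoundError if absent."""
        ...


class InMemorySource:
    """One interchangeable implementation, handy for testing."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, filename):
        if filename not in self._files:
            raise FileNotFoundError(filename)
        return self._files[filename]


def image_size(source: ImageSource, filename: str):
    """Return (width, height) for a toy 'IMG:<width>x<height>' file."""
    # Entry check: reject input the function was never specified to handle.
    if not filename:
        raise ValueError("filename must be a non-empty string")
    data = source.read(filename)
    if not data.startswith(b"IMG:"):
        raise ValueError("unsupported image format")
    width_text, height_text = data[4:].decode("ascii").split("x")
    width, height = int(width_text), int(height_text)
    # Exit check: never hand a nonsensical result to the caller.
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return width, height


source = InMemorySource({"cat.img": b"IMG:3x2", "bad.img": b"IMG:0x2"})
print(image_size(source, "cat.img"))  # (3, 2)
```

Because image_size depends only on the ImageSource description, the file-system team can swap in a different implementation without Jim's code changing, and each entry and exit condition is small enough to test on its own. This is only one possible shape for such an interface; the point is that the expectations are written down rather than hashed out as we go along.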
There is one final issue which exacerbates all the above problems, although it is not an issue of programming but of capitalism. Although I am attempting to make the case above that it is much more difficult to make functional software than it is to make a functional building or a functional piece of hardware, in one sense software is much easier than any of these: software can be changed, and distributed, on the fly. Once you build a building, to modify it you typically have to shut it down, move people in, and spend days or weeks or even months retooling it. In software, a change is a few keystrokes. It is a few hours to recompile the program and then you can just update a released product with a patch online. Software is by its nature ephemeral. From a venture capitalist’s point of view, because software can be changed quickly, the investment input is minimal compared with other ventures. It’s because investment is small and turnaround time is quick that we saw things like the dot-com bubble. In many ways, software is a sort of venture capital wet dream. It’s cheap and changes fast. Everyone can get rich quickly (that’s the theory, if not the reality). This impulse toward capitalist ephemerality works against the necessity of software to be written perfectly. Perfection takes time, and when near-perfection can be done quickly to the siren-song of a million potential dollars, the time to make software air-tight, or even to perform well, is rarely taken. That will put you behind-market! And so we get buggy, better-than-nothing software offered up by the marketplace.
Welcome to software. I have no easily-implemented solutions to the above, and any solutions I do have conflict with the drive to market.


