Wisdom from the Internet Tuesday, January 26, 2010 at 10:33 pm

Just some tomfoolery from the internet, a little website that compares Google search suggestions. When you’re typing into the search box on Google, they’ll make suggestions of what you could query for based on common searches. This little web app lets you compare two search terms with suggestions of popular searches Google would give you to ‘complete’ your query. The two queries are on the left and right and the Google suggestions are in the middle. Thicker arrows represent more popular searches. Check it out here.

Some of my favorite comparisons:

Everyone gets to be an idiot, Hitler, and the Antichrist. How they both manage to be stupid and an evil mastermind at the same time escapes me:

It’s at the top of everyone’s mind:

It’s all a matter of perspective:

Internet wisdom:

More wisdom, this time on science:

Pop culture:

My Body, My Self Sunday, January 24, 2010 at 1:53 pm

As those who’ve kept in touch with me personally know, six months ago I was regularly doing yoga once – and sometimes twice – a week. For various reasons, I’m not doing yoga anymore (although I sometimes think about getting back into it). Despite yoga, I still have some of the tightest hamstrings on the planet, can’t touch my toes, and can’t do half moon. At least I got crow down.

Fast forward to this past week. I have some knots in my back that’ve been annoying me, variously, while working out or moving my arms in the wrong way. Thinking I could just get this “taken care of” like a routine physical check-up, I made an appointment for a massage this past Thursday. It was the first massage I’ve had. Although it was a good experience, the masseuse said (and I could tell) that I have a whole lot of tension, the type usually associated with stress (neck, shoulders, back, jaw; the tense hamstrings are who-knows-why). She made the comment afterward that she could’ve done a deep tissue massage but it would’ve been very painful for me because I hadn’t learned how to relax and receive a massage, and I would’ve been sore for days afterward and wouldn’t have liked it very much. She also said, on finding out that I work for Microsoft, that she has a lot of Microsofties come through the spa and is married to one, and that it is clearly a high-stress job: people who don’t figure out some way to deal with the stress end up with their bodies destroyed by it after ten or so years. This was not the first time I’d heard this (and I think I can point to people at the company who are examples of this).

I noticed a lot of similarity here to yoga practice. The point of yoga was to get to the end and do savasana, which allows your body to completely relax, after limbering up your muscles and tendons through yoga. Though there are various types of yoga, throughout it you are supposed to be focusing on your breathing, the impermanent and necessary taking and giving of breath, and going through the poses to loosen yourself up and be centered in your body and in your breath. Although they are of course radically different, both massage and yoga are meant to bring yourself back into your body and work on relaxing and loosening up all the various parts that are tight (usually because of stress, or just misuse). Then you start carrying that practice through the other parts of your life.

The point of all this, and something I’ve been learning, forgetting, and relearning over the past year, is that who we are is deeply tied up with our bodies. Learning to relax isn’t a purely mental exercise (as if there were some differentiation between mind and body), but it’s a physical exercise. Relieving stress isn’t an exercise in being mentally relaxed, it’s an exercise in healthiness. You are your body. I am my body. My personality is some combination of the biochemistry of my physical brain. What I do and how I act is some combination of the biology of my body interacting with the biology of my brain. That’s it. To be what I want to be, to be healthy and balanced and whole means affecting my body just as much as my brain. There are many different ways of being whole and balanced, and I have a pretty clear idea of the way I want and the way that suits me best, but it is a coherent symphony between body and mind, which are inseparably tied up together in that thing I call myself. I’m going to schedule some massages once or twice a month so I can get to the point of learning to be relaxed and undoing all the knots of stress I carry, usually without realizing it, to the detriment of my body and myself. And I may need to throw yoga back into the mix.

Music Alert Sunday, January 17, 2010 at 7:08 pm

I was a bit worried about OneRepublic’s new album because I loved their first one, Dreaming Out Loud, so much. It’s hard to follow an act like that. So with some fear of being let down, I purchased their latest album, Waking Up. I was very pleasantly surprised. If their first album had echoes of techno and pop influences, their second album adds rap influences, while still not quite being any of these. Ryan Tedder slips seamlessly between melodic singing and speaking while the rhythm and music go on behind him. The group keeps the background strings (cello, violin in some songs) and piano that have helped give their pieces a distinctive flavor, and combined with the various musical influences, Waking Up makes some layered and complex songs. But it is still a pop album, if a well-executed one, so don’t expect classical music. The album overall is much more upbeat than Dreaming Out Loud, which was a bit darker and more contemplative, whereas Waking Up is mostly a happy album, almost deliriously so at times. There are several songs where I find it difficult not to dance (awkwardly, of course) or sway along with the music. All in all, I remain very impressed with OneRepublic, and look forward to their future musical development. Do yourself a favor and buy the album, if you haven’t already. Here’s a taste of it, a song called “Good Life” and one of my favorites:

Why does Software Suck? Part II Sunday, January 10, 2010 at 2:27 pm

Last week I went over some of the reasons why modern-day computers are what they are. Today I plan to go over some reasons why, regardless of what it’s running on, writing correct software is hard – one of the hardest engineering feats out there. Not in terms of requiring lots of intelligence, but diligence.

When I read The Mythical Man-Month a few months back, I was struck by how dead-on accurate it was about the pitfalls of software engineering, even though it was written back in 1975, when the craft of software was so much younger. But here we are, more than thirty years later, and although we’ve built systems up higher and higher on top of yesterday’s systems, and we have the internet and dual core processors and the Playstation 3 and Photoshop, most of Brooks’ critiques are just as valid today as they were then. He begins his book by comparing software engineering to the tar pits of bygone eras, trapping powerful dinosaurs and sabre-tooth tigers, sinking them, struggling with all their awesome might, into the pit. If you want to understand software – or even how to manage extremely complex projects – I can’t recommend the book to you strongly enough. Here will follow some things I got out of both the book and my experiences, pulling from my often-inaccurate memory.

Here, in a nutshell, are the problems with software engineering:

1. You must be perfect. You cannot be almost-perfect or leave a few things ambiguous. Nothing is ambiguous because everything must become some series of zeros and ones for the processor to run. Every line you write, every bit that is compiled, every flip of some switch deep inside the computer’s memory bank must be perfect. If it isn’t perfect, maybe it’ll work right most of the time. And maybe sometimes it’ll crash terribly and destroy all your data. Human beings are not accustomed to working perfectly and without flaw. In fact, sometimes it is the imperfections – the noticeable paintbrush strokes, the symmetrical dimple, the beauty spot, the awkward laugh – that we find charming. We adapt to imperfections and interpret them. You do not have such wiggle room in programming a machine. A computer does not interpret; it is a dumb machine that does exactly what you tell it to. A slight mistake causes significant consequences.

2. You must be perfect in continually unique tasks. When you are constructing a building, you have simple repetitive tasks that must be done. All must be done well – laying the foundation, constructing support beams, laying bricks – but these are a couple of tasks repeated thousands of times. Laying the second brick is not a different task than laying the first brick; it is just another brick. After several hundred, you become better at them and become a bricklaying expert. There is no such analogue in computer programming. If a programmer finds himself writing the same piece of code, what he does is separate that task into its own subroutine, and whenever he needs it done, he makes a call to that one task. This was the whole point of a computer – if you define how to do something once, you don’t have to define it ever again, and the computer will do it over and over again for you. What this means is when you are making a program, you don’t have the repetition of laying a thousand bricks; once you’ve figured out how to lay a brick, you define the steps needed to lay one brick and then just make a subroutine call to do that every time you find yourself needing to lay a brick. You don’t see the problem of bricklaying ever again (unless you find out you did it wrong and need to modify it). This means that when a programmer is writing a set of tasks, almost everything is unique. There are generally not repetitious programming tasks which must be done over and over again; everything is approached afresh, defined, and then submitted to some library of common tasks. And each task must be done perfectly.

3. Reading code is much more difficult than writing it. It is very difficult to explain this to someone without the experience of working on a software project. Programming and coding are not easily-visualized disciplines. In fact, there is nothing inherently visual about them at all, regardless of how many flow-charts you may want to make. A programmer goes from a pure (or vague) algorithm in his head straight to a list of concrete instructions. These are not lists like “Pick up milk at the grocery store” – but rather explicit instructions about memory structures and how to process those memory structures. Again, these are not visual and beyond a certain level of complexity cannot be described comprehensively with any two-dimensional visual aid. When it’s all at the front of your mind, and you’re seeing the math of how it works, it’s relatively straightforward to define. However, unless you are extremely strict about writing down why you’re doing everything as you do it, you can go back to these mathematical definitions of how to move memory around and ask yourself what on earth did I do. And if it is hard, a month or two down the line to interpret what you yourself did, it is far more difficult to interpret what someone else did. And if you are on a large software project, you will have to look at and fix problems in other people’s code. If you fail to interpret precisely what they were trying to do, you are likely to introduce further problems. I assure you there are lines in Windows code that no one any longer knows what they’re there for. But if you remove them, the product breaks. This is why software projects tend to get larger and larger, and never smaller – no one knows what the “legacy code” is (that’s what we call this old code nobody knows what it does anymore but it’s somehow necessary) or how to fix it.

4. On large-scale projects, you have many external dependencies. It doesn’t sound so bad if you have to rely on someone else to do their job, but remember from 1) and 2) above that all these jobs must be done absolutely perfectly. I promise you, no matter how great a company is, not everyone there will write perfect code. Any given software engineer writes code that other people rely on and he has to rely on code written by other people. Consider Jim, who’s in a team of people writing the task that renders images when you double-click on an image file. Jim has to rely on code written by people working in the file system, code which takes something like a filename and gives him back the series of zeros and ones which he will eventually make into an image. If there’s anything wrong in the file system code, Jim’s code will not work. Jim’s code also relies on the code that makes a window with the little ‘x’ in the corner and file drop-down menu, and if there’s anything wrong there, Jim’s code will not work. And so on for other tasks which determine things like the monitor size, what kind of monitor it is, what the color scheme on the computer is, and so forth. And this is all before Jim even gets down to brass tacks. If those teams have failed, Jim is going to be behind schedule (and quite possibly harassed by upper management for being behind). After that, Jim has to figure out his part of the code – determining what kind of image file it is, then processing it, then displaying it. Once Jim’s written this code, it may be called into by other people – the file system folks may then again re-use his code to display a preview image, or another program may want to show an image in the same way and re-use Jim’s code to do that. And if those people find problems in Jim’s code (or if they try to use it in a way Jim didn’t anticipate), then their code will fail and Jim will have to fix what he did.
Every single one of these literally dozens of dependencies for something as simple as displaying an image on-screen is an opportunity for something to go wrong, for a bug to creep in, or for communication to fail between people and between teams. And if the product ships with any problem left unfound or unfixed, it is left for people who come along later trying to use the product as a start point for a bigger project to discover a work-around for the less-than-perfect product.
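To make the bricklaying point in item 2 concrete, here is a minimal sketch in Python. The names lay_brick and build_wall are made up for illustration; the only point is that the task is defined once and then merely called.

```python
def lay_brick(wall, brick):
    """The subroutine: how to lay one brick, defined exactly once."""
    wall.append(brick)
    return wall


def build_wall(num_bricks):
    """Build a wall by calling the same subroutine for every brick."""
    wall = []
    for i in range(num_bricks):
        # No new bricklaying logic here, just another call.
        lay_brick(wall, f"brick-{i}")
    return wall


wall = build_wall(1000)
print(len(wall))  # 1000
```

If lay_brick turns out to be wrong, it gets fixed in one place, and every caller inherits the fix (or the new bug).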

Issues 1 & 2 (and to some extent, 3) above are about programming anything – whether in a group or solo. Because perfection is required, fixing a problem in code – or as we say, fixing a bug – has a law of diminishing returns. Every time you try to fix an imperfect piece of code (and remember, it may be imperfect because something you are depending on is imperfect), you have some probability of introducing another imperfection, and possibly a devastating one. The larger and more incomprehensible a programming project becomes, the more difficult it is not to introduce a new bug. Although this is true for individual projects, it is especially true when more than a handful of people are working on the same product. This is why large-scale programming products begin limiting the number of fixes they will make before the product ships – because every time you “fix” something you have some probability (dependent upon the complexity of the code and the thoroughness of your engineers) of breaking something else.
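A rough way to put numbers on this, under a deliberately crude assumption of my own: suppose every fix removes one bug and, with some fixed probability p, introduces exactly one new bug. Then each fix only makes 1 - p bugs' worth of progress on average, so clearing n bugs takes about n / (1 - p) fixes. The function names and the probability below are illustrative, not measurements from any real project.

```python
import random


def expected_fixes(initial_bugs, p_new_bug):
    """Expected number of fixes under the toy model: n / (1 - p)."""
    if not 0 <= p_new_bug < 1:
        raise ValueError("p_new_bug must be in [0, 1)")
    return initial_bugs / (1 - p_new_bug)


def simulate_fixes(initial_bugs, p_new_bug, rng):
    """Count fixes until no bugs remain, for one random run of the model."""
    bugs = initial_bugs
    fixes = 0
    while bugs > 0:
        bugs -= 1          # the fix removes the bug it targeted
        fixes += 1
        if rng.random() < p_new_bug:
            bugs += 1      # ...but sometimes breaks something else
    return fixes


rng = random.Random(2010)  # fixed seed so runs are repeatable
trials = [simulate_fixes(100, 0.2, rng) for _ in range(2000)]
print("model:     ", expected_fixes(100, 0.2))
print("simulation:", sum(trials) / len(trials))
```

As p creeps toward 1 (a large, poorly understood codebase), the number of fixes needed grows without bound, which is one way of seeing why teams cap the fixes they will take late in a release.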

Issues 3 & 4 are specifically about large-scale team projects. Issue 3 – the difficulty of interpreting code – is why once you have a product, parts of it remain unchanged for very long periods of time, even if everyone recognizes that they are buggy or need to be changed. It is just too difficult to interpret exactly what something is doing and why it is there. And 4 simply exponentiates the problems of 1, 2, and 3, because every new dependency is an opportunity for a schedule to fall behind, communication or interpretation to break down, or for a bug to be introduced.

Although all these problems are, I think, part of the nature of software, they can be mitigated with good practices. I have not seen very many good practices put into practice, but in theory they could be. To avoid the problems of imperfection, rigorous testing can be demanded for every task in a program, on top of rigorously-defined functionality for each task. In most places I have been, a lot of code has been written before the programmer had a clear idea of what it was needed for. Although planning for the product as a whole is always undertaken, planning for each step and each piece is needed as well. Up-front planning is expensive, but in the end it will create better software, and make it easier to read code (if each piece has a rigorous definition). Likewise, testing is usually done from a high-level perspective, but if every task – every entry and exit point of every function – were tested for completion and correctness, this could cut down substantially on imperfections that creep into software. Again, the reason this is not done is that doing so is very time-expensive, but a failure to do so just increases end-of-cycle testing and the scope and number of bugs in a product. And the final, and I think one of the most significant issues – cross-dependencies on large-scale products – can only be gotten around by clearly defining interchangeable parts to a programming product. The industrial revolution turned on the concept of interchangeable parts – the firing piece of one musket was the same as another, because all the pieces that touched other bits of a rifle were built to a particular specification. Computing has yet to catch up with this concept. I have yet to work on a project where low-level internal interfaces were clearly defined. On the level of the product as a whole, inputs and outputs to a program are clearly and rigorously defined. 
However, inputs and outputs from one programmer’s code to another programmer’s code are not defined at all but rather vaguely and sloppily hashed out as we go along. This is why the guts of software often look to me like a plate of spaghetti; if there were a more clearly architected inside to a product, I think this would help tremendously with all of the problems of software – bugginess, late ship schedules, difficult maintenance, and so on.
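Here is a small sketch of what a clearly defined internal interface with entry and exit checks might look like, borrowing Jim's image code from above. Everything in it is hypothetical: FileStore, InMemoryFileStore, and image_kind are names invented for this illustration, not anything from a real product.

```python
from typing import Dict, Protocol


class FileStore(Protocol):
    """The agreed 'interchangeable part' between the two teams."""

    def read_bytes(self, filename: str) -> bytes:
        """Return the file's contents; raise FileNotFoundError if missing."""
        ...


class InMemoryFileStore:
    """A stand-in file system, so Jim can test without the real one."""

    def __init__(self, files: Dict[str, bytes]):
        self._files = dict(files)

    def read_bytes(self, filename: str) -> bytes:
        if filename not in self._files:
            raise FileNotFoundError(filename)
        return self._files[filename]


def image_kind(store: FileStore, filename: str) -> str:
    """Jim's code: decide what kind of image a file holds."""
    # Entry check: refuse input the contract does not cover.
    if not filename:
        raise ValueError("filename must be non-empty")
    data = store.read_bytes(filename)
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        kind = "png"
    elif data.startswith(b"\xff\xd8\xff"):
        kind = "jpeg"
    else:
        kind = "unknown"
    # Exit check: callers may rely on exactly these three answers.
    assert kind in {"png", "jpeg", "unknown"}
    return kind


store = InMemoryFileStore({"a.png": b"\x89PNG\r\n\x1a\n...", "b.txt": b"hello"})
print(image_kind(store, "a.png"))  # png
print(image_kind(store, "b.txt"))  # unknown
```

Because image_kind depends only on the FileStore contract, the file system team can swap in a different implementation without Jim's code changing, provided they honor the same contract.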

There is one final issue which exacerbates all the above problems, although it is not an issue of programming but of capitalism. Although I am attempting to make the case above that it is much more difficult to make functional software than it is to make a functional building or a functional piece of hardware, in one sense software is much easier than any of these: software can be changed, and distributed, on the fly. Once you build a building, to modify it you typically have to shut it down, move people in, and spend days or weeks or even months retooling it. In software, it is a button on a keyboard that changes these. It is a few hours to recompile the program and then you can just update a released product with a patch online. Software is by its nature ephemeral. From a venture capitalist’s point of view, because software can be changed quickly, the investment input is minimal compared with other ventures. It’s because investment is small and turnaround time is quick that we saw things like the dot-com bubble. In many ways, software is a sort of venture capital wet dream. It’s cheap and changes fast. Everyone can get rich quickly (that’s the theory, if not the reality). This impulse toward capitalist ephemerality works against the necessity of software to be written perfectly. Perfection takes time, and when near-perfection can be done quickly to the siren-song of a million potential dollars, the time to make software air-tight, or even to perform well, is rarely taken. That will put you behind-market! And so we get buggy, better-than-nothing software offered up by the marketplace.

Welcome to software. I have no easily-implemented solutions to the above, and any solutions I do have conflict with the drive to market.

Why does Software Suck? Part I Sunday, January 3, 2010 at 11:01 pm

Anyone who has used a computer for any length of time has seen it. The program suddenly loses data, it goes slowly for no reason at all, it freezes, your operating system crashes. If you are on Windows, this can be met with useful messages like “A fatal exception 0E has occurred at 002D:4C21000E” graced with a gentle blue background. Thank you, Windows. (Although newer versions try to avoid showing you the infamous blue screens of death.) On a Macintosh, OS X crashes by giving you a little translucent pane in gray with the words “You need to restart your computer” in four languages. Contrary to popular belief, crashes are not more pleasant with beveled edges. Thank you, Apple.

Why does this happen? The personal computer market started in the 1970s. It is now the year 2010. Why haven’t we had more progress in creating reliable systems over the past forty years? The short answer is that we have had progress – vast progress, think back to something even as recent as Windows 95 – but the progress has been slow and halting and there’s no time in the foreseeable future that we will have widely-available multi-purpose computers that do not crash, or that perform uniformly quickly and reliably. A little introduction to computer hardware and computer history is necessary to demonstrate why I believe this. So in part 1 I’m going to explain what all programs generally, and the operating system specifically, have to do to even get off the ground, and the historical reasons why the machinery we’re using is a mismatch for the tasks we are trying to do; and in part 2 I’m going to go through why programming anything at all correctly is somewhere between extremely difficult and impossible.

Computer hardware was originally designed, and continues to be designed, based on something called the Von Neumann architecture. The quick-and-dirty summary of the Von Neumann architecture is this: there is a piece of hardware which contains space for a set of instructions (we call this a program) which is then sent to a processor that executes all those instructions.* If the program needs to store any information, it can put this in memory (RAM, hard drive). This is how computers have worked since they first appeared, and all in all, it’s a pretty functional system. However, notice something: this system of hardware implicitly assumes you are only running one program at a time. There is space for one set of instructions to be run on one processor. Which works great until you want the machine to do more than one thing at a time – for example, use a text editor and download internet content, or play music and scan for viruses (or a million billion other common tasks).
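For the curious, here is a toy version of that summary in Python: one program, one processor loop, one block of memory. The four instruction names are invented for this sketch and do not correspond to any real processor, and the program is kept in its own list purely for readability.

```python
def run(program, memory):
    """Fetch and execute instructions one at a time until HALT."""
    pc = 0    # program counter: which instruction comes next
    acc = 0   # accumulator: the processor's single working value
    while True:
        op, arg = program[pc]   # fetch
        pc += 1
        if op == "LOAD":        # execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {op}")


# Add the numbers in memory slots 0 and 1, put the result in slot 2.
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
print(run(program, [2, 3, 0]))  # [2, 3, 5]
```

Notice there is exactly one program counter and one accumulator. Nothing in this loop knows how to run a second program.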

But computers are fast – this was the whole point of them, performing complex and repetitive mathematical tasks quickly – and it is possible to execute many hundreds of thousands of these instructions sequentially at a blazingly fast rate. So to get more out of them, it would be nice to run multiple programs (in the architecture we’re discussing, these are instruction sets) at once, right? So to get around the single-program structure of the Von Neumann architecture, software engineers came up with something that is basically time-sharing.

Let’s assume you are rich. Maybe you are, I don’t know. Let’s further assume you own a summer home on the beach that you and your wife (or husband) take the whole family there for three months a year. The rest of the year, that real estate is just sitting there, unused but still costing you money. You come up with this great idea: let’s rent it out to other families during the rest of the year. That way it’s still getting used and we’re making up a little bit of the cost for it.

This is exactly how multitasking (executing multiple programs at once) works. The beach house is your computer’s processing hardware. You (and the other tenants) are the programs that run on it. To execute multiple programs on a piece of hardware that was fundamentally designed to run one program at once, we time-share. The process of switching tenants is called “task switching” – one program is taken off the processor and all its data and everything it’s doing is stored precisely in memory so it can come back on the processor later without knowing anything has happened at all. (Think of Han Solo frozen in carbonite.) Then another program is taken from memory and put on the processor and starts up. This happens many, many times a second.
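A rough sketch of the time-share, using Python generators as the carbonite: a generator pauses exactly where it was and resumes later with no idea it was ever paused. This is my own simplification (a simple take-turns scheduler); real operating systems save and restore processor registers rather than anything like this, and the names are made up.

```python
from collections import deque


def program(name, steps):
    """A pretend program that does `steps` units of work."""
    for i in range(1, steps + 1):
        yield f"{name} step {i}"   # pausing here freezes all local state


def time_share(programs, slice_size=2):
    """Give each program a short turn on the one 'processor', in rotation."""
    ready = deque(programs)
    log = []
    while ready:
        current = ready.popleft()         # put a tenant in the house
        for _ in range(slice_size):
            try:
                log.append(next(current))
            except StopIteration:
                break                     # program finished; drop it
        else:
            ready.append(current)         # turn is up; back of the queue
    return log


for line in time_share([program("editor", 3), program("music", 2)]):
    print(line)
# editor step 1
# editor step 2
# music step 1
# music step 2
# editor step 3
```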

So everything should be solved, right? Not quite. When you time-share a beach house (or computer), you are somewhat at the mercy of the other tenants. You could come back to your beach house and find it totally trashed. Blinds askew, furniture toppled, hairballs and cat fur everywhere. You could be stuck cleaning up a previous tenant’s mess. The same is true for programs that get plopped back on the processor, with one key difference: unlike you and me coming back to our beach house, the program doesn’t know that it has been away. It was just frozen in time, stored, and then restored. It has no way of knowing that someone else was using its house, and usually can’t tell that any time has passed at all. It can’t take a look around because it doesn’t know that anything has changed. So the program is going to continue as if nothing has happened, and if something has happened – if a piece of memory it had assumed was one thing was accidentally changed by another program, for example – well, that’s when you get strange behavior and program crashes.

This brings us up roughly to the Windows 95 era. This is when you would select Start > Shut Down and there would come up a screen saying “It is now safe to turn off your computer.” And everyone recommended that you restart your machine every day. Why? Well, your computer was only the one beach house, and after having all those tenants in it, it was impossible to assure everyone that the place was just like they expected it to be. So it was not uncommon for programs to tread on other programs’ toes, so to speak. Best to just reboot the whole thing so you know where everything is.

The computer operating system was originally a program that was designed to provide support to other programs – a kind of library of common operations. Do you need to draw something on the screen? Do you need to find out what the time and date is? Do you need to write letters to the screen and read from the keyboard? The operating system can help with that! The operating system would also help boot up your computer and allow you to navigate around the file system. As we moved more and more toward multitasking, there was another place the operating system could obviously help: keeping programs separate so they didn’t interfere with each other. And this is just what was developed. The system is called “virtual memory” – and while it’s not important to get into the nitty-gritty, it’s basically carving up the time-shared house into different rooms for each program to live in. Although a program has full control of the processor when it’s running on the processor, in order to access storage, it now has to go through the operating system – and what the operating system does is it lies. The program thinks it’s accessing one place, but the operating system actually keeps a separate copy of every location for every program so they can’t interfere with each other. In fact, there is no way they can access each other’s storage, even accidentally. The operating system is the tidy butler keeping every tenant separate so that no one else has to see their mess. And ideally, none of them will realize that anyone else is ever there.
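Here is a heavily simplified sketch of that lie. ToyOS is a hypothetical class of my own; real virtual memory maps whole pages with hardware help rather than single addresses in a dictionary, but the idea of a per-program translation table is the same.

```python
class ToyOS:
    """Hands each program its own private view of one shared memory."""

    def __init__(self, physical_size):
        self.physical = [0] * physical_size   # the one real house
        self.next_free = 0
        self.tables = {}   # program name -> {virtual address: physical slot}

    def _translate(self, prog, vaddr):
        """Map a program's address to a real slot, assigning one if needed."""
        table = self.tables.setdefault(prog, {})
        if vaddr not in table:
            if self.next_free >= len(self.physical):
                raise MemoryError("out of physical memory")
            table[vaddr] = self.next_free
            self.next_free += 1
        return table[vaddr]

    def write(self, prog, vaddr, value):
        self.physical[self._translate(prog, vaddr)] = value

    def read(self, prog, vaddr):
        return self.physical[self._translate(prog, vaddr)]


toy_os = ToyOS(physical_size=8)
toy_os.write("editor", 0, "draft")
toy_os.write("music", 0, "song")     # same address, different room
print(toy_os.read("editor", 0))      # draft
print(toy_os.read("music", 0))       # song
```

Both programs believe they own address 0, and neither can name the other's slot, so neither can trash it by accident.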

This seems really great, but where this opens Pandora’s box is when it comes to what computer programmers call “threading.” Threading is getting a single program to create several copies (or forks) of itself. Why on earth would you do this? As programs have become more complex, it has become obvious that not only do you want multiple programs to run simultaneously, but you want a single program to do different tasks simultaneously – like spell-checking and doing a word-count. It just speeds everything up! Thus, “threading.” Each thread usually has a different job to do (if you work in the corporate world, think of how many things Microsoft Outlook is doing at the same time – checking mail, checking your calendar, looking at a to-do list…). It’s not uncommon for a large program to be running dozens of threads. And remember, these threads are treated just like different programs by the operating system** – so they are taken on and off the processor dozens of times a second. If this seems like it could get complex very quickly, it does – it is easy to have threads lying around that aren’t doing anything, but are taking up time on the processor, or threads that are all waiting on each other to do something and never do anything themselves (thread deadlock). Threads make a conceptual mess very, very quickly. And when looking at how many different processes are having to be taken on and off your processor, threads add up just like programs. The overhead of having to freeze and store all of a program or thread’s information, and then bring another back from memory to start running adds up much more quickly with threads involved.
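A small sketch of threads sharing the same rooms, using Python's standard threading module. Because every thread sees the same memory, they have to take turns on anything they all modify; here a single lock does that job. The function name and the numbers are made up for the example.

```python
import threading


def run_counter(num_threads, increments):
    """Have several threads bump one shared counter, guarded by a lock."""
    state = {"count": 0}          # memory every thread can see
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:            # only one thread in this room at a time
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait for every thread to finish
    return state["count"]


print(run_counter(4, 1000))  # 4000
```

With one lock this stays manageable. The deadlocks mentioned above show up once there are two or more locks and different threads grab them in different orders, each holding one and waiting forever for the other.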

The supposed answer to all this is multiple processors, but these are a long way from being an ideal solution – or even a workable solution. To some extent, you can run multiple programs better with multiple processors. But the way these have been designed, they are still accessing the same memory, and the hardware infrastructure around the processors was designed for one, not two of them. So they cannot both access memory at the same time. One processor cannot talk to the other very easily, and so running multiple threads from one program across multiple processors is difficult. Currently the biggest advantage to having more than one processor is that you only have to do half as many task switches between threads/programs (or one-fourth, if you’ve shelled out a lot of money for one of the quad-cores). Fundamentally, we have taken two single-program processors and glued them to the same bit of memory.

So let’s summarize. The computer you are reading this on inherits its internal organs from a machine designed to run one program at a time. Presently it is running multiple programs at a time by taking them on and off its internal brain more quickly than you can perceive. Not only that, but within some of those programs, it is still taking different threads on and off its internal brain, all in the pursuit of the illusion of multitasking. These internals have not been substantially redesigned from the original single-program model; all these things are hacks and small, cumulative modifications to get around it. There’s enough space in all this to drive through truckloads of program crashes and system slow-downs. And this is all just the infrastructure your computer and operating system have to support to run anything useful on top of it. Although there is some hope of things looking up eventually with multiple processors, the way they are designed now does not significantly change this infrastructure.

These are some of the historical reasons we have what we have today – we are not using our computer architecture for what it was originally designed to do, and although we’ve gotten better at it, the more complex workarounds we make for the machine, and the more adjustments we slap onto it, the more likely there is to be some point at which one of them will fail, and the less likely it is that anyone will understand why or where the failure happened.

* These instructions can include conditional statements – this is how we create programs that do different things every time depending on input – and this input can be from a human interacting with a keyboard, from a file, an internal clock, a random number generator, whatever.

** With one exception: threads of a single program will all see the same memory space – that is, they are all given access to the same rooms in the beach house.