Sometimes it is a great achievement just to get through the day.

All software sucks. Somehow. Or at least most of it. I was wondering why. Well, I do not have that much experience yet, but of course, I can still think about it. I would like to point out a few of the views I have formed with my limited experience; feel free to comment with corrections if I am wrong.

Usually, I think that software complexity must be justified: software that does much may be more complicated than software that does essentially nothing. Therefore, I think that SSH is good software, as it is versatile and its complexity is comparatively low; X11 is neutral, as it is very complex but at least can do a lot of things; while Nepomuk is bad, in my opinion, since I see its complexity, but I do not see why it is useful at all, except for a bit of metadata management of files (and as a buzzword-throwing machine).

So in theory, everybody could write software that is only as complicated as it needs to be - whatever "needs" means in this case.

How complicated does a piece of software need to be? Quite a lot of people have argued about that question, and the worse-is-better philosophy may be one answer - and unfortunately, it turns out to be the paramount philosophy for most programmers in the end.

For free-software programmers, it is a natural principle: free software usually comes either from companies that do not expect to earn much money with its development anymore and therefore release it to the public, or from programmers who want to solve a certain problem. And these problems are mostly trivial, without any deeper meaning for the rest of the world. Usually, the goal is not to make something other projects can rely on, but to make something that works as fast as possible for the moment - or sometimes just because somebody wants to show off his hacker skills.

One example which makes this clear is the plugin situation of Firefox: there are Gnash and Swfdec, trying to become an alternative to Flash. Both Gnash and Swfdec can play YouTube videos very well. They can in no way replace the real Flash player, but at least for the special purpose of watching YouTube videos, they can - but who cares; if you do not want Flash, just use youtube-dl to watch them. On the other hand, I do not know of a single free implementation of a Java applet plugin: there is one that comes with GCJ, but since nobody in GCJ cared about security, that applet plugin has no security concept behind it, besides crashing often. And even worse, the plugins for mplayer and vlc and xine are unusable, which is why I mostly do not install them at all. There is a lack of interest in developing these plugins.

And SVG, which was said to become the next-generation replacement for Flash, will never spread either, because there has basically never been reasonable support for it. With WebGL being deactivated by default even in Firefox on Linux, the dominance of Flash will remain for a looooong time, I think.

Another example I am feeling right now is the remote-desktop solution NX. Actually, from the graphical perspective, RDP and VNC and even X11 are good enough for virtually everything that can be done with NX. The notable part about NX is the sound and Samba forwarding built into the NX Client, which also runs properly under Windows. This is, in my opinion, the main advantage of NX. But the free implementations NeatX and FreeNX somehow lack this support; FreeNX supports it in theory, but it is practically impossible to configure if you need something non-standard.

Well, most existing software seems to have this problem. But of course, there are exceptions. Sometimes people see larger problems and are willing to try to solve them - which often leads to a worse problem, namely hundreds of reimplementations of the same problematic piece of software, while a real solution seldom evolves. Why is that?

Again, let me give you an example. I have been writing jump-and-run games for six years now, the most recent incarnation being Uxul World (which is likely to get finished this year, if some other things do not fail). Actually, I finished some smaller games, but I never released them, except to some friends. One example was a maze game written in C++, in which simple mazes could be generated using text files. Why did I not release it to the public?

Firstly, it is written in C++ - I do not want people to think that I usually write code in C++. Secondly, it was too small and lacked features: when I showed it to some friends, they all liked it, but they all had suggestions on how to make it better, and unfortunately, these suggestions were vague and some of them were mutually exclusive: one person wanted to make some shooter out of it, like Cyberdogs; another person wanted to add more structural features like doors, switches and teleporters; another person wanted me to make it 3D and use OpenGL instead of SDL (which I was using at that time). Thirdly, a computer scientist who "reviewed" my code on request (at that time I was still mostly using Java, and new to C++) commented on my collision engine that it was way too complicated and "can probably be used to shoot satellites into space", meaning that my code was hard to understand because it was more accurate than code of that kind usually is.

I simply did not want to write that kind of code: I do not like the concept of worse-is-better in software I actually want to release. But then again, you see people writing a "good" game in half a year, and since you do not cooperate with all of those "experts" telling you to use a pre-assembled library for that, you will not get support at all. And it goes this way for other kinds of software, too - mostly there are either solutions for your problem that other people consider "sufficient" (while you do not), or they do not understand why anybody would want whatever you want to create. So in fact, people are forced either to make their software "worse" or to impose a lot of additional work on themselves.

Unfortunately, while there are at least some free projects claiming to be "better" than "worse", for commercial programming this principle can never be economically advantageous, at least according to what I have heard from people working in the software industry. Software must be cheap to create, and the cheapest way still seems to be hacking around until the software works - which is what extreme programming is essentially about (except that one usually uses more buzzwords to describe it). Hack it, test it, release it, and hack again.

Especially in the commercial world, there is no point in taking too much care of the backends of programs, as long as the frontends fit the users; making software more complicated ensures that people who once used it will depend on it: if you keep them dependent on your old software long enough, they will use newer software from you, too, on which they will depend later. Backward compatibility is not that expensive, as The Old New Thing points out in many of its posts.

Ok, it is no secret that the commercial world is absurd in many ways. But in the scientific world, too, worse-is-better is a reasonable way of programming. Scientists also have some pressure, at least from bibliometrics. And in science, too, you do not always write everything from scratch, but search for "microsolutions" using Google & co. to arrive at your solution faster. And above that, science is often interested in proof-of-concept implementations rather than production environments.

In each of the three cases, the programmer makes a trade: by increasing the complexity of his software, he achieves his goal earlier, and the software spreads faster. And software can get extremely complicated. Take Windows as an example. Or Linux. Or Firefox. Or Mediawiki. Or X11. Projects with a long history. Projects which have grown extremely complicated. Active projects which "work". That is an argument I have heard so often now, implying that something is "good" just because "it works". Using a telephone to mow your lawn will work if you put enough effort into it. Using a toaster to dry your hair will, too (I actually tried, but I would not recommend it). You can make virtually everything "work" if you put in enough effort. The reason why your Windows desktop is so shiny and simple, the reason why your Debian vserver has almost no downtime, the reason why your Mac OS X recognizes your iPad so well, is not that the software is essentially "good"; it is that a lot of people are working hard to make it "work" for you.

The implication from "working" to "good" is often related to something I call "pragma ideology". Often, pragmatism and ideology contradict each other. It sounds obvious that the only criterion by which one should choose software is whether it serves its purpose best, and therefore, this "pragmatic view" is chosen as a new ideology - an ideology that ideologically rejects every form of ideology.

Adherents of this ideology often reject Lisp and garbage collection in general, while PHP, Perl and Python are appreciated since there is so much software written in them. Innovative ideas are seldom appreciated, since new ideas tend not to work out immediately. With this ideology, no real development is possible, and quite a lot of the stuff we have today would never have been possible. The "web" was once considered a very bad idea. Wikipedia was "condemned to disinterest" at a time when there was no article about navel lint. A professor once told me that even such a basic thing as a graphical user interface was, in the beginning, seen more as a science-fiction anecdote than a real workspace.

But pragma ideologists do not see this. They see what is there "now", what they use and what "works" according to their imagination of what "working" actually means. I always find it interesting to see two pragma ideologists with different opinions talk to each other. Since you cannot be a pragma ideologist without a bit of arrogance, of course, each of them thinks that the other's software is crappy, and that he can "prove" this by his "experience". Well, my experience tells me that really experienced people are generally open to new ideas, but very sceptical about them. Experienced people can usually tell at least two anecdotes about every new development, one constituting their openness, and one constituting their scepticism. Thus, in my experience, pragma ideologists are usually not experienced.

Of course, when you have to make a pool of computers or a rack of servers work and keep it working, a little bit of pragma ideology is necessary to keep the system consistent. And the same holds for larger software projects. But there must be a balance, and experienced people know this balance. They know when not to block new ideas.

But they usually also know when to do so. Because while pragma ideology is - in my opinion - one cause of very bad software, replacing old software with new too quickly is another. I see two major reasons for throwing perfectly working software away.

One reason is the rise of new "better" standards that everybody wants to support.

Imagine you want a simple and free replacement for the proprietary ICQ. Well, having a buddy list and chatting with single people or groups works pretty well with IRC. So you could adapt IRC for that purpose: it has worked well since 1993, but it has one major problem: it does not use XML. Thus, XMPP had to be invented, with a lot of "extensions" almost nobody uses. Who uses Jingle? Who uses file transfers in any way other than what was already possible with IRC-DCC?

Imagine you want a language with a large library that is simple to learn, has a mighty object system and an intermediate bytecode compiler (to make the commerce people happy not to have to open their source), and which is available on virtually every platform. You could just take a Common Lisp implementation like CLISP, extend it with a JIT compiler for its bytecode, add a bit of UI pr0n, deploy it, and make everyone happy. But why would you do that, if you can just create a new bytecode with a new interpreter and a programming language based on C++, keeping enough of C++ to confuse people not familiar with it while taking away enough of C++ to anger C++ lovers.

Imagine you want a file transfer protocol supporting file locks and meta information. You could extend FTP with a few additional commands like LOCK, SETXATTR and GETXATTR. But you could also put a huge overengineered bunch of XML meta information on top of an HTTP substandard, extend it with a few new methods, and then give it a fancy, meaningless name.
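
Just to make the comparison concrete, here is a purely hypothetical sketch of what such an FTP extension could look like on the control channel. LOCK, SETXATTR and GETXATTR are not part of any real FTP standard, ftp.example.org is a placeholder, and a real server would of course reject these commands - the point is only how small the extension would be:

    // Purely hypothetical sketch: made-up FTP extension commands (LOCK,
    // SETXATTR, GETXATTR) sent over the control channel. None of these
    // commands exist in any real FTP standard; this only shows the shape
    // such an extension could have.
    #include <cstdio>
    #include <string>
    #include <netdb.h>
    #include <unistd.h>
    #include <sys/socket.h>

    // Send one control-channel line and print whatever the server answers.
    static void send_line(int fd, const std::string& line) {
        std::string msg = line + "\r\n";
        write(fd, msg.data(), msg.size());
        char buf[512];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; std::printf("%s", buf); }
    }

    int main() {
        addrinfo hints{}, *res;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo("ftp.example.org", "21", &hints, &res) != 0) return 1;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) return 1;

        char greeting[512];                 // discard the 220 greeting
        read(fd, greeting, sizeof(greeting));

        send_line(fd, "USER anonymous");
        send_line(fd, "PASS guest@");
        send_line(fd, "LOCK /pub/report.txt");                  // hypothetical advisory lock
        send_line(fd, "SETXATTR /pub/report.txt author uxul");  // hypothetical: set meta data
        send_line(fd, "GETXATTR /pub/report.txt author");       // hypothetical: read it back

        close(fd);
        freeaddrinfo(res);
        return 0;
    }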

Another reason for throwing away working pieces of software is the NIH syndrome. The recent discussion about Unity vs. GNOME in Ubuntu seems like an instance of this to me. But Flash also seems to be an instance - it used to be based on Java, but now it has its own virtual machine. And the only reason why BTRFS is still being developed seems to me to be the NIH syndrome as well.

In fact, it is not always possible or useful to use old software or old standards and base new stuff on them. In the end, many systems evolved by just adding small pieces of features, and after they have grown complex, it may sometimes be better to abandon them and create something new, based on the experiences with the old system. It would be nice if that could finally happen to X11 - it is time for X12! It would be nice if that could finally happen to the whole bunch of "web standards" (JavaScript, XML, XHTML, SVG, JPEG, etc.). But still, that means not just creating a new system that is as crappy as the old one, but creating a new one informed by the experiences with the old one.

Most of this holds for scientists as well as pragmatists - I do not think that, for example, some sort of pragma ideology cannot also be found in a scientific setting. So these points are similar for both classes of software producers. But while they of course have a lot in common, there is a reason why I think it is necessary to choose whether one is a computer scientist or a programmer. It is not that one person cannot be both, and I do not want to imply here that one is worse than the other. It is just that sometimes I get the impression that some people cannot decide which of them they are, and sometimes even larger projects have this problem, because some of the programmers are programmers, some are scientists, and some do not know which of the two they are. Well, that is at least what I see, and how I explain some flaws in several pieces of software I have seen.

For example, take a look at the object system of C++. Compared to Smalltalk and Common Lisp, even to Java, it is ludicrous. And since it is so ludicrous, as far as I can see in history (well, it was before my time), nobody really used most of the mechanisms it borrowed from other object systems, and nowadays, object-oriented programming mainly means putting a few methods (which are mostly just plain functions) into their own namespace - so suddenly, the namespace has become the important part, and thus some people get confused about what Common Lisp considers a "class".
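
To illustrate what I mean, here is a small sketch (all names made up) of that pattern: the C++ class is essentially a record plus a namespace for plain functions, while the comment shows how Common Lisp keeps the class and the (generic) functions separate:

    // Sketch of the pattern described above: the C++ class is a record plus a
    // namespace for plain functions. All names here are made up for illustration.
    #include <cmath>
    #include <cstdio>

    class Vec2 {
    public:
        double x, y;
        Vec2(double x_, double y_) : x(x_), y(y_) {}
        // These "methods" are ordinary functions; the class mostly contributes
        // the namespace Vec2:: and the implicit first argument.
        double length() const { return std::sqrt(x * x + y * y); }
        Vec2 scaled(double f) const { return Vec2(x * f, y * f); }
    };

    // Common Lisp separates the two: a class is little more than a bag of slots,
    // and methods hang off generic functions instead of living inside the class:
    //   (defclass vec2 () ((x :initarg :x) (y :initarg :y)))
    //   (defgeneric len (v))
    //   (defmethod len ((v vec2))
    //     (with-slots (x y) v (sqrt (+ (* x x) (* y y)))))

    int main() {
        Vec2 v(3.0, 4.0);
        std::printf("%f\n", v.length());              // 5.000000
        std::printf("%f\n", v.scaled(2.0).length());  // 10.000000
        return 0;
    }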

Looking at the Linux device files in the /dev directory, one notices that block devices and character devices can usually be accessed through the default libc functions, as if they were files. So whatever /dev contains is an abstraction away from the hardware, which is sufficient for most purposes, but of course not for all. Now one might expect that, for example, NFS or Samba will be able to export a device file as well. And in fact, they do, but they do not export it as the file it appears to be on the actual computer - they export it as an actual device file, which means that it gets a major and minor number, as all device nodes do, and it will then refer to a device on the client. That is because, in the end, the filesystem is nothing but a namespace, and of course, there might be reasons not to export whole disks via NFS (there are other solutions for that), and there might be reasons to export device nodes pointing to client devices rather than to devices on the NFS server. But in my opinion, the latter is the more low-level way, and should therefore not be the default. This is because I consider myself a "scientist" rather than a "programmer" (while actually I am neither of the two yet). The programmer would say "it does what its specification says, and there are alternatives that can achieve what you want if you really want it". The scientist wants an axiomatically reasonable piece of software with no "surprises".
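
For illustration, here is a minimal sketch of the abstraction I mean: a block device is opened and read with the same POSIX calls as any regular file. The path /dev/sda is just an assumption, and reading it normally requires root:

    // Minimal sketch: a block device is read with the same calls as a regular
    // file. The path /dev/sda is only an assumption, and reading it normally
    // requires root privileges.
    #include <cstdio>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        const char* path = "/dev/sda";            // any block device node
        int fd = open(path, O_RDONLY);            // same open() as for a plain file
        if (fd < 0) { std::perror("open"); return 1; }

        unsigned char sector[512];
        ssize_t n = read(fd, sector, sizeof(sector));  // same read() as for a plain file
        if (n < 0) { std::perror("read"); close(fd); return 1; }

        std::printf("read %zd bytes from %s\n", n, path);
        if (n == 512)                                  // an MBR disk ends its first
            std::printf("last two bytes: %02x %02x\n", // sector with 55 aa
                        sector[510], sector[511]);
        close(fd);
        return 0;
    }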

Another thing I hear very often is a mix-up of software certification, software verification, and whatever else. There is a process called software verification in which you usually run your software through a lot of test cases, as you would do with hardware. This is reasonable, as long as you think about your code before you test it, and not only after some tests have failed. Then there is formal software verification, something that should be done whenever possible (and is almost never done). And then there is certification, which means, as far as I have seen, that some company looks at the software and gives its OK. These are three essentially different approaches to similar problems, and there seems to be a lot of confusion about which one does what.

Formal verification is still not used widely enough, I think, which may be caused by the fact that non-scientists usually cannot imagine what "formal verification" is. If you have a specification of what a certain piece of software should do, and you have a piece of software that really does this, then this is provable! There is no exception. I have heard so many opinions on that topic, but this is not a matter of opinion, it is a fact - as long as you accept that the computer is a machine that works in a deterministic way, and of course as long as you assume that the hardware complies with its specifications; if you do not assume that, then you have no way of creating software that complies with any specification anyway! Modern computers may react to temperature changes, brightness and the current gravitation vector, which are non-deterministic, but still, your computer reacts in a deterministic way to these inputs. If you cannot tell how your software reacts to them, your software is crap; and as soon as you can, you can prove its behaviour. Again, this is not a matter of opinion, it is a fact. There is currently no widely used verified operating system, and therefore no way of using an actual formal proof checker to check whether your reasoning about your software is correct, but formal verification can just as well be done on paper: can you print out your code and reason about its correctness with pen and paper? If you cannot, then you probably do not know how and why your software works - it is as simple as that.
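
To show the kind of pen-and-paper reasoning I mean, here is a toy example with a made-up specification: a precondition, a loop invariant and a termination argument, written down as comments right next to the code they talk about:

    // Toy example of pen-and-paper verification. The specification is made up:
    // for n >= 0, sum_to(n) shall return 0 + 1 + ... + n = n*(n+1)/2.
    #include <cassert>
    #include <cstdio>

    long sum_to(long n) {
        // Precondition: n >= 0.
        long sum = 0;
        long i = 0;
        // Loop invariant: 0 <= i <= n + 1  and  sum == 0 + 1 + ... + (i - 1).
        // It holds before the first iteration (i == 0, sum == 0, the empty sum).
        while (i <= n) {
            sum += i;  // now sum == 0 + ... + i
            i += 1;    // i advanced by one, so the invariant holds again
        }
        // On exit, i == n + 1, so the invariant yields sum == 0 + 1 + ... + n.
        // Termination: n - i strictly decreases and is bounded from below.
        return sum;
    }

    int main() {
        assert(sum_to(0) == 0);     // testing complements the proof,
        assert(sum_to(10) == 55);   // it does not replace it
        std::printf("%ld\n", sum_to(100));  // 5050 == 100 * 101 / 2
        return 0;
    }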

But formal verification will not solve all problems either, even though some theorists think so. Even if the hardware does what its specification says, correctness of your program may not be enough, since correctness just says that your software does what you specified. With formal reasoning, you can eliminate common bugs you know of, by specifying that they do not occur and then proving this specification. This has the major advantage that common bugs will probably only occur finitely often, until every piece of software extends its specification accordingly. But there is still the problem of whether the specification is really what you wanted to have. For example, some XSS exploits did not do anything outside the common standards; they would have worked in a perfectly verified browser, since they mainly exploited the fact that in the past, JavaScript was not used in the way it is now. XSS exploits are a major problem, since there is no real formal way to solve them inside the browser - the browser's entelechy is to run the scripts given by websites, so formally, the various web interfaces themselves would have to be verified, which is neither realistic, nor does it solve the general problem: not all bugs are bugs outside the specification. In addition to that, there is software for OCR or handwriting or other pattern recognition which basically cannot be verified to work correctly from the user's perspective. Thus, testing and informal verification will always be necessary.

Certification is just letting a company do the work, probably imposing the responsibility for problems on that company. This solves no problems a computer scientist should care about; it may solve problems for smaller companies that need some kind of insurance that their software will not make their machines burn or something.

Reliability is something very important to software users. Which brings me to the next point: sometimes it seems like the larger software companies are trying to keep their customers stupid. In fact, I often see the attitude that "the computer knows best". They had better tell their customers the truth: the computer knows nothing! It is a complicated, sophisticated machine, but it is still a machine. Maybe one day there will be a strong artificial intelligence, but so far, there are only weak ones, and they may be useful, but they are not reliable!

There is so much software that uses non-optional heuristics. Copy-pasting on modern systems is an example where these heuristics can get annoying: you want to copy some text from a website into your chat client, and it applies some strange formatting that it sort of takes from the piece of text you copied, while you actually wanted only the text. On the other hand, when you paste into your text editor and want the actual style information, you will get only the text. These are annoying heuristics - one could educate the user that there are two kinds of pasting, plain text and formatted text, and in fact, that is what Pidgin does: it has an option "paste as text".

Another example that has annoyed me more than once now is the XHTML autocorrection of WordPress, which cannot be turned off on the WordPress-hosted blogs - probably because they do not allow arbitrary content. If it would then at least just disallow any form of XHTML. But it does not; it runs your content through a heuristic that tries to guess whether you are writing HTML or plain text. It sometimes swallows backslashes and quotation marks. It is annoying!

Probably the most annoying thing, which at least can be turned off on most systems, is mouse gestures. I have not seen a single system where they worked for me - neither Mac, nor Linux, nor Windows. But I never really got the point of them anyway - two clicks versus a stupid gesture ... what is the advantage?

The computer does not know best! It applies heuristics, and heuristics may fail. That is why I accept heuristics for unimportant things, but when it comes to the computer having to decide whether I want to delete a file or open it, this goes too far. In general, I do not like software written by programmers who think that nobody wants anything they did not think of.

LaTeX is a good example of such a piece of software. I have tried a few times to get deeper into the mechanisms of LaTeX, as there do not seem to be many people doing that. Well, the more I knew, the less I wanted to know. And above that, there is no real community to ask when you do not understand something. As long as you have simple questions, like how to get something into the center of the page or how to change the font to the size you need, there are a lot of "experts", but as soon as you want to understand the LaTeX software itself, there is almost nobody who knows anything. Why should you know something somebody else has already done for you? There is nothing you could want that LaTeX cannot do. And if there is, then you do not really want it: either you do not know about typesetting rules, or you want something that nobody is supposed to want, and you should rethink it.

Everything that is hardcoded is law! This does not only hold for LaTeX. Hardcoding things like library paths or paths of executables is very common, especially in commercial software, but also in kernel modules that require firmware files. With libsmbclient, a library to access SMB shares, you cannot connect to SMB shares on a non-standard port; the port is hardcoded. It is hardcoded under Windows, too - well, not quite hardcoded, at least there is one central setting in the registry. Windows XP supports, besides SMB shares, WebDAV shares. WebDAV is based on HTTP, and quite a lot of secondary HTTP servers run on a port different from 80, often 8000 or 8080. At least the last time I tried, Windows did not support any port other than 80. Hardcoding things that should be configurable is a very annoying problem that unfortunately occurs very often.
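
To be clear, the following is not how the real libsmbclient API works - it is just a generic sketch, with a made-up environment variable, of how cheap it is to make such a port configurable instead of hardcoding it:

    // Generic sketch, not the real libsmbclient API: read the port from a
    // (made-up) environment variable and fall back to the standard one,
    // instead of hardcoding it.
    #include <cstdio>
    #include <cstdlib>

    static int smb_port() {
        const char* env = std::getenv("MYAPP_SMB_PORT");  // hypothetical setting
        if (env != nullptr) {
            int port = std::atoi(env);
            if (port > 0 && port < 65536) return port;    // accept a sane override
        }
        return 445;                                       // standard SMB port by default
    }

    int main() {
        std::printf("would connect to port %d\n", smb_port());
        // ... the real connection code would take smb_port() as a parameter ...
        return 0;
    }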

Ok, I have named a lot of problems I see. I will also name some solutions I see.

One major solution to a lot of problems would be formal program verification. Formal verification can be done on paper as well - there is no excuse for not doing it just because there is no widely used proof checker out there. You do not need to be a mathematician to do it (though having done simple mathematical proofs may be a good start), and you do not need to give a proof in Hoare logic. Most mathematical proofs are given in natural language, too. Just try to give a formal argument that can be challenged!

Then, when you write software, you should always ask yourself whether you are about to create a new standard for something that already has one. If there is a standard, can you just extend it instead of making something completely new? If you cannot, can you at least make your standard similar to the old one? If everybody tried to keep the number of formats to support small, and especially did not invent new ones without a reason, then maybe programmers could focus on software quality rather than portability.

And probably the most important part would be the education of the users. Software is not like hardware: it is easy to change, replace, re-invent. So the user's decision is far more important. Users should be told that it is their job to command the computer, and not the other way around.