The dot on the k.

Proof checkers can be used to verify the correctness of proofs, which in turn can prove the correctness of programs. In theory. In practice, there is a slight problem: all of the modern proof checkers are implemented in some high-level functional language, their runtimes tend to be rather complicated, and they are not verified themselves.

There is the common concept of the "trusted codebase": you have to start with something, and the trusted codebase is the amount of program code you have to trust. Unfortunately, keeping the trusted codebase small and keeping proofs simple are contrary goals.

Coq addresses this problem by having a small kernel that checks the proofs, and lots of unverified code that produces them. This way, proving even complicated theorems becomes bearable. However, it can also lead to regressions with newer versions of Coq: tactics might behave differently across versions, and automatically generated variable names might change. The common solution is to write proofs that do not depend on those names, or at least to give the names explicitly; that is, however, not always possible. In theory, one could export the proofs into a common format that is (mostly) version independent, but doing so is like saving compiled libraries to work around compiler incompatibilities: specifications tend to change, so proofs should be maintainable, and maintainable proofs can always be ported to newer versions of Coq.

The situation is not perfect, but at least no false proofs are accepted – mostly. There have been bugs that allowed deriving False, and therefore everything else. This is a common criticism of program verification: how trustworthy can an unsound proof checker be? But consider what we are actually talking about: people have written an algorithm, have thought about an informal proof ("proof in words") that would otherwise be accepted as-is, and have then formalized it for a proof checker whose bug is usually some exploitable edge case. A scenario in which this actually leads to a bug in the resulting program code seems highly unlikely to me; I wonder whether it has ever even occurred. In conclusion, a verified implementation is usually formally correct, and it is almost impossible for it not to satisfy its specification. And, above all, you do not lose anything: you have to write the code anyway, and the process of verifying it will not introduce additional bugs.

That being said, it is – of course – very desirable to have a correct proof checker. Ideally, it should be metacircular. To make it less prone to bugs, it should be kept as simple as possible. I have tried to implement a simple proof checker several times, but I always failed because at some point I no longer found it sufficiently simple. In a functional language with algebraic data types and automatic garbage collection it might not be too hard – but not elementary either. OCaml, Standard ML and Haskell have sophisticated runtimes, which may be a good thing when writing unverified high-level code, but for my taste they are rather complicated for a proof checker. So I have been thinking about what is needed to write a realistic proof checker.

I think that, in the current situation, it is unrealistic to go to a lower abstraction level than C: C is the lingua franca. However, it is probably a good idea not to use C as a "portable assembler", but rather to write in a clean and obvious style instead of focusing on efficiency.

One reasonable way of encoding the various recursive definitions is with structs, unions and enums. This is probably the easiest representation, and while probably not the most efficient one, it is portable and the resulting code is readable. However, it leads to complicated memory management: any implementation will have lots of allocations of small structs that point to each other. Allocating objects can be done with malloc(3), but malloc(3) is not really meant to manage many small objects. A slice allocator on top of malloc(3) would solve that problem – then again, malloc(3) is itself based on mmap(2), so one could just as well build such an allocator on mmap(2) directly. Either way, it would make the implementation more complex.
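To illustrate, here is a minimal sketch of such an encoding, assuming a hypothetical term language with variables, applications and lambda abstractions (the names term, TERM_VAR and mk_app are made up for this example):

    #include <stdlib.h>

    enum term_kind { TERM_VAR, TERM_APP, TERM_LAM };

    struct term {
        enum term_kind kind;
        union {
            int var;                                  /* de Bruijn index */
            struct { struct term *fun, *arg; } app;
            struct { struct term *body; } lam;
        } u;
    };

    /* allocate an application node; subterms are shared, not copied */
    struct term *mk_app(struct term *fun, struct term *arg) {
        struct term *t = malloc(sizeof *t);
        if (!t) abort();
        t->kind = TERM_APP;
        t->u.app.fun = fun;
        t->u.app.arg = arg;
        return t;
    }

Every node is a small heap object pointing to other small heap objects, which is exactly the allocation pattern discussed above.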

Allocating and deallocating objects can be done directly in the code, but since it happens frequently, it is probably not a good idea to always do it explicitly. A copying garbage collector would probably be best for this situation, and a simple copying collector is probably bearable – however, it would increase the trusted codebase. Reference counting would also be sufficient, and it could be simplified a lot using __attribute__((__cleanup__())) from GCC. Still, all of these possibilities – including manual memory management – are rather complicated.
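As a sketch of the reference-counting variant, continuing the example above and assuming struct term gains a refcount field (term_unref, term_cleanup and mk_var are made-up names):

    /* decrement the count, free the node when it drops to zero */
    void term_unref(struct term *t) {
        if (t && --t->refcount == 0)
            free(t);   /* a real version would also unref the subterms */
    }

    /* GCC calls this with a pointer to the variable leaving scope */
    void term_cleanup(struct term **p) { term_unref(*p); }

    #define scoped __attribute__((__cleanup__(term_cleanup)))

    void example(void) {
        scoped struct term *t = mk_var(0);  /* hypothetical constructor */
        /* ... use t; term_unref(t) runs automatically at scope exit ... */
    }

The cleanup attribute removes most explicit unref calls, but cycles and ownership transfer still have to be handled by hand.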

The question is whether deallocating objects is necessary at all. As The Old New Thing points out, the null garbage collector is correct. A proof checker gets axioms, already proved theorems, a conclusion and a proof as input, and its only objective is to check whether the proof is correct, perhaps returning an error message that indicates the error. A simple stack-oriented language read from stdin should be sufficient, and therefore it might be sufficient to never actually deallocate. That would probably be the UNIX way: do one thing and do it right. If some proof does not fit into memory, one could still preload libgc.
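The null-garbage-collector approach reduces allocation to a trivial wrapper; a minimal sketch (the name xalloc is made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* allocate or die; there is deliberately no matching free() anywhere */
    void *xalloc(size_t n) {
        void *p = malloc(n);
        if (!p) {
            fputs("proof too large: out of memory\n", stderr);
            exit(1);
        }
        return p;
        /* the operating system reclaims the whole heap at process exit */
    }

Since the process checks one proof and exits, every allocation lives until exit anyway, and the "leak" is harmless. If memory does run out, running the checker with the Boehm GC preloaded (for example via LD_PRELOAD, assuming a libgc build that replaces malloc) would give real collection without changing the code.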

At some point the question is whether all of this is worthwhile at all. One could just use C++ and std::shared_ptr and be happy. But then the additional complexity of the C++ runtime arises. Well, Lean appears to be written in C++, but I do not know how it manages memory.

At this point, it is questionable whether it is really worthwhile to write a typechecker using only low-level programming techniques rather than relying on well-tested abstractions. Besides the already mentioned small likelihood of a critical typechecker bug leading to a wrong proof of an algorithm, even in the presence of such a bug, it would only have to be fixed once; after 30 years without critical bugs, one could be pretty sure that everything works fine.

Of course, this is not the only reason for wanting an implementation of a proof checker in C. All of these systems appear to create their own programming language; I would rather have a library that I can use in the language of my choice. On the other hand, Coq has a way of calling external programs to provide proofs, which is in many cases equivalent to such a library.