Abenteuer aus tausend und einer Nacht.

In a former post (on my old blog), I already had the idea of using SIGSEGV-handlers to create a crude form of lazy evaluation. However, this was not very usable yet. I meanwhile engineered around a bit, and found out a lot of interesting stuff about memory management, and signal handling in general. Though I do not think that this is a good way of doing lazy evaluation anymore, it was worth learning this much. Still, with the proper kernel support, I think lazy evaluation could be implemented easier, it is just that I do not think that SIGSEGV-handlers are the appropriate way to do this. However, keep in mind that I am not an expert, I might be wrong with what I say, and I am open to constructive comments of any kind.

Also, this post is long and I give a lot of intermediate steps until the final end, since the outcome is not as interesting as the many facts one can learn from it, I think.

So, since in the beginning, I did some assembler-haxxory, I wondered how exactly this signal handling stuff worked. If I had to create a signal handling architecture, I would probably just save the context of the program into a special reserved memory region, and then call the signal handler, leaving everything else to this handler. That is due to the fact that I think it is the best for a kernel to keep quiet as long as possible.

Linux does something else however. It maps a memory region into the process, called the vsyscall-page. It used to be on the next-to-last block in the address space, but meanwhile it is, as far as I read, on a randomized position in memory. There, the current system time is saved, and some other functions which can be called instead of calling a syscall, so this is an optimization.

Now, when a signal handler is called, it pushes a return adress on the stack, such that when the handler returns, the process jumps into this vsyscall-page. That page then calls the sigreturn-syscall (see 5.6@here). Two interesting things related to that topic: Handling signals in assembly, and the list of Linux syscalls.

So far with the theory. Let us dig in a bit deeper. One problem with the approach I had in my former post on this topic was that the actual memory page had to be allocated after the segmentation fault. It was not possible to redirect the process to some other memory region. For this, two things are necessary.

Firstly, every segmentation fault comes from some memory accessing assembler instruction, which dereferences some register. When jumping back from the handler, this instruction is called again, and causes the SIGSEGV to be sent again. We have to set the register to a new value, pointing to the place where the lazily allocated object is saved.

The register state is saved in a ucontext-structure documented in setcontext(2). It contains a non-portable mcontext-structure, which is defined in sys/ucontext.h. It is not even portable across x86_32 and x86_64. It has an array gregs[], in which the register values are saved, and from which they are restored. If we know the dereferenced register, we can therefore set it to something else. With a bit assembler magic, we can therefore already get the following code:

#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#define __USE_GNU
#include <ucontext.h>

// set this to REG_ECX on x86_32
#define myREG REG_RCX

int k = 42;

void handle_segv(int segv, siginfo_t* siginfo, void* ucontext) {
  ucontext_t* uc = (ucontext_t*) ucontext;
  uc->uc_mcontext.gregs[myREG] = (greg_t) &k;
}

void cause_segv (void* ptr) {
  int d;
  // d =  *ptr;
  asm ("movl (%%ecx), %%edx" : "=d"(d) : "c"(ptr));
  printf("%d\n", d);
}

int main (void) {
  struct sigaction q;
  bzero(&q, sizeof(q));
  q.sa_sigaction = handle_segv;
  q.sa_flags = SA_SIGINFO;
  sigaction(11, &q, NULL);
  cause_segv(NULL);
}

The output of this program is "42". And even though the program segfaults, it does not occur in dmesg.

However, we do not want assembly code, if possible. That is, we do not want to assume, that it is the ECX-Register which was dereferenced, but we need to know which register was. Therefore, we use a library for x86 disassembly: libudis86. We just disassemble the code at the instruction pointer in our gregs[]-array, and see what it does. Here is the code we get:

#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#define __USE_GNU
#include <ucontext.h>
#include <udis86.h>

/* This code is only for x86_64 */

int k = 42;

void handle_segv(int segv, siginfo_t* siginfo, void* ucontext) {
  ucontext_t* uc = (ucontext_t*) ucontext;

  /* do disassembly at from IP */
  ud_t ud_obj;
  ud_init(&ud_obj);
  ud_set_mode(&ud_obj, 64);
  ud_set_syntax(&ud_obj, UD_SYN_ATT);
  ud_set_input_buffer(&ud_obj, (unsigned char*) uc->uc_mcontext.gregs[REG_RIP],
		      10);

  /* was disassembly successful? */
  if (!ud_disassemble(&ud_obj)) {
    printf("Disassembly fail!\n");
    exit(-1);
  }

  /* is disassembly a memory-operation? */
  struct ud_operand op0 = ud_obj.operand[0];
  struct ud_operand op1 = ud_obj.operand[1];
  struct ud_operand op;

  if (op0.type == UD_OP_MEM) {
    op = op0;
  } else if (op1.type == UD_OP_MEM) {
    op = op1;
  } else {
    printf("Instruction unknown\n");
    exit(-1);
  }

  /* find out the register - this part is clumsy as we have two sets
     of constants */

  int setreg;
  switch (op.base) {
  case UD_R_RAX:
  case UD_R_EAX:
    setreg = REG_RAX;
    break;
  case UD_R_RCX:
  case UD_R_ECX:
    setreg = REG_RCX;
    break;
  case UD_R_RDX:
  case UD_R_EDX:
    setreg = REG_RDX;
    break;
  case UD_R_RBX:
  case UD_R_EBX:
    setreg = REG_RBX;
    break;
  case UD_R_RSP:
  case UD_R_ESP:
    setreg = REG_RSP;
    break;
  case UD_R_RBP:
  case UD_R_EBP:
    setreg = REG_RBP;
    break;
  case UD_R_RSI:
  case UD_R_ESI:
    setreg = REG_RSI;
    break;
  case UD_R_RDI:
  case UD_R_EDI:
    setreg = REG_RDI;
    break;
  case UD_R_R8:
  case UD_R_R8D:
    setreg = REG_R8;
    break;
  case UD_R_R9:
  case UD_R_R9D:
    setreg = REG_R9;
    break;
  case UD_R_R10:
  case UD_R_R10D:
    setreg = REG_R10;
    break;
  case UD_R_R11:
  case UD_R_R11D:
    setreg = REG_R11;
    break;
  case UD_R_R12:
  case UD_R_R12D:
    setreg = REG_R12;
    break;
  case UD_R_R13:
  case UD_R_R13D:
    setreg = REG_R13;
    break;
  case UD_R_R14:
  case UD_R_R14D:
    setreg = REG_R14;
    break;
  case UD_R_R15:
  case UD_R_R15D:
    setreg = REG_R15;
    break;
  default:
    printf("Register not supported!\n");
    exit(-1);
  }

  /* set the register - as before */
  uc->uc_mcontext.gregs[setreg] = (greg_t) &k;
}

void cause_segv (int* ptr) {
  int d;
  d =  *ptr;
  printf("%d\n", d);
}

int main (void) {
  struct sigaction q;
  bzero(&q, sizeof(q));
  q.sa_sigaction = handle_segv;
  q.sa_flags = SA_SIGINFO;
  sigaction(11, &q, NULL);
  cause_segv(NULL);
}

With this, we can lazily evaluate one object.

Secondly, we need every memory access to be dereferenced twice, so we need a local pool of addresses - so we can do it for more than one object.

We will use Matthias Benkard's implementation of Patricia Trees to achieve a dynamic array from protected pointers to actual functions. We also need a slice allocator. Actually, creating that code was rather sophisticated, and I do not want to post anything as "finished" yet, if you are interested in this most general code, look at my git repository for this project. It defines a function for creating list comprehensions from a function from integers to integers, which may depend on the actual list.

However, that code is not threadsafe. Making it more threadsafe is one of my aims with this project, if I keep working at it.

Having this work so far, of course I do not think that this is a good solution. Segmentation faults are faults, by design, and they should be considered faults. That does not mean that they should not be handled properly (which most software does not do), but they should not be produced on purpose. Rather, the kernel should add another state of memory pages, say "lazy pages", which may be allocated, but on which also a signal is sent, or some other sort of interrupt.

Some other sort of interrupt would be better, since signal handling involves two context switches. In fact, it would be sufficient to just push the state of the thread on the stack, and jump to a handling function, without having to call sigreturn. Handling such an interrupt does not involve anything on the background of the process, so the state can be restored by the handling function itself, at least mostly. Maybe a flag, indicating whether some sort of sigreturn is to be called, could be helpful, for example, if the kernel had to extend the stack or something similar.