
On Unix-like systems, the default way of executing another process seems to be fork(2) followed by execve(2), with dup2(2) in between if you want to redirect input and output. As far as I have read, fork(2) creates a virtual copy of the process image; that is, it maps the pages of the original process copy-on-write.
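As a reference point for what follows, here is a minimal sketch of that classic pattern. The helper name run_redirected is made up for this illustration, and it uses execvp(3), a libc wrapper around execve(2) that searches PATH, to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected to out_fd.
 * Returns the child's exit status, or -1 on failure. */
int run_redirected(char *const argv[], int out_fd) {
  pid_t pid = fork();            /* the whole process image is copied (copy-on-write) */
  if (pid == -1) {
    perror("fork");
    return -1;
  }
  if (pid == 0) {                /* child: rewire stdout, then replace the image */
    if (dup2(out_fd, STDOUT_FILENO) == -1) _exit(127);
    execvp(argv[0], argv);       /* only returns if the exec failed */
    _exit(127);
  }
  int status;                    /* parent: wait for the child to finish */
  if (waitpid(pid, &status, 0) == -1) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argument vector such as { "echo", "hello", NULL } and the write end of a pipe as out_fd, the parent can afterwards read the child's output from the read end of that pipe.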

Now, the man page of malloc(3) says

By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. This is a really bad bug. In case it turns out that the system is out of memory, one or more processes will be killed by the infamous OOM killer. In case Linux is employed under circumstances where it would be less desirable to suddenly lose some randomly picked processes, and moreover the kernel version is sufficiently recent, one can switch off this overcommitting behavior using a command like:

# echo 2 > /proc/sys/vm/overcommit_memory


This behavior also seems to affect fork(2): when overcommitting is switched off, forking a huge process is refused even though no pages actually need to be duplicated, because the kernel must be able to back every copy-on-write page in case it is written to. However, I do not really like the idea of an OOM killer that randomly kills processes when there happens to be less memory than necessary. So when writing software, one should normally have only a small process that forks. On the other hand, sometimes one just wants to run a little external executable and does not want a second helper process by default; in that case fork(2), execve(2) and dup2(2) are a bad choice, since they require about twice as much memory as the forking process to be available.
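Whether this strict accounting is active can be checked at run time by reading the same /proc file that the command above writes to. The following is only a small sketch; the function name read_overcommit_mode is made up for this illustration. The file contains 0 (heuristic overcommit, the default), 1 (always overcommit) or 2 (never overcommit).

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of vm.overcommit_memory
 * (0, 1 or 2), or -1 if it cannot be read, for example on a system
 * without this /proc entry. */
int read_overcommit_mode(void) {
  FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "r");
  if (fp == NULL) return -1;
  int mode;
  if (fscanf(fp, "%d", &mode) != 1) mode = -1;  /* unexpected content */
  fclose(fp);
  return mode;
}
```

A program that has to start helpers from a large process could use such a check to decide early whether a plain fork(2) is likely to be refused.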

I first heard of this problem when somebody tried to use Runtime.exec in Java and it failed because he had turned off overcommitting, and the JVM apparently used fork(2) internally; hence, the created process is as large as the whole JVM. Googling around a bit, I found that this problem is known and will probably be fixed in future JVM versions.

I wondered about the alternatives, and there seems to be one, called posix_spawn(p), which executes an external command without an explicit fork.
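For orientation, a call could look roughly like the sketch below. The helper name spawn_and_wait is made up for this illustration. One detail that is easy to miss: posix_spawnp reports failure by returning an error number, not by returning -1 and setting errno.

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  /* the caller's environment, passed on to the child */

/* Hypothetical helper: spawn argv[0] (looked up in PATH) and wait for it.
 * Returns the child's exit status, or -1 if spawning or waiting failed. */
int spawn_and_wait(char *const argv[]) {
  pid_t pid;
  /* No file actions and no spawn attributes: both arguments are NULL. */
  int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
  if (err != 0) {
    fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
    return -1;
  }
  int status;
  if (waitpid(pid, &status, 0) == -1) {
    perror("waitpid");
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argument vector such as { "true", NULL } this should return 0, and with { "false", NULL } it should return 1.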

The problem is that posix_spawn(p) is not a Linux syscall, and in theory it could be implemented via fork(2), since POSIX specifies interfaces rather than implementations. So I tried a little test. With

# echo 0 > /proc/sys/vm/overcommit_memory

I can ensure that I have the default behaviour of Linux. The following code works:

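#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main (void) {
  /* Reserve 1 GiB so that the process image is large. */
  volatile void *bla = malloc (1024*1024*1024);
  pid_t f = fork ();
  printf("Pid: %d\nFork: %d\n---\n", (int) getpid(), (int) f);
  getchar();  /* keep the process alive until there is input */
  (void) bla;
  return 0;
}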


and has an output like:

Pid: 28875
Fork: 28876
---
Pid: 28876
Fork: 0
---


Now when I turn off overcommitting with

# echo 2 > /proc/sys/vm/overcommit_memory

and run the same again, it fails:

Pid: 29014
Fork: -1
---

fork(2) returns -1, which means that it failed, as expected. Consequently, I cannot execute an external process this way, because one of the two processes would have to call execve(2). Now, I have written code that creates a new process using posix_spawn(p). It runs sleep(1) and waits for it. If I tried to do this using fork(2), execve(2) and dup2(2), I would run into exactly that problem. However, the following code works without overcommitting:

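#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void) {
  /* Reserve 1 GiB so that the process image is large. */
  volatile void *bla = malloc (1024*1024*1024);
  int status;
  pid_t pid = -1;

  /* GNU sleep accepts a unit suffix; "2d" means two days. */
  char* spawnedArgs[] = { "/bin/sleep", "2d", NULL };

  /* posix_spawnp returns 0 on success and an error number otherwise. */
  int f = posix_spawnp(&pid, spawnedArgs[0], NULL, NULL, spawnedArgs, NULL);

  printf("Pid: %d\nposix_spawn: %d\npid: %d\n---\n", (int) getpid(), f, (int) pid);
  getchar();  /* pause so that the processes can be inspected */

  if (f == 0)
    waitpid(pid, &status, 0);
  (void) bla;
  return 0;
}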

and produces an output like

Pid: 29075
posix_spawn: 0
pid: 29076
---


Nice. However, when it comes to controlling the input and output of the created process, there is no point between fork(2) and execve(2) at which one could call dup2(2) oneself. One must use posix_spawn_file_actions_t for that. The following code was written by Matthias Benkard and me, and does exactly this.

UPDATE: Fixed a bug found by the commenter "dothebart", thank you!

The data structure records actions to be performed on the file descriptors of the new process. I don't want to give a deeper introduction to the functions used, since they have good man pages, but I think a working example is a good starting point for understanding them anyway.

#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
  int out[2];  /* parent -> child (the child's stdin)  */
  int in[2];   /* child -> parent (the child's stdout) */
  pid_t pid;
  posix_spawn_file_actions_t action;
  char* spawnedArgs[] = { "/bin/cat", NULL };
  int status;
  char r[1024];
  memset(r, 0, sizeof r);

  pipe(out);
  pipe(in);

  /* In the child, stdin becomes the read end of "out" ... */
  posix_spawn_file_actions_init(&action);
  posix_spawn_file_actions_adddup2(&action, out[0], 0);
  posix_spawn_file_actions_addclose(&action, out[1]);

  /* ... and stdout becomes the write end of "in". */
  posix_spawn_file_actions_adddup2(&action, in[1], 1);
  posix_spawn_file_actions_addclose(&action, in[0]);

  posix_spawnp(&pid, spawnedArgs[0], &action, NULL, spawnedArgs, NULL);

  /* The parent does not need the child's ends of the pipes. */
  close(out[0]);
  close(in[1]);

  write(out[1], "Hallo!", strlen("Hallo!"));
  close(out[1]);  /* signals end of input to cat */

  /* Leave room for the terminating NUL byte. */
  read(in[0], r, sizeof r - 1);
  printf("Read data: \"%s\"\n", r);

  wait(&status);
  posix_spawn_file_actions_destroy(&action);

  return EXIT_SUCCESS;
}


The program runs "/bin/cat", which reads from its stdin and writes to its stdout. We take these streams, send "Hallo!" to the input, and read the output. And indeed, the program prints

Read data: "Hallo!"

Of course, if I had more data to send, this program could run into a deadlock, since the write call would block once the pipe buffers are full. This can be solved in the usual ways, by asynchronous I/O or by multithreading. But for this example, the simpler method is the better choice.
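For completeness, here is a rough single-threaded sketch of one such way, using poll(2) and a non-blocking write end. The helper name pump and its parameters are made up for this illustration; it is not part of the program above. It does not handle SIGPIPE, which is raised if the child exits before all input has been written.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: feed len bytes from data into wfd while draining rfd
 * into out (at most cap bytes). wfd is closed before returning. Returns the
 * number of bytes read, or -1 on error. */
ssize_t pump(int wfd, const char *data, size_t len,
             int rfd, char *out, size_t cap) {
  size_t sent = 0, got = 0;
  int writing = 1, reading = 1, failed = 0;

  /* Non-blocking writes: a full pipe yields EAGAIN instead of stalling us. */
  int flags = fcntl(wfd, F_GETFL);
  if (flags == -1 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) == -1) failed = 1;
  if (!failed && len == 0) { close(wfd); writing = 0; }

  while (!failed && reading && got < cap) {
    struct pollfd fds[2] = {
      { .fd = writing ? wfd : -1, .events = POLLOUT },  /* -1 is ignored by poll */
      { .fd = rfd, .events = POLLIN },
    };
    if (poll(fds, 2, -1) == -1) {
      if (errno == EINTR) continue;
      failed = 1;
      break;
    }
    if (writing && (fds[0].revents & (POLLOUT | POLLERR | POLLHUP))) {
      ssize_t n = write(wfd, data + sent, len - sent);
      if (n > 0) sent += (size_t) n;
      else if (n == -1 && errno != EAGAIN && errno != EINTR) { failed = 1; break; }
      if (sent == len) { close(wfd); writing = 0; }  /* end of input for the child */
    }
    if (fds[1].revents & (POLLIN | POLLERR | POLLHUP)) {
      ssize_t n = read(rfd, out + got, cap - got);
      if (n > 0) got += (size_t) n;
      else if (n == 0) reading = 0;                   /* end of file */
      else if (errno != EINTR) { failed = 1; break; }
    }
  }
  if (writing) close(wfd);
  return failed ? -1 : (ssize_t) got;
}
```

In the program above, the write, close(out[1]) and read calls could then be replaced by a single call like pump(out[1], "Hallo!", strlen("Hallo!"), in[0], r, sizeof r - 1), and the same call would keep working for inputs larger than the pipe buffer.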