
I'm trying to exec() a program from my server, and attach my socket's IO to it, but I'm not getting all the data across. Why?

Unix Socket FAQ for Network programming


If the program you are running uses printf() and friends (the streams from stdio.h), you have two buffers to deal with. The kernel buffers all socket I/O, as explained in section 2.11. The second buffer, the stdio buffer, is the one causing you grief, and the problem was well explained by Andrew:

(The short answer to this question is that you want to use a pty rather than a socket; the remainder of this article is an attempt to explain why.)

Firstly, the socket buffer controlled by setsockopt() has absolutely nothing to do with stdio buffering. Setting it to 1 is guaranteed to be the Wrong Thing(tm). Perhaps the following diagram might make things a little clearer:

               Process A                   Process B
           +---------------------+     +---------------------+
           |                     |     |                     |
           |    mainline code    |     |    mainline code    |
           |         |           |     |         ^           |
           |         v           |     |         |           |
           |      fputc()        |     |      fgetc()        |
           |         |           |     |         ^           |
           |         v           |     |         |           |
           |    +-----------+    |     |    +-----------+    |
           |    | stdio     |    |     |    | stdio     |    |
           |    | buffer    |    |     |    | buffer    |    |
           |    +-----------+    |     |    +-----------+    |
           |         |           |     |         ^           |
           |         |           |     |         |           |
           |      write()        |     |       read()        |
           |         |           |     |         |           |
           +-------- | ----------+     +-------- | ----------+
                     |                           |        User space
 ------------------- | ------------------------- | -------------------
                     |                           |       Kernel space
                     v                           |
                +-----------+               +-----------+
                | socket    |               | socket    |
                | buffer    |               | buffer    |
                +-----------+               +-----------+
                     |                           ^
                     v                           |
             (AF- and protocol-          (AF- and protocol-
              dependent code)             dependent code)

Assuming these two processes are communicating with each other (I've deliberately omitted the actual comms mechanisms, which aren't really relevant), you can see that data written by process A to its stdio buffer is completely inaccessible to process B. Only once the decision is made to flush that buffer to the kernel (via write()) can the data actually be delivered to the other process.

The only guaranteed way to affect the buffering within process A is to change the code. However, the default buffering for stdout depends on whether the underlying FD refers to a terminal: generally, output to terminals is line-buffered, and output to non-terminals (including but not limited to files, pipes, sockets, and non-tty devices) is fully buffered. So the desired effect can usually be achieved by using a pty device; this, for example, is what the 'expect' program does.

Since the stdio buffer (and the FILE structure, and everything else related to stdio) is user-level data, it is not preserved across an exec() call, hence trying to use setvbuf() before the exec is ineffective.

If it's an option, you can use some standalone program that will just run something inside a pty and buffer its input/output. I've seen a package by the name pty.tar.gz that did that; you could search around for it with archie or AltaVista.

Another option (**warning, evil hack**), if you're on a system that supports it (SunOS, Solaris, and Linux ELF do; I don't know about others), is to have your main program putenv() the name of a shared object (*.so) into LD_PRELOAD, and in that .so redefine some commonly used libc function that the program you're exec'ing is known to call early. That lets you 'get control' inside the running program: the first time your function is called, do a setbuf(stdout, NULL) on the program's behalf, and then call the original libc function, which you look up with dlopen() + dlsym(). Keep the dlsym() value in a static variable, so that later calls can go straight to the original.
