Threads, forks and file descriptors

To: scsh@zurich.csail.mit.edu
Subject: Threads, forks and file descriptors
From: stktrc <stktrc@yahoo.com>
Date: 02 Nov 2003 17:19:41 +0100
Sender: news <news@sea.gmane.org>

In a recent message I suggested a work-around to a problem presented
by ZHAO Wei, which later turned out not to be a problem, but just a
lack of understanding of the system.  The reason I suggested the
work-around was that I thought ZHAO Wei was experiencing a problem
similar to one that I had earlier, for which I had discovered the
suggested work-around.

After realizing ZHAO Wei's problem really wasn't a problem, but a lack
of understanding of the system, I decided to go back and track down my
old problem to see if the case was the same there.  I found out it
likely isn't, so I'm presenting it here now.

The program in question was required to read commands on standard
input and at the same time both read from and write to a subprocess
(with reads/writes happening without synchronization).  This was done
by spawning a thread responsible for reading from the subprocess; that
thread forked the subprocess with pipes set up appropriately on the
stdio file descriptors.

The problem was that the forked SCSH subprocess seemed to block, and
never got as far as exec'ing the real program that was supposed to run
in the subprocess.

I have now managed to narrow it down to move->fdes blocking, as
demonstrated by the SCSH program below.  The program first spawns a
thread, after which it tries to read from standard input.  The spawned
thread forks the process and the child calls move->fdes to set up
stdin, but here it blocks, and never prints "not reached" as
intended.

A work-around to make the program work as intended is to add a
(sleep 1) immediately after the spawn.

#!/bin/sh
exec scsh -o threads -e main -s "$0" "$@"
!#

(define (main args)
  (spawn
   (lambda ()
     (receive
      (rport wport) (pipe)
      (fork
       (lambda ()
         (display "trying move->fdes") (newline)
         (move->fdes rport 0)
         (display "not reached") (newline))))))
  ;; uncomment next line to make it work
  ;(sleep 1)
  (read-line))

So, why doesn't the program work as intended?  I can only speculate as
I don't understand enough of what is going on.  I'm looking for an
explanation.

I have a wild guess about what is happening though:

The main thread spawns a new thread (which isn't scheduled for
execution yet though) and then blocks on the read on standard input.
Now the spawned thread is scheduled for execution and goes on and
forks the SCSH process.  The child SCSH process starts executing and
calls move->fdes, where it blocks indefinitely.

Why does it block there?  I speculate that when the SCSH process
forked, there was some kind of lock active on file descriptor 0 due to
the other thread attempting to read stdin, which was carried over to
the child process.  But if this is so, the lock isn't relevant in the
child, because there is no other thread in the child which is trying
to read stdin.  In that case, maybe this is a question of deadlock due
to killing threads (during the fork) without giving them a chance to
return allocated resources.

What supports this theory is that adding the (sleep 1) makes the
program work as intended, presumably because the thread doing the fork
then has a chance to fork before there is any lock on stdin.  Also,
allowing threads to continue in the child, by adding a #t argument to
the fork call as shown below, makes the move->fdes call unblock after
something has been written to stdin.

;         (display "not reached") (newline)) #t))))
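
For reference, here is a sketch of the complete test program with that
change applied.  It is the program above with only the #t argument
added to the fork call; the comments are explanatory additions, and the
behaviour noted in them is only what was observed in this one case, not
something SCSH is known to guarantee.

```scheme
#!/bin/sh
exec scsh -o threads -e main -s "$0" "$@"
!#

;; Same test program as above; the only change is the extra #t argument
;; to fork, which allows threads to continue running in the child.
(define (main args)
  (spawn
   (lambda ()
     (receive
      (rport wport) (pipe)
      (fork
       (lambda ()
         (display "trying move->fdes") (newline)
         (move->fdes rport 0)
         ;; Observed to be reached only after something arrives on stdin.
         (display "not reached") (newline))
       #t))))   ; extra argument: let threads continue in the child
  (read-line))
```

With this variant the child still stalls in move->fdes at first, and
only gets past it once a line has been typed on standard input.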

What I'd like to know is whether preventing this alleged deadlock is
the responsibility of the user, or whether SCSH could help here.

