scsh-users

Zombie processes

To: mb12@coconet.de
Subject: Zombie processes
From: Olin Shivers <shivers@clark.lcs.mit.edu>
Date: Sun, 2 Apr 95 22:54:07 -0400
Cc: scsh-bugs@martigny.ai.mit.edu
Reply-to: shivers@mintaka.lcs.mit.edu
Right. That is because run/string doesn't bother to wait for the subprocess
to terminate; it returns as soon as it reads EOF from the subprocess stdout.
I think your problem is a good argument for changing the definition of
run/string to wait for process death.

This is a tricky change, however. What I need to do is redefine the port
returned by run/port so that when it encounters EOF or is closed, the
port read or close operation does a wait(2) on the subprocess to clean the
zombie out of the process table. This requires getting deep into the port
system, which is a pain.

This is a fundamental flaw with Unix that you are tripping over, by the way --
the fact that you must/can-only wait on a subprocess exactly once.  If PID's
were GC'd data structures instead of integers, then the GC could mediate the
lifetime of a zombie process's process table entry. But that is not the Unix
way. I should change scsh to do this, now that I think about it...
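
To make the wait-exactly-once rule concrete outside scsh, here is a minimal
sketch in Python (not scsh) using only the standard os module. The function
name reap_once_then_again is made up for this illustration. It forks a child
that exits at once, reaps it, and then tries to wait on the same PID again;
on a POSIX system the second attempt should fail with ECHILD.

```python
import errno
import os

def reap_once_then_again():
    """Fork a child that exits immediately, then wait on its PID twice."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once; it stays a zombie until reaped

    # First wait: collects the exit status and frees the process table entry.
    reaped_pid, status = os.waitpid(pid, 0)

    # Second wait on the same PID: nothing is left to reap.
    try:
        os.waitpid(pid, 0)
        second_errno = None
    except ChildProcessError as exc:
        second_errno = exc.errno

    return reaped_pid == pid, os.WEXITSTATUS(status), second_errno
```

Calling reap_once_then_again() and comparing the third element of the result
with errno.ECHILD shows that a PID is only good for one wait.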

As an interim fix, you can do a (wait) call to reap the subprocess from the
kernel's process table after the run/string. Even if it reaps some other
zombie from the process table, a
    (let ((s (run/string (date))))
      (wait)
      s)
will keep the number of zombies from increasing.
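
The same bookkeeping can be sketched in Python (not scsh) with the standard
os module, assuming a POSIX system. The function name
spawn_with_interim_wait is hypothetical. Each fork is followed by one
untargeted wait, which reaps whichever child has exited, so no zombies
accumulate however many children are started.

```python
import os

def spawn_with_interim_wait(n):
    """Fork n short-lived children, doing one untargeted wait after each."""
    for _ in range(n):
        if os.fork() == 0:
            os._exit(0)  # child: exit immediately
        os.wait()        # parent: reap some exited child, PID not specified

    # Probe without blocking: ChildProcessError (ECHILD) means this process
    # has no children left, zombie or otherwise.
    try:
        os.waitpid(-1, os.WNOHANG)
        return False
    except ChildProcessError:
        return True
```

With one wait per fork, the probe at the end finds nothing left to reap.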

More precisely, you could use
    (receive (port pid) (run/port+pid (date))
      (let ((s (port->string port)))
        (close port)
        (wait pid)
        s))
instead of (run/string (date)). It is much clumsier, but does precisely
what you want.
    -Olin
-------------------------------------------------------------------------------
    I am using scsh 0.3 on 'HP-UX hp1 A.09.05 A 9000/720',
    and when I use 'run/strings', I get zombie processes:


   >  (define get-date 
         (lambda () (car (run/strings  (date)))))
   > (get-date)
   "Mon Mar 13 21:48:37 MEZ 1995"
   > (let loop ((i 0)) (if (< i 10) (begin (format #t "date: ~a~%" (get-date)) (loop (+ i 1)))))
   date: Mon Mar 13 21:49:57 MEZ 1995
   date: Mon Mar 13 21:49:57 MEZ 1995
   date: Mon Mar 13 21:49:57 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   date: Mon Mar 13 21:49:58 MEZ 1995
   > (let loop ((i 0)) (if (< i 40) (begin (format #t "date: ~a~%" (get-date)) (loop (+ i 1)))))
   date: Mon Mar 13 21:50:19 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:20 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995
   date: Mon Mar 13 21:50:21 MEZ 1995

   Error: 11
          "No more processes"
          #{Procedure 8148}
   1> 
   > 
   Exit Scheme 48 (y/n)? y
   mb12: (hp1 ~ 532) :

   mb12: (hp1 ~ 513) :scsh
   Scsh 0.3
   > (define get-date 
         (lambda () (car (run/strings  (date)))))
   > (get-date)
   "Mon Mar 13 21:57:44 MEZ 1995"

   ;;
   ;; output of ps:
   ;;
   ;; mb12  1485   469  0 21:57:36 ttyp5    0:00 scsh -o /usr/local/lib/scsh/scshv
   ;; mb12  1486  1485  7 21:57:44 ttyp5    0:00 <defunct>   (zombie)  
   ;;
   ;;  



      michael

