[Scsh-hackers] design review

To: scsh-hackers@lists.sourceforge.net
Subject: [Scsh-hackers] design review
From: shivers@cc.gatech.edu
Date: Wed, 7 Feb 2001 17:52:11 -0500
List-id: Discussion among the implementors <scsh-hackers.lists.sourceforge.net>
Reply-to: shivers@cc.gatech.edu
Sender: scsh-hackers-admin@lists.sourceforge.net
>>> Martin, I have gone over your proposal and added comments and proposals
>>> marked by ">>> " prefixes.
>>>     -Olin

FROM: Martin Gasbichler
DATE: 01/22/2001 06:20:32
SUBJECT:  [Scsh-hackers] 0.6 API

Here comes my proposal for the new stuff in the 0.6 API. Note that it
sometimes doesn't match the CVS tree, in which case the proposal reflects
my latest thinking. Packages marked with (o) are opened by default.

scsh-events:
-----------

A few weeks ago, I replaced David Fisher's implementation of events;
his code was based on placeholders. The problem with his implementation
was that the RTS deadlocked if all threads were waiting for an
interrupt, since the RTS only saw the blocked placeholders. Now the
base event system is part of the RTS itself, and the scheduler checks
whether any threads are waiting for an interrupt, just as it does
for I/O.

>>> I don't understand the problem. Can you explain it again?

; obvious:
(most-recent-event)

; block, until interrupt occurs:
(wait-interrupt int last-event)

; block until one of the interrupts in set occurs:
; interrupt sets are constructed as in 0.5.2
(wait-interrupt-set int-set last-event)

; Same as above, but return immediately if no pending interrupt exists
maybe-wait-interrupt
maybe-wait-interrupt-set

>>> I am not so fond of the "maybe-" prefix to mean non-blocking.
>>> Also, the WAIT-INTERRUPT procedure doesn't always wait. You can
>>> use it to scan and re-scan events that are in the past, if you've
>>> retained a pointer to an old event. 
>>> - Why not merge WAIT-INTERRUPT's functionality into NEXT-EVENT:
>>>   NEXT-EVENT event [filter] -> event
>>> - Then make a NEXT-EVENT/NO-WAIT procedure that returns false
>>>   if you scan off the end of the event chain.
>>> The FILTER parameter is a set of "event classes" (e.g., interrupt codes)
>>> or a general Scheme predicate; I could go either way on that one. A
>>> predicate is Schemeish & general, but the generality prevents you from
>>> putting up threads on a fixed set of event-class queues -- you simply have
>>> to re-execute all predicates of all blocked threads whenever a new event
>>> occurs. That doesn't scale well for lots of threads.
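
For illustration, here is a minimal C sketch of the non-blocking scan
suggested above, assuming the FILTER is a set of event classes kept as a
bit mask. The struct layout and the name next_event_no_wait are
hypothetical; they are not taken from the scsh sources.

```c
#include <stddef.h>

/* Hypothetical event record: a singly linked chain, oldest to newest. */
struct event {
    int type;              /* event class, assumed to be in 0..31 */
    struct event *next;    /* NULL at the end of the chain */
};

/* Non-blocking scan: return the first event after `last` whose class is
   in `filter` (a bit set of classes), or NULL if the scan runs off the
   end of the chain. */
static struct event *next_event_no_wait(struct event *last, unsigned filter)
{
    for (struct event *e = last->next; e != NULL; e = e->next)
        if (filter & (1u << e->type))
            return e;
    return NULL;
}
```

With a fixed set of classes like this, a blocking variant could park each
thread on per-class queues instead of re-running a predicate for every
blocked thread.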

; record, returned by wait-interrupt-X    
event?
next-event
event-type

>>> - What is the range of the EVENT-TYPE function? Is it just Unix async
>>>   interrupts?
>>> - If the only events are Unix async interrupts, then "event" is perhaps
>>>   overly general. Are there other sorts of events? If not, possibly
>>>   change the name to "sigevent"? Do we anticipate ever extending the
>>>   set?

scsh-interrupts: (o)  
---------------

number-of-interrupts

; from 0.5.2
interrupt-set

; extensions to get a useful ADT
(interrupt-in-set? int set)
(insert-interrupt int set)
(remove-interrupt int set)

>>> Urp. Are interrupt sets pure or side-effectable? Pure or "linear update"
>>> would work, I think. In which case, we should use set lexemes from SRFI-1,
>>> such as "adjoin", and "-contains-" instead of "-in-"

; includes all interrupts
full-interrupt-set

; all interrupts
interrupt/...

signal-handler: (o)
--------------

There is a fundamental problem with the interaction of signal handlers
and the event system: while it is possible to have both of them
(actually, sighandlers is built on top of the event system now), the
default actions for signal handlers will normally just kill the
process. For compatibility with old code, the signal handlers should be
turned on by default. Maybe something like

(disable-all-signal-handlers!)

would come in handy. On the other hand, the signal handler for SIGINT
is very useful as it allows you to stop all threads.

interface as in 0.5.2, but without interrupt/


------------------------------------------------------------------------

I'd like to declare select and select! "deprecated", as they don't work
well with the thread system. There should be only one select in the
whole system.

>>> Why? A single thread may still wish to attend to multiple i/o
>>> sources/sinks. We just have to provide a "fake" select, just as we provide
>>> "fake" blocking I/O implemented in terms of non-blocking I/O and SIGIO
>>> or scheduler-loop polling.

------------------------------------------------------------------------
network: (o)
-------

The code in 0.6 is built on top of channels. This was necessary to let the
scheduler run other threads if something blocks.

Internet host addresses are now represented as byte vectors. There
are a few conversion functions:

(number->internet-host-address address32) ->bv
(internet-host-address->number bv) -> old representation

(bytes->internet-host-address b4 b3 b2 b1) ->bv
(internet-host-address->bytes bv) -> (b4 .. b1)

(internet-host-address->dotted-string bv) -> "123.123.123.123"
(dotted-string->internet-host-address string) ->bv

>>> Brian should comment on this. Why do we need to represent IP addresses
>>> with byte vectors? Let us assume we have reasonable bit-ops on ints;
>>> then we can always extract the octets as needed. And note that IP
>>> addresses don't really come in octets; that's just an external
>>> written form. The partitioning into net & subnet & netmask varies at
>>> bit granularity. The dotted-string parser/unparser routines, however,
>>> seem like a nice convenience.
>>>
>>> It would be much shorter and just as clear to replace 
>>> "internet-host-address" with "ip-address", which is also a precise 
>>> technical name for the thing.
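
To make the integer-representation argument concrete, here is a small C
sketch, assuming an IPv4 address held as a host-order 32-bit integer. The
helper names are hypothetical. It shows octet extraction with shifts and
masks, the dotted-string form, and a network mask taken at bit
granularity.

```c
#include <stdint.h>
#include <stdio.h>

/* Extract octet i (0 = least significant) from a host-order address. */
static unsigned ip_octet(uint32_t addr, int i)
{
    return (addr >> (8 * i)) & 0xFFu;
}

/* Write the dotted-string form into buf, which must hold 16 bytes. */
static void ip_to_dotted(uint32_t addr, char buf[16])
{
    snprintf(buf, 16, "%u.%u.%u.%u",
             ip_octet(addr, 3), ip_octet(addr, 2),
             ip_octet(addr, 1), ip_octet(addr, 0));
}

/* Network part for a prefix length of 0..32 bits; the boundary need not
   fall on an octet. */
static uint32_t ip_network(uint32_t addr, int prefix_len)
{
    uint32_t mask = prefix_len == 0
        ? 0
        : (uint32_t)(0xFFFFFFFFu << (32 - prefix_len));
    return addr & mask;
}
```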

crypt:
-----

; Simply calls the C library function and returns its return value.
(crypt key salt)

>>> This was not Posix when I did the very first implementation of scsh. Of
>>> course, neither were symlinks, and I put those in. How portable is crypt?
>>> 
>>> Doesn't FreeBSD complicate matters with the crypt-classic/crypt-MD5
>>> split?

syslog:
------

The Scsh system assigns a syslog-id to every call of openlog. The
syslog-id of the last openlog is recorded. If syslog-w/id is called
later and the syslog-id of the last openlog is not the same as the
argument of syslog-w/id, openlog is called with the values of that
syslog-id prior to the actual syslog call.

>>> Excellent idea -- another global resource eliminated.

; do openlog, return a syslog-id
(openlog ident [option [facility]]) -> syslog-id

; version without syslog-id for the brave. 
(syslog message [level [facility]])

; call openlog, if current syslog-id is not the given one
(syslog-w/id syslog-id message-id [level [facility]])

(closelog)

>>> - Not very Schemeish names. I propose OPEN-SYSLOG, CLOSE-SYSLOG and SYSLOG.
>>>   CLOSE-SYSLOG returns true if the syslog was previously open; false if
>>>   it was already closed. Syslogs are also closed by GC.
>>> - Let's not call these things "syslog-ids." Let's call them "syslog
>>>   channels," since each one is a connection to the syslog system.
>>> 
>>> Now we should play the standard game we play with global resources:
>>> turn them into explicit resources, with facilities to allow us to
>>> control the default with dynamic scope. There's a standard set of
>>> facilities and naming conventions one does for these things, common
>>> in architecture across current i/o ports, cwd, umask, and so forth.

>>> That would give us a core facility of the following
>>> (open-syslog ident [option [facility]]) -> syslog-channel
>>> (close-syslog syslog-channel) -> boolean
>>> (syslog-channel? x) -> boolean
>>> (syslog-write string [level [facility [syslog-channel]]]) -> unspecified
>>>     Passing SYSLOG-FACILITY/DEFAULT as the facility for SYSLOG-WRITE
>>>     gets you the facility you specified when you opened the channel.
>>>     Similarly for SYSLOG-LEVEL/DEFAULT. Or maybe allow #f for this case?
>>>
>>> Extension:
>>> (syslog-format syslog-channel level facility fmt-string . params) -> unspecified
>>>     Acts like FORMAT.
>>> 
>>> (call/syslog-channel ident option facility proc) -> value(s) of proc.
>>>     Applies proc to the channel, and guarantees to close the channel
>>>     even if you throw out.
>>> 
>>> Dynamic scoping of syslog channels:
>>> 
>>> (with-current-syslog-channel* slchan thunk) -> value(s) of thunk
>>> (with-current-syslog-channel slchan body ...) -> value(s) of body
>>>     Introduces new dynamic scope.
>>> 
>>> (current-syslog-channel) -> syslog-channel
>>> (set-current-syslog-channel! slchan) -> unspecified
>>>     Side effect is visible to all who share this dynamic scope.
>>> 
>>> (with-current-syslog-channel* ident option facility thunk) -> value(s) of thunk
>>> (with-current-syslog-channel ident option facility body ...) -> value(s) of body
>>>     These two close the channel for you if you throw out.
>>>     Err... I don't have good names for these two to distinguish them
>>>     from the simple current-syslog-channel binders. Don't we have
>>>     an analogous case in 0.6 with cwd's, where we have both "cursors" and
>>>     strings that name directories?

; As syslog is not part of any standard, this is an intersection of
; Linux, FreeBSD, AIX, IRIX, HP-UX and Solaris. 

>>> Too bad!
>>> Below I list some alternate names for options. I prefer longer,
>>> lexemes-separated-with-hyphens Scheme names, which are clearer.
>>> This has been a consistent tradition in scsh naming (e.g., see the tty
>>> driver options).

syslog-option/default
syslog-option/cons              >>> syslog-option/console-on-error
syslog-option/ndelay            >>> syslog-option/open-now
syslog-option/pid               >>> syslog-option/include-pid ??? I dunno...

syslog-facility/default
syslog-facility/auth            >>> /authorisation
syslog-facility/daemon
syslog-facility/kern            >>> /kernel
syslog-facility/local0
syslog-facility/local1
syslog-facility/local2
syslog-facility/local3
syslog-facility/local4
syslog-facility/local5
syslog-facility/local6
syslog-facility/local7
syslog-facility/lpr
syslog-facility/mail
syslog-facility/user

syslog-level/default
syslog-level/emerg              >>> /emergency
syslog-level/alert
syslog-level/crit               >>> /critical
syslog-level/err                >>> /error
syslog-level/warning
syslog-level/notice
syslog-level/info
syslog-level/debug

>>> Can y'all explain something to me? Here are (2 of 3 of) the syslog calls
>>> on my Linux man page:
>>> 
>>>        void openlog( char *ident, int option, int  facility)
>>>        void syslog( int priority, char *format, ...)
>>> 
>>> It says that PRIORITY is a "combination" of facility & level. What does
>>> this mean? You OR them or add them together to create a priority value? And
>>> a 0 facility (i.e., just a level value) means "use the facility passed to
>>> openlog"?
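
On the priority question: POSIX describes the priority argument as a
severity level ORed with an optional facility, and says the default
facility (the one set by openlog) applies when none is given. A minimal C
sketch under that reading follows. The helper make_priority and the ident
"demo" are made up for illustration.

```c
#include <syslog.h>

/* Combine a facility and a level into the priority argument of syslog(). */
static int make_priority(int facility, int level)
{
    return facility | level;
}

/* Log one message with an explicit facility and one that gives only a
   level, so the facility passed to openlog() is used. */
static void log_examples(void)
{
    openlog("demo", LOG_PID, LOG_DAEMON);
    syslog(make_priority(LOG_USER, LOG_ERR), "explicit facility: %s", "user");
    syslog(LOG_WARNING, "level only, facility comes from openlog");
    closelog();
}
```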


dot-locking:
-----------

Performs an obscure series of open, close, delete and ln,
ending up in a file named filename.lock.

; I'd like to add further locking strategies, all obeying this interface:

(obtain-fs-lock filename)
(maybe-obtain-fs-lock filename)
(release-fs-lock filename)
(with-fs-lock filename body) :syntax

>>> I don't understand what particular problem these functions solve,
>>> and reading the source hasn't helped. Can you explain to me the intended 
>>> use?
>>>
>>> BTW, 
>>> - I think you can simplify this code a little bit:
>>>   (define (create-temp filename)
>>>     (create-temp-file filename))
>>> - the OBTAIN-LOCK loop waits 1000 seconds between tries. ???
>>> - Syntax of the form WITH-foo is conventionally accompanied by
>>>   a procedure with a name like WITH-foo*, which takes a thunk
>>>   where WITH-foo has a body block. So there ought to be a WITH-LOCK*
>>>   to go with WITH-LOCK (and then WITH-LOCK is a one-line macro).
>>>
>>> What properties do we want the locking system to have?
>>> - Locks named in a process-global way in the filesystem namespace?
>>> - Scalable (no polling)
>>> If we don't need #1, we can use a hack involving pipes, where the pipe
>>> either has a single byte in it (unlocked) or no byte (locked). If
>>> we want locks to be visible in the filesystem namespace, for inter-process
>>> coordination, can the FILENAME lock-name name a file we can 
>>> modify/delete/create, or must we *not* modify that file? (E.g., perhaps
>>> we are locking *access* to a perfectly good file?)
>>> Note that testing for the existence of a file requires polling, so the
>>> implement-locks-by-creating-a-file trick doesn't scale, and you can do
>>> it better with named fifos.
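
The pipe hack mentioned above can be sketched in C as follows. This is a
sketch under assumptions, not the scsh implementation: the type and
function names are hypothetical, the lock is only visible to processes
that share the pipe descriptors (for example after fork), and EINTR
handling is left out.

```c
#include <unistd.h>

/* Hypothetical lock type: a pipe that holds one byte when unlocked and
   no byte when locked. */
struct pipe_lock { int rd, wr; };

/* Create the lock in the unlocked state. Returns 0 on success, -1 on error. */
static int pipe_lock_init(struct pipe_lock *l)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    l->rd = fds[0];
    l->wr = fds[1];
    return write(l->wr, "x", 1) == 1 ? 0 : -1;
}

/* Obtain: take the byte out. read() blocks while another holder has it,
   so waiting costs no polling. */
static int pipe_lock_obtain(struct pipe_lock *l)
{
    char token;
    return read(l->rd, &token, 1) == 1 ? 0 : -1;
}

/* Release: put the byte back, letting one blocked reader proceed. */
static int pipe_lock_release(struct pipe_lock *l)
{
    return write(l->wr, "x", 1) == 1 ? 0 : -1;
}
```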

libscsh:
-------

Libscsh packages Scsh as a C library. It is intended for applications
that want to use Scheme/Scsh as their scripting language. This is
vital to our fight against guile.

>>> Woo, cool!

libscsh = scshvm without "main". Call

int s48_main(long heap_size, long stack_size, char *image_name, int argc, char** argv)

to fire up Scsh out of your own program. By default, s48_main behaves
just like Scsh itself: it will start a REPL. For batch mode, add the
appropriate switches to argv (e.g. "-c", "-s",...).

>>> - Wait, I don't understand. s48_main() is not what determines if a repl
>>>   happens, it's the *image* that determines this. If I dump out an image
>>>   whose top-level does something else, then that's what happens when I
>>>   fire up the vm w/that image.
>>> - I think we need to export an in-core heap-image or heap data structure.
>>>   Then you could have (1) a function to read a heap image from a file
>>>   into memory, and (2) another function to fire up the vm on a heap.
>>>   One advantage of this is that one could have read-only heap images
>>>   linked into the text segment of the binary, making a standalone
>>>   binary that could call out to scsh quickly.
>>> - Possibly I am asking for something here that requires too much vm 
>>>   hacking. But the s48_main interface above does seem pretty crude.

If you want to add your own C functions to call from Scheme, write
an initialization function as described in external.ps, but instead of
adding this function to EXTERNAL_INITIALIZERS of the Makefile, apply

int s48_add_external_init(void (*init)())

to it. This feature is new in Scsh; Scheme48 doesn't have it. You
have to ensure that all calls to s48_add_external_init() happen
before you call s48_main.

As the VM uses global variables, it's not possible to start several
Scshs at the same time.

>>> This seems like a mistake that will get us eventually. Is there a way
>>> we can do this, but make the API be such that we leave the path open to
>>> possibly fixing this later?

This is a proposal. _Please_ add your comments.

-- 
Martin

