scsh-users

Re: suggested features

To: Brent.Benson@mail.csd.harris.com
Subject: Re: suggested features
From: Olin Shivers <shivers@clark.lcs.mit.edu>
Date: Fri, 2 Dec 94 14:23:15 -0500
Cc: scsh@martigny.ai.mit.edu
Reply-to: shivers@mintaka.lcs.mit.edu

Brent-

Thanks for your suggestions. Here are my comments.

   * argc

   I found that I used (length command-line-arguments) at least once in
   most of my scripts.  It would be useful to have a function that
   corresponded to `arg' and `argv' that is used for finding the number
   of arguments to a script.

         (argc [command-line]) => integer

       When no arguments are supplied, `argc' returns the number of
       command line arguments or (length command-line-arguments).  When
       a list of arguments is provided, it returns the length of that
       list.

   A magic variable `argument-count' could be provided instead.

A magic var is out, because of the problem of keeping it in sync with
the command-line-arguments variable. If you side-effect the latter,
the magic var wouldn't be automatically updated. This makes me nervous.

The ARGC function you propose is stupid: LENGTH provides a superset of
functionality, and is already present. But... the stupid thing is the
useful thing here, because as you point out, 
    (length command-line-arguments)
is a lot of typing, where (argc) would read a lot easier.

Creeping featurism... I'll think about it.

   * split
   There should be a set of functions for splitting strings into a list
   of strings.

I got you covered here. I designed and wrote the field-reader routines
last night when I should have been studying for a Mandarin midterm.
I didn't do a record parser that splits things based on regexps; I
specify field delimiters as sets of characters. Do you think the
extra power of regexps would be useful? It's an interesting idea, but
I can't think of any cases where I would want it off the top of my head.
I'm interested to hear your thoughts on this.

The field reader goes with my awk macro. To embed awk in scsh takes
52 lines of code.

         (file->string filename)
         (file->sexp-list filename)
         (file->string-list filename)
         (file->list reader filename)

These are fine ideas, but they are so short I think I'll leave them out. I am
worried about bloating the system's name-space -- it's already a serious
kitchen sink. It's easy for the programmer to define whichever of these he
needs at the top of his script.

I'm not convinced this is the right decision, but I'll roll with it for now.

   (define (argc . maybe-list)
     (let ((command-line #f))
       (cond
        ((null? maybe-list) 
         (set! command-line command-line-arguments))
        ((null? (cdr maybe-list)) 
         (set! command-line (car maybe-list)))
        (else (error "bad argument to ARGC")))
       (if (not (list? command-line))
           (error "bad command line to ARGC"))
       (length command-line)))

If I may make a suggestion, you don't need to use side-effects in the
above procedure. Try this instead:
   (define (argc . maybe-list)
     (length (if (pair? maybe-list)
                 (if (pair? (cdr maybe-list))
                     (error "Too many args to ARGC" maybe-list)
                     (car maybe-list))
                 command-line-arguments)))

But you can exploit one of scsh's internal utility functions and shorten
this a bit:
   (define (argc . maybe-list)
     (length (optional-arg maybe-list command-line-arguments)))

Thanks for the mail. The regexp-delimited field reader -- I'd like to hear
more about your thoughts on that. I'm appending my interface to this msg
so you can see what I did last night.
    -Olin
-------------------------------------------------------------------------------
Since all of the readers discussed below require the ability to peek
ahead one char in the input stream, they cannot be applied to raw 
integer file descriptors, only Scheme input ports. This is because
Unix doesn't support peeking ahead into input streams.

(field-reader [fdelims f-elide? eor-is-eof? rdelims r-elide?])
    Returns a procedure that reads records with field structure from a port.
    When the reader is applied to an input port (which defaults to the current
    input port), it returns two values: the record and a field vector.
    The record is a string; the field vector is a vector of strings and
    is the record split up into strings at field boundaries.
    When called at eof, returns [eof-object #()].

    For example, if port p is open on /etc/passwd, then
        ((field-reader ":" #f) p)
    returns two values:
        "wandy:3xuncWdpOKhR.:112:22:Wandy Saetan:/users/wandy:/bin/csh"
        #("wandy" "3xuncWdpOKhR." "112" "22" "Wandy Saetan" "/users/wandy"
                  "/bin/csh")

    - FDELIMS is either a charset, string, char, or char predicate.
      It is coerced to a charset and defaults to the set {space, tab}.
    - F-ELIDE? defaults to #t.
    - RDELIMS is similar to FDELIMS, and defaults to the set {newline}.
    - R-ELIDE? defaults to #f.

    The F-ELIDE? and R-ELIDE? values determine whether or not
    contiguous sequences of field or record delimiting characters
    are considered a single delimiter. If F-ELIDE? is #f, for
    instance, then five field-delimiter characters in a row will
    produce four empty-string fields.
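
    The counting in that last sentence can be checked with a small
    portable sketch. SPLIT-FIELDS below is a hypothetical helper, not part
    of scsh or of the interface described here; it takes the delimiter set
    as a plain string and treats delimiters as separators. It only
    illustrates what eliding does to a run of delimiters and makes no claim
    about how the real reader handles the start or end of a record.

```scheme
;; Hypothetical helper, not part of scsh: split STR at any char found in
;; DELIMS (a string used as a character set), treating delimiters as
;; separators. When ELIDE? is true, a run of adjacent delimiters counts
;; as a single delimiter.
(define (split-fields str delims elide?)
  (define (delim? c)
    (let scan ((i 0))
      (and (< i (string-length delims))
           (or (char=? c (string-ref delims i))
               (scan (+ i 1))))))
  (let ((len (string-length str)))
    (let loop ((i 0) (start 0) (fields '()))
      (cond ((= i len)
             ;; End of string closes the last field.
             (reverse (cons (substring str start len) fields)))
            ((delim? (string-ref str i))
             ;; Close the current field, then, if eliding, skip the rest
             ;; of the delimiter run before starting the next field.
             (let skip ((j (+ i 1)))
               (if (and elide? (< j len) (delim? (string-ref str j)))
                   (skip (+ j 1))
                   (loop j j (cons (substring str start i) fields)))))
            (else (loop (+ i 1) start fields))))))

;; Five colons in a row between "a" and "b":
(split-fields "a:::::b" ":" #f)   ; => ("a" "" "" "" "" "b")
(split-fields "a:::::b" ":" #t)   ; => ("a" "b")
```

    Without eliding, the first colon ends "a" and each of the remaining
    four ends an empty field, which is where the four empty strings come
    from.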

    A field reader consumes the record-delimiting char(s) before returning
    (so there's no way to distinguish between a record terminated by eof
    and one terminated by a delimiter).

    EOR-IS-EOF? determines if the end of the record is also an implicit end
    of field. It can be #f, 'OPTIONAL, or some other true value, defaulting
    to 'OPTIONAL. If 'OPTIONAL, then if the record doesn't end with an
    end-of-field character, an end-of-field is assumed anyway. <Give
    a little table showing possibilities here.>

(record-reader [record-delims record-elide?])
    Returns a procedure that reads records from a port. A record is
    a sequence of characters terminated by one of the characters
    in RECORD-DELIMS or eof. If RECORD-ELIDE? is true, then a contiguous
    sequence of delimiter chars is taken as a single record delimiter.
    If RECORD-ELIDE? is false, then a delimiter char coming immediately after
    a delimiter char produces an empty string record. The reader consumes
    the delimiting char(s) before returning from a read (so there's no
    way to distinguish between a record terminated by eof and one
    terminated by a delimiter).

    RECORD-DELIMS defaults to the set {newline}. It may be a charset,
    string, character, or character predicate, and is coerced to a charset.
    RECORD-ELIDE? defaults to #f.

    The reader procedure returned takes one optional argument, the port
    from which to read, which defaults to the current input port. It returns
    a string or eof.

(read-delimited char-set [port]) -> string or eof
    Read until we encounter one of the chars in CHAR-SET or eof.
    The terminating character is not included in the string returned,
    nor is it removed from the input stream; the next input operation will 
    encounter it. If we get a string back, then (eof-object? (peek-char)) 
    tells if the string was terminated by a delimiter or eof.

    CHAR-SET may be a charset, a string, a character, or a character
    predicate; it is coerced to a charset.

    This operation is likely to be implemented very efficiently. In
    the Scheme 48 implementation, the Unix port case is implemented directly
    in C, and is much faster than the equivalent operation performed
    in Scheme with PEEK-CHAR and READ-CHAR.
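
    For reference, the slow PEEK-CHAR/READ-CHAR version mentioned above
    might look roughly like the following. This is a sketch under
    assumptions, not the scsh source: the name READ-DELIMITED-SKETCH is
    made up, the delimiter set is passed as a character predicate only
    (no coercion from charsets, strings, or chars), and the port argument
    is required rather than optional.

```scheme
;; Hypothetical portable model of READ-DELIMITED, not the scsh
;; implementation. DELIM? is a character predicate; PORT is an input port.
(define (read-delimited-sketch delim? port)
  (let ((c0 (peek-char port)))
    (if (eof-object? c0)
        c0                                  ; already at eof: return eof
        (let loop ((chars '()))
          (let ((c (peek-char port)))
            (if (or (eof-object? c) (delim? c))
                ;; Stop without consuming: the delimiter stays in the port
                ;; for the next input operation to see.
                (list->string (reverse chars))
                (begin (read-char port)
                       (loop (cons c chars)))))))))
```

    Because the terminating character is only peeked at, a caller that
    gets a string back can still ask (eof-object? (peek-char port)) to
    learn whether the read stopped at a delimiter or at eof, and must
    consume the delimiter itself before reading the next piece.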

(read-delimited! char-set buf [port start end]) -> nchars or eof or #f
    Variant of READ-DELIMITED.
    #f means buffer filled up without encountering delimiter or eof.
    If an integer is returned, then (eof-object? (peek-char port))
    tells if the string was terminated by a delimiter or eof.

((field-parser [fdelims f-elide? eos-is-eof?]) str [start end]) -> str vector
    FDELIMS is either a charset, string, char, or char predicate.
    It is coerced to a charset and defaults to the set {space, tab}.
    F-ELIDE? defaults to #t.
    EOS-IS-EOF? determines whether or not the end of string is treated
      as an end-of-field marker. It defaults to 'OPTIONAL.

Note: be careful when writing this to handle, in a sensible way, the case
where FDELIMS and RDELIMS overlap.

-------------------------------------------------------------------------------
Old text; reshape into eor-is-eof? table.

    The field delimiter and record delimiter sets are allowed to overlap.
    This can be useful -- making the record delimiters also be field 
    delimiters effectively adds a field delimiter to the end of the
    record. This makes the other field delimiters act as field *separators*
    not as field *terminators*. The following examples will illustrate
    the difference. In the second column, the field delimiter (colon)
    acts as a field terminator. In the third column, colon doesn't terminate
    fields, it separates them.

    Record              : terminates            : and EOR terminate fields
    ------------------------------------------------------------
    ""                  #()                     #("")
    ":"                 #("")                   #("" "")
    "foo:bar"           ERROR                   #("foo" "bar")

    But what if I want to allow for an *optional* colon at the end of the
    line (and I don't want to elide field delimiters)? Then I'm screwed.
    If newline is included in the FDELIMS set, a terminating colon will
    cause an extra empty field to be included at the end of the record;
    the user will have to trim this extra field when it occurs.
    That's the price of unambiguity.
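
    The table and the trailing-colon problem can be reproduced with a
    small model. Everything here is an assumption made for illustration:
    PARSE-TERMINATED and PARSE-WITH-EOR are made-up names, not scsh
    procedures; the end of record is modelled by appending an explicit
    newline to the record string; and ERROR is assumed to be available, as
    it is in the ARGC code earlier in this message.

```scheme
;; Hypothetical helper, not part of scsh: parse STR into a vector of
;; fields, where every field must be ended by a char from DELIMS (a string
;; used as a character set). Delimiters are never elided.
(define (parse-terminated str delims)
  (define (delim? c)
    (let scan ((i 0))
      (and (< i (string-length delims))
           (or (char=? c (string-ref delims i))
               (scan (+ i 1))))))
  (let ((len (string-length str)))
    (let loop ((i 0) (start 0) (fields '()))
      (cond ((< i len)
             (if (delim? (string-ref str i))
                 (loop (+ i 1) (+ i 1)
                       (cons (substring str start i) fields))
                 (loop (+ i 1) start fields)))
            ((< start len)
             ;; Characters left over after the last terminator.
             (error "Unterminated final field" (substring str start len)))
            (else (list->vector (reverse fields)))))))

;; Model "colon and EOR both terminate fields": append a newline to stand
;; for the end of record, and add newline to the delimiter set.
(define (parse-with-eor record)
  (let ((nl (string #\newline)))
    (parse-terminated (string-append record nl)
                      (string-append ":" nl))))

(parse-terminated "" ":")        ; => #()
(parse-terminated ":" ":")       ; => #("")
(parse-with-eor "")              ; => #("")
(parse-with-eor ":")             ; => #("" "")
(parse-with-eor "foo:bar")       ; => #("foo" "bar")
(parse-with-eor "foo:bar:")      ; => #("foo" "bar" "")
```

    Calling (parse-terminated "foo:bar" ":") signals an error, matching the
    ERROR entry in the second column. The last call shows the cost
    described above: an optional trailing colon produces an extra empty
    field at the end.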

