The AWK loop and macro hacking

To: scsh@martigny.ai.mit.edu
Subject: The AWK loop and macro hacking
From: shivers@ai.mit.edu (Olin Shivers)
Date: 01 Apr 1996 03:29:09 -0500
Organization: Artificial Intelligence Lab, MIT
Reply-to: shivers@ai.mit.edu
   From: mwh@gradine.cis.upenn.edu (Michael Hicks)
   Newsgroups: comp.lang.scheme.scsh
   Date: 29 Mar 1996 18:38:32 GMT

   I am currently writing a script in scsh that uses the UNIX cpp output
   to extract #include's from a .c file to create a make utility
   dependency file.  Here are two versions of a function to extract the
   #include's, the first uses "awk", the second doesn't:

   (note: put-nondup-in-list just inserts an element in a list, making
   sure there are no duplicates)

   (define (get-includes port-in)
     (let ((read (field-reader (infix-splitter "[ \t\n\"]"))))
       (awk (read port-in) (line fields) #f ((include-list '()))
            ("^(# 1 \")(.*h)(\")"
             (put-nondup-in-list (nth fields 3) include-list)))))

   (define (get-includes-old port-in)
     (letrec ((get-line (record-reader))
              (get-include
               (lambda (include-list)
                 (let ((line (get-line port-in)))
                   (if (eof-object? line)
                       include-list
                       (let ((match (string-match "^(# 1 \")(.*h)(\")" line)))
                         (if (regexp-match? match)
                             (let ((string (match:substring match 2)))
                               (get-include
                                (put-nondup-in-list string include-list)))
                             (get-include include-list))))))))
       (get-include '())))

   I then timed the functions (using the scsh time+ticks syscall)
   and found that for a 5000-line file, the awk version took
   about 10 seconds and the non-awk took around 4 seconds.  The output
   list ended up being about 10 elements, with only about 8 or so
   matches weeded out by put-nondup ...  I ran this on an SGI multiprocessor.

Well, in the first version, you parse every line into fields *whether or not*
it matches
    # 1 "frotz"
In the second version, you never break the line up into fields at all -- you
just re-use the results of the regexp match. So the second version is bound
to be faster.
But it can also be written in three lines of code (see below), if you use the
right features from AWK.

A few other points:
- I don't think you should bother defining GET-LINE, when READ-LINE already
  does what you want.
- The #f in your GET-INCLUDES's AWK form is not necessary. The counter var is
  *optional*.

One could rewrite your second version as the following three-line AWK form:

(define (get-includes-old port)
  (awk (read-line port) (line) ((include-list '()))
    ("^(# 1 \")(.*h)(\")" =>
     (lambda (m) (put-nondup-in-list (match:substring m 2) include-list)))))

This macro-expands to exactly the code you wrote by hand (barring two
trivial bits of beta-substitution).
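
The PUT-NONDUP-IN-LIST helper itself is not shown in the thread. A minimal
sketch of what it might look like follows, based only on the description
("inserts an element in a list, making sure there are no duplicates") and on
the argument order used in the calls above. Consing new elements onto the
front of the list, and comparing with MEMBER (i.e. EQUAL?), are assumptions;
the real definition may differ.

```scheme
;; Hypothetical stand-in for the poster's helper: return LST unchanged
;; if ELT is already in it, otherwise return LST with ELT added at the front.
;; MEMBER compares with EQUAL?, so it works for strings such as file names.
(define (put-nondup-in-list elt lst)
  (if (member elt lst)
      lst
      (cons elt lst)))

;; Usage:
(put-nondup-in-list "stdio.h" '("stdlib.h"))            ; => ("stdio.h" "stdlib.h")
(put-nondup-in-list "stdio.h" '("stdio.h" "stdlib.h"))  ; => ("stdio.h" "stdlib.h")
```

Because the helper returns the new list rather than mutating anything, its
result can serve directly as the next value of the AWK state variable.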

Scsh's regexp-matching is, unfortunately, rather slow, because regexps
currently get compiled each time they are used. If I hacked in support
for pre-compiling regexps, regexp-matching loops would speed up.

For the macro-curious, I'd point out three useful bits of S48 advice:
- ,expand <foo>
  Macro-expands <foo> for you.
- ,open pp
  Provides a pretty-printer named p. So
    ,expand (awk ...)
    (p ##)
  will macro-expand and pretty-print an AWK form.
- The emacs command C-M-q in a Scheme buffer will do a better job of
  re-indenting a form for you after you've printed it out with the P
  pretty-printer.
    -Olin
