From: mwh@gradine.cis.upenn.edu (Michael Hicks)
Newsgroups: comp.lang.scheme.scsh
Date: 29 Mar 1996 18:38:32 GMT
I am currently writing a script in scsh that uses the UNIX cpp output
to extract #include's from a .c file to create a make utility
dependency file. Here are two versions of a function to extract the
#include's, the first uses "awk", the second doesn't:
(note: put-nondup-in-list just inserts an element in a list, making
sure there are no duplicates)
(define (get-includes port-in)
  (let ((read (field-reader (infix-splitter "[ \t\n\"]"))))
    (awk (read port-in) (line fields) #f ((include-list '()))
      ("^(# 1 \")(.*h)(\")"
       (put-nondup-in-list (nth fields 3) include-list)))))
(define (get-includes-old port-in)
  (letrec ((get-line (record-reader))
           (get-include
            (lambda (include-list)
              (let ((line (get-line port-in)))
                (if (eof-object? line)
                    include-list
                    (let ((match (string-match "^(# 1 \")(.*h)(\")" line)))
                      (if (regexp-match? match)
                          (let ((string (match:substring match 2)))
                            (get-include (put-nondup-in-list string
                                                             include-list)))
                          (get-include include-list))))))))
    (get-include '())))
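The post describes put-nondup-in-list but does not show it. A minimal sketch of what such a helper might look like, assuming the (element list) argument order used in the calls above and comparison with EQUAL? via MEMBER (both are assumptions, not the poster's actual definition):

```scheme
;; Hypothetical helper: return LST with ELT added at the front,
;; unless an EQUAL? element is already present.
(define (put-nondup-in-list elt lst)
  (if (member elt lst)
      lst                 ; already there: return the list unchanged
      (cons elt lst)))    ; otherwise add the new element
```

With this definition, (put-nondup-in-list "stdio.h" '("stdio.h" "math.h")) returns the list unchanged, and (put-nondup-in-list "string.h" '("stdio.h")) returns ("string.h" "stdio.h").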
I then timed the functions (using the scsh time+ticks syscall)
and found that for a 5000-line file, the awk version took
about 10 seconds and the non-awk version took around 4 seconds. The
output list ended up being about 10 elements, with only about 8 or so
matches weeded out by put-nondup ... I ran this on an SGI multiprocessor.
Well, in the first version, you parse every line into fields *whether or not*
it matches
# 1 "frotz"
In the second version, you never break the line up into fields at all -- you
just re-use the results of the regexp match. So it is definitely faster.
But it can also be written in three lines of code (see below), if you use the
right features from AWK.
A few other points:
- I don't think you should bother defining GET-LINE, when READ-LINE already
  does what you want.
- The #f in your GET-INCLUDES's AWK form is not necessary. The counter var is
  *optional*.
One could rewrite your second version as the following three-line AWK form:
(define (get-includes-old port)
  (awk (read-line port) (line) ((include-list '()))
    ("^(# 1 \")(.*h)(\")" =>
     (lambda (match)
       (put-nondup-in-list (match:substring match 2) include-list)))))
This macro-expands to exactly the code you wrote by hand (barring two
trivial bits of beta-substitution).
Scsh's regexp-matching is, unfortunately, rather slow, because regexps
currently get compiled each time they are used. If I hacked in support
for pre-compiling regexps, regexp-matching loops would speed up.
For the macro-curious, I'd point out three useful bits of S48 advice:
- ,expand <foo>
  Macro-expands <foo> for you.
- ,open pp
  Provides a pretty-printer named p. So
      ,expand (awk ...)
      (p ##)
  will macro-expand and pretty-print an AWK form.
- The emacs command C-M-q in a Scheme buffer will do a better job of
  re-indenting a form for you after you've printed it out with the P
  pretty-printer.
-Olin