scsh-users

regexp matching over entire file?

To: scsh-news@zurich.ai.mit.edu
Subject: regexp matching over entire file?
From: "Anton van Straaten" <anton@appsolutions.com>
Date: Sun, 12 Jan 2003 22:24:41 GMT
Organization: EarthLink Inc. -- http://www.EarthLink.net
What is the canonical way (if any) in SCSH to perform a regexp search that
detects all non-overlapping matches of a regexp pattern in an entire file?

The regexp functions apparently only accept string arguments, which raises
the requirement of reading the file into strings in chunks which are then
passed to the regexp functions.  It seems as though the awk functionality
might allow something like this.  One way would be to use an awk-style loop
to read the file a line at a time and search for matches on each line, but I
wondered if there wasn't a more transparent way to do this?

Just to hack something together, I did it by reading the entire file into a
single string, which resulted in code like this:

(define (process-all-matches filename pattern process-match-proc)
  (call-with-input-file filename
    (lambda (port)
      (regexp-for-each
       pattern
       (lambda (m)
         (let-match m (match match-str subpattern-match)
           (process-match-proc subpattern-match)))
       (read-string 65535 port)))))

(Extracted from real code, so may contain trivial errors.)  This works, but
it only handles files up to the chosen read size.  Chunks don't work well unless
the file can be chunked in such a way as to avoid breaking up a possible
match.  Line-oriented reading of the file is one way around this in many
cases, but what's really needed here is stream functionality, for example as
provided by Scheme's ports.  But SCSH doesn't seem to have a way to perform
regexp matching on ports.  Have I missed something?
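
For comparison, the line-at-a-time variant mentioned above might look
something like the following.  This is an untested sketch: the name
process-all-matches-by-line is made up for illustration, and unlike the
code above it passes the whole match object to process-match-proc rather
than a submatch string.  It cannot see a match that spans a newline.

```scheme
;; Untested sketch; process-all-matches-by-line is a hypothetical name.
;; Calls process-match-proc on each match object found, line by line.
(define (process-all-matches-by-line filename pattern process-match-proc)
  (call-with-input-file filename
    (lambda (port)
      (let loop ((line (read-line port)))
        ;; read-line returns the EOF object once the port is exhausted
        (if (not (eof-object? line))
            (begin
              (regexp-for-each pattern process-match-proc line)
              (loop (read-line port))))))))

;; Example use: print every run of digits in a (hypothetical) file.
;; (process-all-matches-by-line "data.txt"
;;                              (rx (+ digit))
;;                              (lambda (m)
;;                                (display (match:substring m 0))
;;                                (newline)))
```

This avoids the fixed read size, at the cost of the newline restriction.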

Anton



