Michel Schinz:
But couldn't we imagine a regular expression matcher that fetches
characters from ports as it needs? I sometimes wish I had a "lex"
macro in scsh, providing something similar to what the (f)lex tools
provide for C (and other languages). I find it sad that I have to
resort to "read-char" as soon as I need to analyze text files which
cannot be read line-by-line (e.g. any source file of a language
allowing multi-line comments).
For general POSIX regexps, it isn't quite as simple as fetching
characters as needed -- the matcher needs random access to
characters already matched.
However, there is no need to use only your imagination. The Rx
regexp engine permits matching over non-contiguous, dynamically
constructed strings. In both the POSIX functions and some lower-level
functions it has everything you need to make a fast lexer --
I use it for that purpose in Systas Scheme. See http://www.regexps.com.
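
To make the buffering point concrete, here is a rough sketch of a lexer
that pulls text from a port only as it needs it, but keeps everything
it has fetched in a buffer so the matcher can look back over it. This
is NOT Rx's interface and not scsh code: it is plain Python using the
standard "re" module, and the rule table, the "tokens" function, and the
chunk size are all made up for illustration. Since "re" has no partial
matching, the sketch assumes an extra UNCLOSED rule to detect a comment
that has been opened but not yet fully read.

```python
import io
import re

# Hypothetical token rules, tried in order; the first one that matches wins.
TOKEN_RULES = [
    ("COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),  # may span lines
    ("UNCLOSED", re.compile(r"/\*")),   # comment opened, end not buffered yet
    ("WORD", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("SPACE", re.compile(r"\s+")),
    ("PUNCT", re.compile(r"[^\w\s]")),
]

def tokens(port, chunk_size=16):
    """Yield (kind, text) pairs, reading from `port` only when needed."""
    buf = ""      # characters fetched from the port but not yet consumed
    eof = False
    while True:
        kind, match = None, None
        for name, pattern in TOKEN_RULES:
            match = pattern.match(buf)
            if match:
                kind = name
                break
        # A match touching the end of the buffer might grow with more
        # input, so fetch another chunk and retry before committing.
        need_more = (match is None or match.end() == len(buf)
                     or kind == "UNCLOSED")
        if need_more and not eof:
            chunk = port.read(chunk_size)
            if chunk:
                buf += chunk
            else:
                eof = True
            continue
        if match is None:
            if buf:
                raise ValueError("cannot tokenize: %r" % buf[:20])
            return
        if kind == "UNCLOSED":
            raise ValueError("unterminated comment")
        if kind != "SPACE":
            yield kind, match.group()
        buf = buf[match.end():]   # drop only what has been consumed

if __name__ == "__main__":
    for tok in tokens(io.StringIO("foo /* multi\nline */ bar 42;")):
        print(tok)
```

Retrying every rule from the start of the buffer after each refill is
wasteful for long tokens; a matcher that can resume over non-contiguous
chunks avoids that rescanning.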
Primitives for fast I/O in scheme are a persistent need and Systas
has some nice examples of those, too.
Systas has a process management interface largely cribbed from scsh, but
would need a few more lines of code to actually be source-level
compatible.
(In many other ways, though, Systas basically sucks and I don't recommend
using it for anything critical -- it's just a place to consider grabbing
ideas and techniques for better regexp support and neat I/O primitives.)
-t