scsh-users

Re: Usage of Shivers' SRE regular expression notation

To: scsh-news@zurich.ai.mit.edu
Subject: Re: Usage of Shivers' SRE regular expression notation
From: Alex Shinn <foof@synthcode.com>
Date: Tue, 18 Feb 2003 16:21:25 +0900
Organization: NTT Communications Co.(OCN)
>>>>> "Anton" == Anton van Straaten <anton@appsolutions.com> writes:

    Anton> A simple example I used recently was to extract the URIs or
    Anton> filenames referenced by HREF="..." and FILE="..." statements
    Anton> in server-side HTML templates (which couldn't be successfully
    Anton> parsed by a pure HTML parser).  I used the following SRE
    Anton> expression, in SCSH:

    Anton> (rx (w/nocase (: (or "href" "file") (* whitespace) "=" (*
    Anton> whitespace) "\"" (submatch (* (~ ("?\"")))) "\"")))

For comparison, the Perl 5 equivalent is

  /(href|file)\s*=\s*"([^"]*)"/i

and the equivalent (overly) commented, structured version is

  my $KEYWORDS = '( href | file )';
  my $OPTWHITE = '\s*';             # single quotes keep the backslash: zero or more whitespace chars

  /
    $KEYWORDS   # match either an href or file
    $OPTWHITE   # optional whitespace
    =
    $OPTWHITE
    "
    (           # start group
      [^"]      # a non-quote character
      *         # ... repeated zero or more times
    )           # end group
    "
  /xi
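
For anyone who wants to run the structured version, here is a minimal
sketch.  It assumes the pieces are precompiled with qr// rather than kept
as strings; note that a qr// object carries its own flags, so the keyword
piece needs its own /i.  The names $ATTR_RE, $line and @values and the
sample template line are made up for illustration.

```perl
use strict;
use warnings;

# Building blocks, precompiled so that \s keeps its meaning.
# A qr// object does not inherit flags from the enclosing regex,
# so the keyword piece carries its own i flag.
my $KEYWORDS = qr/(href|file)/i;
my $OPTWHITE = qr/\s*/;

# Hypothetical name: the whole pattern, stored for reuse.
my $ATTR_RE = qr/
    $KEYWORDS   # either keyword, in any case
    $OPTWHITE
    =
    $OPTWHITE
    "
    ( [^"]* )   # the quoted value, without the quotes
    "
/x;

# Made-up sample input resembling a server-side template line.
my $line = '<A HREF = "page.html">next</A> <include FILE="nav.tpl">';

my @values;
while ($line =~ /$ATTR_RE/g) {
    push @values, $2;   # $1 is the keyword, $2 is the quoted value
}

print join(", ", @values), "\n";
```

With the sample line above this should print "page.html, nav.tpl".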

Using variable interpolation in Perl regular expressions is quite
common.  In fact, many of the most common recurring patterns in regular
expressions are collected in the Regexp::Common module.  The above could
be written:

  use Regexp::Common;

  / (href|file) \s* = \s* $RE{quoted} /xi

Most of the regular expression operations I come across are
concatenations of other regular expressions, so this works out well.
You can also define higher-order regular expression functions, with
only slightly less convenience than in an sexp syntax, but that's of
questionable use to begin with.  If you need that much structure,
regular expressions are probably the wrong tool.  Something that bridges
the gap between regular expressions and full grammars (i.e. what Perl 6
plans) would be great in Scheme, but that doesn't seem to be what SREs
are for.
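
To make "higher-order regular expression functions" concrete, here is a
rough sketch in plain Perl 5.  The helper names one_of and seq_ws are
hypothetical, not from any module; they just build compiled regexes from
smaller pieces.

```perl
use strict;
use warnings;

# Hypothetical helper: a case-insensitive alternation of literal strings.
sub one_of {
    my @literals = @_;
    my $alternatives = join '|', map { quotemeta($_) } @literals;
    return qr/(?:$alternatives)/i;
}

# Hypothetical helper: concatenate sub-patterns, allowing optional
# whitespace between each adjacent pair.  Arguments may be qr// objects
# or plain pattern strings.
sub seq_ws {
    my @parts = @_;
    my $joined = join '\s*', @parts;
    return qr/$joined/;
}

# keyword, "=", then a double-quoted value captured in $1
my $attr_re = seq_ws(one_of('href', 'file'), '=', qr/"([^"]*)"/);

# In list context the match returns the captured groups.
my ($value) = 'File =  "a.txt"' =~ $attr_re;

print "$value\n";
```

This works because a qr// object stringifies with its flags attached, so
the /i on the keyword alternation survives being joined into the larger
pattern.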

Generally, I'm a huge advocate of sexps over other syntaxes in almost
every case.  But the above example looks about as simple as you can hope
for in Perl.  The SRE version has about the same number of conceptual
elements, but is much more verbose and "looks" nothing like what you're
trying to match.  I'd like to see an example where SRE really shines
over conventional regexp syntax.

-- 
Alex

