> (* <regex> ...) 0 or more matches
I'm not sure how to interpret the ellipsis here. Is (* A B)
equivalent to (* (& A B)), or (* (* A) (* B)), or something else?
It seems to me that * should be a unary operator.
> (LET* ((<name> <regex> ...) ...)
> <regex> ...)
> <name> Named regex use
Does this have a big advantage over Scheme's normal quasiquote
mechanism? Why should I write

    '(let* ((digit (in (- "09"))))
       (& (+ digit) ":" digit digit))

when I could write

    (let ((digit '(in (- "09"))))
      `(& (+ ,digit) ":" ,digit ,digit))

The former is superficially simpler, but actually more complicated
because it makes the meaning of a regexp depend on an environment.
This is a mess. I see your second note, about the "primitive regexps",
as a sign that we're already encountering hygiene problems: "Are
bindings in regexps statically or dynamically scoped?" Ugh.
The latter expression is ordinary Scheme code, whose semantics are
already explained and tested.
The only advantage I can see to giving the regexp notation its own
LET* is that some repetitions are made explicit, and the back end
could perhaps do some optimizations guided by that information. But
the information is not hard to derive, and I don't know of any extant
regexp back ends that could take advantage of it anyway.
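
To illustrate "not hard to derive", here is a rough sketch in Python rather than Scheme, with nested tuples standing in for s-expressions. The helper names (subtrees, repeated) are made up for this example and are not part of any proposal. The regexp is built with an ordinary host-language variable, the way the quasiquote version does it, and the repeated sub-regexps are then recovered from the finished tree by counting.

```python
from collections import Counter

# Nested tuples stand in for s-expressions: (operator, operand, ...).
# Built with a plain host-language variable, as in the quasiquote version.
digit = ("in", ("-", "09"))
time = ("&", ("+", digit), ":", digit, digit)

def subtrees(rx):
    """Yield rx and every compound sub-regexp nested inside it."""
    if isinstance(rx, tuple):
        yield rx
        for operand in rx[1:]:
            yield from subtrees(operand)

def repeated(rx):
    """Return the compound sub-regexps that occur more than once in rx."""
    counts = Counter(subtrees(rx))
    return {sub: n for sub, n in counts.items() if n > 1}

# Maps each repeated sub-regexp to its occurrence count.
shared = repeated(time)
```

A back end that wanted to exploit sharing could start from a table like this without the notation having a binding form of its own.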
Suppose we strike LET*. Then, since a symbol is not a valid regexp,
it would be unambiguous to make concatenation (what you call
"Sequence", I think?) implicit. That is, we could write ("a" (* "b")
"c") for "ab*c". Concatenation is the most common operator, and I
think this would also make it easier to guess that (* A B) is (* (A
B)) (if that is indeed what you meant...).
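
A small sketch of how implicit concatenation could be translated, again in Python with tuples for s-expressions. It assumes the reading where (* A B) means zero or more repetitions of the sequence A B, and it targets Python's re syntax purely for illustration. Sym and to_re are names invented here; only the * operator is handled.

```python
import re

class Sym(str):
    """An operator name; stands in for a Scheme symbol."""

STAR = Sym("*")

def to_re(rx):
    """Translate a regexp tree into Python `re` syntax (sketch only)."""
    if isinstance(rx, Sym):
        # A symbol on its own is not a regexp, so this is always an error.
        raise ValueError(f"a bare symbol is not a regexp: {rx}")
    if isinstance(rx, str):
        # A string literal matches itself verbatim.
        return re.escape(rx)
    if rx and isinstance(rx[0], Sym):
        if rx[0] == "*":
            # Assumed reading: (* A B) repeats the sequence A B.
            return "(?:" + to_re(rx[1:]) + ")*"
        raise ValueError(f"unknown operator: {rx[0]}")
    # No operator in head position: implicit concatenation.
    return "".join(to_re(part) for part in rx)

# ("a" (* "b") "c") written as nested tuples.
abc = to_re(("a", (STAR, "b"), "c"))
```

The head position is unambiguous only because operators are symbols and literals are strings, which is the property that striking LET* would preserve.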
This reminds me a lot of VMS's answer to GNU Emacs, TPU. It has a
Pascal-like extension language with a datatype for regular expressions
("patterns").
- The search function took a pattern value as its argument.
- TPU had operators that constructed bigger patterns given smaller ones.
- Because they were a real datatype, you could write your own functions
on patterns.
- Pattern values were immutable, so each node could have an (internal)
pointer to a compiled representation of the regexp of which it was
the root, generated on demand. (I don't know if they actually did
that, but it would have been easy.)
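
The points above could be sketched roughly as follows. This is Python, not TPU's language, and it says nothing about how TPU was actually implemented; Pattern, literal, star and search are hypothetical names. Pattern values are immutable, operators build bigger patterns from smaller ones, and each node compiles itself on first use and keeps the result.

```python
import re
from dataclasses import dataclass
from functools import cached_property

@dataclass(frozen=True)
class Pattern:
    source: str  # regexp text in Python `re` syntax

    @cached_property
    def compiled(self):
        # Compiled on first access, then cached on this node.
        return re.compile(self.source)

    def __add__(self, other):
        # Concatenation builds a new, larger pattern.
        return Pattern(f"(?:{self.source})(?:{other.source})")

    def __or__(self, other):
        # Alternation builds a new, larger pattern.
        return Pattern(f"(?:{self.source})|(?:{other.source})")

def literal(text):
    """Pattern matching text verbatim."""
    return Pattern(re.escape(text))

def star(p):
    """A user-written function on patterns: zero or more repetitions."""
    return Pattern(f"(?:{p.source})*")

def search(pattern, text):
    """The search function takes a pattern value as its argument."""
    return pattern.compiled.search(text)

ab_star_c = literal("a") + star(literal("b")) + literal("c")
```

Caching is safe here only because the nodes never change after construction.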
>Named (not numbered) submatches?
TPU had an operator called @ that did this. I don't remember it
exactly, but I think that PATTERN @ VARIABLE matched exactly what
PATTERN matched, except that VARIABLE was set to the text PATTERN
matched. This pattern, of course, could be used as part of a larger
pattern. Don't ask me what happened if VARIABLE went out of scope.
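
For comparison, a loose analogue of that operator in Python, with the caveat that it is not TPU's semantics: instead of setting a variable, the name is attached to the match result through Python's named groups. The Pattern class and lit helper are invented for this sketch.

```python
import re

class Pattern:
    """Minimal pattern value wrapping regexp text in Python `re` syntax."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        # Concatenation; each side is grouped so it stays intact.
        return Pattern(f"(?:{self.source})(?:{other.source})")

    def __matmul__(self, name):
        # PATTERN @ NAME matches exactly what PATTERN matches and
        # records the matched text under NAME.
        return Pattern(f"(?P<{name}>{self.source})")

def lit(text):
    """Pattern matching text verbatim."""
    return Pattern(re.escape(text))

digits = Pattern("[0-9]+")
two_digits = Pattern("[0-9][0-9]")

# The named pattern is used as part of a larger pattern.
time = (digits @ "hours") + lit(":") + (two_digits @ "minutes")
match = re.search(time.source, "at 12:34 today")
```

Because the names live in the match object rather than in variables, the scoping question does not come up in this version.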