Alex Shin:
Most regular expression operations I find are concatenations
of other regular expressions, so this works out well. You can
also define higher order regular expression functions, with
only slightly less convenience than in an sexp syntax, but
that's of questionable use to begin with. If you need that
much structure, regular expressions are probably the wrong
tool. Something that bridges the gap between regular
expressions and full grammars (i.e. what Perl 6 plans) would
be great in Scheme, but that doesn't seem to be what SRE's are
for.

Generally, I'm a huge advocate of sexp's over other syntax in
almost every case. But the above example looks about as
simple as you can hope for in Perl. The SRE version has about
the same number of conceptual elements, but is much more
verbose and "looks" nothing like what you're trying to match.
I'd like to see an example where SRE really shines over
conventional regexp syntax.

Both the forum and the status of most regexp engine implementations
fight against a good response. Both demand a fairly small reply, yet
one of the most significant aspects of SREs is that they scale nicely
to quite large examples.

I use a close variant of Olin's SRE syntax, backed by my own regexp
engine. Enclosed is a page from my Scheme-based wiki. I'm not
confident it'll give you a clear picture, but it should at least
give you a hint.

Let me call your attention to some aspects of this code.

First, a _huge_ regexp is expressed in lots of separate clauses, each
of which uses SRE syntax. Application code which processes this
definition composes the separate clauses into a massive SRE using
simple list operators (map and backquote). That code doesn't have to
fuss with generating strings that happen to satisfy regcomp -- the
list structure nicely reflects the intended structure of the regexp.
It's a very natural-feeling way to program.
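
A minimal sketch of that composition step, not the wiki's actual code:
the table `toy-rules` and the helper `rules->sre` are hypothetical
names, the tests are treated as opaque data, and the operators are
spelled `seq` and `or` (standing in for `&` and `|`) only so that the
sketch reads in any standard Scheme.

```scheme
;; Toy rule table (hypothetical): each entry is (type test).
;; `seq' and `or' stand in for the `&' and `|' operators of the wiki code.
(define toy-rules
  '((comment-line (seq "%%%" (* any)))
    (title        (seq "!" (+ any)))
    (menu         "-*-*-")))

;; Splice every test into one big alternation, tagging each branch
;; with the keyword of the rule it came from.
(define (rules->sre rules)
  `(or ,@(map (lambda (rule) `(! ,(car rule) ,(cadr rule)))
              rules)))

(rules->sre toy-rules)
;; => (or (! comment-line (seq "%%%" (* any)))
;;        (! title (seq "!" (+ any)))
;;        (! menu "-*-*-"))
```

Adding a rule to the table changes the composed regexp with no string
escaping or grouping to get right.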

Second, SRE-implementation code which processes the resulting massive
SRE uses simple list operations to, for example, associate keywords
with the matching clauses of a regexp "|" expression.
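
The implementation side can be sketched the same way. This assumes a
combined SRE of the shape `(or (! keyword test) ...)`; `clause-for`,
`starts-with?` and `identify` are hypothetical helpers. `identify`
only understands literal-string tests and takes the first hit, whereas
a real engine compiles the whole expression and reports the
leftmost-longest match.

```scheme
;; Look up the test that belongs to KEYWORD in (or (! keyword test) ...).
(define (clause-for keyword sre)
  (let ((entry (assq keyword (map cdr (cdr sre)))))
    (and entry (cadr entry))))

;; #t if string S begins with PREFIX.
(define (starts-with? s prefix)
  (and (>= (string-length s) (string-length prefix))
       (string=? (substring s 0 (string-length prefix)) prefix)))

;; Toy stand-in for the matcher: return the keyword of the first
;; clause whose (literal string) test is a prefix of S, else #f.
(define (identify sre s)
  (let loop ((clauses (cdr sre)))
    (cond ((null? clauses) #f)
          ((starts-with? s (caddr (car clauses))) (cadr (car clauses)))
          (else (loop (cdr clauses))))))

(define toy-sre '(or (! rule "---") (! menu "-*-*-")))

(clause-for 'menu toy-sre)        ; => "-*-*-"
(identify toy-sre "-*-*- Topics") ; => menu
(identify toy-sre "plain text")   ; => #f
```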

In short, SRE makes manipulating regexps feel to a Scheme programmer
just as natural as (and quite similar to) manipulating Scheme source
code in s-exp form. I can sling around SRE fragments quite robustly
using just map and backquote, without having to think too hard about
the peculiarities of regcomp's surface syntax. In contrast, working
with Perl's structured regexp syntax in a similar way requires
regexp-specific string mungers.
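
To make the contrast concrete, here is a small sketch with
hypothetical helpers (`one-or-more`, `blank-padded`, and a
string-based `one-or-more/string`; `seq` and `blank` are placeholder
operator names, not any engine's API). The list version nests
correctly by construction; the string version has to know about
grouping.

```scheme
;; List-based combinators: wrapping a fragment is just backquote.
(define (one-or-more sre) `(+ ,sre))
(define (blank-padded sre) `(seq (* blank) ,sre (* blank)))

(blank-padded (one-or-more "ab"))
;; => (seq (* blank) (+ "ab") (* blank))

;; Naive string version: appends the operator without grouping.
(define (one-or-more/string re) (string-append re "+"))

(one-or-more/string "ab")
;; => "ab+"   ; in POSIX syntax this repeats only the "b", not "ab"
```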

Quite honestly, when I first saw Perl's introduction of a "structured"
regexp syntax, I thought, "hey, that's funny: it's an SRE parody."

Finally, let me add a little bit of Emacs bigotry. I navigate
and edit my SREs using Emacs s-exp navigation and edit commands. Some
people never get past the set of operators present in, say, vi -- but
those of us who have that capability -- well, we appreciate an s-exp
notation for the precision and ease with which it can be manipulated
by an elegant editor such as Emacs.

Another way to say the same thing: what's wrong with you that you
don't find obvious value in an economy of notational concepts?

-t

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Division of Input into Typed Paragraphs
;;;
(begin
  (define wiki-paragraph-rules
    ;; (type test separator)
    ;;
    ;; Test is a structured regexp to be compiled in a large `(| ...)'
    ;; of all of the test expressions. The leftmost-longest matching
    ;; test expression determines the type of the first paragraph in a
    ;; given string.
    ;;
    ;; `(separator string)' returns a list: `(paragraph remaining-string)',
    ;; separating the first paragraph from the rest of the string.
    ;;
    ;; The `test' expression and `separator' procedure can safely assume
    ;; that the string is not empty, and does not begin with any blank lines.
    ;;
    `((:form-feed (& (* ([] blank)) "\f" (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:comment-line (& "%%%" (* ([^] "\n")))
       ,(lambda (s) (one-line-separator s)))
      (:rfc822ish (& (* ([] blank)) "+++" (* ([] blank)))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:title (& "!" (+ ([^] "\n")))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:card-boundary "\f---"
       ,(lambda (s) (one-line-separator s)))
      (:heading (& (* ([] blank))
                   (+ "*")
                   ([] blank)
                   ([^] ")#*\n")
                   (* ([^] "\n")))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:menu (& (* ([] blank))
                "-*-*-"
                (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:verbatim (& (* ([] blank)) "<<<" (* ([] blank)))
       ,(lambda (s) (verbatim-paragraph-separator s)))
      (:small-paragraph (& (* ([] blank)) "(((" (* ([] blank)))
       ,(lambda (s) (small-paragraph-separator s)))
      (:text-area (& (* ([] blank)) "?<<<" (* ([^] #\nl)))
       ,(lambda (s) (verbatim-paragraph-separator s)))
      (:one-line-verbatim (& "#" (* ([^] "\n")))
       ,(lambda (s) (one-line-separator s)))
      (:separator-line (& (* ([] blank)) "---" (* "-") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:horizontal-rule (& (* ([] blank)) "===" (* "=") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:poetry-start (& (* ([] blank)) "\\\\" (* ([] blank))
                        (? "//") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:i-poetry-start (& (* ([] blank)) "\\\\_" (* ([] blank))
                          (? (* "_") "//") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:ib-poetry-start (& (* ([] blank)) "\\\\__" (* ([] blank))
                           (? (* "_") "//") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:poetry-end (& (* ([] blank)) "//" (* ([] blank))
                      (? "\\\\") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:bullet-item (& (* ([] blank)) (+ "*") ")" (* ([^] "\n")))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:definitions-item (& (* ([] blank))
                            "+" (* ([^] "\n"))
                            (| (& ":::" (* ([^] "\n")))
                               (& "+" (* ([] blank)))))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:numbered-item (& (* ([] blank)) (+ "*") "#" (* ([^] "\n")))
       ,(lambda (s) (ordinary-paragraph-separator s)))
      (:poetry-display (& (* ([] blank)) "~~~" (* "~") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:display-separator (& (* ([] blank)) "###" (* "#") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:begin-block-quote (& (* ([] blank)) "```" (* "`") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:end-block-quote (& (* ([] blank)) "'''" (* "'") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:table-start (& (* ([] blank)) (+ "v") "===" (* ([^] "\n"))
                       (* "=") (* "v") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:table-end (& (* ([] blank)) (+ "^") "===" (* "=") (* "^")
                     (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:meta-x-form (& (* ([] blank)) "?M-x" (+ ([] blank))
                       (+ ([] alnum "-")) (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:form-start (& (* ([] blank)) (+ "v") "???" (* "?") (* ([^] "\n")))
       ,(lambda (s) (one-line-separator s)))
      (:form-end (& (* ([] blank)) (+ "^") "???" (* "?") (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:section-name (& (* ([] blank)) "::" (* ([^] "\n")) "::"
                        (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:note-name (& (* ([] blank)) "{" (* ([^] "}")) "}::" (* ([] blank)))
       ,(lambda (s) (one-line-separator s)))
      (:biblio-name (& (* ([] blank)) "[" (* ([^] "]")) "]::" (* ([^] "\n")))
       ,(lambda (s) (one-line-separator s)))
      (:include-topic (& (* ([] blank)) "@@" (* ([^] "\n")))
       ,(lambda (s) (one-line-separator s)))
      (:short-table (& (* ([] blank)) "|" (* ([^] "\n")) "|")
       ,(lambda (s) (one-line-separator s)))))
  (define identify-paragraph-type
    (let ((regexp (structured-regexp->procedure
                   `(^$ (| ,@(map (lambda (rule) `(! ,(car rule) ,(cadr rule)))
                                  wiki-paragraph-rules)))
                   :pick-spec '?)))
      (lambda (s)
        (if (string-null? s)
            :eof
            (let ((re-case (regexp (make-shared-substring
                                    s 0 (or (string-index s #\nl)
                                            (string-length s))))))
              (or re-case :regular-paragraph))))))
  (define ordinary-paragraph-separator
    (let ((pattern (structured-regexp->procedure
                    `(^$ (| (* ([] blank))
                            ,@(map (lambda (rule) `,(cadr rule))
                                   wiki-paragraph-rules)))
                    :cflags 'REG_NEWLINE
                    :eflags 'REG_NOTBOL
                    :pick-spec '(< (0 >)))))
      (lambda (s)
        (or (pattern s)
            (list s "")))))
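  (define verbatim-paragraph-separator
    (let ((pattern (structured-regexp->procedure
                    `(^ ">>>\n")
                    :cflags 'REG_NEWLINE
                    :pick-spec '((< 0) >))))
      (lambda (s)
        (or (pattern s)
            ;; No closing `>>>' line: the rest of the input is the
            ;; paragraph (terminator supplied) and nothing remains.
            (list (string-append s "\n>>>\n") "")))))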
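  (define small-paragraph-separator
    (let ((pattern (structured-regexp->procedure
                    `(^ ")))\n")
                    :cflags 'REG_NEWLINE
                    :pick-spec '((< 0) >))))
      (lambda (s)
        (or (pattern s)
            ;; No closing `)))' line: the rest of the input is the
            ;; paragraph (terminator supplied) and nothing remains.
            (list (string-append s "\n)))\n") "")))))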
  (define one-line-separator
    (structured-regexp->procedure `(& (* ([^] "\n")) (| "\n" ($ "")))
                                  :pick-spec '(0 >))))
;; (wiki-paragraph-lexer string)
;;
;; Return a procedure `lexer'.
;;
;; Calling lexer with no arguments returns successive paragraphs from `string'
;; in the form:
;;
;; (type paragraph)
;;
;; where `type' is one of the keywords from the table `wiki-paragraph-rules'
;; and `paragraph' a string.
;;
;; Passing a single true argument causes the same return value, but the
;; paragraph is not removed from the stream.
;;
(define (wiki-paragraph-lexer initial-string)
  (let* ((known-next-lexeme #f)
         (string "")
         (lexer-procedure
          (lambda (:optional peek?)
            (if (not known-next-lexeme)
                (set! known-next-lexeme
                      (wiki-lex-paragraph
                       (without-initial-blank-lines string))))
            (let ((answer (list (car known-next-lexeme)
                                (cadr known-next-lexeme))))
              (if (not peek?)
                  (begin
                    (set! string (caddr known-next-lexeme))
                    (set! known-next-lexeme #f)))
              answer))))
    (set! string initial-string)
    lexer-procedure))
(define without-initial-blank-lines
  (let ((ordinary-case (structured-regexp->procedure
                        `(^ (* (& (* ([] blank)) "\n")))
                        :pick-spec '>))
        (all-blanks? (structured-regexp->procedure
                      `(^$ (* ([] blank))))))
    (lambda (s)
      (cond
       ((string-index s #\nl) (ordinary-case s))
       ((all-blanks? s) "")
       (#t s)))))
(define (wiki-lex-paragraph string)
  (let* ((type (identify-paragraph-type string))
         (rule (or (assq type wiki-paragraph-rules)
                   (if (string-null? string)
                       `(:eof #f ,(lambda (s) (list "" "")))
                       `(:regular-paragraph
                         #f
                         ,(lambda (s)
                            (ordinary-paragraph-separator s)))))))
    (cons type ((caddr rule) string))))