scsh-users

Re: help with regexps

To: scsh-users@scsh.net
Subject: Re: help with regexps
From: Michel Schinz <Michel.Schinz@epfl.ch>
Date: Tue, 12 Oct 2004 10:50:04 +0200
List-id: <scsh-users.list-id.scsh.net>
Sender: news <news@sea.gmane.org>

Andreas Bernauer wrote:

> Hello,

Hi Andreas,

> can anybody help me with this?  Why does using tmprx result in an
> error, but using tmp2rx works?

To me, this looks like a bug, possibly in simplify-regexp (see below).
First, some comments on what you wrote.

> Top level
> ;;
> > (fold (lambda (choice regexp)
>           (rx (| (submatch ,choice) ,regexp)))
>         (rx eos bos) ;; the regexp that never
>                      ;; matches anything

I would suggest that you use the ADT creation functions to create what
you want. Something like this:

  (re-choice (map (lambda (s) (make-re-submatch (re-string s)))
                  '("yes" "no")))

This way, you do not need a regexp that never matches anything.

To come back to your problem, a nice way of seeing what is going on is
to use the regexp->sre function, as follows:

----------------------------------------------------------------------
> (define ab-rx (fold (lambda (choice regexp)
                        (rx (| (submatch ,choice) ,regexp)))
                      (rx eos bos)
                      '("yes" "no")))
> (regexp->sre ab-rx)
'(| (submatch "no") (| "yes" (: eos bos)))
----------------------------------------------------------------------

Here you can see that the submatch around "yes" was dropped when the
regexp was included with the comma, as you guessed. Now for the reason
why I think there is a bug:

----------------------------------------------------------------------
> (simplify-regexp ab-rx)

Error: exception
       wrong-type-argument
       (checked-record-ref '#{Re-choice} '#{Record-type 48 re-seq} 1)
1> 
----------------------------------------------------------------------

Here I get the same error as you did. Now, if I do a round-trip with
regexp->sre and sre->regexp (the composition of which should be the
identity function, as I understand it), it works:

----------------------------------------------------------------------
> (simplify-regexp (sre->regexp (regexp->sre ab-rx)))
'#{Re-choice}
> (regexp->sre ##)
'(| (submatch "no") "yes" (: eos bos))
----------------------------------------------------------------------

But it's still not what you want. What I proposed above works, though:

----------------------------------------------------------------------
> (re-choice (map (lambda (s) (make-re-submatch (re-string s)))
                  '("yes" "no")))
'#{Re-choice}
> (regexp->sre ##)
'(| (submatch "yes") (submatch "no"))
----------------------------------------------------------------------

I find it a little strange that you want each alternative to be
wrapped in its own submatch, though. Are you sure that you do not
want to wrap the whole "or" part in a single submatch instead? If
that is what you want, you should use the following:

----------------------------------------------------------------------
> (make-re-submatch (re-choice (map re-string '("yes" "no"))))
'#{Re-submatch}
> (regexp->sre ##)
'(submatch (| "yes" "no"))
----------------------------------------------------------------------
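
To make the difference between the two variants concrete, here is a
minimal, untested sketch of how each one could be used for matching.
It assumes the standard scsh procedures regexp-search and
match:substring; the names yes-no-rx, answer-rx, which-answer and
answer-text are made up for this example. With one submatch per
alternative, the submatch index tells you which alternative matched;
with a single submatch around the choice, you just get the matched
text.

----------------------------------------------------------------------
;; Variant 1: one submatch per alternative (hypothetical name).
(define yes-no-rx
  (re-choice (map (lambda (s) (make-re-submatch (re-string s)))
                  '("yes" "no"))))

;; Returns a symbol naming the alternative found in STR, or #f.
;; Assumption: match:substring returns #f for a submatch that did
;; not take part in the match.
(define (which-answer str)
  (let ((m (regexp-search yes-no-rx str)))
    (and m
         (cond ((match:substring m 1) 'yes)
               ((match:substring m 2) 'no)
               (else #f)))))

;; Variant 2: a single submatch around the whole choice.
(define answer-rx
  (make-re-submatch (re-choice (map re-string '("yes" "no")))))

;; Returns the matched text ("yes" or "no"), or #f if STR contains
;; neither.
(define (answer-text str)
  (let ((m (regexp-search answer-rx str)))
    (and m (match:substring m 1))))
----------------------------------------------------------------------

If you only need the matched text, the second variant is simpler; the
first is useful when you want to branch on which alternative matched
without comparing strings afterwards.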

HTH,
Michel.
