In article <qijd8vebe9x.fsf@lambda.ai.mit.edu>,
Olin Shivers <shivers@ai.mit.edu> wrote:
>OK, I'm convinced it's handy. But it's also tricky. I do not think it is
>trivial to do complement or intersection on RE's using the standard
>notation. Systems that work by translating to classical string notation can no
>longer do a simple translation. The only way that I can think of to do this is
>to (1) compute the NDFA for the regexp(s), (2) do the complement or
>intersection op on the NDFA, (3) run the state-minimisation algorithm on the
>result (because intersection make a big NDFA), (4) run the NDFA -> regexp
>algorithm. This is a mess! Very slow, too. Could you preserve submatches info
>across all these operations? Anybody volunteering to write the code?
I strongly recommend that you abandon any attempt to do intersection
or complement of regular languages. The reason is that intersection
can shorten the length of a regexp by an exponential amount, or
equivalently, grow the automaton by an expontential amount.
|