| Right. It's an ASCII reader, and strings are ASCII strings. I'd have to
| go and change the low-level string and character reps before I could change
| the reader.
While there is some merit to making scheme-48 a fully Unicode-based
system, that wasn't what I was proposing (and, unfortunately, it is more
than I have time for). Rather, I was objecting to the fact that
read.scm limits the characters that it will read into a symbol, instead
of just accepting any characters that aren't reserved for something else.
Lifting that restriction would make scsh iso-latin-1 clean, which would be
enough to use UTF-8 (a multibyte encoding of Unicode) in symbol names, even
without teaching the scheme runtime anything about Unicode or changing
the representation of anything.
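
To make the idea concrete, here is a rough sketch in Python rather than Scheme. It is not the code in read.scm; the delimiter set and the name read_symbol are made up for illustration. It only shows the principle: end a symbol at a reserved byte instead of checking each byte against a list of allowed ones. Every byte of a multibyte UTF-8 sequence is 0x80 or above, so none of them can be mistaken for an ASCII delimiter.

```python
# Hypothetical sketch, not the actual read.scm logic.
# Bytes that end a symbol: whitespace plus characters the reader reserves.
# The exact set here is an assumption chosen for illustration.
DELIMITERS = frozenset(b" \t\n\r\f()\";'`,")


def read_symbol(data: bytes, start: int = 0) -> bytes:
    """Return the run of non-delimiter bytes beginning at `start`."""
    end = start
    while end < len(data) and data[end] not in DELIMITERS:
        end += 1
    return data[start:end]


# The reader only ever sees bytes; it never needs to know they are UTF-8.
source = "(define λx 1)".encode("utf-8")
name = read_symbol(source, 8)  # offset 8 skips the 8 bytes of "(define "
```

The symbol name comes back as an opaque byte string; only a program that displays it has to care that the bytes happen to be UTF-8.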
In order to support Unicode, Plan 9 uses both arrays of chars (bytes)
and arrays of Runes (16-bit characters). The system stores files as
UTF-8 chars, not as Runes, just as in Unix. That means that you always
need char*, and often need Rune*, but Scheme has only one character
type and one string type. Sounds complicated to fix.
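
The two views can be sketched in Python as an analogy only; this is not the Plan 9 API. Here bytes stands in for char* and a list of code points stands in for Rune*. The variable names are made up for illustration.

```python
# Analogy only: Python types standing in for Plan 9's char* and Rune*.
text = "λx"

utf8 = text.encode("utf-8")      # byte view, as stored in a file (like char*)
runes = [ord(c) for c in text]   # one integer per character (like Rune*)

# The two views disagree about length and indexing:
byte_count = len(utf8)           # 3, because λ takes two bytes in UTF-8
rune_count = len(runes)          # 2

# Both characters here fit in a 16-bit Rune.
fits_16_bit = all(r < 0x10000 for r in runes)
```

A language with a single string type has to pick one of these views, or hide the difference somewhere.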
|