Tod-
You sent me timings around Thanksgiving reporting very slow read times for
scsh's delimited readers, even using 0.4.2's new native-code
implementation. Reading /usr/dict/words took 30 seconds.
I tried it on my PC, and also got very slow times. So I worked on it a bit
(as I've reported on the scsh newsgroup lately).
I got it from 35.32 seconds down to 20.97 seconds by hacking the optional
arg parsing and defaulting.
Here is my test (a little bit leaner than your test):
> (define (ltest r)
    (call-with-input-file "words"
      (lambda (p)
        (let lp ()
          (or (eof-object? (r p)) (lp))))))
> ,collect
Before: 513235 words free in semispace
After: 663373 words free in semispace
> ,time (ltest read-line)
Run time: 20.97 seconds
#t
So, OK, 41% improvement is good. But that's still pretty slow.
So I re-did the test using the low-level %READ-DELIMITED! routine:
> (define (foo)
    (let ((buf (make-string 1024))
          (cset (string->char-set "\n")))
      (call-with-input-file "words"
        (lambda (p)
          (let lp ()
            (receive (term nread) (%read-delimited! cset buf #t p)
              (or (eof-object? term) (lp))))))))
> ,collect
Before: 630710 words free in semispace
After: 663001 words free in semispace
> ,time (foo)
Run time: 13.10 seconds
#t
>
And then again with the even lower-level, direct internal interface to the C routine:
scsh-level-0-internals>
(define (bar)
  (let ((buf (make-string 1024))
        (cset (string->char-set "\n")))
    (call-with-input-file "words"
      (lambda (p)
        (let lp ()
          (receive (term nread)
                   (%read-delimited-fdport!/errno cset buf #t p 0 1024)
            (or (eof-object? term) (lp))))))))
scsh-level-0-internals> ,collect
Before: 627216 words free in semispace
After: 662758 words free in semispace
scsh-level-0-internals> ,time (bar)
Run time: 9.14 seconds
#t
scsh-level-0-internals>
So, I pretty much can't get below 9.14 seconds; that's the limit imposed
by the byte-code interpreter and my C FFI. It's interesting that the
Scheme-side overhead on top of that floor accounts for a little over half
of READ-LINE's run time -- about 12 seconds' worth (20.97 - 9.14). If S48's
byte compiler were better, this would improve somewhat, but, again, those
are the facts of byte-code life.
Just to depress you, I tried this one, too:
#include <stdio.h>

int main(void)
{
    char buf[1024];

    /* Read stdin a line at a time and throw each line away. */
    while (fgets(buf, sizeof buf, stdin) != NULL)
        ;
    return 0;
}
Which clocked in at
0.058u 0.024s 0:00.07 100.0% 44+226k 0+0io 0pf+0w
But what's two orders of magnitude for the sake of elegance and beauty?
Go buy yourself a P6 box.
-Olin