Does anyone know the code for the Encode-symbol procedure for Exercise
2.68 from the Book "Structure and Interpretation of Computer Programs"
??? My codes are not running :-(
Basically, you have a tree structure of symbols which contains the
weights used while assigning positions in the tree to each symbol. You
can use that tree (as shown in the previous exercise), but it's easier
to think about it if you discard the weights and just position the
symbols:
(define tree (cons 'A (cons 'B (cons 'D 'C))))
Ignore the fact that tree is an improper list ('A 'B 'D . 'C) -- that
only confuses things because unlike many other applications, it is not
being used as a list. Here's a function for traversing the tree. Note
that it searches for a matching symbol while maintaining the path as a
list of 0s and 1s corresponding to the car and cdr branches of the tree.
(define (encode-helper symbol path tree)
(cond ((eq? symbol tree) (reverse path))
((not (pair? tree)) #f)
((encode-helper symbol (cons 0 path) (car tree)))
(else (encode-helper symbol (cons 1 path) (cdr tree)))))
Note that the path is maintained in reverse order because it's much easier
to cons the 0s and 1s to the front than to append them to the end, and
all but one of the paths developed during the search will be discarded.
The third test in the cond clause returns the test result as a
byproduct of not having any consequents to evaluate and the fact that
we're using #f as the method of returning no result.
(You could factor out the constant "symbol" from this by getting it
from an enclosing environment, but I'll leave that up to you, plus
the question of how you incorporate the result in the procedure
encode-symbol.)
(define (test symbol) (encode-helper symbol '() tree))
(test 'A)
(0)
(test 'B)
(1 0)
(test 'C)
(1 1 1)
(test 'D)
(1 1 0)
(test 'Z)
#f
It is also well-behaved with the null tree:
(encode-helper 'A '() '())
#f
You might be concerned that this is an inefficient way to write the
encoding algorithm, because of the search. However, search encounters
the stored symbols in order of frequency in the document being encoded,
so the search time per symbol is typically less than half the size of
the tree.
For small symbol spaces, such as the ASCII character set, an alternative
encoding method involves converting the tree into a vector containing the
sequences indexed by the numerical character code. For larger symbol spaces
hashing is a good approach. This makes the lookup time per symbol very fast,
which pays off for all but small documents.
Peter Olson
|