Hi all,
Erik Naggum wrote:
>
> * Charles Lin
> | Now, perhaps I completely misunderstood what Ousterhout did, but I was
> | under the impression that he said everything in Tcl (the external
> | representation) is a string. Why are we talking about internal
> | representations at all?
>
> because the internal representation in Tcl is the string.
This was true some time ago, but not anymore. See below for a more complete
explanation.
> | Internal representations aren't going to be important because they are
> | hidden from the user (mostly).
>
> I think you're missing the point all over the place. Tcl's purpose is to
> glue different applications together, but also to do something with the
> returned "values" (i.e., output) of these programs. of course we're
> dealing with interfaces where external representations will be used, but
> Tcl uses the same external representations for internal purposes, too.
> when you execute a numeric operation, the string is converted to a number,
> the operation is performed, and then converted back to a string. in
> contrast to this view, AWK converts a string to a number when it is used
> for a numeric purpose, and a number to a string when it is used for a
> string purpose, but not otherwise. (at least, this is what I have
> understood from what I have read about Tcl.)
Same as above.
> | The point is that (almost) no one wants to write a program whose
> | *external* representation is numbers.
>
> nobody has suggested they do, and nobody in their right mind would.
Earlier discussion in this thread was rather ambiguous on this point. Some
suggested using lists (as in Lisp). Others suggested numbers, and gave a way
to represent strings as numbers. But nobody wants to code in numbers (as an
external representation)! Anyway, to be a little extreme, I could claim
that there's a bijection between all data representations, since
everything in a computer is a vector of _bits_ ;-) That's just to caricature
earlier arguments. Enough on that point.
> | Strings are more flexible, if they're the only type you have, as external
> | representation.
>
> but external representations _aren't_ typed!
They are "semantically typed": see below.
> I'd venture that that's the whole point in using an _untyped_ string (or
> even byte-sequence) representation is that whoever reads it would be able
> to interpret it, indeed that type would emerge from the interpretation.
Yes, that's the point.
Ok. Now, why did JO choose strings as the external representation?
Let me remind everybody that Tcl stands for Tool Command Language, and was
intended to glue apps together. JO chose an external interface similar to
the main() function of C programs, and THAT is the great point of Tcl: this
kind of interface is well known and widely used, since every C program HAS
to use it as its entry point. IMHO this was the best choice of external
interface for a glue language, and I don't think anyone can argue against
that. JO didn't choose lists or numbers, not because they are bad or
whatever, but because command-line arguments are strings: strings are the
only data structure common to every programming language _as an external
interface_, and there is (nearly) always a string representation of any
data structure. Performance considerations aside, you can represent strings,
numbers, lists, trees, and so on as Tcl lists, provided you use proper
conventions.
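To make the "proper conventions" point concrete, here is a small C sketch that writes a binary tree as a nested, brace-delimited list string. The "value {left} {right}" layout, with {} for an empty subtree, is an assumed convention chosen for illustration (Tcl does not mandate it), and struct node, append and tree_to_list are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binary-tree node used only for this illustration. */
struct node {
    int value;
    struct node *left, *right;
};

/* Appends `text` to the NUL-terminated buffer; returns -1 if it won't fit. */
static int append(char *buf, size_t cap, const char *text) {
    size_t len = strlen(buf);
    int w = snprintf(buf + len, cap - len, "%s", text);
    return (w < 0 || (size_t)w >= cap - len) ? -1 : 0;
}

/* Writes `n` as "value {left} {right}" (assumed convention); an empty
   subtree becomes {}. buf must start out as a valid C string. */
static int tree_to_list(const struct node *n, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    if (n == NULL)
        return 0;                 /* empty subtree: the caller supplies {} */
    char num[32];
    snprintf(num, sizeof num, "%d {", n->value);
    if (append(buf, cap, num) != 0) return -1;
    if (tree_to_list(n->left, buf, cap) != 0) return -1;
    if (append(buf, cap, "} {") != 0) return -1;
    if (tree_to_list(n->right, buf, cap) != 0) return -1;
    return append(buf, cap, "}");
}
```

For a root 1 with leaves 2 and 3 this produces 1 {2 {} {}} {3 {} {}}, which Tcl's own list commands can take apart again: lindex on index 1 gives the left subtree "2 {} {}".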
Tcl's first intent was to glue programs together. Thus, one program could
be written in Fortran, another in C, or even in Lisp or anything else. Each
program then had to export its external interface as Tcl commands, which is
very easy since it's like writing a main() function. Then you wrote the Tcl
script that glued things together (calling the programs via their external
interfaces, passing string arguments and getting back string results).
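A rough C sketch of such a main()-style entry point: every argument comes in as a string and the result goes back as a string. add_cmd and its result-buffer convention are hypothetical; this is not the real Tcl C API, which defines its own command-procedure signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical glue command in the argc/argv style of main(): arguments
   arrive as strings and the result is handed back as a string. Returns 0
   on success (result holds the sum) or 1 on error (result holds a message). */
static int add_cmd(int argc, char *argv[], char *result, size_t cap) {
    assert(result != NULL && cap > 0);
    long sum = 0;
    for (int i = 1; i < argc; i++) {         /* argv[0] is the command name */
        char *end;
        errno = 0;
        long v = strtol(argv[i], &end, 10);  /* string -> number */
        if (errno != 0 || end == argv[i] || *end != '\0') {
            snprintf(result, cap, "expected integer but got \"%s\"", argv[i]);
            return 1;
        }
        sum += v;                            /* overflow ignored in this sketch */
    }
    snprintf(result, cap, "%ld", sum);       /* number -> string */
    return 0;
}
```

Called with the strings "add", "2" and "40", it leaves "42" in the result buffer; with "x" as an argument it returns 1 and an error message instead.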
Tcl was not intended to act as a programming language by itself. But in
fact many Tcl apps are Tcl-only, because Tcl provides enough primitives for
that. And since everything in Tcl is a string, there's a huge overhead if
the whole program is written in Tcl, while there's little overhead if Tcl
only acts as glue.
Thus, strings are the only external interface in Tcl, and until some months
ago they were the only internal representation as well. Procedures written
in Tcl were strings. Numbers were strings. Lists were strings. And so on.
This wasn't a big deal when Tcl was used as a glue language, but Tcl was
quite slow when used as a stand-alone programming language.
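The cost of the string-only model can be sketched by counting conversions for the small counting loop discussed later in this post (set i 0, then loop while $i < 50, doing incr i and puts $i), using nothing but strings. This is a toy model that ignores the cost of parsing the script itself; string_only_loop is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model of the string-only representation: the variable's value is
   only ever a string, so every numeric use parses it and every update
   formats it again. Runs "set i 0; while {$i < limit} {incr i; puts $i}"
   (output discarded) and returns how many string<->number conversions
   were needed. */
static long string_only_loop(long limit, char *var, size_t cap) {
    assert(var != NULL && cap >= 2);
    long conversions = 0;
    snprintf(var, cap, "0");                 /* set i 0 */
    for (;;) {
        long cur = strtol(var, NULL, 10);    /* $i < limit: parse */
        conversions++;
        if (!(cur < limit))
            break;
        cur = strtol(var, NULL, 10);         /* incr i: parse again... */
        snprintf(var, cap, "%ld", cur + 1);  /* ...and format the result */
        conversions += 2;
        /* puts $i can print var as-is: no conversion */
    }
    return conversions;
}
```

With a limit of 50 this comes to 151 conversions: three per iteration plus the final, failing comparison.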
So, AFAIK because of the increasing number of Tcl-only projects, the Tcl
team decided to change the internal representation of data. This project
had been planned for a long time, and several attempts had been made
before; the main challenge was to keep the "everything is a string"
semantics of Tcl from an external point of view, especially regarding Tcl's
metaprogramming and dynamic capabilities (along with the pass-by-name stuff).
So, the current version of Tcl (version 8.0, still in alpha but soon in
beta; first public release in Dec 1996) introduces several changes:
- The internal representation is now "objects" (not in the usual OO sense).
An "object" still has an external string representation, but internally it
can also be a string, a list, an integer, program bytecode... And the
internal representation is cached, so it can coexist with the string one.
- Formerly, Tcl strings were C strings, which prevented Tcl from processing
binary strings/data because of the terminating null char. Now, Tcl strings
know their length and can contain arbitrary characters.
- Formerly, Tcl lists were space-separated strings, with {}'s for nesting,
so element access was O(n). Now, lists are _real_ lists and access is O(1).
Strings are "aggressively" parsed, and although they remain strings from
an external point of view, they can be used as lists and vice versa. This
results in a major speedup of list commands, and lists can now be used as
compound types, for which arrays (backed by hash tables) were formerly used
for performance reasons.
- Data is now "semantically typed": the string "1" is first a string, then
an integer representation is generated if it's used, e.g., in a math
expression. "my name is Fred" is a string that is turned into a list when
list commands are used on it. "puts foo" is a string that can be turned
into bytecode if it's eval'd, and similarly into a list. And so on.
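The list change described above can be illustrated with a simplified C sketch: indexing a list kept only as a string means rescanning it from the start, while a parsed, cached array gives constant-time access. The helpers are hypothetical and split on whitespace only, ignoring Tcl's brace and quote rules, so this is not how Tcl actually parses lists.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Old style (sketch): the list is only a string, so finding element idx
   means scanning from the start, O(n). Braces and quoting are ignored. */
static int string_lindex(const char *list, int idx,
                         const char **start, size_t *len) {
    assert(list != NULL && start != NULL && len != NULL);
    const char *p = list;
    for (int i = 0;; i++) {
        while (isspace((unsigned char)*p)) p++;   /* skip separators */
        if (*p == '\0') return 0;                 /* index out of range */
        const char *s = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        if (i == idx) {
            *start = s;
            *len = (size_t)(p - s);
            return 1;
        }
    }
}

/* New style (sketch): parse the string once into an array of elements,
   keep that as the cached list representation, then index in O(1). */
struct list_rep {
    int count;
    char **elems;
};

/* Returns 1 on success, 0 on allocation failure (call free_list either way). */
static int parse_list(const char *list, struct list_rep *out) {
    out->count = 0;
    out->elems = NULL;
    const char *p = list;
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') return 1;
        const char *s = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        size_t len = (size_t)(p - s);
        char **grown = realloc(out->elems, (size_t)(out->count + 1) * sizeof *grown);
        if (grown == NULL) return 0;
        out->elems = grown;
        char *copy = malloc(len + 1);
        if (copy == NULL) return 0;
        memcpy(copy, s, len);
        copy[len] = '\0';
        out->elems[out->count++] = copy;
    }
}

static const char *list_index(const struct list_rep *l, int idx) {
    return (idx >= 0 && idx < l->count) ? l->elems[idx] : NULL;   /* O(1) */
}

static void free_list(struct list_rep *l) {
    for (int i = 0; i < l->count; i++) free(l->elems[i]);
    free(l->elems);
    l->elems = NULL;
    l->count = 0;
}
```

The string stays the external form in both cases; only the second keeps a parsed copy around so repeated indexing does not rescan it.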
For example:

    set i 0
    while {$i < 50} {
        incr i
        puts $i
    }
* variable i is set to the string "0".
* the "while" command is called with 2 arguments: the strings "$i < 50" and
"\n    incr i\n    puts $i\n" (the braces only quote them). while then
evaluates the first one as a boolean expression, which generates an integer
representation of i that is used in the comparison. If the result is true,
it evaluates the second argument: the bytecode compiler generates a bytecode
representation of it, which is then executed. "incr i" adds 1 to the value
of i; since i already has an integer representation, that is used instead
of the string representation. "puts $i" prints the string representation of
i, but since i has changed, its string representation has been invalidated,
and a new one is generated from the current integer representation (well, I
think so). Then the expression {$i < 50} is tested again, the loop body is
re-evaluated, and so on.
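The walkthrough above can be mimicked with a toy C object holding a string representation plus a cached integer one, with lazy regeneration of the string after an update. It is only loosely in the spirit of the Tcl 8.0 change: it is not the real Tcl_Obj, it ignores bytecode entirely, and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy value with two representations that can coexist. */
struct obj {
    char bytes[32];     /* string representation */
    int string_valid;   /* 0 once the integer has been changed */
    long int_rep;
    int int_valid;      /* 1 once the string has been parsed as an integer */
};

static long conversions;   /* string<->int conversions performed */

static void obj_set_string(struct obj *o, const char *s) {
    size_t n = strlen(s);
    assert(n < sizeof o->bytes);
    memcpy(o->bytes, s, n + 1);
    o->string_valid = 1;
    o->int_valid = 0;          /* any cached integer is now stale */
}

static long obj_get_int(struct obj *o) {
    if (!o->int_valid) {       /* parse only on first numeric use */
        o->int_rep = strtol(o->bytes, NULL, 10);
        o->int_valid = 1;
        conversions++;
    }
    return o->int_rep;
}

static void obj_set_int(struct obj *o, long v) {
    o->int_rep = v;
    o->int_valid = 1;
    o->string_valid = 0;       /* invalidate; regenerate lazily */
}

static const char *obj_get_string(struct obj *o) {
    if (!o->string_valid) {
        int w = snprintf(o->bytes, sizeof o->bytes, "%ld", o->int_rep);
        assert(w >= 0 && (size_t)w < sizeof o->bytes);
        o->string_valid = 1;
        conversions++;
    }
    return o->bytes;
}

/* set i 0; while {$i < limit} { incr i; puts $i }  (output discarded) */
static long cached_loop(long limit, struct obj *i) {
    conversions = 0;
    obj_set_string(i, "0");
    while (obj_get_int(i) < limit) {
        obj_set_int(i, obj_get_int(i) + 1);   /* incr i uses the cached int */
        (void)obj_get_string(i);              /* puts $i needs the string */
    }
    return conversions;
}
```

In this toy model the 50-iteration loop needs 51 conversions: one parse of "0" on the first comparison, then one string regeneration per iteration for puts, while the comparisons and incr reuse the cached integer.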
Note that only the first iteration is slightly slower than in previous Tcl
versions, due to the extra computations; the following iterations are much
faster. So if you use Tcl as a glue language, the speed difference is not
significant, but if you use it as an extension language or a standalone
language, there's a significant speed boost.
So, I think that the "string is bad as an external representation" argument is
irrelevant. Lisp has lists. Prolog has predicates. And so on. I think that the
only relevant argument is what you can do with the language (which has something
to do with the internals), and not which external representation you use.
Last point: there _are_ several implementations of Tcl. Ioi Lam is working on
a promising Tcl interpreter written in Java, called Jacl (Jackal). Its internal
structures will be arbitrary Java objects. See:
http://simon.cs.cornell.edu/home/ioi/Jacl/
So the whole "Tcl is bad because everything is a string" argument is
irrelevant. If an implementation of Tcl uses strings as its internal
representation, that's a problem of that implementation, not of Tcl.
Sorry for the length of this message, but many things that were said about
Tcl (e.g. "Tcl is a glorified preprocessor") made me wonder which language we
were talking about. Sometimes it came a bit too close to Tcl-bashing on
irrelevant points. There have been major changes in Tcl since its first
release. I hope I refreshed some people's memories.
See you, Fred
--
Frederic BONNET fbonnet@irisa.fr
Ingenieur Ecole des Mines de Nantes/Ecole des Mines de Nantes Engineer
IRISA Rennes, France - Projet Solidor/Solidor Project
------------------------------------------------------------------------
Tcl: can't leave | "Theory may inform but Practice convinces."
$env(HOME) without it! | George BAIN
|