scsh-users

Re: Object IDs are bad (nuh-uh!)

To: scsh-news@martigny.ai.mit.edu
Subject: Re: Object IDs are bad (nuh-uh!)
From: wilson@cs.utexas.edu (Paul Wilson)
Date: 25 Apr 1997 03:15:09 -0500
Organization: CS Dept, University of Texas at Austin
In article 
<wujsp0gz0y5.fsf_-_@wistaria.i-have-a-misconfigured-system-so-shoot-me>,
Peter Ludemann  <ludemann@inxight.com> wrote:
>
>Absolutely correct and correctly so.
>
>If two things LOOK the same, then they ARE the same.  It's called
>"referential transparency" and it's a Good Thing, as opposed to
>"pointers" which are a Bad Thing.  Why confuse your life with "object
>identity"?

Confusing pointers with addresses is a Bad Thing.  A pointer is a
language-level thing, and an address is an implementation-level
thing.  The mapping between them is often not simple.

>And why do you want to use the memory address for object identity to
>identify it --- an address can change during garbage collection.

Are there any garbage collectors that DON'T deal with this?  First off,
it's not an issue in non-copying GC, and copying GC's are highly
overrated.

As far as I know, every copying GC already deals with the issue of
updating addresses consistently to ensure that the pointer abstraction
is maintained at the language level.  And this isn't really much of an
added cost---even in pure, lazy FP languages you want to maintain
the reachability graph for efficiency reasons at the implementation
level.  (E.g., you don't want to turn a general graph of
unreduced combinators or closures into a tree, because then
memoization of lazily computed values may be defeated.)
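
The sharing point can be illustrated with a small Python sketch.  It is
not from the original post: Thunk, expensive and the calls counter are
names invented here, and shallow copying merely stands in for whatever
would flatten a shared graph into a tree.

```python
import copy

class Thunk:
    """A lazily computed value that remembers its result once forced."""
    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

calls = []  # one entry per real evaluation (illustrative counter)

def expensive():
    calls.append(1)
    return 42

# Graph: two edges point at the SAME unreduced node.
shared = Thunk(expensive)
graph = [shared, shared]
for node in graph:
    node.force()
graph_evals = len(calls)   # 1: the second force reuses the memoized value

# Tree: each edge gets its own copy of the unreduced node.
calls.clear()
fresh = Thunk(expensive)
tree = [copy.copy(fresh), copy.copy(fresh)]
for node in tree:
    node.force()
tree_evals = len(calls)    # 2: the copies cannot see each other's result
```

With only two edges the waste is one extra evaluation; with sharing at
every level of a deep structure the duplicated work can grow much faster.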

>not use a time stamp of object creation?  Oh, it's for efficiency
>reasons, you say ... ah-ha!
>
>But let's suppose that you really really really do need to identify a
>particular subtree.  In other words, you want to NAME it.  No problem:
>just create a dictionary (hash table) that maps names to subtrees.

Wow.  You think it's a good idea to add a *hashing* cost to what's
conceptually a pointer traversal?  And to make more work for either
the programmer or the GC?  (Adding extra indexes to data structures
can cause retention of data that will never be used again.  If you
use plain tables, the programmer has to remember to remove the table
entries when the corresponding objects die---whenever that is;  if
you use weak tables understood by the GC, it adds significant
overhead.)
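
A rough Python sketch of the retention problem follows.  It is an
illustration under assumptions, not anything from the thread: Subtree
and make_named are invented names, and the prompt disappearance of the
weak entry relies on CPython's reference counting (other implementations
may reclaim it later).

```python
import gc
import weakref

class Subtree:
    def __init__(self, label):
        self.label = label

plain_names = {}                             # ordinary table: holds strong references
weak_names = weakref.WeakValueDictionary()   # table the collector understands

def make_named(name, table):
    """Create a node and register it under a name (hypothetical helper)."""
    node = Subtree(name)
    table[name] = node
    return node

a = make_named("a", plain_names)
b = make_named("b", weak_names)

# The program is finished with both nodes.
del a, b
gc.collect()

retained_in_plain = "a" in plain_names   # True: the table keeps the node alive
retained_in_weak = "b" in weak_names     # False on CPython: entry went with the node
```

In the plain-table case someone has to remember to run
`del plain_names["a"]` at the right moment; in the weak-table case the
runtime does the bookkeeping, which is where the extra GC work comes from.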

>That'll let you have two differently named entries which might happen
>to have the same values.  And it won't expose pointers.  And it'll be
>efficient.

Where's the win?  If my language has plain old pointers, it can
efficiently support the abstraction of object identity.  (It gets
harder in distributed systems, but still...)

In many cases, cobbling up your own notion of object identity is extra
hassle for the programmer, and quite inefficient to boot.

>Repeat after me: "if two things look the same and act the same, then
>they are the same".  Don't depend on some hidden property (pointer) to
>differentiate them.  If there are important differences, then don't be
>shy: bring them out in the open and NAME them.

Try repeating this: if identical twins are indistinguishable to me, 
then they must be the same person, and I don't need to distinguish 
between them.  Does that seem right?

The reason for object identity is that the identity of a language-level
object allows you to distinguish between objects known to represent the 
same conceptual object and objects which are not known to represent
the same conceptual object (but are otherwise indistinguishable,
given the attributes you've recorded).  Sometimes you can make
a safe closed-world assumption, in which case a pointer comparison
makes exactly the distinction between sameness and difference
of what the program-level objects represent.
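
As a minimal sketch of that distinction in Python (the Person class and
its two fields are made up for illustration; `==` compares the recorded
attributes, `is` compares identity):

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Only the attributes we happened to record; the generated __eq__
    # compares exactly these fields.
    surname: str
    birth_date: str

twin_a = Person("Smith", "1970-01-01")
twin_b = Person("Smith", "1970-01-01")
alias = twin_a                      # a second reference to the same object

looks_same = twin_a == twin_b       # True: indistinguishable by recorded attributes
same_person = twin_a is twin_b      # False: two distinct objects
alias_same = alias is twin_a        # True: known to be the same object

# Identity is what tells us whose record an update applies to.
twin_a.surname = "Jones"
alias_sees_change = alias.surname == "Jones"     # True
twin_b_unchanged = twin_b.surname == "Smith"     # True
```

The attribute comparison says the twins look alike; only the identity
comparison records that they are known to be two different people.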

This comes up most clearly when you're representing knowledge about
the real world, but it also comes up in the internals of programs.
When conceptual object identity matters, and encodes useful
knowledge, then having object identity in the language can 
often be very useful.

>[I once took an object-oriented database that used object-IDs and
>translated it to a relational form which just used names for things;
>performance improved by about an order of magnitude.  But that's
>another rant for another day ...]

This anecdote doesn't mean much without a lot more information.  I
know of programs that run a hundred to a thousand times faster in
some OODBs than in most commercial database systems, and I know
*why*.  For some things, a relational (value-oriented) language
works great, because the limitations of the model make life
easy for query optimizers.  In other cases, there are awkward
problems that no query optimizer in the world can optimize much,
and using a relational database where you need pointer semantics
just adds orders of magnitude of overhead.

Please don't insult our intelligence by saying "repeat after me:"
followed by something simplistic and poorly-argued.  It just acts
as flame bait.  The issue of object identity is an important
and deep one, intertwined with the meaning of "meaning".  There
are good arguments for it and some good arguments against it,
too.  This is not an area where simplistic maxims are useful.

Followups have been directed to comp.lang.misc.

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/) 
     
