From: email@example.com (Paul Wilson)
Date: 2 Jan 1996 15:49:29 -0600
I think the evolution of natural language has interesting lessons to teach
about programming language design. ... Another lesson is
that if you don't *design* the language well, it will be a painful
kludge that will cause unnecessary suffering for a very long time.
So what do we make of this? What useful lessons can we learn from
Perl and common shell languages, without having a Scheme-based shell
degenerate into a whole bunch of syntactic quirks?
What do we look for in a syntax for a shell language, including one
that's typed at an interactive prompt? I think the parenthophobes
hold sway, rightly or wrongly, and not *entirely* wrongly.
My answer is twofold:
- Exploit the general s-expression framework of Scheme to embed targeted
sublanguages, such as loop expressions, Awk control structure, or
process-control notation, within a general-purpose language. This
way, you can "pay as you go" for your syntactic quirks, as long as
you are willing to live within the parenthesis paradigm.
- Design a simple, terse, non-parenthesized notation for interactive
use, but provide escapes in this notation to full-blown Scheme.
Now simple pipes, redirections, program invocation, job control,
and so forth are as easy as in csh, but if you want something even
as complex as a conditional or a loop -- escape to Scheme.
Tcl has horrid rules for when things are evaluated and
when they are not, with the wrong defaults. (As in many shell languages,
you have to force evaluation of a variable, because the default is that
things are not evaluated. I prefer Lisp and Scheme's default, which
is that things *are* evaluated, which would be a problem in that lots
of literals would have to be quoted, except that many common literals
are "self evaluating." Apparently Ousterhout and most shell language
designers have never caught on to self-evaluating literals, because
they never learned Lisp.) So you end up with things like [set a [expr $a+1]].
I firmly believe that Scheme is a better basis for such things, if
we come up with some handy syntax that can easily be explained.
Actually, I think the literal-by-default choice is better adapted to shells
than the variable-by-default choice. Most tokens in my interactive
commands -- and even in my /bin/sh shell scripts -- are intended as
constant file-names, program switches, or arguments. If you notice,
I *kept* this design decision in scsh. You have to use comma in the
process notation to get a variable:
    (if (zero? (run (mail ,user) (<< ,msg))) ...)
Scheme is cool because I'm allowed two notations here. Outside the RUN
form, we have Scheme, where bare symbols like ZERO? are variables. Inside
the RUN form, we have process notation, where bare symbols like MAIL are
constants used to search the file system for programs. When I'm using
process notation, I have to mark my vars with commas: ,USER and ,MSG.
Each notation is adapted for its intended use.
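
A slightly larger sketch of the same two-notation idea, assuming scsh's RUN
form and its | pipeline notation. The variables DIR and PATTERN are
hypothetical names chosen only for illustration:

```scheme
;; Sketch only, assuming scsh. DIR and PATTERN are hypothetical names.
(define dir "/tmp")
(define pattern "core")

;; Outside RUN this is ordinary Scheme: IF and ZERO? are evaluated as usual.
;; Inside RUN, ls and grep are constants naming programs; ,dir and ,pattern
;; are Scheme variables, marked with comma.
(if (zero? (run (| (ls ,dir)
                   (grep ,pattern))))
    (begin (display "found a match") (newline))
    (begin (display "no match") (newline)))
```

Nothing inside the RUN form needs quoting except the two variables, which is
the literal-by-default trade-off described above.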
1. Can we come up with a command syntax that's nice and terse for
scripting purposes, but doesn't make it hard to write larger
routines? Ideally, I'd like a syntax that works for both
scripting and "real" programming, so I don't have to switch
syntaxes. (Naturally, there would be no problem of interoperability
of code written in different Scheme syntaxes---it's just sugar.)
I claim scripting is real programming. Do this. Do that. Branch. Loop.
It's not worth trying to tease these two apart; they blur into each other.
2. How do we distinguish between calls to built-in or user-defined
Scheme functions and commands that are sent to programs outside
Scheme? For scsh, you currently have to explicitly indicate
that you want to run a UNIX program by using a (run ...) form
or whatever. I think this may be the right default for
nontrivial programs, but for many scripts it's awkward. Somebody
(John Ellis?) did a Lisp shell several years ago, where any
unbound function name was assumed to be an indication that
the programmer meant to call a UNIX program. So, for example,
if rm was bound, (rm foo) meant call the rm procedure, but if
rm was unbound, the shell would look for a UNIX program named
rm, and call it with the argument foo.
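
The name-resolution rule described here can be sketched in plain Scheme. This
is a hypothetical illustration only: DISPATCH, BINDINGS, and the RUN-PROGRAM
tag are invented names, an association list stands in for the shell's
environment, and none of it is taken from the Lisp shell or from scsh:

```scheme
;; Hypothetical sketch of the "unbound name means UNIX program" rule.
;; BINDINGS is an association list standing in for the shell's environment.
(define (dispatch name args bindings)
  (let ((entry (assq name bindings)))
    (if entry
        (apply (cdr entry) args)            ; bound: call the Scheme procedure
        (list 'run-program name args))))    ; unbound: hand off to a program

;; Here only rm is bound, to a stand-in procedure that tags its argument.
(define bindings
  (list (cons 'rm (lambda (f) (list 'scheme-rm f)))))

(dispatch 'rm '(foo) bindings)   ; rm is bound, so the procedure is called
(dispatch 'ls '(foo) bindings)   ; ls is unbound, so it falls through
```

Under this rule the meaning of (rm foo) depends on whether rm happens to be
bound at the time.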
Argh. This is the fundamental error of just about everyone who tries the
functional-language-Unix-shell game. I discuss this at length in my paper on
scsh. (See the section on the Tao of Scheme and Unix.) The two paradigms do not
match semantically, so trying to match them up syntactically is bound to
lose. (BTW -- if you want to track down the Ellis paper Paul mentions above,
my scsh paper cites it.)