Keywords, Lists, C, Smalltalk, Io -- My misinterpretations of life -- Michael Lucas-Smith

2008-07-31

I like the Io language; I think it's really neat, but they didn't include keyword selectors. A few years ago, when I was talking to the creator of Io about this, we began to pontificate on the nature of arguments to functions.

The C world suggests you have an argument list, and this is a pervasive idea because it maps really well to a stack-based machine, which is pretty much all we use. In Smalltalk, we use the selector (the method name) to hint at what the parameters are and to give their order some semantic meaning. Ask any Smalltalker and they'll tell you what a godsend this technique, keyword selectors, is.

Io doesn't have keyword selectors, and one advantage of that is that it's easier to call C code and other external interfaces, which don't have keyword selectors either. In Smalltalk, we end up with external calls like glFrustum:with:with:with:with:with:, which is... well... horrible.

So I began to wonder: what if function calls didn't have any arguments at all? What if we did away with keyword selectors and argument lists entirely, and instead examined the whole 'implicit-self' paradigm in a new way?

Consider the following Smalltalk method:

peopleA with: peopleB do: [:a :b | ...]

with: aCollection do: aBlock
| stream |
self size = aCollection size ifFalse: [^self noMatchError].
stream := aCollection readStream.
self do: [:each | aBlock value: each value: stream next]

In this method we have access to three variables, aCollection, aBlock and stream, which all live on the stack in one form or another. Now if we rewrite the method with implicit-self:

with: aCollection do: aBlock
| stream |
size = aCollection size ifFalse: [^noMatchError].
stream := aCollection readStream.
do: [:each | aBlock value: each value: stream next]

We can see there's very little difference. But what if we changed the way we invoke methods, so that all methods are verbs that take no arguments per se, and the caller instead sets up the stack object ahead of time, e.g.:

people do(with: aCollection, block: [:a :b | ...])

do(with, block)
| stream |
size = with size ifFalse: [^noMatchError].
stream := with readStream.
do(block: [:each | block(a: each, b: stream next)])

I've jumped a few steps forward here. What's going on? Well, the () syntax is a block of code that will run to set up the stack of the method we're about to call. In fact, its "self" object is the stack, so when we say block:, we're assigning the "block" slot on the stack object, so that when the #do method runs, it has that slot filled in.

Next, I've made a variant of the #do method that expects there to be a "with" and a "block" slot on our stack, which is sort of like method overloading.
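To make the calling convention concrete, here is a minimal Python sketch, assuming a stack object can be modelled as a plain dict: the caller fills in the slots, and the variant of #do whose expected slots match is the one that runs. The names `Stack`, `do_block`, `do_with_block`, `DO_VARIANTS` and `send_do`, and the exact-match dispatch rule, are hypothetical choices made for illustration.

```python
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical model: a stack object is a dict of slot name -> value.
Stack = Dict[str, Any]

def do_block(receiver: list, stack: Stack) -> None:
    # Variant of #do that expects only a "block" slot.
    for each in receiver:
        stack["block"](each)

def do_with_block(receiver: list, stack: Stack) -> None:
    # Variant of #do that expects "with" and "block" slots.
    other = stack["with"]
    if len(receiver) != len(other):
        raise ValueError("noMatchError: collections differ in size")
    stream = iter(other)
    for each in receiver:
        stack["block"](each, next(stream))

# Hypothetical dispatch rule: pick the variant whose slot names match exactly.
DO_VARIANTS: Dict[FrozenSet[str], Callable[[list, Stack], None]] = {
    frozenset({"block"}): do_block,
    frozenset({"with", "block"}): do_with_block,
}

def send_do(receiver: list, stack: Stack) -> None:
    # `stack` stands in for the object built by the caller's () block.
    variant = DO_VARIANTS.get(frozenset(stack))
    if variant is None:
        raise TypeError(f"no #do variant for slots {sorted(stack)}")
    variant(receiver, stack)

pairs = []
send_do(["Ann", "Bob"], {"with": [31, 17], "block": lambda a, b: pairs.append((a, b))})
print(pairs)  # [('Ann', 31), ('Bob', 17)]
```

The dict literal plays the part of `do(with: aCollection, block: [...])`; a dict is used rather than Python keyword arguments because `with` is a reserved word in Python.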

The unique thing here, though, is that the #do method doesn't need to declare which slots should already be on its stack; it can just reflect on them. So now we can write C-style positional calls in the same language, while still having keyword-like parameters, e.g.:

externalMethod(a, b, 'test', 3)

externalMethod
   ^(first + second + fourth), third
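A rough Python model of the same call, assuming positional values are stored in order and exposed under the ordinal names first, second, third and fourth. `make_stack` and `external_method` are hypothetical names, and reading the trailing `, third` as building a two-element result is an assumption (in Smalltalk, `,` is concatenation).

```python
from typing import Any, Dict, Tuple

ORDINALS = ("first", "second", "third", "fourth", "fifth")

def make_stack(*positional: Any, **named: Any) -> Dict[str, Any]:
    # Hypothetical helper: positional values become ordinal slots,
    # named values keep their own names.
    if len(positional) > len(ORDINALS):
        raise ValueError("too many positional slots for this sketch")
    stack = dict(zip(ORDINALS, positional))
    stack.update(named)
    return stack

def external_method(stack: Dict[str, Any]) -> Tuple[Any, Any]:
    # Reads only the slots it needs; no parameter list is declared anywhere.
    return (stack["first"] + stack["second"] + stack["fourth"], stack["third"])

print(external_method(make_stack(1, 2, "test", 3)))  # (6, 'test')
```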

We can set up both named and indexed values on our stack object, because it is just another hash-like object, like any object in JavaScript or Io. We can also be nicely reflective about our behavior. Consider the following example:

peopleCollection detect(
  select: [:each | each isOfAge],
  reject: [:each | each isHappy],
  found: [:person | ...],
  none: [...])

The combinations we can put together into our query are more interesting, so much so that we could approach the level of detail of a complex SQL statement if we wanted to, without keyword-selector explosion or the programmer-expensive varargs (va_arg) approach we have in C et al.
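Here is a hedged Python sketch of how #detect might reflect on whichever query slots the caller supplied. The slot names come from the example above; treating select and reject as filters, found as the action on the first match and none as an optional fallback is an assumption about the intended semantics, and `detect` and `Person` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable

def detect(collection: Iterable[Any], stack: Dict[str, Callable]) -> Any:
    # Hypothetical #detect: only the slots that are present shape the query.
    select = stack.get("select", lambda each: True)
    reject = stack.get("reject", lambda each: False)
    for each in collection:
        if select(each) and not reject(each):
            found = stack.get("found")
            return found(each) if found else each
    none = stack.get("none")
    return none() if none else None

@dataclass
class Person:
    name: str
    age: int
    happy: bool

people = [Person("Ann", 15, False), Person("Bob", 40, True), Person("Cy", 30, False)]
print(detect(people, {
    "select": lambda p: p.age >= 18,
    "reject": lambda p: p.happy,
    "found": lambda p: p.name,
}))  # Cy
```

Adding a new kind of constraint means adding a slot name, not a new selector such as detect:reject:ifFound:ifNone:.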

But more importantly, to me, what's interesting about this approach is how it removes a whole concept from our programming language. We already have a stack object and it is explicitly accessible (thisContext, in Smalltalk), yet we barely use it. In Smalltalk we can't use it like this, because we cannot dynamically change the shape of individual objects; but if we could, this sort of programming technique might start to make sense.

So if we could do it, the idea of passing parameters to a method becomes extinct. I really like throwing away core programming ideas and replacing them with something more powerful and interesting. If we iterate on the idea a little more, you quickly start to see that there's no need for the [] syntax anymore either, or for block parameters, for that matter. In fact, all closures can be represented with a plain () and an expectation on naming (an expectation you can swiftly ignore, because the slots are ordered). So I should rewrite the examples above to reflect this:

people do(with: aCollection, block: (first name, ' ', second name))

do(with, block)
| stream |
size = with size ifFalse: [^noMatchError].
stream := with readStream.
do(block: (block(a: each, b: stream next)))

peopleCollection detect(
  select: (isOfAge),
  reject: (isHappy),
  found: (dance))

I've taken the #detect example one step further now. The select, reject and found 'self' objects are "each", such that we don't even have to access 'each' from 'self' to do our iteration. This is a very powerful concept in itself - we're saying that we can call a method and give it an object to act as 'self', which is the same as saying, we're giving it an object to act as the stack frame. Deep.

I also knocked off the 'none' argument because I'm going to assume the #detect method is smart enough to realize when it doesn't need to do something (either through polymorphism or a test, but that's an implementation detail we, as the user of this API, don't care about).

There's one more piece to the puzzle here: we want to make things easy for the programmer by offering them the keywords the method is expecting. Luckily, this is easy to do. The system looks at the method you're calling, checks which slots it accesses (polymorphically, of course) and offers them as IntelliSense-style completions while you type. In this way, the development environment helps you build your own keyword selectors as you go.
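As a rough illustration of the completion idea, the Python sketch below uses the standard `ast` module to collect the string keys a method reads from its stack. It assumes methods receive their stack as a variable named `stack` and read slots as `stack["name"]` or `stack.get("name")`; `slots_read` and `DO_SOURCE` are hypothetical, and following polymorphic dispatch, as the editor described above would need to, is left out.

```python
import ast
from typing import Set

def slots_read(source: str) -> Set[str]:
    """Hypothetical helper: slot names that `source` reads from `stack`."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        key = None
        # stack["name"]
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name) and node.value.id == "stack"):
            key = node.slice
        # stack.get("name", ...)
        elif (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute) and node.func.attr == "get"
                and isinstance(node.func.value, ast.Name) and node.func.value.id == "stack"
                and node.args):
            key = node.args[0]
        if isinstance(key, ast.Constant) and isinstance(key.value, str):
            found.add(key.value)
    return found

DO_SOURCE = '''
def do(receiver, stack):
    other = stack["with"]
    block = stack.get("block")
    for each, item in zip(receiver, other):
        block(each, item)
'''

print(sorted(slots_read(DO_SOURCE)))  # ['block', 'with']
```

This relies on Python 3.9+, where a subscript's key appears directly as `node.slice`.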

So, what about optimization? With analysis, a compiler can often tell that many of these setup blocks don't actually need to run as code. Instead, they can be treated as a hash whose values, rather than being collected into a new object, are pushed onto the stack as on any other stack machine. That removes the overhead we've introduced, while still letting the developer be more expressive and, when needed, use that overhead to do something tricky and interesting.

The syntax of this language is interesting too. Since we have only one kind of construct, (), we start to wonder if we've just reinvented Lisp. In this case, no, we've dodged that bullet. Here we've done something truly interesting: () represents a -new- object that includes code to build its state or run some code. It is both an object and a closure all in one, with the most convenient syntax possible. This () object acts as both a dictionary/hash and an array. Its slots can contain values or executable code, and presumably you can 'import' behavior into it too, to make it more than an empty object.
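A small Python model of the () object, assuming slots keep insertion order so the same object can be read by name like a hash or by position like an array, and that a slot holding code is run with the object itself as 'self'. `SlotObject`, its `run` method and the `#0`-style placeholder names for unnamed slots are hypothetical.

```python
from typing import Any, Dict, Union

class SlotObject:
    """Hypothetical (): ordered slots, readable by name or by position."""

    def __init__(self, *positional: Any, **named: Any) -> None:
        # Unnamed slots get placeholder names; insertion order is preserved.
        self._slots: Dict[str, Any] = {f"#{i}": v for i, v in enumerate(positional)}
        self._slots.update(named)

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, int):
            return list(self._slots.values())[key]  # array-style access
        return self._slots[key]                      # hash-style access

    def __setitem__(self, name: str, value: Any) -> None:
        self._slots[name] = value

    def __len__(self) -> int:
        return len(self._slots)

    def run(self, key: Union[int, str]) -> Any:
        # A slot may hold a value or code; code runs with this object as 'self'.
        slot = self[key]
        return slot(self) if callable(slot) else slot

point = SlotObject(3, 4, label="p")
point["norm"] = lambda self: (self[0] ** 2 + self[1] ** 2) ** 0.5
print(point[0], point[1], point["label"], point.run("norm"))  # 3 4 p 5.0
```

Named slots are still ordered, so `point[2]` is the same slot as `point["label"]`; that is what lets callers ignore naming and rely on position.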

This also fulfills my desire for a "Knowledge" object, so long as the slot keys can be objects other than symbols, perhaps with a syntax to help reconcile 'topic' lookup.

One more concept we can introduce is the #with method, which sets 'self' to the object you're calling it on, e.g.:

person with(name: 'Bob')

This simple little concept makes it easy to 'extend' and modify another object in the context of that object. More specifically, it's 'message cascades' -- the ; piece of syntax in Smalltalk -- on steroids. Consider the following Smalltalk method, and then we will rewrite it in the language described in this post:

renderAsJsonOn: stream
stream
 nextPutAll: '{class: '.
self class name renderAsJsonOn: stream.
stream
 nextPutAll: ', id: ';
 nextPutAll: id printString;
 nextPutAll: '}'

which we now transform in to:

renderAsJson
stream with(
 append('{class: ')
 class name renderAsJson(stream: stream)
 append(', id: ')
 append(id printString)
 append('}'))

The two methods are quite similar. The Smalltalk method suggests that the code probably belongs on Stream, while the second example lets the programmer feel okay about not putting the method on Stream.
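For comparison, here is a loose Python analogue of the #with version. It assumes a hypothetical helper `with_self` (named so because `with` is a Python keyword) that runs a block with the given object as its 'self'; `io.StringIO` plays the stream, and the `Record` class, its `record_id` field and the unquoted-key output format are copied from the example rather than from any JSON library.

```python
import io
import json
from typing import Any, Callable

def with_self(receiver: Any, block: Callable[[Any], None]) -> Any:
    # Hypothetical stand-in for #with: run `block` with `receiver` as its 'self'.
    block(receiver)
    return receiver

class Record:
    def __init__(self, record_id: int) -> None:
        self.record_id = record_id

    def render_as_json(self) -> str:
        def body(s: io.StringIO) -> None:
            # Inside the block `s` is the stream; the record is still reachable
            # as `self` because Python closures capture the enclosing scope.
            s.write("{class: ")
            s.write(json.dumps(type(self).__name__))
            s.write(", id: ")
            s.write(str(self.record_id))
            s.write("}")
        return with_self(io.StringIO(), body).getvalue()

print(Record(42).render_as_json())  # {class: "Record", id: 42}
```

Python has to name the stream and the record separately, which is exactly the juggling #with is meant to remove.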

There's only one bit of syntax we haven't explored yet: how do we give an existing object to be the stack, instead of building a new one with ()? We can't simply write "method myObject", because there's no way to differentiate between sending #myObject to the result of #method and sending #method with myObject as the stack.

The answer? Well, it certainly looks like we need a new piece of syntax. We can't write method(myObject), because that would create an array containing myObject instead of making myObject the stack itself. That would work fine if the method then wrote first with() and put all its code inside the with block, but that'd be a really crappy programming paradigm.

Perhaps we could reintroduce the [] syntax. In Smalltalk, a [] block answers the value of its last expression. Given that idea, method [myObject] would work just fine. It would even let us write code inside the [] block to figure out what myObject should be. We'd probably not use that calling convention very often in this sort of language, but it seems like a useful concept to keep around. That means we can probably figure out how #do works now:

do(block)
| index |
index := firstIndex
while(
 do: (block[at: index]),
 continue: [
  index := index + 1
  index <= lastIndex])

while(do, continue)
do()
if(continue(),
   while[self])

In this code example we run the 'block' argument with a 'self' of what we would normally call "each", and the continue block answers a boolean telling #while whether there is more of the collection left to visit.

We have to go a level deeper, to #while, to see how #do really works, so it's written out above too. It runs the do block, then runs the continue block inside an #if call. If the result is true, we call #while again with the same stack; because no new stack frame is needed, this lets the VM do tail-call elimination.
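Python has no tail-call elimination, so this sketch models "call #while again with the same stack" as a loop over one reused stack dict, which is the effect we'd want the VM to get. It assumes the continue block answers whether more elements remain, and it adds an explicit guard for an empty collection; `while_`, `do`, `step` and `advance` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

def while_(stack: Dict[str, Callable[[], Any]]) -> None:
    # Hypothetical #while: run "do", then ask "continue"; the same stack
    # is reused on every pass instead of pushing a new frame.
    while True:
        stack["do"]()
        if not stack["continue"]():
            return

def do(collection: List[Any], block: Callable[[Any], Any]) -> None:
    if not collection:
        return  # explicit guard: nothing to visit
    index = 0
    last_index = len(collection) - 1

    def step() -> None:
        block(collection[index])  # the element plays the role of 'self'

    def advance() -> bool:
        nonlocal index
        index += 1
        return index <= last_index

    while_({"do": step, "continue": advance})

seen = []
do(["a", "b", "c"], seen.append)
print(seen)  # ['a', 'b', 'c']
```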

Finally, to put the rest of the pieces of the puzzle together, we need the boolean operations. Heck, why would we leave those to the VM? We've got a Smalltalk lineage to uphold here after all:

Object>>if
first test(true, false)

True>>test
true()

False>>test
false()
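Smalltalk-80 already works this way: True and False are separate classes, and each implements ifTrue:ifFalse: by evaluating only its own branch. A Python sketch of the same idea follows, assuming the branches arrive as slots named "true" and "false" (one reading of `test(true, false)`) and that a missing branch does nothing; `STrue`, `SFalse` and `if_` are hypothetical names.

```python
from typing import Any, Callable, Dict

class STrue:
    def test(self, stack: Dict[str, Callable[[], Any]]) -> Any:
        return stack["true"]()  # True runs the "true" slot

class SFalse:
    def test(self, stack: Dict[str, Callable[[], Any]]) -> Any:
        branch = stack.get("false")
        return branch() if branch else None  # a missing branch does nothing

def if_(condition: Any, **branches: Callable[[], Any]) -> Any:
    # Hypothetical #if: the receiver's class, not an if statement, picks the branch.
    return condition.test(branches)

print(if_(STrue(), true=lambda: "yes", false=lambda: "no"))  # yes
print(if_(SFalse(), true=lambda: "yes"))  # None
```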

And that probably wraps it up for now. The language described in this post doesn't exist yet; maybe one day it will. I just wanted to brain-dump the idea out there to see what people thought. No more arguments, just messages/verbs and object-closures.