The address of an object -- My misinterpretations of life -- Michael Lucas-Smith

2008-10-20

Those who read the right wiki already know that it's possible to get the address of an object in Smalltalk. (If the object is in new space or old space it may move, though; perm space and fixed space are safe.)

The technique was complicated and messy. It involved creating a new class with one instance variable. You would fill an instance of it with the object you want the address of. Then a method would create an UninterpretedBytes instance, become: the instance with the bytes, set the slot and then return the slot. This would give you back the pointer to the object.

If this sounds complex, that's because it is. What's going on here? Well, basically, the method that's executing keeps on running after the become: has run. The object 'self' has become an UninterpretedBytes, but the method thinks it still has a slot that it can fill. The method fills the slot but in reality just writes into the uninterpreted bytes.

At the Smalltalk Superpowers, a new way of finding the address of an object was revealed and improved upon. It is a stunning technique. Basically, what if we could avoid creating a new class and calling become: ?

When you call super in a method, the class that the super lookup starts from is stored as a literal in the CompiledMethod. That means we can change it, just by inspecting the method and putting a different class in that slot. Now, when we run the method, the super call will actually search a different class hierarchy, and the VM will not realize that it's about to splat memory in dangerous ways.

All that we need is a class that has one slot, e.g. Model, which has an instance variable called dependents. We also need one of its subclasses, so that the super lookup starts in Model itself; I chose ValueModel. John Brant demonstrated this technique using ApplicationModel. Now we implement a method like #addressOf:

addressOf: anObject
    super myDependents: anObject

Now, we inspect the method and change its first literal to ValueModel. We need to implement this method on some kind of byte-holding class, such as ByteArray or UninterpretedBytes, so that the slot write lands in raw bytes. John Brant's version used ByteArray but I immediately realized there was no reason we couldn't just use LargePositiveInteger and cut straight to the chase.

LargePositiveInteger>>addressOfObject: anObject
    super myDependents: anObject

LargePositiveInteger class>>addressOfObject: anObject
    ^((self basicNew: 4) addressOfObject: anObject) compressed

(LargePositiveInteger compiledMethodAt: #addressOfObject:) literalAt: 1 put: ValueModel
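
Since overwriting a literal is easy to get wrong, here is a minimal sketch of a more defensive version of that last step. It assumes, as described above, that literal 1 is the class used for the super lookup and that it starts out as LargePositiveInteger; the temporary name method is mine. Recompiling the method creates fresh literals, so the patch has to be reapplied afterwards.

```smalltalk
| method |
method := LargePositiveInteger compiledMethodAt: #addressOfObject:.

"Assumption: literal 1 is the class the super lookup starts from.
 If it is anything else (including an already patched ValueModel),
 stop instead of overwriting an unrelated literal."
(method literalAt: 1) == LargePositiveInteger
    ifTrue: [method literalAt: 1 put: ValueModel]
    ifFalse: [self error: 'Unexpected literal layout; not patching']
```

Evaluating this a second time raises the error, because the literal is then already ValueModel.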

So what happens here? We create a LargePositiveInteger with 4 bytes of storage (we need to do this because a SmallInteger does not have the full 32 bits, but a large positive integer of 4 bytes does). We trick the VM into filling the first slot, 'dependents', by calling the method through a different class hierarchy. The pointer gets written into that first slot, which is really our 4 bytes.
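
To make the size argument concrete, these workspace expressions compare the two ranges. This is only a sketch and assumes a 32-bit image; I am not quoting the printed values because they depend on the VM.

```smalltalk
"Largest SmallInteger: some bits of the 32-bit word are used as tag bits."
Transcript show: SmallInteger maxVal printString; cr.

"Largest value that fits in 4 bytes, i.e. the highest possible 32-bit address."
Transcript show: ((2 raisedTo: 32) - 1) printString; cr.

"Number of bytes in the object we hand to the VM to scribble on."
Transcript show: (LargePositiveInteger basicNew: 4) basicSize printString; cr.
```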

Finally, we compress the LargePositiveInteger so that, if it can, it'll become a SmallInteger again (but if not, we still have a valid number to work with). We don't strictly need to do the compress step, if for some reason you feel that speed is important.
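
With both methods installed and the literal patched, a usage sketch might look like this. The variable names are mine, and the result is only meaningful until the garbage collector next moves the object, as noted at the start.

```smalltalk
| probe address |
probe := Object new.
address := LargePositiveInteger addressOfObject: probe.

"Print in hexadecimal, the usual notation for addresses."
Transcript show: (address printString: 16); cr.
```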

The cool thing about this technique is that it's just two methods and a method instance hack and we're done. The code is not intention revealing, sure, but hey, it works without having to change the VM.

I'm really impressed with this new technique.