Introduction
The other night, I came across the problem of implementing autovivication in Ruby hashes. The solution I devised exemplifies an elegant and practical use of the Y combinator, and it seemed worth sharing.
For those of you who just want the solution, here it is:
# Define the Y combinator module YCombinator def y_comb(&generator) proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) end end # Give the y_comb function to all objects class Object include(YCombinator) end # Call the Y combinator, providing it with a procedure that # accepts a recursive callback function and returns a procedure. # The Y combinator returns the recursive function. Then call that # function to produce a Hash object. hash = y_comb { |callback| proc { Hash.new { |h, k| h[k] = callback.call } } }.call # Now this will work: hash['a']['b']['c']['d']['e']['f']['g'] = 1 # And hash looks like: hash # => {"a"=>{"b"=>{"c"=>{"d"=>{"e"=>{"f"=>{"g"=>1}}}}}}}
Background: Hash Autovivication
In Perl, the following line will successfully run:
$hash{'a'}{'b'}{'c'} = 1;even if
%hash
was previously undefined. Perl will automatically
set %hash
to be an empty hash, it will assign
$hash{'a'}
to be a reference to an empty hash, and so on, all
automatically. This is called autovivication in Perl: hash and array
variables automatically “come to life” as necessary. This is
incredibly convenient for working with multidimensional arrays, for example.
Ruby, sadly, does not have autovivication (though not surprisingly, given the structure of the language). It does, however, provide a facility for default values of hashes. For example, one could write:
hash = Hash.new { |hash, key| hash[key] = {} } hash['a']['b'] = 3in which case Ruby will create an empty hash, assign it to
hash['a']
, and set the key 'b'
in that new hash to 3.
However, this only works for two levels. This would not work:
hash['a']['b']['c'] = 3because
hash['a']['b']
returns nil
, causing the
attempt to access ['c']
to fail.
One solution would be to make an even more complicated default procedure:
hash = Hash.new { |hash, key| hash[key] = Hash.new { |hash, key| hash[key] = {} } }But again, that will fail for even higher dimensions of access.
What is needed is a recursive definition of the default function, one that continues to assign auto-generating hashes to each higher dimension. A standard recursive approach will work:
def make_hash return Hash.new { |hash, key| hash[key] = make_hash } endBut this requires that you define and store the
make_hash
function
somewhere. This is risky: what if someone redefines or deletes the method? Can
it be done without defining any methods, to prevent this problem?
The Y Combinator in Ruby
Luckily, there is a solution made just for this situation: the Y combinator! There are several good web pages that derive and explain the Y combinator in greater detail (e.g., Wikipedia, and a blog post that claims that the Y combinator has “no practical use”!), so this will be a brief explanation of how it works.
Say you have a recursive Ruby procedure:
factorial = proc { |arg| if arg.zero? then 1 else arg * factorial.call(arg - 1) end }Right now this procedure relies on the variable
factorial
to
contain the procedure.
We can remove this dependence by passing the recursive call's procedure as an argument. Instead of writing a function that calculates factorials, we write a function that generates another function that calculates factorials:
factorial_generator = proc { |callback| proc { |arg| if arg.zero? then 1 else arg * callback.call(arg - 1) end } }Now we just need to pass the right
callback
to this procedure to
implement the recursion.
Imagine what an appropriate callback
procedure would look like.
When the Y combinator is applied to generator
, it should return a
procedure that can be called to perform the recursive function (here, the
factorial function). Call this function recursive
. In the midst of
executing recursive
, the callback
procedure may be
executed. If callback
is identical to recursive
, then
we will have successfully implemented a recursive function. So the goal is to
design a Y combinator that accepts a generator and produces a procedure that,
when executed, will set the generator's callback to be identical to that same
procedure.
Here is the Y combinator implemented in Ruby:
y = proc { |generator| proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) }It is difficult to explain how this works, but here is how I think of it at least. Think of the procedure that reads:
proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }as the “ycombinator-function”. Now the Y combinator looks like:
y = proc { |generator| “ycombinator-function”.call(“ycombinator-function”) }
Now we pass the generator function to the Y combinator:
recursive = y.call(factorial_generator)and the following things happen. First, the
generator
variable is
bound to factorial_generator
. Then, the
“ycombinator-function” is executed with itself as the argument. All
that the function does is return another procedure, so the result of calling the
Y combinator is the procedure:
recursive = proc { |*args| generator.call(x.call(x)).call(*args) }with
generator
bound to factorial_generator
and
x
bound to the “ycombinator-function”.
Now what happens when we call this resulting function with some arguments (e.g.,
recursive.call(5)
)? First, the argument to
generator.call
is evaluated. But since that argument is
x.call(x)
, and x is bound to the
“ycombinator-function”, the result is a procedure identical to
recursive
! Hence, we have successfully set the
callback
to what we wanted from above.
Next, the generator
procedure is executed, setting
recursive
to be the callback procedure. This produces the procedure
specified by the generator
. That procedure is run with the passed
arguments. If that generated procedure needs to make a recursive call, it
executes the callback procedure, and because the callback procedure is the
recursive procedure, this whole cycle repeats itself.
Whew! Hopefully some of that makes sense. The main point, though, is that:
factorial = y.call(proc { |callback| proc { |arg| if arg.zero? then 1 else arg * callback.call(arg - 1) end } }) factorial.call(5) # => 120will produce a proper factorial procedure, without ever having to store a name for that procedure.
Implementation of Autovivifying Hashes
For autovivifying hashes, we start with our regular recursive implementation:
make_hash = proc { Hash.new { |hash, key| hash[key] = make_hash.call } }Now we separate out the named procedure call to produce a generator function:
proc { |callback| proc { Hash.new { |hash, key| hash[key] = callback.call } } }The next step is to implement the Y combinator. Unlike the above explanation, I will make it a method available to all objects rather than a procedure.
module YCombinator def y_comb(&generator) proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) end end class Object include YCombinator endThe autovivifying hash, now, is created by:
hash = y_comb { |callback| proc { Hash.new { |h, k| h[k] = callback.call } } }.call # Now this will work: hash['a']['b']['c']['d']['e']['f']['g'] = 1Congratulations! You have successfully implemented hash autovivication, without needing to define any external methods or variables. (Other than the Y combinator, of course. It is possible to do that, though I’ll leave that as an exercise to the reader.)
Postscript
In writing this, I discovered an unusual “feature” of ruby. If you run the code:
x = 1
y = proc { |x| x + 3 }
y.call(5) # Now x == 5
it turns out that x
is not local to the procedure stored in
y
, but rather is actually identical to the variable outside.
I’m not sure I like this aspect of Ruby, but there it is.
It is possible to implement autovivication of hashes without using the Y combinator but still retaining the safety of having no global variables. Just take advantage of scoping inside procedures:
hash = proc { make_hash = proc { Hash.new { |h, k| h[k] = make_hash.call } } make_hash.call }.callThis works fine as long as there is no variable
make_hash
already
defined. But this approach uses 33 lexical tokens, and the Y combinator one only
uses 31 (not including the Y combinator itself). So there!
Ryan Davis sent me an object-oriented implementation of autovivication:
class AVHash < Hash def initialize super { |h,k| h[k] = AVHash.new } end endThis is probably the best approach if you need to make many autovivifying hashes. This is what I like about Ruby: there are so many ways to do things, and they are all surprisingly elegant.
A comment on
this article on Hacker News observed that my implementation was vulnerable
to someone overwriting the Y combinator method that I defined in
Object
. This is partially correct: if someone overwrites the
y_comb
method, then uses of that method afterwards may fail,
although already created (“combinated”?) procedures will work fine.
But, more importantly, naming the Y combinator is actually
unnecessary to use the Y combinator. You could just instantiate and run it
all in one go:
# Y combinator with no unbound variables
hash = proc { |generator|
proc { |x|
proc { |*args|
generator.call(x.call(x)).call(*args)
}
}.call(proc { |x|
proc { |*args|
generator.call(x.call(x)).call(*args)
}
})
}.call(proc { |callback|
proc {
Hash.new { |h, k| h[k] = callback.call }
}
}).call
The commenter is right, though, in that the Y combinator shouldn't really be
necessary in most modern production languages used by coders of reasonable
skill—though I see no reason not to use it if the opportunity
presents itself and it is documented properly (consider linking to this
article!).
The commenter also expressed difficulty in understanding this article, as well as the Y combinator in general. If it is any consolation, it took me three full days before I managed to wrap my head around it. This is tricky stuff, but that’s what makes the best brain exercise.