Tracing a Ruby IO Object for Reads
So Tracksperanto has a progress bar. Actually, both the command-line and the web versions have one, which improves the user experience a great deal (some files uploaded to the site took more than a few minutes to convert). To display a progress bar, however, we first need to quantify the progress made.
For exporting this has been simple.
# Each tracker's share of the bar, assuming export covers the first 50% of it
tracker_weight_in_percent = 50.0 / trackers.length
trackers.each_with_index do |t, i|
  export_tracker(t)
  report_progress((i + 1) * tracker_weight_in_percent)
end
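A quick check of the arithmetic, as a minimal sketch. It assumes export is meant to fill the first 50% of the bar (the 50.0 in the snippet suggests this, but that is an assumption), so the per-tracker weight has to be 50.0 divided by the tracker count, not the other way round. The `export_progress` helper is hypothetical; it just collects the values that would be handed to `report_progress`.

```ruby
# Hypothetical helper: returns the percentages that would be passed to
# report_progress, assuming export fills the first 50% of the bar.
def export_progress(tracker_count)
  tracker_weight_in_percent = 50.0 / tracker_count
  (0...tracker_count).map { |i| (i + 1) * tracker_weight_in_percent }
end

p export_progress(4) # prints [12.5, 25.0, 37.5, 50.0]
```

Whatever the tracker count, the last value lands on 50, leaving the other half of the bar for whatever comes next.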
However, for importing stuff the pattern is not so evident. Most parsers in Tracksperanto work like this:
p = Parser.new
trackers = p.parse(file_handle_or_io)
It’s completely opaque to the caller how the Parser could quantify its work and report on it. Parsers do send status messages that we can use for the status line of the progress bar, but no percentages (most of them also treat the IO as unquantifiable and just read it until eof?). Also, there are many parsers, and introducing quantification into each one of them would be kinda sucky.
So I looked for a different approach, and then an idea came to me: we are reading an IO (every parser does). Parsers mostly progress linearly, that is, they make trackers as they go along the IO (with a few exceptions), so the current offset in the IO is a good indicator of how far along the import is. What if we feed the parsers an IO handle that can report on itself?
Easy! To do that, we use one of the loveliest parts of the Ruby standard library, delegate.rb. As its documentation says:
This library provides three different ways to delegate method calls to an object. The easiest to use is SimpleDelegator. Pass an object to the constructor and all methods supported by the object will be delegated. This object can be changed later.
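A minimal sketch of the SimpleDelegator behavior the quote describes. The StringIO objects are stand-ins chosen for the demo (an assumption, not something Tracksperanto uses here): every call on the wrapper goes to the wrapped object, and `__setobj__` swaps the target later.

```ruby
require 'delegate'
require 'stringio'

first  = StringIO.new("first")
second = StringIO.new("second")

# Every method call on the wrapper is forwarded to the wrapped object...
wrapper = SimpleDelegator.new(first)
puts wrapper.read # prints first

# ...and the wrapped object can be changed later with __setobj__
wrapper.__setobj__(second)
puts wrapper.read # prints second
```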
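Advantages: first, it’s not method_missing. That is, the object will properly respond_to? everything it’s asked about, since the wrapper class gets a real method for each public method of the delegated class. (It will not pass a kind_of?(IO) check, though: the wrapper is a Delegator, not an IO subclass, so code that checks types instead of duck-typing will reject it.) Second, you don’t have to worry about method signatures and forwarding, since you can just call super. Ruby delegates give us the ability to quickly “aspectize” our IO objects, wrapping some traits around their standard methods. So off we go.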
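A small sketch to check the respond_to? and kind_of? behavior of a DelegateClass(IO) wrapper. `TracedIO` is a hypothetical name for this demo, and `File::NULL` is used only to get a real IO to wrap.

```ruby
require 'delegate'

# DelegateClass(IO) defines a real forwarding method for each public IO method,
# so respond_to? answers correctly without any method_missing involvement.
class TracedIO < DelegateClass(IO)
  def initialize(io)
    super(io) # Delegator#initialize stores the wrapped object
  end
end

traced = TracedIO.new(File.open(File::NULL))
puts traced.respond_to?(:gets)        # prints true
puts traced.kind_of?(IO)              # prints false: the wrapper is not an IO subclass
puts traced.__getobj__.kind_of?(IO)   # prints true
traced.close                          # forwarded to the wrapped File
```

So duck-typed consumers that just call `gets` or `read` are fine, while anything doing an explicit `is_a?(IO)` check would need the inner object.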
First of all, I went to the Ruby stdlib doco and found all the reading methods that IO supports. There are not that many of them, by the way. Here’s what our skeleton delegate looks like:
require 'delegate'

class ProgressiveIO < DelegateClass(IO)
  def initialize(with_io)
    __setobj__(with_io)
  end
end
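A minimal check that the skeleton already behaves as a transparent wrapper. Forwarding happens by method name, so for this demo a StringIO stands in for a File (an assumption made to keep the example self-contained).

```ruby
require 'delegate'
require 'stringio'

# The skeleton from above, plus the require it needs.
class ProgressiveIO < DelegateClass(IO)
  def initialize(with_io)
    __setobj__(with_io)
  end
end

# Any IO-like object works, e.g. a StringIO
io = ProgressiveIO.new(StringIO.new("hello\nworld\n"))
puts io.gets # prints hello
puts io.pos  # prints 6
```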
That skeleton is all we need for a good IO wrapper, without method_missing tricks of any kind. Next we write a copy of ActiveSupport’s returning idiom:
private
def returning(result)
  yield
  result
end
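A short sketch of what `returning` does, next to Object#tap (available since Ruby 1.8.7), which does the same job without a helper. The `log` array exists only to make the side effect visible.

```ruby
# The returning idiom: run the block for its side effect,
# then hand back the original result.
def returning(result)
  yield
  result
end

log = []
value = returning(42) { log << :after }
puts value       # prints 42
puts log.inspect # prints [:after]

# Object#tap does the same, and also passes the value into the block
value2 = 42.tap { |v| log << v }
puts value2 # prints 42
```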
The returning helper will come in handy later on. Now we write a method that reports the pos of the IO, since pos is what tells us how far into it we are.
def notify_read
  pos # This will change later
end
And now we can override the readers. The general pattern goes like this:
def getc
  returning(super) { notify_read }
end
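A runnable sketch of the getc pattern. `GetcTracer` is a hypothetical demo class, and its notify_read records each position in an array instead of just returning pos, so the effect is visible.

```ruby
require 'delegate'
require 'stringio'

class GetcTracer < DelegateClass(IO)
  attr_reader :positions

  def initialize(io)
    super(io)
    @positions = []
  end

  # Call the wrapped getc, then record the new offset, then return the char
  def getc
    returning(super) { notify_read }
  end

  private

  def returning(result)
    yield
    result
  end

  def notify_read
    @positions << pos
  end
end

t = GetcTracer.new(StringIO.new("abc"))
3.times { t.getc }
p t.positions # prints [1, 2, 3]
```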
Why super? Well, the class that the DelegateClass() function creates defines a forwarding method for each public method of IO, and each of those hands the call over to the object returned by __getobj__. You can think of it as method_missing done ahead of time, with a few smarts inside (look at the delegate.rb source for more on that). In our case super will do roughly
__getobj__.__send__(method_name, *any_args, &block_if_given)
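To make the forwarding concrete, here is a hypothetical, stripped-down imitation of what DelegateClass() generates. It is not delegate.rb's actual code, just the same idea: one method per name that passes arguments and block to the wrapped object, so a subclass can reach it with super.

```ruby
# Hypothetical mini version of DelegateClass(): one forwarding method per name
def tiny_delegate_class(method_names)
  Class.new do
    def initialize(obj)
      @obj = obj
    end

    method_names.each do |name|
      define_method(name) do |*args, &block|
        @obj.__send__(name, *args, &block)
      end
    end
  end
end

StringWrapper = tiny_delegate_class([:upcase, :each_char])

class Shouty < StringWrapper
  def upcase
    super + "!" # super reaches the generated forwarding method
  end
end

puts Shouty.new("hi").upcase # prints HI!
```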
Now, someone asked why we do not use alias_method_chain and similar tricks to implement these. Well, in our case we do not want to change the way a specific object works. What we are after is “after” advice, in AOP terms. Also note that in many objects the method calls are intertwined, so if you change the read behavior and some other method in the object calls read, or reads are done recursively, you might be screwed. What we want is a non-destructive wrapper around the behavior of the object as a whole that changes nothing about the object itself; that is, a facade around the object that also executes our tracing functions. Our goal is also to be able to put this facade around any object that follows the calling conventions of an IO, without changing how its methods behave.
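A sketch of the facade property: reads through the wrapper are traced, while the wrapped object and its class stay untouched. `ReadLogger` is a hypothetical demo class; note that the facade and direct callers share the same position, because there is only one underlying object.

```ruby
require 'delegate'
require 'stringio'

class ReadLogger < DelegateClass(IO)
  attr_reader :log

  def initialize(io)
    super(io)
    @log = []
  end

  def read(*args)
    result = super   # bare super forwards args to the wrapped read
    @log << pos
    result
  end
end

inner  = StringIO.new("abcdef")
facade = ReadLogger.new(inner)

facade.read(2) # traced
inner.read(2)  # direct call: not traced, StringIO#read is unchanged
facade.read(2) # traced again, and it sees the shared position

p facade.log # prints [2, 6]
```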
If the method accepts arguments, capture them with a splat so the wrapper accepts whatever the underlying method does:
def seek(*a)
  returning(super) { notify_read }
end
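A quick sketch showing the splat capture at work: a bare super passes the captured arguments through, whether seek gets one or two of them. `SeekTracer` is a hypothetical demo class.

```ruby
require 'delegate'
require 'stringio'

class SeekTracer < DelegateClass(IO)
  attr_reader :positions

  def initialize(io)
    super(io)
    @positions = []
  end

  def seek(*a)
    result = super   # forwards *a unchanged, however many there are
    @positions << pos
    result
  end
end

t = SeekTracer.new(StringIO.new("0123456789"))
t.seek(3)                # one argument
t.seek(-2, IO::SEEK_END) # two arguments
p t.positions # prints [3, 8]
```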
In both of these cases, super will call the method in the class that delegate.rb made for us, which, in turn, calls the contained object. Using just super, without parentheses, frees us from managing the arguments: a bare super passes along the current method’s arguments and its block unchanged. Iterators are the tricky part, because there we do not want to pass the caller’s block through as-is, but wrap it in one of our own. Here’s how we implement each:
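def each(sep_string = $/)
  # Without a block, hand back an Enumerator, as IO#each does
  return to_enum(:each, sep_string) unless block_given?
  # Report the offset after every line, not once per call
  super(sep_string) do |line|
    yield(line)
    notify_read
  end
  self
end
alias_method :each_line, :each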
Here we inject our notify_read call into the block because we want a notification after every line, not just once when the method is called.
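A runnable sketch of the each override, with notify_read collecting offsets so you can see exactly one notification per line. `LineTracer` is a hypothetical demo class and the StringIO input is made up for the example.

```ruby
require 'delegate'
require 'stringio'

class LineTracer < DelegateClass(IO)
  attr_reader :offsets

  def initialize(io)
    super(io)
    @offsets = []
  end

  def each(sep_string = $/)
    return to_enum(:each, sep_string) unless block_given?
    # Pass super our own block, which wraps the caller's block
    super(sep_string) do |line|
      yield(line)
      notify_read
    end
    self
  end
  alias_method :each_line, :each

  private

  def notify_read
    @offsets << pos
  end
end

t = LineTracer.new(StringIO.new("ab\ncd\nef\n"))
t.each_line { |line| }
p t.offsets # prints [3, 6, 9]
```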
We also want to override each_byte:
def each_byte
  return to_enum(:each_byte) unless block_given?
  # Report the offset after every byte yielded
  super { |b| yield(b); notify_read }
  self
end
Same here: no alias_method_chain, just plain old super.
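The same check for each_byte, as a minimal sketch. `ByteTracer` is a hypothetical demo class; `super { ... }` with no parentheses still forwards the (empty) argument list but replaces the block with ours.

```ruby
require 'delegate'
require 'stringio'

class ByteTracer < DelegateClass(IO)
  attr_reader :offsets

  def initialize(io)
    super(io)
    @offsets = []
  end

  def each_byte
    return to_enum(:each_byte) unless block_given?
    super { |b| yield(b); notify_read }
    self
  end

  private

  def notify_read
    @offsets << pos
  end
end

t = ByteTracer.new(StringIO.new("abc"))
bytes = []
t.each_byte { |b| bytes << b }
p bytes     # prints [97, 98, 99]
p t.offsets # prints [1, 2, 3]
```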
Now we need to make the callback that the object will use to report on its reads. The easiest way to do callbacks in Ruby is to capture the block passed to a method: an &blk parameter turns that block into a Proc object, which you can store in a variable and call later. (Calling to_proc on it is harmless, since Proc#to_proc returns the Proc itself.)
def save_block(&blk)
  @stashed_block = blk.to_proc # I stashed your block!
end
so that you can write
save_block do |argument_of_the_block|
  # everything you request here will be done by the Proc object
end
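A self-contained sketch of block stashing. `Stash` and its `fire` method are hypothetical names for the demo; `save_block` mirrors the method above.

```ruby
# Capturing a block with &blk gives you a Proc you can keep and call later
class Stash
  def save_block(&blk)
    @stashed_block = blk
  end

  def fire(arg)
    @stashed_block.call(arg)
  end
end

s = Stash.new
s.save_block { |argument_of_the_block| argument_of_the_block * 2 }
puts s.fire(21) # prints 42

# to_proc on a Proc returns the very same object
pr = proc { }
puts pr.to_proc.equal?(pr) # prints true
```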
So the easy way for us is to give our delegated IO a constructor that accepts a block for reporting progress:
def initialize(with_io, &offset_callback)
  __setobj__(with_io)
  @callback = offset_callback.to_proc
end
and rewrite our notify_read method to do this:
def notify_read
  @callback.call(pos)
end
And presto: every time your IO is read by something in the system, your block gets called with the current offset.
file = ProgressiveIO.new(File.open("/tmp/blobz", "r")) do |offset|
  puts "Read to offset #{offset}"
end
# Now pass the wrapper to anything that reads from an IO-like object
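To turn those offsets into a progress bar percentage you also need the total size. A minimal sketch, assuming the size is known up front (true for files and StringIO, not for pipes or sockets); `percent_read` is a hypothetical helper.

```ruby
require 'stringio'

# Percentage of the IO consumed so far, given its total size in bytes
def percent_read(io, total_size)
  return 100.0 if total_size.zero?
  io.pos * 100.0 / total_size
end

io = StringIO.new("line one\nline two\n") # 18 bytes
io.gets
puts percent_read(io, io.size) # prints 50.0
```

For a real file, File.size(path) gives the total before reading starts.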
Progress bars galore! Here’s how the class looks, ready to be snatched
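Since the snippets above are spread across the post, here is a sketch of them reassembled into one class. This is a reconstruction, not the original file: the gets and read wrappers are added as an assumption following the same returning(super) pattern, and the StringIO input and percentage math are only for the demo.

```ruby
require 'delegate'
require 'stringio'

# ProgressiveIO reassembled from the snippets: calls the block with the
# current offset after every wrapped read-type call.
class ProgressiveIO < DelegateClass(IO)
  def initialize(with_io, &offset_callback)
    __setobj__(with_io)
    @callback = offset_callback.to_proc
  end

  def getc
    returning(super) { notify_read }
  end

  # Assumed additions, same pattern as getc/seek
  def gets(*a)
    returning(super) { notify_read }
  end

  def read(*a)
    returning(super) { notify_read }
  end

  def seek(*a)
    returning(super) { notify_read }
  end

  def each(sep_string = $/)
    return to_enum(:each, sep_string) unless block_given?
    super(sep_string) do |line|
      yield(line)
      notify_read
    end
    self
  end
  alias_method :each_line, :each

  def each_byte
    return to_enum(:each_byte) unless block_given?
    super { |b| yield(b); notify_read }
    self
  end

  private

  def returning(result)
    yield
    result
  end

  def notify_read
    @callback.call(pos)
  end
end

data = "one\ntwo\nthree\nfour\n" # 19 bytes
percentages = []
io = ProgressiveIO.new(StringIO.new(data)) do |offset|
  percentages << (offset * 100 / data.bytesize)
end
io.each_line { |line| }
p percentages # prints [21, 42, 73, 100]
```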