Tracing a Ruby IO Object for Reads

So Tracksperanto has a progress bar. Actually, both the command-line and the web versions have one, which improves the user experience a great deal (some files uploaded to the site took more than a few minutes to convert). However, to display a progress bar we need to quantify the progress made.

For exporting this has been simple.

# Share of the progress scale that a single tracker accounts for
tracker_weight_in_percent = 50.0 / trackers.length
trackers.each_with_index do | t, i |
   export_tracker(t)
   report_progress( (i + 1) * tracker_weight_in_percent )
end

However, for importing stuff the pattern is not so evident. Most parsers in Tracksperanto work like this:

  p = Parser.new
  trackers = p.parse(file_handle_or_io)

It’s completely opaque to the caller how the Parser can quantify its work and report on it. Parsers do send status messages that we can use for the status line of the progress bar, but no percentages (they also mostly consider the IO unquantifiable and just read it until eof? returns true). Also, there are many parsers, and introducing quantification into each one of them would be kinda sucky.

So I looked for a different approach. And then an idea came to me: we are reading an IO (every parser does). Parsers mostly progress linearly, that is - they make trackers as they go along the IO (with a few exceptions), so the offset at which the IO is can be a good quantifier of where the import is. What if we feed the parsers an IO handle that can report on itself?

Easy! To do that, we make use of one of the most lovely features of the Ruby standard library that is delegate.rb. As it says,

This library provides three different ways to delegate method calls to an object. The easiest to use is SimpleDelegator. Pass an object to the constructor and all methods supported by the object will be delegated. This object can be changed later.

Advantages: it’s not method_missing. That is, the object will properly respond_to? everything it’s asked about, give proper kind_of? clues and so on. Second, you don’t have to worry about method signatures and forwarding since you can just call super. Ruby delegates give us the capability to quickly “aspectize” our IO objects with some traits wrapped around their standard methods. So off we go.
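To make the respond_to? and super points concrete, here is a minimal sketch. ShoutingIO is a made-up class used only for illustration, and it wraps a StringIO so that the snippet runs without touching the filesystem.

```ruby
require "delegate"
require "stringio"

# Hypothetical wrapper, for illustration only: upcases whatever is read.
class ShoutingIO < SimpleDelegator
  def read(*args)
    data = super            # bare super forwards the same arguments to the wrapped object
    data && data.upcase
  end
end

io = ShoutingIO.new(StringIO.new("hello"))

knows_read  = io.respond_to?(:read)   # answered on behalf of the wrapped StringIO
knows_bogus = io.respond_to?(:bogus)  # the wrapped object has no such method
shouted     = io.read

puts knows_read, knows_bogus, shouted
```

The wrapper never lists the methods of the wrapped object; it only overrides the one it cares about and lets the delegator deal with the rest.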

First of all, I went to the Ruby stdlib doco and found all the reading methods that IO supports. There are not that many of them, by the way. Here’s what our skeleton delegate looks like:

		require "delegate"

		class ProgressiveIO < DelegateClass(IO)
			def initialize(with_io)
				__setobj__(with_io)
			end
		end

This is all we need to have a good wrapper for IOs, without method_missing tricks of any kind. Then we will write a copy of the ActiveSupport returning idiom:

    private
    def returning(result)
       yield; result
    end

This one will come in handy later on. Now we make a method that will report on the pos of the IO, since this is the one that tells us how far we are inside it.

   def notify_read
   	pos # This will change later
   end

And now we can override the readers. The general pattern goes like this:

def getc
   returning(super) { notify_read }
end

Why super? Well, the default behavior of the class that the DelegateClass() function creates is to forward calls to the object returned by __getobj__. Think of it as method_missing, but with a few smarts inside (look at the delegate.rb source for more info on that). In our case super will do

 __getobj__.send(method_name, *any_args, &block_if_given)
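Here is a small runnable sketch of that pattern, under a few assumptions: CountingIO and its offsets array are invented for the demo (the real class reports through a callback instead), and a StringIO stands in for a file.

```ruby
require "delegate"
require "stringio"

# Hypothetical demo class: records the offset after every getc.
class CountingIO < DelegateClass(IO)
  attr_reader :offsets

  def initialize(with_io)
    __setobj__(with_io)
    @offsets = []
  end

  def getc
    returning(super) { notify_read }   # super forwards getc to the wrapped object
  end

  private

  # Runs the block, then hands back the value it was given
  def returning(result)
    yield
    result
  end

  def notify_read
    @offsets << pos   # pos is itself delegated to the wrapped IO
  end
end

io = CountingIO.new(StringIO.new("abc"))
chars = 3.times.map { io.getc }

puts chars.inspect
puts io.offsets.inspect
```

The caller still gets the characters that getc returned, while the wrapper quietly notes where the IO stands after each call.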

Now, there was a question on why we do not use alias_method_chain and things like that to implement these. Well, in our case we do not want to change the way a specific object works. What we are after is the after advice in AOP terms. Also, note that for many objects the calls within them may be intertwined - so if you change the read behavior and some other method in the object calls read or reads are done recursively you might be screwed. What we want is a non-destructive wrapper around the behavior of the object as a whole that does not change anything about the object in particular, that is we want a facade around the object that will also execute our tracing functions. Also our goal is to be able to put this facade around any object that conforms to the calling conventions of an IO, and not to change the method behavior on it.

If the method accepts arguments, you need to capture them with a splat:

 def seek(*a)
   returning(super) { notify_read }
 end

In both of these cases, super will call the custom class delegate.rb made for us, which, in turn, will call the contained object. Using just super without parentheses frees us from managing the arguments: a bare super forwards whatever the method received. Blocks are the tricky part, because when we want our own code to run on every iteration we have to hand super a block of our own. Here’s how we implement each:

	def each(sep_string = $/, &blk)
	  # Report offset at each call of the iterator
	  super(sep_string) do | line |
	    yield(line)
	    notify_read
	  end
	end
	alias_method :each_line, :each

Here we inject our notify_read call into the block because we want notifications to come every line, not when the method is called.
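To see the per-line notifications in action, here is a minimal sketch. LineTracingIO and its offsets array are made up for the demo, and the input is a StringIO rather than a real file.

```ruby
require "delegate"
require "stringio"

# Hypothetical demo class: records the offset after every line yielded by each.
class LineTracingIO < DelegateClass(IO)
  attr_reader :offsets

  def initialize(with_io)
    __setobj__(with_io)
    @offsets = []
  end

  def each(sep_string = $/, &blk)
    super(sep_string) do |line|
      yield(line)        # hand the line to the caller's block first
      @offsets << pos    # then note how far into the IO we are
    end
  end
end

io = LineTracingIO.new(StringIO.new("a\nb\nc\n"))
lines = []
io.each { |line| lines << line }

puts lines.inspect
puts io.offsets.inspect
```

One offset is recorded per line, so a progress display fed from here would move while the iteration is still running and not only once each returns.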

We also want to override each_byte

def each_byte(&blk)
  # Report offset at each call of the iterator
  super { |b| yield(b); notify_read }
end

Same here - no alias_method_chain, just plain-ole super.

Now we need to make the callback that the object will use to report on its reads. The easiest way to do callbacks in Ruby is to capture the block passed to a method with an & parameter. Ruby turns the block into a Proc object that you can save in a variable (calling to_proc on it is harmless, since a Proc simply returns itself). So you can do this with a verbatim block passed to the method

   def save_block(&blk)
     @stashed_block = blk.to_proc # I stashed your block!
   end

so that

   save_block do | argument_of_the_block |
      # everything you request here will be done by the Proc object
   end
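A quick sketch of stashing a block and calling it later. The Stash class and its replay method are invented for this demo.

```ruby
# Hypothetical holder class, for illustration only
class Stash
  def save_block(&blk)
    @stashed_block = blk   # &blk has already turned the block into a Proc
  end

  def replay(value)
    @stashed_block.call(value)   # run the saved block whenever we like
  end
end

stash = Stash.new
stash.save_block { |n| n * 2 }

doubled = stash.replay(21)
puts doubled
```

The block outlives the save_block call, which is exactly what a progress callback needs.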

So the easy way for us would be to implement a constructor for our delegated IO that accepts a block that reports the progress.

  def initialize(with_io, &offset_callback)
     __setobj__(with_io)
     @callback = offset_callback.to_proc
  end

and rewrite our notify_read method to do this:

  def notify_read
    @callback.call(pos)
  end

And presto: every time your IO is read by something in the system, you will get a call on your block.

 file = ProgressiveIO.new(File.open("/tmp/blobz", "r")) do | offset |
      puts "Read to offset #{offset}"
 end
 # Now pass the file to anything that expects a File object
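To turn offsets into percentages you also need the total size. Here is a self-contained sketch under some assumptions: it is a cut-down variant of the wrapper that only overrides gets and read, it reads from a StringIO so it runs anywhere, and the percentage arithmetic is my own illustration rather than what Tracksperanto does.

```ruby
require "delegate"
require "stringio"

# Cut-down variant of the wrapper, for illustration only
class ProgressiveIO < DelegateClass(IO)
  def initialize(with_io, &offset_callback)
    __setobj__(with_io)
    @callback = offset_callback
  end

  def gets(*a)
    returning(super) { notify_read }
  end

  def read(*a)
    returning(super) { notify_read }
  end

  private

  def returning(result)
    yield
    result
  end

  def notify_read
    @callback.call(pos) if @callback   # tolerate a wrapper built without a block
  end
end

data = "line one\nline two\nline three\n"
percentages = []

io = ProgressiveIO.new(StringIO.new(data)) do |offset|
  percentages << (offset * 100.0 / data.bytesize).round
end

# Stand-in for a parser that reads the IO line by line until it runs dry
lines = []
while (line = io.gets)
  lines << line
end

puts percentages.inspect
```

The last gets, the one that returns nil at the end of the data, also triggers the callback, so the final percentage is reported twice. A real progress bar would not mind.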

Progress bars galore! Here’s how the class looks, ready to be snatched