Musings on Module Registration (And Why It Could Be Better in Rails)
Running into the same architecture problems over and over does give you perspective. We all love making fun of Enterprise FizzBuzz, but there are cases where those Factories, Adapters and Facades are genuinely useful, and so is dependency injection. Since I have had to do dependency injection combined with adapters a wee many times now, it seems like a good idea to share my experience.
What I describe here is written with Ruby in mind, but most of it applies to other languages and runtimes too.
A refresher on Adapters
An adapter is a well-known pattern where an object acts as a proxy between the caller and the callee. Since the whole FP-vs-OOP fight is utter nonsense, functions are great adapters too. Observe:
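class DatabaseAdapterSQLite
  def initialize(db) # a SQLite3::Database
    @db = db
  end

  def get_first_value(query)
    @db.get_first_value(query)
  end
end

class DatabaseAdapterActiveRecord
  def initialize(connection) # e.g. ActiveRecord::Base.connection
    @connection = connection
  end

  def get_first_value(query)
    @connection.select_value(query)
  end
end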
But in, say, JS:
function selectOneSQLite(query) {
  return sqliteConnection.getFirstValue(query);
}
function selectOneMySQL(query) {
  return mysqlConn.query(query + " LIMIT 1").firstRow().firstColumn();
}
An adapter is nothing more than a module. Whether it is a function, a closure or an object is irrelevant. The important part is that we have multiple modules that implement the same API and allow calling them without having to know which module our call gets dispatched to (late binding).
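To make the “function, closure or object is irrelevant” point concrete, here is a small Ruby sketch (the adapter names and their return values are made up for illustration). A lambda, a module and an object instance all answer to #call, and the caller simply dispatches to whichever one it is handed:

```ruby
# Three different "shapes" of module that all answer to #call(query).
# Names and return values are hypothetical stand-ins for real adapters.

# 1. A lambda
lambda_adapter = ->(query) { "lambda:#{query}" }

# 2. A module with a module-level method
module ModuleAdapter
  def self.call(query)
    "module:#{query}"
  end
end

# 3. An instance of a class
class ObjectAdapter
  def call(query)
    "object:#{query}"
  end
end

# The caller only relies on #call; which adapter handles the call is
# decided at runtime (late binding).
def first_value_via(adapter, query)
  adapter.call(query)
end

ADAPTERS = [lambda_adapter, ModuleAdapter, ObjectAdapter.new].freeze
RESULTS = ADAPTERS.map { |adapter| first_value_via(adapter, "SELECT 1") }
```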
Module Table Shape
There are generally two patterns I’ve seen for using multiple modules with a single piece of input (a dispatch of a function call / method call without arguments can also count as input):
- Modules need to be applied in some order, until a module provides output that is the best for the job.
- There is some kind of table where the module to be called gets looked up.
There are also combinations of the two. For example, in format_parser it is mostly the former and sometimes the latter. The main table is an array of modules ordered by priority. More specific parsers - for files that are less likely to occur - are at the start of that list. The reason is that files of several different formats may be detected as TIFFs or JPEGs, and parsing some file formats is less reliable (less confident) than others. The parsing then goes roughly like this:
parsers_ordered_by_priority.each do |parser|
  maybe_result = parser.parse(file)
  return maybe_result if maybe_result
end
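The same first-match-wins loop can also be written with a lazy enumerator, which stops at the first parser that returns something. This is a sketch with toy parsers: CR2_PARSER, TIFF_PARSER, JPEG_PARSER and detect are hypothetical names, they use #call rather than #parse, and they match on string markers instead of real file bytes. It also shows why the more specific CR2 parser has to come before the generic TIFF one, since a CR2 file is TIFF-based:

```ruby
# Toy parsers (hypothetical): each returns a result Hash, or nil when it
# does not recognise the input. Real parsers would inspect file bytes.
CR2_PARSER  = ->(data) { { format: :cr2 } if data.start_with?("TIFF") && data.include?("CR2") }
TIFF_PARSER = ->(data) { { format: :tiff } if data.start_with?("TIFF") }
JPEG_PARSER = ->(data) { { format: :jpg } if data.start_with?("JPEG") }

# More specific parsers first: a CR2 would also be accepted by TIFF_PARSER.
PARSERS_ORDERED_BY_PRIORITY = [CR2_PARSER, TIFF_PARSER, JPEG_PARSER].freeze

def detect(data, parsers = PARSERS_ORDERED_BY_PRIORITY)
  # lazy + filter_map + first stops at the first non-nil result, so the
  # remaining (more generic) parsers are never invoked.
  parsers.lazy.filter_map { |parser| parser.call(data) }.first
end
```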
However, there are also a couple of tables for the latter approach. If you ask, for example, to only have image-natured files parsed (skipping a file if it is not an image), there are auxiliary tables per nature:
parsers_for_nature(:image).each do |parser|
  # In this case `parsers_for_nature(:image)` is already
  # sorted by the same priority as `parsers_ordered_by_priority`,
  # so less common filetypes will be checked first
  maybe_result = parser.parse(file)
  return maybe_result if maybe_result
end
There is also a table of known file types that you may want to detect for:
parsers_for_format(:cr2).each do |parser|
  # In this case `parsers_for_format` is already
  # sorted by the same priority as `parsers_ordered_by_priority`,
  # so less common filetypes will be checked first
  maybe_result = parser.parse(file)
  return maybe_result if maybe_result
end
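One way to keep the priority list and the per-nature and per-format tables consistent is to fill all of them from a single registration call. The sketch below is hypothetical (ParserRegistry and its register signature are not format_parser’s actual API), with symbols standing in for parser modules:

```ruby
module ParserRegistry
  Entry = Struct.new(:parser, :priority, :natures, :formats)

  @entries = []
  @sorted = nil

  class << self
    # One registration call feeds every table below.
    def register(parser, priority:, natures: [], formats: [])
      @entries << Entry.new(parser, priority, natures, formats)
      @sorted = nil # invalidate the cached ordering
      parser
    end

    def parsers_ordered_by_priority
      sorted_entries.map(&:parser)
    end

    def parsers_for_nature(nature)
      sorted_entries.select { |entry| entry.natures.include?(nature) }.map(&:parser)
    end

    def parsers_for_format(format)
      sorted_entries.select { |entry| entry.formats.include?(format) }.map(&:parser)
    end

    private

    # Lower priority number = more specific parser = tried first.
    # sort_by is not stable, so the registration index breaks ties.
    def sorted_entries
      @sorted ||= @entries.each_with_index
                          .sort_by { |entry, index| [entry.priority, index] }
                          .map(&:first)
    end
  end
end

# Symbols stand in for real parser modules.
ParserRegistry.register(:jpeg_parser, priority: 3, natures: [:image], formats: [:jpg])
ParserRegistry.register(:cr2_parser,  priority: 1, natures: [:image], formats: [:cr2])
ParserRegistry.register(:mp3_parser,  priority: 2, natures: [:audio], formats: [:mp3])
ParserRegistry.register(:tiff_parser, priority: 2, natures: [:image], formats: [:tiff])
```

The derived tables are sorted once and cached until the next registration, so lookups stay cheap while registration stays a one-liner.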
This is probably the biggest example of module registration I have done in my career, with more than a dozen different parsers plugged into it. The actual code is slightly more involved (you can examine it here), but this is the gist of it. At Cheddar we have such a table for talking to different banks. That one is lookup-only:
bank_connector = lookup_connector_for("barclays")
accounts = bank_connector.query_accounts_for(account_access_consent_id: aac_id)
So let’s go and make a module table!
Creating a Module Table
There are a few parts to doing this well. First, it has to be convenient to register a module with such a table. Second, it should be cheap to do a lookup in that table. Let’s set aside the case where we need to do detection by applying modules in sequence, and focus on the table use case first. Let’s imagine we have a UserDetailsProvider interface, and we want to create a module table for different providers of user data. The API we need to support is just one method - call(email). So the module we register needs to be callable, and should return user data from that provider.
Nothing is easier:
google_details_provider = ->(email) { GoogleAPI::Workspace.details_for_account(email) }
facebook_details_provider = ->(email) { Facebook::Users.lookup(email: email) }
internal_provider = ->(email) { User.where(email: email).first }
module_table = {
  "google" => google_details_provider,
  "facebook" => facebook_details_provider,
  "internal" => internal_provider,
}
When we need to look up a module, we do a simple Hash#fetch. Using #fetch allows us to either raise a KeyError if we do not have a provider:
module_table.fetch(provider_name).call(email)
or use our “internal” provider as default:
module_table.fetch(provider_name, internal_provider).call(email)
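The block form of Hash#fetch sits between those two options: the block runs only on a miss, so it can raise an error that names the known providers. A sketch, with stand-in lambdas in place of the real Google and internal lookups (MODULE_TABLE and user_details are illustrative names):

```ruby
# Same table shape as above, with stand-in lambdas instead of real API calls.
MODULE_TABLE = {
  "google"   => ->(email) { { source: :google, email: email } },
  "internal" => ->(email) { { source: :internal, email: email } },
}.freeze

def user_details(provider_name, email)
  provider = MODULE_TABLE.fetch(provider_name) do |missing|
    # Only runs when the key is absent: raise a more helpful KeyError
    # than the default "key not found" message.
    raise KeyError, "Unknown provider #{missing.inspect}, known providers: #{MODULE_TABLE.keys.join(", ")}"
  end
  provider.call(email)
end
```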
That is all it takes to satisfy the “cheap lookup” requirement. Next come the conveniences.
Creating a Convenient Module Table
Conveniences are subjective, of course. There are entire programming language ecosystems where not providing conveniences is touted as a virtue - so it depends, in large part, on style preference. However, I do subscribe to the notion that when working on a system we must make a reasonable effort to make it convenient - both for end users and for the people who are going to work on the system in the future. With module registration, I set the following rules defining convenience:
- Adding a module should be possible from a namespace external to the namespace in which lookup gets done
- Removing a module from the table should be possible – primarily for testing
- Removing a file that adds its modules into the table should also remove the module registration.
Illustrating the first point:
google_details_provider = ->(email) { GoogleAPI::Workspace.details_for_account(email) }
UserDetails.register_provider(name: "google", provider: google_details_provider)
The second point - and how it would be used:
def test_custom_provider
  custom_provider = ->(email) { :VIP }
  UserDetails.register_provider(name: "test", provider: custom_provider)
  test_user.user_details_provider_name = "test"
  assert_equal :VIP, test_user.get_user_details
ensure
  UserDetails.deregister_provider(name: "test")
end
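The ensure above always deregisters, which would clobber a real provider if a test happened to reuse its name. A hypothetical helper (with_provider, plus a minimal UserDetails stand-in implementing the methods the test uses) can restore whatever was registered before instead:

```ruby
# Minimal stand-in registry (hypothetical; method names follow the calls
# used in the test above).
module UserDetails
  @providers = {}

  def self.register_provider(name:, provider:)
    @providers[name] = provider
  end

  def self.deregister_provider(name:)
    @providers.delete(name)
  end

  def self.registered?(name)
    @providers.key?(name)
  end

  def self.provider_for(name)
    @providers.fetch(name)
  end
end

# Hypothetical test helper: register `provider` under `name` for the
# duration of the block, then put back whatever was registered under
# that name before (or remove the name if nothing was).
def with_provider(name, provider)
  had_previous = UserDetails.registered?(name)
  previous = UserDetails.provider_for(name) if had_previous
  UserDetails.register_provider(name: name, provider: provider)
  yield
ensure
  if had_previous
    UserDetails.register_provider(name: name, provider: previous)
  else
    UserDetails.deregister_provider(name: name)
  end
end
```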
And the third. It’s way simpler than it seems - we just place the google_details_provider inside a separate file, and make sure this file does get required somewhere in the caller code. Note that this will not happen by default in Rails - which is an important consideration:
# google_details_provider.rb
google_details_provider = ->(email) { GoogleAPI::Workspace.details_for_account(email) }
UserDetails.register_provider(name: "google", provider: google_details_provider)
Once we remove google_details_provider.rb from our source tree, its registration is removed as well - nothing to do. More locality of behavior and fewer things to manage overall.
I normally also create a method which iterates over all the registered modules, so that they can be subjected to conformance testing:
UserDetails.with_each_known_provider do |name, provider|
  assert_conformant_provider(provider)
end
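Neither with_each_known_provider nor assert_conformant_provider is spelled out here, so this is one possible shape for both, under the assumption that conformance simply means “responds to #call and accepts one argument”:

```ruby
# Hypothetical registry exposing the iteration method used above.
module UserDetails
  @providers = {}

  def self.register_provider(name:, provider:)
    @providers[name] = provider
  end

  # Yields a snapshot, so a block that registers or deregisters providers
  # does not modify the Hash while it is being iterated.
  def self.with_each_known_provider(&block)
    @providers.dup.each(&block)
  end
end

# Hypothetical conformance check: the provider must respond to #call and
# accept a single positional argument (the email address).
def assert_conformant_provider(provider)
  unless provider.respond_to?(:call)
    raise ArgumentError, "#{provider.inspect} does not respond to #call"
  end

  # Proc#call itself takes any number of arguments, so ask the Proc for
  # its own arity; for modules/objects, inspect their #call method.
  arity = provider.is_a?(Proc) ? provider.arity : provider.method(:call).arity
  # A negative arity -n means n - 1 required arguments plus optional ones.
  accepts_one = arity.negative? ? (-arity - 1) <= 1 : arity == 1
  unless accepts_one
    raise ArgumentError, "#{provider.inspect}#call cannot take one argument (arity #{arity})"
  end

  true
end
```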
The registration from the module file I tend to do like this:
module GoogleProvider # FacebookProvider, AppleProvider...
  def call(email_address)
    # ...fetch and return the user details for email_address from Google
  end

  # Make the module itself callable - no need to create instances
  extend self

  # Register the module just as it gets defined
  UserDetails.register_provider(name: "google", provider: self)
end
Note that GoogleProvider does not need to be anywhere inside the UserDetails namespace. Nor is any smart name resolution involved.
Playing Nice With Rails Autoloading
When I use this pattern, I tend to have my modules eagerly loaded, because in Rails a provider is only going to be loaded by Zeitwerk once you try to use its constant in the code:
GoogleProvider.call(email) # Will attempt to load google_provider.rb
To make module registration work, you need to register your modules from files which get loaded eagerly (like an initializer), and you need to register the module names rather than the callable objects themselves:
module_table = {
  "google" => "GoogleDetailsProvider",
  "facebook" => "FacebookDetailsProvider",
}
# Calling "String#constantize" forces Zeitwerk to attempt autoloading of the module
module_table.fetch(name, "InternalProvider").constantize.call(email)
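Outside of Rails, the same name-based table can be tried with Object.const_get standing in for String#constantize (so nothing gets autoloaded). The sketch deliberately includes a stale entry, to show where such a table fails when a provider file has been deleted but its registration line has not (details_for is an illustrative name):

```ruby
# Object.const_get plays the role of String#constantize here; without
# Zeitwerk, nothing is autoloaded, so the modules are defined inline.
module GoogleDetailsProvider
  def self.call(email)
    "google:#{email}"
  end
end

module InternalProvider
  def self.call(email)
    "internal:#{email}"
  end
end

MODULE_TABLE = {
  "google"   => "GoogleDetailsProvider",
  # Stale entry: imagine facebook_details_provider.rb was deleted, but this
  # line (living elsewhere, e.g. in an initializer) was not.
  "facebook" => "FacebookDetailsProvider",
}.freeze

def details_for(name, email)
  # The constant is only resolved here, at the call site.
  Object.const_get(MODULE_TABLE.fetch(name, "InternalProvider")).call(email)
end
```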
It is a passable compromise, but for it to work, the module names must be added to the table outside of the modules themselves. Consequently, if you remove a file defining a module, its registration will still be performed - but the call into the module will fail at the call site (as the constant will not resolve). The way I prefer to fix this is to require modules eagerly. This can be done using Dir.glob. Note that I sort the glob output because the order in which glob results get returned is OS-dependent. (Since Ruby 3.0, Dir.glob sorts its results by default, but an explicit sort costs nothing and also covers older Rubies.)
Dir.glob(File.dirname(__FILE__) + "/user_information_providers/*_provider.rb").sort.each do |path|
  require path
end
The modules get defined, and the self-registration code runs immediately. In Rails you must do this inside an ActiveSupport::Reloader.to_prepare block, so that what you require will be subjected to the same live-reloading Zeitwerk enables for most other modules inside your Rails app:
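ActiveSupport::Reloader.to_prepare do
  # Rails.root is a Pathname: appending an absolute "/app/..." String with `+`
  # would discard Rails.root entirely, so join a relative path instead
  Dir.glob(Rails.root.join("app/user_information_providers/*_provider.rb").to_s).sort.each do |path|
    require path
  end
end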
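The glob-and-require step can be tried outside of Rails too. This sketch writes two stand-in provider files into a temporary directory (the file contents and the UserDetails stand-in are assumptions) and requires them in sorted order; each file registers itself as it loads:

```ruby
require "tmpdir"

# Hypothetical registry for the provider files to register into.
module UserDetails
  PROVIDERS = {}

  def self.register_provider(name:, provider:)
    PROVIDERS[name] = provider
  end
end

Dir.mktmpdir do |dir|
  # Each stand-in file contains only its self-registration line,
  # like google_details_provider.rb above.
  %w[google facebook].each do |name|
    File.write(File.join(dir, "#{name}_provider.rb"), <<~RUBY)
      UserDetails.register_provider(name: #{name.inspect}, provider: ->(email) { "#{name}:" + email })
    RUBY
  end

  # Same glob-and-require step as above. Deleting one of the files would
  # simply mean its registration line never runs.
  Dir.glob(File.join(dir, "*_provider.rb")).sort.each { |path| require path }
end
```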
Since we would be instantiating here, we would self-register the class name (a String) from within a class:
class GoogleProvider
  def call(email_address)
    # ...fetch and return the user details for email_address
  end

  # Register the class name just as the class gets defined;
  # Module#name returns "GoogleProvider" here
  UserDetails.register_provider(name: "google", provider: name)
end
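A plain-Ruby sketch of this class-based variant, with a hypothetical UserDetails.details_for that resolves the registered name only at call time and then instantiates it (Object.const_get again standing in for constantize):

```ruby
# Hypothetical registry storing class *names*, resolved at call time.
module UserDetails
  @provider_class_names = {}

  def self.register_provider(name:, provider:)
    @provider_class_names[name] = provider
  end

  def self.details_for(name, email)
    # Resolve the constant only when needed, then instantiate and call it.
    Object.const_get(@provider_class_names.fetch(name)).new.call(email)
  end
end

class GoogleProvider
  def call(email_address)
    "google:#{email_address}"
  end

  # Inside the class body, `name` is Module#name, i.e. "GoogleProvider".
  UserDetails.register_provider(name: "google", provider: name)
end
```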
Sad State of Rails Module Registration
In my work I have had to implement multiple extensions to Rails - not hacks, but official plug-in features designed to be used at specific, documented “cut points” for external integrations. So far, under my belt:
- 2 ActiveJob adapters
- 1 cache store adapter
- 2 ActiveStorage services
Now, let’s rehash the requirements I set:
- It should be convenient to register a module
- Module lookup should be cheap
- Adding a module should be possible from a namespace external to the namespace in which lookup gets done
- Removing a module from the table should be possible – primarily for testing
- Removing a file that adds its modules into the table should also remove the module registration.
Sadly, the way Rails does module registration falls short on all five. The approach differs between Rails components (presumably because different people built them), but the recurring theme is that module registration involves the following two things:
- Defining modules inside of Rails namespaces
- Having to define them in files that satisfy the Rails file naming conventions - including the Rails default namespacing.
This creates extra files that serve very little purpose, and forces you to pollute namespaces that are not yours - which is completely unnecessary. I never had to write an ActiveRecord connection adapter, but the situation there is the same. Just look at the trilogy shim for a good example of that. To give a bit more detail, let’s examine the component I am currently working with - ActiveStorage (expect a nice announcement in that area soon). The entry point for lookup in a Rails app is a configuration file called storage.yml, which defines your storage service as follows:
main:
  public: true
  service: Disk
  root: <%= Rails.root.join("tmp/storage") %>
Imagine that you want to implement a file storage solution using the blockchain (I won’t judge, let’s pretend):
main:
  public: true
  service: Blockchain
  ledger_path: <%= Rails.root.join("ledgers/file_storage.blockchain") %>
The service key is what is responsible for module lookup. An ActiveStorage::Service subclass is what gets looked up and then instantiated, with the remaining options (like public: true and root: some_path) passed to it. The interesting bit is how Rails resolves the string "Disk" to a concrete class to instantiate. The same would need to happen for your Blockchain service, even though it has nothing to do with Rails internals - it likely uses them, but is not a part of them. Let’s look at the pertinent part of the Rails source, from active_storage/service/configurator.rb (here for version 7.2.2):
def resolve(class_name)
  require "active_storage/service/#{class_name.to_s.underscore}_service"
  ActiveStorage::Service.const_get(:"#{class_name.camelize}Service")
rescue LoadError
  raise "Missing service adapter for #{class_name.inspect}"
end
So:
- Rails assumes that there will be a file called active_storage/service/blockchain_service.rb somewhere on the $LOAD_PATH, and requires it. Adding directories to $LOAD_PATH slows down the entire application, sometimes in interesting ways. Moreover, every gem you require, unless it is very careful, adds to $LOAD_PATH - and every other piece of code doing a require will have its require run slower as a result.
- Rails assumes that the class name for your service is BlockchainService, and that this class is inside the ActiveStorage::Service namespace.
$LOAD_PATH lookups are not a joke! In my current application, there are 318 items in the load path, and looking up a new file there can’t be done faster than linearly. It can also involve file stat checks.
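To get a feel for the linear cost, here is a deliberately simplified model of require (not how MRI actually implements it): probe each load path directory in order with a stat until the file turns up. With 318 entries and the feature in the last one, a hit pays for all 318 probes, and so does every miss (probes_to_find and the directory names are made up):

```ruby
require "tmpdir"
require "fileutils"

# Simplified model of resolving `require "feature"`: try each load path
# directory in order until the file exists. MRI also tries extensions
# (.rb, .so, ...) and caches expanded paths; this only illustrates why the
# cost grows with the number of load path entries.
def probes_to_find(feature, load_path)
  load_path.each_with_index do |dir, index|
    candidate = File.join(dir, feature)
    return [candidate, index + 1] if File.file?(candidate) # one stat per probe
  end
  [nil, load_path.size]
end

# 318 entries, matching the load path size mentioned above; the feature
# lives in the very last directory.
PROBES = Dir.mktmpdir do |root|
  dirs = (1..318).map { |i| File.join(root, "gem_#{i}", "lib") }
  dirs.each { |dir| FileUtils.mkdir_p(dir) }
  File.write(File.join(dirs.last, "blockchain_service.rb"), "")

  {
    found_last: probes_to_find("blockchain_service.rb", dirs).last,
    missing: probes_to_find("no_such_service.rb", dirs).last,
  }
end
```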
And from the standpoint of code organization, this means that you can’t really provide a service without invading the Rails namespace. What if Rails starts shipping its own BlockchainService? Whoops. What if you don’t want to add things to $LOAD_PATH? Whoops. Is it possible to register your BlockchainService from an external file (say, a Railtie) without defining a “shim” file that will be required? Whoops.
Same for ActiveJob. Same for cache stores.
With ActiveJob and cache stores the situation is actually even worse. The module gets resolved not by string inclusion (which at least allows you to grep for things), but with an additional transformation that camelizes your module name. For example, to configure Gouda in our app, we need to do this:
config.active_job.queue_adapter = :gouda
and we need to know that :gouda will Somehow Magically™ resolve to ActiveJob::QueueAdapters::GoudaAdapter, which we need to have in place before trying. Note that having to live inside a private Rails namespace causes issues too - for example, Ruby’s constant lookup (which is sometimes-lexically-scoped-and-sometimes-not) changes subtly if your module is moved into another namespace. You may need to start using fully qualified module names where you previously did not have to, and so forth.
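The convention boils down to something like the following plain-Ruby model. It is simplified: Rails uses ActiveSupport’s String#camelize, while the split/capitalize here only handles plain snake_case, and QueueAdapters is a stand-in module, not ActiveJob::QueueAdapters:

```ruby
module QueueAdapters
  # :gouda -> "Gouda" -> "GoudaAdapter", looked up inside this namespace.
  def self.lookup(name)
    camelized = name.to_s.split("_").map(&:capitalize).join
    # inherit = false keeps this model strict: Module#const_get would
    # otherwise also search Object for a top-level constant.
    const_get("#{camelized}Adapter", false)
  end

  # Adapters only become reachable by being defined inside this namespace.
  class GoudaAdapter; end
  class DelayedJobAdapter; end
end
```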
Now, doing lazy registration - resolving those modules after the application has booted - has very legitimate reasons. For example, in your application config:
config.anonymize_database_columns = ColumnAnonymizer.columns_to_anonymize
would imply that the ColumnAnonymizer may go into the database and scan it for all columns called ssn, address, iban and so forth. This would blow up, because at configuration load time multiple parts of the app are still not “live” - there may be no database connection yet. Or there may be a database connection, but no database created - like with a freshly checked out codebase. So it is, in general, a good idea to defer the actual calls into the registered modules until they are actually needed - or, more precisely, until these modules can produce a meaningful result from being called. Modules may also interact with each other (include each other’s submodules, require base classes from each other and so forth), which is better deferred until loading has reached a stable state.
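Deferring those calls does not require convention-based lookup, though. A hypothetical LazyRegistry can store a block at registration time, only run it on first use and memoize the result:

```ruby
# Hypothetical lazy registry: registering stores a block, which only runs
# on the first lookup - i.e. once the app has booted and, say, the database
# is reachable. The result is memoized. Not thread-safe as written; a real
# version would need a lock.
class LazyRegistry
  def initialize
    @factories = {}
    @resolved = {}
  end

  def register(name, &factory)
    raise ArgumentError, "a block is required" unless factory

    @factories[name] = factory
    @resolved.delete(name) # re-registering discards a previously resolved value
  end

  def fetch(name)
    @resolved.fetch(name) { @resolved[name] = @factories.fetch(name).call }
  end
end
```

In an app, registration might then look like registry.register(:columns_to_anonymize) { ColumnAnonymizer.columns_to_anonymize } in an initializer, with the database only touched when something first calls fetch.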
But there is zero reason whatsoever to demand that the modules you use live in private Rails namespaces. Zero. How could this be solved? Quite easily (but please, seriously - we need to rename this class_name, because it isn’t one):
KNOWN_SERVICES = {}

def register(name, service_class_name)
  KNOWN_SERVICES[name.to_sym] = service_class_name.to_s
end

def resolve(name)
  class_name = KNOWN_SERVICES.fetch(name.to_sym) do
    # Not registered explicitly: fall back to the current naming convention
    require "active_storage/service/#{name.to_s.underscore}_service"
    "ActiveStorage::Service::#{name.to_s.camelize}Service"
  end
  class_name.constantize
rescue LoadError, NameError
  raise "Missing service adapter for #{name.inspect} - known adapters are #{KNOWN_SERVICES.keys.join(", ")}"
end
That way, an external gem could call the following line from its Railtie and be done:
ActiveStorage::Service.register("Blockchain", "Web3::ActiveStorageService")
In Summary
Module registration is not something out of the “Enterprise Application Architecture With Java Version 1.0” tome collecting dust under your desk. It is a very useful pattern for making good use of polymorphism, both in the OOP world and in the FP world. It is also useful in the strictly-typed world, where your module just needs to satisfy a type constraint. When you need to reach for it, I hope these tips are of use.