A Can of Shardines: SQLite Multitenancy With Rails
·There is a pattern I am very fond of - “one database per tenant” in web applications with multiple, isolated users. Recently, I needed to fix an application I had for a long time where this database-per-tenant multitenancy utterly broke down, because I was doing connection management wrong. Which begat the question: how do you even approach doing it right?
And it turns out I was not alone in this. The most popular gem for multitenancy - Apartment - which I have even used in my failed startup back in the day - has the issue too.
The culprit of does not handle multithreading very well
is actually deeper. Way deeper. Doing runtime-defined multiple databases with Rails has only recently become less haphazard, and there are no tools either via gems or built-in that facilitate these flows. It has also accrued a ton of complexity, and also changes with every major Rails revision.
TL;DR If you need to do database-per-tenant multitenancy with Rails or ActiveRecord right now - grab the middleware from this gist and move on.
If you are curious about the genesis of this solution, strap in - we are going on a tour of a sizeable problem, and of an API of stature - the ActiveRecord connection management. Read on and join me on the ride! Many thanks to Kir Shatrov and Stephen Margheim for their help in this.
The advantages of the “database per tenant”
If you have a tenanted application (your “tenant” is a subgraph of your data model that can function independently, and mostly references other entities from within itself), you have a number of ways to approach an architecture like that. Imagine we have a system where the tenant is a Site
. That system is some kind of end-user-serviceable CMS, and users own multiple Sites
and can manage pages, media and other items within a Site
. The data model will be as follows:
class Site < ActiveRecord::Base
end
class Page < ActiveRecord::Base
belongs_to :site
has_many :media_blocks
has_many :pages, through: :media_blocks
end
class MediaBlock < ActiveRecord::Base
belongs_to :media_item
belongs_to :page
end
class Comment < ActiveRecord::Base
belongs_to :page # and thus to a Site, "through"
end
class MediaItem < ActiveRecord::Base
has_many :media_items
has_many :pages, through: :media_items
has_one_ :site, through: :pages # since it can be reused across multiple pages
end
Sites very rarely get merged together, and 99% of the data that gets created within a Site
stays inside that Site
, forever. Either because your system has outgrown hosting a single Site
, or because you want to have robust isolation (you don’t want Jane to post to her Site
only for the article to end up on Blake’s Site
by mistake), or because your system is wildly successful and profitable, you may want to apply the following strategies:
- Just like our initial model - the
Site
has anid
, some models link to it directly, some - through others - Every model gets a
site_id
. EveryINSERT
,UPDATE
orDELETE
then knows whichSite
a particular model makes part of - and a deletion can address the database where the site is stored. Databases will then be shards and store multipleSites
. If you decide to become a host for Slashdot, and get millions ofPage
records and bullions ofComment
records, they will likely be extracted into a separate DB. There will be a mapping table of sorts, that will record thatslashdot.org
gets mapped todb_slashdot_tenant
explicitly. - Just like our initial model - but there is only one
Site
record in the entire database. All records belonging to aSite
are stored inside that database.
There are other tricks for doing sharding/multitenancy well - for example, using generated primary keys which contain the tenant ID within them - so that shards can be merged, etc.
But what interests us here, specifically, is the last approach - having one database per tenant. For my smaller sites, using SQLite as the database has become part and parcel. Having a DB server that you can configure easily is very good. Having a database server that you do not have to configure at all - exceptional, though. Same for backups: centralised backup is great and useful. But nothing beats an rsync
if that’s all you need to do a backup. And the schema becomes smaller too - we can move the Site
out of the database outright, and the rest of the models no longer needs the associations to site
:
class Page < ActiveRecord::Base
has_many :media_blocks
has_many :pages, through: :media_blocks
end
class MediaBlock < ActiveRecord::Base
belongs_to :media_item
belongs_to :page
end
class Comment < ActiveRecord::Base
belongs_to :page # and thus to a Site, "through"
end
class MediaItem < ActiveRecord::Base
has_many :media_blocks
end
Development-wise those setups are a breeze too - if you need to debug something inside a particular Site
, all you need to do is download this site’s data. With just one scp
command, usually. And there are elephants in the room too:
- Doing schema migrations where a migration runs on one tenant, but fails on another
- Accessing the same DB from multiple servers
- Doing backups is somewhat unorthodox - there are many ways to do it
But remember: using this approach has one jarring advantage. It firmly pushes you out of the “big data” territory, and even out of “medium data” - it is “tiny data”.
“Data which, when stored on immediately-accessible random-read storage media of reasonable speed, does not fit under your desk” is my formal definition for “big data”, if that helps.
Some joints were exceptionally successful doing this. Expensify, for one, is notorious for pushing and pulling SQLite well beyond the boundaries most folks would call comfortable. I know that Autodesk’s own ShotGrid - back when it used to be Shotgun and was an independent software product - used SQLite3 pretty extensively. Along with the obligatory SQLite3::BusyException
every now and then 😉
Why this is challenging with Rails
When using SQLite3 “bare”, handling a database “open” and “close” is absolutely trivial:
SQLite3::Database.open("site_1.sqlite3") do |db|
site_title = db.get_first_value("SELECT title FROM sites LIMIT 1")
pages = db.execute("SELECT * FROM pages ")
end
If we use Rack, we just wrap this in a middleware:
def call(env)
SQLite3::Database.open("site_1.sqlite3") do |db|
@app.call(env.merge("site_db" => db))
end
end
But for this to work, the db
variable - the handle to the database - has to be explicitly used for every query! ActiveRecord, however, manages the connections not through a variable you give it, but through it’s own “recollection” of what database a particular ActiveRecord superclass connects to:
class Page < ActiveRecord::Base
establish_connection database: "site_1.sqlite3"
end
Needless to say, this code runs just once (and you don’t know exactly “when” - to which the answer is at first query), and is not at all designed for disconnecting and reconnecting all the time. Now, if there was a way to do this:
class Page < ActiveRecord::Base
obtain_connection_from { tenancy_system.database_config }
end
it would have been easier, but alas. And with the addition of connection pooling, query cache, schema cache - you are looking at a sizeable contraption of things which are put on top of other things. Which is what makes this exercise so frustrating: you know something utterly trivial with a “bare” API is infuriatingly complicated when doing it through ActiveRecord. Moreover - this is one of the headliner use-cases for SQLite3, and ActiveRecord seems to make it nigh-impossible to execute.
How come?
Churn, inevitable
The reason for difficulties with multiple databases in Rails comes down to the history of that feature and the needs of the hyperscalers - the Githubs, the Shopifys and the Zendesks of the ecosystem.
Since I have been using Rails - and ActiveRecord - for quite a long while - here is a brief history recap:
- Rails 1 already had database assignment per
ActiveRecord::Base
subclass - Rails 3 added connection pooling
- Rails 4 added
connection_handling
(albeit - hidden) - Rails 6 added
connected_to
- Rails 7 expanded on
connected_to
with the addition of shards (so now you have both roles and shards)
The interesting part of it all is that while ActiveRecord example code includes snippets like this:
class Person < ActiveRecord::Base
establish_connection adapter: "sqlite3", database: "foobar.db"
connection.create_table table_name, force: true do |t|
t.string :name
end
end
the changes in how Rails handles multiple databases have led to the fact that this example code is useful only in a very small number of cases. For example: with a snippet like this, how do we close_connection
? Or, how do we tell Person
to connect to a different database after having done a query or two?
If you start looking into this, an entire world opens befor your eyes. And this world has its own taxonomy - and it is sizeable. ConnectionHandling
, DatabaseConfig
, DatabaseConfigurations
(yes, plural), Resolver
, PoolConfig
, PoolManager
… and all these things interact, live and breathe in a carefully managed dance. Spoiler: they can do everything we need, but we have to conduct them like a little orchestra - just like so.
Divergent API design
The way ActiveRecord is designed - having model classes with no explicit way to tell them “through which connection” they should work for this query - means that there will always be some hidden state. It can be a global, or a class variable (which is… a glorified global) - or a thread-local, but somewhere there is a connection, and until recently (Rails 6, to be exact) there was no official way to tell ActiveRecord which connection to use.
In theory, an API like this could be realised:
Page.with_connection(tenant_db_conn).first
However, this database connection argument would then need to be provided to every call to ActiveRecord - and the API is truly vast.
An extra complication is that a lot of the design of AR assumes that a connection to the database (and that it is going to be the database) will be opened early, and then kept intact. The schema cache (letting ActiveRecord subclasses know which columns are in the tables, for example) gets loaded once. The query cache gets initialised once. Migrations get run once - and, again, they run on the database,
Divergent configuration lifecycle
If you want to build a multitenant system of small tenants using SQLite3, with a single database being allocated per tenant, your needs are not exactly in alignment with a hypothetical Shopify: they want to have cluster_a
, cluster_b
, cluster_eu
and cluster_us
, each tens of terabytes in size. You, instead, want to have site_1
, site_2
and so on - with some being just a few KB in size.
This would mean that for them, the configuration of those big clusters can be output into database.yml
programmatically. It can be source-controlled, and follow strict and specific semantics regarding
- When the file gets read
- Whether (and when) templating is done in it, for example - to inject credentials
- That all internal datastructures - such as connection pools - get initialised ahead of time
- Preconfigured, large clusters where data is usually sharded - using things like
shop_id
- but not segregated.
None of the “big guys” from the mentioned three have true, single-database-per-tenant setups – or at least I never heard they do.
Most of the modern ActiveRecord infrastructure is built around those assumptions, not because the makers of the feature want to work against what “we” want - they just made different tradeoffs.
Divergent DB engine tradeoffs
Another important item is database performance. “Big” database servers are designed with some assumptions. For example, if you have a database, the engine would be interested in holding file descriptors open for that database or mmap()
ed files from it. If the tablespaces are large - they will be cached in memory, and cached fairly aggressively. If there are indices - the engine will try to cache them in memory as well, and keep access to the files containing the index data close at hand.
The end result is that, at least when I was working on an Apartment-based system with MySQL 5.7, after a certain number of databases created you would start hitting file descriptor limits. Those are set low on MacOS, but it was still noticeable - and it was clear that it was a question of time (and scale) - which we haven’t hit though - before that would become an actual problem.
On balance, thus, a database server is optimised for few large databases – not for thousands of small ones. This is another reason why the approach with a static database.yml
seemed so appealing.
Just check this article out:
By implementing these optimizations, I’ve seen remarkable performance improvements: single MySQL server handling 2M+ ticket reservation transactions per minute while the average latency of SELECT … FOR UPDATE SKIP LOCKED query staying under 400μS.
This is the kind of perf those “big engines” optimise for. Not “quickly handling 120 pages within this site, which is one of 2 thousands”.
With SQLite3, the story is completely different. SQLite3 thrives with multiple small databases. Since a SQLite3 database is just a file (well, 3 files sometimes, but you get the point), it makes perfect sense to have a single database per tenant in the system. Since the system is multitenanted, a request for tenant A
is guaranteed not to need data from tenant B
. Moreover - when we are handling a request for tenant A
, we don’t need any resources from B
- so we don’t even need a connection (file handle).
Having smaller SQLite3 databases has more affordances - for example, it’s much faster - and more granular - to do backups on a per-tenant basis. Debugging becomes much easier - instead of doing a sophisticated sequence of SELECT
s to extract the “slice” of data for a particular tenant, you just copy the tenant’s DB wholesale. Same for granular restore. Same for deletions - removing a tenant, even a large one, is just an unlink
away.
So one of the reasons why the modern multi-DB features in Rails do not support dynamic tenant management with automatically allocated databases - in large numbers - is because, at least on the surface, only SQLite3 currently makes this pattern viable.
And it’s not only viable – it is glorious. Did you know that the way iCloud works, for example, is literally millions of isolated SQLite databases, stored inside of larger Cassandra databases?
Back to the original problem
So, I had an app. It has been running for more than a decade now. It was initially built with static HTML with some templating getting pre-processed before server upload using rsync
- it was a glorified static site generator, essentially. Then came the “admin” features, and the app acquired databases. From the get go, the app - which is a mini-CMS of sorts - provided every website owner a UI to edit the content of their website. Every website also has its own database. Requests between them never cross - and no site_id
is involved anywhere in the process.
Initially it was based on raw ERB and some glue code. Then it got rewritten into Camping, and the database switching looked roughly like the establish_connection
example above. This was pre-Rails-3, so no Rack middleware, no frills, no nothing.
With the ActiveRecord 3 upgrade came the dance of splitting code into something more appropriate, along with a changeover to Sinatra. And the connection management - which got moved into a Rack middleware - took the following shape:
def call(env)
begin
ActiveRecord::Base.establish_connection(adapter: 'sqlite3',
database: env.fetch('site_db_path'),
timeout: BUSY_TIMEOUT)
s, h, b = app.call(env.merge('database' => self))
connection_closing_body = ::Rack::BodyProxy.new(b) do
::ActiveRecord::Base.clear_active_connections!
end
[s, h, connection_closing_body]
rescue Exception
::ActiveRecord::Base.clear_active_connections!
raise
end
end
And this worked… mostly. Around the same time I have installed Sentry, but I didn’t take the habit of looking at it regularly - don’t remember the exact reasons. But, after a few years of painless operation, the app started throwing odd errors. The error that caught my eye was ActiveRecord::ConnectionNotEstablished
, and it became more frequent the more load on the site there was. More visits - more frequent errors. Fast-forward a few years, and the error became quite frequent.
My assumption was that to figure out what is going on, I can better upgrade to at least ActiveRecord 6. This version is the oldest Rails version which, by virtue of Ruby version compatibilities, was already able to run on Apple Silicon - which I am a proud owner of now.
Some hours later and a multitude of CoffeeScript files converted (and even more Ruby files edited) the update was complete. I tested it locally, verified everything was in good working order, and deployed the app.
And just 30 minutes in - ConnectionNotEstablished
. And not only that, but 10x as frequently as before. The update hasn’t fixed the problem – in fact, it made it worse. Some experiments I did:
- Allocating a separate connection pool per tenant and managing it myself
- Doing a connection checkout from a pool and checking it back into the pool
- Juju and voodoo magic
Nothing worked. With a helpful hint from Kir – who is responsible for exactly the managing of database sharding at one of the “big boys” – I got the idea that it should be possible to use the new roles:
parameter - and a fake database configuration - to achieve this functionality.
My mistake was that I was trying to manage pools and connections myself, manually - while the new Rails functionality is actually geared towards Rails maging the pools and connections for you. So in this instance there was also… a divergent understanding of the API.
I was still in the paradigm that you use establish_connection
- like in the olden days - to manage that infrastructure. But the “blessed” approach is actually to furnish Rails the connection configurations and let it handle them automatically.
The solution
ActiveRecord::Base
now has a class method called connected_to
. It allows you to do exactly the thing you need for a database-per-tenant setup - hop into a block with something being your “main” database. Previously, it accepted database:
with a whole DB configuration, but now it only accepts role:
(and - with Rails 7 and above - shard:
). This is how you use it:
ActiveRecord::Base.connected_to(shard: "sites_1") do
site = Site.find(site_id) # Which lives on this shard, "sites_1"
posts = site.pages.order(created_at: :desc)
end
The challenge is where the sites_1
gets configured. Normally it would be in your database.yml
, as per the official doc:
production:
primary:
database: my_primary_database
adapter: mysql2
primary_replica:
database: my_primary_database
adapter: mysql2
replica: true
primary_shard_one:
database: my_primary_shard_one
adapter: mysql2
migrations_paths: db/migrate_shards
But if you want to switch between tenants live - and tenants get created (and deleted!) at runtime - having this static config with cross-referencing keys is not going to fly at all. Moreover - even if you can change that “God config” – how do you force ActiveRecord to reload it? How can you tell ActiveRecord that a shard/tenant no longer exists? And how do you do it in a thread-safe manner? Does it lead to a reinitialisation of all the connection pools, or just addition-deletion?
The solution then becomes focused in one area: taking over from ActiveRecord in managing those connection pools and naming the roles
and shards
automatically. This is where the bulk of the work was, in the end. We want to convert our tenant database name/filename into a string to devise the role
name that we can furnish to AR. For me, I only went to update to Rails 6, so I didn’t go into shards yet. If I did (and I might, eventually) - the tenant name would be the shard
, and the reading
and writing
roles could be used for hosting a readonly: true
DB connection and a writable one. However, a method I ended up using does not support shards even on Rails 8, so read on.
So how do we create those pools?
The way to do it is to query ActiveRecord for whether a particular connection pool is already setup or not. If it is not - there is going to be a NoConnectionPool
exception if you try to switch to the role/shard that doesn’t exist. But doing this via rescues is not great – the control flow becomes a bit intricate. What we can do instead is check whether there is a connection pool set up for a particular role/shard, and then connect if there are none. Note that since this manages pools - it needs to be protected by a mutex:
MUX.synchronize do
if ActiveRecord::Base.connection_handler.connection_pool_list(role_name).none?
ActiveRecord::Base.connection_handler.establish_connection(database_config_hash, role: role_name)
end
end
After that we can use connected_to
- which is going to be thread-safe, fast and pretty neat:
MUX.synchronize do
if ActiveRecord::Base.connection_handler.connection_pool_list(role_name).none?
ActiveRecord::Base.connection_handler.establish_connection(database_config_hash, role: role_name)
end
end
ActiveRecord::Base.connected_to(role: role_name) do
pages = Page.order(created_at: :desc).limit(10) # Only selects from that site/tenant
end
Don’t forget about streaming Rack bodies
There is a small additional element we need to take care of, though: doing it correctly in Rack. To have something similar to Apartment::Elevator we need to do something that - again - apartment
doesn’t do correctly. If we assume all the renders of our app are buffered, we can do this:
def call(env)
site_name = env["SERVER_NAME"]
connection_config_hash = {adapter: "sqlite3", database: "sites/#{site_name}.sqlite3"}
role_name = "site_#{site_name}"
# Create a connection pool for that tenant if it doesn't exist
MUX.synchronize do
if ActiveRecord::Base.connection_handler.connection_pool_list(role_name).none?
ActiveRecord::Base.connection_handler.establish_connection(connection_config_hash, role: role_name)
end
end
ActiveRecord::Base.connected_to(role: role_name) do
@app.call(env) # returns [status, header, body]
end
end
But we are aware, of course, that Rack bodies are callable and iterable - and Rack response bodies may also just be doing SQL queries. Code that lives in the Rack response body and serves streaming data is not any less useful than the one living in the app call()
method, even though much fewer people use it. I even worked on a big patch to Appsignal, my favorite APM that made it report what happens inside a Rack streaming body the same way it would for the app’s call
. The way it usually works for resource release with those bodies is this:
f = File.open(path, "rb")
status, headers, body = @app.call(env)
body_with_close = Rack::BodyProxy.new(body) { f.close }
[status, headers, body_with_close]
This attaches a callback to the #close
method of the Rack body we return, which - according to the Rack SPEC - must be called by the webserver or by the calling middleware.
Reasonably enough, the ActiveRecord API for connected_to
only works with a block. That’s a good idea from the point of encouraging the correct (and safe) usage of a rather blunt tool. However, specifically in this case, it gets in the way. Luckily, this problem can be bypassed with judicious application of a Fiber:
connected_to_context_fiber = Fiber.new do
ActiveRecord::Base.connected_to(role: role_name) do
Fiber.yield
end
end
connected_to_context_fiber.resume
status, headers, body = @app.call(env)
body_with_close = Rack::BodyProxy.new(body) { connected_to_context_fiber_.resume }
[status, headers, body_with_close]
And thus, our “tenant switching middleware” for ActiveRecord connection management with one database per tenant becomes:
class Shardine
MUX = Mutex.new
def initialize(connection_config)
@config = connection_config
@role_name = connection_config.fetch(:database).to_s
end
def with(&blk)
# Create a connection pool for that tenant if it doesn't exist
MUX.synchronize do
if ActiveRecord::Base.connection_handler.connection_pool_list(@role_name).none?
ActiveRecord::Base.connection_handler.establish_connection(@config, role: @role_name)
end
end
ActiveRecord::Base.connected_to(role: @role_name, &blk)
end
def enter!
@fiber = Fiber.new do
with(conn) { Fiber.yield }
end
@fiber.resume
true
end
def leave!
# Probably there is something in ConnectionHandling
# that can be used here, but I was too lazy to look
to_resume, @fiber = @fiber, nil
to_resume&.resume
end
class Middleware
def initialize(app, &database_config_lookup)
@app = app
@lookup = database_config_lookup
end
def call(env)
connection_config = @lookup.call(env)
switcher = TenantDatabaseSwitcher.new(connection_config)
did_enter = switcher.enter!
status, headers, body = @app.call(env)
body_with_close = Rack::BodyProxy.new(body) { switcher.leave! }
[status, headers, body_with_close]
rescue
switcher.leave! if did_enter
raise
end
end
end
which we then configure in config.ru
(or in Rails similarly) like so:
use Shardine::Middleware do |env|
site_name = env["SERVER_NAME"]
{adapter: "sqlite3", database: "sites/#{site_name}.sqlite3"}
end
And there you go - a safe and performant database-per-tenant switcher.
An additional hurdle
Since I was upgrading to Rails 6 - which seemed the lowest “modern” version I really had to go to - there was an extra snag. By default, when you use ActiveRecord without Rails, it gets configured “conservatively” - or, rather, not configured at all. Rails 6 has the concept of legacy_connection_handling
. Without going into too much detail, to make this solution work that parameter must be turned off explicitly. In Rails 7 and above this parameter no longer exists.
Some remaining work
Since I initially migrated my app to ActiveRecord 6 I don’t have shard
support yet. It would actually make perfect sense to have your “reading replica” be a readonly: true
SQLite3 database, as Stephen has written.
Another aspect is that there’s currently no API to remove a connection pool if a tenant gets removed from the system, which I just don’t need (my tenants don’t change as frequently).
Handling other contexts when you need to “step into” a Tenant can be handled similarly, either using connected_to
or using the fiber approach.
And, of course, the “database per tenant” workflow is just starting and it’s only in the recent years, with product from the ONCE family specifically, where SQLite3 began to shine again - as an engine of “small data, in big numbers”.
May we live to see this pattern come into the spotlight, finally.