Testing a Thousand Applications With Flipper

Feature flags are amazing. No, really, did I tell you that feature flags are amazing? They are. But you might be running a thousand applications. Once that kind of complexity gets involved, you may need to test combinations of feature flags, sometimes dozens of them. Exhaustive testing to the rescue!

As I mentioned: if you have many feature flags, your application might depend on the state of several of them at once. Imagine you have a feature flag called deferred_checkout, and another called buy_one_click. Since the number of possible states is 2 ** feature_count, two flags give us a matrix of 4 possible states:

On, On   | On, Off
Off, Off | Off, On
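
That count is easy to double-check in plain Ruby. This is only a minimal sketch and does not touch Flipper at all; the flag names are the two from the example, and states is just an illustrative local name.

```ruby
flags = [:deferred_checkout, :buy_one_click]

# Every on/off assignment: [true, false] picked once per flag.
states = [true, false].repeated_permutation(flags.length).map do |booleans|
  flags.zip(booleans).to_h
end

states.each { |state| puts state.inspect }
puts "#{states.length} states, expected #{2**flags.length}"
```

Adding a third symbol to flags should bring the count to 8, a fourth to 16, and so on.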

With every extra feature flag, the number of possible states doubles. There is in fact a great technique for testing these kinds of matrices: exhaustive testing. With that technique, we feed our software all the possible inputs and see how it reacts. And computers are great at enumerating large datasets. Way better than us humans. Why not make our test suite generate test cases for all these combinations? When using Flipper for instance, we could then do this:

test "the checkout screen renders correctly", feature_flags: [:buy_one_click, :deferred_checkout] do
  get "/checkout"
  # ...
end

Thanks to the meta-programming abilities of Ruby we can put together such a helper quite easily. In your test_helper.rb, add the following:

class FeatureFlagCombo
  # table maps each flag name to true (enabled) or false (disabled)
  def initialize(table)
    @table = table
  end

  # Applies this combination to Flipper
  def set_flags!
    @table.each_pair do |flag, is_enabled|
      is_enabled ? Flipper.enable(flag) : Flipper.disable(flag)
    end
  end

  def to_s
    @table.map do |flag_name, is_enabled|
      "#{flag_name}: #{is_enabled ? :on : :off}"
    end.join(", ")
  end
end

Then we will need a method which executes a block, passing it a combination of on|off values for every flag. This is a bit abstract (and makes a good question for a tech interview which you probably should not be asking): generate every possible vector of N dimensions, where the value in each dimension is restricted to a finite set. In other words, a Cartesian product.

The N in this case is the number of feature flags involved, and the set of possible values per dimension is [true, false] - but if you ever need such a contraption for more possible values it will work just fine.

def self.with_every_feature_flag_combination(*feature_flags)
  bit_values = [false, true]
  # Array#product with N - 1 extra copies of bit_values gives every N-tuple of booleans
  possible_combinations_of_enabled_and_disabled = bit_values.product(*[bit_values] * (feature_flags.length - 1))
  possible_combinations_of_enabled_and_disabled.each do |booleans|
    feature_combo = FeatureFlagCombo.new(feature_flags.zip(booleans).to_h)
    yield feature_combo
  end
end
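
The same Array#product trick can be tried in isolation, and with more than two values per dimension. Below is a minimal standalone sketch; all_assignments and the three rollout stages are hypothetical names made up for illustration, not something Flipper provides.

```ruby
# Hypothetical helper: every way to assign one of `values` to each of `keys`.
def all_assignments(keys, values)
  return [] if keys.empty?

  # values.product with (keys.length - 1) extra copies of values
  # yields every tuple of length keys.length.
  values.product(*[values] * (keys.length - 1)).map do |tuple|
    keys.zip(tuple).to_h
  end
end

# Assumed three-valued setting instead of plain on/off.
rollout_stages = [:off, :staff_only, :everyone]
combos = all_assignments([:buy_one_click, :deferred_checkout], rollout_stages)

puts combos.length
combos.each { |combo| puts combo.inspect }
```

With two keys and three values this gives 3 ** 2 = 9 assignments. The early return for an empty key list is there because multiplying an array by a negative number raises an ArgumentError.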

This method will yield you a FeatureFlagCombo object for every such feature flag combination. With 2 flags that is 4 yields, with 10 flags it is 1024, and so forth. Then we need to override the test method that comes from ActiveSupport::Testing::Declarative so that it accepts a keyword argument:

def self.test(name, feature_flags: [], &block)
  return super(name, &block) if feature_flags.empty?

  with_every_feature_flag_combination(*feature_flags) do |combo|
    super("#{name} with features #{combo}") do
      combo.set_flags!
      instance_exec(&block)
    end
  end
end

and we can define our tests:

test "a purchase is always refundable", feature_flags: [:discounted_purchase, :rapid_refund] do
  purchase = Purchase.create!
  assert_predicate purchase, :refundable?
end
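
To get a feel for what that single declaration expands to, the generated test names can be reproduced in plain Ruby. This is only a sketch that mirrors the naming logic from the helper, without Rails or Flipper; test_names is an illustrative local name.

```ruby
flags = [:discounted_purchase, :rapid_refund]
name = "a purchase is always refundable"
bit_values = [false, true]

# One test name per on/off combination of the flags.
test_names = bit_values.product(*[bit_values] * (flags.length - 1)).map do |booleans|
  description = flags.zip(booleans).map do |flag, is_enabled|
    "#{flag}: #{is_enabled ? :on : :off}"
  end.join(", ")
  "#{name} with features #{description}"
end

puts test_names
```

Two flags give four names, so a failure report should tell you exactly which combination of flags broke the test.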

This is actually where RSpec can be nicer than Minitest because of its contexts. Note that Flipper automatically installs a test helper for you, and will revert all the feature flags after every test case. Flipper is amazing.