Julik Tarkhanov

Making Rails Global IDs safer

The new LLM world is very exciting, and I try to experiment with the new tools when I can. This includes building agentic applications, one of which is my personal accounting and invoicing tool, which I wrote about previously.

As part of that effort I started experimenting with RubyLLM to have some view into items in my system. I used a neat pattern for referencing objects in the application from the tool calls - the Rails Global ID system - and it turned out to be quite treacherous. So, let’s have a look at where GlobalID may bite you, and examine the alternatives and tweaks available.

What are Rails GIDs?

The Rails global IDs (“GIDs”) are string handles to a particular model in a Rails application. Think of it like a model URL. They usually have the form of gid://awesome-app/Post/32. That comprises:

  • The name of your app (roughly what you passed in when doing rails new)
  • The class name of the model
  • The primary key of the model
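To make the three parts concrete, here is a minimal plain-Ruby sketch that splits a GID string using only the standard library. The gid_parts helper is hypothetical and is not part of the globalid gem; it does no validation and no lookup.

```ruby
require "uri"

# Split a GID string into app name, model class name and primary key.
# Illustration only: no validation, no database access.
def gid_parts(gid_string)
  uri = URI.parse(gid_string)
  model_name, model_id = uri.path.delete_prefix("/").split("/", 2)
  { app: uri.host, model_name: model_name, model_id: model_id }
end

parts = gid_parts("gid://awesome-app/Post/32")
puts parts.inspect
```

Note that the primary key comes out as a String. It only becomes an Integer later, when it is used in a lookup.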

You can grab a model in your application and get a global ID for it:

moneymaker(dev):001> Invoice.last.to_global_id
  Invoice Load (0.3ms)  SELECT "invoices".* FROM "invoices" ORDER BY "invoices"."id" DESC LIMIT 1 /*application='Moneymaker'*/
=> #<GlobalID:0x00000001415978a0 @uri=#<URI::GID gid://moneymaker/Invoice/161>>

Rails uses those GIDs primarily in ActiveJob serialization. When you do

DebitFundsJob.perform_later(customer)

where the customer is your Customer model object which is stored in the DB, ActiveJob won’t serialize its attributes but instead serialize it as a “handle” - the global ID. When your job gets deserialized from the queue, the global ID is going to get resolved into a SELECT and your perform method will get the resulting Customer model as argument.
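As a rough illustration of what ends up in the queue: to my knowledge ActiveJob stores such an argument as a one-key hash under "_aj_globalid". The payload below is trimmed down (a real one carries more fields), and the customer ID 7 is made up.

```ruby
require "json"

# Trimmed-down stand-in for a serialized job: the Customer record is
# replaced by a small hash that holds only its GID string.
serialized_job = {
  "job_class" => "DebitFundsJob",
  "arguments" => [{ "_aj_globalid" => "gid://moneymaker/Customer/7" }]
}

wire_format = JSON.dump(serialized_job)

# On the worker side the payload is parsed and the GID string is pulled out,
# ready to be resolved into a record with a primary key lookup.
restored = JSON.parse(wire_format)
gid_string = restored["arguments"].first["_aj_globalid"]
puts gid_string
```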

All very neat. And dangerous, sometimes - once LLMs become involved.


Basics: LLM tool calls

LLM tool calls are ways for the model to call your application and get actionable results. For example, you may have a tool which allows your model to search for all unpaid invoices:

# Finds all invoices which are unpaid
def call
  unpaid_invoices = Current.account.invoices.unpaid.select(:id).all
  result = unpaid_invoices.map do |invoice|
    {gid: invoice.to_global_id}
  end
  JSON.dump(result) # [{"gid": "gid://moneymaker/Invoice/32"}, {"gid": "gid://moneymaker/Invoice/45"}]
end

Then, there may be another tool in your system which allows the model to get details about a particular invoice:

def call(invoice_gid)
  invoice = GlobalID::Locator.locate(invoice_gid)
  JSON.dump(invoice.attributes)
end

Why are GIDs so appealing when working with LLMs? Well, LLMs work with text tokens. They are very adept at recognizing patterns, and they can be instructed to both read those tokens from tool calls and to construct them in-situ. For example, we can instruct the LLM in our tool to pass us invoice GIDs:

Use this tool to get details about an invoice. The argument of the tool is the invoice GID, which looks like:

gid://moneymaker/Invoice/32

And now comes the meat of the problem.

When GIDs turn deadly

When working with GIDs, it is important to remember three things:

  • GIDs are not guaranteed to have been generated by your application code, yet lookups treat them as trusted identifiers
  • GIDs are not checked for authorization when doing the lookup - they are meant to be generated above the authorization layer, and to be consumed above the authorization layer
  • GIDs use ActiveRecord::Base.find

That last one is important. My invoices also have a system_identifier, which is a UUID. A couple of tools do know about the existence of those identifiers. Once, I was stunned to have the LLM find an invoice for me and pull up the wrong record. The lookup did not fail with a RecordNotFound or another exception: the LLM had substituted the UUID into a GID it generated itself, and passed that GID to a tool call which worked similarly to call(invoice_gid).

I investigated, and a very interesting quality of GIDs became apparent. As I mentioned previously, the GIDs call into find under the hood - you can check the code for yourself in https://github.com/rails/globalid/blob/main/lib/global_id/locator.rb

And ActiveRecord::Base.find has a very interesting property. See, from its inception Rails valued proper, clean URLs as a necessary convenience. You should be able to have URLs like posts/my-post-slug and have them look up effortlessly - which is a noble endeavour. But storing a separate slug and its index and whatnot seemed wasteful. Thus, a “holy pair” of methods has been devised:

def to_param
  slug = title.titleize.gsub(/\s+/, "").underscore.dasherize # "Amazing post title" => "amazing-post-title"
  [id, slug].join("-") # 761-amazing-post-title
end

Then, the Rails finder methods would do something quite clever (pseudocode):

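def find(id)
  # Pseudocode. The real finder casts the value through the column type,
  # which for integer keys behaves like String#to_i (leading digits only)
  just_int_id = id.to_s.scan(/\d+/).first.to_i
  where(id: just_int_id).first!
end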

So you could feed your "761-amazing-post-title" - which is a String - to all the find-related methods and they would look up record 761 (an Integer) for you instead. Which is neat, no doubt about it (and no, dear strong typing adepts, it is not horrifying - it is actually neat).

That said, if you allow arbitrary strings into those lookups - interesting things may happen. For example, the LLM may hallucinate that a GID for an invoice it needs to examine is actually not a composition of <model class>/<primary key>, but instead <model class>/<other identifier it associated with the invoice>. And here, an interesting thing will happen. Remember that Rails scans for digit sequences to infer the ID? Now watch:

  • The LLM hallucinates a GID with gid://moneymaker/Invoice/22ecb3fd-5e25-462c-ad2b-cafed9435d16
  • That GID then gets passed to the tool which looks up the invoice. Or to a view!
  • ActiveRecord happily extracts the leading digits from that UUID, assuming this is an ID with a slug, and finds us… the invoice 22 - since “22” is the digit sequence that “22ecb3fd-5e25-462c-ad2b-cafed9435d16” starts with.
  • That found invoice then may get shown, may get modified - most anything, really. But, more importantly, it may belong to a different user!
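The digit extraction can be reproduced in plain Ruby, without Rails. The sketch below shows that both a lenient integer conversion and a scan for the first run of digits land on 22 for this UUID. The strict_integer_id helper is hypothetical: it is one way to reject such input before it reaches a finder.

```ruby
hallucinated_id = "22ecb3fd-5e25-462c-ad2b-cafed9435d16"

# String#to_i reads the leading digits and silently drops the rest.
lenient_id = hallucinated_id.to_i

# Scanning for the first run of digits lands on the same value here.
first_digit_run = hallucinated_id.scan(/\d+/).first.to_i

# Hypothetical strict check: accept only strings made entirely of digits.
# \A and \z anchor the whole string, while ^ and $ only anchor single lines.
def strict_integer_id(value)
  str = value.to_s
  raise ArgumentError, "Malformed primary key: #{str.inspect}" unless str.match?(/\A\d+\z/)
  str.to_i
end

puts lenient_id
puts first_digit_run
puts strict_integer_id("161")
```

With the strict check in place, the UUID above raises an ArgumentError instead of quietly turning into invoice 22.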

And - remember - that the GIDs have two other properties:

GIDs are not guaranteed to be generated by your application code

A GID can be a freeform string that is composed anywhere - for example, hallucinated by an LLM. But when you turn it into a handle, you do not perform any verification on whether that GID has been actually produced by your code. Imagine you have an Account, and it belongs to a user. Another user is then using the system, and the LLM hallucinates a GID with an actually present ID of an account, but that is the account of someone else! And since GIDs are not signed - the GID produced by the LLM will be accepted “at face value”.

This means that resolving an arbitrary GID you get from an LLM creates potential for information disclosure.
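To show what signing buys you, here is a minimal sketch of the idea using HMAC from the Ruby standard library. It is not how SignedGlobalID is implemented. The SECRET constant and the sign_handle, secure_equal? and verify_handle helpers are all hypothetical names used for illustration.

```ruby
require "openssl"

# Hypothetical secret; a real app would use its own secret key material.
SECRET = "not-a-real-secret"

# Append an HMAC of the GID so that we can later tell it was issued by us.
def sign_handle(gid_string)
  signature = OpenSSL::HMAC.hexdigest("SHA256", SECRET, gid_string)
  "#{gid_string}--#{signature}"
end

# Constant-time comparison, so the check does not leak timing hints.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# Returns the GID string, or nil if the handle was not signed by us.
def verify_handle(handle)
  gid_string, separator, signature = handle.to_s.rpartition("--")
  return nil if separator.empty?
  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, gid_string)
  secure_equal?(signature, expected) ? gid_string : nil
end

handle = sign_handle("gid://moneymaker/Invoice/32")
forged = handle.sub("Invoice/32", "Invoice/33")
```

A handle that was hallucinated or edited no longer verifies, so it never reaches a lookup.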

GIDs exist above your authorization layer

When you receive an ID for something that belongs to a User, you are likely to query for it like this in your controllers and models:

current_user.credit_statements.find(id_from_params)

This automatically scopes the query to the current user, and prevents information disclosure. But you do not have that luxury when you do GlobalID::Locator.locate(gid_string) - it just does a primary key lookup. Remember that GIDs were made for facilitating ActiveJob serialization - they are a system-level facility, not a product-level facility. “Bare” GID lookups (“locations”) are, thus, by definition, unsafe.
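The difference between a scoped and a bare lookup can be shown with in-memory stand-ins. This is a sketch only: the Invoice and User structs and the find_scoped_invoice helper are hypothetical, and in a Rails app the scoping would come from associations such as current_user.invoices.

```ruby
# In-memory stand-ins for models, used instead of a database.
Invoice = Struct.new(:id)
User = Struct.new(:name, :invoices)

alice = User.new("alice", [Invoice.new(22)])
bob = User.new("bob", [Invoice.new(45)])

# A bare lookup searches everything, regardless of who is asking.
all_invoices = alice.invoices + bob.invoices
unscoped_hit = all_invoices.find { |invoice| invoice.id == 22 }

# A scoped lookup only searches what the given user owns.
def find_scoped_invoice(user, id)
  invoice = user.invoices.find { |candidate| candidate.id == id }
  raise KeyError, "No invoice #{id} for #{user.name}" unless invoice
  invoice
end

puts unscoped_hit.id
puts find_scoped_invoice(alice, 22).id
```

Here bob asking for invoice 22 raises a KeyError in the scoped version, while the bare lookup hands over the record that belongs to alice.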

Now, GIDs are neat also! They are a very nice way for an LLM to reference objects in the system it interfaces with. They are also very neat for recalling those objects into views and parts of the application which do something useful with the LLM output. But “as is” they are fundamentally unsafe. So, should you want to use GIDs with LLM calls (or in other “not quite authenticated” contexts), here is what you can do.

GIDs support a concept called “app” - that is the name of the application that contains the GID being referenced. In Rails, you can actually have several GID namespaces, which will be correctly used when you perform your locate calls. This can be used to install a special Locator just for use by LLMs, which will be subject to way more restrictions than the app-wide locator used for ActiveJob.

class LLMSafeLocator < GlobalID::Locator::BaseLocator
  def locate(gid, options = {})
    model_id = gid.model_id.to_s
    # \A and \z anchor the whole string; ^ and $ would also accept "22\nanything"
    raise "Malformed pkey in #{gid}" unless model_id.match?(/\A\d+\z/)
    super(gid, options).tap do |maybe_model|
      authorize_access!(maybe_model) if maybe_model
    end
  end

  def locate_many(gids, options = {})
    # ... similar
  end

  private

  # Note: models which do not respond to owner are let through unchecked
  def authorize_access!(model)
    return unless model.respond_to?(:owner)
    unless model.owner == Current.user
      raise "Unauthorized access to #{model.class}##{model.id} from #{Current.user.inspect}"
    end
  end
end

This way, we enforce two things:

  • Remove the “magic” for find to avoid misinterpretation of our primary keys
  • Add an authorization layer so that lookups done by the LLM will be forced to the authorization scope

We then install our locator under a separate app ID:

# application.rb
GlobalID::Locator.use :tainted, LLMSafeLocator.new

and we then add a method to our ApplicationRecord:

def to_tainted_global_id
  SignedGlobalID.create(self, app: "tainted")
end

Note that we are applying a signed global ID here, because we don’t want the LLM to be hallucinating these GlobalIDs for us. If you are feeling adventurous and want to permit the LLM to generate those GIDs anyway:

def to_tainted_global_id
  # GlobalID.create builds a GID from a model; GlobalID.new expects a GID string or URI
  GlobalID.create(self, app: "tainted")
end

Then, the tool calls and prompts should hint the LLM that the GIDs will be signed:

The IDs you receive and use as GIDs are _opaque_. Do not manipulate or decode them as you may damage the ID. They look like this:

eyJfcmFpbHMiOnsiZGF0YSI6ImdpZDovL21vbmV5bWFrZXIvSW52b2ljZS8zMiIsImV4cCI6IjIwMjYtMDEtMTBUMTI6MjQ6NDQuMDQ3WiIsInB1ciI6ImRlZmF1bHQifX0=--397235ab0a0d32e1d29ed0e2f136b34f573244a4

or hint it that the GIDs it operates with will have our “tainted” app identifier:

The IDs you receive and carry are GIDs, and look like this:

gid://tainted/Invoice/12

and from your code return not the app’s default GIDs, but your “tainted” GIDs instead:

# Finds all invoices which are unpaid
def call
  unpaid_invoices = Current.account.invoices.unpaid.select(:id).all
  result = unpaid_invoices.map do |invoice|
    {gid: invoice.to_tainted_global_id}
  end
  JSON.dump(result) # [{"gid": "gid://tainted/Invoice/32"}, {"gid": "gid://tainted/Invoice/45"}]
end

and make sure your LLM only ever gets - and sends you - the “tainted” GIDs:

def call(invoice_gid_str)
  gid = GlobalID.parse(invoice_gid_str) # nil if the string is not a valid GID
  raise "The passed GID is malformed or not correctly scoped - got #{invoice_gid_str.inspect}" unless gid&.app == "tainted"

  invoice = GlobalID::Locator.locate(gid)
  JSON.dump(invoice.attributes)
end

There is a caveat: if you call locate yourself, make sure the locator for the “tainted” app is actually registered in your Rails application, because the global locator will revert to your “system” locator if the app you supply is not configured.

Or even better

Do not play with GIDs unless you strictly have to, and use signed_id instead, passing the value into the ActiveRecord relations:

# As output
invoice.signed_id(purpose: "llm")

# As input
current_user.invoices.find_signed(signed_invoice_id, purpose: "llm")

This is not as polymorphic but much safer in the long run.

To summarize

GlobalID is a neat concept in Rails, but it is not very safe. If you allow your LLMs to touch it, relatively severe security consequences can follow, ranging from information disclosure to data exfiltration. Should you choose to use GIDs, do so with guardrails. And, in general, treat LLM input into your tools as untrusted user input, with all that entails.