Create public-facing unique keys alongside your primary keys

Three Good and Three Bad Things about... Achieve Static Typing Benefits in Rub...

Create public-facing unique keys alongside your primary keys

Peter Shilling wrote a blog post about generating what he calls ��cool ids�� for models in Rails. Instead of numbers, models have an id prefixed with its type, like cus_abcdefg1234 for a customer. I like the idea, but don��t think these ids should be used as primary keys. External IDs are incredibly useful for a lot of things, but they aren��t great as primary keys. I��ll explain what public-facing external IDs are, how to make them, and why they shouldn��t be your primary keys.

Screenshot of Stripe's Admin UI showing the ID for a payment intent

Stripe��s admin UI surfaces identifiers for various entities, but these identifiers aren��t numbers, nor are the UUIDs. They are valueslike cus_734t8wri4thugiuh or pi_4t98yerihrsdf. The bit before the underscore tells you what sort of thing it is (a customer, and payment intent, respectively), and the remainder is a unique value. I��m going to call these external IDs since, from Stripe��s perspective, they are being shared externally.

I��ve come to realize that you should almost always create external IDs for your database tables, and that they should be prefixed witha type identifier, but not be used as the primary key for your tables.

What��s So Great about External IDs?

When data is in a database, there is often need to refer to it outside whatever systems are built to manage it. For example, acustomer service agent may need to ask an engineer about a particular customer, and they need to do that unambiguously. Sharingpersonal details like email or name is not a great practice and it��s often imprecise.

In a typical Rails app, you��d see the database primary keys in URLs, like admin.example.com/customers/1234. The internal team willstart using these ids, e.g. ��Hey Pat, can you figure out why customer 1234��s password resets aren��t working?��. There are manydownsides to this, but a big one is that if you see the value 1234 out of context, it��s hard to know what it is.

Even if everyone is dilligent about providing context, mistakes can get made. Imagine a URL likeadmin.example.com/customers/1234/.... If someone isn��t careful, they may copy the 4567 and refer to that as a customerID.

External IDs that are prefixed with a type eliminate this problem. Instead of a customer ID being a number, it starts with a prefixtelling you what it is, e.g. cus_1234. That way, if the order ID is copied mistakenly, it will be obvious that it isn��t a customerID, since it would look like ord_4567.

While you could certainly create these ids as derived fields, as below, you don��t want to.

class Customer # Don't do this - see below def external_id = "cus_#{self.id}"end

Even though this value is not the primary key, it��s derived from it and thus, you are sharing the primary key externally, which shouldbe avoided.

Primary Keys Should Not Be Externalized

Exposing primary keys has a lot of downsides and pretty much no upside (especially when see how straightforward it is to have anexternal ID). Most systems use numeric, monotonically increasing values for primary keys. This opens up any URL containing them to anenumeration attack, where a threat actor will just increase the number they see hoping to find a value they can access but shouldn��t.

Numeric keys can also expose business metrics. If you get a receipt with your customer ID on it, and that ID is, say, 123987, youcan likely surmise the company has at least 100,000 customers.

You can certainly mitigate these problems by using UUIDs for your primary keys, however UUIDs still won��t address the most perniciousof issues, which is shadow processes using the IDs.

A good rule of thumb is that any data you externalize��even to internal users��is likely to be input into another system or process that you may not control��or even know about.

Any data you externalize is likely to be input into another system or process you don't control.

This means that the values you��ve set up to ensure referential integrity (i.e. keeping your database consistent and correct) are now coupled to systems and processes you may not be able to change. I had a job once where primary keys were used in other systems and when our database��s key ended up rolling over to a sixth digit, all hell broke loose in these heretofore unknown other systems.

It��s this last issue which is why you should create clear separate between columns in your database.

External IDs are Keys, Just Not Primary Ones

The word ��primary�� in ��primary key�� is there for a reason. It implies the existence of many keys, with one being used most of the time. But you can have many other keys as needed by the engineering team, database administrator, or business owners.

For example, many systems require unique email addresses. This is a key, even if not the primary one (and, if the business no longer requires unique emails, this key is no longer a key��good thing it wasn��t your primary key :).

An external ID is just another key you can use to uniquely identify a row. But, critically, it doesn��t serve any other purposeinside the database. It��s not used as a foreign key, and it doesn��t encode any domain information. This means you can be relativesafe exposing it as a unique identifier for any reason.

This allows you to change it if you needed to, or move what table it��s a part of. If your primary keys were being shared or exposed, both of those operations would be difficult or impossible.

External IDs in Practice

There are many ways to generate external IDs. In Rails, you can use before_save to ensure the value is there:

class Customer before_save do |record| if record.external_id.blank? hex_id = SecureRandom.hex record.external_id = "cus_#{hex_id}" end endend

Note that there gems like the aforementioned cool_id or external_id that you should consider or at least copy to make this mechanism easier to manage and use.

If you are using Postgres, you can use database triggers instead. First, define a function that generates the ID. It accepts a singleargument, which is the prefix (yes, Postgres�� programming language is super weird):

CREATE FUNCTION generate_external_id()RETURNS trigger AS $$BEGIN NEW.external_id := TG_ARGV[0] || '_' || md5(random()::text); RETURN NEW;END;$$ LANGUAGE plpgsql;

Then, for any table you create, you��ll use this to create a trigger:

CREATE TRIGGER customers_external_idBEFORE INSERT ON customersFOR EACH ROWEXECUTE PROCEDURE generate_external_id('cus');

I don��t think Rails has a way to do this directly, but you can wrap any SQL in a call execute(...) inside a migration. You can alsowrap this into a method you can call inside a migration to avoid copying and pasting this boilerplate.

Once the external IDs are being inserted into the database, you��ll want to use them to e.g. build URLs. In Rails, URLs are built from Active Records and Active Models via the to_param method:

class Customer def to_param = self.external_idend

external_id will show up by default when you call to_json, but you may want to omit id from that payload to avoid leaking the IDyou are working to protect. You can do that by overriding serializable_hash:

class Customer def serializable_hash(options=nil) default_options = { except: :id } options = if options.nil? default_options else # Allow params to override our defaults # to preseve the method's expected behavior default_options.merge(options) end super(options) endend

Lastly, you��ll want a way to locate records by this id. You can certainly do Customer.find_by!(external_id: params[:id]), howeveryou could also override find:

class Customer def find(*arguments) if arguments.size == 1 customer_by_external_id = find_by( external_id: arguments.first ) if customer_by_external_id.present? return customer_by_external_id end end super(*arguments) endend

This would preserve find��s behavior on the primary key, but also allow looking up by the external id, too. I��ve done this before, but now feel it��s better to use find_by!(external_id:) to be more explicit about what��s going on.

You can extract all of this behavior to a module or put it into ApplicationRecord.

View more on David Bryant Copeland's website »

Like • 0 comments • flag

Published on August 26, 2024 02:00

No comments have been added yet.

David Bryant Copeland's Blog

David Bryant Copeland's profile
15 followers