needle

One foreign key to rule them all and in the darkness, bind them.

Overview
Readme

Needle #

hex.pm
hexdocs

Universal foreign keys, virtual schemas, and shared data mixins for Ecto

One foreign key to rule them all and in the darkness, bind them.

-- Gandalf, paraphrased.

A means of foreign keying many tables in one field. Designed for
highly interlinked data in highly dynamic schemata where tracking all
the foreign keys is neither desired nor practical.

Note: a universal foreign key is actually a hard problem. Many
approaches are on offer with a variety of tradeoffs. You should
carefully consider a variety of approaches rather than just blindly
adopting the one that fitted our project's needs the best!

Background #

A Table is a record of a table that may be linked to by a pointer.
A Pointer is a pointer id and a table id.

With these two ingredients, we can construct a means of pointing to
any table that has a Table entry.

Pointer and Table IDs are both Needle.ULID, a UUID-like type
that combines a millisecond-precision timestamp and some randomness to
reduce the likelihood of a clash. It naturally sorts both in binary
and text form by time and as far as postgres is concerned, it's a UUID.

Installation #

Aside from the hex dependency, you will also need to write a simple
migration to set up the database before you can start writing your
regular migrations:

defmodule MyApp.Repo.Migrations.InitPointers do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def up(), do: inits(:up)
  def down(), do: inits(:down)

  defp inits(dir) do
    init_pointers_ulid_extra(dir) # this one is optional but recommended
    init_pointers(dir) # this one is not optional
  end
end

Defining a Pointable Type #

Pointable tables require a unique sentinel ULID to identify
them. These must be 26 characters long and in the alphabet of
Crockford's Base32.
They should be easy to identify in a printout and might be silly.

There is a helper function, synthesise!/1 in Needle.ULID to
assist with this process - give it a 26-character long binary of ascii
alphanumerics and it will give you the closest ULID that matches back.

Let's look at a simple schema:

defmodule MyApp.Greeting do
  use Needle.Pointable,
    otp_app: :my_app,
    source: "myapp_greeting",
    table_id: "GREET1NGSFR0MD0CEXAMP1E000"

  pointable_schema do
    field :greeting, :string
  end
end

To declare a pointable schema, we start by using Needle.Pointable,
providing the name of our otp application, the source table's name in
the database and our chosen sentinel ULID.

We then call pointable_schema and define any fields we wish to put
directly in the table. For the most part, pointable_schema is like
Ecto's schema macro, except you do not provide the table name and
let it handle the primary key.

If for some reason you wished to turn autogeneration off, you could
pass autogenerate: false to the options provided when using
Needle.Pointable.

Now let's define the migration for our schema:

defmodule MyApp.Repo.Migrations.Greeting do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def up() do
    create_pointable_table(:greeting, "GREET1NGSFR0MD0CEXAMP1E000") do
      add :greeting, :text, null: false
    end
  end

  def down() do
    drop_pointable_table(:greeting, "GREET1NGSFR0MD0CEXAMP1E000")
  end
end

As you can see, it's pretty similar to defining a regular migration,
except you use create_pointable_table and
drop_pointable_table. Notice that our sentinel ULID makes an
appearance again here. It's very important that these match what we
declared in the schema.

Referencing Needle #

Ecto does not know anything about our scheme, so unless we
specifically want something to reference one of the pointed tables, we
typically belongs_to with Needle.Pointer. The table in which we
do this does not itself need to be pointable.

defmodule MyApp.Foo do

  use Ecto.Schema
  alias Needle.Pointer

  # regular ecto table, not pointable!
  schema "hello" do
    belongs_to :pointer, Pointer # who knows what it points to?
  end
end

You may choose to reference a specific schema rather than Pointer if it
will only point to a single table. If you do this, you must ensure
that the referenced record exists in that table in the normal
way. There may be some performance benefit, we didn't benchmark it.

The migration is slightly more complex, we have to decide what type of
a pointer it is. Needle come in three categories:

  • A strong pointer is not nullable and is deleted when the object it
    points to is deleted.
  • A weak pointer is nullable and is nilified when the object it points
    to is deleted.
  • An unbreakable pointer will raise when you attempt to delete the
    object it points to.
Type Nullable? On Delete
Strong No Cascade
Weak Yes Set Null
Unbreakable No Raise

In this case we will use a strong pointer, because we want it to be
deleted if the pointed object is deleted.

defmodule MyApp.Repo.Migrations.Hello do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def change() do
    create_if_not_exists table(:hello) do
      add :pointer, strong_pointer(), null: false
      add :greeting, :text, null: false
    end
  end
end

If you are pointing to a specific table instead of pointer,
strong_pointer/1 allows you to pass the name of that module
(strong_pointer/0 calls this with Needle.Pointer).

Dereferencing Needle #

It is common that even though you have a universal foreign key, you
will want to issue different queries based upon the type that is being
pointed to. For this reason, it is up to you to decide how to perform
an onward query.

Needle.Pointers.schema/1 turns a Pointer into an Ecto schema module name
you can switch against. Needle.Pointers.plan breaks down a list of Needle
into a map of ids keyed by schema module. It is handy to define some
functions in your (non-library) application that can load any type of
pointer in given contexts.

Querying Needle #

Since Pointer has a table, you can use its table_id field to
filter by pointed type. Needle.Tables.id!/1 (or ids!/1 for a
list) can be used to obtain the IDs for a table or tables.

Then you run into another problem, that even though you know all of
the tables you're working with will have a certain field, you need to
know which table they are to work with them! The solution to this is
what we are calling 'mixin tables' for convenience.

A mixin table has a Pointer primary key along with any other fields
you wish to store in this mixin. By moving fields out to mixin tables,
you gain knowledge of the table name to which you need to join.

An example mixin schema:

defmodule My.Creator do
  use Needle.Mixin,
    otp_app: :my_app,
    source: "creator"

  mixin_schema do
    belongs_to :creator, My.User
  end
end

Mixin tables are not themselves pointable, so there is no need to
specify a table id as when defining a pointable schema.

The migration for this is slightly more complicated:

defmodule My.Creator.Migration do

  import Ecto.Migration
  import Needle.Migration

  defp creator_table(), do: My.Creator.__schema__(:source)
  defp user_table(), do: My.User.__schema__(:source)

  def migrate_creator(index_opts \\ []),
    do: migrate_creator(index_opts, direction())

  defp migrate_creator(index_opts, :up) do
    create_mixin_table(creator_table()) do
      add :creator_id, strong_pointer(user_table()), null: false 
    end
    create_if_not_exists(unique_index(creator_table(), [:creator_id], index_opts))
  end

  defp migrate_creator(index_opts, :down) do
    drop_if_exists(unique_index(creator_table(), [:creator_id], index_opts))
    drop_mixin_table(creator_table())
  end
end

Virtual pointables ("virtuals") #

Virtuals are a new addition in needle 0.6.0. They behave like
pointables that you have not added any fields to.

We noticed it was very common in bonfire to create pointables with no
extra fields just so we could use the needle system. Virtuals are
alternative for this case that requires less typing and provides a
reduced overhead vs pointables.

Virtuals are backed by a writable view onto the needle table. This
means that when we can save the cost of maintaining a primary key in
that table and the associated disk space.

In all other respects, they behave like pointables. You can have
changesets over them and select and insert as usual.

Elixir-based logic #

The practical result of needle is that it pushes a certain amount of
validation and consistency logic back into elixir land. It is
therefore your elixir code's responsibility to ensure that data is
inserted into the appropriate mixin tables when inserting a pointable
object and to manage deletions as appropriate.

When assembling queries with mixin tables, pay careful attention to
the type of join you are performing. An inner join is explicitly
asking not to be shown objects that do not have a record for that
mixin. You quite possibly wanted to left join.

Configuration and overrides #

Every pointable or mixin schema is overrideable with configuration
during compilation (this is why using them requires an :otp_app to
be specified). For example, we could override Needle.Table (which
is a pointable table) thus:

config :needle, Needle.Table, source: "my_pointers_table"

The table_id is also configurable, but we don't recommend you change it.

In addition, all pointable and mixin schemas permit extension with
Exto. See the Exto
docs
for more information about how to
extend schemas via configuration. You will probably at the very least
want to insert some has_one for mixins off your pointables.

Tradeoffs #

All solutions to the universal primary key problem have tradeofs. Here
are what we see as the deficiencies in our approach:

  1. It forces a ULID on you. This is great for us, but not
    everyone. ULID exposes a timestamp with millisecond precision. If
    the time of creation of a resource is sensitive information for
    your purposes, ULIDs are not going to be suitable for you.
  2. Ecto has no knowledge of the specialty of Pointer,
    e.g. Repo.preload does not work and you need to specify a join
    condition to join through a pointer. Use our functions or add extra
    associations with exto configuration.
  3. Dereferencing a list of needle requires a select query per table
    type that occurs in the input set.
  4. Reliance on user attention. You have to follow the instructions
    correctly to make the system work at all.
  5. There is likely some performance impact from postgres not
    understanding the relationships between the various tables
    properly. It's hard to gauge and we haven't even tried.

These are not likely to change. If you're going to pick
this library, do so in the full knowledge of the tradeoffs it makes.

Alternatives include (I'm sure you can think of others):

  • Storing the table name in a second column alongside every foreign key.
  • A compound datatype of id and table name.
  • Byte/String manipulation tricks.
  • Evil SQL hacks based upon compile time configuration.

While we have our gripes with this approach, once you've gotten the
hang of using it, it works out pretty well for most purposes and it's
one of the simpler options to work with.

Copyright (c) 2020 needle Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Details
Updated 07 Jan 2024
License apache-2.0

Bonfire Networks

Star
Open Collective account Open Source