· 2 min read

Structs in Elixir are great: they let you define a data structure and do all sorts of nifty stuff like default values. But what if you want to use a particular struct inside an Ecto query, and then preload associations based on a given field?

An Example Problem

We have a struct called Deal, which is built dynamically from an SQL query. This means that there is no table associated with our Deal structs.

We initially define it like so:

defmodule MyApp.Deal do
  defstruct total_price: nil,
            product_id: nil,
            product: nil
end

The SQL query then populates the product_id field with the id of a product that is currently on sale, like so:

from(p in Products,
  ...
  select: %Deal{total_price: sum(p.price), product_id: p.id}
)
|> Repo.all()

If we were to run this query, we would get back a list of Deal structs, for example:

# for product with id=5
[%Deal{total_price: 123.20, product_id: 5}]

All smooth sailing so far... or is it?

The Preload

What if we wanted to load our product information onto the struct? Could we perhaps use Repo.preload/3 to help?

from(p in Products,
  ...
  select: %Deal{total_price: sum(p.price), product_id: p.id}
)
|> Repo.all()
|> Repo.preload([:product])

Trying out this updated query will give us this error:

 function MyApp.Deal.__schema__/2 is undefined or private

D'oh! Our Deal struct does not have the schema metadata that Repo.preload/3 relies on. It seems like we'll have to ditch the struct and implement a full schema backed by a table...

The Solution: Embedded Schemas To The Rescue

The post's title kind of gave it away, but we're going to use Ecto's embedded schemas to declare our schema without actually having a table backing it. This allows us to declare associations with other tables, and we can then use Repo.preload/3 to load these associations automatically for us! 🤯 I know, right?

Refactoring our code for our Deal struct into an embedded schema gives us this:

defmodule MyApp.Deal do
  use Ecto.Schema

  alias MyApp.Product

  embedded_schema do
    field(:total_price, :float)
    belongs_to(:product, Product)
  end
end

Note that we don't have to specify both product_id and product fields, as they are automatically created with the Ecto.Schema.belongs_to/2 macro.

Now, preloading our product information works perfectly fine!
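
Putting it all together, a minimal sketch of the working version looks like this (the `...` stands in for the rest of the original query, and the result shown assumes the same example data as before):

from(p in Products,
  ...
  select: %Deal{total_price: sum(p.price), product_id: p.id}
)
|> Repo.all()
|> Repo.preload([:product])

# => [%Deal{total_price: 123.20, product_id: 5, product: %MyApp.Product{...}}]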

· 3 min read

Occasionally, when you've got lots of business logic defined, you may need to perform some heavy calculations outside of your main SQL query, and then join the calculated result set back into your final query for one last statistical calculation. Usually, the calculated result set will be a list of maps or a list of tuples.

Thankfully, we can use Ecto.Query.with_cte/3 to help with this. With the help of the PostgreSQL function unnest, we can interpolate arrays into the query while also defining the data types for the temporary columns.

There are 3 main steps with this technique:

  1. Prepare the data into separate lists
  2. Create the CTE query
  3. Join on the CTE child query as a subquery in the main query

Step 1: Prepare the Data

We need to get the data into a format that we can easily interpolate as lists. We also need to convert the values into data types that PostgreSQL can understand. For example:

iex> data = [test: 1, testing: 2] |> Enum.unzip()
iex> data
{[:test, :testing], [1, 2]}
iex> {string_col, int_col} = data
iex> string_col = Enum.map(string_col, &Atom.to_string/1)

iex> string_col
["test", "testing"]

iex> int_col
[1, 2]

Creating the Common Table Expression Ecto Query

Creating the query requires the use of fragments, as well as specifying the data type for each interpolated column. We will also need to provide the CTE with a name. Note that the name must be a compile-time string literal, as noted in the docs. This means that dynamic table names are not possible.

scores_query =
  "names"
  |> with_cte("names",
    as: fragment(
      """
      select name, val from unnest(?::text[], ?::integer[]) t (name, val)
      """,
      ^string_col,
      ^int_col
    )
  )
  |> select([n], %{name: n.name, val: n.val})

The fragment calls the unnest SQL array function, which expands the two interpolated arrays into rows across two columns that we name name and val. Within the fragment, we also select the columns that we want.

Thereafter, we use the Ecto.Query.select/2 function to make this query understandable to Ecto. This helps when we utilize the query in dynamic query compositions.
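
For example, since the CTE query now has an explicit select, we can keep composing on it like any other Ecto query before handing it off. A small sketch (the threshold value here is purely illustrative):

high_scores_query =
  scores_query
  |> where([n], n.val > 1)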

Joining on the CTE

Since we have expressed our CTE query as something that Ecto can understand, we can finally use it in our main query.

from(s in Student,
  join: sc in subquery(scores_query),
  on: sc.name == s.name,
  select: {s.name, sc.val}
)
|> Repo.all()

In this scenario, the name column in the students table is the primary key, and we join on it to select each student's respective score.

Hope this helps!

· 6 min read
Ziinc

So you've trained your spaCy NLP model, you've got it to the highest accuracy you can push it to, and now you want to use it in your project or product. Now, what do you do?

If we take a glance at the official spaCy documentation on packaging, it advises using the spaCy command-line interface to build a deployable Python package containing the model. This results in a package that can then be imported as follows:

import my_model_name

nlp = my_model_name.load()

However, that isn't good enough for us, organization-wise. Imagine having to manage so many separate packages for each machine learning model that you want to deploy. Just think of those import statements! Oh the horror! shudders

· 6 min read

Composability is the name of the game when it comes to writing Ecto queries. With such a beautifully designed DSL given to us, we should make full use of its design to build our queries.

A Brief on Recursive Functions

Recursive functions are functions that call themselves. They risk becoming infinite loops if they are not given exit conditions.

Here's an example:


def eat(food) when food == "biscuits" do
  eat(food, "water")
end

def eat(food) do
  eat(food, nil)
end

# return the final digestible result
def eat(food, drink) do
  {food, drink}
end

In the example, we keep calling the eat function until we end up with a digestible result: a food and drink pair.
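
Trying it out in iex shows each clause funnelling into the final 2-arity exit clause:

iex> eat("biscuits")
{"biscuits", "water"}
iex> eat("rice")
{"rice", nil}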

Building Queries Based on Flags

When we want to build a query, the simplest way (as an end user) is to use flags. These allow the user to describe the type of data that they want returned.

For example, if I wanted to specify a filter, all I would need to do is add an option where_id: 5 to filter the query to all records where the id is 5.

Hence, ideally, we should interface with our function like so:

...
food = list_food(is_dry: true, origin: "us")

Using Enum.reduce/3 for a Naive Implementation

Utilizing Enum.reduce/3 is an extremely simple way to build up your query: it iterates over all your options and calls a function for each one.

def list_food(opts \\ []) do
  # we give some default options, in this case we limit the output to 5 by default
  opts = Enum.into(opts, %{limit: 5})
  base_query = from(f in Food)

  # the base query is the accumulator, and the function is called once for each option pair
  Enum.reduce(opts, base_query, &build_query/2)
  |> Repo.all()
end

# the accumulator is always the second argument
# the map key-value pairs are passed as tuples
def build_query({:limit, value}, query), do: limit(query, ^value)

# to filter by origin, a string column
def build_query({:origin, loc}, query) when is_binary(loc), do: where(query, [f], f.origin == ^loc)

# to filter by dryness
def build_query({:is_dry, true}, query), do: where(query, [f], f.type == "dry")

# this is for unrecognized options
def build_query(_, query), do: query

This implementation will work for simple situations, but what if we have an option that depends on another option? Or what if we need to access multiple options at once?

Ah, these issues are not so simple to solve when using Enum.reduce/3 as the backbone for our recursive query building.

What other methods can we use, then, for managing the recursive nature of our function? Why, the head-tail recursion technique, of course!
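
As a quick refresher before we dive in: head-tail recursion matches on [head | tail], processes the head, and recurses on the tail until only the empty list remains. A minimal sketch:

def sum([head | tail]), do: head + sum(tail)
def sum([]), do: 0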

Head-Tail Recursion Implementation

Let's re-implement the function, but this time, addressing our new list of concerns.

This function needs to:

  1. access all options at the same time
  2. control the option execution order

def list_food(opts \\ []) do
  # our defaults
  opts = Enum.into(opts, %{limit: 5})

  from(f in Food)
  |> build_query(opts)
  |> Repo.all()
end

# we expect the 2-arity function to always receive the options as a map.
# We then convert the map to a list of keys, and use it as the 3rd parameter
def build_query(query, opts), do: build_query(query, opts, Map.keys(opts))

# match for the :limit option
def build_query(q, %{limit: value} = opts, [:limit | t]) do
  limit(q, ^value)
  |> build_query(opts, t)
end

# match for the :origin option
def build_query(q, %{origin: loc} = opts, [:origin | t]) when is_binary(loc) do
  where(q, [f], f.origin == ^loc)
  |> build_query(opts, t)
end

# match for the :is_dry option
def build_query(q, %{is_dry: true} = opts, [:is_dry | t]) do
  where(q, [f], f.type == "dry")
  |> build_query(opts, t)
end

# control the option execution stack as needed
# to process this option, we need to have a join with brands first
def build_query(q, %{country: iso} = opts, [:country | t]) when is_binary(iso) do
  if has_named_binding?(q, :brands) do
    # we utilize the named binding to filter by the country's ISO abbreviation
    where(q, [f, brands: b], b.iso == ^iso)
    |> build_query(opts, t)
  else
    # add the :brands key to the front of the stack, then country again, then the remaining tail end.
    build_query(q, opts, [:brands, :country] ++ t)
  end
end

# we add this join on demand, as not every query needs it
def build_query(q, opts, [:brands | t]) do
  join(q, :left, [f], b in Brands, on: f.brand_id == b.id, as: :brands)
  |> build_query(opts, t)
end

# this is for unrecognized options, we skip over it
def build_query(q, opts, [_ | t]), do: build_query(q, opts, t)

# no more options to process, let's exit the function now
def build_query(q, _, []), do: q

Let's break this down:

  1. We call the function with a map of our options
  2. We convert these options into a list of keys, and pass it as the 3rd parameter of our recursive function.
  3. For each option handler, we match on our required option keys in the map (on the 2nd parameter), and on the head of the list. Essentially, we are popping off an option from the stack of option keys we need to process.
  4. If we have prerequisite conditions for processing an option, we can add to or reorder the option stack to ensure the prerequisite option is processed before the current option is processed (see the :country option handler and how it checks for the :brands join before adding a where clause to the query).
  5. Call the function again with the tail-end of the option stack, ensuring that each option in the stack is processed.
  6. Exit the recursive function when all options are processed and an empty list is remaining.

Although more verbose, this gives you the ultimate control over the flow of query building, as well as allowing you full access to all options at the same time.
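
As a usage sketch (with the illustrative options from above), a call that needs the prerequisite join would look like this:

food = list_food(country: "us", is_dry: true)

Here, the :country handler sees that the :brands named binding is missing, pushes :brands onto the option stack ahead of :country, and only then applies the where clause on the joined table.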

Other Benefits

Besides composability at outer layers of your application, you can also utilize this to build your subqueries easily.

For example, you can use this technique to build queries for other tables, then use the resultant query in a subquery.

A super simplified example:

...
food_query =
  from(f in Food)
  |> Foods.build_query(%{is_dry: true})

# to get all dry restaurant food
from(r in RestaurantFoods,
  join: f in subquery(food_query),
  on: r.food_id == f.id
)
|> Repo.all()
...

I usually use the above in situations where the subqueries are complex aggregations and benefit from the reusability.

The head-tail recursion method is very common in Elixir, and having it in your Ecto query-building toolbox will save you from rewriting many queries from scratch.