Today, we're talking about something that isn't really well explained in the Erlang documentation: resource locking.
Using A Hunky Poolboy To Manage Your Python ErlPort Processes in Elixir
So, you've got lots of Python code being executed from your Elixir app, and lots of ad-hoc calling of your Python code. On one hand, you want to dynamically scale to meet demand, but at the same time, you don't want to accidentally crash your application by starting too many Python processes. So... what's a frazzled developer gotta do to save himself some headache? The key problem that we are trying to solve here is pooling: the management of a limited set of resources kept at the ready for ad-hoc requests.
Why Erlang/Elixir? A Fanboy's Argument for Scalable, Distributed, Concurrent Programming While Staying Sane
I think that it isn't a well-kept secret that I'm a fanboy of Elixir. Almost half of my blog posts (at the point of writing) are related to Elixir and its ecosystem. But how exactly did I get to this point? And why? In this post, let's take a deep dive into what makes Erlang/Elixir so special and why the latest innovations out of this ecosystem keep me coming back for more.
The What
What is Erlang?
Erlang is a functional programming language that made its debut in 1986 in the telecoms industry, born of Ericsson's need for distributed, fault-tolerant, and highly available telecoms systems. As a result of this initial use case, the base suite of libraries that makes up the core of Erlang was named the Open Telecom Platform (OTP), and the name persists to this day, even though Erlang and its sibling languages are used well beyond the telecoms arena.
The BEAM
When we run Erlang code, it runs on the BEAM, the virtual machine that executes Erlang code compiled to bytecode. At runtime, the interpreter maps each bytecode instruction to executable C code, making instruction dispatch fast.
Functional Programming
Erlang code is functional, meaning there is no shared state, no mutability, and no side effects. Instead of creating classes to encapsulate our logic and state, we compose pure functions to achieve the same ends.
This design choice of preventing shared state allows Erlang processes to fail without affecting other processes: with no shared memory, one failure cannot cascade into another.
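As a quick sketch of what this immutability looks like in practice, "updating" a map in Elixir builds a new map and leaves the original untouched:

```elixir
# Data in Elixir is immutable: Map.put/3 returns a new map,
# and anything still holding the old one sees it unchanged.
original = %{count: 1}
updated = Map.put(original, :count, 2)

IO.inspect(original.count) # prints 1 (the original was never mutated)
IO.inspect(updated.count)  # prints 2
```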
As Joe Armstrong, one of the creators of Erlang states:
Object Oriented Programming, this is the art of hiding side effects. If you can do it successfully, you can write a nice program. If you do it unsuccessfully, you get in a sucking mess.
from 26 Years with Erlang, September 22, 2014, Joe Armstrong
It is safe to say that he definitely wasn't a fan of Object Oriented Programming. By preventing side effects, we essentially isolate our state and the ways in which we can update it to a new state.
Stateful Processes
Given that Erlang is functional, how exactly do we hold state in our programs? Through processes. A process holds a program's state in memory, and that state can only be updated by the process itself. Erlang's immutability means that each time we update the state, a new copy of the state is produced, so state transitions are always predictable.
Side note: processes are the way to encapsulate state updates in Erlang; the parallel construct in Haskell is the monad.
Concurrent Programming and the Actor Model
Each lightweight process holds its own state and runs its own code instructions. This intuitively means that we are able to write concurrent programs, with processes working in isolation from each other.
However, this gives rise to the problem of how these isolated processes will communicate. Obviously, sharing some sort of state between two processes is not feasible and would go against the ideas of immutability and no side effects. The answer to this problem is the Actor Model, which states that "everything is an actor". In this case, each process is considered an actor entity.
When an actor receives a message, it can concurrently do any of the following:
- send messages to other actors
- create more actors
- designate message handling behaviour
This means that processes communicate by sending messages to each other. Each process receives these messages in a mailbox and handles them before sending more messages, either to itself or to others.
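A minimal sketch of this actor style, using a hypothetical Counter module: the spawned process holds an integer as its state, and "updates" it by recursing with a new value whenever a message arrives in its mailbox:

```elixir
defmodule Counter do
  # Spawn a process whose initial state is `initial`.
  def start(initial \\ 0) do
    spawn(fn -> loop(initial) end)
  end

  # The state lives in the argument of this recursive receive loop:
  # each "update" is just a recursive call with a new value.
  defp loop(count) do
    receive do
      {:increment, by} ->
        loop(count + by)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start(10)
send(pid, {:increment, 5})
send(pid, {:get, self()})

receive do
  {:count, n} -> IO.puts(n) # prints 15
end
```

Note that the only way to read or change the counter is by message, which is exactly the isolation the Actor Model prescribes.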
The Why
Now that we know the mechanics of Erlang and what it offers, this section deals with why exactly we would reach for Erlang when building applications.
Fault Tolerance
The cornerstones of fault tolerant programming are to isolate errors to make sure that if one thing crashes it does not crash anything else. That's what the processes in an operating system do.
from Faults, Scaling and Erlang concurrency, September 24, 2014, Joe Armstrong
When writing our program, we want to ensure that if something fails, it does not bring down the entire system. Not only that, but we want it to be able to gracefully recover from unexpected failures automatically, by recovering to a known working state.
However, in order to properly recover from a failure, there must be something watching for that failure that can then perform error recovery for that crash. In Erlang, any process that crashes from an unhandled error will notify its monitoring processes, allowing us to restart the crashed process from a known working state if needed. This way, our applications can effectively run forever, because we can always recover from crashed processes.
This error recovery mechanism gave birth to Erlang's "let it crash" philosophy: we would rather allow a process to crash than program defensively. Note that crashing processes do not affect the stability of the BEAM itself. The underlying virtual machine simply trudges along, continuing to run and schedule all remaining processes for computation.
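A minimal sketch of this monitoring mechanism: spawn_monitor/1 starts a process and monitors it in one step, and when that process crashes, the monitoring process receives a :DOWN message instead of crashing itself.

```elixir
# Start a process that crashes immediately, and monitor it in one step.
{pid, ref} = spawn_monitor(fn -> raise "boom" end)

# The crash does not propagate to us; we simply receive a :DOWN message.
# Supervisors use this same mechanism to restart children from a known state.
receive do
  {:DOWN, ^ref, :process, ^pid, _reason} ->
    IO.puts("child crashed, but we are still running")
end
```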
Scalability
Process Scaling with a Single Node
At the micro level, we can scale computations easily by spawning multiple processes to run code concurrently. As there is no practical limit to how many processes can be spawned, we can shift from a sequential mindset (where incoming requests/messages are handled one after the other) to a concurrent mindset (where a new process is spawned for each incoming request).
This means that to handle an increase in volume, all we need to do is create more processes to match that volume. The Erlang runtime handles the tricky part of scheduling tasks for computation as efficiently as possible.
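This one-process-per-request idea can be sketched with Elixir's Task module, here using a trivial squaring function as a stand-in for real request-handling work:

```elixir
# Fan out one lightweight process per "request" and collect the results.
# Squaring a number stands in for real request handling here.
results =
  1..100
  |> Enum.map(fn n -> Task.async(fn -> n * n end) end)
  |> Enum.map(&Task.await/1)

IO.inspect(Enum.take(results, 3)) # prints [1, 4, 9]
```

All one hundred tasks run concurrently; the BEAM scheduler spreads them across the available CPU cores for us.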
Node Scaling with Distributed Erlang
Of course, a single machine is usually insufficient for humongous workloads, and we will often need to scale horizontally in order to effectively tackle monumental volumes of requests. In such a case, the answer is still exactly the same: for each request received, we spawn a new process to handle it. However, we spawn that process on a separate machine that currently has less computational load. This is achieved through distributed Erlang, where each Erlang Run-Time System is capable of communicating with other nodes. As each node and each process is given a unique identifier, we can spawn processes and send messages across nodes seamlessly, distributing the workload across multiple machines.
As alluded to in the section on fault tolerance, we can use these process-monitoring capabilities to handle the situation where an Erlang node dies (whether through OS-level failure or otherwise) and reactively "move" its processes by spawning replacements on a different node. This functionality is provided by community libraries like swarm and horde, and it improves the fault tolerance of the Erlang cluster at the multi-node level instead of the multi-process level described above.
Other Cool Erlang Features
Besides the above selling points, there are other features of note:
- Hot code replacement, where old code is replaced with newer code at runtime with zero downtime. This contributes to the Erlang VM's high-availability capabilities.
- Ports, which allow the Erlang VM to communicate with external applications and other programming languages.
- Native Implemented Functions (NIFs), which allow for performance optimizations in areas for which Erlang is not optimized. One great example is the community library rustler, which exposes memory-safe Rust functions to the BEAM; Discord recently used this library to scale to 11 million concurrent users.
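As a small sketch of a port in action (assuming a POSIX echo binary is on the PATH), we can spawn an external OS process and receive its output as an ordinary Erlang message:

```elixir
# Open a port to an external OS process (the standard `echo` binary,
# assumed to be available on the PATH) and receive its stdout as a message.
port = Port.open({:spawn, "echo hello"}, [:binary])

receive do
  {^port, {:data, data}} ->
    # `echo` appends a trailing newline, so trim it before printing.
    IO.puts(String.trim(data)) # prints "hello"
after
  1_000 -> IO.puts("no output received")
end
```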
Relationship with Kubernetes
Kubernetes is a technology that allows for the provisioning and orchestration of containerized applications.
Although some may think that using an Erlang cluster is mutually exclusive with a Kubernetes cluster, that is far from the truth. This is a large topic of exploration beyond the scope of this post, and it has already been covered by the great José Valim himself on the Plataformatec blog.
The key points are:
- Kubernetes provides self-healing at the node level, while the Erlang VM provides self-healing at the application level.
- You can use Kubernetes' deployment mechanisms instead of Erlang's hot code swapping.
- We can leverage Kubernetes' registry to provide auto-clustering capabilities to our Erlang cluster, a feature provided by the community library libcluster.
The Elixir Renaissance
Okay, so we have established that Erlang, the Erlang VM, and distributed Erlang are all great pieces of tech, but we also need to remember that the language is extremely old. Like, almost 40 years old (38 years at time of writing). This means that much of the syntax, paradigms, and language design has not changed much over the years. Then, in 2011, the Elixir programming language was created to build upon the solid foundation that Erlang provides, compiling to BEAM bytecode and interoperating with Erlang as well. This means that Elixir is capable of using ALL of the Erlang ecosystem and its features.
Not only that, but the language draws much of its syntax inspiration from Ruby, resulting in much more readable and simple syntax compared to Erlang's.
Elixir provides many improvements, such as meta-programming capabilities through abstract syntax tree manipulation (resulting in zero-cost code generation and macros), and built-in tooling for tests, documentation, releases, formatting, debugging, and lots more.
These meta-programming capabilities have allowed many libraries to implement their own domain-specific languages within the language itself through the use of macros.
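As a small sketch of such a macro, here is a hypothetical defdouble macro (the name and behaviour are invented for illustration) that generates a function definition at compile time:

```elixir
defmodule MiniDsl do
  # A hypothetical macro: `defdouble :price` expands at compile time
  # into `def price(x), do: x * 2`.
  defmacro defdouble(name) do
    quote do
      def unquote(name)(x), do: x * 2
    end
  end
end

defmodule Pricing do
  import MiniDsl

  # The macro writes the `price/1` function for us.
  defdouble :price
end

IO.puts(Pricing.price(21)) # prints 42
```

Libraries like Ecto and Phoenix use this same quote/unquote machinery, just on a much larger scale.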
Not only that, but because the language builds upon the shoulders of giants, Elixir is actually considered feature complete as of version 1.9. This is a huge boon for backwards compatibility, code maintenance, and library churn.
But the language and its awesome tooling isn't the only thing that is drawing people to Elixir...
IoT Clustering and Communication
The Nerves Project allows for the deployment of embedded software to Internet-of-Things devices, giving these devices all the benefits of Erlang clustering and inter-cluster communication.
Phoenix LiveView
Fresh out of the oven, this latest cutting-edge feature of the Phoenix web framework for Elixir allows for server-controlled DOM manipulation. In essence, it performs HTML diffing to send only the minimum required DOM updates over WebSockets, reducing the need for client-side front-end code. Client events are also sent over the wire, resulting in seamless navigation that simulates a Single Page Application.
Internally, each active request creates an Erlang process, which allows us to hold and update state based on the user's interactions (much like Redux does on front-end-only clients). This hybrid approach allows for more advanced server abilities and vastly reduces the required client-side state-management JavaScript.
Furthermore, the API is completely interoperable with modern JavaScript libraries, which means you don't have to throw away all of your React/Vue/front-end-framework-du-jour code; you can interact with the server entirely through JavaScript as well.
This idea is not new, and there are many competing libraries and frameworks, such as Hotwire by Basecamp, intercooler.js and its successor htmx, and Laravel's Livewire for PHP. However, Phoenix LiveView leverages and builds upon many existing Phoenix concepts and capabilities, with a focus on developer productivity and speed of shipping.
Numerical Elixir
This one is hot off the press: Numerical Elixir brings multi-dimensional arrays (tensors) to Elixir, providing a gateway for future data-oriented libraries in Elixir and making deep learning within the Elixir ecosystem possible. Notable libraries are Torchx, a LibTorch client, and EXLA, a client for Google's XLA (Accelerated Linear Algebra).
How to Adopt Erlang/Elixir?
If this article has swayed you in any way, shape, or form, you should understand that you don't have to start from scratch, especially if you have lots of code in other languages. What Elixir excels at is acting as the glue between languages. This is accomplished through Ports and community libraries such as:
- ErlPort, which provides an interface for calling Python and Ruby code (and there are forks that have achieved the calling of Java code)
- Pyrlang, which is a project that implements the Erlang protocol in Python and allows Python code to interact with an Erlang cluster.
Not only that, but the Elixir language guide is extremely clear and well written, which makes picking up the language a breeze.
I could write more and more and more about Elixir, but I'll save that for other blog posts. Otherwise, this article would transform into a 10k word monstrosity.
Using Embedded Schemas for Easy Peasy Ecto Preloading
Structs in Elixir are great: they let you define a data structure and do all sorts of nifty stuff like default values. But what if you want to use a particular struct inside of an Ecto query and then preload associations based on a given field?
An Example Problem
We have a struct called Deal, which is built dynamically from an SQL query. This means that there is no table associated with our Deal structs.
We initially define it as so:
defmodule MyApp.Deal do
defstruct total_price: nil,
product_id: nil,
product: nil
end
The SQL query then populates the product_id field with the id of a product that is currently on sale, as so:
from(p in Products,
...
select: %Deal{total_price: sum(p.price), product_id: p.id}
)
|> Repo.all()
If we were to run this query, we would get back Deal structs, for example:
# for product with id=5
[%Deal{total_price: 123.20, product_id: 5}]
All smooth sailing so far... or is it?
The Preload
What if we wanted to load our product information onto the struct? Could we perhaps use Repo.preload/3 to help?
from(p in Products,
...
select: %Deal{total_price: sum(p.price), product_id: p.id}
)
|> Repo.all()
|> Repo.preload([:product])
Trying out this updated query gives us this error:
function MyApp.Deal.__schema__/2 is undefined or private
D'oh! Seems like our Deal struct does not have the schema metadata that is used by Repo.preload/3. It seems like we'll have to ditch the struct and implement a full schema backed by a table...
The Solution: Embedded Schemas To The Rescue
The post's title kind of gave it away, but we're going to use Ecto's embedded schemas to declare our schema without actually having a table backing it. This allows us to declare associations with other tables, and we can then use Repo.preload/3 to load these associations automatically for us! I know, right?
Refactoring our Deal struct into an embedded schema gives us this:
defmodule MyApp.Deal do
alias MyApp.Product
use Ecto.Schema
embedded_schema do
field(:total_price, :float)
belongs_to(:product, Product)
end
end
Note that we don't have to specify both the product_id and product fields, as they are automatically created by the Ecto.Schema.belongs_to/2 macro.
Now, preloading our product information works perfectly fine!
Using Common Table Expressions for Temporary Tables in Elixir (Ecto)
Occasionally, when you've got lots of business logic defined, you may need to perform some heavy calculations outside of your main SQL query and then join the calculated result set back into your final query to perform some final statistical calculation. Usually, the calculated result set would be in the form of a list of maps or a list of tuples.
Thankfully, we can use Ecto.Query.with_cte/3 to help with this. With the help of the PostgreSQL function unnest, we can interpolate arrays into the query while also defining the data type of each temporary column.
There are 3 main steps with this technique:
- Prepare the data into separate lists
- Create the CTE query
- Join on the CTE child query as a subquery in the main query
Step 1: Prepare the Data
We need to get the data into a format that we can easily interpolate as lists. We also need to convert the values to data types that PostgreSQL can understand. For example:
iex> data = [test: 1, testing: 2] |> Enum.unzip()
iex> data
{[:test, :testing], [1, 2]}
iex> {string_col, int_col} = data
iex> string_col = Enum.map(string_col, &Atom.to_string/1)
iex> string_col
["test", "testing"]
iex> int_col
[1, 2]
Step 2: Creating the Common Table Expression Ecto Query
Creating the query requires the use of fragments, as well as specifying the data type of each interpolated column. We also need to give the CTE a name. Note that the name must be a compile-time string literal, as noted in the docs. This means that dynamic table names are not possible.
scores_query =
  "names"
  |> with_cte("names",
    as: fragment(
      """
      select name, val from unnest(?::text[], ?::integer[]) t (name, val)
      """,
      ^string_col,
      ^int_col
    )
  )
  |> select([n], %{name: n.name, val: n.val})
The fragment calls the unnest SQL array function, creating two columns that we name name and val. Within the fragment, we also select the columns that we want.
Thereafter, we use the Ecto.Query.select/2 function to make this query understandable to Ecto. This helps when we utilize this query in dynamic query compositions.
Step 3: Joining on the CTE
Since we have created our CTE query in something that Ecto can understand, we can finally use it in our main query.
from(s in Student,
join: sc in subquery(scores_query),
on: sc.name == s.name,
select: {s.name, sc.val}
)
|> Repo.all()
In this scenario, the name column in the students table is the primary key, and we join on it to select the respective score of each student.
Hope this helps!