Adding Python to your Elixir web application is a tantalizing proposition. Elixir isn't that great at crunching numbers, but by leveraging the data science powers of Python, we can add some machine learning magic to our applications.
There are two main ways to do this:
- Using Erlang's port protocol (and Elixir's equivalent) to interface with an external Python process
- Spinning up an Erlang node implemented in Python (a la the Pyrlang project)
In this guide, we'll focus on the first method. Before embarking on a project, always check for prior art. In our case, we have the Erlport project, which attempts to provide a nice interface for spinning up Python and Ruby projects. However, if we check the project's source code on GitHub, we can see that it is on life support and has not had many pull requests merged for quite a long time (at least 4 years). That doesn't totally discount this project, though. Checking out the fork network, we can see that many individuals are still using it, with some even implementing other languages like Java.
Digging a little deeper, we also discover that some community members have taken it upon themselves to create a separate erlport organization (mentioned here) to merge in the pull requests.
Ok, we've got that settled. We can still use erlport as our solution for integrating Python. However, erlport is still not very Elixir-friendly. A search on hex.pm reveals the Export project, which provides a nice wrapper around erlport. Note that installing erlport by hand is quite a hassle (as it requires building binaries using make), and the Export project nicely wraps the dependency so that we don't have to build it from source manually. It also includes a macro for syntactic sugar when calling Python code from Elixir.
I will be focusing on calling Python code synchronously. For long-running tasks, you are actually able to call code asynchronously, but that is definitely more advanced and involves message passing between the Python process and the Elixir process, hence we will leave that for another day.
Let's go!
(Optional) Create a Test Suite
We must obey the testing goat. Create a unit test file in your test directory like so, to specify the behaviour we expect from our PyWorker GenServer implementation.
defmodule MyApp.PyWorkerTest do
  use ExUnit.Case, async: true
  alias MyApp.PyWorker

  test "starts up a python process" do
    assert {:ok, %{py: pid}} = PyWorker.init(%{})
    assert pid
  end

  test "duplicate/1 performs duplication of text" do
    # Python code always returns charlists instead of strings
    assert 'texttext' = PyWorker.duplicate("text")
  end
end
This defines two tests. First, we test that a Python process is started up correctly when the PyWorker is initialized. Second, we test that the duplicate/1 function correctly duplicates the text.
Running our tests now with mix test should cause both tests to fail.
Create a GenServer to Wrap the Python Process
The way erlport (and consequently Export) interfaces with the Python interpreter is through a Python process. This Python process is started with Export.Python.start/1 or Export.Python.start_link/1, which returns an Elixir process identifier (PID). This PID needs to be stored somewhere, hence we will use a GenServer to hold it as state.
Create the following file in lib/py_worker.ex:
# lib/py_worker.ex
defmodule MyApp.PyWorker do
  use GenServer
  use Export.Python

  # client

  # optional, omit if not adding this to a supervision tree
  def start_link(_) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def duplicate(text) do
    GenServer.call(__MODULE__, {:duplicate, text})
  end

  # server

  def init(state) do
    # resolve priv/python relative to the application, so it also works in a release
    priv_path = Path.join(:code.priv_dir(:my_app), "python")
    {:ok, py} = Python.start_link(python_path: priv_path)
    {:ok, Map.put(state, :py, py)}
  end

  def handle_call({:duplicate, text}, _from, %{py: py} = state) do
    # calls duplicate(text) in priv/python/my_module.py
    result = Python.call(py, "my_module", "duplicate", [text])
    {:reply, result, state}
  end

  def terminate(_reason, %{py: py}) do
    # stop the Python process so that it is not orphaned
    Python.stop(py)
    :ok
  end
end
Wow, that's a lot of code. Let's go through this line by line:
- We will use the Export.Python macro. This will alias and require the Export.Python module.
- The PyWorker.start_link/1 function will be called when we add this GenServer to a supervision tree. We also register the name of the process as the module name, so that we can reference it in the future without constantly passing process IDs. This assumes that you only have one such GenServer process in your supervision tree. In the unlikely event that you do not require this process to be supervised, you can omit this function.
- We define a duplicate/1 function that makes a synchronous call to the process, which will be handled by the handle_call/3 callback defined further below.
- The init callback initializes the Python process and stores its process ID under the :py key in the GenServer's state. Note that we have to use :code.priv_dir/1 in order to retrieve the private directory in our release. We cannot simply use Path.expand("priv/python"), as this would not be available in our application's release distribution. The :python_path option is the path that Python will search for modules. We will create this priv/python directory later on.
  - Also note that we use Export.Python.start_link/1, which will link the created Python process to this current PyWorker process. This is desirable, as it ensures that if either process crashes, the other goes down with it, and the supervisor can then restart the PyWorker along with a fresh Python process. As the saying goes, let it crash.
- Implement the handle_call/3 callback, which is called by the duplicate/1 function declared above. We pattern match on the state to obtain the Python process ID, and then use it to call the duplicate function in the my_module Python module. The list is passed to this function call, with the text variable being placed as the first positional argument.
- We implement the terminate/2 callback, to gracefully stop the Python process (so that we don't have orphaned processes running around our machine).
Implement the Python Interface Module
We will create an interface module that will declare or import the relevant python functions that we want. As you may have inferred from above, you can have multiple namespace modules for different Python contexts. That is up to you to decide, and there are no hard rules for this.
Create the following file in your priv directory:
# priv/python/my_module.py
def duplicate(text):
    return text + text
We define a simple function to duplicate our text through string concatenation. Of course, you can do more complex stuff like text cleaning, etc. I suggest using a simple function for wiring things up first.
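Once the wiring works, one thing worth guarding against is the type of the incoming argument. The following is a minimal sketch, assuming that erlport delivers an Elixir binary to Python 3 as bytes and an Elixir charlist as a list of integers (check the erlport data type mapping for the version you use). The _to_text helper is a hypothetical addition, not part of the module above:

```python
# priv/python/my_module.py (defensive variant; _to_text is a hypothetical helper)

def _to_text(value):
    """Normalize an incoming value to a Python str."""
    if isinstance(value, bytes):
        # Assumption: Elixir binaries arrive as bytes under Python 3.
        return value.decode("utf-8")
    if isinstance(value, list):
        # Assumption: Elixir charlists arrive as a list of code points.
        return "".join(chr(code) for code in value)
    return value


def duplicate(text):
    """Return the text concatenated with itself, always as a str."""
    text = _to_text(text)
    return text + text
```

Returning a str in every case keeps the result consistent, whichever form the caller passes in.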
Checking Our Tests
Now, since our Python code is implemented, let's try running the tests again with mix test. You should have two passing tests now!
Installing Python Dependencies
In production use, you will likely utilize some python dependencies and you'll need to include them in your distribution.
Simply ensure that you have your Python virtual environment activated before running your test suites or production code. When deploying your distribution, ensure that you have your dependencies installed through requirements.txt beforehand.
If you need to customize the Python executable path, use the :python option for Export.Python.
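Because a missing package only surfaces when the Python code first imports it, it can help to check up front which interpreter and packages the worker actually sees. Here is a minimal sketch using only the standard library. The names interpreter_path and missing_dependencies are hypothetical helpers that you could place in my_module.py and call once when the worker starts:

```python
import importlib.util
import sys


def interpreter_path():
    """Return the path of the Python executable running this code."""
    return sys.executable


def missing_dependencies(names):
    """Return the top-level package names that cannot be imported.

    The names are expected to be Python str values, e.g. ["numpy", "pandas"].
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If interpreter_path does not point into your virtual environment, the environment was probably not active when the Elixir application was started.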
Wrapping Up
You should now have working python code inside your Elixir application!
Some things to note:
Results returned by the Python process are charlists. This includes map keys. For example:
iex> duplicate_return_as_map("text")
%{'result' => 'texttext'}
I recommend manually converting map keys into atom keys using a for/into/do comprehension, to standardize your interfaces.
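For reference, the Python side of such a call could look like the sketch below. Only the function name comes from the example above; the body is an assumption for illustration, including the guess that an Elixir binary arrives as bytes under Python 3:

```python
def duplicate_return_as_map(text):
    """Return the duplicated text wrapped in a dict with a single key."""
    if isinstance(text, bytes):
        # Assumption: Elixir binaries arrive as bytes under Python 3.
        text = text.decode("utf-8")
    # Both the key and the value are Python str objects here.
    return {"result": text + text}
```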
You can send async messages between each process. This can be done through the erlport Python library, which implements the message passing functionality for the Python process. This is ideal for long-running work, such as running data through a cleaning pipeline, or for performing model training. You can listen for messages from the Python process by implementing the handle_info/2 callback.
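A rough sketch of the Python side of such a flow, assuming the cast function from the erlport Python library (erlport.erlang.cast) is used to send the message back. The function name run_pipeline, the message shape, and the injectable send parameter (there so the function can be exercised without a running Elixir node) are all hypothetical:

```python
def run_pipeline(reply_to, rows, send=None):
    """Clean the rows, then notify the Elixir process identified by reply_to."""
    if send is None:
        # Imported lazily: only available when erlport's Python library is on the path.
        from erlport.erlang import cast as send

    # Stand-in for real long-running work such as a cleaning pipeline.
    cleaned = [row.strip().lower() for row in rows]

    # The message shape is an assumption; match on whatever arrives in handle_info/2.
    send(reply_to, ("pipeline_done", cleaned))
    return len(cleaned)
```

On the Elixir side, the message would then be picked up by a handle_info/2 clause in the PyWorker.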
For large BEAM clusters, opt for Pyrlang. As mentioned above, Pyrlang is an Erlang node implemented in Python. This project is currently being actively developed, unlike erlport, which is on life support. However, depending on your application architecture, you may not need or want to manage multiple nodes. In that case, opt for the more lightweight erlport (Export). The project may very well pick up steam again as more developers add different languages.
That's all for now, hope this helps!