
6 posts tagged with "erlang"


· 3 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day, with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

They offer some insight into what one can expect when working on high-availability software, with real code snippets and PRs too. Enjoy! 😊

When working with distributed Erlang applications at Logflare/Supabase, we encountered an interesting issue where the :global_name_server would become overwhelmed with messages, leading to a boot loop situation. This issue is particularly relevant when dealing with the prevent_overlapping_partitions feature.

Understanding the Boot Loop

The boot loop behaviour comes about when the global name server becomes overwhelmed with messages in scenarios involving network partitions, where many nodes are connecting or disconnecting simultaneously. This can create a cascade effect where:

  1. The global name server receives too many messages
  2. Message processing delays lead to timeouts
  3. Node reconnection attempts trigger more messages
  4. GOTO 1

This behaviour is closely related to OTP issue #9117, and within the issue, I highlighted several potential factors that could be causing it despite the throw fix that Rickard Green had implemented.

We also observed that this behaviour occurs even when not using :global at all. For Logflare, we had migrated our distributed name registration workloads to use the wonderful :syn library. Hence, this bug is more related to the core syncing protocol of :global.

The throw in restart_connect()

When the :global server attempts to connect to a new node, it takes a lock to sync the registered names between the nodes. As part of the syncing protocol, the :global server checks whether the node is already attempting a sync (indicated by the pending state within the server). If a sync is already in progress, it instead cancels the connection attempt and retries the connection.

Without the throw, this would result in a deadlock, where the :global server waits forever for the node to complete the sync.

prevent_overlapping_partitions to the rescue

As documented in :global:

As of OTP 25, global will by default prevent overlapping partitions due to network issues by actively disconnecting from nodes that reports that they have lost connections to other nodes. This will cause fully connected partitions to form instead of leaving the network in a state with overlapping partitions.

This means that :global by default will actively disconnect from nodes that report that they have lost connections to other nodes. For small clusters, this is generally a good feature to have so that the cluster can quickly recover from network issues. However, for large clusters, this can cause a lot of unnecessary disconnections and can lead to the above boot loop issue.

As of the time of writing, disabling the prevent_overlapping_partitions feature has allowed our cluster to overcome this boot loop issue by preventing a flood of disconnection messages across the cluster. However, this flag needs to be used with caution when using the :global server for name registration, as it may result in inconsistencies if there are overlapping partitions and multiple instances of the same name are registered. Application code needs to be able to handle this case.
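For reference, here's a minimal sketch of how the flag can be disabled in an Erlang/Elixir release (the exact vm.args file and location depend on your release setup):

# rel/vm.args.eex (or equivalent): disable overlapping-partition prevention
-kernel prevent_overlapping_partitions false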

Monitoring strategies

When dealing with large clusters, I would recommend implementing monitoring for:

  • global name server message queue length -- the main indicator of the issue
  • memory usage of the global name server -- a secondary indicator of long message queues

Tracing the :global server callbacks at runtime is also a good way to debug the issue, though it is usually not easy as the time window before the node goes out-of-memory is usually very short.
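As a rough illustration, here's a minimal Elixir sketch of how these two metrics could be polled and emitted. MyApp.GlobalMonitor and the telemetry event name are hypothetical, and :telemetry is assumed to be available as a dependency:

defmodule MyApp.GlobalMonitor do
  # Hypothetical helper: sample the :global name server's message queue
  # length and memory usage, and emit them as a telemetry event.
  def sample do
    with pid when is_pid(pid) <- Process.whereis(:global_name_server),
         [message_queue_len: len, memory: memory] <-
           Process.info(pid, [:message_queue_len, :memory]) do
      :telemetry.execute(
        [:my_app, :global_name_server, :sample],
        %{message_queue_len: len, memory: memory},
        %{node: node()}
      )
    else
      _ -> :ok
    end
  end
end

Calling MyApp.GlobalMonitor.sample/0 periodically (e.g. from a scheduled task or a telemetry poller) gives an early warning before the message queue grows unbounded.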

I explain this in more detail in my post on understanding Erlang's :global prevent_overlapping_partitions Option.

· 5 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day, with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

They offer some insight into what one can expect when working on high-availability software, with real code snippets and PRs too. Enjoy! 😊

The :syn library provides a distributed process registry for Elixir applications, offering an alternative to :global for name registration across clusters. It allows you to define custom event handler callbacks to handle process conflicts and registration scenarios.

The out-of-the-box features will largely suit the majority of use cases, but there are a few important behaviours to consider:

  1. :syn will always default to keeping the most recently registered process. This may result in older state being lost due to the conflict resolution.
  2. :syn by default has millisecond precision when comparing process recency. In clustered setups with a high number of nodes, this may result in conflicts being resolved incorrectly without a deterministic resolution strategy.
  3. The moment a custom event handler callback is implemented, it overrides the default behaviour of :syn, and all process conflicts MUST be resolved and handled within the callback. :syn will not perform any cleanup of processes post-callback, hence it is very important to terminate all unwanted processes within the callback to prevent memory leaks or other unexpected behaviour.

Understanding Syn Event Handlers

When multiple processes attempt to register with the same name across a distributed cluster, :syn provides custom event handlers to resolve these conflicts. These handlers are useful for process migration between nodes, network partition recovery, supervisor restart scenarios, and cases where high-precision timestamp-based conflict resolution is needed.

Let's explore a few scenarios where custom event handlers can be useful.

Killing Processes and Supervisors

In scenarios where you want to ensure only one process exists for a given name, you might want to terminate conflicting processes or their supervisors.

defmodule MyApp.SynEventHandler do
  @behaviour :syn_event_handler

  def on_process_registered(_scope, _name, _pid, _meta, _reason) do
    # Process successfully registered
    :ok
  end

  def on_process_unregistered(_scope, _name, _pid, _meta, _reason) do
    # Process unregistered
    :ok
  end

  # :syn invokes this on conflicts; the callback must return the pid to keep
  def resolve_registry_conflict(_scope, _name, {pid1, meta1, _time1}, {pid2, meta2, _time2}) do
    # Kill the losing process and its supervisor
    case compare_registration_priority(meta1, meta2) do
      :keep_first ->
        terminate_process_and_supervisor(pid2)
        pid1

      :keep_second ->
        terminate_process_and_supervisor(pid1)
        pid2
    end
  end

  defp terminate_process_and_supervisor(pid) do
    # Find and terminate the supervisor, falling back to stopping the process itself
    case find_supervisor(pid) do
      {:ok, supervisor_pid} ->
        Supervisor.terminate_child(supervisor_pid, pid)

      :error ->
        try_to_stop_process(pid)
    end
  end

  # Tries to stop a process gracefully. If it fails, it sends a kill signal to the process.
  @spec try_to_stop_process(pid(), atom(), atom()) :: :ok | :noop
  defp try_to_stop_process(pid, signal \\ :shutdown, force_signal \\ :kill) do
    GenServer.stop(pid, signal, 5_000)
    :ok
  rescue
    _ ->
      Process.exit(pid, force_signal)
      :ok
  catch
    :exit, _ ->
      :noop
  end

  defp find_supervisor(_pid) do
    # Implementation to find the supervisor of a given process
    # This could involve walking the supervision tree
    :error
  end

  defp compare_registration_priority(_meta1, _meta2) do
    # Custom logic to determine which process should be kept
    # Could be based on node priority, timestamps, etc.
    :keep_first
  end
end

Keeping the Original Process

Sometimes you want to preserve the original process and reject new registration attempts:

defmodule MyApp.KeepOriginalHandler do
  @behaviour :syn_event_handler

  require Logger

  def resolve_registry_conflict(_scope, name, {pid1, _meta1, timestamp1}, {pid2, _meta2, timestamp2}) do
    # Always keep the first registered process
    # syn's built-in timestamps have millisecond precision
    if timestamp1 < timestamp2 do
      Logger.info("Keeping original process #{inspect(pid1)} for #{inspect(name)}")
      pid1
    else
      Logger.info("Keeping original process #{inspect(pid2)} for #{inspect(name)}")
      pid2
    end
  end
end

However, what if we somehow have a situation where the timestamps are exactly the same (no matter how unlikely it is)? We can use nanosecond timestamps stored in process metadata to resolve the conflict with higher precision.

Nanosecond Timestamp Resolution

First, register processes with nanosecond timestamp metadata:

defmodule MyApp.MyProcess do
  use GenServer

  @doc """
  Registers the process with nanosecond timestamp metadata for high-precision conflict resolution.
  """
  def start_link(some_arg) do
    nanosecond_timestamp = System.os_time(:nanosecond)

    GenServer.start_link(__MODULE__, some_arg,
      name: {:via, :syn, {:my_scope, __MODULE__, %{timestamp: nanosecond_timestamp}}}
    )
  end

  @impl true
  def init(some_arg), do: {:ok, some_arg}
end

Then implement the event handler with fallback to syn's built-in millisecond timestamp when metadata isn't available:

defmodule MyApp.SynEventHandler do
  @moduledoc """
  Event handler for syn. Always keeps the oldest process.
  """
  @behaviour :syn_event_handler

  require Logger

  @impl true
  def resolve_registry_conflict(_scope, _name, pid_meta1, pid_meta2) do
    {original, to_stop} = keep_original(pid_meta1, pid_meta2)

    # Only stop the process if we're the local node responsible for it
    if node() == node(to_stop) do
      try_to_stop_process(to_stop, :shutdown, :kill)
    end

    original
  end

  # Use nanosecond-precision timestamp from metadata when available
  defp keep_original(
         {pid1, %{timestamp: timestamp1}, _syn_timestamp1},
         {pid2, %{timestamp: timestamp2}, _syn_timestamp2}
       ) do
    if timestamp1 < timestamp2, do: {pid1, pid2}, else: {pid2, pid1}
  end

  # Fallback to syn's built-in millisecond timestamp when metadata isn't present
  defp keep_original(
         {pid1, _meta1, syn_timestamp1},
         {pid2, _meta2, syn_timestamp2}
       ) do
    if syn_timestamp1 < syn_timestamp2, do: {pid1, pid2}, else: {pid2, pid1}
  end

  defp try_to_stop_process(pid, signal, force_signal) do
    GenServer.stop(pid, signal, 5_000)
  rescue
    _ -> Process.exit(pid, force_signal)
  catch
    :exit, _ -> :noop
  end
end

Configuration and Usage of a Custom Event Handler

Configure your syn event handler in your application:

# In your application.ex
def start(_type, _args) do
  # :syn starts as a dependency application; configure its scopes and
  # event handler before starting your own supervision tree
  :syn.add_node_to_scopes([:my_scope])
  :syn.set_event_handler(MyApp.SynEventHandler)

  children = [
    # Other children...
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end

Register processes with metadata for conflict resolution:

# Register with timestamp metadata.
# Use OS/system time here rather than monotonic time, since monotonic
# clocks are node-local and cannot be compared across nodes.
:syn.register(:my_scope, "unique_name", self(), %{
  registered_at: System.os_time(:millisecond),
  nano_timestamp: System.os_time(:nanosecond),
  node: Node.self(),
  priority: 1
})

Best Practices

  1. Always include timestamps in metadata for conflict resolution
  2. Handle supervisor relationships carefully when terminating processes
  3. Use OS/system time (not monotonic time) for ordering across nodes, as monotonic clocks are node-local
  4. Log conflict resolutions for debugging and monitoring
  5. Test partition scenarios thoroughly

Monitoring and Observability

Monitor syn registry conflicts and resolutions:

# Add telemetry events in your event handler
def resolve_registry_conflict(scope, name, proc1, proc2) do
  :telemetry.execute(
    [:syn, :conflict, :resolved],
    %{count: 1},
    %{scope: scope, name: name}
  )

  # ... conflict resolution logic, returning the pid to keep
end

The :syn library's event handler system enables you to manage distributed process registration conflicts, resulting in robust and predictable behavior in complex distributed systems.

· 4 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day, with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

They offer some insight into what one can expect when working on high-availability software, with real code snippets and PRs too. Enjoy! 😊

The prevent_overlapping_partitions option in Erlang is a configuration parameter that affects how the :global module handles network partitions in distributed Erlang systems.

Introduced in Erlang/OTP 25, prevent_overlapping_partitions is a kernel parameter that enforces strict network partition prevention in distributed Erlang systems. When enabled (which is the default in OTP 25+), it ensures that the network remains fully connected and actively breaks up overlapping partitions that could otherwise leave :global in an inconsistent state.

Since :global replicates its name/lock tables on every node and tries to keep them consistent, it will try to maintain a fully connected network mesh so updates propagate everywhere. However, an overlapping partition results in a partially connected networkβ€”for example, A is connected to B, and B is connected to C, but A and C are unable to communicate directly. In this scenario, B acts as an overlap between the two "sides", since it can reach both, while A and C cannot see each other at all. When different subsets exchange updates inconsistently, this can make :global's internal state inconsistent, and that inconsistency can remain even after the cluster becomes fully connected again.
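Since prevent_overlapping_partitions is a Kernel parameter, you can inspect what a node currently has configured at runtime. A quick sketch (whether an unset default shows up explicitly in the application environment can vary by OTP version):

# Inspect the kernel parameter on the current node
Application.get_env(:kernel, :prevent_overlapping_partitions)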

The Official Warning

The Erlang documentation provides a strong warning about this feature that's worth examining in detail:

Prevention of overlapping partitions can be disabled using the prevent_overlapping_partitions Kernel parameter, making global behave like it used to do. This is, however, problematic for all applications expecting a fully connected network to be provided, such as for example mnesia, but also for global itself. A network of overlapping partitions might cause the internal state of global to become inconsistent. Such an inconsistency can remain even after such partitions have been brought together to form a fully connected network again. The effect on other applications that expects that a fully connected network is maintained may vary, but they might misbehave in very subtle hard to detect ways during such a partitioning.

Erlang :global documentation

Disabling this feature can lead to subtle and hard-to-detect issues, particularly in applications that expect a fully connected network.

Real-world Examples: CouchDB and Logflare

CouchDB

Interestingly, CouchDB has chosen to disable this feature. In a recent commit, they explicitly turned off prevent_overlapping_partitions. Their reasoning is pragmatic:

  1. CouchDB doesn't use the :global module
  2. They have their own auto-connection module
  3. They wanted to avoid potential increased coordination and message overhead during disconnections

Their commit message explains:

# This will toggle to true in Erlang 25+. However since we don't use global
# any longer, and have our own auto-connection module, we can keep the
# existing global behavior to avoid surprises.

Logflare

For Logflare's situation, we were experiencing instances going out-of-memory with the :global name server going into boot loops, due to a flood of disconnection messages from the syncing protocol. This would affect certain nodes first, then slowly spread like an infection as more and more nodes got impacted by the boot loop behaviour. I dive deeper into this in this post.

In the end, we were able to fix the issue by disabling prevent_overlapping_partitions and migrating all :global usage over to :syn, an alternative process registry for Erlang. Syn is used across the Supabase stack, in Realtime and now Analytics (Logflare), so it has quite a proven track record.

Conclusion

On OTP 25+, keep prevent_overlapping_partitions enabled. However, if you have a large cluster of over a hundred nodes and no reliance on :global for name registration, you can (and probably should) disable it to reduce the bottleneck on the :global name server.

· 2 min read
Ziinc

Quite surprisingly, Supervisors do not have an exposed option for taking a spawn_opt. spawn_opt options are process-level options that control process behaviour, including memory management, and can be incredibly useful when hunting down garbage build-up in processes.
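For contrast, plain GenServers do expose this. Here's a minimal sketch of passing :spawn_opt when starting a worker (the module and the fullsweep_after value are just illustrative):

defmodule MyApp.Worker do
  use GenServer

  def start_link(arg) do
    # :spawn_opt is a documented GenServer.start_link/3 option;
    # fullsweep_after: 20 is an illustrative value, not a recommendation
    GenServer.start_link(__MODULE__, arg, spawn_opt: [fullsweep_after: 20])
  end

  @impl true
  def init(arg), do: {:ok, arg}
end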

The Backstory

This week in life at Supabase, we did some fun garbage collection optimization, mostly tweaking culprit process behaviours so that they clear out their garbage in a timely manner.

Sometimes, garbage builds up for a myriad of reasons, and we gotta take our massive major GC hammer to knock some sense into these processes that are stuck in a minor GC loop!

The Problem

Since Supervisors don't actually take a spawn_opt, after digging around, the only real option was to use the :erlang.process_flag/2 function, which is wrapped by Process.flag/2.

We can achieve the :fullsweep_after tweak like so:


def init(_arg) do
# trigger major GC after 5,000 minor GCs
Process.flag(:fullsweep_after, 5_000)
...
end

One would think that it would be accepted by Supervisor.start_link/2, but it seems like it isn't at all, and I had to dig into the Elixir source code to find that out.

A Word on Task.Supervisor

Although the base Supervisor module doesn't accept the :spawn_opt option in its start_link/2 function, the built-in Task.Supervisor module does accept it.

This can be seen here, where there is an explicit test case for this option being passed through.
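Going by that test, here's a minimal sketch of passing it through (the supervisor name and the fullsweep_after value are just illustrative, assuming Task.Supervisor.start_link/1 forwards :spawn_opt as described above):

# Illustrative: pass :spawn_opt when starting a Task.Supervisor
{:ok, _pid} =
  Task.Supervisor.start_link(
    name: MyApp.TaskSupervisor,
    spawn_opt: [fullsweep_after: 20]
  )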

Quite an interesting tidbit 😄