Calliope Sounds: 2025

Railway modeling in the mid to late 1970s

The "From the Archive" section of the most recent issue of Railway Modeller magazine (July 2025) referenced a cottage thatching article by Allan Downes in the July 1975 issue. This brought on a flood of memories from that period when I was building my GWR branch line in N gauge. His work was wonderfully inspiring even with only black & white photos alongside his hand-drawn diagrams and plans. Fortunately, Railway Modeller has all of their back issues online and so I was able to read that article and an earlier one (February 1975) where he introduced the thatching technique.

In hindsight, what I really miss about this time and Downes's approach was the simplicity of it all. I knew no different. I built a whole railway station, goods depot, engine yard, and town using nothing but what a 14-year-old could find and afford. The most expensive materials I bought were the sheets of printed brick from the Totnes hobby shop!

I have a few blurry photos of my layout. Sadly, I had to leave it all behind when my family immigrated to the US.

Sidekiq and exclusive workers

In certain scenarios, a Sidekiq worker requires exclusive operation or access to specific data. While it is ideal for a worker to be designed as idempotent, achieving this can be challenging, particularly in a distributed system. For instance, consider a worker responsible for sending out emails. If two instances of this worker are running simultaneously, there is a high likelihood that recipients will receive duplicate emails. Even worse, these emails might contain slightly different content, leading to user confusion. To address this issue, Sidekiq Enterprise offers mechanisms that facilitate exclusivity (and rate limiting).

Note: I wrote this to help a client focus on a few methods of worker exclusivity that avoid race conditions.

There are three common exclusivity needs:

1. Exclusivity irrespective of the worker’s parameters.
2. Exclusivity with respect to all of the worker’s parameters.
3. Exclusivity with respect to some of the worker’s parameters.

There is a fourth exclusivity need, but handling this need is outside the scope of this post.

Exclusivity with respect to model data.

Using Sidekiq options

The simplest method to achieve exclusivity #2 is to utilize Sidekiq’s sidekiq_options unique_for: duration. For example,

class UniqueExampleWorker
  include Sidekiq::Worker
 
  sidekiq_options unique_for: 1.minute
 
  def perform(params = {})
    # do the work
  end
end

The option’s duration represents an estimate of how long the worker is expected to run. This exclusivity will only be maintained for the specified duration. If the worker exceeds this time, Sidekiq will not prevent a second worker from running concurrently. When utilizing this mechanism, it's advisable to adopt a conservative approach regarding the duration; in this case, opting for a longer duration is preferable to aiming for precision.

If the worker finishes before the unique_for duration has elapsed, the lock is released and another worker can be queued.

An additional benefit of utilizing this option is that if a worker is already queued or currently running, attempting to queue another worker (eg UniqueExampleWorker.perform_async) will return nil instead of the job id.
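The timing behavior described above may be easier to see in a toy model. The following is a minimal in-memory sketch and not how Sidekiq Enterprise implements unique jobs (it keeps its locks in Redis). ToyUniqueLock and its methods are hypothetical names, and the current time is passed in explicitly so that the example is deterministic.

```ruby
# A toy, in-memory model of a time-limited uniqueness lock.
# Hypothetical illustration only; this is not Sidekiq's implementation.
class ToyUniqueLock
  def initialize
    @expires_at = {} # key => time (in seconds) at which the lock lapses
  end

  # Returns true if the lock was taken, false if a live lock already exists.
  def acquire(key, ttl, now:)
    return false if @expires_at.key?(key) && @expires_at[key] > now

    @expires_at[key] = now + ttl
    true
  end

  # Called when the job finishes, which frees the key early.
  def release(key)
    @expires_at.delete(key)
  end
end

lock = ToyUniqueLock.new
first  = lock.acquire("job:42", 60, now: 0)  # the first job takes the lock
second = lock.acquire("job:42", 60, now: 30) # a duplicate inside the window is refused
third  = lock.acquire("job:42", 60, now: 61) # the window has lapsed, so a duplicate gets in
lock.release("job:42")                       # a job finishes and frees the key
fourth = lock.acquire("job:42", 60, now: 62) # a new job may take the lock at once
puts [first, second, third, fourth].inspect
```

The third call is the case to worry about: if the first job were still running at second 61, two jobs would now be working at the same time. That is why a generous duration is the safer choice.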

Using Sidekiq Limiter

A more adaptable approach to achieving exclusivity for #1, #2, and #3 is to utilize Sidekiq::Limiter.concurrent. Unlike the standard Sidekiq options, this approach is implemented within the worker's perform method. The worker operates as usual, while the limiter allows for a swift exit when necessary. For example, to configure and implement exclusivity #1

class UniqueClassExampleWorker
  include Sidekiq::Worker
 
  LIMITER = Sidekiq::Limiter.concurrent(
    "unique_class_example_worker_limiter", # name
    1, # count, ie expect exclusivity
    lock_timeout: 1.minute, # expected duration of worker
    wait_timeout: 0, # don't wait around for running worker to finish
    policy: :ignore, # don't reschedule the worker for later
  )
 
  def perform(params = {})
    LIMITER.within_limit do
      # do the work
    end
  end
end

The key to exclusivity is the limiter name and the count. Here the name is based on the class name. Any concurrent use of this worker, irrespective of parameters, will not enter the within_limit block.

The limiter’s lock_timeout represents an estimate of how long the worker is expected to run. This exclusivity will only be maintained for the specified duration. If the worker exceeds this time, Sidekiq will not prevent a second worker from running concurrently. It is advisable to adopt a conservative approach regarding the duration; in this case, opting for a longer duration is preferable to aiming for precision.

Unlike when using sidekiq_options, there is no easy way to know if there is an existing worker queued or running. It is possible to examine the Sidekiq queues for the worker, but without careful use of mutual exclusion this will result in a race condition.

Testing

When testing a worker that uses this approach you will need to replace the concurrent limiter to avoid its interference.

require "spec_helper"
require "sidekiq/testing"
 
RSpec.describe UniqueClassExampleWorker do
  # ...
  before do
    stub_const("#{described_class.name}::LIMITER", Sidekiq::Limiter.unlimited)
  end
  # ...
end

Exclusivity with regard to parameters

If the worker's exclusivity must be restricted by one or more parameters (or even derived data), then utilize those values to create a unique limiter name. For example, to exclusively run one worker per id parameter

class UniqueParamsExampleWorker
  include Sidekiq::Worker
 
  def perform(params = {})
    limiter = Sidekiq::Limiter.concurrent(
      "unique_params_example_worker_limiter_#{params['id']}", 
      1, 
      lock_timeout: 1.minute,
      wait_timeout: 0,
      policy: :ignore,
    )
    limiter.within_limit do
      # do the work
    end
  end
end

This example creates the limiter within the perform method. The cost of using a limiter is not with its creation (it is just a plain old Ruby object), but with the execution of within_limit. It is only then that the limiter interacts with Redis.

A more general implementation is

require "digest"

class UniqueParamsExampleWorker
  include Sidekiq::Worker
 
  LIMITER_DEFAULT_OPTIONS = {
    lock_timeout: 1.minute,
    wait_timeout: 0,
    policy: :ignore,
  }.freeze
 
  # Builds a limiter whose name is derived from the class name and the ids.
  def limiter(*ids, count: 1, **options)
    # Join with a separator so that ("1", "23") and ("12", "3") differ.
    digest = Digest::MD5.hexdigest([self.class.name, *ids].join(":"))
    Sidekiq::Limiter.concurrent(
      "limiter_#{digest}",
      count,
      **LIMITER_DEFAULT_OPTIONS.merge(options)
    )
  end
 
  def perform(params = {})
    limiter(params['id']).within_limit do
      # do the work
    end
  end
end

Testing

When testing a worker that uses this approach you will need to replace the concurrent limiter to avoid its interference.

require "spec_helper"
require "sidekiq/testing"
 
RSpec.describe UniqueParamsExampleWorker do
  # ...
  before do
    allow_any_instance_of(described_class).
      to receive(:limiter).and_return(Sidekiq::Limiter.unlimited)
  end
  # ...
end

Handling non-exclusive uses

The Sidekiq::Limiter.concurrent examples all utilize the policy: :ignore option. This option instructs Sidekiq to disregard non-exclusive uses. It overrides the default policy: :raise which triggers the Sidekiq::Limiter::OverLimit exception. When this exception is raised, Sidekiq will reschedule the worker for a later time. There is a corresponding backoff policy and a maximum rescheduling count associated with this rescheduling process.

However, you can use this to implement a non-exclusive handler. For example,

def perform(params = {})
  limiter(params['id'], policy: :raise).within_limit do
    # do the work
  end
rescue Sidekiq::Limiter::OverLimit
  # handle the over-limit
end

Just don’t re-raise the exception!

Hilary Gridley and building feedback tools

This morning YouTube added the following interview to my feed. Embarrassingly, I realized that I have spent all of my time understanding how to use AI as a coder or as a student/researcher. I had little understanding of how other professionals use it every day. Hilary Gridley, the guest, showed a fascinating example of how she uses ChatGPT to build out a feedback tool (a custom "GPT") for herself and for her staff. The feedback is on slide decks, but it is essentially what my employer is doing for PRs without all the custom coding. I found the whole interview fascinating and now want to see more of its kind.

How custom GPTs can make you a better manager | Hilary Gridley (Head of Core Product at Whoop)
https://www.youtube.com/watch?v=xDMkkOC-EhI

SAGA and AWI

After showing my dark age miniatures I decided that perhaps I would enjoy a solo game of SAGA. After buying a nice mat and setting up the game table I discovered I no longer had the faction dice. I had forgotten I had given them away. So I followed this advice and retrofitted a load of D6 dice and the Viking and Welsh faction boards.

I played a few games. I didn't really enjoy them. I'm not an experienced solo gamer and so that was a factor. What really bothered me was the scale, ie the sweep of the game. The 28mm figures and terrain on a 6'x4' table were visually congested, and the game play had no satisfying build-up or peak. Most faction units were in melee within a few turns and the outcome was obvious. It felt flat. Perhaps if I had been able to extract a story from the encounters and ensuing battles I would have enjoyed them more. (Bloodbaths are not stories.)

I have recently been reading Henry Hyde's Shot, Steel, and Stone rules, and listening to his podcast and that of the Yarkshire gamer. Their advocacy for the large wargame has definitely affected me. Next year is the 250th anniversary of the American War of Independence. The battles are smaller than Napoleon's so it seems practical to have enough figures and landscape to make it feel authentic and not gamey. Gaming the 165ish significant battles, some quite local, on a large table full of 10mm units seems wonderful.

Side-Effects are the Complexity Iceberg

I liked Kris Jenkins's talk Side-Effects are the Complexity Iceberg. Three takeaways for me were

His description of side-effect as "every function has two sets of inputs and two sets of outputs" nicely encapsulates the idea of parameters, results, and before and after states.
The urge to rewrite the system becomes more pressing as ever fewer people remain who understand why the system is as it is. That is, it is an institutional problem and not a technical problem.
If you really hate someone, teach them to recognize bad kerning.
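The first takeaway can be made concrete with a small sketch. This is my own hypothetical example and not from the talk; InvoiceNumbers and pure_next_number are made-up names. The method has an explicit input (the parameter) and an explicit output (the result), plus a hidden input (the state before) and a hidden output (the state after).

```ruby
# Hypothetical example: a method with hidden inputs and hidden outputs.
class InvoiceNumbers
  def initialize(last_issued)
    @last_issued = last_issued
  end

  # Explicit input: prefix. Hidden input: @last_issued (the "before" state).
  # Explicit output: the returned string. Hidden output: the mutated
  # @last_issued (the "after" state).
  def next_number(prefix)
    @last_issued += 1
    "#{prefix}-#{@last_issued}"
  end
end

# The same logic with both sets of inputs and outputs made explicit.
def pure_next_number(prefix, last_issued)
  issued = last_issued + 1
  ["#{prefix}-#{issued}", issued]
end

numbers = InvoiceNumbers.new(41)
puts numbers.next_number("INV") # same argument...
puts numbers.next_number("INV") # ...different result, because hidden state changed

number, state = pure_next_number("INV", 41)
puts number
puts state
```

The explicit version is noisier to call, but everything it depends on and everything it changes is visible at the call site.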

"A flower in the mind is better than a bee in the pocket"

I have amused myself today asking AIs to explain fictitious parables and idioms. For example, 'Explain the idiom "A flower in the mind is better than a bee in the pocket."' The explanations are often eye-opening.

Wednesday Gamers

A decade ago I was invited to join the "Wednesday Gamers." This was a small group who had been playing miniature wargames together every Wednesday for decades. It was a rare gift to be in the company of these older men who had gone through life's joys and troubles, remained friends, patient with each other's foibles, and continued to have a genuine enthusiasm for the hobby. Within a few years, as each retired, their situations fundamentally changed and the Wednesday gaming sessions stopped. I will forever be grateful to Al, Maurice, Leo, Kim, and Kevin.

Board game UX

I don't know if the game Star Wars: Outer Rim is enjoyable, but the game's physical design is amazing. Which reminds me I have What the Tech World Can Learn from Video Game UX lined up to watch.

Update: The video was about using gamification to enhance learning to use a handheld ultrasound scanner. Nothing new.

Inter-company event sourcing

I was in a client's meeting today and they were discussing how one of their batch processes has a dependency on data generated by a supplier's batch process. The problem is that the client does not know when the supplier's process is complete so they can start their process. (The supplier's data is accessed via an API rather than a bulk file delivery.) The discussion went along the lines of

We know their process starts around 10 PM and takes about an hour to complete. So let's give it a couple of hours to finish, just in case it's slow that day, and we will start our process at, say, midnight.

I scream into the void. This kind of inter-company data processing coordination is quite common. Common enough that I am still surprised there is no standard, or industry-specific, solution we use.

South Kingstown & Narragansett restaurants

If you are looking for good restaurants in South Kingstown or Narragansett try Purslane, Duck Press, Tsunama, Agave Social Cocina Mexicana, and Pasquale's Pizzeria Napoletana.

Recent history of education funding in Rhode Island

“For decades, Rhode Island’s policymakers have operated under the myth that cutting taxes for businesses and high earners would spur economic growth. Yet the data is clear...”

SEEDS UNPLANTED - The Recent History of Education Funding in Rhode Island

A pair of formal diagram explanations. One for software and one for buildings.

The C4 set of diagrams is directly relevant to my work as a software engineer. Seeing the architecture profession's sets of diagrams is a useful reminder -- and an immediately obvious one for anyone who has lived in a house! -- that different diagrams are needed for different purposes (ie, contexts).

The C4 Model – Misconceptions, Misuses & Mistakes • Simon Brown • GOTO 2024

What's in my set of architectural documents? Sharing everything: drawings, schedules, + specs.

Why I don't want color coded logs

Most developers laugh at me when I say I don't want color coded logs. They rarely ask why. Logs with color coded structures provide no actionable information. They actually obscure actionable information by forcing a distinction without a difference. What I am looking for are clues in the logs. Those clues are often easily overlooked small tokens. I use color or inversion to distinguish them so that their occurrence immediately stands out from the background noise. The highlight script is a simple tool for this.
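A highlighter of this kind can be very small. The following is a hypothetical sketch in Ruby and not the script mentioned above; the highlight method and the choice of ANSI reverse video are my assumptions for illustration.

```ruby
# Hypothetical sketch of a log highlighter.
# Wraps each occurrence of the given tokens in ANSI reverse video so the
# tokens stand out while the rest of the line is left untouched.
REVERSE = "\e[7m"
RESET   = "\e[0m"

def highlight(line, tokens)
  return line if tokens.empty?

  pattern = Regexp.union(tokens) # string tokens are escaped, so they match literally
  line.gsub(pattern) { |match| "#{REVERSE}#{match}#{RESET}" }
end

# Sample lines stand in for a log stream. For real use, iterate over
# $stdin.each_line instead and take the tokens from ARGV.
sample_lines = [
  "GET /orders/17 status=200 duration=12ms",
  "GET /orders/18 status=500 key_id=abc123 duration=48ms",
]
sample_lines.each { |line| puts highlight(line, ["status=500", "key_id"]) }
```

Only the tokens being hunted for are marked; everything else stays as plain background.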

Unnecessary frustration and toil

I spent a good part of yesterday tracking down a problem with the staging deployment of a feature I first started back in October of last year. (That it has taken this long to get it to staging has everything to do with how this organization manages work.) When you have such an extended period between implementation and deployment you rarely retain the feature's context and even more rarely an environment within which to debug problems. It took me some time to regain that context and environment. (I should have left better notes for my future self.) Once I had that it became obvious that the feature worked and that the problem lay in the deployment.

The deployment is a small Kubernetes cluster. Each service has 2 or 3 containers in several pods. I figured out the pod ids and container names (why are they all different!) and opened a terminal window for each and streamed the logs. I used a small script to highlight text that I was expecting to find. I then used the feature and discovered the problem was due to a corrupted private key stored in the deployment's database.

The organization uses Honeybadger.io to record exceptions and Elasticsearch to aggregate logs. These tools are intended to improve access to the details needed to debug issues. Each tool has its own user interface and mechanisms for accessing and searching. To use them you obviously need to understand these mechanisms and, more significantly, you need to know how the organization has configured them. That is, no two organizations use the same data model for how they record exceptions and log details.

The developer needs documentation about the configuration and there was none. Well, that is not quite true. This organization has thousands of incomplete, unmaintained, and contradictory Confluence pages. The "information" available to the developer is actually worse than none at all as they will waste time trying to piece together some semblance of a coherent (partial) picture. What I eventually concluded was that it could not be done and my best path forward was to look at the raw container logs.

I understand that at this organization I am a contractor and so just developer meat. But what I have seen is that this global, financial, highly profitable organization does not do any better for their developer employees. Perhaps all industries are like this. I have only experienced the software development industry and here they are mostly the same. It makes me sad and mad to see and experience such unnecessary frustration and toil.

Transactions and some concurrency problems

A group of us are reading Kleppmann's Designing Data-Intensive Applications. Chapter 7 is on transactions and especially the different approaches used to address concurrency problems, ie the Isolation in ACID. What becomes clear is that transaction isolation levels can only mitigate some problems. It is your application's data model design and use that are mostly responsible for avoiding them. Here are the concurrency problems raised in this chapter:

Lost Updates. These occur when a process reads some records, modifies them, and writes them back. If the updated records had been modified by another process after the read then those updates would be lost.

Read Skew. This is a variation of Lost Updates due to the delays between steps in a multiple step operation. Processes A and B are interacting with the same records. Process A reads records X and Y (two steps). Process B updates records X and Y (two steps). Due to the delay between A's and B's steps, process A has the original X value but the updated Y value.

Write Skew. This occurs when process A reads some records to make a decision and then updates other records appropriate to the decision. While the decision is being made process B changes some records that would alter process A's decision. Process A is unaware of process B's changes and continues to make its updates which invalidates the data model.

Phantoms. This is a variation of Write Skew. Process A queries for records to make a decision. Process B inserts records that would have been included in process A's query results. Unaware of these inserts, process A makes its updates which invalidates the data model. The "phantoms" are the records not included in process A's query result.
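The first of these problems, the lost update, can be reproduced with a deterministic interleaving. The sketch below is a hypothetical single-threaded model (Record and Store are made-up names, not from the book) that also shows one common mitigation, a compare-and-set on a version number.

```ruby
# Hypothetical in-memory record store used to show a lost update and a
# compare-and-set mitigation. The interleaving is written out by hand.
Record = Struct.new(:value, :version)

class Store
  def initialize(value)
    @record = Record.new(value, 0)
  end

  # Returns a private copy, as a process would hold after a read.
  def read
    @record.dup
  end

  # Unconditional write: whatever was there is overwritten.
  def write(value)
    @record = Record.new(value, @record.version + 1)
  end

  # Conditional write: succeeds only if nobody wrote since our read.
  def write_if_unchanged(value, read_version)
    return false unless @record.version == read_version

    write(value)
    true
  end
end

# Lost update: A and B both read 10, each modifies its own copy, both write.
store = Store.new(10)
a = store.read
b = store.read
store.write(a.value + 5) # A writes 15
store.write(b.value + 1) # B writes 11, so A's +5 is lost
puts store.read.value

# Compare-and-set: B's stale write is refused, so B re-reads and retries.
store = Store.new(10)
a = store.read
b = store.read
store.write_if_unchanged(a.value + 5, a.version)
accepted = store.write_if_unchanged(b.value + 1, b.version)
unless accepted
  b = store.read
  store.write_if_unchanged(b.value + 1, b.version)
end
puts store.read.value
```

In a real database the version check and the write have to happen as one atomic step; this model only shows the shape of the idea. It also does nothing for write skew or phantoms, where the conflicting writes touch different records.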

Costs of helpful data flexibility.

I'm having a discussion currently with a young developer who has only ever worked in Ruby and JavaScript. I noticed that the developer had chained "symbolize_keys" to the end of a method call. In Ruby this converts a hash's keys from strings to symbols, ie { "a" => 123 } becomes { :a => 123 }. Their reason for this was to offer flexibility to the called method as to how it returned the result. They thought this provided flexibility and robustness. I countered that it did the exact opposite.

When a function can be given parameters and return results in multiple formats then robustness is only had when the function handles all formats equally. To do that the function needs to be tested with all formats. This can be done, but in practice, and I've seen this across many organizations, it is not. Not only does the function need to be tested with the multiple formats, but the callers and the called need to be tested too. It's a combinatorial explosion of testing.

The other detriment to this flexibility is that since no function is sure of the format every function converts the data to its preferred format even if the data is already in the preferred format. This conversion adds to the function's code size and has a runtime cost (CPU and memory) on every invocation. The cost of a single use might be small, but our applications work in a world with thousands of concurrent sessions each with deep call chains, and expect microsecond responses. Those single uses add up.

My recommendation to the developer was to require one format as part of its contract and add validation that runs at least during testing. (I'd like to just tell them to use a typed language where this wouldn't even be an issue!)
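A minimal sketch of what such a contract could look like, assuming the chosen format is a Hash with Symbol keys. The names assert_symbol_keys! and total_price are hypothetical. It uses plain Ruby's transform_keys for the one-time conversion; symbolize_keys is the ActiveSupport equivalent.

```ruby
# Hypothetical sketch: the function's contract is "a Hash with Symbol keys".
# Callers convert once at the boundary; the function only checks.
def assert_symbol_keys!(hash)
  bad = hash.keys.reject { |key| key.is_a?(Symbol) }
  return hash if bad.empty?

  raise ArgumentError, "expected Symbol keys, got: #{bad.inspect}"
end

def total_price(item)
  assert_symbol_keys!(item)
  item.fetch(:quantity) * item.fetch(:unit_price)
end

# At the boundary (eg parsed JSON), convert exactly once.
external = { "quantity" => 3, "unit_price" => 250 }
internal = external.transform_keys(&:to_sym)

puts total_price(internal)

# Passing the unconverted hash breaks the contract and fails loudly.
begin
  total_price(external)
rescue ArgumentError => e
  puts e.message
end
```

If the check is too costly for production, it could be enabled only in the test environment, which still catches callers that break the contract.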

I mentioned that the developer's experience is in Ruby and JavaScript. I have found that it is common for such developers to not expect data to be in a specific format or type. I assume some of this comes from never being trained to always validate and convert data coming from the outside before using it inside. (Eg, directly passing around an INPUT element's value or a database's column value.) Once inside, you can be assured of its correctness. Instead, data is passed around without any function knowing a priori that it is correct.

I am unsure if I will convince this developer to not use "symbolize_keys". I am rowing against the tide.

Update: Not only did I not convince the developer, but the system's architect rejected it also.