eBPF (Extended Berkeley Packet Filter) for dummies

This is a simple eBPF primer, written with generous help from Claude.

ELI5 version

eBPF is like having magic glasses for your computer. These glasses let you see what’s happening inside your computer without stopping it or slowing it down. You can watch programs talk to each other, see how fast things are moving, and even catch bad behaviors. The best part is you can program these glasses to look for specific things and take action when they happen.

What is eBPF?

eBPF is a technology in the Linux kernel that allows you to run small programs in a safe, sandboxed environment directly in the kernel. It was originally designed for network packet filtering but has evolved into a powerful, general-purpose monitoring and tracing framework.

Key features:

  • Runs safely inside the kernel without modifying kernel code
  • High performance with minimal overhead
  • Versatile application across networking, security, and observability
  • JIT (Just-In-Time) compilation for near-native performance

eBPF Tools Ecosystem

  1. BCC (BPF Compiler Collection): A toolkit for creating eBPF programs using Python and Lua frontends.
  2. bpftrace: A high-level tracing language for eBPF, similar to awk or DTrace. It provides a simple, powerful scripting interface for writing eBPF programs.
  3. Cilium: Uses eBPF for container networking, observability, and security.
  4. Falco: Security monitoring tool that uses eBPF to detect anomalous behavior.
  5. Hubble: Network and security observability platform built on eBPF.
  6. Pixie: Observability platform for Kubernetes applications using eBPF.

What is bpftrace?

bpftrace is a high-level tracing language for eBPF that makes it easy to write small programs to trace and analyze system behavior. Think of bpftrace as the friendly interface to eBPF’s power.

Relationship to eBPF:

  • bpftrace is to eBPF what SQL is to a database engine
  • It compiles your human-readable scripts into eBPF bytecode
  • Handles the complexity of loading and running your eBPF programs
  • Provides built-in functions and easy syntax for common tracing needs

Simple bpftrace example:

# Count system calls by process name
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[comm] = count(); }'

This one-liner counts all system calls grouped by process name, demonstrating bpftrace’s concise yet powerful syntax.

Kprobes and Uprobes

Kprobes

Kprobes (Kernel Probes) are debugging mechanisms in the Linux kernel that allow you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. They’re essentially dynamic breakpoints you can insert anywhere in the kernel code.

Key features:

  • Can be attached to virtually any instruction in the kernel
  • Minimal performance impact when not triggered
  • Collect register and memory state at the probe point
  • Available in two flavors: kprobes (at function entry) and kretprobes (at function return)

Uprobes

Uprobes (User Probes) are similar to kprobes but work in userspace. They allow you to trace and instrument user applications by inserting breakpoints at specific functions or instructions.

Key features:

  • Trace applications without modifying their source code
  • Attach to specific functions in userspace programs
  • Monitor application behavior in production
  • Available as both uprobes (function entry) and uretprobes (function return)

Relationship to eBPF

Kprobes and uprobes provide the attachment points for eBPF programs to hook into kernel and user application code. The relationship works like this:

  1. Attachment mechanism: eBPF programs use kprobes/uprobes as the “hooks” to insert themselves into kernel or application execution paths
  2. Data collection: When a probe is triggered, the associated eBPF program executes, collecting data and potentially making decisions
  3. Performance: eBPF added JIT compilation to make probe handlers extremely efficient
  4. Programmability: Before eBPF, probes were limited in functionality; eBPF adds a programmable layer to determine what happens when a probe triggers

An example in bpftrace showing both kprobe and uprobe:

# Trace kernel function
bpftrace -e 'kprobe:do_sys_open { printf("Opening file: %s\n", str(arg1)); }'

# Trace user function in libc
bpftrace -e 'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc { printf("malloc called, size: %d\n", arg0); }'
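The two examples above only show entry probes. Since kretprobes and uretprobes were mentioned earlier, here is one more one-liner (my own addition, not part of the original examples) that builds a histogram of the kernel vfs_read() return values, i.e. how many bytes each read returned:

# Trace kernel function return values
bpftrace -e 'kretprobe:vfs_read { @bytes = hist(retval); }'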

eBPF transformed kprobes and uprobes from simple debugging tools into a powerful, programmable observability framework, turning them from basic breakpoints into sophisticated monitoring tools with minimal performance impact.

How to pin a specific apt package version

I’d like to pin a specific package, say redis-server, to a specific version, in my case 7.0.*, and that seems straight-forward to do with:

Package: redis-server
Pin: version 7.0.*
Pin-Priority: 1001

Now, I would also like to have apt fail when 7.0.* is not available, either because there are only newer versions available, f.ex. 7.2.* or 7.4.* or perhaps because only older versions like 6.* are available.

I can’t seem to find a way to achieve that. I’ve read various resources online and consulted man 5 apt_preferences, but I’m still not sure how to do it.

I tried combining the previous pinning rule with another one with priority -1, as in the following:

Package: redis-server
Pin: release *
Pin-Priority: -1

But that seems to make all versions unavailable unfortunately. Here’s what I’m seeing:

$ apt-cache policy redis-server
redis-server:
  Installed: (none)
  Candidate: 5:7.0.15-1build2
  Version table:
     5:7.0.15-1build2 500
        500 http://no.archive.ubuntu.com/ubuntu noble/universe amd64 Packages

$ cat > /etc/apt/preferences.d/redis-server
Package: redis-server
Pin: version 7.0.15*
Pin-Priority: 1001

Package: redis-server
Pin: release *
Pin-Priority: -1

$ apt-cache policy redis-server
redis-server:
  Installed: (none)
  Candidate: (none)
  Version table:
     5:7.0.15-1build2 -1
        500 http://no.archive.ubuntu.com/ubuntu noble/universe amd64 Packages

I expected this configuration to provide an available candidate, since one exists (7.0.15), but that doesn’t work.

Note that a successful outcome for me is:

  • define a target wanted version, f.ex. redis-server=7.0.*
  • provide an apt preferences.d file such that:
    • when any 7.0.* versions are available, apt will install that version
    • when no 7.0.* versions are available, apt will fail and install nothing

A bad outcome is when redis-server is installed, but with a package version that does not match what I had specified as requirement (hence, different from 7.0.*).

This is on Ubuntu 24.04, although there is nothing specific to 24.04 or Ubuntu here I would think.

Any ideas?

Posted on the Unix & Linux Stack Exchange, let’s see! https://unix.stackexchange.com/questions/790837/how-to-pin-an-apt-package-to-a-version-and-fail-if-its-not-available

UPDATE: based on the Stack Exchange feedback, it seems that the solution wasn’t far off.

Package: redis-server
Pin: version 5:7.0.15*
Pin-Priority: 1001

Package: redis-server
Pin: release *
Pin-Priority: -1

I needed to prepend the version with the "5:" epoch.
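With the corrected preferences file in place, apt-cache policy should (if I’m reading the priorities right) report the 7.0.15 build as candidate again, with output along these lines:

$ apt-cache policy redis-server
redis-server:
  Installed: (none)
  Candidate: 5:7.0.15-1build2
  Version table:
     5:7.0.15-1build2 1001
        500 http://no.archive.ubuntu.com/ubuntu noble/universe amd64 Packages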

TIL: Styling Obsidian text paragraphs

TIL that it’s possible to style Obsidian text paragraphs in a way that allows me to focus on the actual text content and not on the fact that I need to add an artificial line break every time I type a paragraph :-)

Obsidian can use CSS snippets to style the application itself and the text/markdown content. The CSS snippets need to be saved in <vault_directory>/.obsidian/snippets/whatever.css.
This is how to get that “natural” book-like spacing between paragraphs, and avoid adding spurious line breaks in the markdown code:


.cm-contentContainer {
  line-height: 1.70rem;
}

.markdown-source-view.mod-cm6 .cm-content > .cm-line {
  padding-bottom: 12px !important;
}

Of course, the values for main line-height and padding will depend on your particular screen and font settings. In my case I use a screen tilted in vertical position for writing and coding, and my font of choice is the beautiful Berkeley Graphics’s Berkeley Mono.


My experience at SREcon EMEA 2022

A couple of weeks ago I attended SREcon EMEA in Amsterdam. Here’s some sparse thoughts about it, with no pretense of being exhaustive or coherent.

Looking Back

There are only a handful of conferences I’ve attended where I felt “at home”. Going back in time, Surge was the first one, then came Velocity. I’m adding SREcon to that list. It definitely felt like I was among people that speak the same language and have similar breadth and depth of expertise, and yet it is somewhat strange at the same time.

As I see it, there are at least three “tiers” at such a big and niche conference. The FAANG folks, the tiny companies with a sysadmin or a devops engineer or two, and then the big ocean of mid-sized companies, where people like us are. Our SRE team is four people and we manage a service with millions of monthly users. Needless to say, we have a lot on our plate :-)

I came to SREcon after a hiatus from conferences for some years. After a while, conferences tend to become self-referential and people start talking about the same things over and over again. I wanted to understand how things had changed in our field, what were people talking the most about, get some fresh perspectives and perhaps connect with people from other companies. What prompted me to do this was Niall Murphy tearing the SRE bible book apart.

The Question of SRE Identity

This year’s conference topic was “What could SRE be?”.
No surprise, then, that a good portion of the talks were about what I refer to as the question of identity for SREs. We have seen the same happen — and a lot more during all these years — for the DevOps movement.

What could SRE be, then? According to some presentations, one would conclude that whatever SRE is, it’s no longer what Google intended, it’s not what anyone else thinks it is either, it’s just what you think it is: a subjectivist view.

In the Usenix Slack conversations, there was a lot of chit-chat about SRE identity. My personal contribution was the following meme:

Other funny memes that were shared:

An interesting fact I learned during the conference is that the Google SRE book was written by assembling contributions from the best teams at Google, picking out their respective best practices. Paradoxically, this implies that the SRE book is not representative of how even Google itself does SRE. If you also consider that, at the time the SRE book was published (2016), Google employed about 1,200 people in the various SRE teams, the only possible conclusion is… if you are not Google, there is likely very little that you can apply to your everyday mere-mortal-SRE life.

Before you think I’m exaggerating, this conclusion was drawn by (ex-)Google engineers themselves, for example in Alex Hidalgo’s “Diamonds under Pressure” talk and (in my opinion) in one of the best talks of the conference, Emil Stolarsky’s Unified Theory of SRE. Another entertaining presentation in the same vein was Andrew Clay Shafer’s “SRE as She Is Spoke”. Andrew put forward the thesis that “progress [on the SRE journey] stops when the needs are met”, which seems a reasonable and pragmatic approach.
The videos are not up yet, but they should be in a few weeks.

Alongside the “subjectivist” view, there were other talks, which could be classified as systems thinking, that focused on the more general and broad aspects of what SREs do, how to handle complex systems, human factors, etc… Among the best IMO were:

What else?

The question of SRE identity accounted for a notable part of the talks, but thankfully not all. It’s good to pause and reflect on our role, but personally that’s not why I was interested in SREcon, not primarily at least. What I like are the deep technical talks, where I get to know more about how other companies actually do the stuff we call SRE. Given my past conference experience, I expected Facebook/Meta’s talk to be somewhat disappointing, and it was. While some details of how Meta is structured were shared, which are always interesting, I expected a bit more on how the incident actually happened.

I loved Effie Mouzeli’s talk on how to make teams resilient, “Is Our Team as Resilient as Our Systems?”. We naturally focus on systems, but teams are a crucial part of the equation. My team and I have had to work on this a lot in the past years, and I’m hoping to share more about this soon. I felt this talk had a lot of good insights, some of which we’ve also applied over time.

Another talk that deserves a mention is Chris Sinjakli’s reflection on broadening the scope of how we work on reliability for our systems. This is sometimes difficult to do when toil is a big part of our jobs. Luckily it’s not for our team, not anymore at least, so this talk felt very relevant to me, and I recommend it.

I couldn’t attend some of the talks due to the two parallel tracks. I hope to catch up when slides and videos are published later on.

What about the hallway track?

In general, people say that conferences are most useful because of the casual conversations you can have in the hallways. While I do agree, the opportunities to have such conversations vary depending on the type of person you are, and the people you meet, of course. My impression is that while some people at SREcon were happy to have conversations, most were likewise happy to be left alone, which is fair enough :-)
Just to say that it was really nice to meet people and chat; almost everyone I talked to knew Kahoot! directly and was happy to share details about what they’re doing, and equally interested in what we’re doing.

In some of these conversations I’ve been trying to push for more concrete, down-to-earth talks on how smaller companies like ours do SRE. It’s ok to aspire to or be interested in how Google runs, but you come away with absolutely zero information that’s useful to your work life. Possibly there’s even a downside: people going home thinking they have to do whatever Google does (see chapters above), so ultimately… let’s give less importance to the Googles of the world, please!

Besides the hallway track, there was a nice “sidewalk” track. We walked around the city, 15 km a day on average — you gotta track those SLOs… — and I also managed to snap some nice pictures of Amsterdam at sunrise and sunset.

The Venue and Organization

Loved all of it, honestly the best conference I’ve ever been to. The venue was spectacular, there was plenty of space, slides were clearly visible on screen, and the food was awesome! We also used one of the available meeting rooms to participate in our own company hackathon after the conference finished, until they kicked us out. Here’s a sneak peek of what our team was working on:

I hope to return to SREcon next year in Dublin. By then, I’d love to see more not-Google, not-Meta, etc… talks on the program. Perhaps we (or you!) should think about presenting too, why not?

On feeling stupid, how mathematics is taught in school, and an Ikea bowl

This story begins with a tweet:

In other words: how to find the quantity (volume) of each layer of jelly to make sure each layer in a bowl is of equal height?

Initially, I thought of a quick solution: mark height levels every n cm on the bowl with a pen, and then just fill up to each mark with the different liquid. There won’t be any need to calculate anything.

This can work assuming the bowl is made of glass. Maybe one could mark the bowl on the outside and then fill it up and still have a sufficiently clear reference for when to stop.

But… this didn’t feel satisfying. How would one approach the problem if they had to solve it without filling the bowl, with mathematics only?

One approach could be to find the area – labeled as B below – between the X axis and the curve of the “bowl”.

We know what A + B is, that is, the height of the layer we desire, multiplied by the width of the bowl at that point. From that area (the green colored above) we subtract B, and we find A, the area of the liquid.

Can we then find the curve or mathematical function that characterizes our particular bowl? How do we find it? I tried to use different methods, unsuccessfully, for instance tracing the bowl profile on a sheet of paper. In the end, I used my phone to take a picture of the bowl, a common Ikea metal bowl we have in the kitchen.

This bowl is 20 cm in diameter. By using some amount of zoom and taking the picture from farther away, it’s possible to get a picture with less lens distortion, more representative of the actual curvature of the bowl.

I opened the picture in Photoshop and changed the image resolution to match the real dimensions, so as to have 20 cm in Photoshop correspond to the width of the bowl in the picture. I then proceeded to add guides every 1 cm in width and height, and sampled the bowl curve at every cm.

By doing some curve fitting, I imagined I would be able to find a formula for a hypothetical f(x) function that could approximate a half bowl shape. This required a bit of fiddling, but in the end I got something semi-accurate.

I tried several different functions to fit the curve, but the exponential was the smoothest, without the artifacts that polynomial fits tend to have. Even though the fit is still not perfect, I thought it would be good enough. In the end, the function that approximates one half of my Ikea bowl is the following:

f(x) = 0.0902209 · e^(0.458166 x)

This function defines the height of the bowl profile given the horizontal coordinate.

Now we can find the area below the curve by calculating the definite integral of this function from 0 to the x coordinate. All we need to do now is to find the points on the x axis that correspond to the equal-height layers on the bowl. Let’s call them x1, x2, and x3. The last layer will fill up the bowl completely, so x4 is 10 cm since the bowl’s radius is 10 cm.

Below is an example for the first layer of my bowl, where x1 = 6.76314 cm.

I did refresh my derivative and integral rules for this, but ultimately, to avoid stupid mistakes and spending another afternoon on it, I resorted to WolframAlpha again, and the resulting area was 4.16831 square centimeters.

This is the B area I have marked in my earlier diagram:

Let’s now find A + B, the product of the layer height, which I have set to 2 cm, and the x1 coordinate:

A + B = 2 cm × x1
      = 2 cm × 6.76314 cm
      = 13.52628 cm²

We know A + B, and we know B, so we know A as well now.

A = (A + B) – B = 13.52628 cm² – 4.16831 cm²
A = 9.35797 cm²

Ok, so finally we have the area A of our layer. An area is not a volume though, and we want to know the volume of the liquid or jelly we need to use to fill our first layer.

To do that, we need to calculate the volume of the solid obtained by the rotation (or revolution) of our curve around the y axis. Now, this is easy to visualize if thinking about a cylinder for example, but to calculate the volume of a revolution solid for an arbitrary curve?

I tried several web searches but I could not make much sense of the explanations, sometimes they have errors in the formulas… Once again I get the feeling that I’m too stupid or slow to understand. Sadly, such a feeling has been a constant companion in my life, as a child but not only… The more I grow personally and professionally, the more I think this is often related to the quality of the teaching, articles, text, or paper. In my opinion, some of these materials are not made to be understood, they’re made to make the authors seem smart and competent.

Anyway… while I understand the idea of a revolution solid, the calculations escaped me in this case, so I resorted once again to WolframAlpha, which understands the query directly, if formulated in a way it can digest. In this case the query is:

volume of solid of revolution about the y axis for y = f(x) for x = 0 to x1

If this is not an example of AI, or in Alan Kay’s words, an amplifier of human intellect, I don’t know what is.

Wolfram calculated a volume for our revolution solid of 128.333 cm³, or 1.28333 dl.
Consider that the volume in question is the volume external to the bowl, underneath it. To get the volume inside the bowl, we need, once again, like in the earlier case of the A and B areas, to subtract this quantity from the volume of an ideal cylinder of height 2 cm and radius = x1 = 6.76314 cm (2 cm is the layer height I chose). If we do that, we obtain:

Vcyl = π × r² × h = π × 6.76314² × 2 ≈ 287.39329 cm³

The internal volume of liquid to fill our bowl to 2 cm of height is then:

Vlayer1 = Vcyl – 128.333 cm³
Vlayer1 = 287.39329 – 128.333 = 159.06029 cm³

This result can’t be correct, can it? It seems too small a volume, there must be something wrong…

Instead of trying to fill the bowl with 159 ml of water, I slightly changed course and tried to calculate the volume of the whole bowl, to see how much water it would contain if it were to be filled to the brim. Following the same method:

Vbowl = Vcyl(h=9, r=10) – Vrev(x=0..10)
with Vcyl(h=9, r=10) = 2827.43339 cm³
and Vrev(x=0..10) = 947.447 cm³

which results in the full bowl to have a volume of:

Vbowl = 2827.43339 – 947.447
      ≈ 1,879.98 cm³

At this point, I felt the anticipation of a child waiting for a birthday cake. I was a bit doubtful a 20 cm bowl could hold almost 1.9 liters of water… nevertheless I took a big enough container, filled it with about 1879 grams of water, and then slowly poured the water into the Ikea bowl.

I watched as the water almost immediately filled the bowl up to the halfway mark. I still had so much water left that it seemed impossible, but, to my excitement and complete amazement, the bowl ate up all the water, and there was not even a millimeter of height left.

Oh the absolute joy I felt in that moment! I started screaming from the excitement, my family thought I lost my mind for a minute :-) Such a nerdy thing, but so cool!

If you ask me, this is what the joy of mathematics (if you consider this mathematics, maybe it’s more engineering?) should be all about! In my opinion, this is a perfect example of what we should be teaching our children when we teach them mathematics. It was an ultimately pointless, but so intellectually satisfying achievement. I loved it!

Of course, I wasn’t done with this yet. I used the exact same procedure to calculate the volume of the other 2 cm high layers, and it turned out the first layer indeed consists of a very small volume, and the higher in the bowl you go, the more water is “absorbed” by the upper layers.
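If you would rather not go through WolframAlpha, here is a quick numerical cross-check of the two volumes above, as I would write it in Python (my own sketch, not what I actually used; it assumes scipy is available, and the constants are the fitted profile and measurements from this post):

import numpy as np
from scipy.integrate import quad

# Fitted bowl profile from the curve fitting above: f(x) = a * e^(b*x)
a, b = 0.0902209, 0.458166

def f(x):
    return a * np.exp(b * x)

def volume_to(x, height):
    # Water volume out to radius x, filled up to the given height:
    # a cylinder of radius x and that height, minus the solid of revolution
    # of f around the y axis (shell method: 2*pi * integral of t*f(t) dt)
    v_rev, _ = quad(lambda t: 2 * np.pi * t * f(t), 0, x)
    return np.pi * x**2 * height - v_rev

print(volume_to(6.76314, 2))  # first 2 cm layer        -> ~159.06 cm^3
print(volume_to(10, 9))       # bowl filled to the brim -> ~1879.99 cm^3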

Why write this long and very boring post?

I wanted to share this positive feeling with others that, like me, may think of themselves that they don’t “get” math, they were never good in school, or maybe – like me – they were taught mathematics in a joyless, boring, mechanical or too abstract way, being completely deprived of the satisfaction that comes from the discovery and application of math to the real world.

I now wonder, perhaps there is already a website for cakes or similar that lets you calculate the volume of these layers, for different bowls, maybe with proportions of recipes, … who knows?
If you enjoyed this, I’d love to hear from you.

Thanks to Excalidraw and WolframAlpha for being awesome tools one can use to calculate layers of a bowl. Like and subscribe for more :-)

Deploying Large Deep Learning Models in Production

Most deep learning or machine learning (ML) articles and tutorials focus on how to build, train and evaluate a model. The model deployment stage is rarely covered in detail, even though it is just as important a part of an ML system, if not a fundamental one. In other words, how do we take a working ML model from a Jupyter notebook to a production ML-powered API?

I hope more and more practitioners will cover the deployment aspect of ML models. For now, I can offer my own experience about how I approached this problem, hoping this will be useful to some of you out there.

Creating a useful ML model

How to create a useful ML model is the part of the work I won’t cover in this post. :-)

I assume that you already have:

  • a model or pipeline that is either pre-trained or that you have trained yourself
  • a model based on PyTorch, though most of the information here will probably help with any ML framework
  • some idea on how to make your model available as a RESTful API

First step: defining a simple API

The rest of this article will use Python as a programming language, for various reasons, the most important being that the ML model is based on PyTorch. In my specific case, the problem I worked on was text clustering.

Given a set of sentences, the API should output a list of clusters. A cluster is a group of sentences that have a similar meaning, or as similar as possible. This task is usually referred to with the term “semantic similarity”.
Here’s an example. Given the sentences:

  • “Dog Walking: 10 Simple Steps”
  • “The Secrets of Dog Walking”
  • “Why You Need To Dog Walking”
  • “The Art of Dog Walking”
  • “The Joy of Dog Walking”
  • “Public Speaking For The Modern Age”
  • “Learn The Art of Public Speaking”
  • “Master The Art of Public Speaking”
  • “The Best Way To Public Speaking”

The API should return the following clusters:

  • Cluster 1 = (“Dog Walking: 10 Simple Steps”, “The Secrets of Dog Walking”, “Why You Need To Dog Walking”, “The Art of Dog Walking”, “The Joy of Dog Walking”)
  • Cluster 2 = (“Public Speaking For The Modern Age”, “Learn The Art of Public Speaking”, “Master The Art of Public Speaking”, “The Best Way To Public Speaking”)

The model

I plan to describe the details of the specific model and algorithm I used in a future post. For now, the important aspect is that this model can be loaded in memory with some function we define as follows:

model = get_model()

This model will likely be a very large in-memory object. We only want to load it once in our backend process and use it throughout the request lifecycle, possibly for more than just one request. A typical model will take a long time to load. Ten seconds or more is not unheard of, and we can’t afford to load it for every request. It would make our service terribly slow and unusable.

A simple Python backend module

Last year I discovered FastAPI, and I immediately liked it. It’s easy to use, intuitive and yet flexible. It allowed me to quickly build up every aspect of my service, including its documentation, auto-generated from the code.

FastAPI provides a well-structured base to build upon, whether you are just starting with Python or you are already an expert. It encourages use of type hints and model classes for each request and response. Even if you have no idea what these are, just follow along FastAPI’s good defaults and you will likely find this way of working quite neat.

Let’s build our service from scratch. I usually start from a python virtualenv, an isolated python environment where you can install your dependencies.

virtualenv --python /usr/bin/python3.8 .venv
source .venv/bin/activate

If you are not familiar with virtualenv, there are many tutorials you can read online.
Next step, we write our requirements file, with all the python modules we need to run our project. Here’s an example:

# --- requirements.txt
fastapi~=0.61.1

Save the file as requirements.txt. You can install the modules with pip. There are plenty of guides on how to get pip on your system if you don’t have it:

pip install -r requirements.txt

Doing so will install FastAPI. Let’s create our backend now. Copy the following skeleton API into a main.py file. If you prefer, you can clone the FastAPI template published at https://github.com/cosimo/fastapi-ml-api:

from typing import Optional

from fastapi import FastAPI

app = FastAPI()
model = get_model()  # get_model() loads the (large) ML model discussed above

@app.post("/cluster")
def cluster():
    return {"Hello": "World"}
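To give the endpoint the shape of the clustering API described earlier, the request and response bodies could be modeled with pydantic along these lines (my own sketch; the field names are not from the original template):

from typing import List

from pydantic import BaseModel

class ClusterRequest(BaseModel):
    # The sentences to be grouped by semantic similarity
    sentences: List[str]

class ClusterResponse(BaseModel):
    # Each cluster is a list of sentences with similar meaning
    clusters: List[List[str]]

The cluster() endpoint would then accept a ClusterRequest and return a ClusterResponse.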

You can run this service with:

uvicorn main:app --reload

You’ll notice right away that any changes to the code will trigger a reload of the server: if you are using the production ML model, the model’s own load time will quickly become a nuisance. I haven’t managed to solve this problem yet. One approach I could see working is to either mock the model results if possible, or use a lighter model for development.

Invoking uvicorn in this way is recommended for development. For production deployments, FastAPI’s docs recommend using gunicorn with the uvicorn workers. I haven’t looked into other options in depth. There might be better ways to deploy a production service. For now this has proven to be reliable for my needs. I did have to tweak gunicorn’s configuration to my specific case.

Running our service with gunicorn

The gunicorn start command looks like the following:

gunicorn -c gunicorn_conf.py -k uvicorn.workers.UvicornWorker --preload main:app

Note the arguments to gunicorn:

  • -k tells gunicorn to use a specific worker class
  • main:app instructs gunicorn to load the main module and use app (in this case the FastAPI instance) as the application code that all workers should be running
  • --preload causes gunicorn to change the worker startup procedure

Preloading our application

Normally gunicorn would create a number of workers, and then have each worker load the application code. The --preload option inverts the sequence of operations by loading the application instance first and then forking all worker processes. Because of how fork() works, each worker process will be a copy of the main gunicorn process and will share (part of) the same memory space.

Making our ML model part of the FastAPI application (or making our model load when the FastAPI application is first created) will cause our model variable to be “shared” across all processes!

The effect of this change is massive. If our model, once loaded into memory, occupies 1 GB of RAM, and we want to run 4 gunicorn workers, the net gain is 3 GB of memory that we will have available for other uses. In a container-based deployment, it is especially important to keep memory usage low. Reclaiming 75% of the total memory that would otherwise be used is an excellent result.

I don’t know enough details about PyTorch models or Python itself to understand how this sharing keeps being valid across the process lifetime. I believe that modifying the model in any way will cause copy-on-write operations and ultimately the model variable to be copied in each process memory space.

Complications

Turns out we don’t get this advantage for free. There are a few complications with having a PyTorch model shared across different processes. The PyTorch documentation covers them in detail, even though I’m not sure I did in fact understand all of it.

In my project I tried several approaches, without success:

  • use pytorch.multiprocessing in the gunicorn configuration module
  • modify gunicorn itself (!) to use pytorch.multiprocessing to load the model. I did it just as a prototype, but even then… bad idea
  • investigate alternative worker models instead of prefork. I don’t remember the results of this investigation, but they must have been unsuccessful
  • use /dev/shm (Linux shared memory tmpfs) as a filesystem where to store the Pytorch model file

A Solution?

The approach I ended up using is the following.

gunicorn must create the FastAPI application to start it, so I loaded the model (as a global) when creating the FastAPI application, making sure the model is loaded before the application instance is created, and loaded only once.

I added the preload_app = True option to gunicorn’s configuration module.

I limited the number of workers (my tests showed 3 to work best for my use case), and limited the number of requests each gunicorn worker will serve. I used max_requests = 50. I limited the number of requests because I noticed a sudden increase in memory usage in each worker, at regular intervals, some minutes after startup. I couldn’t trace it back to anything specific, so I used this dirty workaround.

Another tweak was to allow the gunicorn workers a longer-than-default startup time, otherwise they would be killed and respawned by gunicorn’s own watchdog for taking too long to load the ML model on startup. I used a timeout of 60 seconds instead of the default 30.

The most difficult problem to troubleshoot was workers suddenly stopping and not serving any more requests after a short while. I solved that by not using `async` on my FastAPI application methods. Other people have reported this solution not working for them… This remains to be understood.

Lastly, when loading the Pytorch model, I used the .eval() and .share_memory() methods on it, before returning it to the FastAPI application. This is happening just on first load.

For example, this is what my model loading looks like:

def load_language_model() -> SentenceTransformer:
    language_model = SentenceTransformer(SOME_MODEL_NAME)
    language_model.eval()
    language_model.share_memory()

    return language_model

The value returned by this method is assigned to a global loaded before the FastAPI application instance is created.
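Putting the pieces together, a minimal sketch of that ordering might look like the following (the module layout is my own assumption; only load_language_model() and the preload behaviour come from this post):

# main.py -- sketch: the model is loaded as a module-level global *before*
# the FastAPI instance is created, so gunicorn's preload step runs the load
# once in the parent process and the forked workers share that memory
from fastapi import FastAPI

# Hypothetical module holding the load_language_model() function shown above
from model import load_language_model

language_model = load_language_model()

app = FastAPI()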

I doubt this is the ideal way to do things, but I did not find any clear guide on how to do it. Information about deploying production models seems quite scarce, if you remember the premise of this post.

In summary:

  • preload_app = True
  • Load the ML model before the FastAPI (or wsgi) application is created
  • Use .eval() and .share_memory() if your model is PyTorch-based
  • Limit the amount of workers/requests
  • Increase the worker start timeout period

Read on for other tips about dockerization of all this. But first…

Gunicorn configuration

Here’s more or less all the customizations needed for the gunicorn configuration:

# Preload the FastAPI application, so we can load the PyTorch model
# in the parent gunicorn process and share its memory with all the workers
preload_app = True

# Limit the amount of requests a single worker will handle, so as to
# curtail the increase in memory usage of each worker process
max_requests = 50
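If you are not relying on the TIMEOUT environment variable of the base Docker image (used in the Dockerfile below), the worker startup timeout mentioned earlier can also be set here directly. This line is my addition, not part of my original config file:

# Give each worker enough time to load the ML model at startup
# (gunicorn's default of 30 seconds was not enough in my case)
timeout = 60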

Bundling model and application in a Docker container

Your choice of deployment target might be different. What I used for our production environment is a Dockerfile. It’s easily applicable as a development option, but also good for production if you deploy to a platform like Kubernetes, as I did.

Initially I tried to build a Dockerfile with everything I needed. I kept the PyTorch model file as a binary in the git repository. The binary was larger than 500 MB, and that required the use of git-lfs, at least for Github repositories. I found that to be a problem when trying to build Docker containers from Github Actions: I couldn’t easily reconstruct the git-lfs objects at build time. Another shortcoming of this approach is that the large model file makes the Docker build context huge, increasing build times.

Two stage Docker build

In cases like this, splitting the Docker build in two stages can help. I decided to bundle the large model binary into a first stage Docker container, and then build up my application layer on top as stage two.

Here’s how it works in practice:

# --- Dockerfile.stage1

# https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8

# Install PyTorch CPU version
# https://pytorch.org/get-started/locally/#linux-pip
RUN pip3 install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Here I'm using sentence_transformers, but you can use any library you need
# and make it download the model you plan on using, or just copy/download it
# as appropriate. The resulting docker image should have the model bundled.
RUN pip3 install sentence_transformers==0.3.8
RUN python -c 'from sentence_transformers import SentenceTransformer; model = SentenceTransformer("")'

Build and push this container image to your Docker container registry with the stage1 tag.

After that, you can build your stage2 docker image starting from the stage1 image.

# --- Dockerfile
FROM $(REGISTRY)/$(PROJECT):stage1

# Gunicorn config uses these env variables by default
ENV LOG_LEVEL=info
ENV MAX_WORKERS=3
ENV PORT=8000

# Give the workers enough time to load the language model (30s is not enough)
ENV TIMEOUT=60

# Install all the other required python dependencies
COPY ./requirements.txt /app
RUN pip3 install -r /app/requirements.txt

COPY ./config/gunicorn_conf.py /gunicorn_conf.py
COPY ./src /app
# COPY ./tests /tests

You may need to increase the runtime shared memory to be able to load the ML model in a preload scenario.
If that’s the case, or if you get errors on model load when running your project in Docker or Kubernetes, you need to run docker with --shm-size=1.75G for example, or any suitable amount of memory for your own model, as in:

docker run --shm-size=1.75G --rm <command>

The equivalent directive for a helm chart to deploy in Kubernetes is (WARNING: POSSIBLY MANGLED YAML AHEAD):

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  template:
    ...
    spec:
      volumes:
        - name: modelsharedmem
          emptyDir:
            sizeLimit: "1750Mi"
            medium: "Memory"
      containers:
        - name: {{ .Chart.Name }}
          ...
          volumeMounts:
            - name: modelsharedmem
              mountPath: /dev/shm
          ...

A Makefile to bind it all together

I like to add a Makefile to my projects, to create a memory of the commands needed to start a server, run tests or build containers. I don’t need to use brain power to memorize any of that, and it’s easy for colleagues to understand what commands are used for which purpose.

Here’s my sample Makefile:

# --- Makefile
PROJECT=myproject
BRANCH=main
REGISTRY=your.docker.registry/project

.PHONY: docker docker-push start test

start:
    ./scripts/start.sh

# Stage 1 image is used to avoid downloading 2 Gb of PyTorch + nlp models
# every time we build our container
docker-stage1:
    docker build -t $(REGISTRY)/$(PROJECT):stage1 -f Dockerfile.stage1 .
    docker push $(REGISTRY)/$(PROJECT):stage1

docker:
    docker build -t $(REGISTRY)/$(PROJECT):$(BRANCH) .

docker-push:
    docker push $(REGISTRY)/$(PROJECT):$(BRANCH)

test:
    JSON_LOGS=False ./scripts/test.sh

Other observations

I had initially opted for Python 3.7, but I tried upgrading to Python 3.8 because of a comment on a related FastAPI issue on Github, and in my tests I found that Python 3.8 uses slightly less memory than Python 3.7 over time.

See also

I published a sample repository to get started with a project like the one I just described: https://github.com/cosimo/fastapi-ml-api.

And these are the links to issues I either followed or commented on while researching my solutions:

The Perl echo chamber, marketing and … is Perl really dying?

Recently I came across this tweet from Curtis/Ovid, which references a longer post about a proposal to integrate a better, more modern object-oriented “system” (Corinna) into Perl 5.

The proposal itself is not what I’d like to address here. I haven’t followed Corinna’s evolution. I believe it goes in a positive direction for the language, FWIW.

From that original tweet, a comment from Rafael followed:

[…] but I’m still wondering what are the real factors that make companies seek an exit strategy from Perl 5. Who makes this kind of expensive decision, and why? I suspect obscure OO syntax is not a major one.

This is what I replied with:

This is indicative of the fundamental problem in the Perl echo chamber. Some people still have no idea why companies are moving away from Perl. If you want to hear the perspective from someone who has seen this happen in multiple companies, let me know :-)

Sorry for this premise, but I was afraid what follows would make no sense otherwise.

Why is Perl dying today?

First of all, I don’t think “<language> is dying” is a useful question to ask, nor is it indicative of anything particularly interesting. I’m sure everyone reading this will have encountered plenty of “C is dying”, “Java is dying” or similar, and yet C and Java are still being used everywhere. In one sense, no language ever really dies. In Perl’s case, things are slightly different though, as (I believe) Python slowly conquered Perl’s space over time.

What does it mean for a language to die, or to be dead?

From an end user point of view, let’s say a random programmer employed in a company or freelance, a language could be dying if a task they want to accomplish using that language is hard because there are no supporting libraries for it (think CPAN or PyPi), or the libraries are so old they don’t work anymore. That situation surely conveys the idea that the language is not in use anymore, or very few people must be using that language. One would expect that a common task in 2021 must be easy to accomplish with a language worth using in 2021.

What about a company‘s point of view? The reality is that companies don’t have an opinion on languages, only people do. Teams do have an opinion on languages. The group dynamics inside a team influence what languages are acceptable for current and new projects.

Is Perl dying then?

My experience

Some years ago I was a fairly active member of the Perl community; I attended and presented at various Perl conferences around Europe, talking about my experience using Perl at a few small and large companies.

I remember picking up Perl for the first time based on a suggestion from my manager back then. He gave me a hard-copy print-out of the whole of the Perl 5.004 man pages and said: “We are going to use this language. It’s amazing, take some time to study it and we’ll start!”. This was 1998, and I had such a fantastic time :-). I was such a noob, but Perl was amazing. It could do everything you needed and then some, and it was easy and simple. The language was fast already back then, and it got faster over time. At that point in time, I was working in a very small company, three people initially, and we ended up writing a complete web framework from scratch that is still in use today, after more than 20 years. If that’s not phenomenal, I don’t know what is. It’d be cool to talk about this framework some day: it was more advanced than anything else I had seen, even by 2021 standards… a story for another time.

And by the way, we were running our Perl code on *anything*, and I mean anything, Windows PCs, Linux, Netware and even AS/400, a limited subset of it at least, at a time when Java’s “write once, run everywhere” was just an empty marketing promise. Remember this was the time of Netscape Navigator and Java applets. Ramblings, I know, but perhaps useful to understand where things have gone wrong.

In 2007, I left my job in Italy and moved to Norway to work for Opera Software. Back then, Opera’s browser was still running the Presto engine, and a little department inside Opera was in charge of web services. That’s where I was headed. Most services there were written in Perl. Glorious times for me: I would learn an awful lot there and meet a lot of skilled developers. Soon after I started working there, in 2007, some colleagues were already making fun of Perl. It’s a “write-only language”, “not meant for serious stuff”, “lack of web frameworks”, etc… Those were the times when Python frameworks started to emerge, some of which would eventually disappear. I remember a few colleagues strongly arguing to move to this Python framework called Pylons, and then eventually to Django.

I believe this general attitude towards Perl originated from different factors:

  • personal preference towards other languages and/or dislike towards Perl
  • the desire to be working with the latest “hip” framework or language
  • the discomfort of maintaining an aging codebase with problems

These factors exist and are legitimate reasons to want to move away from any language or framework. I’m not saying they are justified, but I do understand why people wanted that. In our field, I have seen it’s quite common to try and avoid the objective difficulties of maintaining a legacy project, going the greener way of an overly optimistic rewrite, which normally ends in tears.

Throughout the years, I noticed other contributing factors to the progressive abandonment of Perl, even in companies like Opera.
I’ll mention two that I experienced directly:

  1. Outdated or non-existent supporting libraries
  2. Team composition

There was a time, a few years ago, when CPAN was awesome: the best language support system in existence, one every other language community envied. CPAN pretty much sold Perl by itself. In my case, the libraries on CPAN educated me and made me adopt a testing culture that no other language (to my knowledge) had before Perl. Today, seeing npm modules being installed without running tests makes me uncomfortable :-)

Then over time (years) a shift happened. You would search on CPAN for a library that would help you with a common task and you wouldn’t find anything, or you would only find quick hacks that didn’t really work properly. In my case, I remember the first example of that being OAuth2. If I had to speculate, I would say this is a product of many elements, one of which is the average age of Perl programmers getting higher.

Another related shift I remember from those years is companies publishing their APIs/SDKs started dismissing Perl, at first relying on some CPAN module to eventually appear, then completely omitting Perl support. In the beginning, we politely complained to those companies, trying to make a point, but unfortunately there was no turning back. These days almost no SDK comes with a Perl component.

The second major aspect I have experienced is related to teams. In 2012 I was tasked with writing my first ever greenfield project, entirely from scratch, a project that would turn out to be one of the things I’m most proud of, Opera Discover, an online news recommendation system for the Opera browser, still working today! A team of three veteran engineers (myself included) was assembled, and there and then, we were faced with a decision: what language should we use for this?

While I was most experienced in Perl and knew Python a little, the other two colleagues didn’t know Perl. They had experience in C++ mostly, as this was Opera after all. We were chosen not based on our programming language expertise, rather (I suppose) based on our capability to tackle such a big and complex project. While I could have proposed that the project be written in Perl, in good conscience I knew that choice was not viable. Django was readily available and could provide a wide range of functionality we actually needed. No alternative in the Perl world could come close to such a good value proposition. The fact that Python was (like Perl had been for me!) a very accessible choice, simple to pick up, easily installed on any Linux system, and with plenty of solid up-to-date libraries, made the choice obvious.

With the Discover project, I started learning Python properly as a day-to-day programming language. I remember initially being horrified by (and making fun of) the httplib2/httplib3 situation. Then I learned about the requests module and forgot all about it. This is to say, Python has its quirks too, of course. The disastrous Python 2 vs Python 3 decision in the Python community caused a lot of grief and uncertainty for people (Perl could have learned something from that…). Nowadays, that’s a non-argument: everything runs on Python 3 and if you still haven’t moved, you will soon.

In general, having learned Python quite well, my mindset with regard to programming and my job changed completely. I’m not a Perl programmer. I’m not a Python programmer either. I can use different tools whenever they are better suited to what I need to do. In fact, in the last four years I have written software in NodeJS and Java of all things… I used to despise and make fun of Java, but I had never worked with it on a professional project before. While I do maintain that Java has some horrible aspects, contrary to my expectations I have enjoyed working with it: it has an efficient runtime, awesome threading, solid libraries and debugging/inspection tools.

While I do understand Ovid’s point about wanting to keep the business going, and enjoying Perl as a language, I have personally moved on many years ago. I still use Perl for the occasional script when it’s convenient, but for other use cases, like web APIs, I prefer Python and FastAPI, PyTorch for machine learning, etc.. so my conclusion is that it’s the libraries and the ecosystem that drive language use, and not the language itself.

A better OO system will unfortunately do nothing for Perl (in my opinion at least). Better marketing will without a doubt do nothing for Perl. As if a prettier website could change the situation and the aspects I talked about… it can’t! The situation we have in front of us in 2021 is the result of technological and social changes started at least a decade ago.

I realize this may be an incoherent post. Sorry about that, I tried to write it right away or it would have probably never come out.
If you have questions or comments, let me know and I’ll try to address them if I can.

Most importantly, I do not wish to convince anyone that what I wrote is true. It is simply my experience. If there’s one thing I wish people would take from it, it’s to move away from the thought of yourself being a “X Programmer” and broaden your horizons and set of tools available to you. It was a tremendously positive move for myself, one I wished I had done before.

Peace.

pgtop – a top clone for PostgreSQL

According to meta::cpan records, the first release of pgtop is dated April 26, 2005, which makes this little piece of software more than 15 years old!

Back then I had just found out about the brilliant mytop by Jeremy Zawodny, and my day-to-day experience being on Postgres, IIRC version 6.5.3, I decided to try and “convert” mytop to Postgres.

Being quite naive, I thought the endeavour would be much easier than it really was. I’m glad I started though, which is why pgtop exists in the first place. It’s not the only one either. I seem to remember a few similar pgtop projects by other programmers.

After using MySQL and Percona Server for many years, due to a new job, I have gone back to Postgres, version 9.5 and 10 at this time. In recent months, I have done some work to improve performance of our database queries, and remembered writing and using pgtop years before.

Since I lost(*) the original sources, I tried the pgtop version I last uploaded to CPAN, 0.05, dated 2008. It did work, in the sense that I could run the same Perl code unmodified, a great testament to Perl as a language and as a runtime. It didn’t work because the underlying Postgres meta tables that were used in version 6 changed their schema in the 10-12 years since :-)

I spent some time adapting the metadata queries to work with recent Postgres versions, and was slightly amused by the quality of my 15-year-old code… The best feeling about this little tool was to rediscover how useful a few dozen lines of code can be. Our service provider’s monitoring helps, but doesn’t even come close to the level of detail pgtop can provide.

After getting pgtop to work again, I quickly added a few more useful features. I was pleased by the efficiency with which I could work on this tool, considering its age.

So far I added just what was strictly necessary to me:

  • Updated pgtop to the current decade. Now requires perl >= 5.014
  • Fixed to work with Postgres >= 9.0
  • Added a sample Dockerfile to build and run pgtop as Docker container
  • Added a --config option, to load arbitrary config files. This is useful if you want to monitor several databases at once, for example in a tmux session. The config file supports all the options that are available on the command line.
  • Implemented a query killer command, activated pressing K to kill at once all queries slower than a given threshold, in seconds. This is useful if the database is overwhelmed by a lot of slow queries. I don’t recommend using it, particularly if it involves killing UPDATE or INSERT queries, but it can be quite useful.
  • Added a --slow_threshold option, to consider queries slow if they have been running for longer than the given value (in seconds). Now the tool highlights slow queries in bold yellow, and logs all the slow queries to a pgtop.log file.
  • Added a --slack_webhook option, to automatically notify a slack channel if a query crosses the slow threshold runtime value. All the information about the slow query including the SQL will be included in the slack message.

Please let me know if you give it a try! :-)

Five tips to be a more effective command line user

In the movies, heroes manipulate complex graphics environments using only their keyboard; no mouse in sight. Descending from the movies realm to reality, the command line, not the GUI, is where heroes save the day.

This article intends to be helpful for those who are already command line (CLI) users. Complete beginners are of course encouraged to read on, even though they may not grasp all the advantages immediately, and perhaps there are other, more important things to learn when starting out. On the other hand, I expect long-time CLI users to already work similarly. I do hope they might also find interesting tricks to adopt.

The motivation

First of all, why be more effective? Not everyone wants to, and that is fine. These tips contribute to two primary objectives:

Looking back on everything I tried over the years, I'd like to illustrate those tips that I believe brought me the most "bang for the buck", the most value with the smallest effort and the ones that are more easily applicable to anyone else.

Assumptions

I'm going to assume you are using bash on Linux; a recent macOS install shouldn't be too different. Windows has bash too these days, so hopefully these suggestions will be widely applicable.

I'm also going to assume that you, reader, already know how to comfortably move around the CLI (CTRL + A, CTRL + E, CTRL + W, …), recall past commands (!<nnn>, !!, CTRL + R) or arguments (ALT + .). There is enough material to write other posts about this. Let me know and I'll be happy to write it!

1. Shell History

Here we go, first recommendation: use your shell history capabilities.

If you are not already doing that, you can search through your shell history — all the commands you have typed — with CTRL + R. However, the default configuration for bash only keeps up to a certain number of commands.

Computers and hard drives being what they are in 2020, there's no reason why you shouldn't extend your shell history to record all commands you have ever typed, from the very beginning of your system history. When I set up a new computer, I normally copy over all my $HOME files, so my command history extends, time-wise, well beyond the system I am writing this on.

My shell command history starts in October 2015, when I first learned this trick. Here's how to do it:

# /etc/profile.d/extended_history.sh

# Show the timestamp for each entry of the history file
export HISTTIMEFORMAT="%Y-%m-%dT%H:%M:%S "

# Ensure the history file size and entry number is large
# enough to record years upon years of history
export HISTFILESIZE=500000000
export HISTSIZE=50000000

At least on Debian and derivative systems, dropping a file into /etc/profile.d/ makes it part of the system-wide profile settings, so that is a handy way of applying those settings to all users.

As a result, the history command will work as before, but the numeric index of each command will not reset every time you open a new shell, or every time the history file gets over a certain size, either in number of entries or in file size.

Here's what the history command output looks like with those settings:

23  2015-10-06T19:51:30 git diff
24  2015-10-06T19:51:33 git add locale/en/LC_MESSAGES/django.pot
25  2015-10-06T19:51:49 git status -uno
26  2015-10-06T19:51:51 git commit -a
27  2015-10-06T19:52:11 git push
28  2015-10-06T20:11:35 make test-recommender_translations
29  2015-10-07T18:53:33 vim ~/notes/recsys/impressions-tracking.txt

At the moment, my shell history file (~/.bash_history) is almost 7 MB, corresponding to a little less than five years' worth of commands. There is really no risk of running out of disk space, so keep those commands around.

Keeping a full history has obvious advantages:

  • If you don't remember how you did something or specific options to a command, you can always use history | grep xyz (or CTRL + R) to find out, and all the commands from months (or years!) back will be there. Obviously this does not apply retroactively :-)
  • If you remember only when you did something but not what it was, it's also easy to grep for specific dates and times (see the example at the end of this section).
  • You can easily analyze your shell usage patterns, for example finding what are the top 50 shell commands you have ever used:
$ history \
    | awk '{ print substr($0, length($1 $2) + 3) }' \
    | sort | uniq -c \
    | sort -rn \
    | head -50

# on one line:
$ history | awk '{ print substr($0, length($1 $2) + 3) }' | sort | uniq -c | sort -rn | head -50

In order, those lines do the following:

  1. history: take all history entries
  2. awk ...: remove the entry numeric index and timestamp, to only display the command itself and all the arguments
  3. sort | uniq -c: count the number of occurrences for all the distinct entries
  4. sort -rn: sort the counts numerically in reverse order, so the most frequent commands come first
  5. head -50: take the first 50 commands

If you are confused by all these commands, don't worry too much about them. It's just a way to count the most typed commands in your history.
As a curiosity, here are some of my top commands:

13071  ls -l
 7422  git diff
 6338  git status
 3469  cd ..
 2219  git push
 1816  git pull
 1499  git commit -a
 1367  git log
  940  git commit
  851  gpr
  687  gcs
  400  srdm platf
  348  vimp
  333  l1
  314  srdm merl
  306  dcu
  302  mp;rl-f
  206  gce
  196  realias
  169  gcm
  153  mptr;rl-f
  152  gc-

2. Fast Directory Changes

One of the most frequent operations on the command line is moving among directories, with the cd built-in command.

Especially if you've worked for a long time on many projects, or if you work with Java, you tend to have a lot of directory levels nested quite deeply. The cd command then becomes tedious to type. Using tab to invoke your shell autocomplete comes in handy, but not having to type at all easily beats that.

This is a trick I learned from Damian Conway's Productive Programmer course. He was in Oslo a few years ago, and with the help of my company we arranged for him to hold this course internally.

The idea is to use a bespoke shell (or Perl, Python, Node, …) script to quickly navigate to any directory. For example, currently I am working on a project called merlin, whose parent directory is ~/src/work. Every time I want to do something in this project, I have to type:

cd ~/src/work/merlin

Within that project, there are a bunch of directories, so you could end up writing something like:

cd ~/src/work/merlin/gameserver/prototype/java/src/

The idea is to construct a program that can do the "typing" for you, so you'd use the following command instead:

cd2 src w merl g p j s

I called it cd2, but of course you can call it whatever you like. This program should:

  • take as input a list of string arguments
  • try to expand them to the closest directory name entry
  • if a directory is found, navigate to it
  • take the next argument and repeat this cycle

When this is done, your shell will be left in the target directory of your choice, without any long typing or misfired autocomplete tabs.

I chose to implement my script in bash for simplicity and call it ~/bin/search-directory.sh. The code is almost trivial and here it is in its entirety:

#!/bin/bash
#
# Search through the home directory space based on the
# list of string arguments given as input.
# From an idea by Damian Conway.
#

# Start from my home directory
SRC_DIR=~

# Take all arguments given as input
ARGS="$*"

# For each argument, try to expand it into the nearest directory name
# at the current level
for dir in $ARGS ; do
    sub=$(find -L "$SRC_DIR/" -mindepth 1 -maxdepth 1 -type d -name "$dir*" \
        | sort \
        | egrep -v '\.egg-info' \
        | head -1)
    if [ ! -z "$sub" ]; then
        # We found a subdir, search will proceed from there
        SRC_DIR=$sub
    else
        # Stop: we didn't find any matching entry.
        exit 1
    fi
done

echo "$SRC_DIR"

exit 0

One could clearly do better than this by employing more sophisticated logic. Initially I thought I'd need something better, but this simple script has served me well for the past years, and I don't want to complicate it unnecessarily.

There is one more obstacle to clear though. The script will print the final directory match and exit, without affecting the parent shell's current directory.

How to make the active shell actually change directory? I added a tiny function to my ~/.bashrc file:

# `srd` stands for 'source directory'
srd () {
    match=$(~/bin/search-directory.sh src $*)
    if [ ! -z "$match" ]; then
        echo "→  $match"
        cd "$match"
    fi
}

I made the function always supply the src directory by default, so I don't have to type that either. With these bits set up, you can then move to the example directory above with the command:

srd w merl g p j s

And this is just the beginning :-)
Read on for how to combine this technique with the power of aliases and shorten the command even more.

3. Aliases

Shell aliases are a simple way to define or redefine commands.
The typical example would be to shorten common options for your commands. If you know you always type ls -la, you might want to teach that to your shell. The way to do that is:

$ alias ls='ls -la'

From then on, every time you type ls, your shell will automatically expand the command to ls -la.

Based on what I have seen during my career, shell aliases are something that relatively few people are using. Right now, my shell configuration contains almost 500 lines of aliases, of which around 200 I keep active and probably 30-50 I normally use.

I wasn't always such a heavy alias user. I became one when I had the fantastic experience of working with the Fastmail team in Australia for a period of a few months. I learned how they were doing infrastructural and development work, and from the first day I saw they were using a ton of shell commands that were completely obscure to me.

I was quite good at operations/sysadmin work, but after seeing how that team worked, the bar was forever raised, and it sank in that I still had a lot to learn. I still do :-)

I use aliases for many things, but mainly to not have to remember a lot of unnecessary details. Here's a few from my list:

Alias -> expanded command: what/why

  • less -> less -RS: shortening and options expansion; -RS shows ANSI color escapes correctly and avoids line wrapping
  • gd -> git diff: shortening
  • gc- -> git checkout -: switch to the previous git branch you were on
  • vmi -> vim: saver for when I type too quickly
  • cdb -> cd ..: cd "back"
  • cdb5 -> cd ../../../../../: to quickly back out of nested directories
  • kill-with-fire -> killall -9: for those docker processes…
  • f. -> find . -type f -name: find file names under the current directory tree
  • x1 -> xargs -I{} -L1: simplify using xargs, e.g. invoking a command once for each line of input
  • awk<n> -> awk '{ print $<n> }': for when you need to extract field number <n> from a text file or similar. Ex.: awk5 < file extracts the 5th field from the file (see the sketch right after this list)
  • vde1 -> ssh varnish-de-1.domain.com: host-based alias. I don't want to have to remember hostnames, so I add aliases instead, with simple mnemonic rules, such as vde1 -> varnish node 1 in the German cluster
  • jq. -> jq -C .: when you want to inspect JSON payloads, e.g. curl https://some.api | jq.
  • dcd -> docker-compose down: is anybody really typing docker-compose?
  • dcp -> docker-compose pull
  • dcu -> docker-compose up
  • dkwf -> docker-kill-with-fire: shorthand for docker stop + docker rm, or whatever sequence of commands you need to stop a container. See? I don't have to remember :-)
  • db -> docker-bash: e.g. db postgres instead of docker exec -it container-id bash
  • dl -> docker-logs: same for docker logs -f ...
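
As an aside, the awk<n> family above doesn't have to be written out by hand. A small loop in your aliases file can generate the whole set; here's a minimal sketch, assuming you want awk1 through awk9:

# In ~/.aliases: define awk1..awk9, each printing the corresponding field
for n in $(seq 1 9); do
    alias "awk$n"="awk '{ print \$$n }'"
done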

Some aliases that I have added thinking they'd be useful I have rarely used. Some have become a staple of my daily CLI life. Whether a new alias catches on often depends on the first few days. If you make a mindful effort to use it, there's a good chance it will stick (if it's actually good).

To make aliases persistent, instead of typing the alias command in your shell, you can add it to your ~/.bashrc file as you can with any other command. You can also create a ~/.aliases file and keep all your aliases there. If you do that, you then need to include your aliases file in your bash configuration. You do that by adding (only once) this line to your ~/.bashrc:

# ~/.bashrc
...
source ~/.aliases

Every time you feel the need to add a new alias, you can simply edit the ~/.aliases file and reload it into your active shell (source ~/.aliases). When you get tired of that, you can use another trick from Conway's course, and add the last alias you will ever need:

alias realias="${EDITOR:-vim} ~/.aliases; source ~/.aliases"

Typing realias will bring up the alias file in your editor, and when you save it and exit, all the new aliases will be immediately available in your current shell.
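
This is also where the promise from the directory-changing section gets fulfilled: wrapping your most common srd invocations in aliases shortens the typing even further. A purely hypothetical example (pick whatever mnemonics work for you):

# In ~/.aliases: jump straight into the work area with one word
alias srdw='srd w'

# The earlier example `srd w merl g p j s` then becomes:
#   srdw merl g p j s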

Once you start down this path, your creativity won't stop finding new ways to work smarter and faster.

4. Directory Autorun

This is one of the most recent additions to my arsenal. I found myself typing the same commands over and over whenever I entered specific directories.

The idea is simply to have a sequence of commands automatically executed for me whenever I enter a directory. This is extremely useful on many occasions: for example, selecting a specific Python virtualenv, Node.js version or AWS profile whenever you enter a specific project directory.

I chose to do this by dropping an .autorun file in the target directory. Here's a tiny .autorun I have in a Javascript-based project:

#!/bin/bash
REQUIRED="v11.4.0"
CURRENT=$(nvm current)

if [ "$CURRENT" != "$REQUIRED" ]; then
    nvm use $REQUIRED
fi

In this case I want the shell to automatically activate the correct node.js version I need for this project whenever I enter the directory. If the current version, obtained through nvm current, is already the one I need, nothing is done.

It's quite handy, and I immediately got used to it. I can't do without it now. Another example, to select the correct AWS credentials profile and Python virtualenv:

#!/bin/bash

if [ -z "$AWS_PROFILE" -o "$AWS_PROFILE" != "production" ] ; then
    export AWS_PROFILE=production
    echo "— AWS_PROFILE set to $AWS_PROFILE"
fi

if [ -z "$VIRTUAL_ENV" ] ; then
    source .venv/bin/activate
    echo "— Activated local virtualenv"
fi

The glue to make this work is a couple of lines added to your ~/.bashrc file:

# Support for `.autorun` commands when entering a directory
PROMPT_COMMAND+=$'\n'"[ -s .autorun ] && source ./.autorun"

If you are concerned other users could use your machine, or even in general if you like to keep things tidy, ensure you set appropriate permissions for these .autorun files. A chmod 0600 .autorun could be in order.

Remember to run source ~/.bashrc if you make changes to that file, or they won't immediately reflect on your active shell session.

5. SSH Configuration

SSH is one of the most powerful tools in your arsenal. It can be used to tunnel, encrypt and compress data for connections to arbitrary protocols, but I'm not going to cover that functionality here; there are plenty of good tutorials about it out there already.

A smart ssh configuration can help you be more effective on the command line. I'd like to show three specific examples that I use every day:

  1. Persistent ssh connections
  2. Hostname aliases
  3. Automatic ssh key selection

Persistent ssh connections

If you connect to remote hosts often, I'm sure you have noticed the amount of time it takes to establish a new ssh connection. The higher the latency, the longer it takes. It is normal for that initial handshake — where a lot of things happen — to take 2 to 5 seconds.

Performing many small operations via ssh can waste a notable amount of time. One solution to this problem is the transparent use of persistent ssh connections.

The connection is established the first time you ssh (or scp) to a host, and next time you perform a similar operation towards the same host and port, the same TCP/IP connection will be used. This implies that the connection remains active after the ssh command has completed.

The ssh configuration directives that enable this behaviour are the following:

# Normally this is in ~/.ssh/config
ControlMaster auto
ControlPath /var/tmp/ssh_mux_%h_%p_%r
ControlPersist 1h

ControlMaster auto enables this behaviour automatically, without you having to specify whether you want to use shared connections (the ones already opened from before) or not. In particular cases, you may want to specify ControlMaster no on the command line to prevent ssh from using an already open connection. Generally this is not desired though, so ControlMaster auto will normally do what you want.

ControlPath is the filename template that will be used to create the socket files, where:

  • %h is the hostname
  • %p is the port number
  • %r is the username used to connect

ControlPersist is the option that determines how long the connections will stay shared waiting for new clients after being established. In my case, I set it to 1h (one hour) and that works well for me.
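
If you ever need to check on or tear down one of these shared connections manually, OpenSSH's control commands come in handy (the hostname here is just a placeholder):

# Is there a live master connection for this host?
ssh -O check my-username@remote-host

# Close the shared connection explicitly, e.g. after a network change
ssh -O exit my-username@remote-host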

In case you want to know more about ssh configuration, I recommend reading the related man page. On Linux, that is available with:

man 5 ssh_config

Hostname aliases and key selection

I mentioned I want to get unnecessary details out of my memory as much as possible. The ssh configuration file has lots of useful directives. One of these is the per-host configuration blocks.

If you need to connect to a host quite often and its name is not particularly memorable, like an AWS or GCP hostname, you can add host-specific directives to your ~/.ssh/config file:

# ~/.ssh/config

...

Host aws-test
    Hostname 1.2.3.4
    User my-username

From then on, you can use the command ssh aws-test to connect to this host. You won't have to remember the IP address, or the username you need to use to connect to this host. This is particularly useful if you have dozens of hosts or even projects that use different usernames or host naming schemes.

When you have to work with different projects, it's good practice to employ distinct ssh key-pairs instead of a single one. When you start using ssh, you typically have a single ~/.ssh/id_rsa (or ~/.ssh/id_dsa) private key, depending on the type of key, with an associated ~/.ssh/id_rsa.pub (or ~/.ssh/id_dsa.pub) public key.

I like to have several key-pairs and use them in different circumstances. For example, the key that is used to connect to a production environment is never the same used to connect to a staging or test environment. Same goes for completely different projects, or customers if you do any freelance work.
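
Creating those per-environment key-pairs is a one-liner each. A sketch, using the same file names as the configuration below:

# A dedicated key-pair for the test environment...
ssh-keygen -t rsa -b 4096 -f ~/.ssh/test_rsa -C "test environment key"

# ...and a separate one for production
ssh-keygen -t rsa -b 4096 -f ~/.ssh/prod_rsa -C "production key"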

Continuing from the example above, you can tell ssh to use a specific private key when connecting to a host:

Host aws-test
   Hostname 1.2.3.4
   User my-username
   IdentityFile ~/.ssh/test_rsa

Host aws-prod
   Hostname 42.42.42.42
   User my-username
   IdentityFile ~/.ssh/prod_rsa

Host patterns work too:

Host *.amazonaws.*
   User my-aws-username
   IdentityFile ~/.ssh/aws_rsa

Host *.secretproject.com
   User root
   IdentityFile ~/.ssh/secret_rsa
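
To double-check which username and key ssh will actually pick for a given host, a reasonably recent OpenSSH can print the effective configuration (the hostname is a made-up example matching the pattern above):

# Show the effective user and identity file for this host
ssh -G db1.secretproject.com | grep -E '^(user|identityfile) '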

Final tip

The more I write, the more it feels there is to write about the command line :-) I'll stop here for now, but please let me know if you'd like me to cover some more basic — or maybe more advanced? — use cases. There are a lot of useful tools that can make you more effective when using the command line.

My suggestion is to periodically gather information about how you use the command line, and to spend some time reassessing which commands you use most frequently and whether there are new ways to perform the same actions, perhaps removing the need to type lots of commands entirely.

When I have to do boring, repetitive tasks, I can't help but look into ways to get myself out of them. Sometimes writing a program is the best way to automate those tasks away. It may take more time up front, but at least I get to transform a potentially boring task into programming, which luckily never bores me :-)

Fast VCL checks for personalized backend responses

I’d like to talk about a problem I encountered a few years ago and one possible solution to it. This particular problem stuck with me for a long time for several reasons.
The first one is that at the time I considered the problem basically unsolvable. It would be like having your cake and eating it too, as the proverb goes. Another reason is that this problem had me spinning my wheels thinking about a solution for a good while.

Without any pretense of this being a particularly clever solution or anything like that, I’d like to illustrate what the general problem is and a possible solution I came up with. Hopefully this will be useful to you.

The general problem

Suppose you have a backend request of some sort, an API or a particular web page. In my case it was a json-based recommendations API, which returned a list of recommended news articles to read. The specific purpose of the request is not terribly important. What’s more important is the fact that this request can be personalized depending on the user that makes the request. I believe this is a quite common scenario.

In a recommendations context, it’s also common for a user not to be signed in to the service, or to be invoking the API for the first time. In this case, the recommendations engine does not have any previous information about the user, also called the cold start case.

In this specific project, we had operated in a “permanent cold start mode”, meaning the recommendations we were offering were never differentiated per user. There were a few knobs and settings to influence which type of recommendations one would get from the system (f.ex. less Sports articles and more Arts or Travel), but the system would not learn over time or change its recommendations based on user signals like articles read.

Among other things, this mode of operation allowed us to serve our entire userbase (around 90M monthly active users, around 10M weekly) with only two servers per data-center, also thanks to a very aggressive caching strategy.

When we started experimenting with personalized recommendations, it was immediately clear that we would not be able to handle the additional backend load caused by all the per-user requests. We estimated that, given the cache hit ratio drop, we would need something ridiculous like 50x the amount of servers. For each API request, we would have to:

  1. fetch the distinct user profile
  2. check if the profile contained any information about previously read articles or otherwise useful information to personalize the offered recommendations
  3. compute and return the personalized recommendations

These steps can only be performed by the recommendations engine backend, which means we would not be getting any help from our Varnish caching layer. That made personalized recommendations much harder to implement for us, at least without employing inordinate amounts of servers and significantly rebuilding our system infrastructure.

You could very well say that that is a problem in itself, and it probably is :-)

A possible solution

I remember spending quite some time thinking about this, not seeing any possible solution. One day I attended a meetup where one of the engineers talked about the Varnish API Engine, a commercial Varnish add-on that can implement authentication and paywalls directly in the caching layer. The speaker mentioned how the API Engine embeds the SQLite3 database, and how this was crucial to its performance, since the caching layer is effectively the first bottleneck of a system.

I connected the dots almost immediately and I realized I had a possible way forward to solve my problem. This is how I imagined I could approach the problem:

  • organize user signals collection (what articles each user is reading, etc…) and user profile building as a completely separate batch activity
  • every x number of hours, build a sqlite database with a single table, user_profiles, consisting of two columns: a user_id string and a has_profile boolean. With such a table in place, looking up whether we can build a significantly personalized recommendations set for a user is only an SQL primary-key lookup away (a minimal build sketch follows this list).
  • Using the excellent SQLite3 vmod, implement this SQL lookup in our existing Varnish VCL layer. Make sure that for every possible case this code never fails. For example, if the database file does not exist, or the file is for some reason corrupt, etc… we want to behave as if the particular user for the running request had no personalized profile.
  • Ensure that we would be able to update the SQLite database file at any time, without stopping Varnish, and the new file would be visible to the SQL queries immediately or at least after a short delay.
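
Here is a minimal sketch of what that batch build could look like (file names and hosts are hypothetical; it uses the schema the final VCL expects, shown later, where the mere presence of a row acts as the has_profile flag):

#!/bin/bash
# Hypothetical batch job: rebuild the user profiles database from a
# profiles.csv export containing "profile_id,data" lines.
set -e

DB=user_profiles.db.new
rm -f "$DB"

# Create the table and bulk-load the export
sqlite3 "$DB" <<'EOF'
CREATE TABLE user_profiles (
    profile_id char(100) PRIMARY KEY NOT NULL,
    data text
);
.mode csv
.import profiles.csv user_profiles
EOF

# Swap the new file into place and sync it to the Varnish frontends
mv "$DB" user_profiles.db
rsync -a user_profiles.db frontend-host:/etc/varnish/user_profiles.db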

We tested the whole assembly and it seemed to work correctly. The final step consisted in actually computing the personalized profiles, building the real SQLite database, syncing it to the backend systems, and performing the dispatch logic in the VCL layer.

This is more or less the final logic I used:

  • If the request was for an anonymous user, don’t even perform the user profiles SQL lookup, and return the generic recommendations cached payload.
  • If the request comes from a user that has no personalized profile, that is, no record is present in the SQLite table, also return the generic recommendations payload.
  • If the user profiles lookup is positive, that is, a record exists in the user profiles table in SQLite and its has_profile flag is true, then pass the request on to the backend. We know it is a request that must be personalized and only the backend can do that.

Using such logic allows us to serve the majority of the user base, which presumably has not logged in or does not have any significant user profile yet, caching as much as possible, while still allowing personalized recommendations for all users that do have a profile.

We are shifting the critical decision as early in the chain as possible, that is, into the caching layer (Varnish or similar), before the backend service is even consulted. Leaving the decision to the backend service would not be feasible, for the reasons already discussed.

The actual code

We used Puppet as our configuration management tool back then, with a custom varnish module. I extended the existing manifest to add a new user_profiles.vcl file and to install the sqlite3 vmod for Varnish by default.

The existing VCL code was also modified to:

  • perform the personalized profile SQL query
  • decide whether to pass the request based on the result of the SQL query

The following code illustrates those two steps:

diff --git a/config.vcl b/config.vcl
index 8e25a8a..50c70ce 100644
--- a/config.vcl
+++ b/config.vcl
@@ -1,22 +1,23 @@
 # Recommender system VCL config

 include "/etc/varnish/accept-encoding.vcl";
 include "/etc/varnish/purge.vcl";
 include "/etc/varnish/x-forwarded-for.vcl";
 include "/etc/varnish/auth.vcl";
 include "/etc/varnish/stats.vcl";
+include "/etc/varnish/user_profiles.vcl";
 include "/etc/varnish/strip-tracking-cookies.vcl";

 backend apache {
     .host  = "127.0.0.1";
     .port  = "8000";
     .probe = {
         .url       = "/ping.html";
         .interval  = 10s;
         .timeout   = 5s;
         .window    = 20;
         .threshold = 3;
         .initial   = 3;
     }
 }
@@ -147,45 +148,49 @@ sub vcl_recv {
     if (req.backend.healthy && req.http.User-Agent ~ "McHammer") {
         return (pass);
     }

     # Client clicks must go through the backend (*with* client-id cookie)
     if (req.url ~ "^/api/1\.0/feedback/") {
         return (pass);
     }

     call check_authorization;
+    call check_user_profile;
     call accept_encoding_normalize;

+    # Users with tracking cookies can be served personalized results
+    if (req.http.X-Profile == "1") {
+        std.log("User has customized profile. Rolling the dice.");
+        # Initially keep the percentage of PASS very low, to test the waters.
+        if (std.random(0, 100) < 1.0) {
+            std.log("User has customized profile and within 1.0%. Passing.");
+            return (pass);
+        }
+    }

 }

The new user_profiles.vcl file consisted of the following code:

#-----------------------------------------------------------------------------
# Fast check for personalized user profiles
#-----------------------------------------------------------------------------
#
# The general idea is to use this fast check to send users who we know
# have a personalized user profile to the backend without caching, while
# retaining the ability to send cached objects for everyone else.
#
# Uses a SQLite3 database and libvmod-sqlite3 by Federico Schwindt:
# https://github.com/fgsch/libvmod-sqlite3
#
# Extracts the `clientId' from the HTTP Cookie header.
# Looks up the profile_id key having value equal to the `clientId' cookie.
# The underlying schema is very simple:
#
#   CREATE TABLE user_profiles (
#       profile_id char(100) PRIMARY KEY NOT NULL,
#       data text
#   );
#
# At least initially we will not use the data column.

import sqlite3;

sub vcl_init {
    sqlite3.open("/etc/varnish/user_profiles.db", "|;");
}

sub check_user_profile {

    # Quick yes/no test for the clientId cookie
    if (req.http.Cookie ~ "userId=") {

        # Extract a userId value from the Cookie header,
        # which remains untouched. Make sure we can still extract a clientId
        # value even if there's other cookies before/after ours.
        #
        # XXX Not sure what happens when client sends multiple Cookie lines.
        set req.http.X-Profile-Id = regsub(req.http.Cookie,
            "(?:^|.*;\s*)(?:userId=(.*?))\s*(?:;.*|$)", "\1");

        # No need to do anything if userId hasn't been found
        if (req.http.X-Profile-Id != "") {
            #std.log("Checking profile_id: " + req.http.X-Profile-Id);

            # First case of VCL-injection vulnerability :-)
            set req.http.X-Profile = sqlite3.exec(
                "SELECT 1 FROM user_profiles WHERE profile_id='"
                + req.http.X-Profile-Id
                + "'");

            # req.http.X-Profile !~ "^SQL" to catch errors like missing DB,
            # but seems a bit fragile. Depends on libsqlite3 and/or the vmod.
            if (req.http.X-Profile == "1") {
                std.log("User profile " + req.http.X-Profile-Id
                    + " found (" + req.http.X-Profile + ")");
            }
            else {
                std.log("User profile " + req.http.X-Profile-Id
                    + " not found");
            }
        }
    }
}
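
Before relying on it in production, the very same lookup the VCL performs can be sanity-checked from the command line against the deployed database file (the profile id below is made up):

# Run the same query the VCL executes
sqlite3 /etc/varnish/user_profiles.db \
    "SELECT 1 FROM user_profiles WHERE profile_id='some-client-id'"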

The commit message

I believe that good solutions deserve awesome commit messages. Here’s what I wrote:

Date:   Thu Jan 28 19:36:46 2016 +0100

    Fast VCL check for personalized profile existence

    How to have the cake and eat it too. Serve cached objects to the majority of
    users while personalizing recommendations to the ones that actually have a
    significant user profile available.

    Got the idea from the Varnish API engine[1].

    It's possible to perform tens of thousands of sqlite database lookups a second
    while processing requests in Varnish through VCL, thanks to SQLite3 being very
    lightweight and in this case embedded right inside Varnish through the sqlite3
    vmod[2].

    This commit hopefully adds all there is to it. The last bit is obviously the
    database file, which I placed in `/etc/varnish/user_profiles.db'. We will need
    to generate the .db file from the clicker and sync it to all frontends.

    Updates seem to be received immediately.

    When no database file is present, as will be in the initial deployment, the
    `check_user_profile()' function will work normally, signaling that no custom
    user profile has been found.

    [1] https://www.varnish-software.com/products/varnish-api-engine
    [2] https://github.com/fgsch/libvmod-sqlite3

How to rollout gradually?

Another interesting aspect is how we could “control the flow” to this personalized recommendations API, that is, decide what percentage of the users with personalized profiles would actually get personalized recommendations.

A gradual rollout would certainly be the best approach, and it was implemented in two different ways:

  • once the SQL lookup was performed and the result was positive, we would still “roll the dice” and only allow 1% (or 5%, 10%) to actually pass through to the backend as personalized recommendations. This was an additional safety measure.
  • when batch building the SQLite database, we could decide to curtail the number of users with personalized profiles, for example by excluding all users that had not read at least 5 or 10 articles. This barrier served two purposes: it effectively limited the number of users included in the SQLite database, and at the same time it made sure we had accumulated significant user profile information before attempting to serve personalized recommendations. A sort of win-win I didn’t expect at first :-)

As usual, if you have any feedback, email me or write below (but comments are subject to approval due to lots of spam).