The Right Lang for the Job

Exploring the abyss of non-mainstream programming languages

Explaining Dependency Injection

As told to a colleague at work


DI libraries were created to fix this obvious, glaring deficiency of the platform.

Dependency injection frameworks have two goals:

  • provide a shorter, more convenient way of creating instances of classes that need other classes to run
  • provide a way of swapping parts of the dependency graph without modifying any code other than the DI-framework-specific code

Hm, I thought I had a nice example in tests somewhere, but I can't find it right now, so I'll need to make the examples up. Imagine you have these classes:

  class A(val b: B, val c: C)
  class B(val d: D)
  class C(val d: D)
  class D

Now, you want to instantiate class A. You have to do it like this:

val a = A(B(D()), C(D()))

Or maybe like this:

val d = D()
val a = A(B(d), C(d))

Imagine you have not 4, but 14 classes like that.

You'd have to construct the classes manually, starting from the ones farthest from the class you want to instantiate, passing those to the constructors of intermediate classes, then passing those further up, and so on, until you finally have the instances you can pass as arguments to A.

With dependency injection frameworks, you don't have to do this manually.

Now, you have code like this:

class A @Inject constructor(val b: B, val c: C)
class B @Inject constructor(val d: D)
class C @Inject constructor(val d: D)
class D @Inject constructor()

@Component(modules = [])
internal interface SomeComponent {
    fun getA(): A
}

To get an instance of A, you just need to:

val instanceOfA = DaggerSomeComponent.builder().build().getA()

Again, imagine these are not 4 but 14 classes - instantiating them manually would require many lines of code, while with a DI framework, no matter how large the set of classes, you will always be able to get the instance you need with this ...getA() call.

So that's the first goal - you don't need to manually instantiate classes in a correct order, you can leave that to the framework.

Now, a valid question is: why would you design classes in this way? If a class A needs class B, it can just do val b = B() in the body and be done with it.

That's true, but that would fix/freeze the dependency in place, thus increasing coupling (and loose coupling is often what we want) - among other things, refactoring is harder in code written like that.

Another reason, that I honestly don't want to invoke given how bogus it sounds in our codebases - that have literally 0 tests... - is testing. If a class A takes class B as an argument to the constructor, you can swap class B for anything that's B-like. That, in most cases, means a mock or a stub.
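To sketch that point (these classes are made up, not from our codebase) - with constructor injection, a test can hand A anything B-like, no framework needed:

```kotlin
// B is open so a test can substitute it; it could just as well be an interface.
class D
open class B(val d: D) {
    open fun compute(): Int = 42
}

class A(val b: B) {
    fun run(): Int = b.compute()
}

// In a test, anything B-like can be passed to A - here a stub
// that avoids touching B's real dependencies:
class StubB : B(D()) {
    override fun compute(): Int = 0
}

fun main() {
    val testedA = A(StubB())
    println(testedA.run())  // exercises A without a real B
}
```

If A had done `val b = B()` in its body instead, there would be no seam to swap B out through.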

The other reason is also purely academic in our code, but well... I'll tell you anyway, might be helpful when you encounter a well-written Kotlin codebase.

Let's make our example a little more complicated:

class A @Inject constructor(val b: B, val c: C)

sealed class B(val d: D)
class B1 @Inject constructor(d: D) : B(d)
class B2 @Inject constructor(d: D) : B(d)

class C @Inject constructor(val d: D)
class D @Inject constructor()

Here, A depends on some B, but it doesn't care if it gets B1 or B2. In a situation like this, with the direct approach, without dependency inversion, you'd have something like this:

class A {
    val b: B = if (Math.random() > 0.5) B1(D()) else B2(D())
}

There's a conditional, and you decide which kind of B you want at the exact point you need a B.

However, class A is not the best place for this conditional... Since it really doesn't care which subclass it gets (visible from val b: B), it shouldn't be forced to decide which one to instantiate!

DI frameworks give you a place where you can write this conditional logic.

You do it like this:

@Module
class ModuleB {
    @Provides
    fun provideB(): B = if (Math.random() > 0.5) B1(D()) else B2(D())
}

@Component(modules = [ModuleB::class])
internal interface SomeComponent {
    fun getA(): A
}

Now, you can still instantiate A in the same way as before - DaggerSomeComponent.builder().build().getA() - but you will get an A with either B1 or B2. Neither A nor B needs to worry about choosing the correct subclass, you don't need to put any special code in them, it'll be handled by the DI framework.

This is the simplest case, but again - imagine there are 14 classes and multiple of them require such conditional logic.

With DI, all that conditional logic is in one place, instead of being spread all over the classes.

There's more - you can also swap/replace whole subsections of the graph, though it's kinda clunky in Dagger. So, if you have one "subsystem" (a collection of classes that collaborate to provide some functionality) that does something in a way X, and another that implements the same functionality but by doing Y, you can put them into Dagger Modules, and decide which module to include based on your condition. Note that, since Components get Modules via annotation at compile time and they're fixed in place, you need to create a separate component that uses your alternative module.
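A hedged sketch of what that could look like - the Storage "subsystem" here is hypothetical, and DaggerXComponent/DaggerYComponent are the classes Dagger would generate:

```kotlin
import javax.inject.Inject
import dagger.Binds
import dagger.Component
import dagger.Module

// Two implementations of the same functionality, behind one interface:
interface Storage {
    fun save(data: String)
}

class XStorage @Inject constructor() : Storage {
    override fun save(data: String) { /* does it the X way */ }
}

class YStorage @Inject constructor() : Storage {
    override fun save(data: String) { /* does it the Y way */ }
}

@Module
interface XModule {
    @Binds fun storage(impl: XStorage): Storage
}

@Module
interface YModule {
    @Binds fun storage(impl: YStorage): Storage
}

// Modules are fixed in the annotation at compile time,
// so each variant needs its own component:
@Component(modules = [XModule::class])
internal interface XComponent {
    fun getStorage(): Storage
}

@Component(modules = [YModule::class])
internal interface YComponent {
    fun getStorage(): Storage
}

// The single place that decides between X and Y:
fun storageFor(useX: Boolean): Storage =
    if (useX) DaggerXComponent.create().getStorage()
    else DaggerYComponent.create().getStorage()
```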

Hope that helped.

Now - go seek more enlightenment on the net!

In terms of pros, this helps code readability, scalability, decoupling, and testing.

Yup. Precisely.

There are cons, too: mostly a more complicated build (for Dagger you need that compiler plugin), and the correct order of instantiating classes is obscured - it's not visible in the code directly; you'd need to go to the build/generated/ dir and look at the Factories.

And if DI is used, should it be used throughout the project, no matter scale?

I don't believe DI has to be used from the get go, nor do I believe that you need to DI everything throughout a project. That's my opinion, I'm going to defend it fiercely, but know that it's not the opinion shared by the majority of Java/Kotlin devs.

First of all, if you have a very localized dependency graph with few nodes, it's better to hide those dependencies completely. You do it with private classes in Kotlin and private inner/nested classes in Java.

An example of this is in TokenRepository

If the classes that a class (here: TokenRepository) depends on are all private, you can change them as you wish, without the overhead of setting up the DI for those classes.

That's basically the "Facade" design pattern in action.

I also don't think you need to use DI for everything, even if you do start using it.

Basically, ask yourself: am I spending too much time trying to figure out how to instantiate this class?

If yes - use a DI framework. If not - leave it alone.

I have a feeling that most Java/Kotlin programmers haven't ever heard of YAGNI

Or KISS for that matter.

Pick the simplest way that works, use it until it stops working well, then upgrade to the more complex thing as needed.

I sense actually implementing DI comes with a learning curve


That's because it's an add-on, not a feature of the language, at least in the context of Java and Kotlin.

There are multiple ways to do it, multiple frameworks, some disagreements in terminology between them, and so on.

Plus, when you start working on a project that already has a baroque DI setup, you just don't know where the things you're reading right now in some class come from, and you can't find out easily.

So I think starting from the minimal cases and building up from that is the best way to get into it.

(And as you can see, the minimal case is a single @Component and a bunch of @Injects - you don't really need anything else in the beginning)

Implementation details

First, you should know what kapt is. kapt is the Kotlin Annotation Processing Tool - which probably doesn't tell you much, but in short, it's an interface that Kotlin compiler plugins need to implement. These plugins run just before the actual compilation from Kotlin source to JVM bytecode happens, and they can access the whole codebase the compiler is run on. Now, for one reason or another, these plugins/kapt processors cannot modify the codebase, but they can add new classes (in general, new Kotlin source code). Dagger has an artifact that implements the kapt interface and can be used as an annotation processor. I'm using raw Dagger in SDK Demo, but Hilt, which is used in the other two apps, is built on top of Dagger and works in a very similar way, though it has its own annotation processor.

Ok, so Dagger is a plugin that is run by the Kotlin compiler over the whole codebase. There are basically 2 things that Dagger searches for in the codebase: interfaces annotated with @Component and - this is important - everything else that is annotated with @Inject. Dagger examines all classes, and when it finds one that is annotated with @Inject, it does 2 things:

  • it adds the class to the pool of classes that may be created using Dagger; and
  • it examines the arguments that constructor takes, and sees if it can construct them - that is, if all classes passed as arguments have a constructor annotated with @Inject. Then, Dagger recurses, examining constructors of classes that the current class depends on.

Point 2 is what you usually work with when you use Dagger. However, the first point is also important: all classes annotated with @Inject are already available for construction - no need to do anything else! When you define a component, all @Inject-annotated classes are available in it. Classes and values that are required by some of the to-be-constructed classes but cannot be constructed by Dagger itself, like Context, need to be provided to the component Builder through @BindsInstance. Another way to supply the missing parts of a graph is to declare a module - a separate class annotated with @Module - and use @Provides-annotated methods to construct the missing parts manually. Finally, you can declare a module as an interface, with methods annotated with @Binds - these are used to map interfaces to concrete implementations.

Dagger traverses the dependency graph and creates factories for each class it found. These factories are used for instantiating classes in the correct order, ie. the factory of class X that depends on Y and Z takes care of finding and calling Y and Z factories before it instantiates class X. This doesn't have to be done in this way - there are other DI libraries that don't generate the factories during compile time; instead, they examine the classes in a codebase at runtime, creating the graph of dependencies and instantiating required classes via reflection. Koin is an example of such library.
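To give an intuition, here is a hand-written approximation of such factories - the real generated code uses Provider indirection and looks different, but the shape is the same:

```kotlin
// Approximation of the factories Dagger would generate for:
//   class D @Inject constructor()
//   class B @Inject constructor(val d: D)
class D
class B(val d: D)

class D_Factory {
    fun get(): D = D()
}

class B_Factory(private val dFactory: D_Factory) {
    // B's factory first obtains a D from D's factory,
    // then calls B's constructor with the result
    fun get(): B = B(dFactory.get())
}

fun main() {
    val b = B_Factory(D_Factory()).get()
    println(b.d)
}
```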

Now, if Dagger automatically picks up @Inject-annotated classes, why do we need a component (not @Component.Builder)? That's because Dagger has to know which classes it needs to expose - which classes are the starting nodes of the graph. There are two ways of giving Dagger this information:

  • by declaring getter methods on the component interface, eg. fun gimmeClassX(): X
  • or by declaring a special method fun inject(x: ClassThatWillHaveThingsInjected)

The former should be self-explanatory - for each such getter method declared on the component interface, there will be an implementation of that method on the component implementation class. That method will simply pick the correct factory and run it, returning the result. The latter case is a little more involved - your ClassThatWillHaveThingsInjected is examined in search of @Inject-annotated properties (most often lateinit vars in Kotlin). Then, the inject method is generated - it simply calls the relevant factories and sets the corresponding properties of the instance passed to it.


That's basically it - one more advanced feature worth knowing is that you can "scope" classes that are nodes in the dependency graph. Normally, each mention of class X gets a fresh instance from its factory. So if you have classes Y and Z, both declared as class Y @Inject constructor(val x: X) {}, then Y and Z will have different instances of X passed to their constructors. You can control this in a few ways - you can cache an instance in a @Provides-annotated method, for example - but it's better to declare your own Scope, or use a predefined one like @Singleton: a scope is an annotation you put on a class and on a component, and Dagger will then inject the same instance of the scoped class everywhere within that component, while unscoped classes still get fresh instances. Relatedly, qualifier annotations (@Named and friends) let you distinguish multiple bindings of the same type, so you can inject multiple different instances of the same class into a single constructor if you want/need to.
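A sketch of the scoping part, reusing the classes from before (with the predefined @Singleton scope):

```kotlin
import javax.inject.Inject
import javax.inject.Singleton
import dagger.Component

// D is scoped: within one @Singleton component, everyone shares one instance.
@Singleton
class D @Inject constructor()

class B @Inject constructor(val d: D)
class C @Inject constructor(val d: D)
class A @Inject constructor(val b: B, val c: C)

@Singleton
@Component
internal interface ScopedComponent {
    fun getA(): A
}

// val a = DaggerScopedComponent.create().getA()
// a.b.d === a.c.d  -> true: same D instance, because D is @Singleton-scoped.
// Without the @Singleton annotations, B and C would each get a fresh D.
```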

In general, Dagger - while somewhat clunky - is not a bad piece of tech. The reason for its existence is the lack of first-class modules in Java. In many other languages, imports do double duty: you can analyze them to get the compile-time dependencies, but you can also execute them at runtime. That's when the required/imported module gets created, along with all the modules it depends on, and so on, recursively. This is the case for Python, JavaScript, Ruby, and most other interpreted languages. In compiled languages, imports are usually strictly compile-time constructs, with no side effects other than pulling a simplified name into the current file's namespace. Well, that's not true for all compiled languages - OCaml, for example, includes both ways of working with modules, and C++ now has support for modules (I hear) - but in Java and related compiled languages, the imports in a package say nothing about the runtime behavior of the classes defined there.


What's wrong with mobile development? From a max-level newbie.


I'm tempted to say: everything, but it's obviously not that bad. Still...

Kotlin is good, why don't you use it properly?

You're not writing Java, so why would you replicate its limitations in a language that doesn't have them?

Dependency Injection frameworks

Convenient and nice when used properly. Absolute disaster when taken too far. Just because you don't have to think about inter-class dependencies doesn't mean that you shouldn't think about them!

RxJava should die in a fire and fast

Trying to hammer everything into a single metaphor doesn't work too well. Otherwise Haskell would be the most popular language out there. RxJava tries to express all computation as operations on streams, ignoring the fact that not all computation is suited for such representation.

Writing correct concurrent code that can run some of its parts in parallel is not easy. Using RxJava for it makes it harder. Kotlin coroutines provide a much better way of dealing with it - scopes handle cancellation without getting in the way, async/await allows for easily offloading one-off tasks that need to return a result, Flows handle streams of values where it makes sense, channels and actors allow for multiple modes of communication between threads and tasks. The M:N concurrency model - admittedly, the same one RxJava uses - saves resources by dispatching concurrent tasks that don't have to be run in parallel on a single thread.
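For illustration, a tiny sketch of those primitives (kotlinx.coroutines):

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    // async/await: offload a one-off task that returns a result
    val result: Deferred<Int> = async(Dispatchers.Default) {
        41 + 1
    }
    println(result.await())

    // the scope owns everything launched inside it,
    // so cancellation doesn't need any plumbing of its own
    val job = launch {
        delay(10_000)  // never finishes - cancelled below
    }
    job.cancel()
}
```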

While RxJava provides literally hundreds of methods on a handful of classes, parts of Kotlin coroutines tend to have very small base API, augmented with extension methods in some contexts.

And so on... Just don't use Rx if you can avoid it, that's all.





Do you see a shift away from clickbait in the long term across the industry? Publishing, search, social media, etc. have been devastated by it (among other factors).

You have to kill it at the funding - if the ad dollars don’t flow to the baiters they won’t bait.

Which for many platforms is quite a difficult thing, because that's their revenue and they don't have the manpower or time to review everything.

Cutting the supply of money will kill the clickbait content alright but it will also kill vast swathes of "content, full stop". Somebody has to pay for hosting no matter what, and if I'm entertaining an idea of writing a fantasy novel in my spare time, I really don't want to be the one to pay. Or rather, I wouldn't want at the time in my life when I entertained such an idea.

I think more dollars is good, as it increases the number of opportunities. This comes with a lot of failed attempts at creating something of value, and that's ok. What we truly need, in a societal sense, is to make the next generations recognize the current times for what they are - a crazed race after the least demanding content - and to guide them back in the direction of articles with enduring value. The generation that entered adulthood during these crazy times is probably already lost, the previous generation is busy raising kids, and the previous-previous generation lives in bunkers where only the guys who "go way back" can ever enter. If not the next generation, then we're probably done for, as the IT revolution will eat its children and their children too, like many revolutions before.


Musings after detecting and fixing memory leak in Python

Is reliability of your software important?


The ProjectX fsck up from last week, from the technical side - that the script should've never been run on prod in the first place is indisputable and there's little need to repeat it yet again - is closely related to the automatic resource management in Python. It works pretty well, most of the time, until one day it doesn't and all hell breaks loose.

It was many years ago, but I fought a few battles against the Python garbage collector; I emerged - barely - victorious, though the losses on my side were many. I have since forgotten most lessons learned, and even if I hadn't, they would be close to useless today, as there have been at least two major GC overhauls since then. I thought it wouldn't be bad to get back on top of the game.

Furthermore, S. said that he doesn't know the details of garbage collection scheme in Python. I had a feeling that he's not the only one - I mean, I was like this until yesterday, too - and so I thought it would benefit all of our programming efforts to learn the ropes. Hence, I decided to prepare a short (I'll pretend I don't hear you laughing...) write-up on the GC and how it relates to the problem at hand.

CPython Memory Management Scheme

1. Python is a managed language, like Java, JS, PHP, Ruby, Go, Perl, Tcl, Lisp, Smalltalk, and so on (and on, and on - this memory management scheme seems to have "won" in the '90s), and unlike C, C++, D, Pascal/Delphi, Cyclone, Fortran, Ada, Rust, Forth, Asm, and so on.

- this means that, most of the time, there is no need to explicitly allocate or deallocate memory

- this makes Python a *memory-safe language*: certain kinds of bugs (use after free, double free, out of bounds read, buffer overflow) cannot happen in pure Python code

2. Since CPython 1.6 Python uses two (only the first one prior to that) complementary automatic memory management techniques:

- reference counting: each object has a counter, which gets incremented each time the object is referenced from a different context (other stack frame, other name in the same scope, inside of another container) and decremented once that reference stops being reachable (by being deleted, overwritten, going out of scope, being removed from a container, etc.) Once the counter drops to 0, the object is deallocated: *immediately, synchronously, and predictably*. This is a *CPython implementation detail* (ie. it may or may not be true in PyPy, Jython, IronPython, etc), but there's little chance it'll ever change, so it's ok to rely on it (unless you need to use another Python implementation).

- cycle detection: there's a special (doubly-linked) list in the interpreter state which contains all "tracked" objects, that is, the ones allocated with PyObject_GC_New instead of PyObject_New. Most mutable objects, including class objects and dicts, are created this way. Every time the PyObject_GC_New is called there's a *chance* (depending on the ratio of allocations to deallocations since last collection) that the garbage collector will run. When that happens, the list is traversed and cycles - subgraphs of the object graph forming a closed loop, called "cyclic isolates" in the PEP - are found; the GC then traverses each cycle and successively decrements the refcounts ("breaks down the loop"), until they reach 0; then the objects are deallocated. This happens - from the pure Python code - asynchronously.

The two approaches are complementary: every object in CPython is refcounted, which means the GC-aware objects are refcounted, too. Depending on the interpreter compile-time flags and/or runtime configuration the GC can be completely disabled; then the PyObject_GC_New is simply equivalent to PyObject_New.
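Both mechanisms are easy to observe from pure Python (this relies on CPython-specific behavior, as noted above; the classes below are made up for illustration):

```python
import gc
import weakref

events = []

class Resource:
    def __del__(self):
        events.append("finalized")

# Reference counting: the moment the last reference disappears,
# finalization runs immediately and synchronously (CPython detail).
r = Resource()
r2 = r
del r                       # one reference still remains
assert events == []
del r2                      # refcount hits 0 -> __del__ runs right here
assert events == ["finalized"]

# Cycle detection: refcounts inside a cycle never reach 0 on their own,
# so only the cyclic garbage collector can reclaim these objects.
class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a     # a "cyclic isolate"
alive = weakref.ref(a)
del a, b
assert alive() is not None  # still alive - refcounting alone can't free it
gc.collect()                # the collector finds and breaks the cycle
assert alive() is None
```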

3. Memory deallocation is a specific case of a more general process called "object finalization" (also "object destruction" in C++ and related languages). In Python, objects are always finalized just before deallocation - in case of cycles, all of them are finalized before the first one is deallocated. It is possible to provide a custom finalization procedure for objects by defining a __del__ magic method on a class. The finalizers are not restricted in what they can do, so it's possible for a finalizer to _resurrect_ an object by creating a fresh reference to itself somewhere.

Prior to Python 3.4 the __del__ methods *were never called in case the object was part of a cycle*. In other words, the cyclic garbage collector refused to finalize and deallocate objects with a __del__ method defined. *This is no longer the case*, thanks to PEP 442: the custom finalizer will be called even if the object is part of a cycle. It's a very welcome change, BTW, which makes __del__ methods much safer to use than in the past.
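A minimal demonstration of the post-PEP-442 behavior - a cycle with a custom finalizer still gets finalized (Python 3.4+):

```python
import gc

log = []

class Cyclic:
    def __del__(self):
        log.append("del called")

x = Cyclic()
x.self_ref = x   # the object is now part of a reference cycle
del x
gc.collect()     # since Python 3.4 (PEP 442) the finalizer still runs
assert log == ["del called"]
```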

4. One important use case for object finalization is releasing resources - graphical contexts (GDK/Cairo), external event loops, loaded libraries, mutexes, child processes, threads, sockets, files and so on. Many built-in objects provide a finalizer, either as a __del__ method or (equivalently) C function stored in tp_finalize field of the PyTypeObject struct (tp_del prior to 3.4).

5. *Important note*: exceptions raised in finalizers are not propagated, and are silently swallowed. This is because there is no guarantee that, at the point in time the finalizer is called, a context where the exception should be raised still exists; an object cycle created inside a function call may get collected after the call already finished, for example.

6. The garbage collection is guaranteed to happen just before the interpreter state is deleted, which always happens before the python process exits. Thus, since 3.4 at least, it's safe to assume the finalizers, including custom ones, *will* be run at some point.

7. I'm not sure when exactly, but in recent Python versions the memory allocator was improved to be able to return the freed memory to the operating system. In Python 2.x, once the memory was requested from the OS, it was never given back - this could, in certain scenarios involving many long-running processes, lead to resource exhaustion even if the memory was not being actively used. This is *no longer the case*, which is another very welcome change.


These are the basics - the details are a bit more complicated due to various optimizations (how often the GC is called), the objects being split into 3 generations (the GC in CPython is a "generational" one), the GC maintaining separate lists for objects of certain built-in types and so on, but in practice they don't matter much, unless you're working with a particularly twisted allocation profile. If you want to know more - just ask, I'll either answer or at least point in the general direction of the answer.

---

### Memory leak in a script that ran on ProjectX prod

Moving on to the situation at hand: the code as used by @S. would normally be harmless, ie. it would consume at most S3BotoStorage.max_memory_size of memory. There are two things which would guarantee this:

1. During a single iteration of the loop, the id_ reference is overwritten; no reference to the previous id_ is visible in the code. The previous-iteration version of id_ would have its reference counter decremented (just before entering the next iteration), which would trigger its finalization and finally deallocation. As mentioned, this is synchronous and deterministic; we can rely on it, though we should note that it's a CPython implementation detail.

2. The implementation of tempfile.SpooledTemporaryFile contains a _io.BytesIO object as its _file member, which in turn provides a finalizer, which releases both the object itself and all the memory it allocated for content (ie. its internal buffer). It gets more complex in the case of content overflow, where SpooledTemporaryFile changes its internal representation to a named file - it has to introduce an external finalizer then - but this didn't happen in our case (and it wouldn't ever happen with the default settings, see below; also, if it DID happen, there would be no such massive memory leak, actually).

Here's the _io.BytesIO instantiation:

and here's its destructor (finalizer):

Given that the leak happened, we can conclude that the finalizer was not called. Or more precisely, it didn't get the chance to run due to the server running out of RAM; given sufficiently large available memory (>= 16 GB in this case, for 60k images), the finalizers would still run at the end of the loop, the memory would get reclaimed and everything would work just fine. Until one day we'd need to run the loop for 61k images, that is. (I'm assuming the images to be ~250kb in size, based on how 16700 images were able to exhaust 4gb of RAM).

Aside: the fact that the readable() method loads the whole file and caches it (via _get_file) on the S3BotoStorageFile instance is unfortunate, but on its own it wouldn't be dangerous. It would waste some bandwidth - indeed, performing a HEAD request would be much better - but the memory consumption would not exceed the amount needed for a single biggest image. Also, there's a setting, namely AWS_S3_MAX_MEMORY_SIZE, which controls when the temporary file "rolls over", ie. changes its representation from in-memory buffer to a named file. Unfortunately, the default value - 0 - means "no rollover, ever", which is kind of dangerous on its own.
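A quick way to see the rollover mechanic in action - this peeks at SpooledTemporaryFile's private _rolled flag, purely to observe the behavior:

```python
import tempfile

# A positive max_size makes the buffer roll over to a real file on disk
# once it's exceeded:
f = tempfile.SpooledTemporaryFile(max_size=1024)
f.write(b"x" * 2048)   # exceeds max_size -> representation changes to a file
assert f._rolled

# max_size=0 - what AWS_S3_MAX_MEMORY_SIZE defaults to - means "no rollover, ever":
g = tempfile.SpooledTemporaryFile(max_size=0)
g.write(b"x" * 10_000_000)   # stays in RAM no matter how big it gets
assert not g._rolled

f.close()
g.close()
```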

Now the question is where exactly were the references to the id_ objects kept. Looking at the code there is no such place - the CA list only keeps strings, which are considered atomic by the GC, ie. they cannot refer to any other objects; this means the CA list couldn't have kept the processed objects alive. The body of the loop never sees more than one id_ instance, so it's also improbable to be the cause.

There is, however, one object which does see the whole set of IDFrontData instances: the QuerySet. That object, or parts of it at least, and all the objects referenced from there, have to be kept alive until the loop finishes - it only gets finalized and deallocated after the loop.

A quick look through the QuerySet code reveals that, indeed, there is a cache of objects returned:

which, to add insult to injury, gets *fully* populated at the beginning of the loop; the __iter__ method:

calls a _fetch_all method before returning the iterator over the _results_cache; _fetch_all in turn uses a ModelIterable class (by default) over self and calls list on that iterable:

The same thing happens when you call len on the QuerySet, by the way. The _results_cache is only cleared when delete or update methods are called; there's no (or maybe I missed it?) public method for resetting the cache.

To summarize, code like this:

for obj in SomeModel.objects.filter(...):
    ...

is semantically equivalent to something like this:

objs = list(SomeModel.objects.filter(...))
for obj in iter(objs):
    ...
del objs

So this is where the references were kept, which explains the memory leak: every obj is still reachable via the list, so they cannot be finalized and garbage collected. Normally it's not a big deal, and this is a sane performance optimization - the real problem comes from the model instances being *mutable*: whatever you do to the instance (or objects referred to by the instance), even without calling save() on it, the changes are kept in the list until the whole list is collected.

### Possible Fixes

1. Looking more closely at the QuerySet code we see that the _fetch_all method uses something called _iterable_class(self) which returns a *generator* - hence the cast to list when assigning _results_cache in _fetch_all. There is another public method which exposes that generator - iterator():

That generator returns results one by one, without saving them anywhere - so, in this case, code like this:

for obj in SomeModel.objects.filter(...).iterator():

would yield the expected results: each object would be only referred from the body of the loop and it would get finalized and deallocated immediately before entering the next iteration.
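The difference is easy to demonstrate without Django - here's a sketch with a plain generator standing in for iterator(), using weakref callbacks to watch the objects die (all names are made up):

```python
import weakref

class Obj:
    pass

def one_by_one():
    # stands in for QuerySet.iterator(): yields results without caching them
    for _ in range(3):
        yield Obj()

dead = 0

def note_death(_):
    global dead
    dead += 1

refs = []
for obj in one_by_one():
    refs.append(weakref.ref(obj, note_death))
    # the previous iteration's object lost its last reference
    # when the loop variable was rebound above
del obj  # drop the one reference left over after the loop ends

assert dead == 3  # every instance was reclaimed during/right after the loop
```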

2. It's of course possible to simply close the file, which would also clear the buffer and release the memory:

for id_ in IDFrontData.objects.filter(user_selected_state="CA"):
    if id_.source_image.file.readable():
        id_.source_image.file.close()

That would however require us to know in advance that the readable method does what it does (ie. reads the whole file and caches it), which may or may not be explained in the docs (I didn't check, would guess it isn't).

3. We could create a (deep) copy of each object before processing it in the body of the loop. It would waste some memory and time, but it would guarantee that nothing we do to the instance is saved in the original list:

for id_ in IDFrontData.objects.filter(user_selected_state="CA"):
    id_copy = copy.deepcopy(id_)
    if id_copy.source_image.file.readable():
        ...

Here the id_copy would get finalized and deallocated just before the next copy is created.

4. For fun: we could make use of the mutability of the list and "clean up" after each step:

qs = SomeModel.objects.filter(...)
for i, obj in enumerate(qs):
    qs._results_cache[i] = None

This assumes the iter() call doesn't copy the original list, in other words that the qs._results_cache is exactly the same list we iterate over.


There are many more ways to handle the situation - once you know what happened it's easy to find a good workaround. The "once you know" part, though, is the problem, which leads to the final point:

### Further Recommendations

I've been thinking about the reliability of our products for a year now, but I just realized that I've never shared my thoughts on the matter, other than in private chats. I'll use this opportunity to summarize the most obvious ones.

1. The biggest problem (aside from running a script, any script, on a production server) in this situation was the fact that the exact failure reason was only discovered after many hours. It should have been discovered immediately: the RAM usage is being monitored in AWS and the exact problem is clearly visible on the graphs there. The reason for the delay is that it required someone to look at that graph: this is not reliable (humans are unreliable in general). Instead, once RAM usage exceeds a sane value, that fact should be automatically reported everywhere, immediately. By "everywhere" I mean: sent to all Slack channels related to ProjectX and devops, sent on Slack in a private message to the person/people responsible for ProjectX, sent by mail to every dev in TS and to the ProjectX product owner, sent by SMS to devops and the devs responsible for ProjectX, displayed on a status page (we should have one, by the way) for the product, sent by FedEx and carrier pigeon to just about everyone...

...I mean, come on guys, the *production died*. Twice! This is not supposed to happen, and when it does, it's a RED ALERT (or it should be) across *the whole organization*. The details of a failure cannot be locked inside private conversations and AWS console that no one ever opens. The more people get involved, and the more information they get from the get go, the faster the situation will get resolved.

2. There should never be a reason for anyone to restart the server manually. Once the server gets unresponsive, it should immediately be isolated, a fresh one should be created, and the traffic should be rerouted there. The faulty server itself *should not be restarted*: a restart could irretrievably destroy important clues needed for resolving the situation, which would mean that we wouldn't be able to fix the issue. The best we could do in that case would be to "wait for it to happen again", which is wrong on many levels - I hope I don't need to explain why.

Doing this on Kubernetes is far easier, but it's perfectly doable with EC2 instances, too. Also, obviously, we should implement some limits in this logic to make sure it doesn't fall into an *infinite* loop. Another thing: the logic for handling failures like this *is also code*, and *untested code is broken by default* (if it isn't, that's just a lucky coincidence). This means that we have to do a full dry run after implementing it - we don't want to test the solution for the first time during a *real* outage, because then we'd almost surely need to debug and fix both the server and the recovery logic at the same time. Hardly an appealing prospect.

3. The operating system should never allow a single process to eat all the RAM and get away with it without getting killed. Recent Linux kernels support per-cgroup and per-process memory limits. That would not protect against a fork bomb (try running :(){ :|:& };: in Bash), but in this case the memory was all inside a single process. That process should have been brutally murdered long before it brought the whole server down. This is somewhat at odds with the previous point - some important clues for resolving the issue might be lost with the process's death - but it's a trade-off: ease of debugging vs. guaranteed uptime. We're talking about production here, so (especially in the absence of higher-level disaster recovery) it wouldn't hurt to err on the uptime side.
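As an illustration of the per-process idea, here's a sketch using Python's resource module (the 256 MB cap and the 1 GB allocation are arbitrary illustrative values): on Linux, the capped child gets a MemoryError instead of taking the whole box down.

```python
import subprocess
import sys
import textwrap

# Run the risky allocation in a child whose address space is capped;
# the oversized request then fails with MemoryError inside the child
# instead of eating all the RAM on the box. Values are illustrative.
child = textwrap.dedent("""
    import resource
    limit = 256 * 1024 ** 2  # 256 MB cap
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    try:
        waste = bytearray(1024 ** 3)  # try to grab 1 GB
        print("allocated 1 GB")
    except MemoryError:
        print("allocation refused, server still alive")
""")
result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
print(result.stdout.strip())
```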

4. There should never be a need to run any script on a production server. On the other hand, it *cannot be impossible to run one-off tasks* in the context of production deployment. The best way to achieve both goals would be to provide, created on-demand, copy of the server or the whole environment. This would require some tweaks in the code and would be much easier on Kubernetes than on EC2, but it's possible on the latter too. Spinning up such instance should be easy, ideally needing a single click or a single command - we need to disincentivize touching production as much as possible, to the point where *everything* you'd want to do on production is *easier done* outside of it - otherwise situations like the current one *will* happen, it's just a matter of time.

5. We should create a "disaster checklist" - a document outlining the exact steps to be taken by everyone related to the outage (and in the case of production, again, we're *all* related). There should never be a need to think about what to do next: who to call, what data to gather, when to restart, when not to restart, what to tell the users, when to start panicking, and so on. People are unreliable even in the best of circumstances; in a stressful situation, possibly late at night, most likely while being dragged away from other tasks, that unreliability can reach astonishing levels. A checklist, while it doesn't on its own guarantee the swift resolution of a problem, helps people act sanely in all kinds of circumstances. The document should be public and should include sections for developers, devops, and managers alike.

6. People are unreliable in general, but especially so in situations they encounter for the first time. It's virtually impossible, even with the instructions in hand, to follow a process (any process) correctly on the first try. This is a truism: talk with any soldier (we've got one former soldier IIRC), or a diver, or a climber, and you'll hear exactly this. This means that, if we want to be good at handling outages, the only thing we can do is to handle outages many times. But we don't want that - ideally we'd like to see no outages at all. There's an obvious way around this: we need to *exercise* using servers which *can* die; if we don't have such servers, then we just need to create them. We can easily create a copy of production environment, bring it down, and perform a dry run of our recovery procedures. Ideally we would do this periodically and update the exercise content after every real problem we encounter.


Again, like with the GC above, these are the basics - a short summary, even - and there are many finer points to discuss; again, Ask Me Anything, I'll do my best to explain everything.


I think the one important - THE most important - question regarding our reliability is this: *do we want to take reliability seriously*? Because as of right now, we don't; if we did, we'd look around and actively search for and implement solutions used successfully at Boeing, in the armed forces, on satellites and SpaceX rockets - basically everywhere "mission critical" means "people die if we fsck up". The solutions are out there, described in detail, ready to implement - if we really wanted to, if we really cared about reliability.


I finally made the flashcards!

Last updated on:

For a few years now I've been thinking of using "spaced repetition" for learning new things and, crucially, retaining existing knowledge over the long term. My memory doesn't get any better with age, and it's probably going to get progressively worse as time goes by. I thought about countermeasures, and the "flash cards" idea seemed like a good bet. It works like this:

First, you prepare a bunch of "cards". You take some post-it notes and write a question on one side and an answer on the other. After you assemble a "deck" of such cards, you review them periodically. You take a card from the deck and read the question. Then you try to answer it. Then you flip the card and compare your answer with the one written down. Finally, you do some complicated calculations to assess how well you remember this topic and when to look at the card again in the future.
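Those "complicated calculations" are typically some variant of SuperMemo's SM-2 algorithm. A heavily simplified sketch - the ease-factor formula is SM-2's published one, everything around it is stripped to the bare minimum:

```python
def next_review(interval_days, ease, quality):
    """Simplified SM-2-style update.
    quality: 0-5 self-assessed recall; below 3 resets the card."""
    if quality < 3:
        return 1.0, ease  # forgot - see the card again tomorrow
    # Ease drifts with how well you recalled the answer (SM-2 formula).
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return interval_days * ease, ease

interval, ease = 1.0, 2.5  # SM-2 defaults for a brand-new card
for quality in (5, 4, 5):  # three successful reviews in a row
    interval, ease = next_review(interval, ease, quality)
print(round(interval, 1))  # next review in roughly two and a half weeks
```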

Having said that I've been "thinking" about it for years, it should be obvious that something in the process didn't really work for me. I attempted this many times, but I just couldn't get past one particular hurdle. That step, my fated enemy, was obviously this:

First, you prepare a bunch of "cards".

However, even if I somehow managed to best it, the next one - doing something consistently over some time... It's simply impossible, a no is a no. Not without software of some kind.

Not very surprising, then, that I decided to build such software. But before the software, I needed data to make the cards from. Around 6 months ago I asked on GitHub for the project's dataset, and the author generated a JSON dump for me (thanks a lot!). I could now make cards en masse, with the description and source code on them. I didn't have a way to display them, however. I considered coding the display part in one of two ways:

  1. Emacs route: new frame, run-at-time
  2. Awesome route: new "wibox" with Pango markup-based widget

Emacs wouldn't be a bad choice here, as it already supports syntax highlighting, a major requirement for displaying cards. On the other hand, at any given moment I can have 0 to 6 Emacsen open, and the scheduling part of the solution would need to account for that. Both edge cases (0 vs multiple) are problematic, as one would make it impossible to run any Elisp code without additional setup, and the other would require some kind of inter-process synchronization. So, a pain either way.

My window manager, on the other hand, has no such problems: there's exactly one instance at all[1] times, and it's always there, unless I'm doing something outside of X, which I don't.

Enough of the backstory, let's get to the technical details. I use Awesome as my window manager[2]. Awesome is written in a mixture of C and Lua. Its homepage states that:

awesome has been designed as a framework window manager. It's extremely fast, small, dynamic and heavily extensible using the Lua programming language.

awesome provides a Lua API structured as a set of classes (emulated with Lua prototypal inheritance). The most prominent among these are widgets and wiboxes - boxes for widgets. Wiboxes are X windows, I think, and widgets are basically chunks of drawing logic. Widgets draw themselves on Cairo surfaces and display in wiboxes. An event system based on signals chains these elements together.

The idea is nice: Cairo seems to be the "industry standard" in 2d graphics and has Pango for text rendering, and Lua is a dynamic scripting language with a few interesting capabilities.

However, when I tried actually writing the thing, I ran into two major problems: a small number of built-in widgets, and the dynamism of Lua.

The former is self-explanatory: it's always better to reuse than to write from scratch, if possible. If you're used to the wealth of widgets that GTK & co provide, getting by on a fraction of that is going to be hard.

The problems with Lua are more complex. awesome has, in general, good documentation, but the API surface is large, and the lack of contributors means that many things are undocumented. At that point you try to read the source, but to do that, you need to find where something is defined. You search the codebase and get nothing at all as a result. Turns out, the name you're looking for is composed from parts at runtime. Ok, so let's attack it from the other side: let's use introspection at runtime to dump the shapes of various objects. Then you realize that debug info for all methods points to a single line of code... the one where a decorator used for defining methods is itself defined. Extracting a class hierarchy from the running system is even less possible: some methods are injected directly into objects instead of being inherited from a class, which is a problem on its own, but the fact that there's literally no difference between classes and objects - it's all tables anyway - and the mismatch between classes and metatables make it hopeless.
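The grep failure mode is easy to reproduce in a few lines of Python (a contrived analogy, not awesome's actual code): the method name never appears verbatim in the source, so searching for it finds nothing:

```python
class Widget:
    pass

# Compose the method names at runtime - grepping the source for
# "get_width" finds only this comment, never a definition.
for prop in ("width", "height"):
    setattr(Widget, "get_" + prop,
            lambda self, p=prop: getattr(self, "_" + p))

w = Widget()
w._width = 100
print(w.get_width())  # 100
```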

It's not bad to have a dynamic system. However, looking at it only through its dead body (a Gilad Bracha reference: dead programs and pathologists) is tragic. Smalltalks are as dynamic as Lua, but they provide all the tools you need to find, read, execute, and debug the code while the system is running. Neither Lua - a minimalist language, leaning toward embedding as its main use case - nor awesome provides any such tools. What you get is the source code on disk and the REPL (fortunately, at least the REPL is there).

I came up with a way to improve the situation: if I could get the API of awesome covered as Haxe externs, I would have a capable macroassembler for Lua, with a gradual type system and good tooling. The context-sensitive completion, go-to-definition, and find-references tools that the Haxe compiler provides are good enough for my purposes. Creating the externs was itself a problem, but I only needed to solve it once, so I tried doing that.

The first thing I tried was to write a Raku script that would parse the LuaDoc annotations in the Lua sources. I quickly realized that it would be a giant pain: LDoc allows the definition of custom annotations, which, combined with mistakes in the docs, made it exceedingly hard to interpret the data correctly. To avoid that, I reused LDoc's own parser: passing a Lua function as a "filter" when running LDoc allowed me to dump the project documentation.

Haxe is also minimalistic due to it having so many targets.
  • Well, almost, like with everything...
  • Wiki

AwesomeScript: Lua + Awesome + Haxe == profit!

My third attempt at getting a project off the ground

Last updated on:

I got a nice idea that I want to implement.

There are two reasons why:

  • the resulting project would be quite unique and interesting
  • the end result might actually also be of practical use

The problem is that I probably won't be able to finish the project alone, so I'm looking for anyone interested in, and capable of, helping.

Let me first tell you what it's about.

Three well-established technologies in one package

A relatively well-known example of what I want to do would be OpenResty. It's a distribution of the Nginx web server bundled with a Lua implementation and Lua bindings to various APIs, allowing you to generate web pages dynamically without ever leaving Nginx. This is in contrast to other languages, which live behind Nginx, not inside of it.

Thanks to coroutines, a lightning-fast JITted implementation, and the mature Nginx codebase, OpenResty is impressively performant, especially compared to its interpreted, non-JITted competition.
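For a taste of what "inside of it" means, here is a minimal, illustrative OpenResty config fragment (the URL and message are made up); the response is generated by Lua executing right in the Nginx worker, not by a backend behind it:

```nginx
# Illustrative fragment: the page is generated by Lua running
# inside the Nginx worker process itself.
location /hello {
    default_type text/plain;
    content_by_lua_block {
        ngx.say("Hello from Lua inside Nginx")
    }
}
```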

I want to make a package that similarly integrates a mature C codebase and extensive Lua bindings. I want to add one more component to the mix on top of that, but the general idea of creating a distribution of other projects is the same.

Awesome window manager and Lua

Awesome is a small and fast window manager for Linux. It supports both tiling and floating modes, is written in C, and is extended with a Lua class library. The core is relatively small, and it can use LuaJIT (instead of plain Lua) without problems. Awesome uses GLib and Cairo for some of its functionality. It also uses XCB for window management, so it's currently coupled to X.

The class library is actually a full-fledged windowing toolkit. Widgets are defined in Lua, taking advantage of Lua tables' flexibility, and drawn on Cairo surfaces. Awesome is fast, consumes very little memory, and has potential for as much eye-candy as you'd care to add.

Multi-target transpiler - Haxe

Haxe is a gradually typed language that compiles to 7 different platforms, including Lua. It's a simple language on the surface, with some similarity to ECMAScript, but includes AST-based macros and an interface for compiler plugins, which allow for endless customization. Moreover, the compiler is written in OCaml, which makes it both very fast and easy to modify.

What are the benefits of bundling them together?

Glad you asked! In my opinion, Awesome could benefit a lot from Haxe. Not only is Haxe capable of performing various optimizations that you wouldn't care to do by hand (loop unrolling, dead code elimination, tail call elimination, etc.), but it also provides IDE-class intelligent auto-complete, type information, and documentation.

Lua is a great language, but using it well requires just about as much self-discipline as using JavaScript well. That is to say: a lot! Haxe adds a layer of safety and compile-time guarantees on top of LuaJIT dynamic runtime.

Enough chit-chat! What is it you want to build?

In short: a desktop environment inspired by Smalltalk. An environment where you can inspect and modify the code of every widget on the fly. Smalltalk is unapologetically dynamic, but so is Lua. On the other hand, Smalltalk-like IDE functionality would be hard to replicate for Lua. With Haxe in the mix, though, we can get a long way towards that functionality without lifting a finger.

There are problems of course. Haxe is not designed to make its output reloadable, and enabling this would

Inspect widget, edit source (in Emacs)

  • coroutines-based async in Awesome

A long time ago in a galaxy far, far away...

Last updated on:

I'm cleaning up the blog, found this gem by accident. It was hard to capture, as you can see, due to the bus shaking wildly, but that's a Linux kernel panic there.

Well, it was just a small system for displaying ads, no big deal if it crashes. If it was an autopilot on a highway... Well, it could be worse 🙂


Write Emacs Lisp like a pro!

A guide to modern Elisp development

Last updated on:

Intro, or - why bother?

This is very wrong:

Emacs is a powerful editor. Its power comes primarily from thousands of man-years spent on writing libraries and apps (or modes) for it. The core of Emacs contains half a million lines of Emacs Lisp, and if you go wild with installing packages you can easily end up with a million or more lines of Elisp. There's a lot of functionality there, for sure. You can get pretty far just using other people's code, with minimal customization in hooks and customizable variables.

However, to actually benefit from all that power to the fullest, you need to delve deeper and understand what Emacs really is. You need to shift your perspective and look beyond the text-based interface Emacs provides.

Emacs is an implementation of Emacs Lisp - and that is its true nature. When thinking about Emacs don't compare it to Sublime Text or VS Code, but rather to Python, Ruby, or JavaScript. Just like them, Emacs provides a language, a standard library, and an ecosystem of packages. You can use Emacs as a tool for text editing, but the full potential will be unleashed only when you begin to treat Emacs as a tool for making tools.

Historically, Emacs Lisp wasn't the greatest language to write programs in. The lack of lexical scoping, a messy standard library, no package manager, and slow execution plagued Emacs for decades. There's a lot of older Elisp code floating around, and all the hacks and workarounds in there are a testament to the language's problems.

However, things have changed drastically over the last decade. Almost all problems with Elisp were addressed. Lexical scoping is there, a package manager is in the base distribution, wrapper libraries like dash, s, and f expose and extend the functionality in a more organized manner, and the AOT compilation of byte-code to native code has brought remarkable improvements in performance. You can now write extension modules in any language capable of producing native shared libraries. There's preliminary support for native threads. Emacs always had the potential, but recently it became a full-fledged IDE for Emacs Lisp - one that could compete (and win) against a JetBrains product, if they ever decided to write one for Elisp.

The only remnant of the dark past, and the biggest outstanding pain point, is the lack of namespaces and first-class modules. Improvements on this front would be very welcome, but you can work around the limitation the same way you do when writing C: just namespace your definitions yourself with a prefix of your choosing. I use the my- prefix for all my code. It's crude, admittedly, but it works well enough.

In any case, to get the most out of Emacs, you will need to write some Elisp yourself. The better you learn Elisp, the easier it will be for you to bring out the full potential of Emacs.

This series of posts is not a replacement for reading an Elisp tutorial or a book about it. You still need to learn a lot on your own. The tools discussed here will help you along the way - they will let you learn Elisp faster and will help you write your own Elisp faster later on. The time investment of setting all the tools up will be well worth it in the long run.

Basic tools

  • Helm
  • Magit
  • Org
  • Treemacs / Neotree
  • Imenu
  • display-line-numbers-mode / linum-mode
  • fill column indicator
  • highlight current line
  • auto-complete / company
  • yasnippet
  • re-builder (make bindings for search&replace from within)
  • hydra / pretty-hydra

Reading the docs (you don't even need to google!)

  • helpful
  • info viewer
  • info+
  • help+
  • apropos different commands
  • describe

apps and libraries,

libs for elisp consumption

apps with commands for users and settings and stuff

some libs may have a few commands, but not many

buffer api available: batch and background

tools and knowledge

info manual





go to source


refactoring tools



(inspector-inspect 1)



describe- commands



(require 'cl-lib)
;; Collect the names of all interned symbols that end in "?".
;; string-suffix-p is built in, so no third-party library is needed.
(cl-loop for sym being the symbols
         if (string-suffix-p "?" (symbol-name sym))
         collect (symbol-name sym))


TODO startup times of various interpreters relative to Emacs


A Smalltalk environment that Just Works

Fixing some glitches in Visual Works IDE plugin

Last updated on:

NOTE: unusually, there's no code on Github for this post; contact me if you'd like to get the fixed plugin.
NOTE: screenshots' dimensions may look awkward - it's to make them fit on the page

Visual Works - a modern Smalltalk IDE

Cincom® Visual Works™ is an implementation of the Smalltalk language and, true to its ancestry, also an Integrated Development Environment for Rapid Application Development[1]

There are many Smalltalk implementations, some of them fully open-source, others created and maintained by a vendor as a commercial product. The state of the open-source Smalltalks is not the greatest, to put it mildly[2]. It would take a long-ish rant to explain the reasons for this, but I don't believe it would be constructive. Instead, let's focus solely on the hero of today's post: Visual Works™, a commercial implementation that also offers a free, non-commercial Personal Use License. What you get under the PUL is a tad older version - 8.3, from 2017 - but otherwise an (almost) fully functional Smalltalk system, with (almost) all of its source code available, along with lots of tools, libraries, and documentation.

The free version of VW doesn't contain anything that deals with cryptography. The most prominent victim of this lack is HTTPS support - there's none[3]. Other than that, you get everything VW 8.3 has to offer, including an extensive collection of ready-to-use libraries and plugins. Let's take a quick look at the VW IDE.

Browser - Visual Works' IDE

Almost all Smalltalks are IDEs themselves. For languages like Java, while you can stick to Notepad++, at some point you will probably need JetBrains products or Eclipse or something - in addition to Java compiler and its virtual machine. This is different for Smalltalks: each implementation is its own IDE.

In case you're wondering: yes, there is a reason why I say they "are" IDEs, not that they "have" IDEs. It's easier to show than to explain, but in brief: Smalltalk is a language plus a library of classes. In Smalltalk, an IDE is just a set of classes within the stdlib, in the same manner a String class is. And just as you can change the String class in the running system, you can also change the IDE code the same way: while it runs, live, within the Smalltalk system. The classes the IDE consists of use the graphics primitives provided by Smalltalk, use the refactoring and code-searching tools provided by Smalltalk itself, and in general communicate with the OS only to draw themselves.

The part of Smalltalk that works as an IDE in Visual Works is called a Browser. It looks like this (I shrank the window a lot so that the screenshot looks good on the blog, so keep in mind that everything you see is obviously resizable):

Default view of the VW IDE

A very fast explanation of what's what follows:

  • A - a package name in a global list of all installed packages
  • B - Searchlight, a multipurpose search tool & indexer
  • C - class name in a list of classes local to the selected package
  • D - show normal (Instance) or static (Class) methods
  • E - name of the method in a list of all methods of the selected class
  • F - source code for the selected method that you can edit
  • G - built-in test-runner, here showing that all of 1 test passed

Of course, this is just the default view you get when you open it. Beyond the many built-in capabilities, there are also community-provided packages and plugins that you can install.

"It's a bit ugly, but it works"

While Visual Works is an IDE, it's also a Smalltalk environment. Think Electron with dev tools enabled. You can inspect and modify any part of the environment and see immediate feedback. Loading plugins is just one of the multitude of things that you can do to the environment: you can create, remove, and change class and method definitions, you can edit graphical assets and fonts, you can create new and edit any of the existing windows. Everything you do will have immediate effect.

Leaving that aside, I installed one plugin that promised to display information about classes in a more structured format, in a single window. It's convenient, because normally you need to switch between 2-3 tabs to get the full picture of a given class. Knowing this, I installed it without hesitation.

Unfortunately, it didn't look very good.

ClassDefinitionTool just after installation

There's a lot I didn't like about this UI, starting with how the labels of the radio buttons don't have enough space between them and are displayed one over the other. The tool is undeniably useful, though, so I decided to fix it.

Editing the GUI with UIPainter

The reason for the problem with the labels was that the author of the plugin set the positions of various widgets in pixels, making them absolute and unable to adjust their sizes to accommodate the bigger font I chose as my default.

So, how do we fix this and a few other problems with the GUI? First, we need to locate the class responsible for it. Turns out that searching for ClassDefinitionTool was enough to jump right to its source. Next, you need to locate a method where the "spec" is stored. Spec is a declarative description of the GUI, including all widgets, their positions, relations, event handlers, etc. It looks like this when looked at "raw":

Raw view of windowSpec

But you rarely look at it this way, and you never edit the spec by hand. Instead, you use a plugin called UIPainter, which allows you to manipulate the spec graphically:

Window spec open for editing in UIPainter

In this particular case, I was going to add some space between the labels for "indexing" and "visibility" rows, set min width and height in a few places, add some margins, and change some fonts. You can do all that from the UIPainter interface - choose a widget in the tree displayed on the left, go to "Position" tab, and edit the field you find there:

Setting position of a widget

Click "Apply" and you're done. After a few more trivial fixes, the UI for the plugin started looking quite decent:

UIPainter with fixed ClassDefinitionTool

The orange dividers are my preference, sorry if I offended your sensibilities. Anyway, it took less than five minutes to fix the things that irritated me the most. After finishing, you need to "Install" the edited UI back into the windowSpec method - there's a menu item in UIPainter, in the "Edit" menu, that serializes the edited spec, updates the method, searches for all instances of the plugin, and makes them redisplay themselves with the new spec. Here's the effect, visible immediately after editing the UI:

Displaying fixed ClassDefinitionTool

What's worth noting is that every single element visible on the screen is yours to manipulate in a similar way. You might have noticed that the UIPainter itself has some problems with label sizes. No problem! You can open the UIPainter on itself, and edit those as easily as with the ClassDefinitionTool.

You can see the real window and one of its widgets being edited in the screenshot below. From here it's literally one click to fix the problem.

Now, can you do the same with any other IDE - and as easily as in Visual Works? The UIPainter is not a new idea at all, and some kind of interface builder is included in multiple IDEs. I'm quite sure, though, that none of them allows you to open the tool on *itself* and make changes *while the code is running*, without any additional setup. All of the above was done with just the default install and two plugins loaded.

But... what about Emacs?

Emacs - or rather, Emacs Lisp development - is in many ways very similar to the Smalltalk environment. Due to its roots and the need to work in a terminal, Emacs is keyboard-centric and the mouse use is not as prevalent. On the other hand, the ability to change pretty much anything while Emacs is running makes for an experience similar to that of Smalltalk.

Among other things, Emacs provides powerful text editing capabilities. There's a lot to like: expand-region, multiple-cursors, iedit, editable occur, transpose-* commands, jumping to the closing/opening paren, bulk indenting of multiple lines, recording macros, and so on. Almost none of these are implemented in the Visual Works editor (nor in any other Smalltalk's, to my knowledge) and, yes, their lack hurts a lot.

Fortunately, both Visual Works and Emacs are fully programmable environments. It's not a big deal to make them talk to each other and work together. I made a little - around 80 loc - Emacs mode that listens for TCP connections on a local port and when a connection is established, it reads the content of the message and displays it in a new window (frame in Emacs parlance). You can edit the source there, and when you press C-c C-c it sends the edited contents back through the socket.

On the other side, I added a command accessible in the SourceCodeTool, i.e. in the bottom half of the browser window. Once you activate the command (currently via the right mouse button/context menu), it opens a connection to Emacs, sends the currently selected method's source through the socket, then waits for a response. It replaces the code with whatever comes back through the socket.

It took less than 150 loc in total to set this up. Here you can see it in action:

The current implementation is at the Proof-of-Concept level, and making it more robust overall and asynchronous on the Smalltalk side (currently the window controller process is blocked while it waits for the response) would require a bit more effort. On the other hand, making the PoC took me two or three hours at most - and even in that state it is very useful. Reformatting longer methods to my liking is now a lot easier, with all the power of Emacs available. Adding features like auto-complete on the Emacs side, with symbols provided by the Smalltalk side, may not be trivial, but it's definitely doable in a weekend or two.

In any case: with environments as easily extensible as Emacs and Smalltalk both are, you don't really have to choose one over the other. You can connect them and use them together, with just a little bit of code.

  • RAD on Wikipedia
  • One person's experience with Pharo. It's old, but many of the problems they complained about have not been addressed, to this day.
  • Though of course you can hack it trivially, for example by loading libcurl dynamically and calling it, or you can set reverse proxy with Nginx, or use one of the multitude of other possible workarounds, but you're on your own then. Not that I ever wasn't...

Hacking Erlang shell to support Alt+Left/Right cursor movement

Last updated on:

My console setup

I spend a lot of time in the console. It's still a very productive environment, even though the need to preserve the compatibility with a character-oriented TeleTYpe[1] of the '60s prevents any radical improvements. It even works well out of the box until, at one point, it doesn't. This is a story of one such case.

I use URxvt[2] (terminal emulator), TMUX[3] (Terminal MUltipleXer), and ZSH[4] (shell) as my console. I switched to URxvt for the unicode support which wasn't common at the time. A relatively small number of issues with display and input with commonly used applications was a nice bonus.

The only problem I had with it is that pressing Alt+Left/Right inserts C or D characters instead of jumping over a word. Apparently, URxvt sends a sequence of key codes which is not, by default, mapped to anything in most programs. There's a very simple fix for it, at least for programs that use the Readline[5] library. Readline is configurable, and you can define your own bindings in the ~/.inputrc file (see the docs); this is what I use, for example:

  "\e[1;3D": backward-word ### Alt+Left
  "\e[1;3C": forward-word  ### Alt+Right

With just two lines, line editing becomes a mostly solved (for me) problem. Mostly. Though few and far between, some programs don't use Readline or emulate it half-heartedly (thereby ignoring the config file). One such program is the Erlang (and, by extension, Elixir) shell.

BTW: Yes, I do know about M-b and M-f. I use Emacs. I just really like my arrow keys, ok?

The journey begins

One Saturday evening, after a decade of putting up with it, I decided to try fixing the problem. I'm honestly not sure what I was thinking, I must've been bored out of my mind. I mean, honestly, who'd want to dig through entirely undocumented, complex internals of a multi-process beast with job control and remote capabilities...? (No, please don't answer, thanks.)

When I resolved to go down the rabbit hole, I searched the net for docs or posts explaining how to customize anything more than the prompt in the shell. It took quite a few tries to find anything at all, but I finally located two (just two) relevant posts. One is a post from 2013 (by the author of "Learn You Some Erlang"[6]) about the shell's architecture, and the other is a StackOverflow post answering the question of how to add Readline features to Elixir's IO.gets and friends.

The TL;DR from these two is basically "go look at edlin.erl". On a diagram (borrowed from the first post) of the processes (and modules) the shell consists of, it's the part marked in red:

edlin.erl is part of the stdlib application and can be found in ./lib/stdlib/src/edlin.erl, starting from the root of the Erlang build directory. You can see the whole file here. I made a fresh clone of the Erlang/OTP sources to avoid breaking my system-wide installation in case something went wrong. Compilation took some time, but it finished without problems.

Inside edlin.erl there's a state machine[7] used for parsing the incoming key codes into atoms denoting actions. It's a common Erlang pattern, where Tail Call Elimination is exploited to encode state transitions as normal function calls. It looks roughly like this:

edit([C|Cs], P, {Bef,Aft}, Prefix, Rs0) ->
    case key_map(C, Prefix) of
        meta ->
            edit(Cs, P, {Bef,Aft}, meta, Rs0);
        meta_o ->
            edit(Cs, P, {Bef,Aft}, meta_o, Rs0);
        meta_csi ->
            edit(Cs, P, {Bef,Aft}, meta_csi, Rs0);
        meta_meta ->
            edit(Cs, P, {Bef,Aft}, meta_meta, Rs0);
        {csi, _} = Csi ->
            edit(Cs, P, {Bef,Aft}, Csi, Rs0);
        meta_left_sq_bracket ->
            edit(Cs, P, {Bef,Aft}, meta_left_sq_bracket, Rs0);
        % ... more cases, including {undefined,C} ...
        Op ->
            case do_op(Op, Bef, Aft, Rs0) of
                {blink,N,Line,Rs} ->
                    edit(Cs, P, Line, {blink,N}, Rs);
                {Line, Rs, Mode} -> % allow custom modes from do_op
                    edit(Cs, P, Line, Mode, Rs);
                {Line,Rs} ->
                    edit(Cs, P, Line, none, Rs)

With the key_map function defined like this (out of order excerpts):

key_map($\^A, none) -> beginning_of_line;
key_map($\^B, none) -> backward_char;
key_map($\^D, none) -> forward_delete_char;
% ... more clauses ...
key_map($B, meta) -> backward_word;
key_map($D, meta) -> kill_word;
key_map($F, meta) -> forward_word;
key_map($T, meta) -> transpose_word;
% ... even more clauses ...

To be perfectly honest, the fact that it's a state machine wasn't obvious to me at first. My Erlang is a bit rusty, and one-letter identifiers don't make this the most readable code ever. It's also not trivial to see what sequence of codes will actually be sent to the edit function. I had to dig deeper.

Debugging the shell

First of all - and this may be obvious - in this case the tried-and-true debugging with prints doesn't work. From within edlin you get no direct access to the terminal, which makes sense, given that it's itself part of the terminal handling code. This tripped me up a bit in the beginning.

Fortunately, Erlang has an excellent graphical debugger, which you can attach to any process. To actually make use of it, you first need to reload the module you want to debug with its instrumented, interpreted version. This is done with the int (or :int in Elixir) module[8]. Unfortunately, when I tried, it didn't work:

  -▶ ./bin/erl
  Erlang/OTP 24 [RELEASE CANDIDATE 1] [erts-11.2] [source-444144870c] [64-bit]

  Eshell V11.2  (abort with ^G)
  1> int:ni(edlin).
  =ERROR REPORT==== 29-Mar-2021::20:37:21.516194 ===
  Can't load module 'edlin' that resides in sticky dir

  ** exception error: no match of right hand side value {error,sticky_directory}
       in function  int:'-load/2-fun-0-'/3 (int.erl, line 531)
       in call from int:load/2 (int.erl, line 527)

Apparently, the Erlang code server has a list of "sticky dirs" - modules living in them are not to be reloaded. That makes sense, most of the time. There has to be a way of disabling it, though, right? Yes, there is - you can disable it globally with the -nostick flag, or per directory or module, like this:

  2> code:unstick_mod(edlin).

Unfortunately, that's still not enough. Apparently, to be interpretable, a module has to be compiled in a special way to include some debug data. If it isn't, you will get the following error:

  1> int:ni(edlin).
  ** Invalid beam file or no abstract code: edlin

You can do this from the shell:

  2> compile:file("./lib/stdlib/src/edlin.erl", [debug_info]).

But then you have to remember to put the compiled file in the correct ebin directory yourself. Alternatively, you can pass +debug_info to the erlc invocation (as you can see in its help message, +term "passes the term unchanged to the compiler"):

  -▶ ./bin/erlc +debug_info -o ./lib/stdlib/ebin/ ./lib/stdlib/src/edlin.erl

Now you should be able to unstick and instrument the module, and then start the debugger in one go:

  3> code:unstick_mod(edlin), int:ni(edlin), debugger:start().

Working with the debugger

In the newly opened window, click on the Module menu and select edlin -> View. Then scroll down to the line you want to break on and double-click it (anywhere on the line). It looks like this on my computer (click to enlarge):

Now, when you switch back to the terminal and press a key you're interested in... nothing will happen! Instead, the process will reach the breakpoint and stop. This is indicated by the break value showing up in the status column of the main window:

To actually start debugging, you need to use Process -> Attach menu item with the edlin process selected. It will open a window with the code, list of local variables with their values, and buttons for stepping over and into calls. Just remember that for the debugger to work, the module you want to debug has to be instrumented. If you try to step outside of the edlin module you won't see anything.

This is what the debugger looks like in action (click to enlarge):

Getting back to the code

After stepping through the execution of the edit/5 function a couple of times, I was able to work out a few things. Here's the function head again:

edit([C|Cs], P, {Bef,Aft}, Prefix, Rs0) ->
  • The first argument is a list of key codes (as integers, which also happens to be how Erlang encodes strings - this helps with the preview of values). C is the current code, while Cs contains the rest of the codes to be parsed. This argument is the first part of the state machine: the stream of input symbols that drives the transitions.
  • The second argument is a prompt, as a string. It's not used much and can be ignored in this case.
  • The third argument is a pair of strings. They are the result of splitting the current line at cursor position: Bef keeps the part on the left of the cursor, and Aft the part from the cursor to the end of line. These change when inserting or deleting characters, but in this case they stay constant, so the argument can be ignored.
  • The fourth argument, Prefix, is an atom (or a tuple of an atom and a string, as we'll see in a moment) which says what state the parser is currently in. This may be none - the starting state; meta - after a modifier key was pressed; meta_meta - if we found two escape characters in a row - and quite a few other values. This is the second part of the state machine.
  • The last argument is, I think, a list of low-level commands (called "requests") for the TTY driver to add or remove characters, move the cursor, blink, and so on. Since I don't need to add any new functionality here, it is also safe to ignore for now.

The key_map function takes the next key code and the current state. It then returns the next state. The edit function interprets the new state and either loops to parse the rest of the codes list, or returns a list of commands for the driver to execute.
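The same shape - a pure key_map-style transition function folded over the incoming codes - can be sketched in C. This is a simplified, hypothetical model for illustration, not edlin's actual code; note that, like stock edlin, it recognizes only the 1;5 variants:

```c
#include <assert.h>  /* for the usage checks below */

/* Parser states, loosely modeled after edlin's atoms.
 * The final "operations" are represented as states too. */
typedef enum {
    S_NONE, S_META, S_META_LSQB, S_CSI_1, S_CSI_1SEMI, S_CSI_1SEMI5,
    OP_FORWARD_WORD, OP_BACKWARD_WORD, S_ERROR
} state;

/* key_map analogue: (current state, next key code) -> next state. */
static state key_map(state s, char c) {
    switch (s) {
    case S_NONE:       return c == '\033' ? S_META : S_ERROR;      /* \e  */
    case S_META:       return c == '['    ? S_META_LSQB : S_ERROR; /* \e[ */
    case S_META_LSQB:  return c == '1'    ? S_CSI_1 : S_ERROR;
    case S_CSI_1:      return c == ';'    ? S_CSI_1SEMI : S_ERROR;
    case S_CSI_1SEMI:  return c == '5'    ? S_CSI_1SEMI5 : S_ERROR;
    case S_CSI_1SEMI5: return c == 'C' ? OP_FORWARD_WORD
                            : c == 'D' ? OP_BACKWARD_WORD : S_ERROR;
    default:           return S_ERROR;
    }
}

/* edit analogue: fold key_map over the incoming codes. */
static state edit(const char *codes) {
    state s = S_NONE;
    for (; *codes; codes++)
        s = key_map(s, *codes);
    return s;
}
```

Here edit("\033[1;5C") ends in OP_FORWARD_WORD, while the Alt+Left sequence my terminal sends ends in S_ERROR - which is exactly the problem the rest of the post deals with.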

Recall my .inputrc: the terminal should send the following sequence of key codes when Alt+Left is pressed (BTW: you can use Control+V[9] in the shell to quickly check what keycodes are sent):

  \e[1;3D

Looking at the values of the C and Cs variables in the debugger proves that this is indeed what edlin receives. For the record: the numeric value of \e is 27, which you can see in the screenshot. Here is the sequence of key_map function calls (in other words, the states of the parser) when given this sequence of codes:

key_map($\e, none) -> meta;
key_map($[, meta) -> meta_left_sq_bracket;
key_map($1, meta_left_sq_bracket) -> {csi, "1"};
key_map($;, {csi, "1"}) -> {csi, "1;"};
key_map($5, {csi, "1;"}) -> {csi, "1;5"};
key_map($C, {csi, "1;5"}) -> forward_word;
key_map($D, {csi, "1;5"}) -> backward_word;

The first character - \e - puts the parser in the meta state, the next - [ - in the (aptly named) meta_left_sq_bracket. The csi atom stands for "Control Sequence Introducer", a special code which causes the terminal to interpret the codes that follow instead of passing them to a program; the parser uses it to collect the key codes arriving after \e[. Then, finally, if all the codes in between match, we get to the forward_word and backward_word states, which are passed to the do_op function in the last case of edit.

Once there are no more codes to parse, edit returns a list of rendering commands, tagged with either blink (self explanatory), more_chars (Enter was not pressed), or done (along with the full text of the line).

The problem and the fix

As you can see from the above code, edlin recognizes \e[1;5C and \e[1;5D as valid sequences of key codes, while my terminal sends codes with 3 instead of 5.

To fix this, the only thing needed is to add three new clauses to the key_map function, like this:

   key_map($5, {csi, "1;"}) -> {csi, "1;5"};
+  key_map($3, {csi, "1;"}) -> {csi, "1;3"};
   key_map($~, {csi, "3"}) -> forward_delete_char;
   key_map($C, {csi, "5"}) -> forward_word;
   key_map($C, {csi, "1;5"}) -> forward_word;
+  key_map($C, {csi, "1;3"}) -> forward_word;
   key_map($D, {csi, "5"})  -> backward_word;
   key_map($D, {csi, "1;5"}) -> backward_word;
+  key_map($D, {csi, "1;3"}) -> backward_word;
   key_map($;, {csi, "1"}) -> {csi, "1;"};

First, we make encountering $3 while in the {csi, "1;"} state valid, and make it transition to a csi tuple (with 3 appended) as the next state. After that, we only need to handle the $C and $D characters in the {csi, "1;3"} state by returning the forward_word and backward_word atoms. That's it!

It turns out that yes, the three lines above are all it took to scratch my decade-old itch. As usual on such occasions, the solution is much less interesting than the path to it... Well, being able to use my arrow keys is still nice, at least.


Despite some early setbacks, the fix turned out to be quite simple. Once I realized that there's a finite state machine in the code, the rest was relatively straightforward. Erlang's encoding of a state machine, using just function calls and pattern-matching on states (the codes seen so far) and transitions (the current key code), is elegant and lightweight. It's quite similar to how one would write a state machine in Prolog, especially given the Prolog-based syntax of Erlang.

I used the word "parse" a few times in this post. This isn't an accident: parsers can often be represented by a state machine. The edit function is a stateful parser as well as a state machine. It parses a grammar consisting of (among others) the \e, [, 1, ;, and 3 terminal tokens, with the state atoms being the production rules.

It's correct to say that the code illustrated here either "interprets" (if you focus on the state machine being run), or "parses" (if you focus on the allowed sequences of states) the input codes list.

That's it, I hope you enjoyed the post as much as I enjoy my arrow keys working properly in the Erlang shell :-)


ffipf - jump to file in a project quickly (PoC)

Using Emacs loadable modules and native Nim code to speed up the file search

Last updated on:

NOTE: As usual, source code is on Github

ffi... what?

The name stands for Fuzzy Find in Project Fast, thanks for asking. Believe me, I tried to come up with a better name for the project... and failed miserably. Sorry. I'd be forever grateful for better name suggestions.

The project is in a proof-of-concept stage and my main reason for this writeup is to see if there's any interest in something like this. I'm happy with the functionality it has, but it's nowhere near polished enough for other people to easily reuse.

So what is it?

ffipf is a backend for fuzzy matching of file paths. Its scoring and sorting algorithm is better suited for matching paths than more general fuzzy matching implementations, like those of Helm and Ivy. It's written in Nim and compiled to native code, which makes it faster and more memory-efficient than other solutions. The compiled DLL is a proper Emacs module and can be loaded as if it were Elisp.

Let's go over the features in a bit more detail next.

Better matching and sorting algorithm

The matching algorithm is designed specifically for patterns targeting paths. It matches each segment of the pattern, delimited by /, fuzzily - but the segments themselves have to match in sequence. The sorting is based on how well a path conforms to the pattern, with the path most closely resembling the pattern at the top. According to its author, the algorithm is close to the one used by TextMate[1]; I ported the code to Nim from the Ruby original[2].

In practice, this means that you can skip large parts of the path and input very few characters, yet still arrive at the correct file. You can also use it to quickly list files within a set of similar directories, or files matching some pattern no matter where they are in the hierarchy.
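To make the idea concrete, here is a minimal sketch of segment-wise matching - a hypothetical illustration in C, not ffipf's actual algorithm (which also scores and sorts the candidates): every /-delimited pattern segment must fuzzily match some path segment, and the matches must occur in order.

```c
#include <stdbool.h>
#include <string.h>

/* Does pat[0..plen) occur in s[0..slen) as a subsequence? */
static bool fuzzy(const char *pat, size_t plen, const char *s, size_t slen) {
    size_t i = 0;
    for (size_t j = 0; i < plen && j < slen; j++)
        if (pat[i] == s[j])
            i++;
    return i == plen;
}

/* Match pattern segments (split on '/') against path segments, in order.
 * Path segments may be skipped; pattern segments may not be reordered. */
static bool match_path(const char *pattern, const char *path) {
    const char *p = pattern;
    const char *s = path;
    while (*p) {
        size_t plen = strcspn(p, "/");   /* length of current pattern segment */
        bool found = false;
        while (*s && !found) {
            size_t slen = strcspn(s, "/"); /* length of current path segment */
            if (fuzzy(p, plen, s, slen))
                found = true;
            s += slen;                   /* move past this path segment */
            if (*s == '/') s++;
        }
        if (!found)
            return false;
        p += plen;                       /* move to the next pattern segment */
        if (*p == '/') p++;
    }
    return true;
}
```

With this sketch, a pattern like "fo/mag/mag.el" matches "foo/magit/magit.el", while "mag/fo" does not match it - the segment order matters, which is what distinguishes path-aware matching from matching the whole path as one flat string.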

Here are some examples of patterns I'd use to search for files, along with their results. The examples are from my .emacs.d directory, and the output is shortened (you can easily change how many candidates are returned).

  -▶ ./ffip
  > fo/mag/mag.el                       # wanted: magit.el

  > stg/                                # wanted: all files in plugins-staging/

  > stg/.el                             # wanted: all Elisp files in plugins-staging/

  > plu/REA                             # wanted: all plugin READMEs

  > co/mini/h                          # wanted: my config file for Helm

  > co/mini/                           # wanted: all config files related to minibuffer

The details of the scoring algorithm are a bit more complex, but the effects are very satisfactory in my opinion.

Native module and execution speed

The main functionality is implemented in Nim and compiled to native code. Nim is a high-level language with a Pythonesque syntax, but its performance is closer to that of C or C++. Thanks to Yuuta Yamada's[3] work, it's possible to write Emacs extension modules easily in Nim.

My .emacs.d has nearly 40000 files under it. This is a lot, and simply traversing the directory hierarchy takes time; when you add the time needed to process the list of files in Emacs Lisp, the invocation of (for example) counsel-file-jump can take close to 2 seconds to initialize. Further, filtering can also feel sluggish during the input of the first few characters (it gets better when the pattern gets longer).

Traversing the whole hierarchy, or initializing the search, takes just 0.3 seconds with the Nim implementation. Moreover, the feedback - displaying candidates as you type - is always instantaneous.

There are downsides to using native code. For example, it's possible for the module to crash Emacs as a whole if a segfault is triggered in the code. However, Nim makes it significantly harder to shoot yourself in the foot like that. Almost all of the module is written in a "safe" subset of Nim, and the only place where a crash is possible is in the parts which interact with the Emacs module API. Fortunately, after wrapping the raw API calls with helper procedures, the chance of triggering an unrecoverable error also goes down drastically.

Another downside is that you need to compile the module first in order to use it. Fortunately, Nim is much easier to compile than C, and it's available for all major platforms. After installing Nim you're just one make or nimble build command away from a working module. It's also possible to distribute binaries in case you don't want to install Nim, but that's something for the future (I have no way of cross-compiling for non-Linux OSes currently).

Displaying candidates

Currently, I use Ivy for displaying candidates and the built-in project.el for finding the root directory of the current project. The interface is very basic - for example, it doesn't highlight the parts which were matched - but it does the trick.

That being said, my main focus is on the backend - the Nim-based dynamic module. It should be easy to write a Helm source for it, or to interface with it through Selectrum, Ido, or any other completion framework.

The main question

Before I start working on making the implementation bulletproof and usable for others, I need to know if there's any interest in a module like this. The code is currently nearly there in terms of the features I want it to have, and if it's for my personal use only, I can slowly improve just the parts I need. On the other hand, if there is interest, I'd need to clean up the code, remove all the assumptions specific to my setup, and at the very least add configuration options. For example, the directory blacklist (the list of dirs that should not be traversed) is currently hardcoded on the Nim side, which doesn't bother me but could be a problem for others.

So here's the question: would you be interested in a blazing fast fuzzy file finder for your Emacs?


Clojure-like lambda expressions in Emacs Lisp

If it's this easy, why isn't it implemented?

Last updated on:

Anonymous function syntax

In Clojure[1] there are two syntaxes for anonymous functions. The first one is equivalent to lambda expressions in Emacs Lisp; they look like this:

  (fn [arg1] body)
  (lambda (arg1) body)

The other syntax, which is shorter and built into the reader, has no equivalent in Emacs Lisp. It looks like this:

  #(expr %1 %2)

This is equivalent to the following lambda expression:

  (fn [%1 %2] (expr %1 %2))

The shorter syntax is convenient and works really well with map/mapcar and other higher-order functions. It is, however, absent in Emacs Lisp.

Some time ago I found an implementation of a short lambda syntax for Emacs Lisp. It's a simple macro which expands to a lambda expression. You can test it like this:

  (funcall (short-lambda (cons 1 %1)) 2)  ; => (1 . 2)

The implementation, however, is incomplete. In the words of the author:

This assumes that there is a reader macro in the Emacs C code that translates #(STRUCTURE) to (short-lambda STRUCTURE), in the same way as for the backquote macro.

Indeed, as it is, it's even longer than a normal lambda - clearly, something is missing. Namely: support for the short lambda in the reader, which is implemented in C.

Implementing the missing part

How hard would it be to add the missing support to the reader? While thinking about this, I noticed that there's already something similar in Elisp, namely the hash-table syntax. It looks like this:

  (make-hash-table)
  ;; ⇒ #s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())

The #s(...) syntax is supported by the reader, i.e. hash tables can be read back in with the same syntax they are printed in.

Wouldn't it be easy to create a short lambda syntax by changing #s to #f? It turns out that yes, it is easy. It took me an afternoon or two to figure out, with no prior experience with the Emacs C codebase. The whole implementation is less than 5 lines of code.

Changing the reader

First, I located the hash-table reader code in lread.c, which is where the reader lives. There's a function called read1, where the hash-table support is implemented. It looks like this (reformatted for brevity):

static Lisp_Object read1(Lisp_Object readcharfun, int *pch, bool first_in_list) {
  int c;
  // ... more variables here ...

  if (c < 0) end_of_file_error ();

  switch (c) {
    case '(': return read_list (0, readcharfun);
    case '[': return read_vector (readcharfun, 0);
    case ')':
    case ']': {
      *pch = c;
      return Qnil;
    }
    case '#':
      c = READCHAR;
      if (c == 's') {
        // ... lots of code specific to hash-tables ...

The only thing needed to add support for short lambda is to change the if (c == 's') line to this:

  if (c == 'f') {
    return list2(Qfn, read0(readcharfun));
  } else if (c == 's') {
    // ... lots of code specific to hash-tables ...

The list2 function creates a list of two elements, where the first one is a symbol, defined like this at the end of lread.c - that's the second, and last, change to the C code (fn is an alias for short-lambda):

  DEFSYM (Qfn, "fn");

The second element of the list is whatever can be read after the opening paren.

The whole diff looks like this:

diff --git a/src/lread.c b/src/lread.c
index 015bcc2e11..42fc4050ae 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -2880,7 +2880,10 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 
     case '#':
       c = READCHAR;
-      if (c == 's')
+      if (c == 'f') {
+        return list2(Qfn, read0(readcharfun));
+      }
+      else if (c == 's')
         c = READCHAR;
         if (c == '(')
@@ -5168,6 +5171,7 @@ syms_of_lread (void)
   DEFSYM (Qload_force_doc_strings, "load-force-doc-strings");
 
   DEFSYM (Qbackquote, "`");
+  DEFSYM (Qfn, "fn");
   DEFSYM (Qcomma, ",");
   DEFSYM (Qcomma_at, ",@");
When is it useful?

As mentioned, the shortened syntax works well with higher-order functions. It's not essential, but it is convenient - especially if you use libraries like dash.el, which give you a lot of such functions.

Just yesterday I was writing a bit of code to replace literal characters with HTML entities. I fetched the list of entities from the gist published by Christian Tietze (blog post, the gist) and started coding. I had to parse the list of entities, breaking each line into its components: the character, the entity, and the description. The whole code looks like this:

(defconst html-entity-list
  '("char: \" entity: &quot;    text: quotation mark (APL quote)"
    "char: & entity: &amp;      text: ampersand"
    "char: ' entity: &apos;     text: apostrophe (apostrophe-quote); see below"
    ;; ....
    ))

(require 's)
(require 'dash)
(require 'short-lambda)

(defconst html-entities-lookup-table
  (-> (-juxt
       #f(second (s-match "char: \\(.\\) " %))
       #f(second (s-match "entity: \\(&.+;\\) " %))
       #f(second (s-match "text: \\(.+\\)" %)))
    (-map html-entity-list)))

(defun my-html-replace-entity (p)
  (interactive "d")
  (let* ((ch (buffer-substring-no-properties p (1+ p)))
         (rep (-find #f(equal ch (car %))
                     html-entities-lookup-table)))
    (delete-char 1)
    (insert (second rep))))

There are 4 lambdas in the code - were it not for the short lambda, I would probably have written this code differently. With them, though, the code ended up short and readable, without the need for any of the heavier abstractions.

That's it, so... why?

That's really everything you need to add short lambda support to Emacs Lisp. I've had this implemented in my Emacs for a few years now, and I use the #f() syntax regularly. It's convenient. It's easy to implement. So I keep wondering, from time to time - why isn't it implemented in Emacs yet?

Please let me know if you know the reason!

EDIT: so, um, yeah, one reason may be that nobody has suggested this as a feature yet. I'm stupid, it totally slipped my mind. I assumed it had been proposed already, for sure, given that the short-lambda.el repo[2] is 6 years old at this point. But I didn't check. My bad!

UPDATE: rewritten, non-recursive reader in current master (2022-09-22)

When I pulled the Emacs repo the other day and built it, I realized that my short-lambda had stopped working. Apparently, the reader was rewritten to be non-recursive back in May, and my patch obviously stopped being applicable.

Link to the culprit commit

The good news is that it wasn't too hard to patch the new reader - it's still just a few lines of code:

diff --git a/src/lread.c b/src/lread.c
index d64a4fad3a..3500d78775 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3811,6 +3811,13 @@ read0 (Lisp_Object readcharfun, bool locate_syms)
           goto read_obj;
 
+        case 'f':
+          read_stack_push ((struct read_stack_entry) {
+              .type = RE_special,
+              .u.special.symbol = Qfn,
+            });
+          goto read_obj;
+
         case '#':
           /* ## -- the empty symbol */
           obj = Fintern (empty_unibyte_string, Qnil);
@@ -5682,6 +5689,7 @@ syms_of_lread (void)
   DEFSYM (Qinhibit_file_name_operation, "inhibit-file-name-operation");
   DEFSYM (Qascii_character, "ascii-character");
   DEFSYM (Qfunction, "function");
+  DEFSYM (Qfn, "fn");
   DEFSYM (Qload, "load");
   DEFSYM (Qload_file_name, "load-file-name");
   DEFSYM (Qload_true_file_name, "load-true-file-name");