You'd have to manually construct the classes, starting from the ones farthest
from the class you want to instantiate, passing these to the constructors of
intermediate classes, then passing those to further intermediate classes, and so
on, until you finally have instances that you can pass as arguments to A.
With dependency injection frameworks, you don't have to do this manually.
Again, imagine these are not 4, but 14 classes - instantiating them manually
would require many lines of code, while with a DI framework, no matter how large
the set of classes, you will always be able to get the instance you need with
this ...getA() call.
So that's the first goal - you don't need to manually instantiate classes in a
correct order, you can leave that to the framework.
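To make it concrete, here's a minimal sketch of both approaches (the C class and
the exact component shape are made up for illustration; DaggerSomeComponent is
the implementation Dagger generates at compile time):

```kotlin
import dagger.Component
import javax.inject.Inject

class C @Inject constructor()
class B @Inject constructor(val c: C)
class A @Inject constructor(val b: B, val c: C)

@Component
interface SomeComponent {
    fun getA(): A
}

fun main() {
    // Manual wiring: you have to know and maintain the construction order yourself.
    val manualA = A(B(C()), C())

    // With Dagger, the order is the framework's problem, however deep the graph goes.
    val injectedA = DaggerSomeComponent.builder().build().getA()
}
```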
Now, a valid question is: why would you design classes in this way? If a class A
needs class B, it can just do val b = B() in the body and be done with it.
That's true, but that would fix/freeze the dependency in place, thus increasing
coupling (and loose coupling is often what we want) - among other things,
refactoring is harder in code written like that.
Another reason - one that I honestly don't want to invoke, given how bogus it
sounds in our codebases, which have literally 0 tests... - is testing. If a class A
takes class B as an argument to the constructor, you can swap class B for
anything that's B-like. That, in most cases, means a mock or a stub.
The other reason is also purely academic in our code, but well... I'll tell you
anyway, might be helpful when you encounter a well-written Kotlin codebase.
Here, A depends on someB, but it doesn't care if it gets B1 or B2. In
a situation like this, with the direct approach, without dependency inversion,
you'd have something like this:
There's a conditional, and you decide which kind of B you want at the exact
point you need a B.
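A minimal sketch of that direct approach (the useNewImplementation flag and the
env-var check are just stand-ins for whatever condition picks the implementation):

```kotlin
interface B
class B1 : B
class B2 : B

val useNewImplementation = System.getenv("USE_B1") != null // stand-in condition

class A {
    // A is forced to pick the concrete implementation,
    // even though all it cares about is "some B".
    val b: B = if (useNewImplementation) B1() else B2()
}
```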
However, class A is not the best place for this conditional... Since it really
doesn't care which subclass it gets (visible from val b: B), it shouldn't be
forced to decide which one to instantiate!
DI frameworks give you a place where you can write this conditional logic.
Now, you can still instantiate A in the same way as before -
DaggerSomeComponent.builder().build().getA() - but you will get an A with
either B1 or B2. Neither A nor B needs to worry about choosing the
correct subclass, you don't need to put any special code in them, it'll be
handled by the DI framework.
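A sketch of how that can look with the conditional moved into a Dagger module
(the module name and the condition are invented for illustration):

```kotlin
import dagger.Component
import dagger.Module
import dagger.Provides
import javax.inject.Inject

interface B
class B1 : B
class B2 : B

class A @Inject constructor(val b: B)

@Module
class BModule {
    // The single place in the codebase that decides which B the graph uses.
    @Provides
    fun provideB(): B =
        if (System.getenv("USE_B1") != null) B1() else B2() // stand-in condition
}

@Component(modules = [BModule::class])
interface SomeComponent {
    fun getA(): A
}
```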
This is the simplest case, but again - imagine there are 14 classes and multiple
of them require such conditional logic.
With DI, all that conditional logic is in one place, instead of being spread all
over the classes.
There's more - you can also swap/replace whole subsections of the graph, though
it's kinda clunky in Dagger. So, if you have one "subsystem" (a collection of
classes that collaborate to provide some functionality) that does something in a
way X, and another that implements the same functionality but by doing Y, you
can put them into Dagger Modules, and decide which module to include based on
your condition. Note that, since Components get Modules via annotation at
compile time and they're fixed in place, you need to create a separate component
that uses your alternative module.
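If you squint, it looks roughly like this (all the module and class names here
are invented for illustration):

```kotlin
import dagger.Component
import dagger.Module
import dagger.Provides

interface SyncEngine
class FastSyncEngine : SyncEngine   // the subsystem that does it "way X"
class LegacySyncEngine : SyncEngine // the subsystem that does it "way Y"

@Module
class FastSyncModule {
    @Provides fun provideEngine(): SyncEngine = FastSyncEngine()
}

@Module
class LegacySyncModule {
    @Provides fun provideEngine(): SyncEngine = LegacySyncEngine()
}

// Modules are fixed in the annotation at compile time,
// so each variant needs its own component.
@Component(modules = [FastSyncModule::class])
interface FastSyncComponent {
    fun engine(): SyncEngine
}

@Component(modules = [LegacySyncModule::class])
interface LegacySyncComponent {
    fun engine(): SyncEngine
}
```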
Hope that helped.
Now - go seek more enlightenment on the net!
In terms of pros, this helps code readability, scalability, decoupling, testing
Yup. Precisely.
There are cons, too: mostly a more complicated build (for Dagger you need that
compiler plugin), and the correct order of instantiating classes is obscured -
it's not visible in the code directly; you'd need to go to the build/generated/
dir and look at the Factories.
And if DI is used, should it be used throughout the project, no matter the scale?
I don't believe DI has to be used from the get go, nor do I believe that you
need to DI everything throughout a project. That's my opinion, I'm going to
defend it fiercely, but know that it's not the opinion shared by the majority of
Java/Kotlin devs.
First of all, if you have a very localized dependency graph with few nodes, it's
better to hide those dependencies completely. You do it with private classes in
Kotlin and private inner/nested classes in Java.
An example of this is in TokenRepository.
If the classes that a class (here: TokenRepository) depends on are all private,
you can change them as you wish, without the overhead of setting up the DI for
those classes.
That's basically the "Facade" design pattern in action.
I also don't think you need to use DI for everything, even if you do start using
it.
Basically, ask yourself: am I spending too much time trying to figure out how to
instantiate this class?
If yes - use a DI framework. If not - leave it alone.
I have a feeling that most Java/Kotlin programmers haven't ever heard of YAGNI
Or KISS for that matter.
Pick the simplest way that works, use it until it stops working well, then
upgrade to the more complex thing as needed.
I sense actually implementing DI comes with a learning curve
Yeah.
That's because it's an add-on, not a feature of the language, at least in the
context of Java and Kotlin.
There are multiple ways to do it, multiple frameworks, some disagreements in
terminology between them, and so on.
Plus, when you start working on a project that already has a baroque DI setup,
you just don't know where the things you're reading right now in some class
come from, and you can't easily find out.
So I think starting from the minimal cases and building up from that is the best
way to get into it.
(And as you can see, the minimal case is a single @Component and a bunch of
@Injects - you don't really need anything else in the beginning)
Implementation details
First, you should know what kapt is. kapt is the Kotlin Annotation Processing
Tool - which probably doesn't tell you much, but in short, it's the interface
that annotation-processing compiler plugins implement so they can run over
Kotlin code. These processors run just before the actual compilation from Kotlin
source to JVM bytecode happens, and they can see the whole codebase the compiler
is run on. Now, for one reason or another, these plugins/kapt processors cannot
modify existing code, but they can add new classes (in general, new source
code). Dagger has a com.google.dagger:dagger-compiler artifact that implements
this interface and can be used as an annotation processor. I'm using raw Dagger
in SDK Demo, but Hilt, which is used in the other two apps, is built on top of
Dagger and works in a very similar way, though it has its own annotation
processor (com.google.dagger:hilt-android-compiler).
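For reference, wiring the processor into a Gradle build looks roughly like this
(a sketch; the versions are illustrative):

```kotlin
// build.gradle.kts
plugins {
    kotlin("jvm") version "1.9.24"
    kotlin("kapt") version "1.9.24"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("com.google.dagger:dagger:2.51")
    // The annotation processor that generates the components and factories at compile time.
    kapt("com.google.dagger:dagger-compiler:2.51")
}
```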
Ok, so Dagger is a plugin that is run by the Kotlin compiler over the whole
codebase. There are basically 2 things that Dagger searches for in the codebase:
interfaces annotated with @Component and - this is important - everything else
that is annotated with @Inject. Dagger examines all classes, and when it finds
one whose constructor is annotated with @Inject, it does 2 things:
it adds the class to the pool of classes that may be created using Dagger; and
it examines the arguments that constructor takes, and sees if it can construct
them - that is, if all classes passed as arguments have a constructor
annotated with @Inject. Then, Dagger recurses, examining constructors of
classes that the current class depends on.
Point 2 is what you usually work with when you use Dagger. However, the first
point is also important: all classes annotated with @Inject are already
available for construction, no need to do anything else! When you define a
component, all @Inject annotated classes are available in it. Classes and values
that are required for some of the to-be-constructed classes, like Context, need
to be provided to the component Builder, through @BindsInstance. Another way to
supply the missing parts of a graph is to declare a module - a separate class
annotated with @Module - and use @Provides-annotated methods to construct the
missing parts manually. Finally, you can declare a module as an interface, with
methods annotated with @Binds - these are used to map interfaces to concrete
implementations.
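Putting those pieces together, a sketch might look like this (Storage,
DiskStorage, and AppComponent are illustrative names):

```kotlin
import android.content.Context
import dagger.Binds
import dagger.BindsInstance
import dagger.Component
import dagger.Module
import javax.inject.Inject

interface Storage
class DiskStorage @Inject constructor(private val context: Context) : Storage

@Module
interface StorageModule {
    // Map the interface to a concrete implementation.
    @Binds
    fun bindStorage(impl: DiskStorage): Storage
}

@Component(modules = [StorageModule::class])
interface AppComponent {
    fun getStorage(): Storage

    @Component.Builder
    interface Builder {
        // Values Dagger cannot construct itself (like Context) are handed to the builder.
        @BindsInstance
        fun context(context: Context): Builder
        fun build(): AppComponent
    }
}
```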
Dagger traverses the dependency graph and creates factories for each class it
found. These factories are used for instantiating classes in the correct order,
ie. the factory of class X that depends on Y and Z takes care of finding and
calling Y and Z factories before it instantiates class X. This doesn't have to
be done this way - there are other DI libraries that don't generate the
factories at compile time; instead, they build the dependency graph at runtime
and instantiate the required classes then, via reflection or via factories
registered in a DSL. Koin is an example of such a library.
Now, if Dagger automatically picks up @Inject-annotated classes, why do we
need a component (not @Component.Builder)? That's because Dagger has to know
which classes it needs to expose - which classes are the starting nodes of the
graph. There are two ways of giving Dagger this information:
by declaring getter methods on the component interface, eg. fun gimmeClassX(): X
or by declaring a special method fun inject(x: ClassThatWillHaveThingsInjected)
The former should be self-explanatory - for each such getter method declared on
the component interface, there will be an implementation of that method on the
component implementation class. That method simply picks the correct factory
and runs it, returning the result. The latter case is a little more involved -
your ClassThatWillHaveThingsInjected is examined in search of
@Inject-annotated properties (most often lateinit vars in Kotlin). Then, the
inject method is generated - it simply calls the relevant factories and sets the
corresponding properties of the object passed to the inject method.
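A sketch of both styles side by side (the SomeComponent shape is illustrative;
ClassThatWillHaveThingsInjected stands for something like an Activity that
Dagger doesn't create itself):

```kotlin
import dagger.Component
import javax.inject.Inject

class A @Inject constructor()

@Component
interface SomeComponent {
    // 1) a getter: the generated implementation picks the right factory and returns the result
    fun gimmeClassA(): A

    // 2) an inject method: fills in @Inject-annotated properties of an existing object
    fun inject(target: ClassThatWillHaveThingsInjected)
}

class ClassThatWillHaveThingsInjected {
    @Inject lateinit var a: A

    fun setUp() {
        DaggerSomeComponent.builder().build().inject(this)
        // from here on, 'a' is initialized from the graph
    }
}
```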
That's basically it - one more advanced feature that's worth knowing is that you
can "scope" classes that are nodes in the dependency graph. Normally, each
mention of class X gets a fresh instance of it from a factory. So if you have
classes Y and Z, both declared with class Y/Z @Inject constructor(val x: X){},
then Y and Z will have different instances of X passed to their constructors.
You can control this in a few ways - you can cache an instance in the
@Provides-annotated method, for example - but it's better to declare your own
Scope, or use a predefined one like @Singleton. A scope is an annotation you put
on a class (or its provider method) and on a component; when the scopes match,
Dagger injects the same instance of the required class everywhere within that
component, and creates a fresh instance otherwise. Combined with qualifiers,
this also lets you inject multiple different instances of the same class into a
single constructor, if you want/need to.
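A minimal sketch of scoping with the predefined @Singleton scope, reusing the
X/Y/Z names from above:

```kotlin
import dagger.Component
import javax.inject.Inject
import javax.inject.Singleton

// The scope annotation on the class...
@Singleton
class X @Inject constructor()

class Y @Inject constructor(val x: X)
class Z @Inject constructor(val x: X)

// ...has to match the scope on the component.
@Singleton
@Component
interface SomeComponent {
    fun getY(): Y
    fun getZ(): Z
}

fun main() {
    val component = DaggerSomeComponent.builder().build()
    // With @Singleton on X, both Y and Z share one instance within this component;
    // drop the annotation and each would get a fresh X.
    check(component.getY().x === component.getZ().x)
}
```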
In general, Dagger - while somewhat clunky - is not a bad piece of tech. The
reason for its existence is the lack of first-class modules in Java. Imports,
which in Java express only compile-time dependencies, in other languages do
both: you can analyze them to get the compile-time dependencies, but you can
also execute them, during runtime. That's when the required/imported module
gets created, along with all the modules it depends on, and so on,
recursively. This is the case for Python, JavaScript, Ruby, and most other
interpreted languages. In compiled languages, imports are strictly
compile-time constructs, and they have no side effects other than pulling a
simplified name into the current file's namespace. Well, that's not true for all
compiled languages - OCaml, for example, includes both ways of working with
modules, and C++ now has support for modules (I hear) - but in Java and related
compiled languages, imports in a package say nothing about the runtime
behavior of the classes defined there.
I'm tempted to say: everything, but it's obviously not that bad. Still...
Kotlin is good, why don't you use it properly?
You're not writing Java, so why would you replicate its limitations in a
language that doesn't have them?
Dependency Injection frameworks
Convenient and nice when used properly. Absolute disaster when taken too far.
Just because you don't have to think about inter-class dependencies doesn't mean
that you shouldn't think about them!
RxJava should die in a fire and fast
Trying to hammer everything into a single metaphor doesn't work too well.
Otherwise Haskell would be the most popular language out there. RxJava tries
to express all computation as operations on streams, ignoring the fact that
not all computation is suited for such representation.
Writing correct concurrent code that can run some of its parts in parallel
is not easy. Using RxJava for it makes it harder. Kotlin coroutines provide
a much better way of dealing with it - scopes handle cancellation without
getting in the way, async/await allows for easy offloading of one-off
tasks that need to return a result, Flows handle streams of
values where it makes sense, channels and actors allow for multiple modes of
communication between threads and tasks. The M:N concurrency model -
admittedly, the same RxJava uses - saves resources by dispatching concurrent
tasks that don't have to be run in parallel on a single thread.
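For illustration, a small coroutine sketch of the "offload a one-off task and
await the result" case (Profile, fetchUser, and fetchSettings are placeholders
for whatever work you actually need done):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope

data class Profile(val user: String, val settings: String)

suspend fun fetchUser(): String = "user"          // stand-in for real I/O
suspend fun fetchSettings(): String = "settings"  // stand-in for real I/O

suspend fun loadProfile(): Profile = coroutineScope {
    val user = async(Dispatchers.IO) { fetchUser() }          // offloaded one-off task
    val settings = async(Dispatchers.IO) { fetchSettings() }  // runs concurrently with the one above
    // Cancelling the caller cancels this whole scope - no manual disposal needed.
    Profile(user.await(), settings.await())
}
```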
While RxJava provides literally hundreds of methods on a handful of classes,
parts of Kotlin coroutines tend to have very small base API, augmented with
extension methods in some contexts.
And so on... Just don't use Rx if you can avoid it, that's all.
Do you see a shift away from clickbait in the long term across the industry?
Publishing, search, social media, etc. have been devastated by it (among other
factors).
You have to kill it at the funding - if the ad dollars don’t flow to the
baiters they won’t bait.
Which for many platforms is quite a difficult thing, because that's their
revenue and they don't have the manpower or time to review everything.
Cutting the supply of money will kill the clickbait content alright but it
will also kill vast swathes of "content, full stop". Somebody has to pay for
hosting no matter what, and if I'm entertaining an idea of writing a fantasy
novel in my spare time, I really don't want to be the one to pay. Or rather, I
wouldn't want at the time in my life when I entertained such an idea.
I think more dollars is good, as it increases the number of opportunities.
This comes with a lot of failed attempts at creating something of value, and
that's ok. What we truly need, in a societal sense, is to make the next
generations recognize the current times for what they are - a crazed race after
the least demanding content - and to guide them back toward articles with
enduring value. The generation that entered adulthood
during these crazy times is probably already lost, the previous generation is
busy raising kids, and the previous-previous generation lives in bunkers where
only the guys who "go way back" can ever enter. If not the next generation,
then we're probably done for, as the IT revolution will eat its children and
their children too, like many revolutions before.
The ProjectX fsck up from last week, from the technical side - that the script
should've never been run on prod in the first place is indisputable and there's
little need to repeat it yet again - is closely related to the automatic
resource management in Python. It works pretty well, most of the time, until one
day it doesn't and all hell breaks loose.
It was many years ago, but I fought a few battles against Python garbage
collector; I emerged - barely - victorious, though the losses on my side were
many. I since forgot most lessons learned, and even if I didn't, they would be
close to useless today, as there were at least two major GC overhauls since
then. I thought it wouldn't be bad to get back on top of the game.
Furthermore, S. said that he doesn't know the details of the garbage collection
scheme in Python. I had a feeling that he's not the only one - I mean, I was
like this until yesterday, too - and so I thought it would benefit all of our
programming efforts to learn the ropes. Hence, I decided to prepare a short
(I'll pretend I don't hear you laughing...) write-up on the GC and how it
relates to the problem at hand.
CPython Memory Management Scheme
1. Python is a managed language, like Java, JS, PHP, Ruby, Go, Perl, Tcl, Lisp,
Smalltalk, and so on (and on, and on - this memory management scheme seems to
have "won" in the '90s), and unlike C, C++, D, Pascal/Delphi, Cyclone, Fortran,
Ada, Rust, Forth, assembly, and so on.
- this means that, most of the time, there is no need to explicitly allocate or deallocate memory
- this makes Python a *memory-safe language*: certain kinds of bugs (use after free, double free, out of bounds read, buffer overflow) cannot happen in pure Python code
2. Since CPython 1.6 Python uses two (only the first one prior to that)
complementary automatic memory management techniques:
- reference counting: each object has a counter, which gets incremented each
time the object is referenced from a different context (other stack frame,
other name in the same scope, inside of another container) and decremented
once that reference stops being reachable (by being deleted, overwritten,
going out of scope, being removed from a container, etc.) Once the counter
drops to 0, the object is deallocated: *immediately, synchronously, and
predictably*. This is a *CPython implementation detail* (ie. it may or may not
be true in PyPy, Jython, IronPython, etc), but there's little chance it'll
ever change, so it's ok to rely on it (unless you need to use another Python
implementation).
- cycle detection: there's a special (doubly-linked) list in the interpreter
state which contains all "tracked" objects, that is, the ones allocated with
PyObject_GC_New instead of PyObject_New. Most mutable objects, including
class objects and dicts, are created this way. Every time the
PyObject_GC_New is called there's a *chance* (depending on the ratio of
allocations to deallocations since last collection) that the garbage collector
will run. When that happens, the list is traversed and cycles - subgraphs of
the object graph forming a closed loop, called "cyclic isolates" in the PEP -
are found; the GC then traverses each cycle and successively decrements the
refcounts ("breaks down the loop"), until they reach 0; then the objects are
deallocated. This happens - from the pure Python code - asynchronously.
The two approaches are complementary: every object in CPython is refcounted,
which means the GC-aware objects are refcounted, too. Depending on the
interpreter compile-time flags and/or runtime configuration the GC can be
completely disabled; then the PyObject_GC_New is simply equivalent to
PyObject_New.
3. Memory deallocation is a specific case of a more general process called
"object finalization" (also "object destruction" in C++ and related languages).
In Python, objects are always finalized just before deallocation - in case of
cycles, all of them are finalized before the first one is deallocated. It is
possible to provide a custom finalization procedure for objects by defining a
__del__ magic method on a class. The finalizers are not restricted in what
they can do, so it's possible for a finalizer to _resurrect_ an object by
creating a fresh reference to itself somewhere.
Prior to Python 3.4 the __del__ methods *were never called in case the object
was a part of a cycle*. In other words, the cyclic garbage collector refused to
finalize and deallocate objects with __del__ method defined. *This is no
longer the case*, thanks to
[PEP-442](https://www.python.org/dev/peps/pep-0442/): the custom finalizer will
now be called even if an object is a part of a cycle. It's a very welcome
change, BTW, which makes __del__ methods much safer to use than in the past.
4. One important use case for object finalization is releasing resources -
graphical contexts (GDK/Cairo), external event loops, loaded libraries, mutexes,
child processes, threads, sockets, files and so on. Many built-in objects
provide a finalizer, either as a __del__ method or (equivalently) C function
stored in tp_finalize field of the PyTypeObject struct (tp_del prior to
3.4).
5. *Important note*: exceptions raised in finalizers are not propagated, and are
silently swallowed. This is because there is no guarantee that, at the point in
time the finalizer is called, a context where the exception should be raised
still exists; an object cycle created inside a function call may get collected
after the call already finished, for example.
6. The garbage collection is guaranteed to happen just before the interpreter
state is deleted, which always happens before the python process exits. Thus,
since 3.4 at least, it's safe to assume the finalizers, including custom ones,
*will* be run at some point.
7. I'm not sure when exactly, but in recent Python versions the memory allocator
was improved to be able to return the freed memory to the operating system. In
Python 2.x, once the memory was requested from the OS, it was never given back -
this could, in certain scenarios involving many long-running processes, lead to
resource exhaustion even if the memory was not being actively used. This is *no
longer the case*, which is another very welcome change.
---
These are the basics - the details are a bit more complicated due to various
optimizations (how often the GC is called), the objects being split into 3
generations (the GC in CPython is a "generational" one), the GC maintaining
separate lists for objects of certain built-in types and so on, but in practice
they don't matter much, unless you're working with a particularly twisted
allocation profile. If you want to know more - just ask, I'll either answer or
at least point in the general direction of the answer.
---
### Memory leak in a script that ran on ProjectX prod
Moving on to the situation at hand: the code as used by @S. would normally
be harmless, ie. it would consume at most S3BotoStorage.max_memory_size of
memory. There are two things which would guarantee this:
1. During a single iteration of the loop, the id_ reference is overwritten; no
reference to the previous id_ is visible in the code. The previous-iteration
version of id_ would have its reference counter decremented (just before
entering the next iteration), which would trigger its finalization and finally
deallocation. As mentioned, this is synchronous and deterministic; we can rely
on it, though we should note that it's a CPython implementation detail.
2. The implementation of tempfile.SpooledTemporaryFile contains a
_io.BytesIO object as its _file member, which in turn provides a finalizer,
which releases both the object itself and all the memory it allocated for
content (ie. its internal buffer). It gets more complex in the case of content
overflow, where SpooledTemporaryFile changes its internal representation to a
named file - it has to introduce an external finalizer then - but this didn't
happen in our case (and it wouldn't ever happen with the default settings, see
below; also, if it DID happen, there would be no such massive memory leak,
actually).
Here's the _io.BytesIO instantiation:
https://github.com/python/cpython/blob/47be7d0108b4021ede111dbd15a095c725be46b7/Lib/tempfile.py#L634
and here's its destructor (finalizer):
https://github.com/python/cpython/blob/47be7d0108b4021ede111dbd15a095c725be46b7/Modules/_io/bytesio.c#L886-L900
Given that the leak happened, we can conclude that the finalizer was not called.
Or more precisely, it didn't get the chance to run due to the server running out
of RAM; given sufficiently large available memory (>= 16 GB in this case, for
60k images), the finalizers would still run at the end of the loop, the memory would
get reclaimed and everything would work just fine. Until one day we'd need to
run the loop for 61k images, that is. (I'm assuming the images to be ~250 KB in
size, based on how 16700 images were able to exhaust 4 GB of RAM.)
Aside: the fact that the readable() method loads the whole file and caches it
(via _get_file) on the S3BotoStorageFile instance is unfortunate, but on its
own it wouldn't be dangerous. It would waste some bandwidth - indeed, performing
a HEAD request would be much better - but the memory consumption would not
exceed the amount needed for a single biggest image. Also, there's a setting,
namely AWS_S3_MAX_MEMORY_SIZE, which controls when the temporary file "rolls
over", ie. changes its representation from in-memory buffer to a named file.
Unfortunately, the default value - 0 - means "no rollover, ever", which is
kind of dangerous on its own.
Now the question is where exactly were the references to the id_ objects kept.
Looking at the code there is no such place - the CA list only keeps strings,
which are considered atomic by the GC, ie. they cannot refer to any other
objects; this means the CA list couldn't have kept the processed objects
alive. The body of the loop never sees more than one id_ instance, so it's
also improbable to be the cause.
There is, however, one object which does see the whole set of IDFrontData
instances: the QuerySet. That object, or parts of it at least, and all the
objects referenced from there, have to be kept alive until the loop finishes -
it only gets finalized and deallocated after the loop.
A quick look through the QuerySet code reveals that, indeed, there is a cache
of returned objects (the _result_cache attribute), which, to add insult to
injury, gets *fully* populated at the beginning of the loop; the __iter__
method:
https://github.com/django/django/blob/227d0c7365cfd0a64d021cb9bdcf77bed2d3f170/django/db/models/query.py#L273-L289
calls a _fetch_all method before returning the iterator over the
_result_cache; _fetch_all in turn uses a ModelIterable class (by default)
over self and calls list on that iterable:
https://github.com/django/django/blob/227d0c7365cfd0a64d021cb9bdcf77bed2d3f170/django/db/models/query.py#L1284-L1288
The same thing happens when you call len on the QuerySet, by the way. The
_result_cache is only cleared when the delete or update methods are called;
there's no (or maybe I missed it?) public method for resetting the cache.
To summarize, code like this:
for obj in SomeModel.objects.filter(...):
    ...
is semantically equivalent to something like this:
objs = list(SomeModel.objects.filter(...))
for obj in iter(objs):
    ...
del objs
So this is where the references were kept, which explains the memory leak: every
obj is still reachable via the list, so they cannot be finalized and garbage
collected. Normally it's not a big deal, and this is a sane performance
optimization - the real problem comes from the model instances being *mutable*:
whatever you do to the instance (or objects referred to by the instance), even
without calling save() on it, the changes are kept in the list until the whole
list is collected.
### Possible Fixes
1. Looking more closely at the QuerySet code we see that the _fetch_all method
uses something called _iterable_class(self) which returns a *generator* -
hence the cast to list when assigning _result_cache in _fetch_all. There
is another public method which exposes that generator - iterator():
https://github.com/django/django/blob/227d0c7365cfd0a64d021cb9bdcf77bed2d3f170/django/db/models/query.py#L357-L368
That generator returns results one by one, without saving them anywhere - so, in
this case, code like this:
for obj in SomeModel.objects.filter(...).iterator():
    ...
would yield the expected results: each object would only be referred to from
the body of the loop and would get finalized and deallocated just before
entering the next iteration.
2. It's of course possible to simply close the file, which would also clear the
buffer and release the memory:
for id_ in IDFrontData.objects.filter(user_selected_state="CA"):
    if id_.source_image.file.readable():
        id_.source_image.file.close()
        CA.append(id_.source_image.url)
That would however require us to know in advance that the readable method does
what it does (ie. reads the whole file and caches it), which may or may not be
explained in the docs (I didn't check, would guess it isn't).
3. We could create a (deep) copy of each object before processing it in the body of
the loop. It would waste some memory and time, but it would guarantee that
nothing we do to the instance is saved in the original list:
import copy

for id_ in IDFrontData.objects.filter(user_selected_state="CA"):
    id_copy = copy.deepcopy(id_)
    if id_copy.source_image.file.readable():
        CA.append(id_copy.source_image.url)
Here the id_copy would get finalized and deallocated just before the next copy
is created.
4. For fun: we could make use of the mutability of the list and "clean up" after
each step:
qs = SomeModel.objects.filter(...)
for i, obj in enumerate(qs):
    ...
    qs._result_cache[i] = None
This assumes the iter() call doesn't copy the original list, in other words
that the qs._result_cache is exactly the same list we iterate over.
---
There are many more ways to handle the situation - once you know what happened
it's easy to find a good workaround. The "once you know" part, though, is the
problem, which leads to the final point:
### Further Recommendations
I've been thinking about the reliability of our products for a year now, but I
just realized that I've never shared my thoughts on the matter, other than in
private chats. I'll use this opportunity to summarize the most obvious ones.
1. The biggest problem (aside from running a script, any script, on a production
server) in this situation was the fact that the exact failure reason was only
discovered after many hours. It should have been discovered immediately: the RAM
usage is being monitored in AWS and the exact problem is clearly visible on
the graphs there. The reason for the delay is that it required someone to look
at that graph: this is not reliable (humans are unreliable in general). Instead,
once the RAM usage exceeds a sane value, that fact should be automatically
reported everywhere, immediately. By "everywhere" I mean: sent on all Slack
channels related to ProjectX and devops, sent on Slack in private message to
person/people responsible for ProjectX, sent by mail to every dev in TS and to
ProjectX product owner, sent by SMS to devops and devs responsible for ProjectX,
displayed on a status page (we should have one, by the way) for the product,
sent by FedEx and carrier pigeon to just about everyone...
...I mean, come on guys, the *production died*. Twice! This is not supposed to
happen, and when it does, it's a RED ALERT (or it should be) across *the whole
organization*. The details of a failure cannot be locked inside private
conversations and AWS console that no one ever opens. The more people get
involved, and the more information they get from the get go, the faster the
situation will get resolved.
2. There should never be a reason for anyone to restart the server manually.
Once the server gets unresponsive it should immediately be isolated, a fresh one
should be created and the traffic should be rerouted there. The faulty server
itself *should not be restarted*: this way important clues needed for resolving
the situation could get irretrievably lost, which would mean that we wouldn't be
able to fix the issue. The best we could do in that case would be to "wait for
it to happen again". Which is wrong on many levels - I hope I don't need to
explain why.
Doing this on Kubernetes is far easier, but it's perfectly doable with EC2
instances, too. Also, obviously, we should implement some limits in this logic
to make sure it doesn't fall into an *infinite* loop. Another thing: the logic for
handling failures like this *is also code*, and *untested code is broken by
default* (if it isn't that's just a lucky coincidence). This means that we have
to do a full dry-run after implementing it - we don't want to test the solution
for the first time during the *real* outage, because then we'd almost surely
need to debug and fix both the server, and the recovery logic at the same time.
Hardly an appealing prospect.
3. The operating system should never allow a single process to eat all the RAM
and get away with it without getting killed. In recent Linux kernels there is
support for per-cgroup and per-process memory limits. It would not protect
against a fork bomb (try running :(){ :|:& };: in BASH), but in this case the
memory was all inside a single process. That process should have been brutally
murdered long before bringing the whole server down. This is a bit contradictory
with the previous point - some important clues for resolving the issue might
have been lost with the process death - but it's a trade-off: ease of debugging
vs. guaranteeing uptime. We're talking about production here, so (esp. in the
absence of higher-level disaster recovery) it wouldn't hurt to err on the uptime
side.
4. There should never be a need to run any script on a production server. On the
other hand, it *cannot be impossible to run one-off tasks* in the context of
production deployment. The best way to achieve both goals would be to provide
an on-demand copy of the server or the whole environment. This would require
some tweaks in the code and would be much easier on Kubernetes than on EC2, but
it's possible on the latter too. Spinning up such an instance should be
easy, ideally needing a single click or a single command - we need to
disincentivize touching production as much as possible, to the point where
*everything* you'd want to do on production is *easier done* outside of it -
otherwise situations like the current one *will* happen, it's just a matter of
time.
5. We should create a "disaster checklist" - a document outlining the exact
steps to be taken by anyone related to the outage (and in case of production,
again, we're *all* related). There should never be a need to think about what to do
next; who to call, what data to gather, when to restart, when not to restart,
what to tell the users, when to start panicking and so on. People are unreliable
even in the best of circumstances; in a stressful situation, possibly late at
night, most likely while being dragged away from other tasks, that unreliability
can reach astonishing levels. A checklist, while on its own it doesn't guarantee
the swift resolution of a problem, helps people act sanely in all kinds of
circumstances. The document should be public and should include sections for
developers, devops, and managers alike.
6. People are unreliable in general, but especially so in situations they
encounter for the first time. It's virtually impossible, even with the
instructions in hand, to follow a process (any process) correctly on the first
try. This is a truism: talk with any soldier (we've got one former soldier
IIRC), or a diver, or a climber, and you'll hear exactly this. This means that,
if we want to be good at handling outages, the only thing we can do is to handle
outages many times. But we don't want that - ideally we'd like to see no outages
at all. There's an obvious way around this: we need to *exercise* using servers
which *can* die; if we don't have such servers, then we just need to create
them. We can easily create a copy of production environment, bring it down, and
perform a dry run of our recovery procedures. Ideally we would do this
periodically and update the exercise content after every real problem we
encounter.
---
Again, like with the GC above, these are the basics - a short summary, even -
and there are many finer points to discuss; again, Ask Me Anything, I'll do my
best to explain everything.
---
I think one important - THE most important - question regarding our reliability
is this: *do we want to take reliability seriously*? Because as of right now -
we don't; if we did, we'd look around and actively search for and implement
solutions used successfully at Boeing, in the armed forces, on satellites and
SpaceX rockets - basically everywhere where "mission critical" means
"people die if we fsck up". The solutions are out there, described in detail,
ready to implement if we really wanted to, really cared about reliability.
For a few years now I've been thinking of using "spaced repetition" for
learning new and, crucially, retaining existing knowledge over the long
term. My memory doesn't get any better with age, and it's probably going to
get progressively worse as time goes by. I thought about countermeasures,
and the "flash cards" idea seemed like a good bet. It works like this:
First, you prepare a bunch of "cards". You take some post-it notes and write
a question on one side and an answer on the other. After you assemble a
"deck" of such cards, you need to review them periodically. You take a card
from the deck and read the question. Then you try to answer it. Then you
flip the card and compare your answer with the one written down. Finally,
you do some complicated calculations to assess how well you remember this
topic and when to look at the card again in the future.
Having said that "I've been thinking" about it for years, it should be
obvious that something in the process didn't really work for me. I attempted
this many times, but I just couldn't get past that hurdle. That step, my
fated enemy, was obviously this:
First, you prepare a bunch of "cards".
However, even if I somehow managed to best it, the next one - doing
something consistently over some time... It's simply impossible, a no is a
no. Not without software of some kind.
It's not very surprising, then, that I decided to build such software. But
before the software, I needed data to make the cards. Around 6 months ago I
asked on Github for programming-idioms.com's dataset, and the author generated
a JSON dump for me (thanks a lot!). I could make cards en masse now, with the
description and source code on them. I didn't have a way to display them,
however. I considered coding the display part in one of two ways:
Emacs route: new frame, run-at-time
Awesome route: new "wibox" with Pango markup-based widget
Emacs wouldn't be a bad choice here, as it already supports syntax
highlighting, a major requirement for displaying cards. On the other hand,
at any given moment I can have 0 to 6 Emacsen open, and the scheduling part
of the solution would need to account for that. Both edge cases (0 vs
multiple) are problematic, as one would make it impossible to run any Elisp
code without additional setup, and the other would require some kind of
inter-process synchronization. So, a pain either way.
My window manager, on the other hand, has no such problems: there's exactly
one instance at all[1] times,
and it's always there, unless I'm doing something outside of X, which I
don't.
Enough of a backstory, let's start with technical details. I use
Awesome as my window manager[2].
Awesome is written in a mixture of C and Lua. Its homepage states that:
awesome has been designed as a framework window manager. It's extremely
fast, small, dynamic and heavily extensible using the Lua programming
language.
awesome provides a Lua API structured as a set of classes (emulated with Lua
prototypal inheritance). The most prominent among these are widgets and
wiboxes - boxes for widgets. Wiboxes are X windows, I think, and widgets are
basically chunks of drawing logic. Widgets draw themselves on Cairo surfaces
and display in wiboxes. An event system based on signals chains these
elements together.
The idea is nice: Cairo seems to be the "industry standard" in 2d graphics
and has Pango for text rendering, and Lua is a dynamic scripting language
with a few interesting capabilities.
However, when I tried actually writing the thing, I ran into two major
problems: a small number of built-in widgets, and the dynamism of Lua.
The former is self-explanatory: it's always better to reuse than to write
from scratch, if possible. If you're used to the wealth of widgets that GTK
& co provide, getting by on a fraction of that is going to be hard.
The problems with Lua are more complex. awesome has, in general, good
documentation, but the API area is large, and lack of contributors means
that many things are undocumented. At that point, you try to read the
source, but to do that, you need to find where something is defined. You
search the codebase and get absolutely nothing as a result. Turns out, the
name you're looking for is composed from parts at runtime. Ok, so let's
attack it from the other side: let's use introspection at runtime to dump
the shapes of various objects. Then you realize that debug info for all
methods points to a single line of code... where a decorator used for
defining methods is itself defined. Extracting a class hierarchy from the
running system is even less possible: some methods are injected directly
into objects instead of being inherited from a class, which is a problem on its
own; but the fact that there's literally no difference at all between classes
and objects - it's all tables anyway - plus the mismatch between classes and
metatables, makes it hopeless.
It's not bad to have a dynamic system. However, looking at it only through
its dead body (a Gilad Bracha reference - dead programs and pathologists) is
tragic. Smalltalks are as dynamic as Lua, but they provide all the tools you
need to find, read, execute, and debug the code while the system is running.
Neither Lua - a minimalist language, leaning towards embedding as its main use
case - nor awesome provides any such tools. What you get is the source
code on disk and the REPL (fortunately, at least the REPL is there).
I came up with a way to improve the situation: if I could get the API of
awesome covered as Haxe externs, I would have a capable macroassembler for
Lua, with a gradual type system and good tooling. Context-sensitive
completion, go to definition and find references tools that Haxe compiler
provides are good enough for my purposes. Creating externs itself was a
problem, but I only needed to solve it once, so I tried doing that.
The first thing I tried was to write a Raku script that would parse the LuaDoc
annotations from the Lua sources. I quickly realized that it would be a giant
pain: LDoc allows definitions of custom annotations, which, combined with
mistakes in the docs, made it exceedingly hard to interpret the data
correctly. To avoid that, I reused LDoc's own parser: passing a Lua function
as a "filter" when running LDoc allowed me to dump the project documentation.
Haxe is also minimalistic due to it having so many targets.
My third attempt at getting a project off the ground
I got a nice idea that I want to implement
There are two reasons why:
the resulting project would be quite unique and interesting
the end result might actually also be of practical use
The problem is that I probably won't be able to finish the project alone, so I'm
looking for anyone interested in and capable of helping.
Let me first tell you what it's about.
Three well-established technologies in one package
An example of what I want to do that is relatively well-known would be
OpenResty. It's a distribution of the Nginx web server bundled with a Lua
implementation and Lua bindings to various APIs, allowing you to generate web
pages dynamically without ever leaving Nginx. This is in contrast to other
languages, which live behind Nginx, not inside of it.
Thanks to coroutines, a lightning-fast JITted implementation, and the mature
Nginx codebase, OpenResty is impressively performant, especially against its
interpreted/non-JITted competition.
I want to make a package that similarly integrates a mature C codebase and
extensive Lua bindings. I want to add one more component to the mix on top of
that, but the general idea of creating a distribution of other projects is the
same.
Awesome window manager and Lua
Awesome is a small and fast window manager for Linux. It supports both tiling
and floating modes, is written in C, and is extended with a Lua class library. The
core is relatively small and it can use LuaJIT (instead of plain Lua) without
problems. Awesome uses GLib and Cairo for some of its functionality. It also
uses XCB for window management, so it's currently coupled to X.
The class library is actually a full-fledged windowing toolkit. Widgets are
defined in Lua, taking advantage of Lua tables' flexibility, and drawn on Cairo
surfaces. Awesome is fast, consumes very little memory, and has potential for as
much eye-candy as you'd care to add.
Multi-target transpiler - Haxe
Haxe is a gradually typed language that compiles to 7 different platforms,
including Lua. It's a simple language on the surface, with some similarity to
ECMAScript, but includes AST-based macros and an interface for compiler plugins,
which allow for endless customization. Moreover, the compiler is written in
OCaml, which makes it both very fast and easy to modify.
What are the benefits of bundling them together?
Glad you asked! In my opinion, Awesome could benefit a lot from Haxe. Not only
is Haxe capable of performing various optimizations that you wouldn't care to do
by hand (loop unrolling, dead code elimination, tail call elimination, etc.),
but it also provides IDE-class intelligent auto-complete, type information, and
documentation.
Lua is a great language, but using it well requires just about as much
self-discipline as using JavaScript well. That is to say: a lot! Haxe adds a
layer of safety and compile-time guarantees on top of LuaJIT dynamic runtime.
Enough chit-chat! What is it you want to build?
In short: a desktop environment inspired by Smalltalk. An environment where you
can inspect and modify the code of every widget on the fly. Smalltalk is
unapologetically dynamic, but so is Lua. On the other hand, Smalltalk-like IDE
functionality would be hard to replicate for Lua. With Haxe in the mix, though,
we can get a long way towards that functionality without lifting a finger.
There are problems of course. Haxe is not designed to make its output
reloadable, and enabling this would
I'm cleaning up the blog, found this gem by accident. It was hard to
capture, as you can see, due to the bus shaking wildly, but that's a Linux
kernel panic there.
Well, it was just a small system for displaying ads, no big deal if it
crashes. If it was an autopilot on a highway... Well, it could be worse 🙂
Emacs is a powerful editor. Its power comes primarily from thousands of
man-years spent on writing libraries and apps (or modes) for it. The core Emacs
contains half a million lines of Emacs Lisp, and if you go wild with installing
packages you can easily end up with a million or more lines of Elisp. There's a
lot of functionality there, for sure. You can get pretty far just using other
people's code, with minimal customization in hooks and customizeable variables.
However, to actually benefit from all that power to the fullest, you need to
delve deeper and understand what Emacs really is. You need to shift your
perspective and look beyond the text-based interface Emacs provides.
Emacs is an implementation of Emacs Lisp - and that is its true nature. When
thinking about Emacs don't compare it to Sublime Text or VS Code, but rather to
Python, Ruby, or JavaScript. Just like them, Emacs provides a language, a
standard library, and an ecosystem of packages. You can use Emacs as a tool for
text editing, but the full potential will be unleashed only when you begin to
treat Emacs as a tool for making tools.
Historically, Emacs Lisp wasn't the greatest language to write programs in. The
lack of lexical scoping, a messy standard library, the absence of a package
manager, and slow execution plagued Emacs for decades. There's a lot of older
Elisp code floating around, and all the hacks and workarounds in there are a
testament to the language's problems.
However, things have changed drastically over the last decade. Almost all
problems with Elisp were addressed. Lexical scoping is there, a package manager
is in the base distribution, wrapper libraries like dash, s, f expose and extend
the functionality in a more organized manner, the AOT compilation of byte-code
to native has shown remarkable improvements in performance. You can now write
extension modules in any language capable of producing native shared libraries.
There's preliminary support for native threads. Emacs always had the
potential, but recently it became a full-fledged IDE for Emacs Lisp, one that
could compete (and win) against a JetBrains product, if they ever decided to
write one for Elisp.
The only remnant of the dark past, and the biggest outstanding pain point, is
lack of namespaces and first-class modules. The improvements on this front would
be very welcome, but you can work around this limitation the same way you do
when writing C: just namespace your definitions yourself with a prefix of your
choosing. I use a my- prefix for all my code. It's crude, admittedly, but it
works well enough.
In any case, to get the most out of Emacs, you will need to write some Elisp
yourself. The better you learn Elisp, the easier it will be for you to bring out
the full potential of Emacs.
This series of posts is not a replacement for reading an Elisp tutorial or a
book about it. You still need to learn a lot on your own. The tools discussed
here will help you along the way - they will let you learn Elisp faster and will
help you write your own Elisp faster later on. The time investment of setting
all the tools up will be well worth it in the long run.
Basic tools
Helm
Magit
Org
Treemacs / Neotree
Imenu
display-line-numbers-mode / linum-mode
fill column indicator
highlight current line
auto-complete / company
yasnippet
re-builder (make bindings for search&replace from within)
hydra / pretty-hydra
Reading the docs (you don't even need to google!)
helpful
info viewer
info+
help+
apropos different commands
describe
apps and libraries,
libs for elisp consumption
apps with commands for users and settings and stuff
NOTE:
unusually, there's no code on Github for this post; contact me if you'd
like to get the fixed plugin.
NOTE: screenshots' dimensions may look awkward - it's to make them fit on the page
Visual Works - a modern Smalltalk IDE
Cincom® Visual Works™
is an implementation of the Smalltalk language
and, true to its ancestry, also an Integrated Development Environment for
Rapid Application Development[1]
There are many Smalltalk implementations, some of them fully open-source,
others created and maintained by a vendor as a commercial product. The state
of the open-source Smalltalks is not the greatest, to put it mildly[2]. It would take a long-ish rant to explain the reasons for this,
but I don't believe it would be constructive. Instead, let's focus solely on
the hero of today's post: Visual Works™, a commercial implementation that also
offers a free, non-commercial Personal Use License. What you get under PUL is
a tad older version - 8.3, from 2017 - but otherwise (almost) fully functional
Smalltalk system, with (almost) all its source code available, along with lots
of tools, libraries, and documentation.
The free version of VW doesn't contain anything that deals with cryptography.
The most prominent victim of this lack is HTTPS support - there's none[3]. Other than that, you get everything VW 8.3 has to offer,
including an extensive collection of ready-to-use libraries and plugins. Let's
take a quick look at the VW IDE.
Browser - Visual Works' IDE
Almost all Smalltalks are IDEs themselves. For languages like Java, while
you can stick to Notepad++, at some point you will probably need JetBrains
products or Eclipse or something - in addition to the Java compiler and its
virtual machine. This is different for Smalltalks: each implementation is
its own IDE.
In case you're wondering: yes, there is a reason why I say they "are" IDEs,
not that they "have" IDEs. It's easier to show than explain, but in brief:
Smalltalk is a language plus a library of classes. In Smalltalk, an IDE is
just a set of classes within the stdlib, in the same manner a String class
is. And just as you can change the String class in the running system, you
can also change the IDE code the same way: while it runs, live, within the
Smalltalk system. The classes the IDE consists of use the graphics primitives
provided by Smalltalk, use the tools for refactoring and code searching
provided by Smalltalk itself, and in general only communicate with the OS to
draw themselves.
The part of Smalltalk that works as an IDE in Visual Works is called a
Browser. It looks like this (I shrunk the window a lot so that the
screenshot looks good on the blog, so keep in mind that everything you see
is obviously resizeable)
A very fast explanation of what's what follows:
A - a package name in a global list of all installed packages
B - Searchlight, a multipurpose search tool & indexer
C - class name in a list of classes local to the selected package
D - show normal (Instance) or static (Class) methods
E - name of the method in a list of all methods of the selected class
F - source code for the selected method that you can edit
G - built-in test-runner, here showing that all of the 1 test passed
Of course, this is just the default view you get when you open it. Beyond
the many built-in capabilities, there are also community-provided packages
and plugins that you can install.
"It's a bit ugly, but it works"
While Visual Works is an IDE, it's also a Smalltalk environment. Think
Electron with dev tools enabled. You can inspect and modify any part of the
environment and see immediate feedback. Loading plugins is just one of the
multitude of things that you can do to the environment: you can create,
remove, and change class and method definitions, you can edit graphical assets
and fonts, you can create new and edit any of the existing windows. Everything
you do will have immediate effect.
Leaving that aside, I installed one plugin that promised to display information
about classes in a more structured format, in a single window. It's
convenient, because normally you need to switch between 2-3 tabs to get the
full picture of a given class. Knowing this, I installed it without
hesitation.
Unfortunately, it didn't look very good.
There's a lot I didn't like about this UI, starting with how labels of
radio-buttons don't have enough space between them and are displayed one over
another. The tool is undeniably useful, though, so I decided to fix it.
Editing the GUI with UIPainter
The reason for the problem with labels was that the author of the plugin set
the positions of various widgets in pixels, making them absolute and unable to
resize to accommodate the bigger font I chose as my default.
So, how do we fix this and a few other problems with the GUI? First, we need
to locate the class responsible for it. Turns out that searching for
ClassDefinitionTool was enough to jump right to its source. Next, you need
to locate a method where the "spec" is stored. Spec is a declarative
description of the GUI, including all widgets, their positions, relations,
event handlers, etc. It looks like this when looked at "raw":
But you rarely do it, and you never edit the spec by hand. Instead, you use
a plugin called UIPainter, which allows you to manipulate the spec
graphically:
In this particular case, I was going to add some space between the labels
for "indexing" and "visibility" rows, set min width and height in a few
places, add some margins, and change some fonts. You can do all that from
the UIPainter interface - choose a widget in the tree displayed on the left,
go to "Position" tab, and edit the field you find there:
Click "Apply" and you're done. After a few more trivial fixes, the UI for
the plugin started looking quite decent:
The orange dividers are my preference - sorry if I offended your
sensibilities... Anyway, it took less than five minutes to fix the things
that irritated me the most. After finishing, you need to "Install" the
edited UI back into the windowSpec method - there's a menu item in
UIPainter, in "Edit" menu, that serializes the edited spec, updates the
method, searches for all instances of the plugin, and makes them redisplay
themselves with new spec. Here's the effect, visible immediately after
editing the UI:
What's worth noting is that every single element visible on the screen is
yours to manipulate in a similar way. You might have noticed that the
UIPainter itself had some problems with label sizes. No problem! You can
open the UIPainter on itself, and edit those as easily as
with the ClassDefinitionTool.
You can see the real window and one of its widgets being edited in the
screenshot below. From here it's literally one click to fix the problem.
Now, can you do the same with any other IDE - and as easily as in Visual
Works? The UIPainter is not a new idea at all, and it's included in multiple
IDEs to generate some kind of interface description. I'm quite sure, though,
that none of these allows you to open the tool on *itself* and make changes
*while the code is running*, without any additional setup. All of the above was
done with just the default install and with just two plugins loaded.
But... what about Emacs?
Emacs - or rather, Emacs Lisp development - is in many ways very similar to
the Smalltalk environment. Due to its roots and the need to work in a
terminal, Emacs is keyboard-centric and mouse use is not as prevalent.
On the other hand, the ability to change pretty much anything while Emacs is
running makes for an experience similar to that of Smalltalk.
Among other things Emacs provides, it also has powerful text editing
capabilities. There's a lot to like: expand-region, multiple-cursors, iedit,
editable occur, transpose-* commands, jumping to closing/opening paren, bulk
indenting of multiple lines, recording macros, and so on. Almost none of
these are implemented in the Visual Works editor (nor in any other Smalltalk,
to my knowledge) and, yes, their absence hurts a lot.
Fortunately, both Visual Works and Emacs are fully programmable
environments. It's not a big deal to make them talk to each other and work
together. I made a little - around 80 loc - Emacs mode that listens for TCP
connections on a local port and when a connection is established, it reads
the content of the message and displays it in a new window (frame in Emacs
parlance). You can edit the source there, and when you press C-c C-c
it sends the edited contents back through the socket.
On the other side, I added a command accessible in SourceCodeTool,
ie. in the bottom half of the browser window. Once you activate the command
(currently via right mouse button/context menu) it opens a connection to
Emacs, sends the currently selected method source through the socket, then
waits for a response. It replaces the code with whatever comes back through
the socket.
It took less than 150 loc in total to set this up. Here you can see it in
action:
The current implementation is at the proof-of-concept level, and making it
more robust overall and asynchronous on the Smalltalk side (currently the
window controller process is blocked while it waits for a response) would
require a bit more effort. On the other hand, making the PoC took me two or
three hours at most - and even in that state it is very useful. Reformatting
longer methods to my liking is now a lot easier, with all the power of Emacs
available. Adding features like auto-complete on the Emacs side with symbols
provided by the Smalltalk side may not be trivial, but it's definitely
doable in a weekend or two.
In any case: with environments as easily extensible as Emacs and Smalltalk
both are, you don't really have to choose one over the other. You can connect
them and use them together, with just a little bit of code.
Though of course you can hack it trivially, for example by loading libcurl
dynamically and calling it, or you can set reverse proxy with Nginx, or use
one of the multitude of other possible workarounds, but you're on your own
then. Not that I ever wasn't...
I spend a lot of time in the console. It's still a very productive
environment, even though the need to preserve the compatibility with a
character-oriented TeleTYpe[1]
of the '60s prevents any radical improvements. It even works well out of
the box until, at one point, it doesn't. This is a story of one such case.
I use URxvt[2] (terminal emulator), TMUX[3] (Terminal MUltipleXer), and ZSH[4] (shell) as my console. I switched to URxvt for its Unicode
support, which wasn't common at the time. The relatively small number of issues
with display and input in commonly used applications was a nice bonus.
The only problem I had with it is that pressing Alt+Left/Right inserts C
or D characters instead of jumping over a word. Apparently, URxvt sends a
sequence of key codes which is not, by default, mapped to anything in most
programs. There's a very simple fix for it, at least for programs that use
the Readline[5] library. Readline is configurable, and you can define your own
bindings in the ~/.inputrc file (see the docs);
this is what I use, for example:
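Assuming the sequences URxvt sends for Alt+Right and Alt+Left are \e[1;3C and
\e[1;3D (which is also what we'll see edlin receive later in this post), the
bindings are, roughly:

"\e[1;3C": forward-word
"\e[1;3D": backward-word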
With just two lines, line editing becomes mostly a solved (for me) problem.
Mostly. Though they are few and far between, some programs don't use Readline, or
emulate it only half-heartedly (thereby ignoring the config file). One such
program is the Erlang (and, by extension, Elixir) shell.
BTW: Yes, I do know about M-b and M-f. I use Emacs. I just really like
my arrow keys, ok?
The journey begins
One Saturday evening, after a decade of putting up with it, I decided to try
fixing the problem. I'm honestly not sure what I was thinking, I must've
been bored out of my mind. I mean, honestly, who'd want to dig through
entirely undocumented, complex internals of a multi-process beast with job
control and remote capabilities...? (No, please don't answer, thanks.)
Once I resolved to go down the rabbit hole, I searched the net for
docs or posts explaining how to customize anything more than the prompt in
the shell. It took quite a few tries to get anything at all, but I
finally found two (just two) relevant posts. One is
a post from 2013 (by
the author of "Learn You Some Erlang"[6], about the shell's architecture) and the
other a StackOverflow post answering
the question of how to add Readline features to Elixir's IO.gets and friends.
The TL;DR from these two is basically "go look at
edlin.erl". On a diagram (borrowed from the first post) of processes (and
modules) the shell consists of it's the part marked in red:
edlin.erl is part of the stdlib application and can be found at
./lib/stdlib/src/edlin.erl, relative to the Erlang build directory root. You
can see the whole file here.
I made a fresh clone of the Erlang/OTP sources to avoid breaking my
system-wide installation in case something went wrong. Compilation took some
time, but it finished without problems.
Inside edlin.erl there's a state machine[7] used for parsing the incoming key codes into atoms denoting
actions. It's a common Erlang pattern, where Tail Call Elimination is
exploited to encode state transitions as normal function calls. It looks
roughly like this:
edit([C|Cs], P, {Bef,Aft}, Prefix, Rs0) ->
    case key_map(C, Prefix) of
        meta ->
            edit(Cs, P, {Bef,Aft}, meta, Rs0);
        meta_o ->
            edit(Cs, P, {Bef,Aft}, meta_o, Rs0);
        meta_csi ->
            edit(Cs, P, {Bef,Aft}, meta_csi, Rs0);
        meta_meta ->
            edit(Cs, P, {Bef,Aft}, meta_meta, Rs0);
        {csi, _} = Csi ->
            edit(Cs, P, {Bef,Aft}, Csi, Rs0);
        meta_left_sq_bracket ->
            % ... more cases ...
        {undefined,C} ->
            {undefined, {none,Prefix,C}, Cs, {line,P,{Bef,Aft},none}, reverse(Rs0)};
        Op ->
            case do_op(Op, Bef, Aft, Rs0) of
                {blink,N,Line,Rs} ->
                    edit(Cs, P, Line, {blink,N}, Rs);
                {Line,Rs,Mode} -> % allow custom modes from do_op
                    edit(Cs, P, Line, Mode, Rs);
                {Line,Rs} ->
                    edit(Cs, P, Line, none, Rs)
            end
    end;
With the key_map function defined like this (out of order excerpts):
key_map($\^A, none) -> beginning_of_line;
key_map($\^B, none) -> backward_char;
key_map($\^D, none) -> forward_delete_char;
% ... more clauses ...
key_map($B, meta) -> backward_word;
key_map($D, meta) -> kill_word;
key_map($F, meta) -> forward_word;
key_map($T, meta) -> transpose_word;
% ... even more clauses ...
To be perfectly honest, the fact that it's a state machine wasn't obvious to
me at first. My Erlang is a bit rusty, and one-letter identifiers don't make
it the most readable code ever. It's also not trivial to see what sequence of
codes will actually be sent to the edit function. I had to dig deeper.
Debugging the shell
First of all - this may be obvious, but in this case tried-and-true debugging
with prints doesn't work. From within edlin you get no direct access to
the terminal, which makes sense, given that it's itself a part of the terminal
handling code. This tripped me up a bit in the beginning.
Fortunately, Erlang has an excellent graphical debugger, which you can
attach to any process. To actually make use of it, you need to first reload
the module you want to debug with its instrumented, interpreted version. This
is done with the int (or :int in Elixir) module[8]. Unfortunately, when I tried, it didn't work:
-▶ ./bin/erl
Erlang/OTP 24 [RELEASE CANDIDATE 1] [erts-11.2] [source-444144870c] [64-bit]
Eshell V11.2 (abort with ^G)
1> int:ni(edlin).
=ERROR REPORT==== 29-Mar-2021::20:37:21.516194 ===
Can't load module 'edlin' that resides in sticky dir
** exception error: no match of right hand side value {error,sticky_directory}
in function int:'-load/2-fun-0-'/3 (int.erl, line 531)
in call from int:load/2 (int.erl, line 527)
Apparently, the Erlang code server has
a list of "sticky dirs" - modules living in them are not to be reloaded.
Makes sense, most of the time. There has to be a way of disabling it though,
right? Yes, there is - you can disable it globally with the -nostick flag, or
per directory or module, like this:
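For example (a sketch - code:unstick_mod/1 and code:unstick_dir/1 are the
relevant calls; the exact directory argument is illustrative):

1> code:unstick_mod(edlin).
2> code:unstick_dir(code:lib_dir(stdlib) ++ "/ebin").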
Unfortunately, that's still not enough. Apparently, to be interpretable, a
module has to be compiled in a special way to include some debug data. If it
isn't, you will get the following error:
But then you have to remember to put the compiled file in the correct ebin
directory yourself. Alternatively, you can pass +debug_info to the
erlc invocation (as you can see in its help message, +term "passes
the term unchanged to the compiler"):
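Roughly (run from the root of the OTP checkout; any directory on the load
path will do as the output directory):

-▶ erlc +debug_info -o lib/stdlib/ebin lib/stdlib/src/edlin.erl

With the recompiled module in place, loading its interpreted version and
starting the graphical debugger looks more or less like this:

1> code:unstick_mod(edlin).
2> int:ni(edlin).
3> debugger:start().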
In the newly opened window, click on Module menu and select edlin ->
View. Then scroll down to the line you want to break on and double-click
it (anywhere on the line). It looks like this on my computer (click to
enlarge):
Now, when you switch back to the terminal and press a key you're interested
in... nothing will happen! Instead, the process will reach the breakpoint
and will stop. This is indicated by the break value showing up in the
status column in the main window:
To actually start debugging, you need to use Process -> Attach menu item
with the edlin process selected. It will open a window with the code, list
of local variables with their values, and buttons for stepping over and into
calls. Just remember that for the debugger to work, the module you want to
debug has to be instrumented. If you try to step outside of the edlin
module you won't see anything.
This is what the debugger looks like in action (click to enlarge):
Getting back to the code
After stepping through the execution of the edit/5 function a couple of times I was
able to guess a few things. Here's the function head again:
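edit([C|Cs], P, {Bef,Aft}, Prefix, Rs0) ->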
The first argument is a list of keycodes (as integers, which also happens
to be how Erlang encodes strings, which helps with the preview of values).
C is the current code, while Cs contains the rest of the codes to be
parsed. This argument is the input half of the state machine: each code
drives a single state transition.
The second argument is a prompt, as a string. It's not used much and can
be ignored in this case.
The third argument is a pair of strings. They are the result of splitting
the current line at cursor position: Bef keeps the part on the left of
the cursor, and Aft the part from the cursor to the end of line. These
change when inserting or deleting characters, but in this case they stay
constant, so the argument can be ignored.
The fourth argument, Prefix, is an atom (or a tuple of an atom and a
string, as we'll see in a moment) which says in what state the parser is
currently. This may be none - a starting state; meta - after a
modifier key was pressed; meta_meta - if we found two escape characters
in a row - and quite a few other values. This is the second part of the
state machine.
The last argument is, I think, a list of low-level commands (called
"requests") for the TTY driver to add or remove characters, move the cursor,
blink, and so on. Since I don't need to add any new functionality here, it
is also safe to ignore for now.
The key_map function takes the next key code and the current state. It
then returns the next state. The edit function interprets the new state
and either loops to parse the rest of the codes list, or returns a list of
commands for the driver to execute.
Recall my .inputrc: the terminal should send the following key code
sequence when Alt+Left is pressed (BTW: you can use Control+V[9] in the shell to quickly check what keycodes are sent):
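\e[1;3D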
Looking at the values of the C and Cs variables in the debugger proves that,
indeed, this is what edlin receives. For the record: the numeric value of \e is
27, which you can see in the screenshot. Here is the sequence of key_map
function calls (in other words, states of the parser) when given this
sequence of codes:
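(This is my reconstruction from the debugger session and the key_map clauses
above, shown for the variant edlin does recognize, \e[1;5D; with the \e[1;3D
my terminal sends, there is nothing to match the $3 and the parse goes
nowhere.)

key_map($\e, none)                 -> meta
key_map($[,  meta)                 -> meta_left_sq_bracket
key_map($1,  meta_left_sq_bracket) -> {csi, "1"}
key_map($;,  {csi, "1"})           -> {csi, "1;"}
key_map($5,  {csi, "1;"})          -> {csi, "1;5"}
key_map($D,  {csi, "1;5"})         -> backward_word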
The first character - \e - puts the parser in the meta state, the next -
[ - in (aptly named) meta_left_sq_bracket. I couldn't guess what the
csi atom was supposed to abbreviate; it turns out csi stands for
"Control Sequence Introducer", a special code which causes the
terminal to interpret the following codes instead of passing them to a
program. Here it collects the key codes that follow \e[. Then finally, if
all the codes in between match, we get to forward_word and backward_word
states, which are passed to the do_op function in the last case of edit.
Once there are no more codes to parse, edit returns a list of rendering
commands, tagged with either blink (self-explanatory), more_chars (Enter
was not pressed), or done (along with the full text of the line).
The problem and the fix
As you can see from the above code, edlin recognizes \e[1;5C and \e[1;5D
as valid sequences of key codes, while my terminal sends codes with 3
instead of 5.
To fix this, the only thing needed is to add three new states to the
key_map function, like this:
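% A sketch - the exact clauses in the patch may differ cosmetically:
key_map($3, {csi, "1;"})  -> {csi, "1;3"};
key_map($C, {csi, "1;3"}) -> forward_word;
key_map($D, {csi, "1;3"}) -> backward_word;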
First, we make encountering $3 while in {csi, "1;"} state valid, and
make it transition to a csi tuple (with 3 appended) as the next state.
After that, we only need to handle the $C and $D characters when in the
{csi, "1;3"} state by returning the forward_word and backward_word
atoms. That's it!
Turns out that yes, the three lines above are all it took to scratch my
decade-old itch. As usual on such occasions, the solution is much less
interesting than the path to it... Well, being able to use arrow keys is
still nice, at least.
Afterword
Despite some early setbacks, the fix turned out quite simple. When I
realized that there's a finite state machine in the code, the rest was
relatively straightforward. The Erlang encoding of a state machine using just
function calls and pattern-matching on states (codes so far) and transitions
(current key code) is elegant and lightweight. It's quite similar to how one
would write a state machine in Prolog, especially given the Prolog-based
syntax of Erlang.
I used the word "parse" a few times in the post. This isn't an accident:
parsers can often be represented by a state machine. The edit function is
a stateful parser as well as a state machine. It parses a grammar consisting
of (among others) \e, [, 1, ;, 3 terminal tokens, with state atoms
being the production rules.
It's correct to say that the code illustrated here either "interprets" (if you
focus on the state machine being run), or "parses" (if you focus on the
allowed sequences of states) the input codes list.
That's it, I hope you enjoyed the post as much as I enjoy my arrow keys
working properly in the Erlang shell :-)
The name stands for Fuzzy Find in Project Fast, thanks for
asking. Believe me, I tried to come up with a better name for the project...
and failed miserably. Sorry. I'd be forever grateful for better name
suggestions.
The project is in a proof-of-concept stage and my main reason for this
writeup is to see if there's any interest in something like this. I'm happy
with the functionality it has, but it's nowhere near polished enough for
other people to easily reuse.
So what is it?
ffipf is a backend for fuzzy matching file paths. Its
scoring and sorting algorithm is better suited for matching paths compared
to more general fuzzy matching implementations, like Helm and Ivy. It's
written in Nim and compiled to native code, which makes it faster and more
memory-efficient than other solutions. The compiled DLL is a proper Emacs
module and can be loaded as if it was Elisp.
Let's go over the features in a bit more detail next.
Better matching and sorting algorithm
The matching algorithm is designed specifically for patterns targeting
paths. It matches each segment of the pattern, delimited by /, fuzzily,
but the segments themselves have to match in sequence. The sorting is based
on how well the path conforms to the pattern, with the path most closely
resembling the pattern at the top. According to the author of the Ruby
original[2] that I ported to Nim, the algorithm is close to the one used by
TextMate[1].
In practice, this means that you can skip large parts of the path and input
very few characters, yet still arrive at the correct file. You can also use it
to quickly list files within a set of similar directories, or files
matching some pattern no matter where they are in the hierarchy.
Here are some examples of patterns I'd use to search for files, along with the
results they produce. The examples come from my .emacs.d directory, and the
output is shortened (you can easily change how many candidates are returned).
-▶ ./ffip
> fo/mag/mag.el # wanted: magit.el
.emacs.d/plugins-forked/magit/lisp/magit.el
.emacs.d/plugins-forked/magit/lisp/magit-tag.el
.emacs.d/plugins-forked/magit/lisp/magit-wip.el
.emacs.d/plugins-forked/magit/lisp/magit-git.el
.emacs.d/plugins-forked/magit/lisp/magit-log.el
.emacs.d/plugins-forked/magit/lisp/magit-pkg.el
.emacs.d/plugins-forked/magit/t/magit-tests.el
.emacs.d/plugins-forked/magit/lisp/magit-core.el
.emacs.d/plugins-forked/magit/lisp/magit-stash.el
> stg/ # wanted: all files in plugins-staging/
.emacs.d/plugins-staging/f3/.gitignore
.emacs.d/plugins-staging/f3/LICENSE
.emacs.d/plugins-staging/f3/README.md
.emacs.d/plugins-staging/f3/create-markdown.coffee
.emacs.d/plugins-staging/f3/f3.el
.emacs.d/plugins-staging/f3/package-lock.json
.emacs.d/plugins-staging/f3/package.json
.emacs.d/plugins-staging/f3/update-commentary.el
.emacs.d/plugins-staging/ecb/ecb-advice-test.el
> stg/.el # wanted: all Elisp files in plugins-staging/
.emacs.d/plugins-staging/f3/f3.el
.emacs.d/plugins-staging/ecb/ecb.el
.emacs.d/plugins-staging/elx/elx.el
.emacs.d/plugins-staging/esup/esup.el
.emacs.d/plugins-staging/doom/init.el
.emacs.d/plugins-staging/doom/core/autoload/ui.el
.emacs.d/plugins-staging/unfill/test.el
.emacs.d/plugins-staging/doom/core/cli/env.el
.emacs.d/plugins-staging/ecb/ecb2/test.el
.emacs.d/plugins-staging/doom/core/core.el
> plu/REA # wanted: all plugin READMEs
.emacs.d/plugins-forked/xr/README
.emacs.d/plugins-staging/ecb/README
.emacs.d/plugins-forked/muse/README
.emacs.d/plugins-forked/distel/README
.emacs.d/plugins-forked/lua-mode/README
.emacs.d/plugins-forked/yaml-mode/README
.emacs.d/plugins-forked/s/README.md
.emacs.d/plugins-forked/f/README.md
.emacs.d/plugins-forked/a/README.md
.emacs.d/plugins-forked/ht/README.md
.emacs.d/plugins-forked/gh/README.md
> co/mini/h # wanted: my config file for Helm
.emacs.d/config/interface/minibuffer/my-helm-config.el
> co/mini/ # wanted: all config files related to minibuffer
.emacs.d/config/interface/minibuffer/my-helm-overrides.el
.emacs.d/config/interface/minibuffer/my-ido-config.el
.emacs.d/config/interface/minibuffer/my-selectrum-config.el
.emacs.d/config/interface/minibuffer/my-helm-config.el
.emacs.d/config/interface/minibuffer/my-yes-or-no-prompt.el
.emacs.d/config/interface/minibuffer/my-ivy-config.el
.emacs.d/plugins-forked/selectrum-group/marginalia/.gitignore
.emacs.d/plugins-forked/selectrum-group/marginalia/LICENSE
.emacs.d/plugins-forked/selectrum-group/marginalia/README.md
The details of the scoring algorithm are a bit more complex, but the effects
are very satisfactory in my opinion.
Native module and execution speed
The main functionality is implemented in Nim and compiled to native code.
Nim is a high-level language with a Pythonesque syntax, but its performance
is closer to C or C++. Thanks to Yuuta Yamada's[3]
work, it's possible to easily write Emacs extension modules in Nim.
My .emacs.d has nearly 40000 files under it. This is a lot, and simply
traversing the directory hierarchy takes time; when you add the time needed
to process the list of files in Emacs Lisp, the invocation of (for example)
counsel-file-jump can take close to 2 seconds to initialize. Further,
filtering can also feel sluggish during the input of the first few
characters (it gets better when the pattern gets longer).
Traversing the whole hierarchy, or initializing the search, takes just 0.3
seconds with the Nim implementation. Moreover, the feedback - displaying
candidates as you type - is always instantaneous.
There are downsides to using native code. For example, it's possible
for the module to crash Emacs as a whole if a segfault is
triggered in the code. However, Nim makes it significantly harder to shoot
yourself in the foot like that. Almost all of the module is written in a
"safe" subset of Nim, and the only place where a crash is possible is in the
parts which interact with the Emacs module API. Fortunately, after wrapping the
raw API calls with helper procedures, the chance of triggering an unrecoverable
error also goes down drastically.
Another downside is that you need to compile the module first in order to
use it. Fortunately, Nim is much easier to compile than C,
and it's available for all major platforms. After installing Nim you're just
one make or nimble build command away from a working module. It's also
possible to distribute binaries in case you don't want to install Nim, but
that's something for the future (I have no way of cross-compiling for
non-Linux OSes currently).
Displaying candidates
Currently, I use Ivy for displaying candidates and the built-in project.el
for finding the root directory of the current project. The interface is very
basic - for example, it doesn't highlight the parts which were matched - but it
does the trick.
That being said, my main focus is on the backend, the Nim-based dynamic
module. It should be easy to write a Helm source for it or interface with it
through Selectrum, Ido, or any other completion framework.
The main question
Before I start working on making the implementation bulletproof and usable
for others, I need to know if there's any interest in a module like
this. The code is currently nearly there in terms of features that I want it
to have, and if it's for my personal use only, I can slowly improve only the
parts I need. On the other hand, if there's any interest, I
would need to clean up the code, remove all the assumptions specific to my
setup, and add configuration options at the very least. For example, the
directory blacklist (list of dirs that should not be traversed) is currently
hardcoded on the Nim side, which doesn't bother me but could be a
problem for others.
So here's the question: would you be interested in a blazing fast fuzzy file
finder for your Emacs?
The shorter syntax is convenient and works really well with map/mapcar
and other higher-order functions. It is, however, absent in Emacs Lisp.
Some time ago I found an implementation of short
lambda syntax for Emacs Lisp. It's a simple macro which expands to
a lambda expression. You can test it like this:
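For instance (assuming, as the #f examples later in this post do, that % names
the argument inside the macro):

(funcall (short-lambda (* 2 %)) 21) ;; ⇒ 42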
The implementation, however, is incomplete. In the words of the author:
This assumes that there is a reader macro in the Emacs C code that
translates #(STRUCTURE) to (short-lambda STRUCTURE), in the same
way as for the backquote macro.
Indeed, as it is, it's even longer than a normal lambda - clearly there's
something missing. Namely: support for the short lambda in the reader,
which is implemented in C.
Implementing the missing part
How hard would it be to add the missing support to the reader? When I was
thinking about this, I noticed that there is already something similar in
Elisp, namely - hash-table syntax. It looks like this:
(make-hash-table)
;; ⇒ #s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
The #s(...) syntax is supported by the reader, ie. hash tables can be read
with the same syntax they are printed in.
Wouldn't it be easy to create a short lambda syntax by changing #s to
#f? Turns out - yes, it is easy. It took me an afternoon or two to figure
it out, with no prior experience with the Emacs C codebase. The whole
implementation is less than 5 lines of code.
Changing the reader
First, I located the hash-table reader code in lread.c, which is where the
reader lives. There's a function called read1, where the hash-table
support is implemented. It looks like this (reformatted for brevity):
static Lisp_Object
read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
{
  int c;
  // ... more variables here ...

 retry:
  c = READCHAR_REPORT_MULTIBYTE (&multibyte);
  if (c < 0)
    end_of_file_error ();

  switch (c)
    {
    case '(':
      return read_list (0, readcharfun);
    case '[':
      return read_vector (readcharfun, 0);
    case ')':
    case ']':
      {
        *pch = c;
        return Qnil;
      }
    case '#':
      c = READCHAR;
      if (c == 's')
        {
          // ... lots of code specific to hash-tables ...
The only thing needed to add support for the short lambda is to change the
if (c == 's') check to this:
if (c == 'f')
  {
    return list2 (Qfn, read0 (readcharfun));
  }
else if (c == 's')
  {
    // ... lots of code specific to hash-tables ...
The list2 function creates a list of two elements, where the first one is
a symbol. Defining that symbol at the end of lread.c is the second, and last,
change to the C code (fn is an alias for short-lambda):
diff --git a/src/lread.c b/src/lread.c
index 015bcc2e11..42fc4050ae 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -2880,7 +2880,10 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
     case '#':
       c = READCHAR;
-      if (c == 's')
+      if (c == 'f') {
+        return list2 (Qfn, read0 (readcharfun));
+      }
+      else if (c == 's') {
         c = READCHAR;
         if (c == '(')
@@ -5168,6 +5171,7 @@ syms_of_lread (void)
   DEFSYM (Qload_force_doc_strings, "load-force-doc-strings");
   DEFSYM (Qbackquote, "`");
+  DEFSYM (Qfn, "fn");
   DEFSYM (Qcomma, ",");
   DEFSYM (Qcomma_at, ",@");
When is it useful?
As mentioned, the shortened syntax works well with higher-order functions.
It's not essential, but it is convenient. Especially if you use libraries
like dash.el, which give you a lot of such functions.
Just yesterday I was writing a bit of code to replace literal characters
with HTML entities. I fetched the list of entities from the gist published
by Christian Tietze (blog post,
the gist),
and started coding. I had to parse the list of entities and needed to break
each line into its components: the character, the entity name, and the description. The whole
code looks like this:
(defconst html-entity-list
  '("char: \" entity: &quot; text: quotation mark (APL quote)"
    "char: & entity: &amp; text: ampersand"
    "char: ' entity: &apos; text: apostrophe (apostrophe-quote); see below"
    ;; ....
    ))

(require 's)
(require 'dash)
(require 'short-lambda)

(defconst html-entities-lookup-table
  (-> (-juxt #f(second (s-match "char: \\(.\\) " %))
             #f(second (s-match "entity: \\(&.+;\\) " %))
             #f(second (s-match "text: \\(.+\\)" %)))
      (-map html-entity-list)))

(defun my-html-replace-entity (p)
  (interactive "d")
  (let* ((ch (buffer-substring-no-properties p (1+ p)))
         (rep (-find #f(equal ch (car %)) html-entities-lookup-table)))
    (delete-char 1)
    (insert (second rep))))
There are 4 lambdas in the code - were it not for the short lambda, I would
probably write this code differently. Using them, though, the code ended up
being short and readable, without the need for any of the heavier
abstractions.
That's it, so... why?
That's really everything you need to add short lambda support to Emacs Lisp. I
have had this implemented in my Emacs for a few years now and I use the
#f() syntax regularly. It's convenient. It's easy to implement. I wonder
from time to time - why is it still not implemented in Emacs?
Please let me know if you know the reason!
EDIT: so, um, yeah, one reason may be that nobody suggested
this as a feature yet. I'm stupid, it totally slipped my mind. I assumed it
was proposed already, for sure, given that the short-lambda.el repo[2] is 6 years old at this point. But I didn't check. My bad!
UPDATE: rewritten, non-recursive reader in current master (2022-09-22)
When I pulled Emacs repo the other day and built it, I realized that my
short-lambda stopped working. Apparently, the reader got rewritten to be
nonrecursive back in May, and my patch obviously stopped being applicable.