Better poly than sorry!

ffipf - jump to file in a project quickly (PoC)

Using Emacs loadable modules and native Nim code to speed up the file search

Last updated on:

NOTE: As usual, source code is on Github

ffi... what?

The name stands for Fuzzy Find in Project Fast, thanks for asking. Believe me, I tried to come up with a better name for the project... and failed miserably. Sorry. I'd be forever grateful for better name suggestions.

The project is in a proof-of-concept stage and my main reason for this writeup is to see if there's any interest in something like this. I'm happy with the functionality it has, but it's nowhere near polished enough for other people to easily reuse.

So what is it?

ffipf is a backend for fuzzy matching file paths. It's scoring and sorting algorithm is better suited for matching paths compared to more general fuzzy matching implementations, like Helm and Ivy. It's written in Nim and compiled to native code, which makes it faster and more memory-efficient than other solutions. The compiled DLL is a proper Emacs module and can be loaded as if it was Elisp.

Let's go over the features in a bit more detail next.

Better matching and sorting algorithm

The matching algorithm is designed specifically for patterns targeting paths. It matches each section of the pattern, delimited by /, fuzzily, but then matches each segment in sequence. The sorting is done based on how well the path conforms to the pattern, with the path most closely resembling the pattern at the top. The algorithm is said to be close to the one used by TextMate[1] by its author, whose code I ported to Nim from the Ruby original[2].

In practice, this means that you can skip large parts of the path and input very few characters, yet still arrive at the correct file. You can use it to also quickly list files within a set of similar directories, or files matching some pattern no matter where they are in the hierarchy.

Some examples of patterns I'd use to search for some files and results for them. Examples are from my .emacs.d directory, and the output is shortened (you can easily change how many candidates are returned).

  -▶ ./ffip
  > fo/mag/mag.el                       # wanted: magit.el

  > stg/                                # wanted: all files in plugins-staging/

  > stg/.el                             # wanted: all Elisp files in plugins-staging/

  > plu/REA                             # wanted: all plugin READMEs

  > co/mini/h                          # wanted: my config file for Helm

  > co/mini/                           # wanted: all config files related to minibuffer

The details of the scoring algorithm are a bit more complex, but the effects are very satisfactory in my opinion.

Native module and execution speed

The main functionality is implemented in Nim and compiled to native code. Nim is a high-level language with a Pythonesque syntax, but its performance is closer to C or C++. Thanks to Yuuta Yamada[3] work it's possible to write Emacs extension modules easily in Nim.

My .emacs.d has nearly 40000 files under it. This is a lot, and simply traversing the directory hierarchy takes time; when you add the time needed to process the list of files in Emacs Lisp, the invocation of (for example) counsel-file-jump can take close to 2 seconds to initialize. Further, filtering can also feel sluggish during the input of the first few characters (it gets better when the pattern gets longer).

Traversing the whole hierarchy, or initializing the search, takes just 0.3 of a second with the Nim implementation. Moreover, the feedback - displaying candidates as you type - is always instantaneous.

There are downsides to the usage of native code. For example, it's possible for the module to crash Emacs as a whole in case there's a segfault triggered in the code. However, Nim makes it significantly harder to shoot yourself in the foot like that. Almost all of the module is written in a "safe" subset of Nim, and the only place where a crash is possible is in the parts which interact with Emacs module API. Fortunately, after wrapping the raw API calls with helper procedures, the chance of triggering unrecoverable error also goes down drastically.

Another downside is that you need to compile the module first in order to use it. Fortunately, Nim is much easier to compile than C, and it's available for all major platforms. After installing Nim you're just one make or nimble build command away from a working module. It's also possible to distribute binaries in case you don't want to install Nim, but that's something for the future (I have no way of cross-compiling for non-Linux OSes currently).

Displaying candidates

Currently, I use Ivy for displaying candidates and a built-in project.el for finding the root directory of the current project. The interface is very basic, for example it doesn't highlight the parts which were matched, but it does the trick.

That being said, my main focus is on the backend, the Nim-based dynamic module. It should be easy to write a Helm source for it or interface with it through Selectrum, IDo, or any other completion framework.

The main question

Before I start working on making the implementation bulletproof and usable for others I need to know if there's any interest towards a module like this. The code is currently nearly there in terms of features that I want it to have, and if it's for my personal use only, I can slowly improve only the parts I need. On the other hand, if there's any interest, I would need to clean up the code, remove all the assumptions specific to my setup, and add configuration options at the very least. For example, the directory blacklist (list of dirs that should not be traversed) is currently hardcoded on the Nim side, which doesn't bother me, while it could be a problem for others.

So here's the question: would you be interested in a blazing fast fuzzy file finder for your Emacs?