Better poly than sorry!

Adding string interpolation to Racket

A minor update, added an illustration and another interpolation implementation suggested on HN comments.

NOTE: All the code, as usual, is on GitHub
NOTE: This is just an example of extending the Reader - it's not meant to be practical.

String interpolation is a convenient language feature which makes creating human-readable strings that contain variables easier. The classical approach to this in languages without string interpolation is something like printf - a function which accepts a format string and a variable number of arguments. The format string contains special sequences of characters which get replaced with the values of passed arguments, like this:

printf("An error occured at %s:%d.", "file-name.c", 42);
// Displays: An error occured at file-name.c:42.

This has various problems, for example the possibility to supply more (or fewer) arguments than expected. String interpolation allows you to embed variables directly into the format string:

echo "An error occured at $FILE:$LINE."

As with many features, some languages provide string interpolation and some don't. In the dark and troubled past of JavaScript, for example, some people switched to CoffeeScript for its string interpolation support (among other things.) More recently, string interpolation came to Python in the form of f-strings (PEP-498) and to plain JavaScript in the form of template literals (supported by Babel and other transpilers). While the feature is most often associated with dynamic, dynamically typed languages, some statically typed ones, for example C# and Swift, also support it, so it's not exclusive to the former.

Help! No string interpolation in Racket!

Racket - like JS at one point - lacks the string interpolation support. If Racket was JS, the only options would be to either write a transpiler, or switch to another language. Both choices normally require a lot of work, so developers tend to learn to live without the feature instead and move on.

But Racket is not JS, it's a Lisp. All Lisps are programmable programming languages[1], but Racket takes it up to eleven with the multidute of tools for language extension (and creation)[2]. How much work would it be to make Racket interpolate strings?

How Racket processes code

To process any code, Racket performs these steps:

  • reading - accomplished by two functions, read and read-syntax, reading means taking a string and returning an AST[3] (which happens to be a normal list with normal symbols, strings, numbers, etc. Such a list can be wrapped with a syntax object, which adds some meta-data to the raw list. You can always get a raw list out of syntax object with syntax->datum).
  • expansion - all the special forms and macros in the read code are expanded until only the most basic forms (modules, function applications, value literals) are left.
  • compilation - Racket performs byte-code compilation on the expanded code. This is where most of the optimizations are applied.
  • execution - compiled byte-code is handed to the VM for execution. At this stage, on some architectures, further JIT compilation happens, turning (possibly portions of) bytecode into native code.

The first two steps are completely customizable: code expansion with macros, and code reading via hooks into the Racket parser (you can also write a new parser from scratch (or generate it) if you want, but frequently it's not necessary). Symbolically, what happens during reading and expansion can be presented[4] like this:

Extending the parser

The Racket parser is a simple, recursive-descent one and it works by utilizing a readtable - a mapping from characters to functions responsible for parsing what follows the given character. The read and read-syntax functions utilize current-readtable, which is a parameter you can override. Adding a new syntax to the reader requires just three steps:

  1. create a function for parsing the new syntax into valid Racket forms
  2. get the current-readtable and add your function to it
  3. make that expanded readtable from the prev step the new current-readtable

Interpolated string syntax

We'll get to writing the parsing function in a minute, but first we need to know what is it we're going to parse and what the results of the parsing should be. The exact syntax I have in mind would look like this (with #@ distinguishing interpolated strings from normal ones):

#@"An error occured at @{file}:@{line}."

Interpolated string compilation

String interpolation, for example in CoffeeScript, is often compiled to an expression which concatenates the chunks of the string, like this:

"An error occured at #{file}:#{line}."
# would be compiled to:
"An error occured at " + file + ":" + line + "."

In Racket we have no + operator for concatenating the strings and even if we did have it, the prefix notation and resulting nesting wouldn't be pretty. The easier and more natural way is to use a string-append procedure, which - conveniently - accepts any number of arguments. The syntax introduced in the previous section would compile down to something like this:

(string-append "An error occured at " file ":" line ".")

The parsing function

The parsing function needs to take the string form as an argument and output the code as above. Here's the simplest implementation I could think of:

(define (parse str)
  ;; Assume that @{} cannot be nested and that the braces are always matched.
  ;; Obviously, in a real parsing function, these assumptions would need to be validated.
  (define lst (regexp-split #rx"@{" str))
  ;; After splitting we have a list like this: '("An error occured at " "file}: " "line}.")
  ;; We'll go over it, building a list of expressions to be passed to string-append as we go.
  (define chunks
    (for/fold ([result '()])
              ([chunk (rest lst)])        ; we don't need the first element here
          ;; convert a string into a port
          ([is (open-input-string chunk)]
           ;; call original read to get the expression from inside brackets
           [form (read is)]
           ;; read what remains in the port back into a string
           [after-form (port->string is)]
           ;; drop the closing brace
           [after-brace (substring after-form (add1 (s-index-of after-form "}")))])
        ;; ~a is a generic "toString" function in Racket
        (append result `((~a ,form) ,after-brace)))))
  ;; chunks now looks like this: ((~a file) ":" (~a line) ".")
  `(string-append ,(first lst) ,@chunks))

It's dead simple - just a couple of lines - and it glosses over important details, like the possibility of nesting the @{} expressions and syntax errors handling, but it works for the simple cases. We can easily test it, like this:

(parse "An error occured at @{file}:@{line}.")
;; Returns:
'(string-append "An error occured at " (~a file) ":" (~a line) ".")
;; which is exactly what we wanted!

The boilerplate code

The hard part ends here - the rest is just some boilerplate we need to make Racket aware of our extension to its reader. In Racket, this is often done by defining a new language - working with #lang lang-name syntax, but in our case we don't need (or want) to create a whole language (it's simpler than it sounds, but it's still a bit more work) - we just change the reader and can reuse everything else from some other language.

For situations like these, Racket supports a special syntax to its #lang line, which looks like this:

#lang reader <reader-implementation-module> <language-the-reader-will-be-used-with>
;; for example:
#lang reader "reader.rkt" racket

Generating a reader module

Let's first define reader.rkt. Writing a whole new reader is of course an option but it's a lot of unnecessary work, as the generic read and read-syntax[5] need to handle some more complex things, like module declarations, which we don't care about. Instead, we can use a special language for defining readers, which will generate appropriate read and read-syntax implementations for us provided we supply a single function to it:

#lang s-exp syntax/module-reader
#:language read
#:wrapper1 wrap-read

(require "ext.rkt")

The #:language option takes a function which accepts a string (of what remains after the reader module name on the #lang line) and returns a language name to use. A simple read happens to work here.

The #:wrapper1 option takes a function, which takes another function as input and returns the parsed code. The passed in function, when invoked, performs normal read.

Extending and installing a readtable

With #:wrapper1 option, in conjunction with parameterize, it's very easy to install our readtable[6]:

(define (wrap-read do-read)
  (define rt (make-readtable (current-readtable)
                             ;; register our function to be called when #@ is encountered
                             #\@ 'dispatch-macro
                             ;; name of the function to be called
  (parameterize ([current-readtable rt])

Wrapping parse so that it works inside readtable

The only part missing is a read-str function, which is a bit more complicated. The function passed to make-readtable may be called in two different ways, depending on whether it's used from inside read or read-syntax. The function distinguishes the cases based on a number of arguments it gets, and returns either syntax object or a simple list depending on the case. Here's its implementation:

(define read-str
    [(ch in src line col pos)
     ;; The caller wants a syntax object. The current position of the `in` port
     ;; is just after the #@ characters, before the opening quote.
         ;; Here we read the next expression (which we assume is a string)
         ([literal-string (read in)])
       ;; then we parse the contents of the string and return the result,
       ;; wrapped in a new syntax object. The `in` port is now just after the
       ;; closing quote and the parser continues from there.
       (datum->syntax #f (parse literal-string)))]

    [(ch in)
     ;; The caller wants a simple list. Let's parse the input into syntax and
     ;; transform to a list (this saves as writing the same code twice).
     (syntax->datum (read-syntax #f in))]))))

It works!

The reader extension can be used like this:

#lang reader "reader.rkt" racket

(define line 42)
(define file "file-name.rkt")
(displayln #@"An error occured at @{file}:@{line}.")
;; Displays: An error occured at file-name.rkt:42.

It also works for more complex expressions out of the box:

#lang reader "reader.rkt" racket
(require racket/date)

(displayln #@"Current date&time: @{(date->string (current-date) #t)}")
;; Displays: Current date&time: Friday, April 28th, 2017 7:57:11pm

This is of course a pretty basic as far as string interpolation features go, but it took thirty (34 to be precise) lines of code (excluding comments) to build it. Adding another twenty lines would be enough to implement error checking, and using one of the parser generators could even shorten the code.

That's it!

Aside from how easy it is to extend Racket syntax one thing deserves a mention. That is, such syntax extensions are easily composable (as long as they don't want to install extensions to the same entry in the readtable) - you can load them one by one, each extending the readtable.

The language extension and creation tools of Racket don't end there. Robust module system with support for exporting syntax extensions, powerful syntax-parse, traditional syntax-rules and syntax-case, many tools for generating parsers and lexers - all these features are geared towards meta-lingustic programming. Racket is a programming language laboratory and allows you to bend it to better suit your needs safely (ie. ensuring that only your code will be affected). Very few languages or transpilers offer that much freedom with that kind of safety guarantees.

It's an ideal environment for people who know what they need from their language - with Racket you're never stuck, waiting for the next release to support some feature. On the other hand, despite there being a great many helpers and libraries, you need to know at least some basics of programming language creation to do this yourself. The good news is that the extensions written by others are easily installable with a single command of a Racket package manager.

Addendum: doing the same without messing with the reader

A commenter on Hacker News provided an interesting solution using normal macros only. Turns out there's a special macro, called #%datum, which gets wrapped around every literal in the code (when the code is read). By default it's a noop, but we can override it with arbitrary logic. In this solution there's no need for marking strings for interpolation: all strings are interpolate-able by default. Here's the implementation:

  (rename-in racket (#%datum core-datum))
  (for-syntax syntax/parse))

(define-syntax (#%datum stx)
  (syntax-parse stx
    [(_ . x:str) (interpolate #'x)]
    [(_ . x)     #'(core-datum . x)]))

(define-for-syntax (interpolate stx)
  (define re            #rx"@{[^}]+}")
  (define (trim match)  (substring match 2 (- (string-length match) 1)))
  (define (to-stx val)  (datum->syntax stx val))
  (define (datum val)   (to-stx (cons #'core-datum val)))
  (define (source text) (to-stx (read (open-input-string text))))
  (let* ([text     (syntax->datum stx)]
         [matches  (regexp-match* re text)]
         [template (datum (regexp-replace re text "~a"))]
         [values   (map (compose source trim) matches)])
    (if (null? matches)
        (to-stx `(,#'format ,template ,@values)))))

I don't have the time to go over this line by line and explain it right now, but it's possible I'll write another post featuring this solution in the future.

  • The quote comes from Paul Graham's famous book and is attributed to John Foderaro.
  • A good starting point for learning more would be Racket is ... article by Matthias Felleisen. At least I assume it's his - it's not signed, but clearly in Matthias home directory ;)
  • Abstract Syntax Tree, see Wikipedia
  • Taken from a paper Type Systems as Macros by Stephen Chang, Alex Knauth, Ben Greenman
  • Any module, which exports these two functions, can be used as a reader.
  • In the actual source code the (make-readtable ...) part is extracted into separate procedure, which makes it easier to compose this syntax extension with others.