TopHome
<2024-09-29 Sun>techlisp

Lessons learnt from the Jaratkaru attempt

So, as an exercise over 10 days, I attempted yet another go at creating a lisp iterpreter inspired from lispy and mal. My previous attempt was jane which felt like cheating, since the host language was itself a Lisp.

So, I succeeded in parts, in that with Jaratkaru I got further than I ever have, and yet couldn't meet the initial fanciful target I had. This blog is a review of the lessons learnt in the process, a document for my eventual next attempt.

1. Choice of Host language

In the initial attempt, I wanted to go with nim or Ocaml as the host language - mostly for the "performance" angle. Then it struck me that this was going to be an uphill task anyway, so I might make it easier for myself by going with a dynamic and simple language like Python. This certainly helped in some aspects.

2. Parsing

When following along with lispy, I was stuck with the parsing function. Somehow, I was uncomfortable with the recursive implementation. Parsing was not a problem in jane, but it was one of the barriers here.

This time, I had just spent sometime implementing one4 - an attempt at a Forth interpreter. I had spent a few days wrestling with the stack operations and the difficulty of implementing if-else using an assembly like jumping mechanism that the Stack was in my head as the supreme data structure and sure enough, applying it in the context of parsing made it super easy for me to break through the parsing barrier.

So, there are atleast 4 ways to do parsing here:

  1. Lispy style single recursive function.
  2. Mal style mutually recursive functions of parse and parsesexp.
  3. Writing a parser using a parser generator like yacc or PEG grammers. This is actually the most generic way to write a parser and you have to do this for any other language apart from Lisp. For Lisp, give the already simple nature of syntax, you can do something much simpler as done in 1 or 2.
  4. Explicitly modelling the stack. So, basiclly as you go through tokens, you keep pushing them onto a stack. When you reach a ")", you pop the elements from the stack repeatedly until you reach a "(" and make the whole list into a single S-expression and then push that S-expression back on the stack and continue reading in tokens. This approach makes explicit what is implicitly done in the approaches 1/2 on the call stack - but this makes it much easier to understand and implement.

3. Implementing location tracking

One thing that went very well is implementing location tracking in tokens and then tracking these tokens in the final sexp entities. This meant that for any error - be it parse time or runtime, we know the source location - line number, character of the error.

user> (+ a 10)
Unbound symbol used:  a at character 3
(+ a 10)
--^

This sort of error information was very useful to include, even in the early stage of debugging and running things when arbitrary things were failing. I implemented tracking and reporting of errors ranging from unbalanced paranthesis to malformed expressions - such as in correct format for let etc.

4. A basic Lisp

This time, getting to a basic lisp - with variables, functions, numeric atoms, env, if and let was no problem. This part was smooth sailing.

5. Quote Unquote

Getting quote and unquote working was the next milestone and surprisingly not too complext. The eval part of this is easy, the extra work is needed in supporting the ' ` , syntax in the parser. Surprisingly this too was not too hard.

At this stage, extending the parser to support comments and strings too was easy enough to do. (Assuming very simply case of no escaping etc).

I didn't implement unquote-splice; the algorithm listed by mal seemed complex.

Yet, this is not complete, since I do know that some combinations do not work. I am still confused about some aspects such as quoting of lists within lists.

6. Macros

Once we have the quote forms, macros were the next target. I followed through with Mal's recommended process and got an initial working version.

This one has no macroexpand and the expansion actually happens at eval time (there is no compile time for an interpreter), but a basic version seemed to work.

7. Types

Though I took up Python as the host for the dynamism, at some point I felt confused with the types being passed around and used dataclasses to make explicit the various data types. And yet, since this is Python, this means there was some weird states where the code ran, but I had mistakes with the actual types being passed around.

In hindsight, it may be better to use a statically typed language for this problem, given that your code would not even run with some subtle type bugs.

8. Builtins or exposing functions from the host

This turned out to be the biggest pain point. The problem here is making your new language useful as fast as possible, which means you need to provide fundamental built-ins from your host language.

As dynamic as Python is, it is still a problem to manage types being passed from your guest to your host and vice versa - wrapping and unwrapping as needed. This has to be done carefully for every function you want to expose - basically, you can't expect to automagically inherit a huge set of batteries into the guest without putting in the work for it.

Note: maybe transpiling, instead of interpreting, is the way to go for the last point.

9. Conclusion

My original hopes for Jaratkaru were much grander - including a transpiler to Py from Jk writted in Jk and a compiler in Jk (to a stack based VM backend such as WASM). The path to that is a bit blocked, so I stop this attempt here for now.

In conclusion - Jaratkaru is my first interpreter that may be considered working to some extent. I have crossed the milestones of tokenization, parsing and the eval loop. Experimentally, quoting and macros work too.

The problems with this round are more with type confusion - especially with exposing host level functions into the guest and when using macros. Even these may be solved with more work, but I don't have the time for deep exploration now.

But, given the learnings here, I am hopeful for the future. My next attempt will be most likely be a Lisp transpiler or compiler, with a statically typed host language. Let's see when and how that materializes.