On Making Python Lisp Again
On Making Python Lisp Again
Once upon a time, there was a voice claims that “Python is the new Lisp”, which is obviously, not true. With the rapid developement of the buzz-word industry, Python has gained far more attention than its surface language deserves. While having an OK-ish VM, Python the language is recklessly wasting its natural advantage.
I, a Lisp programmer like you can find in any neighborhood, tried to escape from the stupidity of Python while can still write codes that runs in my working envrionment, have decided to write a simple compiler that compiles a small Lisp into Python’s bytecode, so that I can write whatever I want to and produce a binary file that my colleagues can use.
Naming
I named this small Lisp dialect as LyPPS, after the idol group LiPPS from THE iDOLM@STER, hope there is no legal issue on this lol.
Components
Parser
I have written a simple and straightforward combinatoric parser library in Python, hence the parsing is as easy as:
id_char = lambda c: c.isdigit() or c.isalpha() or c in ".?!$%^&*_+/|~-"
lead = lambda c: c in "#@:'`,"
@parser
def atom(s):
return p(any, id_char)(s) >> (
lambda pair: ([("atom", chars2str(pair[0]))], pair[1])
)
@parser
def string(s):
return (p(one, __ == '"') >> p(pany, __ != '"') << p(one, __ == '"'))(s) >> (
lambda pair: ([("string", chars2str(pair[0]))], pair[1])
)
@parser
def cons(s):
return (p(one, __ == "(") >> many(prefix) << p(one, __ == ")"))(s) >> (
lambda pair: ([("cons", *pair[0])], pair[1])
)
@parser
def sexp(s):
return (spc >> (atom | string | cons) << spc)(skip_comment(s))
@parser
def prefix(s):
pref = p(one, lead)(s)
if not pref:
return sexp(s)
lead_char, rest = pref.from_right(([], []))
val, rest = prefix(rest).from_right(([], []))
if not val:
return left([], s)
return right([("prefix", lead_char[0], val[0])], rest)
With some simple and stupid preprocessing, the input sexp will be turned into your familiar tuples / lists.
Lambda and Scoping
The scoping rule of Python is daft, there are two kinds of scopes: CellVar
and FreeVar
, to explain it with some kindergarten level C++ code:
auto a = bla;
auto f = [a] (auto b) {
auto g = [a, b]() { return a + b; };
return g;
};
For f
, a
is FreeVar
, which is variable captured by a lambda expression, and b is a CellVar
, which is captured by an inner closure g
, and for g
, both a
and b
are FreeVar
s.
In LyPPS, to support currying by default, I designed the lambda expressions to take at most one argument, with an explicit “capture list” to fit my muscles trained by C++. Your familiar Scheme code will be turned into:
; Scheme
(define (foo (a b c)
(+ a b c)))
; LyPPS
(def foo
(lambda () (a)
; ^ capture list
(lambda (a) (b)
; ^ capture list
(lambda (a b) (c)
; ^ capture list
(+ a b c)))))
AST Transformation
Once the code is parsed, we need to identify the special forms and transform the tuples and lists into something that our backend likes, and some desugaring such as turning a lambda with multiple arguments into the curried form, and turning a function apply into a curried apply. E.g.:
; surface lang
(def foo (lambda () (a b c) (+ a b c)))
(foo 1 2)
; compiled
(def foo
(lambda () (a)
(lambda (a) (b)
(lambda (a b) (c)
(+ a b c)))))
((foo 1) 2)
Codegen
There is not much to say about the codegen, just going down the AST and generate the corresponding bytecodes (thanks to the bytecode library), this should be as easy as breathe for any programmer who dares to put Lisp on her LinkedIn profile.
Generating .pyc
Since my workplace forced us to used Python 3.6, some handy features such as source_hash
are not available, and I was just too lazy to backport the stuffs from Python 3.7’s C code to Python 3.6, I did a little research on Python 3.6’s code object generation, the core implementation looks like this:
with open(filename, 'wb') as fout:
data = bytearray(MAGIC_NUMBER)
loader = importlib.machinery.SourceFileLoader("<lps>", filename)
stats = loader.path_stats(filename)
mtime = stats["mtime"]
source_size = stats["size"]
data.extend(_w_long(mtime))
data.extend(_w_long(source_size))
data.extend(marshal.dumps(co))
fout.write(data)
What are Left Behind
Macro
A Lisp dialect without a macro system is just an S-expression translater that an elementary school kid can finish within a weekend.
The approach that LyPPS used is to combine an interpreter into the compiler, so that during the AST transformation phase, macros can be evaluated.
Graph
The current implementation is simple tree traverse, it might help future developement if I make a graph out of the ast, so that more checkings can be done at compile time.
Python FFI
This wouldn’t be a big deal, since import
is native for bytecode.
Inline Bytecode
Just like your favourite __asm
blocks in C codes, nothing special, I might be able to finish it on my way to office tomorrow morning.
Open Source
Yea, since LyPPS was initially developed for my project at my company, I need to get through some process to get it published, hopefully I can do GPL.
Why GPL?
Remember:
FREE SOURCE, FREE MIND