Typon is a work-in-progress Python-to-C++ compiler.
Untyped, idiomatic Python code can be compiled to C++, which can then be compiled to native code using a C++ compiler. Performance of generated code is on par with "real" C++ code.
Asynchronous programming is available in the form of structured concurrency primitives relying on C++20 coroutines, and native to- and from-Python interoperability is provided through CPython.
git clone https://lab.nexedi.com/typon/typon-compiler.git --recursivepython3 -m venv typenvsource typenv/bin/activatepip install ./typon-compiler
pip install typon
apt install g++-13 libfmt-dev liburing-dev python3.12-dev libssl-dev
Typon can be used from the command-line to compile a Python file:
typon [-o/--output output] [-d/--debug] [-v/--verbose] input
Once generated, the C++ code file can be compiled using your compiler of choice:
$(CXX) -O3 $(typon --cpp-flags) input.cpp
The Typon runtime is header-only, so no additional linking is required.
Currently, you'll get the best support with G++ 13.
from typon import fork, syncdef fibo(n): if n < 2: return n a = fork(lambda: fibo(n - 1)) b = fibo(n - 2) sync() return a.get() + bif __name__ == "__main__": print(fibo(20)) # should display 832040
typon forksync.pyg++-13 -O3 forksync.cpp $(typon --cpp-flags)
./forksync832040
You may be thinking “why yet another Python compiler?”. Typon was born out of the observation that none of the existing Python compilers focused on concurrency and asynchronous programming. These two paradigms have indisputably emerged as fundamental pillars of contemporary high-performance and efficiency-conscious software development.
By incorporating these elements into Python compilation, Typon empowers developers to compete on equal footing with languages like Rust and Go, particularly when it comes to creating low-overhead asynchronous code.
What sets Typon apart is its commitment to sustainability as well. By utilizing Typon, we not only attain cutting-edge programming capabilities but also contribute to the reduction of carbon emissions compared to conventional choices such as CPython or Node.js. In a world increasingly conscious of environmental concerns, Typon presents a compelling proposition for a greener, more performance-driven future in software development.
Typon is not the first Python compiler, many other projects have similar goals. Here is a comparison with some of them:
Cython | Cython+ | MicroPython | Nuitka | Codon | PyPy | Shed Skin | Typon | |
---|---|---|---|---|---|---|---|---|
Status | Mature | Experimental | Mature | Mature | Mature | Mature | Experimental | Experimental |
Platform support | Cross | ? | Cross (embedded) | Cross | Cross + WASM | Cross | ? | Linux only (relies on io-uring) |
Native codegen | AOT | AOT | JIT | n/a (AOT + Python) | AOT | JIT | AOT | AOT |
Required type annotations | Function signatures | Function signatures | No | No | ? | No | No | Class fields |
Type inference | Partial | Partial | n/a | One-way | Bidirectional (Hindley-Milner) | |||
Native execution | ✓ | ✓ | Partial – calls CPython API for many things | ✓ | ✓ | ✓ | ||
First-class types | ✓ | ✓ | ✓ | ✓ | ||||
Closures, bound methods | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Can import Python modules | ✓ | ✓ | No (freestanding) | ✓ | ✓ | ✓ | ✓ | |
Can be imported from Python | ✓ | ✓ | No (freestanding) | ✓ | ✓ | |||
Async, I/O coroutines | Not for cdef | Active objects, M:N scheduler | ✓ (+ uasyncio) | ✓ | ✓ | ✓ |
(non-exhaustive table)
See PEP 484: Typon uses the standard typing
module's notational conventions.
Typon supports ADTs in the form of tuples (product types) and unions (sum types):
from typing import Optionala = (1, "a", True) # tuple[int, str, bool]def f(x: Optional[int] = None): # alias for Union[int, NoneType] passdef g(y: str | int): # alias for Union[str, int] pass
Currently, a derivative of Hindley-Milner is used to infer types. Variables, function parameters, return types can all be inferred from the vast majority of programs, making type hints optional in the general case.
For example:
def create_listening_socket(port): # port: U, return_type R sockfd = socket(AF_INET6, SOCK_STREAM) # socket(int, int) -> socket therefore sockfd: socket sockfd.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) sockfd.bind(("", port)) # socket.bind(self, addr: (str, int)) therefore U = int sockfd.listen(BACKLOG) return sockfd # therefore R = socket
Based on socket.bind
having a signature of def bind(self, address: tuple[str, int] | str) -> None
, the compiler can infer the type of the parameter port to be int
, and the return type to be socket
.
The order of declaration of code items does not matter, mutual recursion is possible.
A unification-based system has some disadvantages though, notably non-locality of errors and the difficulty of modeling inheritance, and as such this will probably be changed to something based on bidirectional typing (see roadmap).
However, even with the current system, the type inference is quite powerful, and Python programs can be compiled often with no changes at all, with an immediate performance bump.
In the end, there is hope that Typon could re-use entire existing Python libraries as-is, by inferring all of their code.
An implementation of the Python standard library is provided, reusing the same interfaces as Python. It is implemented in the C++ side, and use io_uring for I/O accesses through the concurrent runtime. It's still a work in progress; currently only a few modules have been implemented.
The webserver example is a good starting point: it uses the sys
and socket
modules to create a static web server with a performance matching Nginx, written in fully pythonic Python code. The sockets implementation is fully asynchronous, and the server uses multiprocessing to handle concurrent requests.
As mentioned, the compiler could ultimately be able to compile existing pure Python high-level libraries, providing an immediate "free" performance gain.
Typon comes with powerful and performant concurrency primitives built tightly into the core of the language:
fork
/sync
structured-concurrencyfuture
-based "unbounded" concurrencyStructured concurrency is a concurrency paradigm where parallel execution paths are always joined before the function with created them exits.
It makes it possible to reason locally about concurrent code, since concurrent tasks will not outlive the current function. In other words, functions won't "leak" concurrency.
This makes concurrent code more intuitive and readable, and imposes fewer constraints to ensure memory and thread safety.
It also lets any exception be meaningfully propagated from a spawned task up to the parent scope.
For more, see these excellent reads:
The fork
/sync
primitives in Typon mirror the spawn
/sync
paradigm of Cilk.
Like in Cilk, fork
has continuation-stealing semantics, which enable more performant execution and have a well-defined execution order when there is only one core: if another core does not steal the continuation to execute it in parallel, everything happens as if fork
was a normal function call.
In case structured concurrency is too constraining, the future
primitive can be used to launch concurrent tasks that may outlive the parent scope, more like the go
statement of Go. Unlike go
, future
immediately returns a Future
that can optionally be used to wait for the task to complete and retrieve it's result - otherwise the task becomes detached and lives fully independantly.
Just like fork
, future
has continuation-stealing semantics.
Together, fork
/sync
and future
let concurrency be easily expressed in ways that will efficiently take advantage of the available parallelism - from local ad-hoc concurrency up to serving as building blocks for more formal concurrency models like actors.
In the future, Typon aims to provide a type system capable of guaranteeing thread-safety - taking inspiration from Rust and Pony. This type system will be tightly meshed with the concurrency primitives and will be able to leverage the properties of structured concurrency. See our roadmap for more.
The C++ runtime used by generated programs relies on a custom-made continuation-stealing concurrency runtime using cutting-edge C++20 coroutines, featuring both fork
/sync
structured concurrency and future
-based unbounded concurrency. Read more on the dedicated repository.
The origins of Typon can be traced back to its roots in Cython+, a research project from Nexedi whose goal was to extend Cython with a concurrency model inspired from Go, to enable Python to perform well on multi-core environments. Being based on Cython, Cython+ was a superset of Python, adding support for syntax structures for C types and type declarations for local variables, to build Python to efficient native code while getting rid of the GIL in specific conditions. This worked to a degree, exemplified by proofs-of-concept like "A multi-core Python HTTP server (much) faster than Go" and the explanatory article about work-stealing schedulers "Multi-threaded coroutines with work stealing in Cython".
The source code of Cython+ is available on the Nexedi forge for reference purposes. A more complete article describing Cython+ is available here: "Cython+: Extending Cython with Python-like GIL-free Abstractions".
Nowadays, the efficiency and performance of systems increasingly hinge on the utilization of multi-core processors. Notable examples include the ERP5 platform, Galène for video conferencing, and most modern databases, all of which capitalize on multi-core processing to deliver enhanced operational efficiency and speed.
This shift towards multi-core optimization reflects a fundamental paradigm in modern computing, where the seamless synchronization of multiple cores has become a hallmark of system efficiency and responsiveness. We see the effect in how CPUs are designed: for the longest time, the predominant focus in computing was on optimizing single-core efficiency. However, in recent years, a significant shift has occurred as CPU manufacturers have increasingly prioritized cramming a multitude of cores into their processors.
In 2023, one can wonder: how hard is it to build an application that scan scale up to 80 cores?
As Nexedi's objectives began to diverge significantly from Cython's, we made a pivotal decision. Being constrained to Cython's architecture proved to difficult given our objectives, so the decision to start anew was taken, and Typon is the result of that work, started in early 2023. The concurrency engine was written first, implementing all the features we wanted to see on top of io_uring and behind C++20 coroutines. Around that engine, a type-inferring compiler was built.
Although Typon strives to accept as much idiomatic Python as possible, there are some differences and unsupported features:
l = ["abc", 123, False] # unsupportedd = {5: "abc", True: 123} # unsupported
__slots__
, __dict__
) is not supportedint
is fixed-sizetry
blocks are not supported yetWhen importing a module outside from Typon’s scope, if a module matching that name exists in Python’s scope, then that module and optionally the specified functions are hoisted to the import
statement’s scope (as in Python).
from numpy import squareimport mathif __name__ == "__main__": x = [1, 2, 3, 4] y: list[int] = square(x) print(x, y) f: int = math.factorial(5) print("5! =", f)
Typon cannot currently determine the type signature of imported functions, so they must be either be annotated, or be used in a context where their type can be fully inferred.
Typon values (C++ objects) are converted to their Python equivalents when passed to Python functions, and vice-versa. At the moment, collections are passed by copy; passing by reference is not yet supported.
As mentioned earlier, unification is powerful but it is not a panacea. Recent advances in the field of type inference allow more efficient and less error-prone analysis of code. Bidirectional typing (or type-checking) relies, as its name suggests, on inferring a term's type either inside-out (from inner terms to outer terms) or outside-in, depending on the context. The main advantages being clearer type errors, and an easier time dealing with subtyping. A mix of both unification and bidirectional typing will probably end up being used in the long run.
Currently, Python classes behave like dataclasses, in that they're really expected to be used mainly as mutable records. In particular, fields cannot be added dynamically, and they must appear type-annotated in the class body, as with regular dataclasses. Inheritance is also not implemented at the moment, since we haven't yet decided on the best way to build them from our current code model.
Typon should be able to express the actor model of concurrency in terms of its native concurrency primitives.
Functionally, actors can be thought of as a mechanism to serialize concurrent accesses to encapsulated data. This definition also fits the mutex, so it is a generalisation of both. In fact, Hewitt who formalized the actor model also formalized the serializer.
The only functional difference between a mutex and an actor is that in the case of the mutex the data is accessed synchronously - the code which acquires the lock will not do anything else until the lock is acquired - while in the case of the actor the data is accessed asynchronously - the code which sends a message to the actor continues executing and the actor may handle the message "later".
This interpretation of the actor is quite close to Project Verona's concept of concurrent owner.
In terms of Typon's concurrency primitives, this means an actor may be modeled as a mutex where every access is made from a `future` spawned just for that.
To make this nicer and more ergonomic, Typon plans to introduce syntactic sugar to allow a class to encapsulate this behavior. This requires:
Such features would in fact allow any kind of custom serializer to be easily defined, such as support for hooks that should be executed before or after executing the requested computation: these just become things the operator does.
Automatic memory management with reference-counting can create a deceptive illusion of working code because it fails to address the intricacies of ownership and lifetime management in a concurrent environment. Rust's ownership-tracking type system, on the other hand, provides a robust and trustworthy solution.
Self-describing title: use C++ exceptions to reproduce the behavior of Python exceptions.
Type genericity is currently supported by the type checker, but only for the standard library. Support for user-defined generic types will come at a later point, and will require formalizing the meaning and mapping of those to C++ types.
The syntax will most probably follow the PEP 695 design:
def func[T](a: T, b: T) -> T: ...
For more information, please contact Jean-Paul, CEO of Nexedi (+33 629 02 44 25).