Oh man, Python 2 to 3 was such a massive shift. It took almost half a decade if not more, and yet it mainly changed superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done, and probably come up with a new, tighter API for integrating with lower-level languages, so that going forward Python internals could be changed more freely without breaking everything.
I've been occasionally glancing at the PR/issue tracker to keep up to date with what's happening with the JIT, but I've never seen where the high-level discussions happen; the issues and PRs always jump right to the gritty details. Is there a high-level introduction or example anywhere of how trace projection and trace recording work and how they differ? Googling the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(
Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?
I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.
I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):
When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.
When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count”, is always accurate is therefore a big deal. It is often the source of memory leaks, but I wouldn’t attribute a speedup to it (only if it replaced GC, then yes).
The guy said he hopes the free-threaded build will be the only one in "3.16 or 3.17". I wonder whether that should apply to the JIT too, and how the JIT and interpreter interact.
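To make the sticky-note picture concrete, here is a small CPython-specific sketch using `sys.getrefcount` and `weakref`. The class `Note` and the helper names are made up for illustration. Absolute counts differ between CPython versions and builds (the value can include the temporary reference created by the call itself), so the sketch only compares differences; the immediate freeing it shows is CPython behaviour, not a language guarantee.

```python
import sys
import weakref


class Note:
    """A plain object; several names can refer to the same instance."""


def refcount_deltas():
    obj = Note()
    before = sys.getrefcount(obj)    # baseline; absolute value varies by version/build
    alias = obj                      # a second "sticky note" for the same object
    after = sys.getrefcount(obj)
    del alias                        # throw that sticky note away
    restored = sys.getrefcount(obj)
    return after - before, restored - before


def freed_when_unreferenced():
    obj = Note()
    probe = weakref.ref(obj)         # weak references do not add to the count
    del obj                          # drop the last strong reference
    return probe() is None           # CPython frees the object right away


print(refcount_deltas())             # (1, 0)
print(freed_when_unreferenced())     # True
```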
I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.
Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.
I don't want to go too heavy on the negatives, but what's nuts is Python going for trust-the-programmer style multithreading. The risk is that extension modules could cause a lot of crashes.
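On the "thread safe code all over the place" point, a small illustration of why it matters even in pure Python: `self.value += 1` is a load, an add and a store, and it is not guaranteed to be atomic with or without the GIL, so shared state needs a lock. Whether the unlocked version actually loses updates depends on the build and on timing (races are likely to surface more readily without the GIL), which is why only the locked result is deterministic. `Counter` and `hammer` are illustrative names.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Load, add, store: another thread can run between the load and the store.
        self.value += 1

    def safe_increment(self):
        # The lock turns the read-modify-write into one critical section.
        with self._lock:
            self.value += 1


def hammer(method_name, threads=8, per_thread=10_000):
    counter = Counter()
    increment = getattr(counter, method_name)

    def work():
        for _ in range(per_thread):
            increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


print(hammer("safe_increment"))    # always 80000
print(hammer("unsafe_increment"))  # 80000 or less, depending on build and timing
```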
What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
The Python C api leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.
Python’s backward compatibility story still isn’t great compared to things like the Go 1.x compatibility promise, and languages with formal specs like JS and C.
The Python devs still make breaking changes, they’ve just learned not to update the major version number when they do so.
I would argue that the libraries, and specifically NumPy, are the reason Python is still in the picture today.
It will be interesting to see, moving forward, what languages survive. A 15% perf increase seems nice, until you realize that you get a 10x increase porting to Rust (and the AI does it for you).
Maybe library use/popularity is somewhat related to backwards compatibility.
Python does not take backwards compatibility seriously. 2 to 3 is a big compatibility break. But things like `map(None, seq1, seq2)` also broke; deliberate compatibility breaks like that are motivated by nothing more than aesthetic purity.
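For anyone who hit this: in Python 2, `map(None, a, b)` built a list of tuples and padded the shorter sequence with `None`. In Python 3, `map` treats its first argument as a function, so passing `None` raises `TypeError`; `itertools.zip_longest` is the usual replacement. A minimal sketch:

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ["x"]

# Python 2: map(None, a, b) == [(1, 'x'), (2, None), (3, None)]
# Python 3: map() calls its first argument, and None is not callable.
try:
    list(map(None, a, b))
    raised = False
except TypeError:
    raised = True

# Python 3 spelling of the old behaviour: pad the shorter input with None.
padded = list(zip_longest(a, b))
print(raised, padded)  # True [(1, 'x'), (2, None), (3, None)]
```

With a single sequence, the old `map(None, seq)` was simply `list(seq)`.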
Python does not take backwards compatibility very seriously at all. Take a look at all the deprecated APIs.
I would say it's probably worth it to clean up all the junk that Python has accumulated... But it's definitely not very high up the list of languages in terms of backwards compatibility. In fact I'm struggling to think of other languages that are worse. Typescript probably? Certainly Go, C++ and Rust are significantly better.
For what it’s worth, Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility, and literally used some people’s PhD research. It wasn’t a trivial affair.
Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
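To illustrate what "type uncertainty" costs a compiler: code specialized for one operand type needs a runtime guard plus a fallback for when the guess is wrong. CPython's specializing adaptive interpreter (PEP 659) works along these lines, deoptimizing back to the generic instruction when a guard fails. The sketch below is a hypothetical toy (`make_specialized_add` is not a CPython API), and in pure Python the "fast path" is not actually faster; it only shows the shape of guard and fallback.

```python
def make_specialized_add(expected_type):
    """Return an add() that assumes both operands have exactly expected_type."""
    deopts = 0

    def add(a, b):
        nonlocal deopts
        # Guard: the specialization is only valid for this exact type (not subclasses).
        if type(a) is expected_type and type(b) is expected_type:
            return expected_type.__add__(a, b)  # "specialized" path: no dispatch on operand types
        # Guard failed: fall back to fully generic semantics.
        deopts += 1
        return a + b

    add.deopt_count = lambda: deopts
    return add


add = make_specialized_add(int)
print(add(2, 3))           # 5 (guard passes)
print(add("a", "b"))       # ab (guard fails, generic path)
print(add(True, 1))        # 2 (bool subclasses int, so the exact-type guard fails)
print(add.deopt_count())   # 2
```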
That makes sense if you're comparing with Java or C#, but not Ruby, which is way more dynamic than Python.
The more likely reason is that there simply hasn't been that big a push for it. Ruby was dog slow before the JIT and Rails was very popular, so there was a lot of demand and room for improvement. PHP was the primary language used by Facebook for a long time, and they had deep pockets. JS powers the web, so there's a huge incentive for companies like Google to make it faster. Python never really had that same level of investment, at least from a performance standpoint.
To your point, though, the C API has made certain types of optimizations extremely difficult, as the PyPy team has figured out.
Google, Dropbox, and Microsoft from what I can recall all tried to make Python fast so I don’t buy the “hasn’t seen a huge amount of investment”. For a long time Guido was opposed to any changes and that ossified the ecosystem.
But the main problem was actually that PyPy was never adopted as “the JIT” mechanism. That would have made a huge difference a long time ago and made sure they evolved in lockstep.
Microsoft is the sponsor TFA refers to cryptically when it says "the Faster CPython team lost its main sponsor in 2025".
AFAIK it was not driven by anything on the tech side. It was simply unlucky timing, the project getting caught in the middle of Microsoft's heavy-handed push to cut everything. So much so that the people who were hired by MS to work on this found out they were laid off in the middle of a conference where they were giving talks on it.
> Python never really had that same level of investment, at least from a performance standpoint.
Or lack of incentive?
A lot of big Python projects that do machine learning and data processing offload the heavy data processing from pure Python code to libraries like NumPy and pandas, which use C API bindings to run natively.
The simplest JIT just generates the machine code instructions that the interpreter loop would execute anyway. It’s not an extremely difficult thing, but it also doesn’t give you much benefit.
A worthwhile JIT is a fully optimizing compiler, and that is the hard part. Language semantics are much less important - dynamic languages aren’t particularly harder here, but the performance roof is obviously just much lower.
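A toy model of those two tiers, for a tiny accumulator language. `compile_baseline` resolves instruction dispatch once up front and then performs the same operations the interpreter would (loosely the spirit of template-based JITs, with closures standing in for machine code), while `compile_optimized` analyzes the program and folds it into a single affine expression, the kind of work an optimizing compiler does. Everything here, including the instruction set, is hypothetical.

```python
from typing import Callable, List, Tuple

Program = List[Tuple[str, int]]   # e.g. [("add", 2), ("mul", 3)]


def interpret(program: Program, x: int) -> int:
    """Interpreter loop: decode and dispatch every instruction on every run."""
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc


def compile_baseline(program: Program) -> Callable[[int], int]:
    """Decode once, keep one step per instruction: same work, no per-run dispatch."""
    steps = []
    for op, arg in program:
        if op == "add":
            steps.append(lambda acc, arg=arg: acc + arg)
        elif op == "mul":
            steps.append(lambda acc, arg=arg: acc * arg)
        else:
            raise ValueError(f"unknown op {op!r}")

    def run(x: int) -> int:
        acc = x
        for step in steps:
            acc = step(acc)
        return acc

    return run


def compile_optimized(program: Program) -> Callable[[int], int]:
    """Fold the whole program into acc = scale * x + offset, evaluated in one step."""
    scale, offset = 1, 0
    for op, arg in program:
        if op == "add":
            offset += arg
        elif op == "mul":
            scale, offset = scale * arg, offset * arg
        else:
            raise ValueError(f"unknown op {op!r}")
    return lambda x: scale * x + offset


program = [("add", 2), ("mul", 3), ("add", 1)]
print(interpret(program, 5), compile_baseline(program)(5), compile_optimized(program)(5))
# 22 22 22
```

The baseline version can only remove dispatch overhead, while the optimized one changes how much work is done at all, which is where the large gains (and the engineering effort) are.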
For better or for worse, they have been very consistent throughout the years that they don't want to degrade existing performance. It is why the GIL existed for so long.
That's a completely separate codebase that purposefully breaks backwards compatibility in specific areas to achieve their goals. That's not the same as having a first-class JIT in CPython, the actual Python implementation that ~everyone uses.
As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will probably (very likely) run fine but for other things most bets are off. I believe PyPy also only supports up to 3.11?
PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.
The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.
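A sketch of the warmup trade-off: a tiering JIT typically runs code in the interpreter and only compiles it once a counter says it is hot, so code that never gets hot never benefits, and the compilation cost is paid in the middle of the run. `TieredFunction` and the threshold values are hypothetical; real JITs use their own hotness heuristics and thresholds.

```python
from typing import Callable, Optional


class TieredFunction:
    """Run `slow` until it has been called `threshold` times, then switch to a compiled version."""

    def __init__(self, slow: Callable[[int], int],
                 compile_fn: Callable[[], Callable[[int], int]],
                 threshold: int = 1000):
        self.slow = slow
        self.compile_fn = compile_fn
        self.threshold = threshold
        self.calls = 0          # calls made while still in the interpreter tier
        self.fast_calls = 0     # calls served by the compiled version
        self.fast: Optional[Callable[[int], int]] = None

    def __call__(self, x: int) -> int:
        if self.fast is not None:
            self.fast_calls += 1
            return self.fast(x)              # hot: use the compiled version
        self.calls += 1
        if self.calls >= self.threshold:
            self.fast = self.compile_fn()    # one-off "compilation" cost, paid mid-run
        return self.slow(x)                  # still warming up: interpreter path


def square_slowly(x: int) -> int:
    # Stands in for interpreted code (valid for x >= 0).
    return sum(x for _ in range(x))


f = TieredFunction(square_slowly, lambda: (lambda x: x * x), threshold=3)
results = [f(4) for _ in range(5)]
print(results, f.calls, f.fast_calls)   # [16, 16, 16, 16, 16] 3 2
```

With `threshold=3`, the first three calls run `square_slowly` and only the last two use the compiled path; a short-lived or mixed workload can spend most of its time before the threshold, which is the warmup cost mentioned above.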
Why shouldn't the reference implementation get JIT? Just because some other implementations already have it is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer mandated and profitable.
It is exactly what I'm referring to. I didn't say there aren't still people around. But they're far enough behind CPython that folks like NumPy are dropping support. Unless they get a substantial injection of new people and new energy, they're likely to continue falling behind.
Yes, the graphs are incomprehensible because those are not defined in the article. They turn out to be different physical machines with different architectures: https://doesjitgobrrr.com/about
So the biggest gains so far are on Windows 11 Pro (x86_64), ~20%? Is that because Windows was a bad baseline (prometheus)? x86_64/Linux doesn't seem to have improved as dramatically, ~5% (ripley). I'm just surprised the OS has that much of an effect that can be attributed to the JIT rather than to other OS issues.
It's hard to say whether it's Windows-related, since the two x86_64 machines don't just run different OSes; they also have different processors from different manufacturers. I don't know whether the AMD Ryzen 5 3600X and the Intel i5-8400 have dramatically different features, but unlike a generic static binary for x86_64, a JIT could in principle exploit features specific to a given manufacturer.
The immediate question has been answered, but what about the names? The latter three are obvious references to the Alien universe, but what relationship does blueberry have to them?
They are all JIT on different architectures, measured relative to CPython. https://doesjitgobrrr.com/about: blueberry is aarch64 Raspberry Pi, ripley is x86_64 Intel, jones is aarch64 M3 Pro, prometheus is x86_64 AMD.
I am trying to push back. I don't care if other people think the tools make them faster, I did not sign up to be a guinea pig for my employer or their AI-corp partner.
I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...
Microsoft used to do this for their C runtime library.
This would be a potential case for a new major version number.
> taking backwards compatibility so seriously
Disclaimer: I teach Python for a living.
Including simply implementing the slow parts in C, such as the high performance machine learning ecosystem that exists in Python.
See https://github.com/numpy/numpy/issues/30416 for example. It's not being updated for compatibility with new versions of Python.
Can you please not post "facts" you just invented yourself?
So it’s not unmaintained, no. But the project is currently under resourced to keep up with the latest Python spec.