The benchmark measures startup time, and Python is notoriously bad at it. Why does the system require a Python app to be started many times per second? Can the system make a localhost socket connection instead? Then we could have a persistent Python daemon, and the speed would be many orders of magnitude faster than 22/second.
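Something like this minimal sketch is what I have in mind (the port and the newline-delimited protocol are made up for illustration; the point is that you pay interpreter startup once, not per request):

    # Minimal sketch of a persistent Python daemon on a localhost socket.
    import socketserver

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for line in self.rfile:                      # one request per line
                result = line.decode().strip().upper()   # stand-in for the real work
                self.wfile.write((result + "\n").encode())

    if __name__ == "__main__":
        with socketserver.TCPServer(("127.0.0.1", 9000), Handler) as srv:
            srv.serve_forever()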
If for some reason it has to be a process (3rd-party system?), then a small C wrapper that makes a socket connection to the Python process would do the trick. This could potentially help with process management too, since it could launch the Python backend on demand, kinda like Bazel does.
(This is all predicated on the task being complex enough that it's worth setting up a dual-language system like this. I have no way to tell whether that's true from the blog post.)
That's a non-trivial statement! A statement that makes this article seem like it was written 10 years ago. Did anyone believe the C++ cited is all the C++ required to make this work? Go see the C++ as tested: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...
By this I mean -- if Python isn't fast enough -- isn't that why we have Rust? Compare to my quick Rust: https://play.rust-lang.org/?version=stable&mode=debug&editio...
It kinda crushes the CLI tool use case, especially w/r/t JSON.
It is very hard to ship standalone tools to users reliably and consistently in Python without bundling the damn interpreter into a giant blob/archive with the program itself.
The packaging ecosystem and import system of Python is a mess:
- Any PYTHONPATH entry on the target user's machine might break your tool (hello bashrc).
- Any globally installed Python package on the system (/lib/python3.X/site-packages) might break your tool.
- Any Python package present in the user's home directory might break your tool (e.g. ~/.local/lib/python3.X).
- Many Python packages have binary dependencies that do not respect the manylinux standard (https://peps.python.org/pep-0513/) and have random ABI issues on systems with different compilers / libc.
- Some users mix Conda and system packages together in their environment, with different libc, and that blows up with random errors on package import.
- On top of all that, you have the problems with the versioning of Python itself.
This is honestly insane. It is a major usability pain compared to a simple "unpack and run" of a Golang, C++ or Rust binary.
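For what it's worth, CPython's isolated mode (-I, which ignores PYTHONPATH and the user site-packages) papers over the environment-contamination part of this, though not the binary-dependency part. A minimal sketch of that defensive trick, with a hypothetical single-file tool:

    # Re-exec under `python -I` so stray PYTHONPATH entries and
    # ~/.local/lib/python3.X packages can't shadow our imports.
    # The -I flag is a real CPython option; main() is a stand-in tool.
    import os
    import sys

    def main() -> None:
        print("running with sys.path =", sys.path)

    if __name__ == "__main__":
        if not sys.flags.isolated:
            os.execv(sys.executable, [sys.executable, "-I", *sys.argv])
        main()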
I used py2many, a transpiler that can handle many simple language constructs, and then modified the calls to the json library into something that works.
The hard part of writing such utilities in Python and getting a small, fast binary is library call translation. For this, py2many provides a framework, but someone needs to write the library-to-library mappings.
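To make that concrete, here is the kind of small utility I mean: simple constructs a transpiler can handle, plus one json call that each backend must map to a native library (this example is mine, not py2many's actual mapping table):

    # A transpiler-friendly utility: plain constructs plus one stdlib
    # call (json.load) that needs a per-backend library mapping.
    import json
    import sys

    def count_keys(path: str) -> int:
        with open(path) as f:
            doc = json.load(f)
        return len(doc)

    if __name__ == "__main__":
        print(count_keys(sys.argv[1]))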
Recently, py2many added a backend for Mojo. It's very early in the game, but if Mojo provides a Python-compatible stdlib, it becomes a lot more interesting as a backend.
Code here:
Also, don't forget to enable hardened runtime flags so you get those bounds checks. They're available in any sensible compiler.
As for raw C, if that is your jam, better to use Go instead. Even the language's creators moved on.
This could be an SQLite :memory: opportunity. Keep everything in tables and blow off the parsing.
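A rough sketch of the idea, with a made-up table (load the data once into an in-memory database, then query instead of re-parsing text):

    # Load data once into in-memory SQLite, then query tables
    # instead of re-parsing. Table name and rows are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (name TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("startup", 22), ("parse", 7)],
    )
    total, = conn.execute("SELECT SUM(value) FROM events").fetchone()
    print(total)  # -> 29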
Then the AI weenies can go ahead and predict the answers in advance.
In regard to the choice of language, Python is not the best tool for most jobs, but it is a tool that always lets you get the job done. If you've found a better tool for a certain job, good for you, enjoy it. Python will always be there when you need the next job just done.