The benchmark measures startup time, and Python is notoriously bad at it. Why does the system require a Python app to be started many times per second? Can the system make a localhost socket connection instead? Then we could have a persistent Python daemon, and the speed would be many orders of magnitude faster than 22/second.
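Something like this minimal sketch is what I have in mind (the port and the newline-delimited protocol are made up for illustration; the point is that you pay interpreter startup once, not per request):

    # Minimal sketch of a persistent Python daemon on a localhost socket.
    import socketserver

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for line in self.rfile:                      # one request per line
                result = line.decode().strip().upper()   # stand-in for the real work
                self.wfile.write((result + "\n").encode())

    if __name__ == "__main__":
        with socketserver.TCPServer(("127.0.0.1", 9000), Handler) as srv:
            srv.serve_forever()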
If for some reason it has to be a process (3rd-party system?), then a small C wrapper that makes a socket connection to the Python process would do the trick. This could potentially help with process management too, since it could launch the Python backend on demand, kinda like Bazel does.
(This is all predicated on the task being complex enough that it's worth setting up a dual-language system like this. I have no way to tell whether that's true from the blog post.)
That's a non-trivial statement! A statement that makes this article seem like it was written 10 years ago. Did anyone believe the C++ cited is all the C++ required to make this work? Go see the C++ as tested: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...
By this I mean -- if Python isn't fast enough -- isn't that why we have Rust? Compare to my quick Rust: https://play.rust-lang.org/?version=stable&mode=debug&editio...
It kinda crushes the CLI tool use case, especially w/r/t JSON.
It is very hard to ship standalone tools to users reliably and consistently in Python without bundling the damn interpreter into a giant blob/archive with the program itself.
The packaging ecosystem and import system of Python is a mess:
- Any PYTHONPATH entry on the target user's machine might break your tool (hello bashrc).
- Any globally installed Python package on the system (/lib/python3.X/site-packages) might break your tool.
- Any Python package present in the user's home directory might break your tool (e.g. ~/.local/lib/python3.X).
- Many Python packages have binary dependencies that do not respect the manylinux standard (https://peps.python.org/pep-0513/) and have random ABI issues on systems with different compilers / libc.
- Some users mix Conda and system packages together in their environment, with different libc, and that blows up with random errors on package import.
- On top of all that, you have the problems with the versioning of Python itself.
This is honestly insane. It is a major usability pain compared to a simple "unpack and run" of a Golang, C++ or Rust binary.
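For what it's worth, CPython's isolated mode (-I, which ignores PYTHONPATH and the user site-packages) papers over the environment-contamination part of this, though not the binary-dependency part. A minimal sketch of that defensive trick, with a hypothetical single-file tool:

    # Re-exec under `python -I` so stray PYTHONPATH entries and
    # ~/.local/lib/python3.X packages can't shadow our imports.
    # The -I flag is a real CPython option; main() is a stand-in tool.
    import os
    import sys

    def main() -> None:
        print("running with sys.path =", sys.path)

    if __name__ == "__main__":
        if not sys.flags.isolated:
            os.execv(sys.executable, [sys.executable, "-I", *sys.argv])
        main()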
I used py2many, a transpiler that can handle many simple language constructs, and then modified the calls to the json library into something that works.
The hard part of writing such utilities in Python and getting a small, fast binary is library call translation. For this, py2many provides a framework, but someone needs to write the library-to-library mappings.
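To make that concrete, here is the kind of small utility I mean: simple constructs a transpiler can handle, plus one json call that each backend must map to a native library (this example is mine, not py2many's actual mapping table):

    # A transpiler-friendly utility: plain constructs plus one stdlib
    # call (json.load) that needs a per-backend library mapping.
    import json
    import sys

    def count_keys(path: str) -> int:
        with open(path) as f:
            doc = json.load(f)
        return len(doc)

    if __name__ == "__main__":
        print(count_keys(sys.argv[1]))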
Recently, py2many added a backend for Mojo. It's very early in the game, but if Mojo provides a Python-compatible stdlib, it becomes a lot more interesting as a backend.
Code here:
Also, don't forget to enable hardened runtime flags so you get those bounds checks. They're available in any sensible compiler.
As for raw C, if that is your jam, better to use Go instead. Even the language's creators moved on.
This could be an SQLite :memory: opportunity. Keep everything in tables and blow off the parsing.
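A rough sketch of the idea, with a made-up table (load the data once into an in-memory database, then query instead of re-parsing text):

    # Load data once into in-memory SQLite, then query tables
    # instead of re-parsing. Table name and rows are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (name TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("startup", 22), ("parse", 7)],
    )
    total, = conn.execute("SELECT SUM(value) FROM events").fetchone()
    print(total)  # -> 29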
Then the AI weenies can go ahead and predict the answers in advance.
In regard to the choice of language, Python is not the best tool for most jobs, but it is a tool that always lets you get the job done. If you've found a better tool for a certain job, good for you, enjoy it. Python will always be there when you need the next job just done.