- if the actual performance deviates from the predicted (scored) performance, the system easily enters a degenerate bottlenecked state.
- and if that happens, the many internal queues make diagnosis, root-causing, and confidence in any fix all exponentially worse.
Now you might assert that this will be applied in situations where scores are accurate and brown failures do not occur. Those aren’t the situations I deal with.
Sounds like the author of this would be interested in Queueing Theory[1] (in the sense of being interested in mathematical formalisms to explore this stuff). Apportionment[2] is also studied as a very specific thing unto itself.
There's a huge mass of published research "out there" dealing with queueing and scheduling. Not all of it pertains to "thread scheduling," but there's quite a bit of conceptual overlap between something like thread scheduling and job-shop scheduling. And some of the work on apportionment likewise probably relates, at least by analogy.
[1]: https://en.wikipedia.org/wiki/Queueing_theory
[2]: https://en.wikipedia.org/wiki/Mathematics_of_apportionment
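To make the queueing-theory pointer concrete, here's a minimal sketch (my own illustration, not from any of the comments): the classic M/M/1 result shows how sharply waiting time blows up as utilization approaches 1, which is exactly the "degenerate bottlenecked state" described upthread when actual load deviates from what the scheduler predicted.

```python
def mm1_mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a job spends in an M/M/1 queue: W = 1 / (mu - lambda).

    arrival_rate (lambda) and service_rate (mu) are in jobs per second.
    """
    if arrival_rate >= service_rate:
        # Utilization >= 1: the backlog grows without bound.
        raise ValueError("unstable queue: utilization >= 1")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 10.0  # one server handling 10 jobs/sec
for arrival_rate in (5.0, 9.0, 9.9):
    w = mm1_mean_time_in_system(arrival_rate, service_rate)
    print(f"utilization {arrival_rate / service_rate:.2f}: "
          f"mean time in system {w:.2f}s")
```

At 50% utilization the mean time in system is 0.2s; at 99% it is 10s. Small mispredictions of load near saturation produce wildly disproportionate latency, which is why systems relying on accurate scores degrade so badly when the scores are wrong.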
That's why in the vast majority of circumstances you'll be running many things on many CPUs, you just throw all the work at the CPUs and let the chips fall where they may. Deliberate scheduling is a tool, but an unusual one, especially as many times the correct solution to tight scheduling situations is to throw more resources at it anyhow. (Trying to eke out wins by changing your scheduling implies that you're also in a situation where slight increases in workloads will back the entire system up no matter what you do.)
Pipelining is strictly about processing stages where production of the input and processing of the inputs are not synchronized. For example, a client sends n requests via a pipelined protocol to a remote server without waiting for an ack for each request. There may be only one such processing pipeline (and thus no parallelism) even though there is pipelining.
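A minimal in-process sketch of that distinction (my own illustration): the producer enqueues n "requests" without waiting for each to be acknowledged, while a single consumer drains them. That's pipelining with no parallelism, since only one worker ever processes inputs.

```python
import queue
import threading

def pipelined_round_trip(requests):
    q = queue.Queue()
    results = []

    def server():
        # Single consumer: no parallelism, just unsynchronized intake.
        while True:
            item = q.get()
            if item is None:  # sentinel: all requests sent
                break
            results.append(item * 2)  # stand-in for real request handling

    t = threading.Thread(target=server)
    t.start()
    for r in requests:   # send everything up front; no per-request ack wait
        q.put(r)
    q.put(None)
    t.join()             # wait once, at the end, for all responses
    return results

print(pipelined_round_trip([1, 2, 3]))  # [2, 4, 6]
```

The producer finishes submitting long before the consumer finishes processing; contrast with a request/ack protocol, where each `put` would block on the previous item completing.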
Maybe, if the processes at each stage are I/O-bound, this way of pipelining makes sense. But if they are CPU-bound, I'm not sure it helps: you're moving data between different CPUs, destroying cache locality.