Commit 81916f5d authored by Patrick Steinhardt

limithandler: Fix queueing mechanism in the concurrency limiter

The concurrency limiter has two different mechanisms:

    1. Callers first get put into a queue that is global across all
       limiting keys. This queue is depth- and time-limited, which means
       that callers will get rejected if it is full and will be evicted
       when they take too long to acquire the concurrency token.

    2. The concurrency limit itself that will only allow a set number of
       callers at the same time. This functionality is per limiting key.

While the intent is usually to concurrency-limit either by user or by
repository, this is sabotaged by the fact that the queue is key-agnostic
and thus global. Even if almost all calls applied to a single repository
only, a full queue would make it impossible to invoke the function for
any other repository. As a result the queue is rather useless as a
concurrency-limiting primitive.

But there is another big design flaw: as callers need to be in the queue
to obtain the concurrency-limiting token, and the number of callers in
the queue is strictly limited, we essentially have a global concurrency
limit on obtaining the concurrency-limiting tokens. Suppose a function
has a maximum queue depth of 1 and is called twice at the same time.
Even if the concurrency limit would theoretically allow both calls to
run at the same time, there is a race window where both callers might
try to enter the queue at the same point in time. If this race is lost,
then one of the two callers will be rejected due to the queue being full
while the other one is trying to obtain the concurrency token. This
issue in fact surfaces in our tests, where the `TestStreamLimitHandler`
test frequently fails because the race is often lost.

This second design flaw cannot easily be fixed while the queue remains
global: callers need to remain in the queue while trying to acquire the
concurrency token, as otherwise it wouldn't really be limiting anything
at all.

Convert the queue to be per-key so that each resource identified by a
key essentially has its own queue. This fixes both issues:

    - If the queue is full then we only limit access to the specific
      resource for which it is full. This makes the queueing mechanism
      useful again.

    - We can now change the queueing logic to allow as many callers into
      the queue as the concurrency limit would allow _plus_ the allowed
      depth of the queue itself. This fixes the race described above.

Changelog: fixed
parent 54498a8c