    nodes: Set connection backoff MaxDelay to 1 second · aa235855
    Will Chandler authored
    gRPC clients use an exponential backoff strategy[0] for re-establishing
    connections, meaning that the longer a connection has been in a bad
    state the greater the delay before the client will make its next
    connection attempt. This is useful in scenarios where a very large
    number of clients could trigger a thundering herd effect on a server as
    it returns to service.
    
    In a Gitaly Cluster, this means that when a Gitaly node has been down
    for some time and its connection backoff has grown large, Praefect may
    wait up to 120 seconds before trying to reconnect. As a result, Gitaly
    nodes remain unavailable longer than necessary.
    
    The issues addressed by gRPC's default exponential backoff behavior do not
    apply in this scenario as we will always have a small number of clients
    (Praefect nodes), and the volume of traffic from healthchecks is dwarfed
    by normal production load.
    
    To resolve this, set the maximum backoff delay to one second.
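
As a rough illustration of this kind of change with grpc-go, the sketch below copies `backoff.DefaultConfig`, lowers `MaxDelay` to one second, and passes the result through `grpc.WithConnectParams`. The helper name `dialOptionsWithShortBackoff` and the explicit 20 second `MinConnectTimeout` (the default listed in the gRPC backoff document) are assumptions made for this example; the actual change in this commit may be structured differently.

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
)

// dialOptionsWithShortBackoff is a hypothetical helper that returns dial
// options capping the reconnect delay at one second.
func dialOptionsWithShortBackoff() []grpc.DialOption {
	// Start from grpc-go's defaults so BaseDelay, Multiplier and Jitter
	// are unchanged; only the cap is lowered.
	cfg := backoff.DefaultConfig
	cfg.MaxDelay = time.Second

	return []grpc.DialOption{
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: cfg,
			// Assumption: keep the 20s minimum connect timeout from the
			// gRPC backoff document rather than leaving it at zero.
			MinConnectTimeout: 20 * time.Second,
		}),
	}
}

func main() {
	// The options would be passed to the client when dialing a Gitaly node.
	_ = dialOptionsWithShortBackoff()
}
```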
    
    [0] https://github.com/grpc/grpc/blob/v1.51.0/doc/connection-backoff.md
    
    Changelog: fixed