Sami Hiltunen authored
PerRepositoryElector doesn't currently log any primary changes, which
makes it less observable than the other electors. Because it keeps one
primary for each repository, it has far more primary records than the
other electors, so logging every change individually is no longer
feasible: the logs would grow with the number of repositories in the
cluster.

This commit improves the situation by logging aggregated demotion and promotion
counts for each storage. This gives an overview of how many repositories a given
storage was demoted from being the primary for and how many repositories it was
promoted to be the primary for.
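
The aggregation described above can be sketched roughly as follows. This is a
minimal illustration and not the code from this commit: the primaryChange and
changeCounts types, the aggregateChanges function and the storage names are all
hypothetical, and the real elector derives its counts from the database rather
than from an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// primaryChange is a hypothetical record of one repository's primary moving
// from one storage to another. An empty storage name means "no primary".
type primaryChange struct {
	Repository string
	Demoted    string
	Promoted   string
}

// changeCounts holds the aggregated counts for a single storage.
type changeCounts struct {
	Demoted  int
	Promoted int
}

// aggregateChanges collapses per-repository changes into per-storage counts,
// so the number of log entries depends on the number of storages rather than
// on the number of repositories.
func aggregateChanges(changes []primaryChange) map[string]changeCounts {
	counts := make(map[string]changeCounts)
	for _, c := range changes {
		if c.Demoted != "" {
			cc := counts[c.Demoted]
			cc.Demoted++
			counts[c.Demoted] = cc
		}
		if c.Promoted != "" {
			cc := counts[c.Promoted]
			cc.Promoted++
			counts[c.Promoted] = cc
		}
	}
	return counts
}

func main() {
	counts := aggregateChanges([]primaryChange{
		{Repository: "repo-1", Demoted: "storage-a", Promoted: "storage-b"},
		{Repository: "repo-2", Demoted: "storage-a", Promoted: "storage-c"},
		{Repository: "repo-3", Promoted: "storage-b"},
	})

	// Sort the storage names so the output order is deterministic.
	storages := make([]string, 0, len(counts))
	for storage := range counts {
		storages = append(storages, storage)
	}
	sort.Strings(storages)

	for _, storage := range storages {
		fmt.Printf("storage=%s demoted=%d promoted=%d\n",
			storage, counts[storage].Demoted, counts[storage].Promoted)
	}
}
```

With this shape, an outage of a single storage produces one entry per affected
storage regardless of how many repositories changed primaries.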

The aggregation has the downside of not recording exactly which repositories'
primaries were demoted and which storages were promoted in their place. Ideally we'd
log the individual demotions and promotions. In the future, we could do this with
repository-specific primaries as well once we switch from the table-wide failover
logic to a lazy election approach. Lazy elections would allow us to perform
failovers only for repositories which need a functioning primary right now, namely
when the repository is receiving a write. That would limit failovers to the
repositories which are written to during the primary's outage, which would keep
the logs manageable again.

As an interim solution, this should suffice to give some observability into the
failovers.
aaf776de