Commits · pks-gitaly-fix-unset-housekeeping-manager · gitlab-org / Gitaly

This project is mirrored from https://gitlab.com/gitlab-org/gitaly.git.

Feb 21, 2022

gitaly: Fix missing housekeeping manager dependency · 5585b14c

While we set up the housekeeping manager when starting Gitaly, we don't
inject it into the service dependencies. As a result, Gitaly will not
have it set up during normal runtime operations and will thus trigger a
panic whenever trying to access it. This went unnoticed because our test
setup for the Gitaly server is different from the setup we use when
running Gitaly itself.

Fix this bug by injecting the housekeeping manager as expected.

Changelog: fixed

5585b14c
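
The failure mode described in the commit above can be illustrated with a minimal Go sketch. It is not Gitaly's code: `Manager`, `noopManager`, `Dependencies`, `Validate` and `optimize` are hypothetical names chosen for illustration. The sketch shows that calling a method through an interface field that was never injected panics at first use, and that a startup check can surface the missing dependency earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// Manager is a hypothetical stand-in for the housekeeping manager interface.
type Manager interface {
	OptimizeRepository(repo string) error
}

// noopManager is a trivial implementation used only for this illustration.
type noopManager struct{}

func (noopManager) OptimizeRepository(string) error { return nil }

// Dependencies mimics a struct that carries the dependencies of a service.
type Dependencies struct {
	HousekeepingManager Manager
}

// Validate reports a missing dependency at startup instead of at first use.
func (d Dependencies) Validate() error {
	if d.HousekeepingManager == nil {
		return errors.New("housekeeping manager is not set")
	}
	return nil
}

// optimize calls through the injected manager. With a nil interface value the
// method call panics; the panic is recovered here only to make it observable.
func optimize(d Dependencies, repo string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return d.HousekeepingManager.OptimizeRepository(repo)
}

func main() {
	// The manager was created elsewhere but never injected.
	broken := Dependencies{}
	fmt.Println("validate:", broken.Validate())
	fmt.Println("optimize:", optimize(broken, "repo.git"))

	// The manager is injected as expected.
	fixed := Dependencies{HousekeepingManager: noopManager{}}
	fmt.Println("validate:", fixed.Validate())
	fmt.Println("optimize:", optimize(fixed, "repo.git"))
}
```

A test setup that builds its own `Dependencies` value with the manager already set would never hit the nil case, which matches the commit's explanation of why the bug went unnoticed.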

Merge branch 'pks-git-maintenance-decouple-from-server' into 'master' · 40527b7e

Patrick Steinhardt authored 3 years ago

housekeeping: Create central stateful architecture which handles repository optimizations

See merge request gitlab-org/gitaly!4348

40527b7e

Feb 18, 2022

Merge branch 'sh-disable-health-check-logging-default' into 'master' · b72c1fa0
John Cai authored 3 years ago
```
log: Disable gRPC HealthCheck message logging

Closes #3428

See merge request gitlab-org/gitaly!4363
```
b72c1fa0
Merge branch 'support_rqst_issue_tempalte' into 'master' · 731b9bd8
Sami Hiltunen authored 3 years ago
```
Create support request issue template

See merge request gitlab-org/gitaly!3836
```
731b9bd8

log: Disable gRPC HealthCheck message logging · 44abe5c8

Stan Hu authored 3 years ago

gRPC HealthCheck messages are quite noisy, dominate the log volume in
Gitaly, and usually are not that useful. We now disable them by
default and add documentation on how to enable them.

Closes https://gitlab.com/gitlab-org/gitaly/-/issues/3428

Changelog: changed

44abe5c8
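
A rough sketch of how such filtering could look, assuming the decision is made per RPC by its full method name. `/grpc.health.v1.Health/Check` is the method name of the standard gRPC health checking protocol; the function `shouldLog` and its `logHealthChecks` switch are hypothetical and do not reflect Gitaly's actual configuration.

```go
package main

import "fmt"

// healthCheckMethod is the full method name of the Check RPC defined by the
// standard gRPC health checking protocol.
const healthCheckMethod = "/grpc.health.v1.Health/Check"

// shouldLog is a hypothetical decider: health check calls are logged only when
// explicitly enabled, every other RPC is always logged.
func shouldLog(fullMethod string, logHealthChecks bool) bool {
	if fullMethod == healthCheckMethod {
		return logHealthChecks
	}
	return true
}

func main() {
	methods := []string{
		healthCheckMethod,
		"/gitaly.RepositoryService/OptimizeRepository",
	}
	for _, m := range methods {
		fmt.Printf("%s default=%v enabled=%v\n", m, shouldLog(m, false), shouldLog(m, true))
	}
}
```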

Update per review findings · fe5b6ceb
Sami Hiltunen authored 3 years ago

fe5b6ceb

Feb 17, 2022

Merge branch 'toon-replication-cleanup-queue-again-again' into 'master' · 5f39869b
Sami Hiltunen authored 3 years ago
```
datastore: Clean completed & dead replication jobs

Closes #3665

See merge request gitlab-org/gitaly!4353
```
5f39869b

housekeeping: Increase observability of repository optimizations · bea71220

Patrick Steinhardt authored 3 years ago

It is currently hard to observe which optimizations were executed for
repositories and how long they typically take. This commit introduces
two new metrics to better track this information, which also allows us
to easily write tests for OptimizeRepository.

bea71220

Merge branch 'pks-git-fetch-optim-with-commit-graph' into 'master' · 4ac6a590

John Cai authored 3 years ago

git: Backport patches to speed up git-fetch(1) in repos with many refs

Closes #4050

See merge request gitlab-org/gitaly!4355

4ac6a590

Feb 16, 2022

git: Backport patches to speed up git-fetch(1) in repos with many refs · 3ad8de83

Patrick Steinhardt authored 3 years ago

We have upstreamed two patches for git-fetch(1) which speed up mirror
fetches in repositories with hundreds of millions of references. The
first patch makes better use of the commit-graph when computing common
objects, and the second patch will cause us to skip computing the output
width when fetching with the `--quiet` flag.

The patches were tested with our notorious problem repo `www-gitlab-com`
with speedups of about 25% when doing mirror-fetches.

Backport them into our own Git version now that they have been merged
via 2b331293fb (Merge branch 'ps/fetch-optim-with-commit-graph' into
next, 2022-02-14).

Changelog: performance

3ad8de83

housekeeping: Implement repository housekeeping via the manager · 84ecb212

Patrick Steinhardt authored 3 years ago

Now that we have the housekeeping manager globally injected as required,
move the logic to clean up repositories and optimize them to this
manager.

84ecb212

housekeeping: Create new manager to host stateful logic · fd5d730b

Patrick Steinhardt authored 3 years ago

We're eventually going to move the housekeeping package to keep state
such that we can improve the heuristics used for optimizing repos. This
will allow us to e.g. take into account the last time a repository was
optimized to not optimize repositories too often, or avoid packing the
same repository twice concurrently.

Implement a new housekeeping manager structure that will be the home of
this functionality and inject it as required. This manager isn't yet
doing anything.

fd5d730b
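
To illustrate the kind of state the commit above has in mind, here is a minimal sketch of a manager that refuses to optimize the same repository twice concurrently. It is an assumption about where this could go, not the manager introduced by the commit (which does nothing yet); `RepositoryManager`, `Optimize`, `tryStart` and `finish` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// RepositoryManager is a hypothetical stateful manager that remembers which
// repositories are currently being optimized.
type RepositoryManager struct {
	mu       sync.Mutex
	inFlight map[string]struct{}
}

// NewRepositoryManager returns a manager with empty state.
func NewRepositoryManager() *RepositoryManager {
	return &RepositoryManager{inFlight: make(map[string]struct{})}
}

// tryStart marks the repository as in flight and returns false if it already was.
func (m *RepositoryManager) tryStart(path string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, busy := m.inFlight[path]; busy {
		return false
	}
	m.inFlight[path] = struct{}{}
	return true
}

// finish clears the in-flight marker for the repository.
func (m *RepositoryManager) finish(path string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.inFlight, path)
}

// Optimize runs fn unless an optimization of the same repository is already
// running. It reports whether fn was executed.
func (m *RepositoryManager) Optimize(path string, fn func()) bool {
	if !m.tryStart(path) {
		return false
	}
	defer m.finish(path)
	fn()
	return true
}

func main() {
	m := NewRepositoryManager()
	ran := m.Optimize("a.git", func() {
		// A second request for the same repository arrives while the first
		// one is still running and is skipped.
		fmt.Println("nested call ran:", m.Optimize("a.git", func() {}))
	})
	fmt.Println("outer call ran:", ran)
}
```

The same structure could hold a per-repository timestamp of the last optimization to avoid optimizing too often, which is the other heuristic the commit message mentions.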

repository: Move optimization of repositories into housekeeping package · ceeed993

Patrick Steinhardt authored 3 years ago

We want to allow for OptimizeRepository to be called without invoking an
RPC in order to allow us to call it in more contexts. Now that all its
dependent parts are part of the housekeeping package this commit moves
the logic of OptimizeRepository in there, too.

ceeed993

repository: Move object repacking into housekeeping package · 8f667006

Patrick Steinhardt authored 3 years ago

We want to allow for OptimizeRepository to be called without invoking an
RPC in order to allow us to call it in more contexts. As a preparatory
step, we thus have to move all functionality which is invoked by it into
a package that is independent of gRPC services.

Move the logic handling repacking of objects into the housekeeping
package as part of this move.

8f667006

repository: Move writing of commit-graph into housekeeping package · 28292e0b

Patrick Steinhardt authored 3 years ago

We want to allow for OptimizeRepository to be called without invoking an
RPC in order to allow us to call it in more contexts. As a preparatory
step, we thus have to move all functionality which is invoked by it into
a package that is independent of gRPC services.

Move the writing of the commit-graph into the housekeeping package as
part of this move.

28292e0b

repository: Move cleanup of worktrees into housekeeping package · bfafa50e

Patrick Steinhardt authored 3 years ago

We want to allow for OptimizeRepository to be called without invoking an
RPC in order to allow us to call it in more contexts. As a preparatory
step, we thus have to move all functionality which is invoked by it into
a package that is independent of gRPC services.

Move the cleanup of worktrees into the housekeeping package as part of
this move.

bfafa50e

repository: Decouple OptimizeRepository implementation from server · 3a7c92ca

Patrick Steinhardt authored 3 years ago

We're about to move handling of all repository optimizations into the
housekeeping package. Prepare for this by making OptimizeRepository not
depend on the RepositoryService server anymore.

3a7c92ca

Feb 15, 2022

Merge branch 'pks-tx-fix-grpc-error-comparison' into 'master' · 5d557a52

James Fargher authored 3 years ago

coordinator: Fix error comparison causing excessive replication jobs

Closes #4045

See merge request gitlab-org/gitaly!4349

5d557a52

Merge branch 'getrepopath-propagate-notfound' into 'master' · ac32a7f0
Pavlo Strokov authored 3 years ago
```
Propagate NotFound error returned by GetRepoPath

See merge request gitlab-org/gitaly!4338
```
ac32a7f0

datastore: Clean completed & dead replication jobs · 507f5579

Toon Claes authored 3 years ago

Previous attempts [1] & [2] were made to avoid completed and dead jobs
being left over in the replication queue. But only [3] made sure that no
new stale jobs were generated.

Now that no more of these stale jobs are created, we can run a migration
that wipes all stale records from the database. This change in fact
repeats the `DELETE` query from the migration added in [2].

1. 8f8ae302 (Replication job acknowledge removes 'completed' and 'dead'
   events., 2020-08-05)
2. 2d3fc806 (Replication job acknowledge removes 'completed' and 'dead'
   events., 2020-09-03)
3. 8c4da135 (replication: Remove 'dead' stale jobs., 2021-10-20)

Changelog: performance
Issue: https://gitlab.com/gitlab-org/gitaly/-/issues/3665

507f5579

Propagate NotFound error returned by GetRepoPath · c7cdd439

Igor Wiedler authored 3 years ago

We are currently wrapping some of the errors returned by GetRepoPath,
which makes them lose the gRPC NotFound code.

This leads to a gRPC error code of Unknown, which counts towards our
error SLO, paging the on-call.

This patch removes the wrapping, ensuring the error code is propagated
to the client, and accounted for appropriately in logs and metrics.

Changelog: fixed

c7cdd439
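
The effect described above can be sketched without the gRPC library. In this illustration `codedError` is a hypothetical stand-in for a gRPC status error, and `codeOf` assumes a code lookup that inspects only the outermost error; both are simplifications and not Gitaly's or grpc-go's actual behavior.

```go
package main

import "fmt"

// codedError is a hypothetical stand-in for an error carrying a gRPC code.
type codedError struct {
	code string
	msg  string
}

func (e *codedError) Error() string { return e.code + ": " + e.msg }

// codeOf mimics a lookup that inspects only the outermost error and reports
// "Unknown" for anything else. This is an assumption made for illustration.
func codeOf(err error) string {
	if ce, ok := err.(*codedError); ok {
		return ce.code
	}
	return "Unknown"
}

// getRepoPath is a simplified, hypothetical version of the helper named in
// the commit message.
func getRepoPath(exists bool) (string, error) {
	if !exists {
		return "", &codedError{code: "NotFound", msg: "repository does not exist"}
	}
	return "/repos/example.git", nil
}

// lookupWrapped adds context to the error, which hides the code from codeOf.
func lookupWrapped() error {
	_, err := getRepoPath(false)
	return fmt.Errorf("get repo path: %w", err)
}

// lookupPropagated returns the error unchanged, so the code stays visible.
func lookupPropagated() error {
	_, err := getRepoPath(false)
	return err
}

func main() {
	fmt.Println("wrapped:   ", codeOf(lookupWrapped()))
	fmt.Println("propagated:", codeOf(lookupPropagated()))
}
```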

Update changelog for 14.7.3 · 0ff097c4
GitLab Release Tools Bot authored 3 years ago
```
[ci skip]
```
0ff097c4
Merge branch 'jc-fix-limiter-test' into 'master' · a67a6fdd
Patrick Steinhardt authored 3 years ago
```
limithandler: fix flaky test TestLimitHandlerMetrics

See merge request gitlab-org/gitaly!4350
```
a67a6fdd

Feb 14, 2022

limithandler: fix flaky test TestLimitHandlerMetrics · b581f061

John Cai authored 3 years ago

In this test, we didn't wait for both requests to return. Thus the test
was ending and the grpc server was being shut down before the request
could return.

Fix this by putting both requests onto the request channel and waiting
for both requests to be received before exiting the test.

b581f061
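
The fix follows a common Go pattern: every concurrent request reports on a channel, and the test does not finish before all of them have been received. A minimal sketch of that pattern, with the hypothetical helpers `sendRequests` and `waitForAll` standing in for the real test code:

```go
package main

import "fmt"

// sendRequests starts n simulated requests. Each one reports its completion
// on the returned channel.
func sendRequests(n int) <-chan int {
	done := make(chan int, n)
	for i := 0; i < n; i++ {
		go func(id int) {
			// A real test would issue an RPC here.
			done <- id
		}(i)
	}
	return done
}

// waitForAll blocks until n completions were received and returns the count.
func waitForAll(done <-chan int, n int) int {
	received := 0
	for received < n {
		<-done
		received++
	}
	return received
}

func main() {
	done := sendRequests(2)
	// Only after both requests have returned is it safe to stop the server.
	fmt.Println("requests completed:", waitForAll(done, 2))
}
```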

Merge branch 'smh-remove-metadata-hack' into 'master' · 8c9abf20

John Cai authored 3 years ago

Remove metadata creation hack from Praefect

Closes #4019

See merge request gitlab-org/gitaly!4337

8c9abf20

coordinator: Fix error comparison causing excessive replication jobs · a66be554

Patrick Steinhardt authored 3 years ago

When determining whether nodes need replication jobs or not we also take
into account the error status of a node: if the node returned an error
that is different from the error returned by the primary node we create
a replication job. The underlying assumption is that if two nodes behave
the same, they should also run into the same kind of error. And if they
returned different errors, then they likely did different things and may
have diverged.

This comparison is flawed though: we typically handle gRPC-style errors
in this context, and those cannot be directly compared with each other.
As a result, even in the case where two nodes returned the same error
message and code we label them as different and thus create replication
jobs.

Fix this bug by manually comparing error code and message in case we've
got a gRPC error. Note that we do not do this for normal Go errors: it
is unexpected in the first place to get anything but a gRPC error, so we
treat these as "weird" state and err on the side of caution.

Changelog: fixed

a66be554
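
A self-contained sketch of the problem and of the fix described above. `statusError` is a hypothetical stand-in for a gRPC status error (a pointer type carrying a code and a message), and `sameError` is an illustrative helper, not the coordinator's actual function. Two separately created errors with identical code and message are distinct pointers, so `==` reports them as different.

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a hypothetical stand-in for a gRPC status error.
type statusError struct {
	code int
	msg  string
}

func (e *statusError) Error() string { return fmt.Sprintf("code %d: %s", e.code, e.msg) }

// newStatusError returns a freshly allocated error, as each node would.
func newStatusError(code int, msg string) error {
	return &statusError{code: code, msg: msg}
}

// sameError compares status-style errors by code and message. Any other
// combination is only considered equal if both errors are nil, which errs on
// the side of caution for unexpected error types.
func sameError(a, b error) bool {
	sa, okA := a.(*statusError)
	sb, okB := b.(*statusError)
	if okA && okB {
		return sa.code == sb.code && sa.msg == sb.msg
	}
	return a == nil && b == nil
}

func main() {
	primary := newStatusError(5, "not found")
	secondary := newStatusError(5, "not found")

	fmt.Println("direct comparison:", primary == secondary)
	fmt.Println("code and message: ", sameError(primary, secondary))
	fmt.Println("plain Go errors:  ", sameError(errors.New("x"), errors.New("x")))
}
```

With the direct comparison, the secondary would be labeled as outdated even though it behaved exactly like the primary, which is how the excessive replication jobs came about.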

coordinator: Add tests for gRPC errors to determine outdated nodes · 578bc6ad

Patrick Steinhardt authored 3 years ago

The tests which determine whether `getUpdatedAndOutdatedNodes()` works
correctly are using standard Go errors to verify whether comparison of
errors works as expected. These are not the typical kind of errors we'd
get there though: we'd usually get gRPC errors, which cannot be directly
compared with each other.

Add some testcases for gRPC-style errors and record their current
behaviour.

578bc6ad

Merge branch 'wc-praefect-migrat-print' into 'master' · 4360f305
Toon Claes authored 3 years ago
```
migrate: Print execution time of migrations

See merge request gitlab-org/gitaly!4308
```
4360f305
Merge branch 'sh-update-rails-6.1.4.6' into 'master' · 737f19e2
Toon Claes authored 3 years ago
```
Update actionpack and related Ruby gems

See merge request gitlab-org/gitaly!4347
```
737f19e2
Merge branch 'jc-metrics-for-queue-limiting' into 'master' · f5e510ab
Sami Hiltunen authored 3 years ago
```
limithandler: Add metrics for queue limiting

See merge request gitlab-org/gitaly!4335
```
f5e510ab
Merge branch 'pks-prune-unreachable-objects' into 'master' · 9b795d58
Patrick Steinhardt authored 3 years ago
```
repository: Add new RPC to prune unreachable objects

Closes #4041

See merge request gitlab-org/gitaly!4346
```
9b795d58

Update actionpack and related Ruby gems · 87a94651

Stan Hu authored 3 years ago

This fixes
[CVE-2022-23633](https://github.com/advisories/GHSA-wh98-p28r-vrc9),
but this is likely not an issue with Gitaly since Gitaly doesn't
serve HTTP requests with Rails.

* Diff: https://github.com/rails/rails/compare/v6.1.4.4...v6.1.4.6

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/352659

Changelog: changed

87a94651

praefect: Implement replication for PruneUnreachableObjects · 913345ca

Patrick Steinhardt authored 3 years ago

Mutating maintenance-style RPCs have special handling in the coordinator
and replicator. Implement it for the new PruneUnreachableObjects RPC.

913345ca

repository: Add new RPC to prune unreachable objects · 67c4cdb5

Patrick Steinhardt authored 3 years ago

When the repository's history is rewritten with the BFG Repo-Cleaner, we
potentially accumulate a large number of unreachable objects in the
repository's object database. By default, we'd clean up those objects
after two weeks, which is a rather long time to sit on such a huge
number of objects. To fix this use case we have thus gained a `prune`
parameter in our GarbageCollect RPC call: if set, then we prune
unreachable objects if they haven't been accessed during the last 30
minutes.

The problem with this though is that GarbageCollect does a lot more than
only pruning objects: it may end up packing objects, writing
commit-graphs, writing bitmaps or doing some other things. All of these
are things we want to control ourselves, but we instead let git-gc(1)
dictate how the repository is packed.

We're thus about to deprecate all RPCs which directly influence how a
repository is packed in favor of OptimizeRepository: this is our "black
box" RPC that, from the viewpoint of the caller, does something with the
repository to make it great again. And this is by design: callers should
not control the way Gitaly handles repository maintenance.

This highlights the need though for a new RPC call which _only_ prunes
objects which have become unreachable to disentangle it from repository
maintenance tasks. This commit thus introduces PruneUnreachableObjects,
a new RPC which does exactly that: any unreachable loose object that
hasn't been touched in the last 30 minutes is going to be pruned.

Note that to make this work correctly, the caller has to do two RPC
calls: the first RPC call to OptimizeRepository is required to unpack
unreachable objects into loose objects, and 30 minutes later they may
prune these objects with a second call to PruneUnreachableObjects.

This is no different from right now, even though it's hidden away and
(naturally) used incorrectly by Rails: GarbageCollect would need to be
called twice, first to explode unreachable objects into loose objects
and then second with `prune=true` to prune them after half an hour. This
is because Git will only ever consider loose objects for pruning, and
the grace period is determined by inspecting its access time. So the way
Rails does this is broken, and the new RPC call doesn't change that
fact. This is a separate story though and nothing we can fix in Gitaly:
we must retain the grace period to avoid repository corruption.

Changelog: added

67c4cdb5
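
The timing constraint behind the two-call sequence can be sketched as follows. This only models the grace period check; it does not call Git or any Gitaly RPC, and `gracePeriod`, `canPrune` and `canPruneAfter` are hypothetical names. The 30 minute value is taken from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// gracePeriod is the time an unreachable loose object must remain untouched
// before it may be pruned, as stated in the commit message.
const gracePeriod = 30 * time.Minute

// canPrune reports whether a loose object last touched at lastTouched may be
// pruned at time now.
func canPrune(lastTouched, now time.Time) bool {
	return now.Sub(lastTouched) >= gracePeriod
}

// canPruneAfter is a convenience helper: it assumes the object became loose
// the given number of minutes ago, for example through an earlier
// optimization run that unpacked it.
func canPruneAfter(minutes int) bool {
	now := time.Now()
	return canPrune(now.Add(-time.Duration(minutes)*time.Minute), now)
}

func main() {
	// Pruning right after the first call finds nothing old enough.
	fmt.Println("after 1 minute:  ", canPruneAfter(1))
	// A second call after the grace period may remove the objects.
	fmt.Println("after 45 minutes:", canPruneAfter(45))
}
```

This is why a single call cannot do the whole job: objects that were only just exploded into loose objects are always younger than the grace period.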

Feb 11, 2022

protoregistry: Add missing testcase for OptimizeRepository · 972288fe
Patrick Steinhardt authored 3 years ago
```
The OptimizeRepository RPC is missing in our protoregistry tests. Add
it.
```
972288fe

protoregistry: Sort map of RPCs alphabetically · e113ad2b

Patrick Steinhardt authored 3 years ago

The protoregistry tests have a map of RPCs and their expected type.
This map isn't sorted though, which makes it hard to spot missing RPCs
or find a proper place for new ones. Let's sort those alphabetically.

e113ad2b

Merge branch 'sh-upgrade-ci-debian-bullseye' into 'master' · d3ab199f
Toon Claes authored 3 years ago
```
ci: Upgrade CI images to Debian bullseye

See merge request gitlab-org/gitaly!4340
```
d3ab199f

Merge branch 'git2go_new_err_field' into 'master' · 5ec0b26b

Sami Hiltunen authored 3 years ago

Set and add error fields for git2go protocol which were missed

See merge request gitlab-org/gitaly!4342

5ec0b26b

ci: Upgrade CI images to Debian bullseye · 7a62bd6a

Stan Hu authored 3 years ago

Debian bullseye replaces buster as the latest stable version. We have
upgraded Cloud Native GitLab to bullseye
(https://gitlab.com/gitlab-org/build/CNG/-/merge_requests/888), so
let's also upgrade CI to match.

Since upgrading system libraries will likely break C extensions, we
need to tie the cache key to the Debian version to ensure gems get
recompiled for this platform.

In addition, the PostgreSQL client used previously was whatever the
system default was. In Debian buster, this was PostgreSQL 11. In Debian
bullseye, the default is now PostgreSQL 13. For now, we keep version
11 to avoid making changes in the Praefect `structure.sql` file.

Changelog: changed

7a62bd6a

gitaly-git2go: Add generic error field for resolve conflicts · 51123593
James Fargher authored 3 years ago
```
Adding this field was missed in the original change 2af0319c
```
51123593