Commits · pks-housekeeping-rewrite-commit-graph-on-prune · gitlab-org / Gitaly

This project is mirrored from https://gitlab.com/gitlab-org/gitaly.git. Pull mirroring updated 8 minutes ago.

Jul 08, 2022

repository: Fix commit-graphs referencing pruned commits after PruneUnreachableObjects · 0c138e47

When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.

Fix `PruneUnreachableObjects()` to do so.

Changelog: fixed

0c138e47

repository: Demonstrate commit-graphs referencing pruned commits in PruneUnreachableObjects · 671b3ef5

Patrick Steinhardt authored 3 years ago

Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.

671b3ef5

repository: Fix commit-graphs referencing pruned commits after GC · c83ab586

Patrick Steinhardt authored 3 years ago

When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.

Fix `GarbageCollect()` to do so.

Changelog: fixed

c83ab586

repository: Demonstrate commit-graphs referencing pruned commits in GC · 136ba436

Patrick Steinhardt authored 3 years ago

Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.

136ba436

repository: Refactor test to test GarbageCollect with locks · 6489404c

Patrick Steinhardt authored 3 years ago

Refactor the test that verifies behaviour of the GarbageCollect RPC when
there are lockfiles around to match modern best practices.

6489404c

housekeeping: Fix commit-graphs referencing pruned commits · 0b54d0a3

Patrick Steinhardt authored 3 years ago

When pruning objects we need to rewrite our commit-graphs so that they
don't reference any commits that have just been pruned. While this is
not typically an issue, git-fsck(1) will report any such inconsistencies
in the commit-graph. Furthermore, Git may return an error when being
asked to parse such a missing commit directly.

Fix `OptimizeRepository()` to do so when it's pruned objects.

Changelog: fixed

0b54d0a3

housekeeping: Rewrite commit-graphs at a later point · 40b48c5b

Patrick Steinhardt authored 3 years ago

Move writing of commit-graphs to happen after pruning of objects. This
is done so that we can start to use the information whether we pruned
any objects or not to decide whether to rewrite the commit-graph chain.

By moving the logic after packing refs we also have the benefit that we
have less loose refs to count in our heuristics.

40b48c5b

housekeeping: Demonstrate commit-graphs referencing pruned commits · b20cbe79

Patrick Steinhardt authored 3 years ago

Add a test to demonstrate a shortcoming we have with commit-graphs that
reference pruned commits: even though we're pruning commits, we don't
update the commit-graphs accordingly to drop references to any such
pruned commits.

b20cbe79

housekeeping: Provide finer-grained metrics for rewriting commit-graphs · 57a6b9f5

Patrick Steinhardt authored 3 years ago

Now that we have information about whether we did a full rewrite of the
commit-graph chain or only an incremental update we can provide better
metrics in `OptimizeRepository()` to report what we have been doing.

57a6b9f5

housekeeping: Introduce configuration for WriteCommitGraph · 5c72d6ed

Patrick Steinhardt authored 3 years ago

Make the behaviour of whether we rewrite the commit-graph or not
controllable by callers and adjust callers accordingly. This new
configuration will be used to rewrite the commit-graph in more cases
than only missing bitmaps, as it is done right now.

5c72d6ed

housekeeping: Improve heuristics for writing commit-graphs · d20e3237

Patrick Steinhardt authored 3 years ago

Now that writing commit-graphs has been disentangled from repacking
objects we can iterate on the heuristics we use to determine whether we
need to write commit-graphs:

    - We stop writing commit-graphs in case there are no references.
      This wouldn't make any sense anyway given that we use only write
      commit-graphs for reachable commits, of which there aren't any in
      case there are no references.

    - We stop repacking objects based on whether the repository has
      bloom filters or not. This logic was previously mixed into the
      heuristics for `RepackObjects()`, but that has only be a historic
      artifact because that function handled both repacks and rewrites
      of commit-graphs. Instead, we only rewrite commit-graphs based on
      that information.

Ultimately, the end result should be that we repack objects less often,
but keep rewriting commit-graphs in the same circumstances as we did
before.

Add tests to verify this new behaviour and adjust existing tests for
repacking of objects.

d20e3237

housekeeping: Split commit-graph writing out of object repacks · 765e09a8

Patrick Steinhardt authored 3 years ago

Writing commit-graphs and repacking objects is currently both done in
`RepackObjects()`. Back when the code was introduced this was done to
keep required code changes at bay by continuing to always rewrite the
commit-graph when we repack objects. We're about to iterate on the
strategy we use to write commit-graphs though, which requires us to
start doing so independently of whether we did or didn't repack objects.

Pull out the logic to write commit-graphs from `RepackObjects()` and
adjust all callsites of the latter to manually write commit-graphs. This
has merit on its own given that we can now properly report metrics for
writing the commit-graph on its own.

765e09a8

housekeeping: Extract function to count references · f97b7404

Patrick Steinhardt authored 3 years ago

We're about to add a second callsite that requires us to count loose and
packed references in a repository as `packRefsIfNeeded()` already knows
to do. Extract the logic into its own function to make it reusable.

f97b7404

repository: Improve error reporting for Repack RPCs · 1a682bd0

Patrick Steinhardt authored 3 years ago

The errors reported in our Repack RPCs may not be wrapped and are
lacking breadcrumbs. Fix both issues by using `helper.ErrInternalf()`.

1a682bd0

helper: Fix error messages for wrapped gRPC errors · 8904860c

Patrick Steinhardt authored 3 years ago

Our `Errorf()`-style error-wrapping functions know to propagate gRPC
error codes from any wrapped `status.Status` error. While this in
general works alright and does what it's intended to do, one thing that
doesn't work is that the resulting error message contains a reference to
the error code. This makes for hard-to-read error messages.

Fix this by wrapping all gRPC errors into a new error message wrapper.
Instead of returning the string returned by `status.Error()`, which
contains the reference to the error code, we only return the error
message itself.

8904860c

helper: Add test to verify formatting with nested gRPC errors · c968a753
Patrick Steinhardt authored 3 years ago
```
Add a testcase to verify that formatting of gRPC errors works as
expected with nested errors.
```
c968a753

helper: Use separate tests for `Errorf()` with "%w" and "%v" · 4be1ce96

Patrick Steinhardt authored 3 years ago

The conditional testing we have for `Errorf()`-style error wrapping with
either `"%w"` or `"%v"` formatters makes tests hard to read. Refactor
them to instead use separate subtests. While this increases duplication,
it does make the tests a lot more readable.

4be1ce96

helper: Inline input and expectations for `Errorf()` tests · f456804b

Patrick Steinhardt authored 3 years ago

The tests for `Errorf()`-style error wrapping are quite hard to
understand because of the different input and output variables. It may
just be me, but I think it's hard to understand what the difference
between `input`, `inputGRPC`, `inputGRPCFmt` and `inputGRPCCode` is at a
detailed level.

Let's inline these variables to make the testcases mostly self-contained
and thus easier to understand.

f456804b

helper: Rename variable that holds expected code in `Errorf()` tests · 0e27645d

Patrick Steinhardt authored 3 years ago

Rename the variable that holds the expected code in `Errorf()`-style
tests. This makes it easier to spot and understand what's going on.

0e27645d

helper: Convert gRPC `Errorf()` tests to use more subtests · c57640ae

Patrick Steinhardt authored 3 years ago

The tests for `Errorf()`-style gRPC error wrapping check various
different cases, but do so in-line without the use of subtests. This
makes it hard to grasp the scope and that we are testing different
functionality.

Use subtests to clearly highlight the scope.

c57640ae

Jul 07, 2022

Merge branch 'pks-ci-fix-unprivileged-building-of-binaries-go-1.18' into 'master' · 88f78ed8
James Fargher authored 3 years ago
```
testcfg: Fix building binaries as unprivileged user with Go 1.18+

See merge request gitlab-org/gitaly!4689
```
88f78ed8
Merge branch 'pks-remove-raw-feature-flags' into 'master' · 9aac5e46
Will Chandler authored 3 years ago
```
featureflag: Remove raw feature flags

See merge request gitlab-org/gitaly!4675
```
9aac5e46
Merge branch 'wc-debug-makefile' into 'master' · 888f9cc2
John Cai authored 3 years ago
```
Makefile: Add target to debug go tests in Delve

See merge request gitlab-org/gitaly!4662
```
888f9cc2
Merge branch 'pks-gitaly-upgrade-build-image' into 'master' · e4a8d5e2
Toon Claes authored 3 years ago
```
ci: Upgrade build images to most recent ones

See merge request gitlab-org/gitaly!4688
```
e4a8d5e2

praefect-schema: Update the Praefect schema · 7bdb3780

Patrick Steinhardt authored 3 years ago

We have changed the Postgres client version due to our update to a more
recent GitLab Build Image, which caused some minor changes in the
Praefect schema. Update the schema to match.

7bdb3780

testcfg: Fix building binaries as unprivileged user with Go 1.18+ · 61331af0

Patrick Steinhardt authored 3 years ago

Our CI pipelines execute tests as unprivileged user so that we can
verify that all tests pass without any additional capabilities and that
tests don't write any data into the source tree. This broke for our
nightly jobs though with the recent update to Go 1.18 because we now
fail to build the auxiliary Go binaries.

The root cause is a combination of two things:

    - Go 1.18 started to query the repository for VCS information so
      that it can embed that information into the resulting Go binaries.

    - Git has addressed CVE-2022-24765 and won't open repositories by
      default anymore that aren't owned by the current user.

So when testing with recent Go and Git versions as unprivileged user
then Go will try to extract the VCS information, but Git will refuse to
operate.

Fix this by declaring the source directory as "safe". While this is not
necessarily the case, it doesn't compromise our security any more than
before: if an adversary was able to write to the `.git/config` file,
then the very same adversary would also able to just touch up the source
code of Gitaly itself. So that adversary could obtain arbitrary code
execution by just changing whatever is executed as part of our tests.

61331af0

ci: Allow manually executing nightly jobs · 88a79036

Patrick Steinhardt authored 3 years ago

Allow manually executing nightly jobs that test against Git's `master`
and `next` branches to easily allow testing changes against these Git
versions.

88a79036

Merge branch 'renovate/gocloud.dev-0.x' into 'master' · 9ff756d2
Patrick Steinhardt authored 3 years ago
```
go: Update module gocloud.dev to v0.25.0

See merge request gitlab-org/gitaly!4683
```
9ff756d2

ci: Upgrade build images to most recent ones · f112b59b

Patrick Steinhardt authored 3 years ago

The GitLab Build Images created for Gitaly have been simplified to
neither include Postgres nor PgBouncer given that both are not used by
our CI pipelines anymore. Furthermore, while not really important to us,
the Git version was bumped to v2.36.0.

Update our default build image to switch to the most recent ones.

f112b59b

go: Update module gocloud.dev to v0.25.0 · 335e0b67
GitLab Renovate Bot authored 3 years ago

335e0b67
Merge branch 'renovate/grpc-dependencies' into 'master' · 9deccce7
James Fargher authored 3 years ago
```
go: Update module google.golang.org/grpc to v1.47.0

See merge request gitlab-org/gitaly!4651
```
9deccce7

Jul 06, 2022

Merge branch 'pks-fetch-remote-test-with-alternates' into 'master' · 2284c7b5
Will Chandler authored 3 years ago
```
repository: Verify behaviour when fetching into pooled repos

Closes #4304

See merge request gitlab-org/gitaly!4671
```
2284c7b5
Merge branch 'pks-gittest-drop-worktrees' into 'master' · 6944c952
John Cai authored 3 years ago
```
gittest: Convert most tests that use worktrees

See merge request gitlab-org/gitaly!4666
```
6944c952
Merge branch 'pks-go-v1.17-infrastructure' into 'master' · 1f20b94a
Sami Hiltunen authored 3 years ago
```
Adjust infrastructure to use Go v1.17

See merge request gitlab-org/gitaly!4684
```
1f20b94a

featureflag: Remove now-unused raw feature flags · 139c4ec5

Patrick Steinhardt authored 3 years ago

The difference between raw and non-raw feature flags has been really
confusing at times: while a non-raw flag has the format "some_a", a raw
flag has the format "gitaly-feature-some-a". "Raw" in this context thus
refers to the metadata as used in the gRPC context.

To avoid this confusion and make the difference more distinguished and
to improve type safety the preceding commits have replaced all usage of
raw feature flags with actual `FeatureFlag` structures. In case one
needs access to the raw flag one would thus now use `MetadataKey()` to
obtain the raw metadata key.

Hopefully, this should clarify things when we use `FeatureFlags` in all
of our codebase in contrast to using raw ones in some places and non-raw
ones in others.

139c4ec5

hooks: Convert payload to use actual feature flags · be54a19d

Patrick Steinhardt authored 3 years ago

We're currently injecting raw feature flags into the hooks payload so
that these flags can be set in the outgoing context that's used to
connect back to Gitaly again. We're reducing the use of raw feature
flags though in favor of using the typed `FeatureFlag` structure.

Convert the hooks payload to instead inject `FeatureFlag`s directly with
their respective value so that we can get rid of raw feature flags.

This is technically-seen a breaking change because any old server that
is running would still inject the old set of feature flags which the new
binary doesn't understand anymore. As a consequence, it wouldn't use any
feature flags at all. This is fine though: none of our hook RPCs is
using feature flags right now, so this doesn't change anything.

be54a19d

coordinator: Refactor injection of default flags to not use raw flags · f00ed2d0

Patrick Steinhardt authored 3 years ago

When injecting default values for feature flags into the outgoing gRPC
context of Gitaly peers we're using `RawFlags()` to extract currently
set feature flags. This function is going away in favor of the type-safe
`FromContext()` function.

Migrate the coordinator to use `FromContext()`.

f00ed2d0

coordinator: Fix feature flags being set twice · a8c7c3e2

Patrick Steinhardt authored 3 years ago

In order to guarantee a persistent view of enabled and disabled feature
flags the coordinator makes sure to inject all feature flags that aren't
explicitly set in the gRPC context with their default flags as seen by
Praefect. This fixes a race where the default of any feature flag may
change during an upgrade that can lead to the flag being default-enabled
on some Gitaly nodes, while it's default-disabled on others.

This mechanism has a bug though: we first convert the incoming context
to an outgoing one and thus retain all metadata. We then compute the
full set of feature flags we wish to inject, including already-set ones.
And then finally amend this metadata to the outgoing context. The end
result is that any feature flag that has been explicitly set in the
incoming context will be set twice in the outgoing context.

Fortunately, this bug is benign because we always use the same value in
both cases, and our feature flagging code only ever checks the first
value. So this bug doesn't cause any adverse effects.

Fix it regardless by only setting feature flags in the outgoing context
that haven't yet been set before.

Changelog: fixed

a8c7c3e2

git2go: Convert injection of feature flags to not use raw flags · d3701bb9

Patrick Steinhardt authored 3 years ago

Convert the logic to inject and extract feature flags in `gitaly-git2go`
to not use raw feature flags anymore. Instead, use `FromMetadataKey()`
and `FromContext()` to work on typed `FeatureFlag`s.

This prepares for the removal of `featureflag.Raw`.

d3701bb9

git2go: Refactor featureflag subcommand to not use raw flags · 8a7bee35

Patrick Steinhardt authored 3 years ago

The featureflag subcommand is only used for testing purposes to assert
that we can indeed pass feature flags from the `git2go.Executor()` to
the process we're about to spawn. We're using raw feature flags for this
purpose, but we're about to remove them in favor of using the type-safe
`FeatureFlag` structure.

Convert the subcommand to return a new `FeatureFlag` structure that we
can use to more thoroughly check whether things behave as expected and
remove usage of `RawFromContext()` to prepare for its removal.

Note that while this is a breaking change to the calling interface of
the `gitaly-git2go` executable, this command is only ever used in our
tests. So in practice, this doesn't matter.

8a7bee35