repository: Add ObjectsSize RPC to calculate fine-grained objects size
We provide multiple functions to calculate a repository's size. All of them return the on-disk size of various data structures in varying degrees of detail, but none of them lets the caller calculate the size of the objects that are reachable from a starting set of revisions. This results in multiple problems:

- It is hard to calculate the size of subsets of the object graph, e.g. of only newly pushed objects, or of everything except internal references.

- While `RepositorySize()` discerns normal objects from those which are currently waiting to be pruned via cruft packs, this metric lags behind significantly as cruft packs are only updated every few days.

- Objects that exist in multiple packfiles, or both as a packed and a loose object, are accounted for multiple times.

- It is impossible to figure out whether a subset of objects is deduplicated via object pools. This information can be quite important in certain contexts though, e.g. when trying to calculate storage size quotas.

Implement a new `ObjectsSize()` RPC that calculates the size of objects reachable from a given set of (pseudo-)revisions via git-rev-list(1). This is as accurate as we can get and allows for determining the size of objects for various use cases:

- The size of a single branch (`refs/heads/master`).

- The size of all references (`--all`) or branches (`--branches`).

- The size of new objects in a push (`$new_tips --not --all`).

- The size of objects which are not deduplicated in an object deduplication network (`--all --not --alternate-refs`).

- The size of objects which are deduplicated in an object deduplication network (`--alternate-refs`).

This RPC is thus as accurate as possible while also being quite flexible. Its downside is that the graph walk required to figure out reachable objects is quite expensive, with a cost that depends on both the number of references and the number of objects. This cannot really be helped though: the caller needs to choose between fast but coarse results and slow but accurate ones.

Changelog: added
Changed files:
- internal/gitaly/service/repository/objects_size.go (99 additions, 0 deletions)
- internal/gitaly/service/repository/objects_size_test.go (393 additions, 0 deletions)
- proto/go/gitalypb/repository.pb.go (1409 additions, 1249 deletions)
- proto/go/gitalypb/repository_grpc.pb.go (129 additions, 15 deletions)
- proto/repository.proto (51 additions, 0 deletions)