[XLA:GPU] Add one-shot kernel implementation to RaggedAllToAll. (2ac45ac0) · Commits · gitlab-org / build / omnibus-mirror / tensorflow

Commit 2ac45ac0 authored 4 months ago by Oleg Shyshkov, committed by TensorFlower Gardener 4 months ago

[XLA:GPU] Add one-shot kernel implementation to RaggedAllToAll.

The one-shot implementation uses a CUDA kernel to run ra2a (RaggedAllToAll) efficiently on a single host when direct peer access between GPUs is available.
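The kernel itself is not shown in this message. As a rough illustration of the data movement involved, here is a host-side C++ sketch that simulates several ranks in one process. It is not the code from this commit. The `Rank` struct and `RaggedAllToAllReference` are hypothetical names, and the sketch assumes that each sender supplies the offset at which its chunk lands in the receiver's output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-rank state (illustration only, not the XLA data structures).
struct Rank {
  std::vector<float> input;                 // ragged data this rank sends
  std::vector<float> output;                // buffer that peers write into
  std::vector<std::size_t> input_offsets;   // start of the chunk for peer j in input
  std::vector<std::size_t> send_sizes;      // number of elements sent to peer j
  std::vector<std::size_t> output_offsets;  // assumed: start of that chunk in peer j's output
};

// Every rank copies its chunks straight into each peer's output buffer.
// On a real multi-GPU host this direct write is what peer access makes possible;
// here the "peers" are just entries of a vector in host memory.
void RaggedAllToAllReference(std::vector<Rank>& ranks) {
  for (std::size_t src = 0; src < ranks.size(); ++src) {
    const Rank& sender = ranks[src];
    for (std::size_t dst = 0; dst < ranks.size(); ++dst) {
      for (std::size_t k = 0; k < sender.send_sizes[dst]; ++k) {
        ranks[dst].output[sender.output_offsets[dst] + k] =
            sender.input[sender.input_offsets[dst] + k];
      }
    }
  }
}
```

With two ranks, if rank 0 holds `{1, 2, 3}` and sends one element to itself and two to rank 1, while rank 1 holds `{4, 5}` and sends one element to each rank, every output buffer ends up with the chunks addressed to it at the offsets the senders chose.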

PiperOrigin-RevId: 730881111

parent e8e8b70e


Showing with 3180 additions and 4098 deletions
