Commit 2ac45ac0, authored by Oleg Shyshkov, committed by TensorFlower Gardener

[XLA:GPU] Add one-shot kernel implementation to RaggedAllToAll.

The new implementation uses a CUDA kernel to perform ragged all-to-all (ra2a) efficiently on a single host when direct peer access between GPUs is available.
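
For readers unfamiliar with ra2a, the data movement can be sketched as a host-side simulation. This is not the CUDA kernel from this commit. It is a minimal Python model that assumes each rank knows, for every peer, where its chunk starts in its own input (`input_offsets`), how long it is (`send_sizes`), and where it lands in the peer's output (`output_offsets`). The function name, the argument layout, and the list-of-lists representation are hypothetical and chosen only for illustration. The inner assignment stands in for the direct write into a peer's buffer that peer access makes possible.

```python
from typing import List


def ragged_all_to_all_sim(
    inputs: List[List[int]],
    input_offsets: List[List[int]],
    send_sizes: List[List[int]],
    output_offsets: List[List[int]],
    output_len: int,
    fill: int = 0,
) -> List[List[int]]:
    """Simulate ragged all-to-all on one host (illustrative only).

    Index convention (an assumption of this sketch): X[src][dst] describes
    the chunk that rank `src` sends to rank `dst`.
    """
    num_ranks = len(inputs)
    # One output buffer per rank; on real hardware each would live on a GPU.
    outputs = [[fill] * output_len for _ in range(num_ranks)]

    for src in range(num_ranks):
        for dst in range(num_ranks):
            start = input_offsets[src][dst]
            size = send_sizes[src][dst]
            out_start = output_offsets[src][dst]
            # Guard against writing past the end of the peer's buffer.
            assert out_start + size <= output_len
            # "One-shot" write: the sender copies straight into the
            # receiver's output, with no intermediate staging buffer.
            outputs[dst][out_start:out_start + size] = inputs[src][start:start + size]
    return outputs


# Two ranks: rank 0 sends [1] to itself and [2, 3] to rank 1;
# rank 1 sends [4, 5] to rank 0 and [6] to itself.
result = ragged_all_to_all_sim(
    inputs=[[1, 2, 3], [4, 5, 6]],
    input_offsets=[[0, 1], [0, 2]],
    send_sizes=[[1, 2], [2, 1]],
    output_offsets=[[0, 0], [1, 2]],
    output_len=3,
)
print(result)  # [[1, 4, 5], [2, 3, 6]]
```

In this model every sender writes to a disjoint region of each receiver's output, so the writes can proceed independently. A real kernel would also need synchronization between devices before the outputs are read.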

PiperOrigin-RevId: 730881111
parent e8e8b70e
3180 additions and 4098 deletions