[XLA:GPU] Add one-shot kernel implementation to RaggedAllToAll.
The implementation uses a CUDA kernel to run ra2a (RaggedAllToAll) efficiently on a single host when direct peer access between GPUs is available.

PiperOrigin-RevId: 730881111
Changed files:
- tensorflow/compiler/mlir/lite/ir/tfl_ops.cc (6 additions, 4 deletions)
- tensorflow/compiler/mlir/lite/stablehlo/transforms/compose_uniform_quantized_type_pass.cc (6 additions, 6 deletions)
- tensorflow/compiler/mlir/lite/stablehlo/transforms/uniform_quantized_stablehlo_to_tfl_pass.cc (49 additions, 38 deletions)
- tensorflow/compiler/mlir/lite/transforms/optimize_pass.cc (2 additions, 2 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/defer_activation_transpose.cc (9 additions, 6 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/fold_constant_transpose.cc (3 additions, 2 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/insert_weight_param.cc (5 additions, 4 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/nchw_convolution_to_nhwc.cc (3 additions, 2 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantization_patterns.cc (7 additions, 4 deletions)
- tensorflow/compiler/mlir/quantization/tensorflow/passes/add_dump_tensor_op.cc (3 additions, 2 deletions)
- tensorflow/compiler/mlir/quantization/tensorflow/passes/cast_bf16_ops_to_f32.cc (2 additions, 2 deletions)
- tensorflow/compiler/mlir/quantization/tensorflow/passes/remove_var_init_by_const.cc (2 additions, 2 deletions)
- tensorflow/compiler/mlir/tensorflow/transforms/prepare_tpu_computation_for_tf_export.cc (2 additions, 2 deletions)
- tensorflow/dtensor/mlir/dtensor_layout_to_xla_sharding_op.cc (2 additions, 2 deletions)
- third_party/llvm/generated.patch (696 additions, 1662 deletions)
- third_party/llvm/workspace.bzl (2 additions, 2 deletions)
- third_party/shardy/temporary.patch (2345 additions, 2354 deletions)
- third_party/shardy/workspace.bzl (2 additions, 2 deletions)
- third_party/triton/llvm_integration/cl734808760.patch (33 additions, 0 deletions)
- third_party/triton/llvm_integration/series.bzl (1 addition, 0 deletions)