PR #10489: [GPU] Fix command buffer support for cuDNN fusions.
Imported from GitHub PR https://github.com/openxla/xla/pull/10489

CuDnnCmd is constructed before the DnnGraph in CuDnnThunk is initialized, so CuDnnCmd has to take unique_ptr<DnnGraph>& instead of DnnGraph& at initialization. Accordingly, cuDNN thunks have to be initialized before command buffer thunks so that graphs are initialized before they get captured.

The test CommandBuffersAreSupported previously did not exercise command buffers: the corresponding command buffer call was inlined, so no command buffers were created. This is now cleaned up, and the test works as expected with the minimal CUDA graph size set to 1 via a flag.

Copybara import of the project:

--
8547c674f3e0858efca9763bed586f1d796184d7 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Fix command buffer support for cuDNN fusions.

Merging this change closes #10489

FUTURE_COPYBARA_INTEGRATE_REVIEW=https://github.com/openxla/xla/pull/10489 from openxla:fix_cudnn_cmd_buffers 8547c674f3e0858efca9763bed586f1d796184d7
PiperOrigin-RevId: 615965989
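The construction-order issue described in the commit message can be illustrated with a minimal sketch. The types below (`Graph`, `Thunk`, `Cmd`) are hypothetical stand-ins and are not the real XLA `DnnGraph`, `CuDnnThunk`, or `CuDnnCmd` classes. The sketch only assumes that the thunk owns the graph through a `std::unique_ptr` that stays null until initialization, and that the command is built earlier than that.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a cuDNN graph; not the real XLA DnnGraph.
struct Graph {
  std::string name;
};

// Hypothetical stand-in for the thunk: it owns the graph and builds it
// only when Initialize() is called.
class Thunk {
 public:
  void Initialize() {
    if (!graph_) graph_ = std::make_unique<Graph>(Graph{"fusion"});
  }
  // Exposes the owning pointer itself, which may still be null.
  std::unique_ptr<Graph>& graph() { return graph_; }

 private:
  std::unique_ptr<Graph> graph_;  // null until Initialize()
};

// Hypothetical stand-in for the command. It is constructed before the
// thunk is initialized, so it cannot hold a Graph& (there is no Graph
// yet). Holding a reference to the owning pointer lets it observe the
// graph once the thunk has created it.
class Cmd {
 public:
  explicit Cmd(std::unique_ptr<Graph>& graph) : graph_(graph) {}

  // True once the owning thunk has been initialized.
  bool Ready() const { return graph_ != nullptr; }

  // Must only be called after Ready() returns true.
  const std::string& GraphName() const { return graph_->name; }

 private:
  std::unique_ptr<Graph>& graph_;
};
```

In this sketch, a `Cmd` built from `thunk.graph()` reports `Ready() == false` until `thunk.Initialize()` runs, which mirrors why the thunks need to be initialized before the commands that capture their graphs are used.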
Changed files:
- tensorflow/compiler/mlir/lite/BUILD (1 addition, 0 deletions)
- tensorflow/compiler/mlir/lite/common/tfl_pass_config.h (3 additions, 0 deletions)
- tensorflow/compiler/mlir/lite/python/saved_model_to_tfl_flatbuffer.cc (2 additions, 0 deletions)
- tensorflow/compiler/mlir/lite/tf_tfl_passes.cc (6 additions, 0 deletions)
- tensorflow/core/tfrt/run_handler_thread_pool/BUILD (1 addition, 0 deletions)
- tensorflow/core/tfrt/run_handler_thread_pool/run_handler_concurrent_work_queue_test.cc (4 additions, 3 deletions)
- tensorflow/core/tfrt/run_handler_thread_pool/run_handler_test.cc (5 additions, 0 deletions)
- tensorflow/lite/delegates/flex/delegate_data.cc (3 additions, 6 deletions)
- tensorflow/lite/delegates/flex/delegate_data.h (0 additions, 2 deletions)
- tensorflow/lite/delegates/flex/subgraph_resource.h (4 additions, 5 deletions)
- tensorflow/lite/delegates/gpu/common/model_builder.cc (3 additions, 16 deletions)
- tensorflow/lite/delegates/xnnpack/xnnpack_delegate.cc (3 additions, 1 deletion)
- tensorflow/lite/python/convert.py (6 additions, 0 deletions)
- tensorflow/lite/python/lite.py (4 additions, 0 deletions)
- tensorflow/lite/toco/toco_flags.proto (5 additions, 1 deletion)
- tensorflow/tools/pip_package/setup.py (1 addition, 1 deletion)
- third_party/xla/xla/service/gpu/fusions/BUILD (2 additions, 0 deletions)
- third_party/xla/xla/service/gpu/fusions/address_computation_fusion_test.cc (816 additions, 0 deletions)
- third_party/xla/xla/service/gpu/fusions/cudnn_test.cc (45 additions, 29 deletions)
- third_party/xla/xla/service/gpu/fusions/custom.cc (167 additions, 0 deletions)