Triton Inference Server's Python Backend Processes
In a previous post, we walked through a hands-on Python Backend exercise on Triton Inference Server:
Python - Triton example development environment set up on WSL/docker
; https://www.sysnet.pe.kr/2/0/13938
For reference, here is the full log from when that docker container starts up. ^^
$ docker run --gpus='"device=0"' -it --rm --shm-size=8g -e SSL_CERT_DIR=/etc/ssl/certs/ -p 8005:8000 -v ${MODEL_FOLDER_PATH}:/model_dir tis tritonserver --model-repository=/model_dir --strict-model-config=false --model-control-mode=poll --repository-poll-secs=10 --backend-config=tensorflow,version=2 --log-verbose=1
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 25.04 (build 164182428)
Triton Server Version 2.57.0
Copyright (c) 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and the Product-Specific Terms for NVIDIA AI Products
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).
Warning: '--strict-model-config' has been deprecated! Please use '--disable-auto-complete-config' instead.
I0528 07:03:33.259775 1 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
I0528 07:03:33.752273 1 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x204c00000' with size 268435456"
I0528 07:03:33.752790 1 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0528 07:03:33.772394 1 model_config_utils.cc:753] "Server side auto-completed config: "
name: "core"
platform: "pytorch_libtorch"
input {
name: "INPUT__0"
data_type: TYPE_FP32
dims: -1
dims: 3
dims: 224
dims: 224
}
output {
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: -1
dims: -1
}
default_model_filename: "model.pt"
backend: "pytorch"
I0528 07:03:33.773317 1 model_config_utils.cc:753] "Server side auto-completed config: "
name: "ensemble"
platform: "ensemble"
input {
name: "image"
data_type: TYPE_STRING
dims: -1
}
output {
name: "result"
data_type: TYPE_STRING
dims: -1
}
ensemble_scheduling {
step {
model_name: "preprocessing"
model_version: -1
input_map {
key: "image"
value: "image"
}
output_map {
key: "input_image"
value: "input_image"
}
}
step {
model_name: "core"
model_version: -1
input_map {
key: "INPUT__0"
value: "input_image"
}
output_map {
key: "OUTPUT__0"
value: "scores"
}
}
step {
model_name: "postprocessing"
model_version: -1
input_map {
key: "INPUT__0"
value: "scores"
}
output_map {
key: "result"
value: "result"
}
}
}
I0528 07:03:34.100777 1 model_config_utils.cc:753] "Server side auto-completed config: "
name: "postprocessing"
input {
name: "INPUT__0"
data_type: TYPE_FP32
dims: -1
dims: -1
}
output {
name: "result"
data_type: TYPE_STRING
dims: -1
}
instance_group {
kind: KIND_CPU
}
default_model_filename: "model.py"
parameters {
key: "EXECUTION_ENV_PATH"
value {
string_value: "$$TRITON_MODEL_DIRECTORY/python_env"
}
}
backend: "python"
I0528 07:03:34.231888 1 model_config_utils.cc:753] "Server side auto-completed config: "
name: "preprocessing"
input {
name: "image"
data_type: TYPE_STRING
dims: -1
}
output {
name: "input_image"
data_type: TYPE_FP32
dims: -1
dims: 3
dims: -1
dims: -1
}
instance_group {
kind: KIND_CPU
}
default_model_filename: "model.py"
parameters {
key: "EXECUTION_ENV_PATH"
value {
string_value: "$$TRITON_MODEL_DIRECTORY/python_env"
}
}
backend: "python"
W0528 07:03:34.232145 1 model_lifecycle.cc:112] "ignore version directory 'python_env' which fails to convert to integral number"
I0528 07:03:34.232199 1 model_lifecycle.cc:473] "loading: preprocessing:1"
W0528 07:03:34.232274 1 model_lifecycle.cc:112] "ignore version directory 'python_env' which fails to convert to integral number"
I0528 07:03:34.232312 1 model_lifecycle.cc:473] "loading: postprocessing:1"
I0528 07:03:34.232370 1 model_lifecycle.cc:473] "loading: core:1"
I0528 07:03:34.232429 1 backend_model.cc:505] "Adding default backend config setting: default-max-batch-size,4"
I0528 07:03:34.232527 1 shared_library.cc:149] "OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so"
I0528 07:03:34.232590 1 backend_model.cc:505] "Adding default backend config setting: default-max-batch-size,4"
I0528 07:03:34.232456 1 backend_model.cc:505] "Adding default backend config setting: default-max-batch-size,4"
I0528 07:03:34.248938 1 python_be.cc:1982] "'python' TRITONBACKEND API version: 1.19"
I0528 07:03:34.248995 1 python_be.cc:2004] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0528 07:03:34.249021 1 python_be.cc:2142] "Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30"
I0528 07:03:34.306505 1 python_be.cc:2439] "TRITONBACKEND_GetBackendAttribute: setting attributes"
I0528 07:03:34.306644 1 python_be.cc:2243] "TRITONBACKEND_ModelInitialize: preprocessing (version 1)"
I0528 07:03:34.306655 1 shared_library.cc:149] "OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so"
I0528 07:03:34.307190 1 model_config_utils.cc:1986] "ModelConfig 64-bit fields:"
I0528 07:03:34.307232 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_priority_level"
I0528 07:03:34.307263 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0528 07:03:34.307288 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0528 07:03:34.307296 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_levels"
I0528 07:03:34.307301 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0528 07:03:34.307307 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0528 07:03:34.307311 1 model_config_utils.cc:1988] "\tModelConfig::ensemble_scheduling::step::model_version"
I0528 07:03:34.307315 1 model_config_utils.cc:1988] "\tModelConfig::input::dims"
I0528 07:03:34.307338 1 model_config_utils.cc:1988] "\tModelConfig::input::reshape::shape"
I0528 07:03:34.307345 1 model_config_utils.cc:1988] "\tModelConfig::instance_group::secondary_devices::device_id"
I0528 07:03:34.307348 1 model_config_utils.cc:1988] "\tModelConfig::model_warmup::inputs::value::dims"
I0528 07:03:34.307352 1 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0528 07:03:34.307356 1 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0528 07:03:34.307360 1 model_config_utils.cc:1988] "\tModelConfig::output::dims"
I0528 07:03:34.307364 1 model_config_utils.cc:1988] "\tModelConfig::output::reshape::shape"
I0528 07:03:34.307367 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0528 07:03:34.307371 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0528 07:03:34.307391 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0528 07:03:34.307398 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::dims"
I0528 07:03:34.307402 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0528 07:03:34.307406 1 model_config_utils.cc:1988] "\tModelConfig::version_policy::specific::versions"
I0528 07:03:34.307514 1 python_be.cc:1849] "Using Python execution env /model_dir/preprocessing/python_env"
I0528 07:03:34.307585 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/preprocessing/python_env"
I0528 07:03:35.351471 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_b0fc2220-a884-4074-b0e9-50111be4a834 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing DEFAULT"
I0528 07:03:35.864486 1 libtorch.cc:2501] "TRITONBACKEND_Initialize: pytorch"
I0528 07:03:35.864552 1 libtorch.cc:2511] "Triton TRITONBACKEND API version: 1.19"
I0528 07:03:35.864558 1 libtorch.cc:2517] "'pytorch' TRITONBACKEND API version: 1.19"
I0528 07:03:35.864664 1 libtorch.cc:2550] "TRITONBACKEND_ModelInitialize: core (version 1)"
I0528 07:03:35.864699 1 python_be.cc:2243] "TRITONBACKEND_ModelInitialize: postprocessing (version 1)"
W0528 07:03:35.865129 1 libtorch.cc:329] "skipping model configuration auto-complete for 'core': not supported for pytorch backend"
I0528 07:03:35.865141 1 python_be.cc:1849] "Using Python execution env /model_dir/postprocessing/python_env"
I0528 07:03:35.865284 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/postprocessing/python_env"
I0528 07:03:35.865608 1 libtorch.cc:358] "Optimized execution is enabled for model instance 'core'"
I0528 07:03:35.865652 1 libtorch.cc:377] "Cache Cleaning is disabled for model instance 'core'"
I0528 07:03:35.865658 1 libtorch.cc:394] "Inference Mode is enabled for model instance 'core'"
I0528 07:03:35.865662 1 libtorch.cc:413] "cuDNN is enabled for model instance 'core'"
I0528 07:03:35.865791 1 libtorch.cc:2594] "TRITONBACKEND_ModelInstanceInitialize: core_0 (GPU device 0)"
I0528 07:03:35.865847 1 backend_model_instance.cc:106] "Creating instance core_0 on GPU 0 (8.9) using artifact 'model.pt'"
I0528 07:03:35.866008 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_122ca998-6109-4181-9f22-1d3ebb50f917 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing DEFAULT"
I0528 07:03:36.633413 1 backend_model_instance.cc:786] "Starting backend thread for core_0 at nice 0 on device 0..."
I0528 07:03:36.633882 1 model_lifecycle.cc:849] "successfully loaded 'core'"
I0528 07:03:47.014772 1 python_be.cc:1938] "model configuration:\n{\n \"name\": \"preprocessing\",\n \"platform\": \"\",\n \"backend\": \"python\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 0,\n \"input\": [\n {\n \"name\": \"image\",\n \"data_type\": \"TYPE_STRING\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n -1\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"input_image\",\n \"data_type\": \"TYPE_FP32\",\n \"dims\": [\n -1,\n 3,\n -1,\n -1\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"priority\": \"PRIORITY_DEFAULT\",\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"instance_group\": [\n {\n \"name\": \"preprocessing_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"model.py\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {\n \"EXECUTION_ENV_PATH\": {\n \"string_value\": \"$$TRITON_MODEL_DIRECTORY/python_env\"\n }\n },\n \"model_warmup\": []\n}"
I0528 07:03:47.015244 1 python_be.cc:2287] "TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)"
I0528 07:03:47.015299 1 backend_model_instance.cc:69] "Creating instance preprocessing_0_0 on CPU using artifact 'model.py'"
I0528 07:03:47.015417 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/preprocessing/python_env"
I0528 07:03:47.016422 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_08bebad7-d851-498a-91fe-cba9e3e2e961 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing_0_0 DEFAULT"
I0528 07:03:47.899083 1 python_be.cc:2308] "TRITONBACKEND_ModelInstanceInitialize: instance initialization successful preprocessing_0_0 (device 0)"
I0528 07:03:47.899297 1 backend_model_instance.cc:786] "Starting backend thread for preprocessing_0_0 at nice 0 on device 0..."
I0528 07:03:47.899592 1 model_lifecycle.cc:849] "successfully loaded 'preprocessing'"
I0528 07:03:59.117453 1 python_be.cc:1938] "model configuration:\n{\n \"name\": \"postprocessing\",\n \"platform\": \"\",\n \"backend\": \"python\",\n \"runtime\": \"\",\n \"version_policy\": {\n \"latest\": {\n \"num_versions\": 1\n }\n },\n \"max_batch_size\": 0,\n \"input\": [\n {\n \"name\": \"INPUT__0\",\n \"data_type\": \"TYPE_FP32\",\n \"format\": \"FORMAT_NONE\",\n \"dims\": [\n -1,\n -1\n ],\n \"is_shape_tensor\": false,\n \"allow_ragged_batch\": false,\n \"optional\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"output\": [\n {\n \"name\": \"result\",\n \"data_type\": \"TYPE_STRING\",\n \"dims\": [\n -1\n ],\n \"label_filename\": \"\",\n \"is_shape_tensor\": false,\n \"is_non_linear_format_io\": false\n }\n ],\n \"batch_input\": [],\n \"batch_output\": [],\n \"optimization\": {\n \"priority\": \"PRIORITY_DEFAULT\",\n \"input_pinned_memory\": {\n \"enable\": true\n },\n \"output_pinned_memory\": {\n \"enable\": true\n },\n \"gather_kernel_buffer_threshold\": 0,\n \"eager_batching\": false\n },\n \"instance_group\": [\n {\n \"name\": \"postprocessing_0\",\n \"kind\": \"KIND_CPU\",\n \"count\": 1,\n \"gpus\": [],\n \"secondary_devices\": [],\n \"profile\": [],\n \"passive\": false,\n \"host_policy\": \"\"\n }\n ],\n \"default_model_filename\": \"model.py\",\n \"cc_model_filenames\": {},\n \"metric_tags\": {},\n \"parameters\": {\n \"EXECUTION_ENV_PATH\": {\n \"string_value\": \"$$TRITON_MODEL_DIRECTORY/python_env\"\n }\n },\n \"model_warmup\": []\n}"
I0528 07:03:59.117846 1 python_be.cc:2287] "TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)"
I0528 07:03:59.117900 1 backend_model_instance.cc:69] "Creating instance postprocessing_0_0 on CPU using artifact 'model.py'"
I0528 07:03:59.117985 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/postprocessing/python_env"
I0528 07:03:59.118847 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_7ba64d7a-c7c5-45cc-9a07-cda9fd5b2f32 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing_0_0 DEFAULT"
I0528 07:04:00.889559 1 python_be.cc:2308] "TRITONBACKEND_ModelInstanceInitialize: instance initialization successful postprocessing_0_0 (device 0)"
I0528 07:04:00.889911 1 backend_model_instance.cc:786] "Starting backend thread for postprocessing_0_0 at nice 0 on device 0..."
I0528 07:04:00.890376 1 model_lifecycle.cc:849] "successfully loaded 'postprocessing'"
I0528 07:04:00.890652 1 model_lifecycle.cc:473] "loading: ensemble:1"
I0528 07:04:00.891034 1 ensemble_model.cc:57] "ensemble model for ensemble\n"
I0528 07:04:00.891085 1 model_lifecycle.cc:849] "successfully loaded 'ensemble'"
I0528 07:04:00.891207 1 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0528 07:04:00.891272 1 server.cc:631]
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute |
| | | -capability":"6.000000","default-max-batch-size":"4"}} |
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute |
| | | -capability":"6.000000","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
I0528 07:04:00.891349 1 server.cc:674]
+----------------+---------+--------+
| Model | Version | Status |
+----------------+---------+--------+
| core | 1 | READY |
| ensemble | 1 | READY |
| postprocessing | 1 | READY |
| preprocessing | 1 | READY |
+----------------+---------+--------+
I0528 07:04:00.942613 1 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 4060 Ti"
I0528 07:04:00.947748 1 metrics.cc:783] "Collecting CPU metrics"
I0528 07:04:00.948038 1 tritonserver.cc:2598]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.57.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda |
| | _shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /model_dir |
| model_control_mode | MODE_POLL |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
I0528 07:04:00.949955 1 grpc_server.cc:2368]
+----------------------------------------------+---------+
| GRPC KeepAlive Option | Value |
+----------------------------------------------+---------+
| keepalive_time_ms | 7200000 |
| keepalive_timeout_ms | 20000 |
| keepalive_permit_without_calls | 0 |
| http2_max_pings_without_data | 2 |
| http2_min_recv_ping_interval_without_data_ms | 300000 |
| http2_max_ping_strikes | 2 |
+----------------------------------------------+---------+
I0528 07:04:00.955697 1 grpc_server.cc:100] "Ready for RPC 'Check', 0"
I0528 07:04:00.955758 1 grpc_server.cc:100] "Ready for RPC 'ServerLive', 0"
I0528 07:04:00.955794 1 grpc_server.cc:100] "Ready for RPC 'ServerReady', 0"
I0528 07:04:00.955824 1 grpc_server.cc:100] "Ready for RPC 'ModelReady', 0"
I0528 07:04:00.955851 1 grpc_server.cc:100] "Ready for RPC 'ServerMetadata', 0"
I0528 07:04:00.955908 1 grpc_server.cc:100] "Ready for RPC 'ModelMetadata', 0"
I0528 07:04:00.955919 1 grpc_server.cc:100] "Ready for RPC 'ModelConfig', 0"
I0528 07:04:00.955930 1 grpc_server.cc:100] "Ready for RPC 'SystemSharedMemoryStatus', 0"
I0528 07:04:00.955956 1 grpc_server.cc:100] "Ready for RPC 'SystemSharedMemoryRegister', 0"
I0528 07:04:00.955986 1 grpc_server.cc:100] "Ready for RPC 'SystemSharedMemoryUnregister', 0"
I0528 07:04:00.956020 1 grpc_server.cc:100] "Ready for RPC 'CudaSharedMemoryStatus', 0"
I0528 07:04:00.956050 1 grpc_server.cc:100] "Ready for RPC 'CudaSharedMemoryRegister', 0"
I0528 07:04:00.956058 1 grpc_server.cc:100] "Ready for RPC 'CudaSharedMemoryUnregister', 0"
I0528 07:04:00.956067 1 grpc_server.cc:100] "Ready for RPC 'RepositoryIndex', 0"
I0528 07:04:00.956106 1 grpc_server.cc:100] "Ready for RPC 'RepositoryModelLoad', 0"
I0528 07:04:00.956115 1 grpc_server.cc:100] "Ready for RPC 'RepositoryModelUnload', 0"
I0528 07:04:00.956123 1 grpc_server.cc:100] "Ready for RPC 'ModelStatistics', 0"
I0528 07:04:00.956151 1 grpc_server.cc:100] "Ready for RPC 'Trace', 0"
I0528 07:04:00.956181 1 grpc_server.cc:100] "Ready for RPC 'Logging', 0"
I0528 07:04:00.956268 1 grpc_server.cc:364] "Thread started for CommonHandler"
I0528 07:04:00.956579 1 infer_handler.cc:675] "New request handler for ModelInferHandler, 0"
I0528 07:04:00.956667 1 infer_handler.h:1538] "Thread started for ModelInferHandler"
I0528 07:04:00.958032 1 infer_handler.cc:675] "New request handler for ModelInferHandler, 0"
I0528 07:04:00.958119 1 infer_handler.h:1538] "Thread started for ModelInferHandler"
I0528 07:04:00.958353 1 stream_infer_handler.cc:128] "New request handler for ModelStreamInferHandler, 0"
I0528 07:04:00.958431 1 infer_handler.h:1538] "Thread started for ModelStreamInferHandler"
I0528 07:04:00.958473 1 grpc_server.cc:2560] "Started GRPCInferenceService at 0.0.0.0:8001"
I0528 07:04:00.958724 1 http_server.cc:4755] "Started HTTPService at 0.0.0.0:8000"
I0528 07:04:01.003700 1 http_server.cc:358] "Started Metrics Service at 0.0.0.0:8002"
I0528 07:04:01.009674 1 server.cc:376] "Polling model repository"
I0528 07:04:11.479535 1 server.cc:376] "Polling model repository"
I0528 07:04:21.249437 1 server.cc:376] "Polling model repository"
I0528 07:04:31.052945 1 http_server.cc:4641] "HTTP request: 2 /v2/models/ensemble/infer"
I0528 07:04:31.053746 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to INITIALIZED"
I0528 07:04:31.053784 1 infer_request.cc:905] "[request id: <id_unknown>] prepared: [0x0x7f18940034f0] request id: , model: ensemble, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0\noriginal inputs:\n[0x0x7f1894054488] input: image, type: BYTES, original shape: [1], batch + shape: [1], shape: [1]\noverride inputs:\ninputs:\n[0x0x7f1894054488] input: image, type: BYTES, original shape: [1], batch + shape: [1], shape: [1]\noriginal requested outputs:\nrequested outputs:\nresult\n"
I0528 07:04:31.053799 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to PENDING"
I0528 07:04:31.053807 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from PENDING to EXECUTING"
I0528 07:04:31.053884 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to INITIALIZED"
I0528 07:04:31.053894 1 infer_request.cc:905] "[request id: <id_unknown>] prepared: [0x0x7f1894046070] request id: , model: preprocessing, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0\noriginal inputs:\n[0x0x7f1894044958] input: image, type: BYTES, original shape: [1], batch + shape: [1], shape: [1]\noverride inputs:\ninputs:\n[0x0x7f1894044958] input: image, type: BYTES, original shape: [1], batch + shape: [1], shape: [1]\noriginal requested outputs:\ninput_image\nrequested outputs:\ninput_image\n"
I0528 07:04:31.053929 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to PENDING"
I0528 07:04:31.054021 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from PENDING to EXECUTING"
I0528 07:04:31.054086 1 python_be.cc:1348] "model preprocessing, instance preprocessing_0_0, executing 1 requests"
I0528 07:04:31.063281 1 infer_response.cc:193] "add response output: output: input_image, type: FP32, shape: [1,3,224,224]"
I0528 07:04:31.063357 1 pinned_memory_manager.cc:198] "pinned memory allocation: size 602112, addr 0x204c00090"
I0528 07:04:31.063392 1 ensemble_scheduler.cc:569] "Internal response allocation: input_image, size 602112, addr 0x204c00090, memory type 1, type id 0"
I0528 07:04:31.063516 1 ensemble_scheduler.cc:584] "Internal response release: size 602112, addr 0x204c00090"
I0528 07:04:31.063539 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to INITIALIZED"
I0528 07:04:31.063549 1 infer_request.cc:905] "[request id: <id_unknown>] prepared: [0x0x7f19a4005900] request id: , model: core, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0\noriginal inputs:\n[0x0x7f19a4005c68] input: INPUT__0, type: FP32, original shape: [1,3,224,224], batch + shape: [1,3,224,224], shape: [1,3,224,224]\noverride inputs:\ninputs:\n[0x0x7f19a4005c68] input: INPUT__0, type: FP32, original shape: [1,3,224,224], batch + shape: [1,3,224,224], shape: [1,3,224,224]\noriginal requested outputs:\nOUTPUT__0\nrequested outputs:\nOUTPUT__0\n"
I0528 07:04:31.063598 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to PENDING"
I0528 07:04:31.063716 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from PENDING to EXECUTING"
I0528 07:04:31.063728 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from EXECUTING to RELEASED"
I0528 07:04:31.063792 1 libtorch.cc:2660] "model core, instance core_0, executing 1 requests"
I0528 07:04:31.063906 1 libtorch.cc:1268] "TRITONBACKEND_ModelExecute: Running core_0 with 1 requests"
I0528 07:04:31.063855 1 python_be.cc:2407] "TRITONBACKEND_ModelInstanceExecute: model instance name preprocessing_0_0 released 1 requests"
I0528 07:04:31.683754 1 infer_response.cc:193] "add response output: output: OUTPUT__0, type: FP32, shape: [1,1000]"
I0528 07:04:31.683835 1 ensemble_scheduler.cc:569] "Internal response allocation: OUTPUT__0, size 4000, addr 0x1307093000, memory type 2, type id 0"
I0528 07:04:31.685041 1 ensemble_scheduler.cc:584] "Internal response release: size 4000, addr 0x1307093000"
I0528 07:04:31.685100 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to INITIALIZED"
I0528 07:04:31.685130 1 infer_request.cc:905] "[request id: <id_unknown>] prepared: [0x0x7f1a4a913430] request id: , model: postprocessing, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0\noriginal inputs:\n[0x0x7f1818e81ee8] input: INPUT__0, type: FP32, original shape: [1,1000], batch + shape: [1,1000], shape: [1,1000]\noverride inputs:\ninputs:\n[0x0x7f1818e81ee8] input: INPUT__0, type: FP32, original shape: [1,1000], batch + shape: [1,1000], shape: [1,1000]\noriginal requested outputs:\nresult\nrequested outputs:\nresult\n"
I0528 07:04:31.685170 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from INITIALIZED to PENDING"
I0528 07:04:31.685278 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from EXECUTING to RELEASED"
I0528 07:04:31.685289 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from PENDING to EXECUTING"
I0528 07:04:31.685414 1 python_be.cc:1348] "model postprocessing, instance postprocessing_0_0, executing 1 requests"
I0528 07:04:31.685317 1 pinned_memory_manager.cc:226] "pinned memory deallocation: addr 0x204c00090"
I0528 07:04:31.688686 1 infer_response.cc:193] "add response output: output: result, type: BYTES, shape: []"
I0528 07:04:31.688752 1 pinned_memory_manager.cc:198] "pinned memory allocation: size 22945, addr 0x204c00090"
I0528 07:04:31.688761 1 ensemble_scheduler.cc:569] "Internal response allocation: result, size 22945, addr 0x204c00090, memory type 1, type id 0"
I0528 07:04:31.688785 1 ensemble_scheduler.cc:584] "Internal response release: size 22945, addr 0x204c00090"
I0528 07:04:31.688817 1 infer_response.cc:167] "add response output: output: result, type: BYTES, shape: []"
I0528 07:04:31.688830 1 http_server.cc:1272] "HTTP: unable to provide 'result' in CPU_PINNED, will use CPU"
I0528 07:04:31.688847 1 http_server.cc:1292] "HTTP using buffer for: 'result', size: 22945, addr: 0x7f19140055f0"
I0528 07:04:31.688884 1 pinned_memory_manager.cc:226] "pinned memory deallocation: addr 0x204c00090"
I0528 07:04:31.689991 1 http_server.cc:1366] "HTTP release: size 22945, addr 0x7f19140055f0"
I0528 07:04:31.690115 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from EXECUTING to RELEASED"
I0528 07:04:31.690166 1 infer_request.cc:132] "[request id: <id_unknown>] Setting state from EXECUTING to RELEASED"
I0528 07:04:31.690203 1 python_be.cc:2407] "TRITONBACKEND_ModelInstanceExecute: model instance name postprocessing_0_0 released 1 requests"
I0528 07:04:31.862249 1 server.cc:376] "Polling model repository"
I0528 07:04:42.315312 1 server.cc:376] "Polling model repository"
I0528 07:04:52.114117 1 server.cc:376] "Polling model repository"
I0528 07:05:02.489851 1 server.cc:376] "Polling model repository"
...[omitted: repeated "Polling model repository" lines]...
For reference, the trailing "Polling model repository" lines appear because we passed the options that poll the model repository for changes every 10 seconds,
--model-control-mode=poll --repository-poll-secs=10
on the tritonserver command line.
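As a quick way to see this polling in action, here is a minimal sketch (the staged-model path and version number are hypothetical; it assumes the HTTP port is published as 8005, as in the docker run above):

import shutil
import time

import tritonclient.http as httpclient

# Copy a prepared version directory into the live repository
# (adjust to ${MODEL_FOLDER_PATH}/preprocessing/2 if run on the host).
shutil.copytree("/path/to/staged/preprocessing/2", "/model_dir/preprocessing/2")

# Wait past --repository-poll-secs=10 so the next poll can run...
time.sleep(15)

# ...then ask the server whether it picked up the new version.
client = httpclient.InferenceServerClient(url="localhost:8005")
print(client.is_model_ready("preprocessing", "2"))  # expect True after a successful poll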
Looking at the Triton server's process structure when the Python backend is in use,
# pstree -pT
tritonserver(1)-+-dpkg-preconfigu(567)
|-triton_python_b(266)
`-triton_python_b(372)
you can see two triton_python_backend_stub child processes running underneath. Looking at their command lines,
/opt/tritonserver/backends/python/triton_python_backend_stub
/model_dir/preprocessing/1/model.py
triton_python_backend_shm_region_71c355c0-2037-4895-9fbb-38fb4af06a8f
1048576
1048576
1
/opt/tritonserver/backends/python
336
preprocessing_0_0
DEFAULT
/opt/tritonserver/backends/python/triton_python_backend_stub
/model_dir/postprocessing/1/model.py
triton_python_backend_shm_region_6ac9d2f4-3535-4a3d-a334-f93d993228e8
1048576
1048576
1
/opt/tritonserver/backends/python
336
postprocessing_0_0
DEFAULT
the two processes handle preprocessing and postprocessing, respectively. (Incidentally, the two 1048576 arguments presumably correspond to the shm-default-byte-size and shm-growth-byte-size values from the shared-memory configuration log above.) Shall we make a simple change to the example code for a test? ^^ For instance, you can add logging code to /model_dir/preprocessing/1/model.py like this:
import numpy as np
import triton_python_backend_utils as pb_utils

# ...[omitted]...

class TritonPythonModel:
    # ...[omitted]...

    def execute(self, requests):
        """preprocessing main logic"""
        logger = pb_utils.Logger
        logger.log_info("========================================= INFO MSG ====================")

        responses = []
        for request in requests:
            # get the request's input tensor
            raw_images = pb_utils.get_input_tensor_by_name(request, "image").as_numpy()

            # build the response (gen_input_image is defined elsewhere in the model, omitted here)
            input_image = gen_input_image(raw_images)
            input_image_tensor = pb_utils.Tensor(
                "input_image", input_image.astype(np.float32)
            )
            response = pb_utils.InferenceResponse(
                output_tensors=[input_image_tensor]
            )
            responses.append(response)
        return responses
Since the above was added to the preprocessing side's model.py, once the Triton server is running and a request is sent (via client.py), it leaves a log like this:
I0530 01:26:17.118568 1 model.py:51] "========================================= INFO MSG ===================="
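For reference, the client.py itself isn't shown here, but a minimal sketch that could send such a request might look like this (assuming the HTTP endpoint published on port 8005 and an arbitrary image file name):

import numpy as np
import tritonclient.http as httpclient

# Read the raw image bytes; the file name is just an example.
with open("sample.jpg", "rb") as f:
    image_bytes = f.read()

client = httpclient.InferenceServerClient(url="localhost:8005")

# The ensemble's "image" input is TYPE_STRING (BYTES on the wire) with dims [-1].
image_input = httpclient.InferInput("image", [1], "BYTES")
image_input.set_data_from_numpy(np.array([image_bytes], dtype=np.object_))

result = client.infer("ensemble", inputs=[image_input])
print(result.as_numpy("result"))  # the TYPE_STRING "result" output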
In fact, if you just want output without bothering with the logger's INFO, WARN, ERROR levels, you can simply use the print function (though, since the stub's stdout may be buffered, you may need print(..., flush=True) for the output to show up promptly).
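For completeness, pb_utils.Logger also exposes per-level methods, as listed in the python_backend documentation:

logger = pb_utils.Logger
logger.log_info("info message")
logger.log_warn("warning message")
logger.log_error("error message")
logger.log_verbose("verbose message")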
Looking at the documentation,
Building Custom Python Backend Stub
; https://github.com/triton-inference-server/python_backend/tree/main?tab=readme-ov-file#building-custom-python-backend-stub
triton_python_backend_stub appears to be, in effect, a wrapper around the regular Python interpreter designed to make communication with the Triton C++ core easy. In other words, you should be able to use essentially all of the Python interpreter's functionality.
If so, would the PYTHONPATH environment variable, which causes sitecustomize.py to run automatically, also take effect? A test will tell. ^^ To find out, write a sitecustomize.py like this,
$ cat /home/testusr/test/triton_server_example/temp/sitecustomize.py
print("================= SITE CUSTOMIZE ====================")
then, when running the container, mount the directory containing that file as a volume and point the PYTHONPATH environment variable at it:
$ export MODEL_FOLDER_PATH=/home/testusr/test/triton_server_example/triton
$ export MY_PYTHONPATH=/home/testusr/test/triton_server_example/temp
$ docker run --gpus='"device=0"' -it --rm --shm-size=8g -p 8005:8000 -e PYTHONPATH=/my_temp -e SSL_CERT_DIR=/etc/ssl/certs/ -v ${MY_PYTHONPATH}:/my_temp -v ${MODEL_FOLDER_PATH}:/model_dir tis tritonserver --model-repository=/model_dir --strict-model-config=false --model-control-mode=poll --repository-poll-secs=10 --backend-config=tensorflow,version=2 --log-verbose=1
During the subsequent run, you can spot traces of sitecustomize.py in four places as the Triton server initializes:
...[omitted]...
I0530 02:28:18.840890 1 python_be.cc:1849] "Using Python execution env /model_dir/postprocessing/python_env"
I0530 02:28:18.840923 1 python_be.cc:1849] "Using Python execution env /model_dir/preprocessing/python_env"
I0530 02:28:18.841027 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/postprocessing/python_env"
I0530 02:28:18.841105 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/preprocessing/python_env"
I0530 02:28:19.103146 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_af0d939b-b62b-44bb-949f-e156d66e41c1 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing DEFAULT"
I0530 02:28:19.103171 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_13e3cba7-80d3-4ca5-b268-26cf977cf03e 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing DEFAULT"
================= SITE CUSTOMIZE ====================
================= SITE CUSTOMIZE ====================
I0530 02:28:19.750559 1 libtorch.cc:2501] "TRITONBACKEND_Initialize: pytorch"
I0530 02:28:19.750657 1 libtorch.cc:2511] "Triton TRITONBACKEND API version: 1.19"
I0530 02:28:19.750697 1 libtorch.cc:2517] "'pytorch' TRITONBACKEND API version: 1.19"
I0530 02:28:19.750776 1 libtorch.cc:2550] "TRITONBACKEND_ModelInitialize: core (version 1)"
W0530 02:28:19.751482 1 libtorch.cc:329] "skipping model configuration auto-complete for 'core': not supported for pytorch backend"
I0530 02:28:19.752227 1 libtorch.cc:358] "Optimized execution is enabled for model instance 'core'"
I0530 02:28:19.752305 1 libtorch.cc:377] "Cache Cleaning is disabled for model instance 'core'"
I0530 02:28:19.752318 1 libtorch.cc:394] "Inference Mode is enabled for model instance 'core'"
I0530 02:28:19.752362 1 libtorch.cc:413] "cuDNN is enabled for model instance 'core'"
...[omitted]...
I0530 02:28:21.955190 1 python_be.cc:2287] "TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)"
I0530 02:28:21.955247 1 backend_model_instance.cc:69] "Creating instance preprocessing_0_0 on CPU using artifact 'model.py'"
I0530 02:28:21.955354 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/preprocessing/python_env"
I0530 02:28:21.956001 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_3b77605f-d9bd-40fc-aab8-cd277ca0734c 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing_0_0 DEFAULT"
================= SITE CUSTOMIZE ====================
I0530 02:28:22.955900 1 python_be.cc:2308] "TRITONBACKEND_ModelInstanceInitialize: instance initialization successful preprocessing_0_0 (device 0)"
I0530 02:28:22.956119 1 backend_model_instance.cc:786] "Starting backend thread for preprocessing_0_0 at nice 0 on device 0..."
I0530 02:28:22.956430 1 model_lifecycle.cc:849] "successfully loaded 'preprocessing'"
...[omitted]...
I0530 02:28:23.122625 1 python_be.cc:2287] "TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)"
I0530 02:28:23.122683 1 backend_model_instance.cc:69] "Creating instance postprocessing_0_0 on CPU using artifact 'model.py'"
I0530 02:28:23.122762 1 pb_env.cc:267] "Returning canonical path since EXECUTION_ENV_PATH does not contain compressed path. Path: /model_dir/postprocessing/python_env"
I0530 02:28:23.123564 1 stub_launcher.cc:385] "Starting Python backend stub: source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_6ca36aca-7a8f-483d-b203-5438729ff9ca 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing_0_0 DEFAULT"
================= SITE CUSTOMIZE ====================
I0530 02:28:24.873003 1 python_be.cc:2308] "TRITONBACKEND_ModelInstanceInitialize: instance initialization successful postprocessing_0_0 (device 0)"
I0530 02:28:24.873311 1 backend_model_instance.cc:786] "Starting backend thread for postprocessing_0_0 at nice 0 on device 0..."
I0530 02:28:24.873713 1 model_lifecycle.cc:849] "successfully loaded 'postprocessing'"
I0530 02:28:24.874052 1 model_lifecycle.cc:473] "loading: ensemble:1"
I0530 02:28:24.874461 1 ensemble_model.cc:57] "ensemble model for ensemble\n"
I0530 02:28:24.874519 1 model_lifecycle.cc:849] "successfully loaded 'ensemble'"
In this example, a total of four stub processes are launched,
source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_af0d939b-b62b-44bb-949f-e156d66e41c1 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing DEFAULT
source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_13e3cba7-80d3-4ca5-b268-26cf977cf03e 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing DEFAULT
source /model_dir/preprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/preprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/preprocessing/1/model.py triton_python_backend_shm_region_3b77605f-d9bd-40fc-aab8-cd277ca0734c 1048576 1048576 1 /opt/tritonserver/backends/python 336 preprocessing_0_0 DEFAULT
source /model_dir/postprocessing/python_env/bin/activate && exec env LD_LIBRARY_PATH=/model_dir/postprocessing/python_env/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /model_dir/postprocessing/1/model.py triton_python_backend_shm_region_6ca36aca-7a8f-483d-b203-5438729ff9ca 1048576 1048576 1 /opt/tritonserver/backends/python 336 postprocessing_0_0 DEFAULT
The first two pre/post stubs exit, and what ultimately remains are the last two processes, preprocessing_0_0 and postprocessing_0_0.
In any case, given these results, it seems safe to say that triton_python_backend_stub effectively embeds the functionality of the python executable we already know.
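If you want to probe this from inside a model, a one-off sketch like the following, dropped into a model.py, would show which interpreter and environment the stub actually exposes (what exactly the embedded interpreter reports is something to verify, hence the log lines):

import sys

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        logger = pb_utils.Logger
        # Log which binary, prefix, and module search path the
        # embedded interpreter reports.
        logger.log_info(f"executable: {sys.executable}")
        logger.log_info(f"prefix: {sys.prefix}")
        logger.log_info(f"sys.path: {sys.path}")
    # ...[omitted: execute() as before]...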
Incidentally, you cannot create directories for arbitrary purposes under the directory designated as the model repository. For example, if you create a directory such as './model_dir/temp', put temporary files in it, and then start the server, Triton fails during initialization with an error like this:
E0530 01:55:16.055891 1 model_repository_manager.cc:1460] "Poll failed for model directory 'temp': Invalid model name: Could not determine backend for model 'temp' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'."
In other words, every subdirectory of the model repository must contain a proper model definition, such as a config.pbtxt file.
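For reference, a repository layout consistent with the logs above would look roughly like this (the 'ignore version directory' warnings earlier were about the python_env directories, which sit next to the numeric version directories):

/model_dir
├── core
│   ├── config.pbtxt
│   └── 1
│       └── model.pt
├── ensemble
│   ├── config.pbtxt
│   └── 1
├── preprocessing
│   ├── config.pbtxt
│   ├── python_env/
│   └── 1
│       └── model.py
└── postprocessing
    ├── config.pbtxt
    ├── python_env/
    └── 1
        └── model.py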
[I'd like to share thoughts on this post with you. If anything is wrong or lacking, or if you have any questions, please feel free to leave a comment.]