(시리즈 글이 5개 있습니다.)

개발 환경 구성: 748. Windows + Foundry Local - 로컬에서 AI 모델 활용
; https://www.sysnet.pe.kr/2/0/13943

닷넷: 2337. C# - Hugging Face에 공개된 LLM 모델을 Foundry Local에서 사용하는 방법
; https://www.sysnet.pe.kr/2/0/13954

닷넷: 2338. C# / Foundry Local - Phi-4-multimodal 모델을 사용하는 방법
; https://www.sysnet.pe.kr/2/0/13957

닷넷: 2339. C# - Phi-4-multimodal 모델의 GPU 가속 방법 (ORT 사용)
; https://www.sysnet.pe.kr/2/0/13958

닷넷: 2348. C# - 카카오 카나나 모델 + Microsoft.ML.OnnxRuntimeGenAI 예제
; https://www.sysnet.pe.kr/2/0/13976

C# - Phi-4-multimodal 모델의 GPU 가속 방법 (ORT 사용)

지난 글에서 실습한 "Phi-4-multimodal" 모델을 실행할 때 좀 이상한 게 있었습니다.

C# / Foundry Local - Phi-4-multimodal 모델을 사용하는 방법
; https://www.sysnet.pe.kr/2/0/13957

아무리 그래도 RTX 4060 Ti에서조차 (109 초 정도가 걸릴 정도로) 답변이 너무 느렸다는 점인데요, 게다가 그런 상황에서 Task Manager의 성능 탭에 보이는 GPU 사용량이 거의 변화가 없고 오히려 CPU 사용량이 증가하는 것을 보여줍니다.

위의 그림에서처럼, Inference.Service.Agent.exe가 CPU 사용량을 (16 Core 시스템에서) 60% 이상 차지하고, GPU/Dedicated GPU memory/Shared GPU memory 모두 수치가 0으로 나옵니다. 실제로 응답 시간이 너무 느려서 OpenAIClientOptions를 통해 NetworkTimeout을 (기본값 1분 40초에서) 늘려야만 했습니다.

OpenAIClientOptions options = new OpenAIClientOptions();
options.NetworkTimeout = TimeSpan.FromMinutes(30);

만약 부하가 심한, 이미지 추론 등의 작업을 하는 경우에는 기본 timeout 값을 넘기는 바람에 이런 오류까지 발생합니다.

Unhandled exception. System.AggregateException: Retry failed after 4 tries. (The operation was cancelled because it exceeded the configured timeout of 0:01:40. ) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. ) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. ) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. )
 ---> System.Threading.Tasks.TaskCanceledException: The operation was cancelled because it exceeded the configured timeout of 0:01:40.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: An established connection was aborted by the software in your host machine..
 ---> System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine.
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageHandlerStage.Send(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageHandlerStage.Send(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.Metrics.MetricsHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageHandlerStage.Send(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.SocketsHttpHandler.Send(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.Send(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.Send(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationToken cancellationToken)
   at System.ClientModel.Primitives.HttpClientPipelineTransport.ProcessSyncOrAsync(PipelineMessage message, Boolean async)
   at System.ClientModel.Internal.TaskExtensions.EnsureCompleted(ValueTask task)
   at System.ClientModel.Primitives.HttpClientPipelineTransport.ProcessCore(PipelineMessage message)
   at System.ClientModel.Primitives.PipelineTransport.ProcessSyncOrAsync(PipelineMessage message, Boolean async)
   ...[생략]...

하지만 우리가 다운로드한 모델은 분명히 "gpu" 버전인데... 왜일까요? Foundry Local의 문서를 보면,

Foundry Local architecture - Hardware abstraction layer
; https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/concepts/foundry-local-architecture#hardware-abstraction-layer

The hardware abstraction layer ensures that Foundry Local can run on various devices by abstracting the underlying hardware. To optimize performance based on the available hardware, Foundry Local supports:

multiple execution providers, such as NVIDIA CUDA, AMD, Qualcomm, Intel.

multiple device types, such as CPU, GPU, NPU.

CUDA를 포함한 다양한 Execution Provider를 지원한다고 합니다. 실제로 지난 글에 테스트한 "phi-3-mini-4k"를 사용했을 때는 Inference.Service.Agent.exe 프로세스가 GPU 자원을 소비하는 것을 확인할 수 있습니다. (아마도 foundry model list로 제공되는 것들은 아무런 조치를 하지 않아도 GPU 가속이 되는 것 같습니다.)

(혹시, Foundry Local 환경에서 "Phi-4-multimodal-instruct" 모델을 GPU 가속으로 사용하는 방법을 아시는 분은 덧글 부탁드립니다. ^^)

재미있는 건, "Phi-4-multimodal-instruct" repo에는 CUDA 가속을 사용하는 phi4-mm.py 스크립트 사용법이 나온다는 점입니다.

// WSL + Ubuntu 24.04 환경에서 테스트

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda

# Please adjust the model directory (-m) accordingly 
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi4-mm.py -o phi4-mm.py
python phi4-mm.py -m gpu/gpu-int4-rtn-block-32 -e cuda

의외로 onnxruntime_genai 패키지를 사용하다 보니 생각보다 간단한 구조를 갖고 있는데요,

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License

import argparse
import os
import glob
import time
from pathlib import Path

import onnxruntime_genai as og

# og.set_log_options(enabled=True, model_input_values=True, model_output_values=True)

def _find_dir_contains_sub_dir(current_dir: Path, target_dir_name):
    curr_path = Path(current_dir).absolute()
    target_dir = glob.glob(target_dir_name, root_dir=curr_path)
    if target_dir:
        return Path(curr_path / target_dir[0]).absolute()
    else:
        if curr_path.parent == curr_path:
            # Root dir
            return None
        return _find_dir_contains_sub_dir(curr_path / '..', target_dir_name)


def _complete(text, state):
    return (glob.glob(text + "*") + [None])[state]


def get_paths(modality, user_provided_paths, default_paths, interactive):
    paths = None

    if interactive:
        try:
            import readline
            readline.set_completer_delims(" \t\n;")
            readline.parse_and_bind("tab: complete")
            readline.set_completer(_complete)
        except ImportError:
            # Not available on some platforms. Ignore it.
            pass
        paths = [
            path.strip()
            for path in input(
                f"{modality.capitalize()} Path (comma separated; leave empty if no {modality}): "
            ).split(",")
        ]
    else:
        paths = user_provided_paths if user_provided_paths else default_paths

    paths = [path for path in paths if path]
    return paths


def run(args: argparse.Namespace):
    print("Loading model...")
    config = og.Config(args.model_path)
    if args.execution_provider != "follow_config":
        config.clear_providers()
        if args.execution_provider != "cpu":
            print(f"Setting model to {args.execution_provider}...")
            config.append_provider(args.execution_provider)
    model = og.Model(config)
    print("Model loaded")

    processor = model.create_multimodal_processor()
    tokenizer_stream = processor.create_stream()

    interactive = not args.non_interactive

    while True:
        image_paths = get_paths(
            modality="image",
            user_provided_paths=args.image_paths,
            default_paths=[str(_find_dir_contains_sub_dir(Path(__file__).parent, "test") / "test_models" / "images" / "australia.jpg")],
            interactive=interactive
        )

        audio_paths = get_paths(
            modality="audio",
            user_provided_paths=args.audio_paths,
            default_paths=[str(_find_dir_contains_sub_dir(Path(__file__).parent, "test") / "test_models" / "audios" / "1272-141231-0002.mp3")],
            interactive=interactive
        )

        images = None
        audios = None
        prompt = "<|user|>\n"

        # Get images
        if len(image_paths) == 0:
            print("No image provided")
        else:
            for i, image_path in enumerate(image_paths):
                if not os.path.exists(image_path):
                    raise FileNotFoundError(f"Image file not found: {image_path}")
                print(f"Using image: {image_path}")
                prompt += f"<|image_{i+1}|>\n"
            images = og.Images.open(*image_paths)

        # Get audios
        if len(audio_paths) == 0:
            print("No audio provided")
        else:
            for i, audio_path in enumerate(audio_paths):
                if not os.path.exists(audio_path):
                    raise FileNotFoundError(f"Audio file not found: {audio_path}")
                print(f"Using audio: {audio_path}")
                prompt += f"<|audio_{i+1}|>\n"
            audios = og.Audios.open(*audio_paths)


        if interactive:
            text = input("Prompt: ")
        else:
            if args.prompt:
                text = args.prompt
            else:
                text = "Does the audio summarize what is shown in the image? If not, what is different?"
        prompt += f"{text}<|end|>\n<|assistant|>\n"
        
        print("Processing inputs...")
        inputs = processor(prompt, images=images, audios=audios)
        print("Processor complete.")

        print("Generating response...")
        params = og.GeneratorParams(model)
        params.set_inputs(inputs)
        params.set_search_options(max_length=7680)

        generator = og.Generator(model, params)
        start_time = time.time()

        while not generator.is_done():
            generator.generate_next_token()

            new_token = generator.get_next_tokens()[0]
            print(tokenizer_stream.decode(new_token), end="", flush=True)

        print()
        total_run_time = time.time() - start_time
        print(f"Total Time : {total_run_time:.2f}")

        for _ in range(3):
            print()

        # Delete the generator to free the captured graph before creating another one
        del generator

        if not interactive:
            break


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m", "--model_path", type=str, required=True, help="Path to the folder containing the model"
    )
    parser.add_argument(
        "-e", "--execution_provider", type=str, required=False, default='follow_config', choices=["cpu", "cuda", "dml", "follow_config"], help="Execution provider to run the ONNX Runtime session with. Defaults to follow_config that uses the execution provider listed in the genai_config.json instead."
    )
    parser.add_argument(
        "--image_paths", nargs='*', type=str, required=False, help="Path to the images, mainly for CI usage"
    )
    parser.add_argument(
        "--audio_paths", nargs='*', type=str, required=False, help="Path to the audios, mainly for CI usage"
    )
    parser.add_argument(
        '-pr', '--prompt', required=False, help='Input prompts to generate tokens from, mainly for CI usage'
    )
    parser.add_argument(
        '--non-interactive', action=argparse.BooleanOptionalAction, required=False, help='Non-interactive mode, mainly for CI usage'
    )
    args = parser.parse_args()
    run(args)

onnxruntime_genai.Config 인스턴스를 통해 cuda를 사용하라고 설정한 다음 onnxruntime_genai.Model을 생성하고 있습니다. 오호~~~ 명시적으로 Execution Provider를 선택할 수 있는 옵션이 있군요. ^^ 게다가 다음과 같은 함수 호출로 현재 제공되는 Execution Provider도 확인이 가능합니다.

import onnxruntime as ort

print(ort.get_device())
print(ort.get_available_providers())

'''
실행 결과:
GPU
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
'''

그렇다면, foundry local에서도 저런 식으로 (자동이 아닌) 수작업으로 직접 Execution Provider를 설정하는 기능이 있다면 좋겠는데 아직은 (Preview 버전이니만큼) 그런 배려는 없는 듯합니다.

어쨌든, ORT(ONNX Runtime) GenAI는 파이썬과 함께 C# API도 유사한 수준으로 제공하는데요,

Generate API (Preview) - Python API
; https://onnxruntime.ai/docs/genai/api/python.html

ONNX Runtime generate() C# API
; https://onnxruntime.ai/docs/genai/api/csharp.html

// 그러고 보면, OpenAI API의 OpenAI.Chat.ChatClient도 Execution Provider를 선택하는 기능은 없는 것 같습니다.
// 엄밀히 클라이언트/서버 구조의 OpenAI API 방식에서는 엔진을 실제로 구동하는 서버 측에 Execution Provider를 제공하는 기능이 있는 것이 맞을 것입니다.
// 그런 의미에서 Foundry Local 자체에 Execution Provider 옵션이 제공돼야 하는 것이고.

당연히 (파이썬 예제에서 그랬던 것처럼) Execution Provider를 지정하는 것도 가능하지만, 이와 함께 참조해야 할 패키지가 "Microsoft.ML.OnnxRuntimeGenAI"가 아닌 "Microsoft.ML.OnnxRuntimeGenAI.CUDA"여야 한다는 점이 다릅니다. ^^

using Microsoft.ML.OnnxRuntimeGenAI;

internal class Program
{
    // 1. 사전 환경 구성: Cuda Toolkit과 cuDNN 라이브러리를 설치

    // 2. 패키지를 CUDA 버전으로 교체
    // Install-Package Microsoft.ML.OnnxRuntimeGenAI.CUDA

    private static void Main(string[] args)
    {
        string modelPath = @"C:\temp\models\Phi-4-multimodal-instruct-onnx";

        Console.Write("Loading model from " + modelPath + "...");

        Config onnxConfig = new Config(modelPath);
        onnxConfig.AppendProvider("cuda");

        using Model model = new(onnxConfig);
        Console.Write("Done\n");
        using Tokenizer tokenizer = new(model);
        using TokenizerStream tokenizerStream = tokenizer.CreateStream();

        while (true)
        {
            Console.Write("User:");

            string prompt = "<|im_start|>user\n" +
                            Console.ReadLine() +
                            "<|im_end|>\n<|im_start|>assistant\n";
            var sequences = tokenizer.Encode(prompt);

            using GeneratorParams gParams = new GeneratorParams(model);
            gParams.SetSearchOption("max_length", 200);
            using Generator generator = new(model, gParams);
            generator.AppendTokenSequences(sequences);

            Console.Out.Write("\nAI:");
            while (!generator.IsDone())
            {
                generator.GenerateNextToken();
                var token = generator.GetSequence(0)[^1];
                Console.Out.Write(tokenizerStream.Decode(token));
                Console.Out.Flush();
            }
            Console.WriteLine();
        }
    }
}

실행해 보면 GPU를 사용하므로, (모델 로딩 시간 제외하고) 약 2.6초 정도에 결과가 나옵니다. (109초에 비하면 정말 빠릅니다. ^^)

(첨부 파일은 이 글의 예제 코드를 포함합니다.)

참고로, 위의 예제를 실행할 때 다음과 같은 오류가 발생한다면?

Loading model from C:\temp\models\Phi-4-multimodal-instruct-onnx...Unhandled exception. Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: CUDA execution provider is not enabled in this build.
   at Microsoft.ML.OnnxRuntimeGenAI.Model..ctor(Config config)
   at Program.Main(String[] args) in C:\temp\ConsoleApp1\Program.cs:line 15

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'CUDA execution provider is not enabled in this build.' #1191
; https://github.com/microsoft/onnxruntime-genai/issues/1191

"Microsoft.ML.OnnxRuntimeGenAI.CUDA" 패키지가 아닌 "Microsoft.ML.OnnxRuntimeGenAI" 패키지를 참조한 상태에서 "cuda" Provider를 활성화하려고 시도했기 때문입니다.

또는, 이런 오류가 발생한다면?

Unhandled exception. Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1778 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\temp\ConsoleApp1\bin\Debug\net8.0\runtimes\win-x64\native\onnxruntime_providers_cuda.dll" which depends on "cudnn64_9.dll" which is missing. (Error 126: "The specified module could not be found.")

   at Microsoft.ML.OnnxRuntimeGenAI.Model..ctor(Config config)
   at Program.Main(String[] args) in C:\temp\ConsoleApp1\Program.cs:line 15

NVIDIA cuDNN 라이브러리를 설치했는지 확인해 보시고,

NVIDIA cuDNN
; https://developer.nvidia.com/cudnn

설치했다면 로컬에 대충 다음과 같은 경로로 cudnn DLL 파일이 위치할 텐데요,

C:\> dir /a/s cudnn64_9.dll
 ...[생략]...

 Directory of C:\Program Files\NVIDIA\CUDNN\v9.10\bin\11.8

2025-05-14  오전 09:25           266,288 cudnn64_9.dll
               1 File(s)        266,288 bytes

 Directory of C:\Program Files\NVIDIA\CUDNN\v9.10\bin\12.9

2025-05-14  오전 09:43           265,760 cudnn64_9.dll
               1 File(s)        265,760 bytes

...[생략]...

Model 초기화 전에 적절한 버전의 것으로 PATH 환경 변수에 연결해 주시면 됩니다.

private static void Main(string[] args)
{
    // https://developer.nvidia.com/cudnn
    string? path = Environment.GetEnvironmentVariable("PATH");
    path += @";C:\Program Files\NVIDIA\CUDNN\v9.10\bin\12.9";
    Environment.SetEnvironmentVariable("PATH", path);

    // ...[생략]...
}

Windows의 WSL 환경에서 phi4-mm.py 스크립트 실행 시 이런 오류가 발생한다면?

$ python phi4-mm.py -m /mnt/c/foundry_cache/models/Phi-4-multimodal-instruct-onnx -e cuda
Loading model...
Setting model to cuda...
Traceback (most recent call last):
  File "/mnt/c/temp/phi4-mm.py", line 174, in <module>
    run(args)
  File "/mnt/c/temp/phi4-mm.py", line 63, in run
    model = og.Model(config)
RuntimeError: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1778 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.12: cannot open shared object file: No such file or directory

WSL 환경에서 CUDA를 위한 환경 구성을 하지 않았기 때문입니다. 따라서 아래의 글에 따라,

Windows + WSL2 환경에서 (tensorflow 등의) NVIDIA GPU 인식
; https://www.sysnet.pe.kr/2/0/13937#install_cuda

필요한 구성 요소를 설치한 후,

$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
$ sudo cp /var/cuda-repo-wsl-ubuntu-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-9

$ ls -l /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so
-rwxr-xr-x 1 testusr testusr 407530104 Jun 18 08:34 /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so

다시 실행해 보면, 이번엔 오류 메시지가 바뀌어 여전히 문제가 발생하긴 할 것입니다.

$ python phi4-mm.py -m /mnt/c/foundry_cache/models/Phi-4-multimodal-instruct-onnx -e cuda
Loading model...
Setting model to cuda...
Traceback (most recent call last):
  File "/mnt/c/temp/phi4-mm.py", line 174, in <module>
    run(args)
  File "/mnt/c/temp/phi4-mm.py", line 63, in run
    model = og.Model(config)
RuntimeError: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1778 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory

그런데, 이상하군요, 분명히 libcudnn.so.9 파일이 있는데도,

$ ls -l /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
-rw-r--r-- 1 testusr testusr 125136 Jun 18 08:34 /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9

$ ldd /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so
        linux-vdso.so.1 (0x00007ffe7817e000)
        libcublasLt.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12 (0x00007f14d7000000)
        libcublas.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12 (0x00007f14d0800000)
        libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f14c6000000)
        libcufft.so.11 => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.11 (0x00007f14b4200000)
        libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x00007f14b3e00000)
        libcudnn.so.9 => not found
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1509133000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f150912e000)
        libnvrtc.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.12 (0x00007f14ad536000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1509129000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f14ad2b8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f14d6f17000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15090f9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f14ad0a6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f15217d3000)

연결이 안 되고 있는 것입니다. 검색해 보면 이와 관련한 이슈가 있는데요,

sam 2 install failed "libcudnn.so.9: cannot open shared object file: No such file or directory" but libcudnn.so.9 is installed #112
; https://github.com/facebookresearch/sam2/issues/112

덧글의 의견에서 LD_LIBRARY_PATH 또는 LD_PRELOAD 환경 변수를 연결해 보라고 하는데, 제 경우에는 LD_PRELOAD를 적용해 주니 해결됐습니다.

// 해결이 안 됨
// export LD_LIBRARY_PATH=./miniconda3/envs/huggingface-build/lib/python3.10/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH}

// OK
$ export LD_PRELOAD=/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9
$ ldd /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/onnxruntime/capi/libonnxruntime_providers_cuda.so
        linux-vdso.so.1 (0x00007ffee7b25000)
        /home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.9 (0x00007fc40b600000)
        ...[생략]...

만약 이렇게 오류가 발생한다면?

$ python phi4-mm.py -m /mnt/c/foundry_cache/models/Phi-4-multimodal-instruct-onnx -e cuda
Loading model...
Setting model to cuda...
Model loaded
Traceback (most recent call last):
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/pathlib.py", line 625, in __str__
    return self._str
AttributeError: 'PosixPath' object has no attribute '_str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/c/temp/phi4-mm.py", line 174, in <module>
    run(args)
  File "/mnt/c/temp/phi4-mm.py", line 75, in run
    default_paths=[str(_find_dir_contains_sub_dir(Path(__file__).parent, "test") / "test_models" / "images" / "australia.jpg")],
  File "/mnt/c/temp/phi4-mm.py", line 23, in _find_dir_contains_sub_dir
    return _find_dir_contains_sub_dir(curr_path / '..', target_dir_name)
  File "/mnt/c/temp/phi4-mm.py", line 23, in _find_dir_contains_sub_dir
    return _find_dir_contains_sub_dir(curr_path / '..', target_dir_name)
  File "/mnt/c/temp/phi4-mm.py", line 23, in _find_dir_contains_sub_dir
    return _find_dir_contains_sub_dir(curr_path / '..', target_dir_name)
  [Previous line repeated 986 more times]
  File "/mnt/c/temp/phi4-mm.py", line 16, in _find_dir_contains_sub_dir
    target_dir = glob.glob(target_dir_name, root_dir=curr_path)
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/glob.py", line 24, in glob
    return list(iglob(pathname, root_dir=root_dir, dir_fd=dir_fd, recursive=recursive))
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/glob.py", line 40, in iglob
    root_dir = os.fspath(root_dir)
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/pathlib.py", line 632, in __fspath__
    return str(self)
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/pathlib.py", line 627, in __str__
    self._str = self._format_parsed_parts(self._drv, self._root,
  File "/home/testusr/miniconda3/envs/huggingface-build/lib/python3.10/pathlib.py", line 611, in _format_parsed_parts
    return drv + root + cls._flavour.join(parts[1:])
RecursionError: maximum recursion depth exceeded while calling a Python object

"python phi4-mm.py" 명령어를 실행하는 디렉터리를 기준으로 상위에 "test"라는 빈 디렉터리를 하나 만들어 두면 됩니다. 가령 위의 명령어를 "/mnt/c/temp"에서 실행했다면 "/mnt/c/test" 디렉터리를 만들어야 합니다. 굳이 저렇게 만들 이유가 없었을 것 같은데, phi4-mm.py은 test 디렉터리를 발견할 때까지 계속해서 ".." 경로를 추가하면서 무한 반복합니다.

[이 글에 대해서 여러분들과 의견을 공유하고 싶습니다. 틀리거나 미흡한 부분 또는 의문 사항이 있으시면 언제든 댓글 남겨주십시오.]

[다음 글] 오류 유형: 966. Ubuntu - ping: connect: Network is unreachable
[이전 글] 닷넷: 2338. C# / Foundry Local - Phi-4-multimodal 모델을 사용하는 방법

[최초 등록일: 6/21/2025]
[최종 수정일: 7/25/2025]

이 저작물은 크리에이티브 커먼즈 코리아 저작자표시-비영리-변경금지 2.0 대한민국 라이센스에 따라 이용하실 수 있습니다.

by SeongTae Jeong, mailto:techsharer at outlook.com

No	Writer	Date	Cnt.	Title	File(s)
13983	정성태	8/4/2025	489	오류 유형: 972. .NET 8 이상만 지원하는 Microsoft.Data.SqlClient 6.1.0 버전
13982	정성태	8/2/2025	529	개발 환경 구성: 753. CentOS 7 컨테이너 내에서 openssh 서버 호스팅
13981	정성태	8/1/2025	537	오류 유형: 971. CentOS 7에서 yum 사용 시 "Could not resolve host: mirrorlist.centos.org; Unknown error"
13980	정성태	7/31/2025	636	Linux: 119. eBPF - BPF_PROG_TYPE_CGROUP_SOCK 유형에서 정상 동작하지 않는 BPF_CORE_READ (2)
13979	정성태	7/30/2025	695	Linux: 118. eBPF - BPF_PROG_TYPE_CGROUP_SOCK 유형에서 정상 동작하지 않는 BPF_CORE_READ
13978	정성태	7/29/2025	794	오류 유형: 970. 파일 복사 시 "Data error (cyclic redundancy check). (0x80070017)" 에러
13977	정성태	7/28/2025	1019	닷넷: 2349. C# 14 - (5) 문자열 리터럴을 utf-8 인코딩으로 저장	1
13976	정성태	7/25/2025	1184	닷넷: 2348. C# - 카카오 카나나 모델 + Microsoft.ML.OnnxRuntimeGenAI 예제	1
13975	정성태	7/23/2025	1274	닷넷: 2347. C# 14 - (4) 형식 인자가 없는 제네릭 타입의 nameof 지원	1
13974	정성태	7/22/2025	1488	닷넷: 2346. C# 14 - (3) Span 타입과 배열 간의 암시적 형변환	1
13973	정성태	7/21/2025	1386	닷넷: 2345. C# - 배열 및 Span의 공변성	1
13972	정성태	7/21/2025	1439	닷넷: 2344. C#의 Identity conversion 의미	1
13971	정성태	7/17/2025	1472	닷넷: 2343. C# 14 - (2) 속성 구문에서 문맥 키워드로 추가되는 field 예약어	1
13970	정성태	7/17/2025	1415	닷넷: 2342. C# 14 - (1) (예약)
13969	정성태	7/17/2025	1466	닷넷: 2341. snap으로 설치한 .NET 리눅스 실행 환경
13968	정성태	7/16/2025	1352	오류 유형: 969. lddtree - TypeError: 'type' object is not subscriptable
13967	정성태	7/16/2025	1887	오류 유형: 968. snap으로 설치한 "dotnet run" 실행 시 "undefined symbol: _dl_audit_symbind_alt, version GLIBC_PRIVATE" 오류
13966	정성태	7/15/2025	2422	디버깅 기술: 223. WinDbg - .kframes 명령어
13965	정성태	7/11/2025	1711	오류 유형: 967. 디버깅 모드로 실행 시 "Could not find file 'C:\Program Files\IIS Express\Oracle.DataAccess.Common.Configuration.Section.xsd'" 예외
13964	정성태	7/10/2025	2422	닷넷: 2340. C# - Win32 Multimedia Timer 주기	1
13963	정성태	7/8/2025	1961	VS.NET IDE: 202. Visual Studio 2022 + Copilot 기본 사용법
13962	정성태	7/7/2025	1974	스크립트: 79. 파이썬 - onnxruntime_genai에서 지원하지 않는 모델 사용
13961	정성태	7/5/2025	1702	디버깅 기술: 222. WinDbg 분석 사례 - IISreset 시점에 w3wp.exe의 crash 발생
13960	정성태	7/3/2025	2868	개발 환경 구성: 752. ProcDump - C/C++ 예외 코드 필터를 지정한 덤프 생성 [2]
13959	정성태	6/25/2025	1847	오류 유형: 966. Ubuntu - ping: connect: Network is unreachable

AD BLOCK 해제 요청

C# - Phi-4-multimodal 모델의 GPU 가속 방법 (ORT 사용)