(시리즈 글이 4개 있습니다.)

VC++: 125. CUDA로 작성한 RGB2RGBA 성능
; https://www.sysnet.pe.kr/2/0/11471

개발 환경 구성: 356. GTX 1070, GTX 960, GT 640M의 cudaGetDeviceProperties 출력 결과
; https://www.sysnet.pe.kr/2/0/11472

개발 환경 구성: 357. CUDA의 인덱싱 관련 용어 - blockIdx, threadIdx, blockDim, gridDim
; https://www.sysnet.pe.kr/2/0/11481

VC++: 126. CUDA Core 수를 알아내는 방법
; https://www.sysnet.pe.kr/2/0/11482

GTX 1070, GTX 960, GT 640M의 cudaGetDeviceProperties 출력 결과

그냥 기록 차원에서 남깁니다. ^^

우선, CUDA 사양을 cudaGetDeviceProperties API를 통해서 구하는 것이 가능하지만 CUDA SDK 설치 시 /extras/demo_suite 폴더의 deviceQuery.exe 프로그램을 이용하는 것이 편리합니다.

다음은, GTX 1070 시스템에서의 출력 결과이고,

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1070"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8192 MBytes (8589934592 bytes)
  (15) Multiprocessors, (128) CUDA Cores/MP:     1920 CUDA Cores
  GPU Max Clock rate:                            1709 MHz (1.71 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1, Device0 = GeForce GTX 1070
Result = PASS

다음은 GTX 960,

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  ( 8) Multiprocessors, (128) CUDA Cores/MP:     1024 CUDA Cores
  GPU Max Clock rate:                            1291 MHz (1.29 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1, Device0 = GeForce GTX 960
Result = PASS

다음은 GT 640M입니다.

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 640M"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            709 MHz (0.71 GHz)
  Memory Clock rate:                             2000 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1, Device0 = GeForce GT 640M
Result = PASS

참고로, Hyper-V의 RemoteFX 환경에서는 (당연하지만) 다음과 같이 CUDA 호환 장비를 찾을 수 없다는 메시지가 나옵니다.

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

[이 글에 대해서 여러분들과 의견을 공유하고 싶습니다. 틀리거나 미흡한 부분 또는 의문 사항이 있으시면 언제든 댓글 남겨주십시오.]

[다음 글] .NET Framework: 734. C# - Thread.Suspend 호출 시 응용 프로그램 hang 현상
[이전 글] VC++: 125. CUDA로 작성한 RGB2RGBA 성능

[최초 등록일: 3/22/2018]
[최종 수정일: 3/22/2018]

이 저작물은 크리에이티브 커먼즈 코리아 저작자표시-비영리-변경금지 2.0 대한민국 라이센스에 따라 이용하실 수 있습니다.

by SeongTae Jeong, mailto:techsharer at outlook.com

No	Writer	Date	Cnt.	Title	File(s)
11727	정성태	10/8/2018	27858	오류 유형: 491. Visual Studio의 리눅스 SSH 원격 연결 - "Connectivity Failure. Please make sure host name and port number are correct."
11726	정성태	10/7/2018	31893	사물인터넷: 49. 라즈베리 파이를 이용해 원격 컴퓨터의 전원 스위치 제어	1
11724	정성태	10/5/2018	31772	개발 환경 구성: 407. 유니코드와 한글 - "Hangul Compatibility Jamo"	1
11723	정성태	10/4/2018	21975	개발 환경 구성: 406. "Docker for Windows" 컨테이너 내의 .NET Core 응용 프로그램에서 직렬 포트(Serial Port, COM Port) 사용 방법
11722	정성태	10/4/2018	28090	.NET Framework: 798. C# - Hyper-V 가상 머신의 직렬 포트와 연결된 Named Pipe 간의 통신	1
11721	정성태	10/4/2018	28212	.NET Framework: 797. Linux 환경의 .NET Core 응용 프로그램에서 직렬 포트(Serial Port, COM Port) 사용 방법	1
11720	정성태	10/4/2018	29443	개발 환경 구성: 405. Hyper-V 가상 머신에서 직렬 포트(Serial Port, COM Port) 사용
11719	정성태	10/4/2018	30717	.NET Framework: 796. C# - 인증서를 윈도우에 설치하는 방법
11718	정성태	10/4/2018	25208	개발 환경 구성: 404. (opkg가 설치된) Synology NAS(DS216+II)에 cmake 설치
11717	정성태	10/3/2018	26893	사물인터넷: 48. 넷두이노의 C# 네트워크 프로그램 [1]
11716	정성태	10/3/2018	28289	사물인터넷: 47. Raspberry PI Zero (W)에 FTDI 장치 연결 후 C/C++로 DTR 제어	1
11715	정성태	10/3/2018	25772	사물인터넷: 46. Raspberry PI Zero (W)에 docker 설치
11714	정성태	10/2/2018	26130	사물인터넷: 45. Raspberry PI에 ping을 hostname으로 하는 방법
11713	정성태	10/2/2018	27020	개발 환경 구성: 403. Synology NAS(DS216+II)에 docker 설치 후 .NET Core 2.1 응용 프로그램 실행하는 방법
11712	정성태	10/2/2018	33463	.NET Framework: 795. C# - 폰트 목록을 한글이 아닌 영문으로 구하는 방법 [3]
11711	정성태	10/2/2018	27674	오류 유형: 490. 윈도우 라이선스 키 입력 오류 0xc004f050, 0xc004e028
11710	정성태	10/2/2018	27836	.NET Framework: 794. C# - 같은 모양, 다른 값의 한글 자음을 비교하는 호환 분해 [5]
11709	정성태	9/30/2018	26429	개발 환경 구성: 402. .NET Core 콘솔 응용 프로그램을 docker로 실행/디버깅하는 방법 [1]
11708	정성태	9/30/2018	29490	개발 환경 구성: 401. .NET Core 콘솔 응용 프로그램을 배포(publish) 시 docker image 자동 생성 [2]	1
11707	정성태	9/30/2018	31608	오류 유형: 489. ASP.NET Core를 docker에서 실행 시 "Failed with a critical error." 오류 발생 [1]
11706	정성태	9/29/2018	25168	개발 환경 구성: 400. Synology NAS(DS216+II)에서 실행한 gcc의 Segmentation fault [2]
11705	정성태	9/29/2018	25454	개발 환경 구성: 399. Synology NAS(DS216+II)에 gcc 컴파일러 설치
11704	정성태	9/29/2018	32069	기타: 73. Synology NAS 신호음(beep) 끄기 [1]	1
11703	정성태	9/27/2018	24645	개발 환경 구성: 398. Blazor 환경 구성 후 빌드 속도가 너무 느리다면? [2]
11702	정성태	9/26/2018	22034	사물인터넷: 44. 넷두이노(Netduino)의 네트워크 설정 방법
11701	정성태	9/26/2018	28225	개발 환경 구성: 397. 공유기를 일반 허브로 활용하는 방법	1

AD BLOCK 해제 요청

GTX 1070, GTX 960, GT 640M의 cudaGetDeviceProperties 출력 결과