hardware-vgpu

Posted on 2021-08-09 Edited on 2023-08-17 In hardware , vgpu

Introduction

Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU, using the same graphics drivers that are deployed on non-virtualized operating systems. By doing this, vGPU provides VMs with unparalleled graphics performance, compute performance, and application compatibility, together with the cost-effectiveness and scalability brought about by sharing a GPU among multiple workloads.

NVIDIA vGPU

NVIDIA vGPU software supports GPU instances on GPUs that support the Multi-Instance GPU (MIG) feature, in both NVIDIA vGPU and GPU pass-through deployments. MIG enables a physical GPU to be securely partitioned into multiple separate GPU instances, providing multiple users with separate GPU resources to accelerate their applications. With MIG, a GPU can be split into several GPU instances of different sizes, with each instance mapped to one vGPU. MIG requires a GPU based on the NVIDIA Ampere architecture or later, and not every such card supports it.

vGPU type

The number of physical GPUs that a board has depends on the board. Each physical GPU can support several different types of virtual GPU (vGPU). vGPU types have a fixed amount of frame buffer, number of supported display heads, and maximum resolutions. They are grouped into different series according to the different classes of workload for which they are optimized. Each series is identified by the last letter of the vGPU type name.

Series	Optimal Workload
Q-series	Virtual workstations for creative and technical professionals who require the performance and features of Quadro technology, 3D rendering
C-series	Compute-intensive server workloads, such as artificial intelligence (AI), deep learning, or high-performance computing (HPC)
B-series	Virtual desktops for business professionals and knowledge workers
A-series	App streaming or session-based solutions for virtual applications users

A vGPU type determines:

frame buffer size
number of display heads (virtual display outputs)
maximum resolution
maximum number of vGPUs per physical GPU

Example
M60-2Q is allocated 2048 Mbytes of frame buffer on a Tesla M60 board.
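
As a rough illustration of how such a type name breaks down, the sketch below splits a name into board, frame buffer size, and series letter. It assumes the simple form `<board>-<N><series>` with N being the frame buffer size in GB (N >= 1); that naming assumption and the function name `parse_vgpu_type` are mine, not taken from the vGPU documentation, and MIG-style or sub-1 GB names are rejected rather than guessed at.

```shell
# Split a vGPU type name such as "M60-2Q" or "GRID P40-2B" into its parts.
# Assumption (not from the vGPU documentation): the name has the simple form
# <board>-<N><series letter>, where N >= 1 is the frame buffer size in GB.
parse_vgpu_type() {
    name="${1#GRID }"              # drop the optional "GRID " prefix
    board="${name%%-*}"            # text before the first "-", e.g. M60
    suffix="${name#*-}"            # text after the first "-", e.g. 2Q
    size_gb="${suffix%?}"          # suffix without its last character, e.g. 2
    series="${suffix#"$size_gb"}"  # the last character, e.g. Q
    case "$size_gb" in
        ''|0|*[!0-9]*)
            echo "unsupported vGPU type name: $1" >&2
            return 1 ;;
    esac
    echo "board=$board fb_mb=$((size_gb * 1024)) series=$series"
}

parse_vgpu_type "M60-2Q"   # prints: board=M60 fb_mb=2048 series=Q
```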

NVIDIA vGPU is a licensed product on all supported GPU boards.

Q-series vGPU types require a vWS license.
C-series vGPU types require an NVIDIA Virtual Compute Server (vCS) license but can also be used with a vWS license.
B-series vGPU types require a vPC license but can also be used with a vWS license.
A-series vGPU types require a vApps license.
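
The license rules above amount to a lookup on the series letter. A minimal sketch, where the function name `license_for_series` is hypothetical and the strings are only the license names from the list above:

```shell
# Map the series letter of a vGPU type to the license named in the list above.
license_for_series() {
    case "$1" in
        Q) echo "vWS" ;;
        C) echo "vCS (or vWS)" ;;
        B) echo "vPC (or vWS)" ;;
        A) echo "vApps" ;;
        *) echo "unknown series: $1" >&2; return 1 ;;
    esac
}

license_for_series Q   # prints: vWS
```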

Architecture

High-level architecture of NVIDIA vGPU: under the control of the NVIDIA Virtual GPU Manager running in the hypervisor, NVIDIA physical GPUs can support multiple virtual GPU devices (vGPUs) that can be assigned directly to guest VMs.

NVIDIA vGPU System Architecture

Time-Sliced NVIDIA vGPU Internal Architecture

A time-sliced vGPU is a vGPU that resides on a physical GPU that is not partitioned into multiple GPU instances. All time-sliced vGPUs resident on a GPU share access to the GPU’s engines, including the graphics (3D), video decode, and video encode engines.

This is the vGPU architecture for traditional (non-MIG) GPUs, which most GPU cards support.

In a time-sliced vGPU, processes that run on the vGPU are scheduled to run in series. Each vGPU waits while other processes run on other vGPUs. While processes are running on a vGPU, the vGPU has exclusive use of the GPU’s engine.

For time-sliced vGPUs, all vGPUs on a single physical GPU must be of the same vGPU type.

MIG-Backed NVIDIA vGPU Internal Architecture

A MIG-backed vGPU is a vGPU that resides on a GPU instance in a MIG-capable physical GPU. Each MIG-backed vGPU resident on a GPU has exclusive access to the GPU instance’s engines, including the graphics (3D) and video decode engines.

In a MIG-backed vGPU, processes that run on the vGPU run in parallel with processes running on other vGPUs on the GPU. Processes run on all vGPUs resident on a physical GPU simultaneously.

For MIG-backed vGPUs, the vGPU types on a single physical GPU can differ.
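
The type constraint for the two architectures can be written down as a small check. This is only a sketch of the rule as stated in this post: the function name `can_add_vgpu` and the mode strings are hypothetical, and capacity limits (frame buffer, number of GPU instances) are not modeled.

```shell
# Decide whether a vGPU of type $2 may be added to a physical GPU.
# $1 is "time-sliced" or "mig"; the remaining arguments are the types of the
# vGPUs already resident on that GPU. Capacity limits are not modeled.
can_add_vgpu() {
    mode="$1"; new_type="$2"; shift 2
    [ "$mode" = "mig" ] && return 0                  # MIG-backed: types may differ
    for existing in "$@"; do
        [ "$existing" = "$new_type" ] || return 1    # time-sliced: types must match
    done
    return 0
}

# A P40 that already hosts P40-2B vGPUs cannot take a P40-6Q.
can_add_vgpu time-sliced P40-6Q P40-2B P40-2B || echo "type mismatch"
```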

GPU Product

Two main GPU series

Quadro - The workstation line. It is higher priced and aimed at corporate customers, so it is better tested, has more memory, and uses the highest-quality chips. Because of the higher price, NVIDIA also offers better support and easier exchanges.
Tesla - The line focused on HPC. Some cards have no video output. It is intended for people using CUDA.

Note

Some products are only for graphics, while others are only for compute.
Tesla M60 and M6 GPUs support both compute mode and graphics mode and can be switched between them.

Debug

GPU

# CentOS
# install the vGPU driver (NVIDIA Virtual GPU Manager)
$ rpm -iv NVIDIA-vGPU-rhel-7.5-460.73.02.x86_64.rpm
$ reboot

# verify vgpu driver is loaded correctly
$ lsmod | grep vfio
nvidia_vgpu_vfio       27099  0
nvidia              12316924  1 nvidia_vgpu_vfio
vfio_mdev              12841  0
mdev                   20414  2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1       22342  0
vfio                   32331  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1

# show GPU
$ nvidia-smi
Mon Aug  9 11:24:19 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46       Driver Version: 430.46       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:03:00.0 Off |                    0 |
| N/A   27C    P8    19W / 250W |     41MiB / 23039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           On   | 00000000:04:00.0 Off |                    0 |
| N/A   27C    P8    19W / 250W |     41MiB / 23039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P40           On   | 00000000:84:00.0 Off |                    0 |
| N/A   26C    P8    19W / 250W |     50MiB / 23039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P40           On   | 00000000:85:00.0 Off |                    0 |
| N/A   31C    P8    18W / 250W |     41MiB / 23039MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# Compute M.: the compute mode of the GPU (Default here)
# Disp.A: whether a display is active on the GPU
# Processes: the processes running on each GPU
# On GPUs that support MIG, a MIG M. field is shown alongside Compute M.

# check the BDF (bus, device, function) of each GPU
$ lspci | grep NVIDIA
03:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
04:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
84:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
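
The sysfs paths used later in this post name each GPU with a PCI domain prefix (for example 0000:03:00.0), while the lspci output above omits it. A minimal sketch of the conversion, assuming that a BDF printed without a domain belongs to domain 0000; the function name `bdf_to_sysfs_name` is hypothetical. Running `lspci -D` prints the domain directly and avoids the conversion.

```shell
# Convert a short BDF printed by lspci (e.g. "03:00.0") into the sysfs
# device name (e.g. "0000:03:00.0").
# Assumption: a BDF without a domain belongs to PCI domain 0000.
bdf_to_sysfs_name() {
    case "$1" in
        ????:??:??.?) echo "$1" ;;        # already carries a domain
        ??:??.?)      echo "0000:$1" ;;   # prepend the default domain
        *) echo "not a BDF: $1" >&2; return 1 ;;
    esac
}

bdf_to_sysfs_name 03:00.0   # prints: 0000:03:00.0
```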

Create vGPU

# when vGPU is enabled for a GPU, a link named after the GPU's PCI address is created under /sys/class/mdev_bus/
$ ls /sys/class/mdev_bus/
0000:03:00.0  0000:04:00.0  0000:84:00.0  0000:85:00.0

# list all vGPU types supported by the given GPU
# nvidia-156 is the mdev type identifier; the vGPU type name is in mdev_supported_types/nvidia-156/name
$ cd /sys/class/mdev_bus/0000:03:00.0
$ ls mdev_supported_types/
nvidia-156  nvidia-241  nvidia-284  nvidia-286  nvidia-46  nvidia-48  nvidia-50  nvidia-52  nvidia-54  nvidia-56  nvidia-58  nvidia-60  nvidia-62
nvidia-215  nvidia-283  nvidia-285  nvidia-287  nvidia-47  nvidia-49  nvidia-51  nvidia-53  nvidia-55  nvidia-57  nvidia-59  nvidia-61

# In P40-2B, P40 is the GPU type and 2B is the vGPU type
$ cat mdev_supported_types/nvidia-156/name 
GRID P40-2B

# check how many vGPUs of this type can still be created on the given GPU
$ cat mdev_supported_types/nvidia-156/available_instances 
12

# create a vGPU: generate a UUID and write it to the type's create file
$ uuidgen
2794ee88-7932-4c37-9927-97ef3a5e76c4
$ echo "2794ee88-7932-4c37-9927-97ef3a5e76c4" > mdev_supported_types/nvidia-156/create

# after this, an mdev device (the vGPU device) exists
$ ls /sys/bus/mdev/devices/2794ee88-7932-4c37-9927-97ef3a5e76c4
driver  iommu_group  mdev_type  nvidia  power  remove  subsystem  uevent

# assign the vGPU to a VM
# Prerequisite: the VM to which you want to add the vGPU must be shut down.

# vm_name holds the libvirt domain name of the VM
$ virsh edit "$vm_name"

# uuid is the UUID of the vGPU (mdev device) created above
<devices>
...
  <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
    <source>
      <address uuid='2794ee88-7932-4c37-9927-97ef3a5e76c4'/>
    </source>
  </hostdev>
</devices>

# check which VM is using this vGPU
$ cat /sys/bus/mdev/devices/2794ee88-7932-4c37-9927-97ef3a5e76c4/nvidia/vm_name


# remove a vGPU (you need to know its UUID)
# Prerequisite: the VM to which the vGPU is assigned must be shut down.

$ echo "1"> /sys/bus/mdev/devices/2794ee88-7932-4c37-9927-97ef3a5e76c4/remove
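
Reading `name` and `available_instances` one type at a time, as done above for nvidia-156, gets tedious with a few dozen types. The loop below is a sketch that prints them all at once; the function name `list_vgpu_types` is hypothetical and it relies only on the sysfs files shown in this section.

```shell
# Print "<type-id> <name> <available_instances>" (tab separated) for every
# vGPU type that a physical GPU supports. The argument is the GPU's directory
# under /sys/class/mdev_bus/, for example /sys/class/mdev_bus/0000:03:00.0.
list_vgpu_types() {
    for type_dir in "$1"/mdev_supported_types/*; do
        [ -r "$type_dir/name" ] || continue    # skip if the glob matched nothing
        printf '%s\t%s\t%s\n' \
            "$(basename "$type_dir")" \
            "$(cat "$type_dir/name")" \
            "$(cat "$type_dir/available_instances")"
    done
}

# Usage on a vGPU host:
#   list_vgpu_types /sys/class/mdev_bus/0000:03:00.0
```

Types whose third column is 0 cannot be created on that GPU at the moment.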

Monitoring GPU performance

# get GPU info
$ nvidia-smi

# get vGPU info
$ nvidia-smi vgpu
Mon Aug  9 12:14:14 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.46                 Driver Version: 430.46                    |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  Tesla P40                  | 00000000:03:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
|   1  Tesla P40                  | 00000000:04:00.0             |   0%       |
|      3251639763  GRID P40-6Q    | 9750...  i-y7i5cxa4x4        |      0%    |
+---------------------------------+------------------------------+------------+
|   2  Tesla P40                  | 00000000:84:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
|   3  Tesla P40                  | 00000000:85:00.0             |   0%       |
+---------------------------------+------------------------------+------------+

# get vGPU details
$ nvidia-smi vgpu -q
GPU 00000000:03:00.0
    Active vGPUs                    : 0

GPU 00000000:04:00.0
    Active vGPUs                    : 1
    vGPU ID                         : 3251639763
        VM UUID                     : 97502756-1798-4014-9979-8595f909eeb3
        VM Name                     : i-y7i5cxa4x4
        vGPU Name                   : GRID P40-6Q
        vGPU Type                   : 50
        vGPU UUID                   : 0c4b9906-7a2e-2e4e-133d-ca368087e5ab
        Guest Driver Version        : Not Available
        License Status              : Unlicensed
        Accounting Mode             : Disabled
        ECC Mode                    : Disabled
        Accounting Buffer Size      : 4000
        Frame Rate Limit            : N/A
        FB Memory Usage
            Total                   : 6144 MiB
            Used                    : 0 MiB
            Free                    : 6144 MiB
        Utilization
            Gpu                     : 0 %
            Memory                  : 0 %
            Encoder                 : 0 %
            Decoder                 : 0 %
        Encoder Stats
            Active Sessions         : 0
            Average FPS             : 0
            Average Latency         : 0
        FBC Stats
            Active Sessions         : 0
            Average FPS             : 0
            Average Latency         : 0

GPU 00000000:84:00.0
    Active vGPUs                    : 0

GPU 00000000:85:00.0
    Active vGPUs                    : 0

# To monitor vGPU engine usage across multiple vGPUs

$ nvidia-smi vgpu -u
# GPU       vGPU    sm   mem   enc   dec
# Idx         Id     %     %     %     %
    0          -     -     -     -     -
    1 3251639763     0     0     0     0
    2          -     -     -     -     -
    3          -     -     -     -     -
    0          -     -     -     -     -

# To monitor vGPU engine usage by applications across multiple vGPUs
$ nvidia-smi vgpu -p
# GPU       vGPU    process    sm   mem   enc   dec   process         
# Idx         Id         Id     %     %     %     %   name            
    0          -          -     -     -     -     -   -               
    1          -          -     -     -     -     -   -               
    2          -          -     -     -     -     -   -               
    3          -          -     -     -     -     -   -               

# List the vGPU types that can currently be created on each GPU.
# If MIG mode is not enabled for the GPU, or if the GPU does not support MIG, the list reflects the number and type of vGPUs that are already running on the GPU:
# 1. If no vGPUs are running on the GPU, all vGPU types that the GPU supports are listed.
# 2. If one or more vGPUs are running on the GPU, but the GPU is not fully loaded, only the type of the vGPUs that are already running is listed.
# 3. If the GPU is fully loaded, no vGPU types are listed.

$ nvidia-smi vgpu -c
GPU 00000000:03:00.0
    GRID P40-6Q    

GPU 00000000:04:00.0
    GRID P40-6Q    

GPU 00000000:84:00.0
    GRID P40-6Q    

GPU 00000000:85:00.0
    GRID P40-6Q
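
For scripting, the `nvidia-smi vgpu -q` report shown earlier in this section can be reduced to one line per GPU. The sketch below only parses text in the layout printed above (a `GPU <bus-id>` line followed by an `Active vGPUs` line); other driver versions may format the report differently, and the function name `active_vgpus_per_gpu` is hypothetical.

```shell
# Read `nvidia-smi vgpu -q` output on stdin and print "<bus-id> <active vGPUs>"
# for each physical GPU.
active_vgpus_per_gpu() {
    awk '/^GPU /        { gpu = $2 }
         /Active vGPUs/ { print gpu, $NF }'
}

# Usage on a vGPU host:
#   nvidia-smi vgpu -q | active_vgpus_per_gpu
```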

sysfs for NVIDIA GPU
The sysfs directories for each physical GPU are under:

/sys/bus/pci/devices/
/sys/class/mdev_bus/

/sys/class/mdev_bus/
           |-parent-physical-device
             |-mdev_supported_types
               |-nvidia-vgputype-id
                 |-available_instances
                 |-create
                 |-description
                 |-device_api
                 |-devices
                 |-name

The mdev device file that you create to represent the vGPU does not persist across a host reboot and must be re-created after the host restarts.
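
One way to handle this is to record each vGPU and replay the `create` writes at boot. The following is a minimal sketch, not a tested production script: the function names, the record file, and its `<parent> <type-id> <uuid>` line format are all hypothetical, and the sysfs root is passed as a parameter so the logic can be tried against a scratch directory first.

```shell
# Create one vGPU (mdev device) by writing its UUID to the type's "create" file.
# $1: sysfs root (normally /sys), $2: parent GPU, e.g. 0000:03:00.0,
# $3: vGPU type id, e.g. nvidia-156, $4: UUID of the vGPU.
create_vgpu() {
    create_file="$1/class/mdev_bus/$2/mdev_supported_types/$3/create"
    [ -e "$create_file" ] || { echo "no such vGPU type: $2 $3" >&2; return 1; }
    echo "$4" > "$create_file"
}

# Re-create every vGPU listed in a record file ($2) whose lines have the
# hypothetical format "<parent> <type-id> <uuid>". Stops at the first failure.
restore_vgpus() {
    while read -r parent type_id uuid; do
        [ -n "$parent" ] || continue           # skip blank lines
        create_vgpu "$1" "$parent" "$type_id" "$uuid" || return 1
    done < "$2"
}

# Usage on a vGPU host, after the vGPU manager driver has loaded:
#   restore_vgpus /sys /path/to/vgpu-records
```

Reusing the same UUIDs keeps the `<hostdev>` entries in existing VM definitions valid after the reboot.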

REF

nvidia vgpu user guide