vhost-user

In the DPDK architecture, devices are accessed by constant polling. This avoids context-switching and interrupt-processing overhead at the cost of dedicating some CPU cores 100% to packet processing.

In practice, DPDK offers a series of poll mode drivers (PMDs) that enable direct transfer of packets between user space and the physical interfaces, bypassing the kernel network stack altogether. This approach provides a significant performance boost over kernel forwarding by eliminating interrupt handling and skipping the kernel stack.

So we want to implement the virtio backend (data plane) in DPDK, leveraging the vhost protocol; that is vhost-user.
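
For example, DPDK's testpmd can act as the vhost-user backend by creating a net_vhost virtual device on a UNIX socket that QEMU later connects to. A minimal sketch (the socket path, core list and memory size are placeholders, not from this post):

# Run testpmd as a vhost-user backend (hedged sketch, paths and cores are placeholders)
$ dpdk-testpmd -l 0-1 -n 4 --socket-mem 1024 \
  --vdev 'net_vhost0,iface=/tmp/vhost-user.sock,queues=1' \
  -- -i --forward-mode=io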

Read more »

vhost-net

The virtio backend (data plane) implemented in the host kernel, leveraging the vhost protocol.

vhost-net is a kernel driver that implements the handler side of the vhost protocol to provide an efficient data plane, i.e., packet forwarding. In this implementation, QEMU and the vhost-net kernel driver (the handler) use ioctls to exchange vhost messages, and a pair of eventfd-like file descriptors, irqfd and ioeventfd (created by the QEMU process via an ioctl to KVM, then passed to vhost-net via another ioctl), are used to exchange notifications with the guest.
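
As a rough sketch (the tap device, MAC address and disk image are placeholders, not from this post), enabling vhost-net from QEMU only takes vhost=on on a tap netdev; QEMU then opens /dev/vhost-net and offloads the virtqueue processing to the kernel:

# Load the vhost-net kernel module (creates /dev/vhost-net)
$ modprobe vhost_net

# Hedged sketch: tap0, the MAC address and disk.img are placeholders
$ qemu-system-x86_64 -enable-kvm -m 2G \
  -netdev tap,id=net0,ifname=tap0,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56 \
  disk.img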

Read more »

hands on with vDPA device simulator

The vdpa management tool (from iproute2) communicates with the in-kernel vDPA framework over the netlink vDPA API. It allows creating and destroying vDPA devices and controlling their parameters.

An example using the in-kernel simulators:

# Load vDPA net and block simulators kernel modules
$ modprobe vdpa-sim-net
$ modprobe vdpa-sim-blk

# List vdpa management device attributes
$ vdpa mgmtdev show
vdpasim_blk:
  supported_classes block
vdpasim_net:
  supported_classes net

# Add `vdpa-net1` device through `vdpasim_net` management device
$ vdpa dev add name vdpa-net1 mgmtdev vdpasim_net

# Add `vdpa-blk1` device through `vdpasim_blk` management device
$ vdpa dev add name vdpa-blk1 mgmtdev vdpasim_blk

# List all vdpa devices on the system
$ vdpa dev show
vdpa-net1: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256
vdpa-blk1: type block mgmtdev vdpasim_blk vendor_id 0 max_vqs 1 max_vq_size 256

# As above, but using pretty[-p] JSON[-j] output
$ vdpa dev show -jp
{
    "dev": {
        "vdpa-net1": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        },
        "vdpa-blk1": {
            "type": "block",
            "mgmtdev": "vdpasim_blk",
            "vendor_id": 0,
            "max_vqs": 1,
            "max_vq_size": 256
        }
    }
}

# List all vDPA devices in sysfs (each VF is a vDPA device)
$ ls /sys/bus/vdpa/devices
# Check which bus driver a vDPA device is bound to
$ ls -l /sys/bus/vdpa/devices/vdpa0/driver

# Switch the bus driver: unbind from vhost_vdpa, bind to virtio_vdpa
$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/unbind
$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind

# List all vDPA bus drivers (virtio_vdpa, vhost_vdpa)
$ ls /sys/bus/vdpa/drivers
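
When a vDPA device is bound to the vhost_vdpa bus driver it shows up as a character device, /dev/vhost-vdpa-N, which QEMU can consume directly. A minimal sketch (the device index, memory size and disk image are assumptions, not from this post):

# Hand the vhost-vdpa char device to a guest (hedged sketch)
$ ls /dev/vhost-vdpa-*
$ qemu-system-x86_64 -enable-kvm -m 2G \
  -netdev vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0 \
  -device virtio-net-pci,netdev=vdpa0 \
  disk.img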

Overview

# 0000:06:00.4 is an Intel vDPA device (or a simulated vDPA device)
# QEMU acts as the vhost-user server, DPDK as the client
$ ./dpdk-vdpa -c 0x2 -n 4 --socket-mem 1024,1024 \
-a 0000:06:00.4,vdpa=1 \
-- --client --iface=/tmp/qemu-vhost-user-net.sock

$ modprobe vfio-pci
$ ./usertools/dpdk-devbind.py -b vfio-pci 06:00.4

# start vm TODO
$ qemu-system-x86_64 -cpu host -enable-kvm \
-mem-prealloc \
-chardev socket,id=char0,path=/tmp/qemu-vhost-user-net.sock \
-netdev type=vhost-user,id=vdpa,chardev=char0 \
-device virtio-net-pci,netdev=vdpa,mac=00:aa:bb:cc:dd:ee,page-per-vq=on

# ==================== more options ====================
# 0000:ca:0f.5 is a VF net PCI device created by a ConnectX-6, bound to the vfio-pci kernel driver
# vhost-user-net-server.sock is created by qemu
$ /usr/local/bin/vdpa-dpdk --lcores '0@(0-127)' -n 4 --huge-dir=/mnt/huge_2MB --log-level=9 \
  -w 0000:ca:0f.5,class=vdpa,event_mode=2 \
  --log-level=pmd.vdpa.mlx5:7 --log-level=pmd:8 --file-prefix=vfnet1 \
  -- --client --iface=vhost-user-net-server.sock

Ref

Introduction

For I/O virtualization, a VMM supports two well-known models: device emulation and paravirtualization, e.g., QEMU for emulation and virtio for paravirtualization. Both have a performance issue: the guest OS (guest driver) cannot access the physical device directly, so every I/O must first go through an intermediate layer (the VMM), which reduces performance. So another question comes up:
Can we assign HW resources directly to the VM? If we do, what extra support is needed from the CPU?

Read more »

VFIO

Virtual Function I/O (VFIO) is a framework for userspace I/O. It is not limited to SR-IOV, but an SR-IOV VF is the most common use case.

The VFIO driver is an IOMMU/device agnostic framework for exposing direct device access to userspace, in a secure, IOMMU protected environment. In other words, this allows safe non-privileged, userspace drivers.

Why do we want that? Virtual machines often make use of direct device access (“device assignment”) when configured for the highest possible I/O performance. From a device and host perspective, this simply turns the VM into a userspace driver, with the benefits of significantly reduced latency, higher bandwidth, and direct use of bare-metal device drivers.
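
A minimal sketch of handing a device to userspace (the PCI address 0000:06:00.4 is a placeholder; the sysfs paths are the standard ones):

# Load the VFIO PCI driver
$ modprobe vfio-pci

# Find the device's IOMMU group; all devices in the group must be assigned together
$ readlink /sys/bus/pci/devices/0000:06:00.4/iommu_group

# Unbind from the current kernel driver and bind to vfio-pci
$ echo 0000:06:00.4 > /sys/bus/pci/devices/0000:06:00.4/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000:06:00.4/driver_override
$ echo 0000:06:00.4 > /sys/bus/pci/drivers_probe

# The IOMMU group now appears under /dev/vfio and can be opened by a userspace driver
$ ls /dev/vfio/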

Read more »

C

  • In C, the declaration and the definition of a variable are different things: a declaration does not allocate storage, it only tells the compiler that the variable is defined somewhere else, while a definition allocates storage for the variable (see the sketch after this list)
  • C has no reference types
  • Although C has const, const cannot be applied to a function (as in C++ const member functions); const can only qualify variables and parameters
  • C does not support function overloading; different functions must use different names!!!
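
A small sketch of the first point (the file name decl_def.c is made up for this demo):

# Write a tiny C program that shows declaration vs. definition, then build and run it
$ cat > decl_def.c <<'EOF'
#include <stdio.h>

extern int counter;   /* declaration: no storage allocated, "defined elsewhere" */
int counter = 42;     /* definition: storage is allocated here */

int main(void)
{
    printf("counter = %d\n", counter);
    return 0;
}
EOF
$ gcc -Wall -o decl_def decl_def.c && ./decl_def   # prints: counter = 42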
Read more »

Tools

docker compose

Docker Compose is a tool for running multi-container applications on Docker, defined using the Compose file format. A Compose file defines how the one or more containers that make up your application are configured. Once you have a Compose file, you can create and start your application with a single command: docker compose up. A Compose file defines your application; it contains several services, and each service is one container!
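
A minimal sketch of such a Compose file (the web/cache services, images and port are made up for illustration):

# Write a tiny Compose file with two services (= two containers)
$ cat > compose.yaml <<'EOF'
services:
  web:                      # one service = one container
    image: nginx:alpine
    ports:
      - "8080:80"
  cache:
    image: redis:alpine
EOF

# Create and start the whole application with a single command
$ docker compose up -d
# List the containers created for each service
$ docker compose ps
# Stop and remove everything
$ docker compose down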

Read more »

Overview

Perf can do lots of things, like collecting cache misses, context switches, per-thread and per-CPU statistics, etc., but it needs kernel support. Perf is usually used for system performance debugging, gperftools for application performance debugging (perf can also do application performance debugging).

Usage

#perf
usage: perf [--version] [--help] COMMAND [ARGS]

The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
diff Read two perf.data files and display the differential profile
list List all symbolic event types
probe Define new dynamic tracepoints
record Run a command and record its profile into perf.data
report Read perf.data (created by perf record) and display the profile
script Read perf.data (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
top System profiling tool.

Details about each command:
#perf annotate --help

Summary: perf has two main ACTIONS for profiling, 'stat' and 'record'.
'stat' uses counters, while 'record' uses samples (e.g. xx samples per second)
and can show call graphs (stacks),
to collect information about the system, CPUs, threads and call graphs.

EVENT is the core concept of perf, i.e. what it can monitor. EVENTs include:
Hardware events (cache misses, LLC cache, etc.)
Software events (page faults, context switches, etc.)
Tracepoints predefined in the kernel (requires a kernel compiled with debugfs)
Dynamic events (created by #perf probe --add)
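
A quick sketch of a typical workflow (./myapp is a placeholder workload):

# counter mode: overall statistics for one run of the workload
#perf stat -e cycles,instructions,cache-misses,context-switches ./myapp

# sampling mode: record call graphs at ~99 samples per second, then inspect
#perf record -F 99 -g ./myapp
#perf report

# system-wide live view of the hottest functions
#perf top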
Read more »