vhost-user

In the DPDK architecture, devices are accessed by constant polling. This avoids context-switching and interrupt-processing overhead at the cost of dedicating some CPU cores 100% to packet processing.

In practice, DPDK offers a series of poll mode drivers (PMDs) that enable direct transfer of packets between user space and the physical interfaces, bypassing the kernel network stack altogether. This approach provides a significant performance boost over kernel forwarding by eliminating interrupt handling and skipping the kernel stack.

So we want to implement the virtio backend (data plane) in DPDK, leveraging the vhost protocol; that is vhost-user.
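
For example, DPDK's testpmd can act as the vhost-user backend by creating a net_vhost virtual device on a UNIX socket that QEMU later connects to. A minimal sketch (the socket path, core list and memory size are placeholders, not from this post):

# Run testpmd as a vhost-user backend (hedged sketch, paths and cores are placeholders)
$ dpdk-testpmd -l 0-1 -n 4 --socket-mem 1024 \
  --vdev 'net_vhost0,iface=/tmp/vhost-user.sock,queues=1' \
  -- -i --forward-mode=io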

Read more »

vhost-net

The virtio backend (data plane) implemented in the host kernel, leveraging the vhost protocol.

vhost-net is a kernel driver that implements the handler side of the vhost protocol to provide an efficient data plane, i.e., packet forwarding. In this implementation, QEMU and the vhost-net kernel driver (the handler) use ioctls to exchange vhost messages, and a pair of eventfd-like file descriptors, irqfd and ioeventfd (created by the QEMU process via an ioctl to KVM, then passed to vhost-net via another ioctl), are used to exchange notifications with the guest.
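
As a rough sketch (the tap device, MAC address and disk image are placeholders, not from this post), enabling vhost-net from QEMU only takes vhost=on on a tap netdev; QEMU then opens /dev/vhost-net and offloads the virtqueue processing to the kernel:

# Load the vhost-net kernel module (creates /dev/vhost-net)
$ modprobe vhost_net

# Hedged sketch: tap0, the MAC address and disk.img are placeholders
$ qemu-system-x86_64 -enable-kvm -m 2G \
  -netdev tap,id=net0,ifname=tap0,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56 \
  disk.img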

Read more »

hands on with vDPA device simulator

The vdpa management tool (from iproute2) communicates with the in-kernel vDPA framework over the netlink vDPA API. It allows creating and destroying vDPA devices and controlling their parameters.

An example using the in-kernel simulators:

# Load vDPA net and block simulators kernel modules
$ modprobe vdpa-sim-net
$ modprobe vdpa-sim-blk

# List vdpa management device attributes
$ vdpa mgmtdev show
vdpasim_blk:
  supported_classes block
vdpasim_net:
  supported_classes net

# Add `vdpa-net1` device through `vdpasim_net` management device
$ vdpa dev add name vdpa-net1 mgmtdev vdpasim_net

# Add `vdpa-blk1` device through `vdpasim_blk` management device
$ vdpa dev add name vdpa-blk1 mgmtdev vdpasim_blk

# List all vdpa devices on the system
$ vdpa dev show
vdpa-net1: type network mgmtdev vdpasim_net vendor_id 0 max_vqs 2 max_vq_size 256
vdpa-blk1: type block mgmtdev vdpasim_blk vendor_id 0 max_vqs 1 max_vq_size 256

# As above, but using pretty[-p] JSON[-j] output
$ vdpa dev show -jp
{
    "dev": {
        "vdpa-net1": {
            "type": "network",
            "mgmtdev": "vdpasim_net",
            "vendor_id": 0,
            "max_vqs": 2,
            "max_vq_size": 256
        },
        "vdpa-blk1": {
            "type": "block",
            "mgmtdev": "vdpasim_blk",
            "vendor_id": 0,
            "max_vqs": 1,
            "max_vq_size": 256
        }
    }
}

# List all vDPA devices in sysfs (each VF is a vDPA device)
$ ls /sys/bus/vdpa/devices
# Check which bus driver a vDPA device is bound to
$ ls -l /sys/bus/vdpa/devices/vdpa0/driver

# Switch the bus driver: unbind from vhost_vdpa, bind to virtio_vdpa
$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/unbind
$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind

# List all vDPA bus drivers (virtio_vdpa, vhost_vdpa)
$ ls /sys/bus/vdpa/drivers
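
When a vDPA device is bound to the vhost_vdpa bus driver it shows up as a character device, /dev/vhost-vdpa-N, which QEMU can consume directly. A minimal sketch (the device index, memory size and disk image are assumptions, not from this post):

# Hand the vhost-vdpa char device to a guest (hedged sketch)
$ ls /dev/vhost-vdpa-*
$ qemu-system-x86_64 -enable-kvm -m 2G \
  -netdev vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0 \
  -device virtio-net-pci,netdev=vdpa0 \
  disk.img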

Overview

# 0000:06:00.4 is an Intel vDPA device (or a simulated vDPA device)
# QEMU acts as the vhost-user server, DPDK as the client
$ ./dpdk-vdpa -c 0x2 -n 4 --socket-mem 1024,1024 \
-a 0000:06:00.4,vdpa=1 \
-- --client --iface=/tmp/qemu-vhost-user-net.sock

$ modprobe vfio-pci
$ ./usertools/dpdk-devbind.py -b vfio-pci 06:00.4

# start vm TODO
$ qemu-system-x86_64 -cpu host -enable-kvm \
-mem-prealloc \
-chardev socket,id=char0,path=/tmp/qemu-vhost-user-net.sock \
-netdev type=vhost-user,id=vdpa,chardev=char0 \
-device virtio-net-pci,netdev=vdpa,mac=00:aa:bb:cc:dd:ee,page-per-vq=on

# ==================== more options ====================
# 0000:ca:0f.5 is a VF net PCI device created by a ConnectX-6, bound to the vfio-pci kernel driver
# vhost-user-net-server.sock is created by qemu
$ /usr/local/bin/vdpa-dpdk --lcores '0@(0-127)' -n 4 --huge-dir=/mnt/huge_2MB --log-level=9 \
  -w 0000:ca:0f.5,class=vdpa,event_mode=2 \
  --log-level=pmd.vdpa.mlx5:7 --log-level=pmd:8 --file-prefix=vfnet1 \
  -- --client --iface=vhost-user-net-server.sock

Ref

Introduction

For I/O virtualization, a VMM supports two well-known models: device emulation and paravirtualization, e.g., QEMU for emulation and virtio for paravirtualization. Both have a performance issue: the guest OS (guest driver) cannot access the physical device directly, so every I/O must first go through an intermediate layer (the VMM), which reduces performance. So another question comes up:
Can we assign HW resources directly to the VM? If we do, what extra support is needed from the CPU?

Read more »

VFIO

Virtual Function I/O (VFIO) is a framework for userspace I/O. It is not limited to SR-IOV, but an SR-IOV VF is the most common use case.

The VFIO driver is an IOMMU/device agnostic framework for exposing direct device access to userspace, in a secure, IOMMU protected environment. In other words, this allows safe non-privileged, userspace drivers.

Why do we want that? Virtual machines often make use of direct device access (“device assignment”) when configured for the highest possible I/O performance. From a device and host perspective, this simply turns the VM into a userspace driver, with the benefits of significantly reduced latency, higher bandwidth, and direct use of bare-metal device drivers.
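
A minimal sketch of handing a device to userspace (the PCI address 0000:06:00.4 is a placeholder; the sysfs paths are the standard ones):

# Load the VFIO PCI driver
$ modprobe vfio-pci

# Find the device's IOMMU group; all devices in the group must be assigned together
$ readlink /sys/bus/pci/devices/0000:06:00.4/iommu_group

# Unbind from the current kernel driver and bind to vfio-pci
$ echo 0000:06:00.4 > /sys/bus/pci/devices/0000:06:00.4/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000:06:00.4/driver_override
$ echo 0000:06:00.4 > /sys/bus/pci/drivers_probe

# The IOMMU group now appears under /dev/vfio and can be opened by a userspace driver
$ ls /dev/vfio/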

Read more »

C

  • In C, the declaration and the definition of a variable are different things: a declaration does not allocate storage, it only tells the compiler that the variable is defined somewhere else, while a definition allocates storage for the variable (see the sketch after this list)
  • C has no reference types
  • Although C has const, const cannot be applied to a function (as in C++ const member functions); const can only qualify variables and parameters
  • C does not support function overloading; different functions must use different names!!!
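
A small sketch of the first point (the file name decl_def.c is made up for this demo):

# Write a tiny C program that shows declaration vs. definition, then build and run it
$ cat > decl_def.c <<'EOF'
#include <stdio.h>

extern int counter;   /* declaration: no storage allocated, "defined elsewhere" */
int counter = 42;     /* definition: storage is allocated here */

int main(void)
{
    printf("counter = %d\n", counter);
    return 0;
}
EOF
$ gcc -Wall -o decl_def decl_def.c && ./decl_def   # prints: counter = 42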
Read more »

Tools

docker compose

Docker Compose is a tool for running multi-container applications on Docker, defined using the Compose file format. A Compose file defines how the one or more containers that make up your application are configured. Once you have a Compose file, you can create and start your application with a single command: docker compose up. A Compose file defines your application; it contains several services, and each service is one container!
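
A minimal sketch of such a Compose file (the web/cache services, images and port are made up for illustration):

# Write a tiny Compose file with two services (= two containers)
$ cat > compose.yaml <<'EOF'
services:
  web:                      # one service = one container
    image: nginx:alpine
    ports:
      - "8080:80"
  cache:
    image: redis:alpine
EOF

# Create and start the whole application with a single command
$ docker compose up -d
# List the containers created for each service
$ docker compose ps
# Stop and remove everything
$ docker compose down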

Read more »

Overview

Perf can do lots of things, like collecting cache misses, context switches, per-thread and per-CPU statistics, etc., but it needs kernel support. Perf is usually used for system performance debugging, gperftools for application performance debugging (perf can also do application performance debugging).

Usage

#perf
usage: perf [--version] [--help] COMMAND [ARGS]

The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
diff Read two perf.data files and display the differential profile
list List all symbolic event types
probe Define new dynamic tracepoints
record Run a command and record its profile into perf.data
report Read perf.data (created by perf record) and display the profile
script Read perf.data (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
top System profiling tool.

Details about each command:
#perf annotate --help

Summary: perf has two main ACTIONS for profiling, 'stat' and 'record'.
'stat' uses counters, while 'record' uses samples (e.g. xx samples per second)
and can show call graphs (stacks),
to collect information about the system, CPUs, threads and call graphs.

EVENT is the core concept of perf, i.e. what it can monitor. EVENTs include:
Hardware events (cache misses, LLC cache, etc.)
Software events (page faults, context switches, etc.)
Tracepoints predefined in the kernel (requires a kernel compiled with debugfs)
Dynamic events (created by #perf probe --add)
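
A quick sketch of a typical workflow (./myapp is a placeholder workload):

# counter mode: overall statistics for one run of the workload
#perf stat -e cycles,instructions,cache-misses,context-switches ./myapp

# sampling mode: record call graphs at ~99 samples per second, then inspect
#perf record -F 99 -g ./myapp
#perf report

# system-wide live view of the hottest functions
#perf top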
Read more »