qemu-kvm
Overview
In this article, we cover qemu-kvm without libvirt: how to start a VM by running qemu-kvm directly, and related topics.
Simulated device
qemu-kvm can emulate serial, block and network devices inside the VM based on the virtio driver; the emulated virtio serial ports appear under /sys/class/virtio-ports/ in the guest. For each emulated device, two sides need to be set up from the command line: the back end on the host, and the virtio front end in the VM.
```shell
$ qemu-kvm \
  -chardev socket,id=charch0,path=/var/run/xagent/vm-HZVsuboAJh-test/xagent.sock \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charch0,id=channel0,name=agent.channel.0 \
  -serial unix:/var/run/agent/vm-HZVsuboAJh-test/console.sock,server,nowait \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x2
```
Device Front End
A device front end is how a device is presented to the guest. The type of device presented should match the hardware that the guest operating system is expecting to see.
```shell
# check supported front-end devices and the specific options for each type
```
A front end is often paired with a back end, which describes how the host’s resources are used in the emulation.
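The check mentioned above can be done with the binary's built-in help (assuming a typical qemu-kvm build):

```shell
# list every device front end this qemu binary can emulate
qemu-kvm -device help
# list the configurable properties of one device type
qemu-kvm -device virtio-serial-pci,help
```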
Device Back End
The back end describes how the data from the emulated device will be processed by QEMU. The configuration of the back end is usually specific to the class of device being emulated. For example, serial devices will be backed by a --chardev, which can redirect the data to a file, a socket, or some other system. Storage devices are handled by --blockdev, which specifies how blocks are handled, for example being stored in a qcow2 file or accessing a raw host disk partition. Back ends can sometimes be stacked to implement features like snapshots.
While the choice of back end is generally transparent to the guest, there are cases where features will not be reported to the guest if the back end is unable to support them.
Device Buses
Most devices will exist on a BUS of some sort. Depending on the machine model you choose (-M foo), a number of buses will have been automatically created. In most cases the BUS a device is attached to can be inferred; for example, PCI devices are generally allocated automatically to the next free address of the first PCI bus found. However, in complicated configurations you can explicitly specify what bus (bus=ID) a device is attached to, along with its address (addr=N).
Some devices, for example a PCI SCSI host controller, will add additional buses to the system that other devices can be attached to. A hypothetical chain of devices might look like:
-device foo,bus=pci.0,addr=0,id=foo -device bar,bus=foo.0,addr=1,id=baz
which would be a bar device (with the ID of baz) attached to the first foo bus (foo.0) at address 1. The foo device which provides that bus is itself attached to the first PCI bus (pci.0).
serial device backend (chardev)
QEMU character devices use the format below for the back end; for more options refer to the qemu chardev options documentation.
-chardev backend,id=id[,mux=on|off][,options]
The backend is one of: null, socket, udp, file, pipe, console, serial, pty, stdio, tty, parallel, and more. The chosen back end determines the applicable options; each type has its own set of options.
All devices must have an id, which can be any string up to 127 characters long. It is used to uniquely identify this device in other command line directives.
A character device may be used in multiplexing mode by multiple front ends. Specify mux=on to enable this mode; it is disabled by default. A multiplexer is a "1:N" device: the "1" end is the specified chardev back end, and the "N" end is the various parts of QEMU that can talk to a chardev.
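A classic mux use, taken from the QEMU documentation, shares a single stdio chardev between the serial port and the monitor:

```shell
-chardev stdio,mux=on,id=char0 \
-mon chardev=char0,mode=readline \
-serial chardev:char0
```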
Example
```shell
-chardev file,id=mydev0,path=/tmp/test.s \
```
To use virtio serial in the guest, you first add a virtio serial PCI device (a hub) to the PCI bus; under this virtio serial device you can then create serial and console ports.
virtio-serial-pci == virtio-serial
```shell
-device virtio-serial-pci,id=virtio_serial_pci0 \
```
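Putting the pieces together, a hub plus one port backed by a host-side unix socket might look like this (IDs, names and paths are illustrative):

```shell
-chardev socket,id=ch0,path=/tmp/port0.sock,server,nowait \
-device virtio-serial-pci,id=virtio_serial_pci0 \
-device virtserialport,bus=virtio_serial_pci0.0,chardev=ch0,name=my.port.0
```

Inside the guest, the port then shows up under /sys/class/virtio-ports/ with the given name.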
Socket option
-chardev socket,id=id[,TCP options or unix options][,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,tls-creds=id][,tls-authz=id]
- Create a two-way stream socket, which can be either a TCP or a unix socket. A unix socket will be created if path is specified. Behaviour is undefined if TCP options are specified for a unix socket.
- server=on|off specifies that the socket shall be a listening socket.
- wait=on|off specifies that QEMU should not block waiting for a client to connect to a listening socket.
- telnet=on|off specifies that traffic on the socket should interpret telnet escape sequences.
- websocket=on|off specifies that the socket uses WebSocket protocol for communication.
- reconnect sets the timeout for reconnecting on non-server sockets when the remote end goes away. qemu will delay this many seconds and then attempt to reconnect. Zero disables reconnecting, and is the default.
- tls-creds requests enablement of the TLS protocol for encryption, and specifies the id of the TLS credentials to use for the handshake. The credentials must be previously created with the -object tls-creds argument.
- tls-authz provides the ID of the QAuthZ authorization object against which the client's x509 distinguished name will be validated. This object is only resolved at time of use, so it can be deleted and recreated on the fly while the chardev server is active. If missing, it will default to denying access.
TCP and unix socket options are given below:
TCP options: port=port[,host=host][,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off]
- host for a listening socket specifies the local address to be bound. For a connecting socket it specifies the remote host to connect to. host is optional for listening sockets; if not specified it defaults to 0.0.0.0.
- port for a listening socket specifies the local port to be bound. For a connecting socket it specifies the port on the remote host to connect to. port can be given as either a port number or a service name. port is required.
- to is only relevant to listening sockets. If it is specified, and port cannot be bound, QEMU will attempt to bind to subsequent ports up to and including to until it succeeds. to must be specified as a port number.
- ipv4=on|off and ipv6=on|off specify that either IPv4 or IPv6 must be used. If neither is specified the socket may use either protocol.
- nodelay=on|off disables the Nagle algorithm.
unix options: path=path[,abstract=on|off][,tight=on|off]
- path specifies the local path of the unix socket. path is required.
- abstract=on|off specifies the use of the abstract socket namespace, rather than the filesystem. Optional, defaults to false.
- tight=on|off sets the socket length of abstract sockets to their minimum, rather than the full sun_path length. Optional, defaults to true.
Pty
-chardev pty,id=id
- Create a new pseudo-terminal on the host and connect to it. pty does not take any options.
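When started with a pty chardev, QEMU reports the allocated slave path on startup; you can then attach to it from the host with any terminal tool (the pts number varies per run):

```shell
# QEMU prints something like: char device redirected to /dev/pts/3 (label charserial0)
# then attach from the host, e.g.:
# screen /dev/pts/3
```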
Block device (backend)
USB
QEMU can emulate a PCI UHCI, OHCI, EHCI or XHCI USB controller. You can plug in virtual USB devices; QEMU will automatically create and connect virtual USB hubs as necessary to connect multiple USB devices.
XHCI controller
QEMU has XHCI host adapter support. The XHCI hardware design is much more virtualization-friendly than EHCI and UHCI, so XHCI emulation uses fewer resources (especially CPU). If your guest supports XHCI (which should be the case for any operating system released around 2010 or later), we recommend using it: qemu -device qemu-xhci (or qemu -device nec-usb-xhci).
XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the only controller you need. With only a single USB controller (and therefore only a single USB bus) present in the system, there is no need to use the bus= parameter when adding USB devices.
EHCI controller
The QEMU EHCI Adapter supports USB 2.0 devices. It can be used either standalone or with companion controllers (UHCI, OHCI) for USB 1.1 devices
. The companion controller setup is more convenient to use because it provides a single USB bus supporting both USB 2.0 and USB 1.1 devices
EHCI standalone
When running EHCI in standalone mode you can add UHCI or OHCI controllers for USB 1.1 devices too. Each controller creates its own bus, though, so there are two completely separate USB buses: one USB 1.1 bus driven by the UHCI controller and one USB 2.0 bus driven by the EHCI controller. Devices must be attached to the correct controller manually.
EHCI companion
The UHCI and OHCI controllers can attach to a USB bus created by EHCI as companion controllers. This is done by specifying the masterbus and firstport properties. masterbus specifies the bus name the controller should attach to; firstport specifies the first port the controller should attach to, which is needed since usually one EHCI controller with six ports has three UHCI companion controllers with two ports each.
A companion controller setup means multiple USB host controllers (EHCI/OHCI/UHCI) wired to the same physical connector, such that a device, depending on a characteristic like its speed, is handled by a different controller even when plugged into the same connector. A port on the EHCI bus can therefore handle both USB 2.0 and USB 1.1 devices, but from the system's point of view you still see several controllers, each taking one PCI address.
Bus selection for USB devices
XHCI only (supports USB 1.1, USB 2.0 and USB 3.0), a single USB bus (recommended):

```shell
# no need to set bus=xx as there is only one usb controller
-device qemu-xhci \
-device usb-tablet
```

EHCI bus (USB 2.0) + UHCI/OHCI (USB 1.1): two separate buses, so USB devices must set bus=x manually:

```shell
# the USB 1.1 (uhci/ohci controller) bus will carry the name usb-bus.0
# the usb2.0 (ehci controller) bus will carry the name ehci.0
# the usb3.0 (xhci controller) bus will carry the name usb1.0
# bus= must be set for each usb device as there are two separate buses.
# The '-usb' switch makes qemu create the UHCI controller as part of the PIIX3 chipset; its USB 1.1 bus carries the name "usb-bus.0".
-usb \
-device usb-ehci,id=ehci \
-device usb-tablet,bus=usb-bus.0 \
-device usb-storage,bus=ehci.0,drive=usbstick
```

Companion controllers, a single USB bus:

```shell
# usb2.0 controller ehci with six ports
# usb1.1 uhci controllers (two ports each) attached to ehci
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x3.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x3 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x3.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x3.0x2 \
-device usb-tablet
```

```shell
# inside guest
$ lspci
00:03.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:03.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:03.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:03.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
```

USB devices can be connected with the -device usb-... command line option or the device_add monitor command. Available devices include:
usb-mouse
usb-tablet
usb-kbd (standard USB keyboard)
usb-audio
…
USB Hub
```shell
# Plugging a hub into UHCI port 2 works like this:
```
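The comment above comes from the QEMU USB documentation, where the full example looks like this; the dotted port path for devices behind the hub is the documented addressing scheme:

```shell
-device usb-hub,bus=usb-bus.0,port=2
# a device plugged into the hub then uses a dotted port path:
-device usb-tablet,bus=usb-bus.0,port=2.1
```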
network dev
In order to use a network device, you need to set up the two sides, front end and back end, but different back-end types are set up in different ways; vhost-user and tap, say, use different mechanisms. Taking the tap interface as an example, you have to:
- Create a bridge (ovs bridge or linux bridge).
- Create a tap interface yourself, or let qemu create it automatically.
- Bring the tap interface up and add it to the bridge, either manually from the command line or by providing /etc/qemu-ifup, which is called by qemu automatically.
- Bring the tap interface down and remove it from the bridge, either manually from the command line or by providing /etc/qemu-ifdown, which is called by qemu automatically.
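A minimal helper script might look like this sketch; qemu invokes it with the tap interface name as $1, and the bridge name br0 is an assumption:

```shell
# sketch of /etc/qemu-ifup (bridge name br0 is an assumption)
ip link set "$1" up
ip link set "$1" master br0
```

/etc/qemu-ifdown mirrors it with `ip link set "$1" nomaster` followed by `ip link set "$1" down`.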
There are several ways to set up network devices. In short, -net is the legacy option; -netdev was introduced to solve issues with -net; and the newest option, -nic, available since QEMU 2.12, is the easiest way to set up an interface.
- The -net option can create either a front end or a back end (but has disadvantages compared to -nic).
- -netdev can only create a back end; use -device for the front end.
- A single occurrence of -nic will create both a front end and a back end.
NOTE
If you use libvirt, all these operations are done by libvirt itself.
linux bridge
```shell
# create a linux bridge, or an ovs bridge if ovs is installed, or use the docker bridge (docker0), which is a linux bridge
```
Tap port with linux bridge example
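As a sketch (bridge, tap and netdev names are illustrative):

```shell
# create a bridge and a tap interface, then attach the tap to the bridge
ip link add br0 type bridge && ip link set br0 up
ip tuntap add dev tap0 mode tap
ip link set tap0 up
ip link set tap0 master br0
# hand the pre-created tap to qemu (script=no stops qemu calling /etc/qemu-ifup)
qemu-kvm ... \
  -netdev tap,id=n1,ifname=tap0,script=no,downscript=no \
  -device virtio-net-pci,netdev=n1
```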
ovs bridge
```shell
# show ovs bridge
```
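With Open vSwitch the equivalent inspection and attachment commands might be (bridge and port names illustrative):

```shell
ovs-vsctl show                  # show ovs bridges and their ports
ovs-vsctl add-br ovsbr0         # create a bridge
ovs-vsctl add-port ovsbr0 tap0  # attach an existing tap interface
```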
Kernel and rootfs
Change vm setting runtime
The QEMU Monitor Protocol (QMP) is a JSON-based protocol which allows applications to communicate with a QEMU instance. There are several ways to talk to a QEMU instance:
- the virsh/libvirt way, using 'qemu-monitor-command' provided by the virsh command
- if the qemu instance was not started by libvirt, or virsh is not installed, talk to the qmp socket directly:
  - use telnet or nc over a tcp qmp socket, with JSON format
  - use nc or socat over a unix qmp socket, with JSON format; qemu-shell offers a human-friendly way, since it is a python tool that does the JSON encoding for you
The same applies to HMP (human monitor protocol), which creates a monitor socket for talking.
Qemu parameters
- -monitor dev redirects the monitor to char device 'dev'; this creates a monitor socket.
- -qmp dev is like -monitor but opened in 'control' mode; this creates a qmp socket. It is equivalent to -monitor chardev=mon0,mode=control (paired with a matching -chardev with id=mon0).
```shell
# ================================== QMP socket ==================================
```
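For example, a guest can expose both a QMP control socket and a human monitor socket (paths are illustrative):

```shell
-qmp unix:/var/run/qmp.sock,server,nowait \
-monitor unix:/var/run/monitor.sock,server,nowait
```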
NOTE
- Both sockets (if the monitor is in control mode) support QMP (JSON based) and HMP (human monitor protocol); they provide different commands for similar purposes.
- The outputs differ: QMP gives more detail, with JSON output; HMP output is formatted for humans.
- A monitor socket in readline mode only supports human commands, and its output is formatted.
Suggestion
With virsh/libvirt
- Use virsh qemu-monitor-command
Without virsh/libvirt
- For CLI troubleshooting only, run the monitor socket in readline mode, which provides many subcommands.
- For programmatic use, run the monitor socket in control mode, or use a qmp socket; you still have tools like qemu-shell which provide a CLI with many subcommands. qemu-shell is similar to qemu-monitor-command.
```shell
# without qmp-shell, when the -nographic option is in use, you can switch to the monitor console
# by pressing Ctrl-a then c, and back to the guest console by pressing Ctrl-a then c again.
$ ./qmp-shell /var/run/qmp.sock
Welcome to the QMP low-level shell!
Connected to QEMU 2.12.0
(QEMU) query-block
{"return": [{"locked": false, "tray_open": false, "io-status": "ok", "qdev": "/machine/unattached/device[22]", "removable": true, "device": "ide1-cd0", "type": "unknown"}, {"device": "floppy0", "type": "unknown", "qdev": "/machine/unattached/device[15]", "locked": false, "removable": true}, {"device": "sd0", "type": "unknown", "locked": false, "removable": true}]}

# HMP format
$ ./qmp-shell -H /var/run/qmp.sock
Welcome to the HMP shell!
Connected to QEMU 2.12.0
(QEMU) info block
ide1-cd0: [not inserted]
    Attached to: /machine/unattached/device[22]
    Removable device: not locked, tray closed
floppy0: [not inserted]
    Attached to: /machine/unattached/device[15]
    Removable device: not locked, tray closed
sd0: [not inserted]
    Removable device: not locked, tray closed
(QEMU)

# pretty-printed JSON output
$ ./qmp-shell -p /var/run/qmp.sock
Welcome to the QMP low-level shell!
Connected to QEMU 2.12.0
(QEMU) query-block
{
    "return": [
        {
            "locked": false,
            "tray_open": false,
            "io-status": "ok",
            "qdev": "/machine/unattached/device[22]",
            "removable": true,
            "device": "ide1-cd0",
            "type": "unknown"
        },
        {
            "device": "floppy0",
            "type": "unknown",
            "qdev": "/machine/unattached/device[15]",
            "locked": false,
            "removable": true
        },
        {
            "device": "sd0",
            "type": "unknown",
            "locked": false,
            "removable": true
        }
    ]
}
(QEMU)
```

Most QMP commands (command output of query-commands):
```shell
"blockdev-add"
"chardev-remove"
"chardev-add"
"query-cpu-definitions"
"query-machines"
"device-list-properties"
"change-vnc-password"
"nbd-server-stop"
"nbd-server-add"
"nbd-server-start"
"query-block-jobs"
"query-balloon"
"query-migrate-capabilities"
"migrate-set-capabilities"
"query-migrate"
"query-command-line-options"
"query-uuid"
"query-name"
"query-spice"
"query-vnc"
"query-mice"
"query-status"
"query-kvm"
"query-pci"
"query-cpus"
"query-blockstats"
"query-block"
"query-chardev"
"query-events"
"query-commands"
"query-version"
"human-monitor-command"
"qmp_capabilities"
"expire_password"
"set_password"
"block_set_io_throttle"
"block_passwd"
"query-fdsets"
"remove-fd"
"add-fd"
"closefd"
"getfd"
"set_link"
"balloon"
"block_resize"
"netdev_del"
"netdev_add"
"client_migrate_info"
"migrate_set_downtime"
"migrate_set_speed"
"query-migrate-cache-size"
"migrate-set-cache-size"
"migrate_cancel"
"migrate"
"cpu-add"
"cpu"
"device_del"
"device_add"
"system_powerdown"
"system_reset"
"system_wakeup"
```
QMP by virsh qemu-monitor-command
```shell
# 6095 is the domain id
```
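Typical invocations, using the domain id from the comment above, might look like this:

```shell
# HMP command via libvirt
virsh qemu-monitor-command 6095 --hmp "info block"
# raw QMP, pretty-printed
virsh qemu-monitor-command 6095 '{"execute":"query-block"}' --pretty
```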
QMP over tcp socket
```shell
# run your qemu instance: -qmp tcp:127.0.0.1:12345,server,nowait
```
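A hand-driven session over the TCP socket might look like the sketch below; QMP requires the qmp_capabilities handshake before any other command is accepted:

```shell
$ nc 127.0.0.1 12345
{"QMP": {"version": {...}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-status"}
{"return": {"status": "running", ...}}
```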
QMP over unix socket
```shell
$ /usr/libexec/qemu-kvm -kernel ./kernel -initrd ./initramfs.img -nographic -append "console=ttyS0" \
  -qmp unix:/var/run/qmp.sock,server,nowait -serial mon:stdio
```
disk image
Different hypervisor software uses different image formats; here is a list of them.
- VDI is the native format of VirtualBox. Other virtualization software generally doesn't support VDI.
- VMDK is developed by and for VMware, but VirtualBox also supports it. This format might be the best choice if you want wide compatibility with other virtualization software; it has a smaller disk size than qcow2.
- VHD is the native format of Microsoft Virtual PC. Windows Server 2012 introduced VHDX as the successor to VHD, but VirtualBox does not support VHDX.
- QCOW is the old original version of the qcow format. It has been superseded by qcow2, which VirtualBox does not support.
- QED was an abandoned enhancement of qcow2. QEMU advises against using it.
qemu-img is a tool which supports converting one image format to another, useful if you want to move a VM between hypervisors.
Operation
qemu-img allows you to create, convert and modify images offline. It can handle all image formats supported by QEMU.
```shell
$ qemu-img create -f qcow2 /var/lib/libvirt/images/disk1.img 10G
```
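Beyond create, the common offline operations are sketched below (file names are illustrative):

```shell
qemu-img info disk1.img                            # format, virtual/actual size
qemu-img convert -f vmdk -O qcow2 a.vmdk a.qcow2   # convert between formats
qemu-img resize disk1.img +5G                      # grow the virtual size
```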
NOTE: make sure to make a copy of the disk before any operation.
backing file (qcow2)
In essence, QCOW2 (QEMU Copy-On-Write) gives you the ability to create a base image (also called a backing file) and several 'disposable' copy-on-write overlay disk images on top of it. Backing files and overlays are extremely useful to rapidly instantiate thin-provisioned virtual machines (more on this below), and especially useful in development & test environments, where one can quickly revert to a known state and discard the overlay. They can also be used to start 100 virtual machines from a common backing image, thus saving space.
use case
```shell
# base <- sn1 <- sn2 <- sn3
```
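The chain in the comment can be built with qemu-img; -b sets the backing file and -F its format (file names are illustrative, and -F assumes a reasonably recent qemu-img):

```shell
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 sn1.qcow2
qemu-img create -f qcow2 -b sn1.qcow2  -F qcow2 sn2.qcow2
qemu-img create -f qcow2 -b sn2.qcow2  -F qcow2 sn3.qcow2
qemu-img info --backing-chain sn3.qcow2   # inspect the whole chain
```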
FAQ
Boot with disk/diskless
```shell
# root filesystem is in an ext2 "hard disk"
```
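As a sketch, both boot styles look like this (kernel, initramfs and disk paths are illustrative):

```shell
# root filesystem on an ext2 "hard disk"
qemu-kvm -kernel ./kernel -initrd ./initramfs.img \
  -hda rootfs.ext2 -append "root=/dev/sda console=ttyS0" -nographic

# diskless: the initramfs itself is the root filesystem
qemu-kvm -kernel ./kernel -initrd ./initramfs.img \
  -append "console=ttyS0" -nographic
```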
how to terminate qemu process from console
```shell
# must have -serial mon:stdio, otherwise Ctrl+a then x will not work
```

Press Ctrl+a, then x.
```shell
$ virt-install --boot kernel=./kernel,initrd=./initramfs.img,kernel_args="console=ttyS0,115200n8" \
  --name $VM_NAME --memory 500 --vcpus 1 --disk none --os-type linux --graphics none --network default
```

Press Ctrl+] to quit virt-install.
how to start qemu with vnc
Add the vnc parameter with a display ID; the qemu process will listen on the corresponding TCP port. Display ID 0 maps to port 5900, so ID 88 means the qemu process will listen on 5988.
```shell
$ id=88
```
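The display-to-port mapping is just an offset of 5900, which you can sanity-check in the shell:

```shell
id=88
port=$((5900 + id))
echo "$port"   # the port qemu listens on for -vnc :$id
```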
how to mount raw disk
```shell
# create a raw disk by dd or qemu-img create
```
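A common approach is loop-mounting at the partition's byte offset; the start sector used here (2048) is an assumption you would read from fdisk -l:

```shell
start_sector=2048              # from: fdisk -l disk.img
offset=$((start_sector * 512)) # 512-byte sectors
echo "$offset"
# sudo mount -o loop,offset=$offset disk.img /mnt
```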
after entering a qmp command, no output
This probably means qmp.sock is connected to another client, as one qmp.sock can talk to only one client. If multiple clients need to connect to the qemu instance, create multiple qmp sockets when starting it, like this:
```shell
$ /usr/libexec/qemu-kvm -kernel ./kernel -initrd ./initramfs.img -nographic -append "console=ttyS0" \
  -qmp unix:/var/run/qmp1.sock,server,nowait -qmp unix:/var/run/qmp2.sock,server,nowait -serial mon:stdio
```
difference between -net, -netdev and -nic
In short, -net is the legacy option; -netdev was introduced to solve issues with -net; and the newest option, -nic, available since QEMU 2.12, is the easiest way to set up an interface.
- The -net option can create either a front end or a back end (but has disadvantages compared to -nic).
- -netdev can only create a back end; use -device for the front end.
- A single occurrence of -nic will create both a front end and a back end.
Suggestion
- The new -nic option gives you an easy and quick way to configure the networking of your guest.
- For more detailed configuration, e.g. when you need to tweak details (like queue size, buffers, etc.) of the emulated NIC hardware, use -device together with -netdev.
- The -net option should be avoided these days unless you really want to configure a set-up with a hub between the front ends and back ends.
-nic is like a wrapper around a -netdev and -device pair: easy to use, but less control.
-net (legacy option)
QEMU's initial way of configuring the network for the guest was the -net option. The emulated NIC and the host back end are not directly connected: they are both connected to an emulated hub, by default vlan 0 (called a "vlan" in older versions of QEMU). Therefore, if you start QEMU with -net nic,model=e1000 -net user -net nic,model=virtio -net tap for example, you get a setup where all the front ends and back ends are connected together via a hub (vlan).
The front end is -net nic,model=xyz (e.g. -net nic,model=virtio) and the back end is -net user or -net tap (-net user being the SLIRP back end).
That means the e1000 NIC also gets the network traffic from the virtio-net NIC and both host back ends. This can be solved by giving each NIC its own hub; for example, -net nic,model=e1000,vlan=0 -net user,vlan=0 -net nic,model=virtio,vlan=1 -net tap,vlan=1 moves the virtio-net NIC and the "tap" back end to a second hub (with ID #1). Please note that the "vlan" parameter was dropped in QEMU v3.0, since the term was rather confusing (it is not related to IEEE 802.1Q, for example) and caused a lot of misconfigurations in the past.
-netdev
To configure a network connection where the emulated NIC is directly connected to a host network back-end, without a hub in between
, the well-established solution is to use the -netdev option for the back-end, together with -device for the front-end.
-netdev user,id=n1 -device e1000,netdev=n1 -netdev tap,id=n2 -device virtio-net,netdev=n2
Now while -netdev together with -device provides a very flexible and extensive way to configure a network connection, there are still two drawbacks with this option pair which prevented the legacy -net option from being deprecated completely:
- The -device option can only be used for pluggable NICs. Boards (e.g. embedded boards) which feature an on-board NIC cannot be configured with -device yet, so -net nic,netdev= must be used here instead.
- In some cases the -net option is easier to use (less to type). For example, assuming you want to set up a "tap" network connection and your default scripts /etc/qemu-ifup and -ifdown are already in place, it's enough to type -net nic -net tap to start your guest. To do the same with -netdev, you always have to specify an ID, too, for example: -netdev tap,id=n1 -device e1000,netdev=n1.
-nic
Looking at the disadvantages listed above, users could benefit from a convenience option that:
- is easier to use (and shorter to type) than -netdev <type>,id=<id> -device <dev>,netdev=<id>
- can be used to configure on-board / non-pluggable NICs, too
- does not place a hub between the NIC and the host back-end.
This is where the new -nic option kicks in: it configures both the guest's NIC hardware and the host back end in one go. Instead of -netdev tap,id=n1 -device e1000,netdev=n1 you can simply type -nic tap,model=e1000. To get a list of available NIC models, you can simply run QEMU with -nic model=help. Besides being easier to use, the -nic option can also configure on-board NICs: for machines that have on-board NICs, the first -nic option configures the first on-board NIC, the second -nic option configures the second on-board NIC, and so forth.
qemu-kvm and qemu-kvm-ev
qemu-kvm and qemu-kvm-ev are usually built from the same src.rpm. Some newer or advanced virtualization features are implemented in qemu-kvm-ev and cannot be backported to qemu-kvm for compatibility reasons. Also, qemu-kvm-ev recently carries a newer qemu version than the one provided by qemu-kvm, since qemu-kvm-ev is rebuilt from Red Hat Enterprise Virtualization.
why does qemu need nvdimm?
Really fast writes, particularly interesting for:
- In-memory databases – get persistence for free*!
- Databases – transaction logs
- File & storage systems – frequently updated metadata
why is the PCI address (in guest) not the same as we set?
Most of the time each PCI device will show up in the guest OS with a PCI address that matches the one set by the user on the command line, but that is not guaranteed to happen, and will in fact not be the case in all but the simplest scenarios. Refer to Qemu pci address.
reduce qcow file size on host
The virt-sparsify utility is what we want to use when dealing with a qcow2 image, which by default uses thin provisioning, and we want to make space that was previously allocated in the disk image but is no longer used available again on the host.

```shell
$ virt-sparsify --in-place disk.qcow2
```
how to enable IOMMU for VM
Enable the IOMMU for the VM, so that we can start a nested VM with a passthrough device.

```shell
$ qemu-kvm -machine q35,accel=kvm -cpu host -device intel-iommu ...
```

This is like enabling the IOMMU in hardware and turning it on in the BIOS; we still need to add intel_iommu=on iommu=pt to the guest kernel boot command line.
Inside this VM, we can then start another VM with a device passed through from the host (the parent VM).