linux-kernel-boot

Overview

why an initial ram disk is needed
Many Linux distributions ship a single, generic kernel image, one that the distribution's developers build specifically to boot on a wide variety of hardware. The device drivers for this generic kernel are shipped as loadable kernel modules, because statically compiling many drivers into one kernel makes the image much larger and can, in some cases, cause boot-time crashes or other problems from probing for nonexistent or conflicting hardware. A statically compiled kernel also leaves drivers in kernel memory that are no longer used or needed. The modular approach, however, raises the problem of detecting and loading the modules necessary to mount the root file system at boot time or, for that matter, deducing where or what the root file system is.

To avoid having to hardcode handling for so many special cases into the kernel, an initial boot stage with a temporary root file system (a ram disk holding a temporary root fs) is used. This temporary root file system can contain user-space helpers that do the hardware detection, module loading and device discovery necessary to get the real root file system mounted.

An initial ramdisk loads a temporary root file system into memory, to be used as part of the Linux startup process. initrd and initramfs refer to two different methods of achieving this. Both are commonly used to make preparations before the real root file system can be mounted. If no real root file system is provided, the initial ramdisk itself, which lives in memory, serves as the root file system.

The bootloader will load the kernel and initial root file system image into memory and then start the kernel, passing in the memory address of the image. At the end of its boot sequence, the kernel tries to determine the format of the image from its first few blocks of data, which can lead either to the initrd or initramfs scheme.
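
The format check can be approximated from user space by looking at the same leading bytes. The sketch below is not the kernel's code: `detect_image_format` is a hypothetical helper name, it recognizes only a handful of well-known magic numbers, and it assumes an `od` that accepts `-An -tx1 -j -N` (GNU coreutils does).

```shell
#!/bin/sh
# detect_image_format FILE
# Prints a best-guess format name based on well-known magic numbers.
detect_image_format() {
    # First six bytes as one contiguous lowercase hex string.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)        echo "gzip"; return ;;
        fd377a585a00) echo "xz"; return ;;
        28b52ffd*)    echo "zstd"; return ;;
        303730373031) echo "cpio-newc"; return ;;   # ASCII "070701"
    esac
    # ext2/3/4 store the magic 0xEF53 (little-endian) at byte offset 1080.
    fsmagic=$(od -An -tx1 -j1080 -N2 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$fsmagic" = "53ef" ]; then
        echo "ext2-family"
    else
        echo "unknown"
    fi
}
```

Calling `detect_image_format initrd.img` on a compressed image reports only the compression; decompress the image first to see whether a file system or a cpio archive is inside.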

initial root file system image formats

  • initrd scheme: the image is a file system image (optionally compressed) that is made available through a special block device (/dev/ram) and then mounted as the initial root file system during boot. The driver for that file system must be compiled statically into the kernel. Many distributions originally used compressed ext2 file system images. Once the initial root file system is up, the kernel executes /linuxrc as its first process; when it exits, the kernel assumes that the real root file system has been mounted and executes /sbin/init to begin the normal user-space boot process.

    • A RAM block device is created (fixed size; 16 MB with the CONFIG_BLK_DEV_RAM_SIZE=16384 setting shown later). It is a RAM-based block device, that is, a simulated hard disk that uses memory instead of a physical disk.
    • The initrd file is read and decompressed into the device, as if you had run zcat initrd | dd of=/dev/ram0 or something similar.
    • The initrd contains a file system image, so it can now be mounted as usual: mount /dev/ram0 /root. File systems need a driver, so if the image is ext2, the ext2 driver has to be compiled into the kernel.
    • The kernel executes /linuxrc as the first process, which mounts the real root file system; /sbin/init is then run to begin the user-space boot process.
  • initramfs scheme (available since Linux kernel 2.6.13): the image is a cpio archive (optionally compressed). The kernel unpacks the archive into a special instance of tmpfs that becomes the initial root file system. This scheme has the advantage of not requiring an intermediate file system or block drivers to be compiled into the kernel. In the initramfs scheme, the kernel executes /init as its first process, and that process is not expected to exit.

    • A tmpfs is mounted: mount -t tmpfs nodev /root. tmpfs needs no separate driver module because its support is part of the kernel itself. No block device and no additional drivers are needed.
    • The initramfs archive is decompressed and unpacked directly into this new file system: zcat initramfs | cpio -i, or similar.
    • The kernel executes /init, which is not expected to exit.

initrd and initramfs

initrd scheme

  • initrd is the scheme used by Linux kernels 2.4 and earlier.
  • initrd requires at least one file system driver to be compiled into the kernel.
  • A disk created by initrd has a fixed size.
  • All reads and writes on an initrd are buffered redundantly (unnecessarily) in main memory.

For these reasons, initrd is deprecated and has been replaced by initramfs.

# inspect an initrd image (it is a disk image)
$ gunzip initrd.gz
$ file -L initrd
initrd: Linux rev 1.0 ext2 filesystem data (mounted or unclean), UUID=6d512aa6-269e-4932-ba2b-83d953559340

# the mount point must exist before mounting
$ mkdir -p /mnt/initrd
$ mount -t ext2 -o loop initrd /mnt/initrd
$ ls /mnt/initrd
bin cleanup dev drivers.lzm etc lib liblinuxlive linuxrc mnt proc sbin sys tmp usr usr.lzm var

initramfs scheme

  • initramfs is used by Linux 2.6 and later kernels.
  • It consists of a cpio archive of files that lets an initial root file system and init program reside in the kernel's memory cache, rather than on a ramdisk as with initrd.
  • With initramfs, you create an archive of the files, and the kernel extracts it into a tmpfs.
  • initramfs can improve boot-time flexibility, memory efficiency, and simplicity.
  • dracut is the tool used to create the initramfs image on distributions such as RHEL and Fedora; other distributions ship their own tools.
  • With initramfs, init is located at /init.
# inspect an initramfs image (it is just an archive of the root file system)
$ file -L initramfs-3.10.0-1160.el7.x86_64.img
initramfs-3.10.0-1160.el7.x86_64.img: gzip compressed data, from Unix, last modified: Wed Sep 29 18:29:57 2021, max compression

$ zcat initramfs-3.10.0-1160.el7.x86_64.img | cpio -idmv
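
Because `cpio -i` unpacks into the current directory, it is easy to scatter files by accident. A small wrapper, sketched here under the hypothetical name `unpack_initramfs`, extracts into a directory of its own. It assumes a gzip-compressed image like the one above; images compressed with xz or zstd, or with a microcode archive prepended, need extra handling.

```shell
#!/bin/sh
# unpack_initramfs IMAGE DESTDIR
# Extracts a gzip-compressed cpio image into DESTDIR (created if missing).
unpack_initramfs() {
    image=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -i extract, -d create leading directories, -m keep modification times.
    # The subshell keeps the caller's working directory unchanged.
    gzip -dc "$image" | (cd "$dest" && cpio -idm)
}
```

For example, `unpack_initramfs initramfs-3.10.0-1160.el7.x86_64.img ./unpacked` leaves the tree under `./unpacked`. cpio reports a block count on stderr when it finishes.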

create initramfs

The kernel build always creates an initramfs archive and links it into the kernel image (the archive is empty unless configured otherwise). You can also create an external initramfs with other tools, without rebuilding the kernel.
At first boot, we need at least a few tools to work with: the init process and utilities such as ls, mount and mv. BusyBox provides these user-space tools. It is a single binary of about 1.1 MB that implements many common commands.

# use busybox
$ busybox ls
$ busybox df
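
BusyBox decides which command to run from the name it was invoked under, which is why many symlinks can all point at the same binary. The dispatch idea can be sketched in shell. This is an illustration only, not BusyBox's implementation, and the applet names `hello` and `count` are made up.

```shell
#!/bin/sh
# multicall INVOKED_NAME [ARGS...]
# Picks an "applet" from the basename of the name it was invoked under,
# the same way a multi-call binary inspects argv[0].
multicall() {
    applet=$(basename "$1")
    shift
    case "$applet" in
        hello) echo "hello $*" ;;
        count) echo "$#" ;;
        *)     echo "$applet: applet not found" >&2; return 127 ;;
    esac
}
```

`multicall /bin/hello world` prints `hello world`, regardless of the directory part of the invoked name.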

create_initramfs.sh

#!/bin/bash

ARCH="x86_64"
BB_VER="1.31.0"

# Dirs
mkdir -p root
cd root
mkdir -p bin dev etc lib mnt proc sbin sys tmp var
cd -

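# Utils
if [ ! -f "root/bin/busybox" ]; then
    # -f makes curl fail on HTTP errors instead of saving an error page;
    # a failed download is removed so that the next run retries it
    curl -fL "https://www.busybox.net/downloads/binaries/${BB_VER}-defconfig-multiarch-musl/busybox-${ARCH}" >root/bin/busybox || { rm -f root/bin/busybox; exit 1; }
fi
cd root/bin
chmod +x busybox
# these two may be called before the init script has installed the other applets, so link them now
# -f lets the script be re-run without failing on existing links
ln -sf busybox mount
ln -sf busybox sh
cd -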

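# Init process
# the init script creates symlinks in /bin for every command that busybox supports
# ">" rather than ">>" so that re-running this script does not append a second copy;
# the quoted 'EOF' keeps the shell from expanding anything inside the script body
cat >root/init << 'EOF'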
#!/bin/busybox sh
/bin/busybox --install -s /bin
mount -t devtmpfs devtmpfs /dev
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /tmp
setsid cttyhack sh
exec /bin/sh
EOF
chmod +x root/init

# initramfs creation
# the archive is named initramfs.img to match the QEMU commands in the next section

cd root
find . | cpio -ov --format=newc | gzip -9 >../initramfs.img
cd -
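
Before booting the result, it can save a reboot cycle to check that the tree contains what the kernel and the init script will look for. A minimal sketch, assuming the `root/` layout produced by the script above; `check_initramfs_tree` is a hypothetical helper name.

```shell
#!/bin/sh
# check_initramfs_tree DIR
# Returns 0 if DIR looks like a bootable initramfs root, 1 otherwise.
check_initramfs_tree() {
    dir=$1
    rc=0
    # The kernel runs /init from the unpacked archive, so it must be executable.
    if [ ! -x "$dir/init" ]; then
        echo "missing or non-executable: $dir/init" >&2
        rc=1
    fi
    # The init script's interpreter must exist inside the tree as well.
    if [ ! -x "$dir/bin/busybox" ]; then
        echo "missing or non-executable: $dir/bin/busybox" >&2
        rc=1
    fi
    # Mount points used by the init script.
    for d in dev proc sys tmp; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing directory: $dir/$d" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_initramfs_tree root` just before the cpio step reports every missing piece on stderr instead of stopping at the first one.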

boot with disk/diskless

# root filesystem is in an ext2 "hard disk"
$ qemu-system-x86_64 -kernel normal/bzImage -drive file=rootfs.ext2

# root filesystem is in initramfs
$ qemu-system-x86_64 -kernel normal/bzImage -initrd initramfs.img
# full command to run
$ qemu-system-x86_64 -kernel normal/bzImage -initrd initramfs.img -nographic -append "console=ttyS0"

# root filesystem is built in kernel
$ qemu-system-x86_64 -kernel with_initramfs/bzImage
# Neither -drive nor -initrd are given.
# with_initramfs/bzImage is a kernel compiled with options identical to normal/bzImage, except for one: CONFIG_INITRAMFS_SOURCE=initramfs.img pointing to the exact same CPIO as from the -initrd example.

ramfs vs tmpfs

A ramdisk is a volatile storage space defined in RAM. All information stored in it is lost if the device is unmounted or the system reboots.

In Linux, ramdisks can be created with the mount command and the tmpfs and ramfs file systems.

  • tmpfs: a temporary file system stored in RAM (and/or swap). By selecting this file system with the -t argument of mount, you can assign a limited amount of memory to a temporary file system.

    • stored in RAM and swap
    • enforces a size limit
    • the limit can be adjusted on the fly via ‘mount -o remount …’
    • normal users can be given write access to tmpfs mounts
  • ramfs: similar to tmpfs, but it uses RAM only, has no size limit, and its memory use grows dynamically. If nothing keeps ramfs consumption in check, it will keep using memory until the system hangs or crashes.

    • stored in RAM only
    • cannot enforce a limit; by contrast, a traditional ram disk (/dev/ramX) has a fixed size, 16M by default
    • the size of /dev/ramX cannot be adjusted on the fly; you need to reboot the system or reload the brd kernel module
    • only root (or a trusted user) should be allowed write access to ramfs mounts, because nothing limits how much memory they consume
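
To see which of these file systems are active, the mount table can be filtered by type. A minimal sketch: `list_ram_mounts` is a hypothetical helper that reads a file in `/proc/mounts` format (defaulting to `/proc/mounts` itself) and prints the type and mount point of every tmpfs and ramfs entry.

```shell
#!/bin/sh
# list_ram_mounts [MOUNTS_FILE]
list_ram_mounts() {
    # Fields in /proc/mounts: device, mount point, type, options, dump, pass.
    awk '$3 == "tmpfs" || $3 == "ramfs" { print $3, $2 }' "${1:-/proc/mounts}"
}
```

Taking the file as an optional argument makes it possible to run the helper against a saved copy of the mount table as well.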

ram disk(ramfs)

As mentioned above, there are two ways to create a ram disk: one uses ramfs, the other uses tmpfs.

  • enable traditional ram disk
    Old systems: the ram disk driver is built into the kernel with the configs below, so after the system boots you will see /dev/ram0, /dev/ram1, ... /dev/ramX, each with the default fixed size configured at kernel compile time. Even with 16 /dev/ramX devices, the memory for each block device is not preallocated; it is allocated as data is written, for example when a file system is created on the device.
    CONFIG_BLK_DEV_RAM=y
    CONFIG_BLK_DEV_RAM_COUNT=16
    CONFIG_BLK_DEV_RAM_SIZE=16384
    Newer systems: the driver is compiled by default as the kernel module brd, which you need to load before using /dev/ramX.
    CONFIG_BLK_DEV_RAM=m
    CONFIG_BLK_DEV_RAM_COUNT=16
    CONFIG_BLK_DEV_RAM_SIZE=16384
  • change traditional ram disk size
    Old systems: the ram disk is built into the kernel, so the only way to change its size is to append a parameter to the kernel boot line, like this (ramdisk_size is given in KiB, so the value below requests 10 GiB).
    kernel /vmlinuz-2.6.32.24 ro root=LABEL=/ rhgb quiet ramdisk_size=10485760
    Newer systems: the ram disk size can be changed only when brd is loaded (rd_size is also in KiB).
    # 1G, one ram disk /dev/ram0
    $ modprobe brd rd_nr=1 rd_size=1048576
  • use traditional ram disk
    $ mkfs /dev/ram0
    $ mkdir /mnt/ramdisk
    $ mount /dev/ram0 /mnt/ramdisk
    $ df -h
    Filesystem Size Used Avail Use% Mounted on
    ...
    /dev/ram0 16M 140K 15M 1% /mnt/ramdisk
  • use ramfs rather than /dev/ramX (brd is not needed for this)
    $ mkdir -p /tmp/ramdisk
    $ mount -t ramfs ramfs /tmp/ramdisk
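
Both `ramdisk_size=` and `rd_size=` take their value in KiB, which is easy to get wrong by a factor of 1024. A small conversion helper, sketched under the assumption that sizes are written as an integer optionally followed by `K`, `M` or `G`; `to_kib` is a hypothetical name.

```shell
#!/bin/sh
# to_kib SIZE
# Converts sizes such as 16M or 1G to KiB; a bare number is taken as KiB.
to_kib() {
    num=${1%[KkMmGg]}      # numeric part, with any unit letter stripped
    unit=${1#"$num"}       # the unit letter, or empty
    case "$num" in
        ''|*[!0-9]*) echo "to_kib: unsupported size: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        ''|K|k) echo "$num" ;;
        M|m)    echo $((num * 1024)) ;;
        G|g)    echo $((num * 1024 * 1024)) ;;
    esac
}
```

With it, the module can be loaded as `modprobe brd rd_nr=1 rd_size="$(to_kib 1G)"`, which passes the same 1048576 used above.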

ram disk(tmpfs)

Using tmpfs to create a ram disk is easy.
    # memory is allocated only when it's used, it's not prereserved.
    $ mount -t tmpfs -o size=10G none /tmp/ramdisk
    (base) [root@dev github]# df -h
    ...
    none 10G 0 10G 0% /tmp/ramdisk

    (base) [root@dev github]# mount
    none on /tmp/ramdisk type tmpfs (rw,relatime,seclabel,size=10485760k)

Data on a ram disk is gone once it is unmounted, so to keep the data across reboots we need a service that copies it from the ram disk to the hard disk at shutdown and copies it back to the ram disk at boot.

$ vi /lib/systemd/system/ramdisk-sync.service

[Unit]
# ordered before umount.target so that the data is copied before file systems are unmounted
# but if you run umount manually, the data is lost.
Before=umount.target

[Service]
Type=oneshot
User=root

# root below can be changed to any user.
ExecStartPre=/bin/chown -Rf root /mnt/ramdisk
# when the service starts (at boot), copy the data back to the ram disk
ExecStart=/usr/bin/rsync -ar /mnt/ramdisk_backup/ /mnt/ramdisk/
# when the service stops, copy data from the ram disk to the hard disk
ExecStop=/usr/bin/rsync -ar /mnt/ramdisk/ /mnt/ramdisk_backup/
ExecStopPost=/bin/chown -Rf root /mnt/ramdisk_backup
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

$ vi /etc/fstab

tmpfs /mnt/ramdisk tmpfs rw,size=110M 0 0
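
The service boils down to two directory copies in opposite directions, and the round trip can be tried without systemd or root on scratch directories. This sketch swaps `rsync -a` for `cp -a` so that it runs with coreutils alone; `sync_dir` is a hypothetical helper name, and the temporary directories are stand-ins for /mnt/ramdisk and /mnt/ramdisk_backup.

```shell
#!/bin/sh
# sync_dir SRC DST: copy the contents of SRC into DST, preserving attributes.
sync_dir() {
    mkdir -p "$2" && cp -a "$1"/. "$2"/
}

ramdisk=$(mktemp -d)      # stand-in for /mnt/ramdisk
backup=$(mktemp -d)       # stand-in for /mnt/ramdisk_backup

echo "important" > "$ramdisk/data.txt"
sync_dir "$ramdisk" "$backup"        # what ExecStop does at shutdown
rm -f "$ramdisk/data.txt"            # simulate the ram disk being emptied
sync_dir "$backup" "$ramdisk"        # what ExecStart does at boot
cat "$ramdisk/data.txt"              # the file is back
```

Like the rsync calls in the unit file, this copy never deletes anything at the destination, so files removed from the ram disk reappear after the next restore.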

Actually, systemd already creates some ram disks (tmpfs). If you check with df, you will see /run, /dev/shm and /sys/fs/cgroup, each with the default size (half of total physical memory).

(base) [root@dev etc]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 4074576 0 4074576 0% /dev
tmpfs 4086484 0 4086484 0% /dev/shm
tmpfs 4086484 9168 4077316 1% /run
tmpfs 4086484 0 4086484 0% /sys/fs/cgroup

(base) [root@dev etc]# free
total used free shared buff/cache available
Mem: 8172968 979092 4939616 9308 2254260 6880632
Swap: 8257532 0 8257532

(base) [root@dev etc]# mount | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=4074576k,nr_inodes=1018644,mode=755) # explicit size option
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) # no size option: defaults to half of total memory
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
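
The numbers above agree with the default: with no size= option, tmpfs is capped at half of physical RAM, so halving MemTotal reproduces the df figure. A minimal sketch; `default_tmpfs_kib` is a hypothetical helper that reads a file in `/proc/meminfo` format (defaulting to `/proc/meminfo` itself).

```shell
#!/bin/sh
# default_tmpfs_kib [MEMINFO_FILE]
# Prints half of MemTotal, in KiB.
default_tmpfs_kib() {
    awk '$1 == "MemTotal:" { print int($2 / 2) }' "${1:-/proc/meminfo}"
}
```

With the total of 8172968 kB reported by free above, the result is 4086484, the same 1K-blocks value that df shows for /dev/shm, /run and /sys/fs/cgroup.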

resize /dev/shm

Edit the file /etc/fstab (with sudo if needed).
In this file, try to locate a line like this one: none /dev/shm tmpfs defaults,size=4G 0 0.

Case 1 - This line exists in your /etc/fstab file:

Modify the text after size=. For example, if you want an 8G size, replace size=4G with size=8G.
Exit your text editor, then run (with sudo if needed) $ mount -o remount /dev/shm.

Case 2 - This line does NOT exist in your /etc/fstab file:

Append the line none /dev/shm tmpfs defaults,size=4G 0 0 to the end of the file, and modify the text after size=. For example, if you want an 8G size, replace size=4G with size=8G.
Exit your text editor, then run (with sudo if needed) $ mount /dev/shm.
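
The two cases above can be folded into one repeatable edit. A minimal sketch, assuming fstab fields are separated by spaces or tabs and that an existing /dev/shm entry already carries a size= option; `set_shm_size` is a hypothetical helper, and it is safest to try it on a copy of /etc/fstab first.

```shell
#!/bin/sh
# set_shm_size FSTAB SIZE
# Case 1: rewrite size= on the existing /dev/shm line.
# Case 2: append a new /dev/shm line if none exists.
set_shm_size() {
    fstab=$1
    size=$2
    tmp=$(mktemp) || return 1
    awk -v size="$size" '
        $1 !~ /^#/ && $2 == "/dev/shm" {
            sub(/size=[^, \t]+/, "size=" size)
            found = 1
        }
        { print }
        END {
            if (!found)
                print "none /dev/shm tmpfs defaults,size=" size " 0 0"
        }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

After `set_shm_size /etc/fstab 8G`, run the mount command from whichever case applied. Writing back with `cat` rather than `mv` keeps the ownership and permissions of the original file.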
