Aku Kotkavuo

@eagleflo

I am a software generalist from Helsinki, Finland. I’ve been working with software for most of my life. I practise writing about related topics here.

My open source projects include mpyq and jisho.


Regenerating vmlinuz-linux

28th April 2025

I've been daily driving Linux for decades at this point. Every now and then, Problems™ arise despite the best efforts of the Linux engineering community. This morning was one of those days.

Error loading \vmlinuz-linux: Volume Corrupt

Sigh. Mondays, right?

One lesson I learned early on is to have a bootable Linux USB stick at hand at all times. With that I could boot into a familiar environment and start debugging.

Since this is some sort of hard drive corruption, let's look at the devices at hand with fdisk -l.

Disk /dev/nvme1n1: 931,51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 980 PRO 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1FFA4540-45EE-11EB-9D57-50EB7110FAF3

Device           Start        End    Sectors   Size Type
/dev/nvme1n1p1    2048    1085447    1083400   529M Windows recovery environment
/dev/nvme1n1p2 1085448    1290255     204808   100M EFI System
/dev/nvme1n1p3 1290256    1323031      32776    16M Microsoft reserved
/dev/nvme1n1p4 1323032 1953525063 1952202032 930,9G Microsoft basic data


Disk /dev/nvme0n1: 465,76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 960 EVO 500GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 395FBB02-993A-3D45-8F79-B3BB10DD0F40

Device         Start       End   Sectors   Size Type
/dev/nvme0n1p1  2048 976773134 976771087 465,8G Linux filesystem

When it comes to actually booting Linux, it's the EFI partition on my Windows drive that matters -- /dev/nvme1n1p2, the EFI System. Back when I originally set this system up, I decided to reuse the EFI partition created by Windows 10 installation media for dual booting -- it's a bit of a hack, but apart from the ridiculously small size it has worked well... so far. Cue ominous music.

Without much further thought I ran fsck /dev/nvme1n1p2 to see if there were problems. Sure enough, the vmlinuz-linux image was somehow messed up.

File size is x bytes, cluster chain length is y bytes.
Truncating file to y bytes.
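
In hindsight, a read-only pass before letting the tool change anything would have been the more careful move. Here is a minimal sketch, assuming fsck.fat from dosfstools; the helper name check_esp_readonly and the FSCK override variable are made up for illustration and are not part of any tool.

```shell
# Hypothetical helper: inspect a FAT filesystem without writing to it.
# FSCK is an illustrative override hook; it defaults to fsck.fat (dosfstools).
FSCK="${FSCK:-fsck.fat}"

check_esp_readonly() {
    device="${1:?usage: check_esp_readonly DEVICE}"
    # -n: no-operation mode, report errors but do not write anything
    # -v: verbose output
    "$FSCK" -n -v "$device"
}

# Usage from the live USB, with the partition unmounted:
#   check_esp_readonly /dev/nvme1n1p2
```

If the report looks sane, running fsck.fat again without -n lets it apply the repair.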

Letting fsck.fat fix the issue is where we start, but why did the file become corrupt in the first place? I did update my packages yesterday, but the pacman run (logged to /var/log/pacman.log) was successful:

==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'default'
==> Using configuration file: '/etc/mkinitcpio.conf'
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: '6.14.4-arch1-1'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [kms]
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
  -> Running build hook: [block]
  -> Running build hook: [filesystems]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating xz-compressed initcpio image: '/boot/initramfs-linux.img'
  -> Early uncompressed CPIO image generation successful
==> Initcpio image generation successful

The partition has 75MB in use out of its 100MB capacity. vmlinuz-linux weighs only 15MB, but its neighbor initramfs-linux.img is 46MB despite heavy compression. Maybe writing both of these at once when the partition is near capacity is somehow flaky? I've updated these files literally hundreds of times without any issue previously, so I'm not at all sure I'm even on the right track.
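
The near-capacity theory can at least be sanity-checked with plain arithmetic. The sketch below rests on two assumptions I have not verified: that new copies of both images get written before the old ones are released, and that the 75MB in use already includes the current copies. headroom_mb is a hypothetical helper, not an existing command.

```shell
# Hypothetical helper: headroom_mb CAPACITY USED NEW_FILE_SIZE...
# Prints the space (in MB) left after writing all new files next to
# the existing data. Assumes nothing is freed before the writes.
headroom_mb() {
    capacity=$1
    used=$2
    shift 2
    needed=0
    for size in "$@"; do
        needed=$((needed + size))
    done
    echo $((capacity - used - needed))
}

# Numbers from this post: 100MB capacity, 75MB used, 15MB + 46MB images.
headroom_mb 100 75 15 46   # prints -36
```

A negative result means both new files would not fit at once under those assumptions. If the files are simply overwritten in place, this calculation does not apply.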

Oh well, I have work to do, so let's mount the main Linux drive and just regenerate the image:

mount /dev/nvme0n1p1 /mnt
arch-chroot /mnt
mount /dev/nvme1n1p2 /boot
mkinitcpio -P
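
Before rebooting, a quick check that both files exist and are non-empty doesn't hurt. This is a minimal sketch; verify_boot_files is a hypothetical helper, and a non-empty file is of course no proof that the image is valid.

```shell
# Hypothetical helper: fail if any given file is missing or empty.
verify_boot_files() {
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "ok"
}

# Usage inside the chroot, with the EFI partition mounted at /boot:
#   verify_boot_files /boot/vmlinuz-linux /boot/initramfs-linux.img
```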

After rebooting, all is well again.

Despite the happy ending, I have a sneaking suspicion that I have only myself to blame for this problem, since I reused the Windows EFI partition for /boot. I'm now one step closer to buying a bigger NVMe drive and recreating the boot setup from scratch with a separate /boot partition using ext4 and mounting the FAT32 EFI partition at /boot/efi.

It also looks like Microsoft realized that 100MB is not large enough for the EFI partition: recent installation media seem to have bumped that up to 260MB or even 400MB with Windows 11. I'm avoiding installing Windows 11 until I absolutely have to.

Imagine if I hadn't been banging my head against these kinds of problems in the past and didn't know my way around the tooling I used here. I'd be stuck with an unbootable Linux installation. I wrote this post as a way to give back to the community: I hope someone in a similar predicament in the future will find this useful.

Since I haven't discovered the root cause yet, that someone is most likely future me.