
Fixed Intel VMD RAID Support in Clonezilla Live

  • Entri3 - 6 days ago

    Clonezilla Base: 20260301-questing-amd64 | Kernel: 6.17.0-14-generic

    1. Overview

    Modern Dell, HP, and Lenovo business desktops and workstations ship with Intel RST (Rapid Storage Technology) RAID enabled via Intel VMD (Volume Management Device) controllers. Clonezilla Live currently cannot detect NVMe drives behind VMD controllers due to six issues in the boot chain. We have developed and tested a complete workaround — this document provides the full technical details, the working scripts, and our recommended path for upstream integration.


    2. Problem Description

    2.1 Hardware Context

    Intel VMD is a PCI controller (device class 0104) that acts as a bridge, placing NVMe SSDs into a separate PCI domain. Dell systems commonly use device ID [8086:ad0b]. When Intel RST RAID is enabled in BIOS, the NVMe drives are only accessible through the VMD controller — they are invisible to standard NVMe enumeration.

    2.2 Complete Failure Chain

    The six issues, their impact, and our workarounds:

    1. VMD kernel module rejects bus offset value 3.
       Impact: VMD probe fails with "Unknown Bus Offset Setting" on newer Dell
       hardware. The vmd.ko module only supports bus offset values 1 and 2, but
       Dell systems with device [8086:ad0b] report bus offset 3 in PCI config
       register 0x45 bits [1:0].
       Workaround: binary patch vmd.ko to force value 2.

    2. Module signature enforcement blocks patched modules.
       Impact: even with lockdown=none and module.sig_enforce=0, a module with
       an invalid signature is rejected. The signature must be stripped entirely
       for the module to load as "unsigned" (kernel taint only).
       Workaround: strip the PKCS#7 signature from the patched vmd.ko.

    3. AHCI driver competes for the VMD PCI device.
       Impact: the VMD PCI device [8086:ad0b] matches both the vmd and ahci
       kernel module aliases. If AHCI loads and binds to the device, VMD loses
       control, the PCI domain is torn down, and all NVMe devices underneath
       disappear.
       Workaround: blacklist AHCI via a kernel boot parameter.

    4. NVMe I/O timeouts under VMD.
       Impact: MSI-X interrupt routing fails when the IOMMU is in its default
       mode, causing NVMe command timeouts.
       Workaround: iommu=pt (passthrough mode).

    5. VMD unbinds during the live boot transition.
       Impact: VMD binds during initramfs, but something in the live boot
       process causes it to unbind. The NVMe devices disappear after the
       transition to the live filesystem.
       Workaround: an rmmod vmd; modprobe vmd cycle via ocs_prerun to
       re-establish the binding.

    6. IMSM RAID assembly and partition detection are required.
       Impact: after VMD loads and NVMe appears, the IMSM container (md127) and
       the RAID volume (md126) must be assembled via mdadm --incremental, then
       partprobe must run on md126 to create the partition device nodes
       (md126p1-p5).
       Workaround: the fix-vmd script handles assembly, settling, and partprobe.

    Additionally, Clonezilla's is_partition() function in ocs-functions misclassifies md126 as a partition rather than a disk, hiding it from the disk selection UI. This requires a code change (see Section 4).

    2.3 Affected Hardware

    Confirmed affected: Dell OptiPlex/Precision with Intel VMD [8086:ad0b] (Arrow Lake / Meteor Lake era). Likely affects all systems with VMD bus offset value 3, which is a newer configuration not yet supported in upstream kernels as of 6.17.


    3. Our Working Solution (Complete Details)

    We have a fully working end-to-end solution deployed in the field. Below is everything needed to reproduce it.

    3.1 Kernel Boot Parameters

    These must be added to GRUB entries when VMD RAID support is needed:

    lockdown=none module.sig_enforce=0 modprobe.blacklist=ahci iommu=pt
    
    Parameter Purpose
    lockdown=none Disable kernel lockdown to allow unsigned modules
    module.sig_enforce=0 Don't enforce module signatures (allow unsigned with taint)
    modprobe.blacklist=ahci Prevent AHCI from stealing the VMD PCI device
    iommu=pt IOMMU passthrough — fixes MSI-X interrupt routing for VMD
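    A quick sanity check after boot is to compare /proc/cmdline against this list. The following sketch is our own illustration, not part of Clonezilla; the helper name and hard-coded set are ours:

    ```python
    # Verify the VMD-related parameters from Section 3.1 made it onto the
    # kernel command line. Illustrative helper, not part of Clonezilla.
    REQUIRED_VMD_PARAMS = {
        "lockdown=none",
        "module.sig_enforce=0",
        "modprobe.blacklist=ahci",
        "iommu=pt",
    }

    def missing_vmd_params(cmdline: str) -> set:
        """Return the required parameters absent from a kernel command line."""
        return REQUIRED_VMD_PARAMS - set(cmdline.split())

    # Example usage on a live system:
    #   missing_vmd_params(open("/proc/cmdline").read())
    ```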

    3.2 VMD Kernel Module Patch (Binary)

    The vmd.ko module needs a 3-byte binary patch in both /live/filesystem.squashfs and /live/initrd.img. The patch is in the vmd_probe() function where the bus offset is read from PCI config register 0x44-0x45:

    Location: vmd_probe()  where bus offset value is read from PCI config register 0x45 bits [1:0]
    
    Original code (3 bytes):
      83 e0 03          and  $0x3,%eax        ; mask bus offset bits [1:0]
      (followed by)
      66 83 f8 01       cmp  $0x1,%ax         ; value 1 -> bus_start=0x80
      0f 84 xx xx       je   handler_1
      66 83 f8 02       cmp  $0x2,%ax         ; value 2 -> bus_start=0xE0
      0f 85 xx xx       jne  error_handler    ; value 0,3 -> "Unknown Bus Offset Setting"
    
    Patched code (replace first 3 bytes only):
      6a 02 58          push $2; pop %rax     ; force eax=2 regardless of hardware value
      (rest unchanged)
      66 83 f8 01       cmp  $0x1,%ax         ; 2!=1, not taken
      ...
      66 83 f8 02       cmp  $0x2,%ax         ; 2==2, falls through to bus_start=0xE0
    
    Search pattern (hex): 83e0036683f801
    Replace first 3 bytes: 6a0258
    Result pattern (hex):  6a02586683f801
    

    This forces bus offset 2 (bus_start=0xE0), which is correct for Dell hardware where the NVMe lives at PCI bus 0xE2.

    Why this is safe: Bus offset value 3 is not handled by the kernel at all (it falls through to an error). We map it to value 2 (bus_start=0xE0), which matches the observed PCI topology on all Dell VMD systems we've tested. The proper upstream fix would be to add a case 3: in the switch statement in drivers/pci/controller/vmd.c.
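    The patch can be applied with any hex editor; for repeatability, here is a minimal Python sketch of the search-and-replace (our own tooling, not shipped with Clonezilla). Note the on-disk vmd.ko.zst must be decompressed first and recompressed afterwards. The sketch refuses to patch if the pattern is missing or ambiguous:

    ```python
    # Apply the 3-byte vmd.ko patch from Section 3.2: find the unique
    # "and $0x3,%eax / cmp $0x1,%ax" sequence and overwrite its first
    # three bytes with "push $2; pop %rax". Illustrative tooling only.
    SEARCH = bytes.fromhex("83e0036683f801")  # and $0x3,%eax / cmp $0x1,%ax
    PATCH = bytes.fromhex("6a0258")           # push $2; pop %rax

    def patch_bus_offset(module: bytes) -> bytes:
        idx = module.find(SEARCH)
        if idx < 0:
            raise ValueError("patch pattern not found - wrong vmd.ko?")
        if module.find(SEARCH, idx + 1) >= 0:
            raise ValueError("patch pattern not unique - refusing to guess")
        # Replace only the first 3 bytes; the cmp instructions stay intact.
        return module[:idx] + PATCH + module[idx + len(PATCH):]
    ```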

    3.3 Module Signature Stripping

    After binary patching, the module's PKCS#7 signature must be completely removed (not just invalidated). The kernel rejects modules with invalid signatures even when module.sig_enforce=0, but it accepts modules with no signature (with a taint warning).

    Module signatures use the magic string ~Module signature appended~\n at the end of the .ko file, preceded by a 12-byte structure containing the signature length. Stripping logic:

    #!/usr/bin/env python3
    # Strip the appended module signature so the kernel loads the module
    # as "unsigned" (taint only) instead of rejecting it.
    # Usage: pass the .ko path as the first argument.
    import sys, struct

    MAGIC = b"~Module signature appended~\n"

    with open(sys.argv[1], "r+b") as f:
        data = f.read()
        idx = data.rfind(MAGIC)
        if idx == -1:
            print("No signature found (already unsigned)")
            sys.exit(0)
        # A 12-byte module_signature struct precedes the magic string;
        # its last field (bytes 8-11) is the big-endian signature length.
        struct_off = idx - 12
        sig_len = struct.unpack(">I", data[struct_off + 8:struct_off + 12])[0]
        strip_total = sig_len + 12 + len(MAGIC)
        new_size = len(data) - strip_total
        f.truncate(new_size)
    
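    The trailer layout can be verified without a real module by synthesizing it: payload, signature bytes, the 12-byte struct (big-endian length in its last four bytes), then the magic string. A throwaway self-check of the same logic, assuming nothing beyond the format described above:

    ```python
    import struct

    MAGIC = b"~Module signature appended~\n"

    def strip_signature(data: bytes) -> bytes:
        """Pure-function version of the stripping logic above."""
        idx = data.rfind(MAGIC)
        if idx == -1 or idx + len(MAGIC) != len(data):
            return data  # no trailing signature: already unsigned
        # Signature length is the last 4 bytes of the 12-byte struct.
        sig_len = struct.unpack(">I", data[idx - 4:idx])[0]
        return data[:idx - 12 - sig_len]

    # Synthesize a "signed" module and round-trip it.
    payload = b"\x7fELF" + b"\x00" * 64      # stand-in for the real .ko
    signature = b"\xaa" * 37                 # stand-in PKCS#7 blob
    trailer = b"\x00" * 8 + struct.pack(">I", len(signature)) + MAGIC
    assert strip_signature(payload + signature + trailer) == payload
    ```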

    3.4 VMD Rebind + RAID Assembly Script (fix-vmd)

    This is the script that runs via ocs_prerun at boot. It handles VMD rebinding (issue #5) and IMSM RAID assembly (issue #6). This is the exact script we deploy to /usr/local/bin/fix-vmd inside the squashfs:

    #!/bin/sh
    echo "=== Intel VMD + IMSM RAID Fix ==="
    
    # Step 1: Reload VMD to force fresh PCI domain binding
    # During live boot transition, VMD unbinds. This rebind cycle
    # forces the patched vmd.ko to re-probe and create PCI domain 10000.
    echo "Reloading VMD module..."
    rmmod vmd 2>/dev/null
    rmmod nvme 2>/dev/null
    sleep 1
    modprobe vmd 2>&1
    sleep 3
    
    # Step 2: Wait for NVMe devices to appear under the VMD domain
    # After VMD creates domain 10000, the NVMe controller at 10000:e2:00.0
    # needs time to enumerate.
    echo "Waiting for NVMe..."
    for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
      if ls /dev/nvme[0-9]*n[0-9]* >/dev/null 2>&1; then
        echo "  NVMe found"
        break
      fi
      sleep 2
    done
    
    # Let udev process all the new device events
    udevadm trigger 2>/dev/null
    udevadm settle --timeout=30 2>/dev/null
    
    # Step 3: Assemble IMSM RAID
    # The NVMe drives have Intel IMSM metadata at the end of the disk.
    # mdadm --incremental reads this metadata and assembles:
    #   md127 = IMSM container (metadata holder, 0 bytes)
    #   md126 = RAID1 volume (the actual usable disk, 476.9 GB)
    modprobe md_mod 2>/dev/null
    modprobe raid1 2>/dev/null
    modprobe raid0 2>/dev/null
    
    echo "Assembling IMSM RAID..."
    for dev in /dev/nvme[0-9]*n[0-9]*; do
      [ -b "$dev" ] && mdadm --incremental "$dev" 2>/dev/null
    done
    sleep 2
    
    # Kick any containers that didn't auto-start their member arrays
    for md in /dev/md[0-9]* /dev/md/*; do
      if [ -b "$md" ]; then
        SIZE=$(blockdev --getsize64 "$md" 2>/dev/null || echo 0)
        if [ "$SIZE" = "0" ]; then
          mdadm --incremental --run "$md" 2>/dev/null
        fi
      fi
    done
    sleep 2
    
    # Final udev settle to create all device nodes
    udevadm trigger 2>/dev/null
    udevadm settle --timeout=15 2>/dev/null
    
    echo "=== Result ==="
    lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL 2>/dev/null
    echo ""
    cat /proc/mdstat 2>/dev/null
    echo "=== Done ==="
    

    3.5 GRUB Menu Entry

    The complete GRUB entry that ties it all together. The key addition is ocs_prerun="/usr/local/bin/fix-vmd" which ensures VMD rebinding and RAID assembly happen before Clonezilla starts scanning disks:

    menuentry "Clonezilla + Intel VMD RAID" --id fix-vmd-clone {
      search --set -f /live/vmlinuz
      $linux_cmd /live/vmlinuz boot=live union=overlay username=user config \
        quiet loglevel=3 noswap edd=on enforcing=0 noeject locales= \
        keyboard-layouts= \
        ocs_prerun="/usr/local/bin/fix-vmd" \
        ocs_live_run="ocs-live-general" \
        ocs_live_extra_param="" ocs_live_batch="no" vga=788 net.ifnames=0 \
        splash i915.blacklist=yes radeonhd.blacklist=yes \
        nouveau.blacklist=yes vmwgfx.enable_fbdev=1 \
        lockdown=none module.sig_enforce=0 modprobe.blacklist=ahci iommu=pt
      $initrd_cmd /live/initrd.img
    }
    

    4. The ocs-functions Patch (is_partition Fix)

    This is the most critical code change needed in Clonezilla itself. Without it, md126 is hidden from Clonezilla's disk selection UI even after successful assembly.

    4.1 The Problem

    In /usr/share/drbl/sbin/ocs-functions, the is_partition() function classifies devices as either "disk" or "partition". For md devices, it uses heuristics based on /proc/partitions — if the md device has child partitions (md126p1, md126p2, etc.), it's treated as a disk. Otherwise it's treated as a partition and hidden from the disk selection UI.

    This breaks in three scenarios with VMD RAID:

    • md126 with partitions (md126p1-p5) during save
      Current: shown as disk (correct). Expected: shown as disk.
    • md126 empty — fresh RAID rebuild, no partitions yet (target for restore)
      Current: hidden (no child partitions found, classified as "partition"). Expected: shown as disk.
    • md126 during restore — image was originally saved from nvme0n1
      Current: hidden (image metadata has no md126 entries to reference). Expected: shown as disk.
    • md127 (IMSM container, 0 bytes)
      Current: hidden (correct). Expected: hidden.

    4.2 Proposed Fix

    In the is_partition() function, for md devices, add a size-based heuristic: any md device larger than 1 GB should be treated as a "disk" even if it has no child partitions. The IMSM container (md127) is always 0 bytes, so this cleanly separates real RAID volumes from metadata containers.

    Additionally, is_partition() should check the live system's /proc/partitions first (not just the image metadata), since during restore the target md126 exists on the running system but not in the image.

    # Shell sketch of the md device heuristic for is_partition()
    # (ocs-functions is a shell script; names here are illustrative).
    # Returns 0 ("is a partition") only for small/zero-size md devices.
    is_md_partition() {
      dev="$1"   # e.g. md126 (no /dev/ prefix)
      # Child partitions in the live system's /proc/partitions => disk
      if grep -Eq "[[:space:]]${dev}p[0-9]+\$" /proc/partitions; then
        return 1   # has partitions: definitely a disk
      fi
      # No children found: fall back to size. The IMSM container (md127)
      # is always 0 bytes, so size cleanly separates volumes from containers.
      size="$(blockdev --getsize64 "/dev/${dev}" 2>/dev/null || echo 0)"
      if [ "$size" -gt 1073741824 ]; then
        return 1   # > 1 GB: RAID volume, treat as disk
      fi
      return 0     # small/zero: IMSM container or an actual partition
    }
    

    5. Recommended Path for Upstream Integration

    5.1 Short-term: GRUB Menu Option + ocs_prerun Hook

    Add a "Clonezilla live (Intel VMD RAID)" GRUB menu entry with the required kernel parameters and an ocs_prerun script. This is non-invasive, opt-in, and immediately solves the problem for field deployments. The fix-vmd script (Section 3.4) and kernel parameters (Section 3.1) are all that's needed.

    5.2 Medium-term: Auto-detection + ocs-functions Fix

    • Detect VMD hardware at boot (lspci -d 8086::0104 matches Intel devices with RAID class 0104)
    • If VMD detected, automatically blacklist AHCI for VMD devices via driver_override
    • Run VMD rebind + RAID assembly if NVMe devices are missing
    • Fix is_partition() (Section 4) so md126 always appears in disk selection
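    The detection step could also read sysfs directly instead of parsing lspci output: PCI class 0x0104xx (RAID controller) plus vendor 0x8086 identifies VMD candidates. A sketch under the assumption of a standard sysfs layout; the function name is ours:

    ```python
    from pathlib import Path

    def find_intel_raid_class(pci_root="/sys/bus/pci/devices"):
        """List PCI addresses whose vendor is Intel and class is 0x0104xx."""
        hits = []
        for dev in sorted(Path(pci_root).iterdir()):
            try:
                vendor = (dev / "vendor").read_text().strip()
                pci_class = (dev / "class").read_text().strip()
            except OSError:
                continue  # device without readable attributes
            if vendor == "0x8086" and pci_class.startswith("0x0104"):
                hits.append(dev.name)
        return hits
    ```

    On an affected Dell system this should report the VMD controller at 00:0e.0.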

    5.3 Long-term: Upstream Kernel Fix

    Submit a patch to the Linux kernel VMD driver adding bus offset value 3 support. The fix is a one-line addition in drivers/pci/controller/vmd.c — add case 3: mapping to bus_start = 0xE0 in the switch statement in vmd_probe(). This would eliminate the need for binary patching entirely.


    6. Testing Results

    • VMD module loads (unsigned, signature stripped): Pass — kernel taint warning only
    • NVMe detection under VMD domain 10000: Pass — nvme0n1 with 5 partitions detected
    • IMSM container assembly (md127): Pass — container with 1/2 members (degraded mirror)
    • RAID1 volume assembly (md126): Pass — 476.9 GiB, 5 partitions (ESP, MSR, OS, Recovery, Dell)
    • Partition detection (partprobe md126): Pass — md126p1-p5 created
    • Clonezilla disk visibility (with is_partition fix): Pass — md126 appears in disk selection
    • savedisk md126: Pass — full image captured
    • restoredisk to md126: Pass — image restored, Windows boots normally
    • Intel RST RAID rebuild after restore: Pass — mirror auto-resyncs in background

    6.1 Hardware Tested

    • System: Dell OptiPlex/Precision (2025/2026 model)
    • VMD Controller: Intel [8086:ad0b] — Volume Management Device NVMe RAID Controller
    • NVMe: [1c5c:1f69] — 476.9 GiB SSD (x2 in RAID1)
    • RAID Config: Intel RST IMSM RAID1 (mirror), 2 disks, 1 volume
    • Clonezilla: 20260301-questing-amd64 (Kernel 6.17.0-14-generic)

    RAID appears degraded (1/2 mirrors): This is expected and normal. VMD only exposes one NVMe drive to Linux at a time. Since this is RAID1 (mirror), one copy contains all the data. Save and restore operations work correctly on the degraded array. After restoring and booting Windows, Intel RST automatically rebuilds the mirror to the second drive.


    7. Architecture Diagram

    +------------------------------------------------------+
    |              Dell System with Intel RST               |
    |                                                       |
    |  +----------+    +--------------------------------+   |
    |  |   CPU    |--->|  Intel VMD [8086:ad0b]         |   |
    |  +----------+    |  PCI 00:0e.0                   |   |
    |                  |  Creates PCI Domain 10000       |   |
    |                  |                                 |   |
    |                  |  +---------------------------+  |   |
    |                  |  | NVMe SSD #1 (visible)     |  |   |
    |                  |  | 10000:e2:00.0             |  |   |
    |                  |  +---------------------------+  |   |
    |                  |                                 |   |
    |                  |  (NVMe SSD #2  accessed by     |   |
    |                  |   BIOS/RST only, not by Linux)  |   |
    |                  +---------------------------------+   |
    |                                                       |
    |  What Clonezilla sees after fix-vmd runs:             |
    |                                                       |
    |  md127        0B     raid1  IMSM container (ignore)   |
    |  md126      476.9G   raid1  RAID volume (USE THIS)    |
    |  +-md126p1   484M    vfat   ESP                       |
    |  +-md126p2   128M           Microsoft Reserved        |
    |  +-md126p3   474G    ntfs   OS (Windows)              |
    |  +-md126p4   990M    ntfs   Recovery                  |
    |  +-md126p5   1.4G    ntfs   Dell Support              |
    +-------------------------------------------------------+
    

    CRITICAL: Always save/restore md126, NEVER nvme0n1 directly.
    Restoring to the raw NVMe drive (nvme0n1) destroys the IMSM RAID metadata at the end of the disk, breaking the RAID array. Intel RST will then see the drives as separated, requiring a RAID rebuild in BIOS before Clonezilla can see md126 again.


    8. Files Modified on USB (Our Current Workaround)

    • /boot/grub/grub.cfg: added GRUB menu entry with VMD kernel parameters + ocs_prerun="/usr/local/bin/fix-vmd"
    • vmd.ko.zst (inside /live/filesystem.squashfs): binary patch (3 bytes) + signature strip
    • /usr/local/bin/fix-vmd (inside /live/filesystem.squashfs): VMD rebind + IMSM assembly script (Section 3.4)
    • /usr/local/bin/diag-vmd (inside /live/filesystem.squashfs): diagnostic script (saves logs to USB)
    • ocs-functions (inside /live/filesystem.squashfs): is_partition() fix for md126 visibility (Section 4)
    • vmd.ko.zst (inside /live/initrd.img): binary patch (3 bytes) + signature strip

    9. Automated Patch Script

    We have developed a complete automated patch script (fix-clonezilla-vmd.sh) that applies all of the above modifications to a standard Clonezilla USB. It handles squashfs extraction/repacking, initrd patching, vmd.ko binary patching, signature stripping, script injection, and GRUB configuration. The script runs on Linux or WSL and requires approximately 20 minutes.

    We are happy to provide the full script, additional test logs, or collaborate on implementing this feature natively in Clonezilla. The script and all diagnostic output referenced in this document are available upon request.


     

    Last edit: Entri3 5 days ago
  • Entri3 - 6 days ago

    While this allowed imaging to the RAID (it shows up as md126 in Clonezilla), Windows booted normally and the RAID was rebuilt successfully.

     

    Last edit: Entri3 5 days ago
