Building on the foundation laid in part 1, we will now complete the remaining requirements by adding Direct Memory Access (DMA) operations and interrupt support.

QEMU PCIe Device

Register Layout

To add DMA support, we introduce a DMA descriptor: a set of registers that tells the device where to read data from, where to write it to, and how many bytes to move. The descriptor registers are laid out as follows:

Offset (Hex) | Field Description | Bits | Type | Default Value
00 | Source Address | [63:32] | RW | 0x0
04 | Source Address | [31:0] | RW | 0x0
08 | Destination Address | [63:32] | RW | 0x0
0C | Destination Address | [31:0] | RW | 0x0
10 | Transfer Size (bytes) | [31:0] | RW | 0x0
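To make the layout concrete, here is a small model of how the device side might latch these registers. The enum names, the desc_regs struct and the helper functions are hypothetical and not taken from the project source; only the offsets and field widths come from the table above. Each 64-bit address is split across two 32-bit registers, with the high word at the lower offset.

```c
#include <stdint.h>

/* Hypothetical names for the descriptor offsets listed in the table. */
enum desc_offset {
    DESC_SRC_HI  = 0x00, /* Source Address [63:32] */
    DESC_SRC_LO  = 0x04, /* Source Address [31:0] */
    DESC_DST_HI  = 0x08, /* Destination Address [63:32] */
    DESC_DST_LO  = 0x0C, /* Destination Address [31:0] */
    DESC_TX_SIZE = 0x10, /* Transfer Size (bytes) */
};

/* Latched register values; zero-initialise to match the 0x0 defaults. */
typedef struct {
    uint32_t src_hi, src_lo;
    uint32_t dst_hi, dst_lo;
    uint32_t tx_size;
} desc_regs;

/* Store a 32-bit MMIO write in the matching descriptor register.
 * Returns 0 on success, -1 if the offset is not part of the descriptor. */
static int desc_write(desc_regs *d, uint32_t offset, uint32_t value)
{
    switch (offset) {
    case DESC_SRC_HI:  d->src_hi  = value; return 0;
    case DESC_SRC_LO:  d->src_lo  = value; return 0;
    case DESC_DST_HI:  d->dst_hi  = value; return 0;
    case DESC_DST_LO:  d->dst_lo  = value; return 0;
    case DESC_TX_SIZE: d->tx_size = value; return 0;
    default:           return -1;
    }
}

/* Rebuild a 64-bit address from its [63:32] and [31:0] halves. */
static uint64_t desc_join64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

static uint64_t desc_src(const desc_regs *d) { return desc_join64(d->src_hi, d->src_lo); }
static uint64_t desc_dst(const desc_regs *d) { return desc_join64(d->dst_hi, d->dst_lo); }
```

For example, writing 0x1 to offset 0x00 and 0x80000000 to offset 0x04 yields a source address of 0x180000000.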

QEMU File Changes

With the DMA descriptor layout defined, initiating a DMA transaction involves two steps:

  1. The user writes the source address, destination address, and transfer size to the descriptor registers.
  2. The start bit in the Control register is set to trigger the operation.

The mmio_write callback must be updated to watch for writes to the Control register, specifically to the start bit. When a write sets the start bit to 1 and leaves the reset bit at 0, the device reads the DMA descriptor and performs the transfer with QEMU's pci_dma_read (Host → Device, reading from guest memory) or pci_dma_write (Device → Host, writing to guest memory), depending on the requested direction. Once the transfer is done, the device raises an interrupt by calling msix_notify.
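The decision logic can be modelled outside QEMU. The sketch below is not QEMU code: dma_read_from_host, dma_write_to_host and raise_msix are hypothetical stand-ins for pci_dma_read, pci_dma_write and msix_notify, guest memory is a plain array, and the Control register offset and bit positions (start = bit 0, reset = bit 1, type = bit 2) are assumptions. It also assumes the device-side address is an offset into device memory. It only shows the control flow: act on a Control write with start set and reset clear, pick the copy direction, bounds-check, then signal completion.

```c
#include <stdint.h>
#include <string.h>

#define HOST_MEM_SIZE 0x10000u /* hypothetical size of simulated guest RAM */
#define DEV_MEM_SIZE  0x10000u /* hypothetical size of device memory */

#define REG_CTRL   0x20u       /* hypothetical Control register offset */
#define CTRL_START (1u << 0)   /* assumed bit positions */
#define CTRL_RESET (1u << 1)
#define CTRL_TYPE  (1u << 2)   /* 0: Host -> Device, 1: Device -> Host */

typedef struct {
    uint8_t  host[HOST_MEM_SIZE]; /* stands in for guest RAM */
    uint8_t  mem[DEV_MEM_SIZE];   /* device-local memory */
    uint64_t src, dst;            /* latched descriptor addresses */
    uint32_t size;                /* latched transfer size */
    unsigned irq_count;           /* completion "interrupts" raised */
} sim_dev;

/* Hypothetical stand-ins for pci_dma_read, pci_dma_write and msix_notify. */
static void dma_read_from_host(sim_dev *d, uint64_t addr, void *buf, uint32_t len)
{
    memcpy(buf, &d->host[addr], len);
}

static void dma_write_to_host(sim_dev *d, uint64_t addr, const void *buf, uint32_t len)
{
    memcpy(&d->host[addr], buf, len);
}

static void raise_msix(sim_dev *d)
{
    d->irq_count++;
}

/* 1 if [addr, addr + len) lies inside a region of `limit` bytes. */
static int in_range(uint64_t addr, uint32_t len, uint64_t limit)
{
    return addr <= limit && len <= limit - addr;
}

/* Sketch of the Control-register branch of an mmio_write callback. */
static void sim_mmio_write(sim_dev *d, uint32_t offset, uint32_t value)
{
    if (offset != REG_CTRL)
        return;                 /* descriptor register writes omitted */
    if (!(value & CTRL_START) || (value & CTRL_RESET))
        return;                 /* only start = 1, reset = 0 starts DMA */

    if (value & CTRL_TYPE) {    /* Device -> Host */
        if (!in_range(d->src, d->size, DEV_MEM_SIZE) ||
            !in_range(d->dst, d->size, HOST_MEM_SIZE))
            return;
        dma_write_to_host(d, d->dst, &d->mem[d->src], d->size);
    } else {                    /* Host -> Device */
        if (!in_range(d->src, d->size, HOST_MEM_SIZE) ||
            !in_range(d->dst, d->size, DEV_MEM_SIZE))
            return;
        dma_read_from_host(d, d->src, &d->mem[d->dst], d->size);
    }
    raise_msix(d);              /* signal completion */
}
```

Silently ignoring an out-of-range request keeps the sketch short; a real device would more likely report an error in a status register so the driver can tell that the transfer failed.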

In the next section, we will develop a kernel module and a user-space application to test the DMA implementation.

Testing the Device

To simplify interaction with the PCIe device, we will develop a kernel module that hides the low-level register details from user space.

NOTE

You could use the combination of UIO and u-dma-buf kernel modules to avoid writing your own kernel module.

Kernel Module

See the source code for implementation details.

The kernel module will handle the following:

  1. Device initialization and cleanup through the probe and remove callbacks.
  2. A character device interface for user-space applications.
  3. Allocation of a DMA buffer shared with the user-space application.
  4. Interrupt handling.

Refer to the Linux Kernel Module Programming Guide for more details on how to create a kernel module.

Device Initialization

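The pci_device_id table lists the vendor and device IDs that the driver supports, and the MODULE_DEVICE_TABLE macro exports that table so the module can be loaded automatically when a matching device is found.

For this device, the table and the driver structure are defined as: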

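#define PCIE_TEST_DEVICE_VID 0x1234  /* vendor ID reported by the QEMU device */
#define PCIE_TEST_DEVICE_DID 0xABBA  /* device ID reported by the QEMU device */

/* Devices this driver binds to; the empty entry terminates the table */
static const struct pci_device_id pcie_id_table[] = {
    { PCI_DEVICE(PCIE_TEST_DEVICE_VID, PCIE_TEST_DEVICE_DID) },
    {},
};
/* Export the table so the module can be autoloaded for matching devices */
MODULE_DEVICE_TABLE(pci, pcie_id_table);

/* The driver name, probe and remove callbacks are defined elsewhere in the module */
static struct pci_driver pcie_module_driver = {
    .name = PCIE_TEST_KERNEL_DRIVER_NAME,
    .id_table = pcie_id_table,
    .probe = pcie_module_probe,
    .remove = pcie_module_remove,
};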

When the kernel module is loaded, it registers the pci_driver structure. As part of this registration, the kernel invokes the driver’s probe method if a matching PCIe device is present. Following the recommendations in How To Write Linux PCI Drivers, the probe implementation will perform the following:

  1. Enable the PCIe device
  2. Request MMIO/IOP resources
  3. Allocate the shared DMA buffer
  4. Register the interrupt handler
  5. Enable bus mastering

In addition to this, a character device is created so that the user-space application can interface with the device.

Interrupt Handling

During device initialization, an interrupt handler is registered to respond to interrupts generated by the PCIe device. When a DMA transfer completes, the device generates an interrupt, and the following handler is invoked:

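static irqreturn_t intHandlerHard(int irq, void *pdev)
{
    /* Cookie registered with the IRQ handler: the driver's per-device state */
    pcie_device_t *pcie_device = (pcie_device_t *)pdev;

    /* Read which interrupt sources are pending */
    const uint32_t value = readl(pcie_device->bar0_mmio + PCIE_TEST_DEVICE_MMIO_INT_STATUS_OFFSET);
    if (value) {
        pcie_device->irq_count++;

        /* Acknowledge exactly the bits that were read */
        writel(value, pcie_device->bar0_mmio + PCIE_TEST_DEVICE_MMIO_INT_STATUS_OFFSET);

        /* Flag the event and wake any process waiting on wait_queue */
        atomic_set(&irq_event, 1);
        wake_up_interruptible(&wait_queue);
    }
    return IRQ_HANDLED;
}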

This handler does the following:

  1. Read the interrupt status register at MMIO_INT_STATUS_OFFSET.
  2. If any status bit is set, increment the IRQ counter.
  3. Acknowledge the interrupt by writing the status value back to MMIO_INT_STATUS_OFFSET.
  4. Set irq_event and wake any process sleeping on wait_queue (for example, a blocked read).
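Reading the status and writing the same value back matches the common write-1-to-clear (W1C) convention, where writing 1 to a status bit clears it and writing 0 leaves it unchanged. Assuming MMIO_INT_STATUS behaves that way (its semantics are defined on the QEMU side and are an assumption here), the user-space model below shows the effect. The names sim_int_status, device_raise, status_write and handler_model are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated interrupt status register with write-1-to-clear semantics. */
typedef struct {
    uint32_t status;    /* pending interrupt bits */
    unsigned irq_count; /* mirrors pcie_device->irq_count */
    bool     event;     /* mirrors the irq_event flag */
} sim_int_status;

/* Device side: latch a new interrupt cause. */
static void device_raise(sim_int_status *s, uint32_t bits)
{
    s->status |= bits;
}

/* Driver side: writing 1 to a bit clears it; writing 0 has no effect. */
static void status_write(sim_int_status *s, uint32_t value)
{
    s->status &= ~value;
}

/* Model of the handler: read, count, acknowledge, signal.
 * Returns true if any interrupt bit was pending. */
static bool handler_model(sim_int_status *s)
{
    const uint32_t value = s->status; /* step 1: read status */
    if (!value)
        return false;
    s->irq_count++;                   /* step 2: count it */
    status_write(s, value);           /* step 3: ack exactly what was seen */
    s->event = true;                  /* step 4: wake waiters */
    return true;
}
```

Because only the bits that were read are written back, an interrupt cause that is latched between the read and the write keeps its bit set and is serviced on the next invocation instead of being lost.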

Character Device

A character device will be created under /dev with the name pcietest0. The trailing 0 is assigned dynamically; since there is only one device, it is typically 0.

This character device implements the following operations:

static const struct file_operations g_device_file_ops = {
    .owner = THIS_MODULE,
    .open = pcie_open,
    .release = pcie_release,
    .mmap = pcie_mmap,
    .unlocked_ioctl = pcie_ioctl,
    .read = pcie_read,
    .poll = pcie_poll,
};

The key methods used by the user-space application perform the following:

Method | Description
mmap | Maps the kernel-allocated DMA buffer into the application's address space.
unlocked_ioctl | Interprets commands and performs operations such as querying the device version and status, and starting a DMA transfer.
read | Blocks until an interrupt is received, then returns the interrupt count.

Supported IOCTL Commands

The unlocked_ioctl handler supports the following IOCTL commands, which the user-space application uses to interact with the device:

#define PCIE_TEST_IOCTL_DEVICE_VERSION _IOR(..., 20, uint32_t)
#define PCIE_TEST_IOCTL_GET_STATUS     _IOR(..., 21, uint32_t)
#define PCIE_TEST_IOCTL_GET_INT_STATUS _IOR(..., 22, uint32_t)
#define PCIE_TEST_IOCTL_SET_INT_STATUS _IOW(..., 23, uint32_t)
#define PCIE_TEST_IOCTL_SET_INT_MASK   _IOW(..., 24, uint32_t)
#define PCIE_TEST_IOCTL_TEST_INT       _IOW(..., 25, uint32_t)
#define PCIE_TEST_IOCTL_START_TRANSFER _IOW(..., 29, dma_ctrl_t)

Each command performs a specific operation on the PCIe device by reading or writing its MMIO registers, so the user-space application does not need to know the device's register layout.
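The _IOR and _IOW macros pack a direction, a type (magic) character, a command number and the argument size into one request code, which is why the module and the application must share the same header. The magic value is elided in the listing above, so the sketch below uses 'P' purely as a hypothetical placeholder, copies the dma_ctrl_t layout described in this post, and decodes the request codes with the standard _IOC_DIR, _IOC_TYPE, _IOC_NR and _IOC_SIZE macros from <linux/ioctl.h>. The EXAMPLE_* names and decode_ioctl are hypothetical.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Same layout as dma_ctrl_t in the post. */
typedef struct dma_ctrl {
    uint32_t op_code;
    uint32_t bytes;
    uint64_t src;
    uint64_t dst;
} dma_ctrl_t;

/* 'P' is a hypothetical magic number; the real value is not shown. */
#define EXAMPLE_IOCTL_MAGIC 'P'
#define EXAMPLE_IOCTL_DEVICE_VERSION _IOR(EXAMPLE_IOCTL_MAGIC, 20, uint32_t)
#define EXAMPLE_IOCTL_START_TRANSFER _IOW(EXAMPLE_IOCTL_MAGIC, 29, dma_ctrl_t)

/* Fields the kernel can recover from a request code. */
typedef struct {
    unsigned dir, type, nr, size;
} ioctl_fields;

static ioctl_fields decode_ioctl(unsigned int cmd)
{
    ioctl_fields f = {
        .dir  = _IOC_DIR(cmd),  /* _IOC_READ and/or _IOC_WRITE */
        .type = _IOC_TYPE(cmd), /* magic character */
        .nr   = _IOC_NR(cmd),   /* command number */
        .size = _IOC_SIZE(cmd), /* sizeof the argument type */
    };
    return f;
}
```

The direction is named from user space's point of view: _IOR means the application reads data that the kernel writes, and _IOW means the application passes data in, as with dma_ctrl_t for the start-transfer command.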

Since this part is about DMA transfers, we will look at the PCIE_TEST_IOCTL_START_TRANSFER command in detail.

To tell the kernel module about the DMA transfer, the user-space application will pass a dma_ctrl_t structure:

typedef struct dma_ctrl {
    uint32_t op_code;    // 0: Host -> Device, 1: Device -> Host
    uint32_t bytes;      // Number of bytes to transfer
    uint64_t src;        // Starting source address
    uint64_t dst;        // Starting destination address
} dma_ctrl_t;

This structure specifies the operation type (Host → Device or Device → Host), number of bytes to transfer, and the source/destination addresses.

The user-space application uses the ioctl interface to pass the PCIE_TEST_IOCTL_START_TRANSFER command and the dma_ctrl_t structure.

ioctl(fd, PCIE_TEST_IOCTL_START_TRANSFER, &dma_ctrl);  /* pass a pointer to the dma_ctrl_t */

When the kernel module receives this command, it reads the user-provided structure and writes the DMA descriptor registers on the device:

/* upper_32_bits()/lower_32_bits() split each 64-bit address into its HI/LO words */
writel(upper_32_bits(final_src_addr), dev->bar0_mmio + PCIE_TEST_DEVICE_DESC_SRC_ADDR_HI);
writel(lower_32_bits(final_src_addr), dev->bar0_mmio + PCIE_TEST_DEVICE_DESC_SRC_ADDR_LOW);
writel(upper_32_bits(final_dst_addr), dev->bar0_mmio + PCIE_TEST_DEVICE_DESC_DST_ADDR_HI);
writel(lower_32_bits(final_dst_addr), dev->bar0_mmio + PCIE_TEST_DEVICE_DESC_DST_ADDR_LOW);
/* Transfer size in bytes */
writel(value.bytes, dev->bar0_mmio + PCIE_TEST_DEVICE_DESC_TX_SIZE);
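This part does not show how final_src_addr and final_dst_addr are derived from the user's src and dst. One plausible approach, assumed here and consistent with the offsets printed by the test log later on, treats src and dst as offsets: the buffer-side offset is added to the bus address of the shared DMA buffer (a hypothetical dma_handle parameter), the device-side offset is passed through, and both ranges are checked first. The function, types and the 64 KiB sizes are hypothetical; the HI/LO split mirrors the writel calls above.

```c
#include <stdint.h>

#define DMA_BUF_SIZE 0x10000u /* hypothetical: 64 KiB shared DMA buffer */
#define DEV_MEM_SIZE 0x10000u /* hypothetical: 64 KiB device memory */

typedef struct dma_ctrl {
    uint32_t op_code; /* 0: Host -> Device, 1: Device -> Host */
    uint32_t bytes;
    uint64_t src;
    uint64_t dst;
} dma_ctrl_t;

/* Values the driver would write to the five descriptor registers. */
typedef struct {
    uint32_t src_hi, src_lo, dst_hi, dst_lo, size;
} desc_words;

/* 1 if [off, off + len) is non-empty and fits in `limit` bytes. */
static int range_ok(uint64_t off, uint32_t len, uint64_t limit)
{
    return len != 0 && off <= limit && len <= limit - off;
}

/* Translate a user request into descriptor words. dma_handle is the bus
 * address of the shared buffer. Returns 0 on success, -1 if invalid. */
static int build_descriptor(const dma_ctrl_t *req, uint64_t dma_handle,
                            desc_words *out)
{
    uint64_t src, dst;

    if (req->op_code == 0) {          /* buffer -> device */
        if (!range_ok(req->src, req->bytes, DMA_BUF_SIZE) ||
            !range_ok(req->dst, req->bytes, DEV_MEM_SIZE))
            return -1;
        src = dma_handle + req->src;
        dst = req->dst;
    } else if (req->op_code == 1) {   /* device -> buffer */
        if (!range_ok(req->src, req->bytes, DEV_MEM_SIZE) ||
            !range_ok(req->dst, req->bytes, DMA_BUF_SIZE))
            return -1;
        src = req->src;
        dst = dma_handle + req->dst;
    } else {
        return -1;                    /* unknown op_code */
    }

    /* Same split as upper_32_bits()/lower_32_bits() in the kernel. */
    out->src_hi = (uint32_t)(src >> 32);
    out->src_lo = (uint32_t)src;
    out->dst_hi = (uint32_t)(dst >> 32);
    out->dst_lo = (uint32_t)dst;
    out->size   = req->bytes;
    return 0;
}
```

Validating in the driver keeps a buggy or malicious application from pointing the device outside the shared buffer; rejecting zero-length requests is a design choice of this sketch.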

Then it will write to the Control register to start the DMA transfer:

DeviceCtrl_t ctrl = { 0 };
ctrl.bits.start = 1;
ctrl.bits.type = value.op_code;
writel(ctrl.all, dev->bar0_mmio + PCIE_TEST_DEVICE_MMIO_CTRL_OFFSET);
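DeviceCtrl_t itself is not shown in this part. A common way to declare such a register type is a union of a bitfield struct and a raw 32-bit word, sketched below with assumed bit positions (start, reset and type in the three lowest bits) that are not taken from the project. Bitfield ordering is implementation-defined in C; GCC and Clang on little-endian targets such as x86-64 place the first field at bit 0, so a mask-based helper, ctrl_word (hypothetical), is included as a portable cross-check.

```c
#include <stdint.h>

/* Hypothetical layout of the Control register. */
typedef union {
    struct {
        uint32_t start    : 1;  /* bit 0: start the DMA operation */
        uint32_t reset    : 1;  /* bit 1: reset the device */
        uint32_t type     : 1;  /* bit 2: 0 = Host -> Device, 1 = Device -> Host */
        uint32_t reserved : 29;
    } bits;
    uint32_t all;               /* raw value written with writel() */
} DeviceCtrl_t;

/* Portable equivalent that does not depend on bitfield ordering. */
#define CTRL_START_BIT (1u << 0)
#define CTRL_RESET_BIT (1u << 1)
#define CTRL_TYPE_BIT  (1u << 2)

static uint32_t ctrl_word(int start, int reset, int type)
{
    return (start ? CTRL_START_BIT : 0) |
           (reset ? CTRL_RESET_BIT : 0) |
           (type  ? CTRL_TYPE_BIT  : 0);
}
```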

Loading the Kernel Module

With the kernel module complete, it is time to compile it and load it into Linux.

To load the kernel module using insmod:

root@debian:/home/andre# insmod pcie-test-module.ko
[  100.863746] pcie_test_module: loading out-of-tree module taints kernel.
[  100.867753] pcie_test_module: module verification failed: signature and/or required key missing - tainting kernel
[  100.881355] ACPI: \_SB_.GSIE: Enabled at IRQ 20

NOTE

You may encounter the following warning: module verification failed: signature and/or required key missing - tainting kernel. This is fine for our testing purposes. To resolve this warning, you can follow the guide on how to sign the kernel module.

The next sections will show how to check that the kernel module is enabling the device correctly and how to do some basic DMA transfers.

Linux lspci

Similar to part 1, running the lspci utility should show details about the PCIe device.

root@debian:/home/andre# lspci -s 4 -vvv
00:04.0 RAM memory: Device 1234:abba (rev 01)
	Subsystem: Red Hat, Inc. Device 1100
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at febe6000 (32-bit, non-prefetchable) [size=4K]
	Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=64K]
	Region 3: Memory at febe7000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [4c] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+ FLReset-
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
	Capabilities: [40] MSI-X: Enable+ Count=1 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00000800
	Kernel driver in use: pcie-test-device

Compared with the output from part 1, there are three changes:

  1. BusMaster+ means bus mastering is now enabled, so the device can initiate DMA transfers to and from system memory without the CPU copying the data.
  2. Capabilities: [40] MSI-X: Enable+ means MSI-X is enabled.
  3. Kernel driver in use: pcie-test-device shows the device is managed by the kernel module.

User-space Application Test

To validate the kernel module and PCIe device functionality, we will write a user-space application that interacts with the device and gradually build it up to performing a DMA transfer. Refer to the source code for additional details.

Since we will be testing interrupt handling, we first check how many interrupts have already been handled by running the following command:

root@debian:/home/andre# cat /proc/interrupts
           CPU0
....
 32:          0   PCI-MSI 65536-edge      PCIe test device interrupt

We will compare this count with the value after the test completes.
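To avoid comparing the counts by eye, a test could read them programmatically. The helper below is a hypothetical sketch, not part of the project: given one line of /proc/interrupts, it skips the IRQ number and sums the per-CPU counters that follow (there is one column per CPU, so this VM shows only CPU0). Opening /proc/interrupts and finding the line that contains "PCIe test device interrupt" is left to the caller.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counts on one /proc/interrupts line, e.g.
 * " 32:          6   PCI-MSI 65536-edge      PCIe test device interrupt".
 * Returns -1 if the line has no "NN:" IRQ prefix. */
static long long irq_line_total(const char *line)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL)
        return -1;
    p++;                               /* skip the ':' after the IRQ id */

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                     /* reached the chip/name columns */
        char *end;
        long long v = strtoll(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\0')
            break;                     /* token such as "65536-edge" */
        total += v;
        p = end;
    }
    return total;
}
```

Calling it before and after dma-check and taking the difference gives the number of interrupts generated by the test.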

To start the test, the application opens the character device /dev/pcietest0. From here, ioctl is used to query the device’s version using the PCIE_TEST_IOCTL_DEVICE_VERSION command. If this returns the correct version number, we assume that the communication between the application and the kernel module is working.

Next, the application tests interrupt handling by writing to the device's Interrupt Trigger register, which makes the device generate an interrupt. Once this succeeds, the DMA transfer test can begin.

The test will focus on a few regions of the device’s memory. mmap is used to access the buffer allocated by the kernel module, which we populate with known data. This data is then transferred to the device using the PCIE_TEST_IOCTL_START_TRANSFER command with op_code set to 0.

After transferring data to the device, we transfer it back into the buffer at different locations. This uses the PCIE_TEST_IOCTL_START_TRANSFER command with op_code set to 1, varying the src and dst fields in the dma_ctrl_t structure. Since the initial data is known, the application can verify that the correct data was transferred back.
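The expected buffer contents can be worked out before running anything. The model below replays the five transfers from the dma-check log on plain arrays, under assumptions not taken from the test source: the "decrementing" pattern starts at 0xFF, the "incrementing" pattern starts at 0x00, and both the DMA buffer and device memory are 64 KiB. The array and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define REGION 0x10000u         /* assumed 64 KiB for buffer and device */

static uint8_t dma_buf[REGION]; /* stands in for the mmap'ed buffer */
static uint8_t dev_mem[REGION]; /* stands in for device memory */

/* op_code 0: buffer -> device, op_code 1: device -> buffer. */
static void transfer(int op_code, uint32_t bytes, uint32_t src, uint32_t dst)
{
    if (op_code == 0)
        memcpy(&dev_mem[dst], &dma_buf[src], bytes);
    else
        memcpy(&dma_buf[dst], &dev_mem[src], bytes);
}

/* Replays the sequence printed by dma-check; returns 1 if the final
 * buffer holds the expected data. */
static int run_dma_check_model(void)
{
    for (uint32_t i = 0; i < 32; i++) {
        dma_buf[0x0 + i]   = (uint8_t)(0xFF - i); /* decrementing @ 0x0 */
        dma_buf[0x120 + i] = (uint8_t)i;          /* incrementing @ 0x120 */
    }

    transfer(0, 32, 0x0, 0x0);
    transfer(0, 32, 0x120, 0x120);
    transfer(1, 32, 0x0, 0x120);   /* the two patterns swap places */
    transfer(1, 32, 0x120, 0x0);
    transfer(1, 16, 0x0, 0xfff0);  /* last 16 bytes of the buffer */

    for (uint32_t i = 0; i < 32; i++) {
        if (dma_buf[0x0 + i] != (uint8_t)i)
            return 0;
        if (dma_buf[0x120 + i] != (uint8_t)(0xFF - i))
            return 0;
    }
    for (uint32_t i = 0; i < 16; i++) {
        if (dma_buf[0xfff0 + i] != (uint8_t)(0xFF - i))
            return 0;
    }
    return 1;
}
```

Assuming each transfer raises exactly one MSI-X interrupt, these five transfers plus the forced interrupt account for six interrupts in total.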

These tests are relatively simple and focus on the happy path, but they are sufficient to demonstrate that an application can communicate with the PCIe device.

Running the dma-check test on the device should show the following.

root@debian:/home/andre# ./dma-check pcietest0
Running Kernel module tests
--- Checking Version Register ---
--- Testing Force Trigger Interrupt ---
Checking Interrupt Status First- Force Interrupt
--- Testing Device Memory ---
Writing 32 bytes (decrementing pattern) to DMA buffer @ 0x0 from userspace...
Writing 32 bytes (incrementing pattern) to DMA buffer @ 0x120 from userspace...
Transfer contents from DMA buffer to device (32 bytes @ 0x0 to 0x0)
Transfer contents from DMA buffer to device (32 bytes @ 0x120 to 0x120)
Transfer contents from device to DMA buffer (32 bytes @ 0x0 to 0x120)
Transfer contents from device to DMA buffer (32 bytes @ 0x120 to 0x0)
Transfer contents device to DMA buffer (16 bytes @ 0x0 to 0xfff0)
Checking buffer content
Kernel module tests passed ✓!

After the test runs successfully, we can check /proc/interrupts again to see how many interrupts were handled.

root@debian:/home/andre# cat /proc/interrupts
           CPU0
....
 32:          6   PCI-MSI 65536-edge      PCIe test device interrupt

This matches the six interrupts we expect:

  • Generating an interrupt using the Interrupt Trigger register once.
  • Copying data from the buffer to the device twice.
  • Copying data from the device to the buffer three times.

Summary

This concludes the guide on creating a device using QEMU and interacting with the device in Linux. All the code for the kernel module, QEMU device, and test application can be found here.

Resources