How to allocate large contiguous memory regions in Linux

Yes, I will ultimately be using this for DMA, but let's leave coherency aside for the moment. I have 64-bit BAR registers; therefore, AFAIK, all of RAM (including addresses above 4G) is available for DMA.

I am looking for about 64MB of contiguous RAM. Yes, that's a lot.

Ubuntu 16 and 18 have CONFIG_CMA=y but CONFIG_DMA_CMA is not set at kernel compile time.

I note that if both were set (at Kernel build time) I could simply call dma_alloc_coherent, however, for logistical reasons, it is undesirable to recompile the kernel.

The machines will always have at least 32GB of RAM, do not run anything RAM intensive, and the kernel module will load shortly after boot before RAM becomes significantly fragmented and, AFAIK, nothing else is using the CMA.

I have set the kernel parameter cma=1G (and have also tried 256M and 512M).

# dmesg | grep cma
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.170 root=UUID=2b25933c-e10c-4833-b5b2-92e9d3a33fec ro cma=1G
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.170 root=UUID=2b25933c-e10c-4833-b5b2-92e9d3a33fec ro cma=1G
[    0.000000] Memory: 65612056K/67073924K available (8604K kernel code, 1332K rwdata, 3972K rodata, 1484K init, 1316K bss, 1461868K reserved, 0K cma-reserved)

I have tried alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, order), no joy.

And finally the actual question: How does one get large contiguous blocks from the CMA? Everything I have found online suggests the use of dma_alloc_coherent, but I know this only works with CONFIG_CMA=y and CONFIG_DMA_CMA=y.

The module source, tim.c

#include <linux/module.h>       /* Needed by all modules */
#include <linux/kernel.h>       /* Needed for KERN_INFO */
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/gfp.h>
unsigned long big;
const int order = 15;
static int __init tim_init(void)
{
    printk(KERN_INFO "Hello Tim!\n");
    big = __get_free_pages(GFP_KERNEL | __GFP_HIGHMEM, order);
    printk(KERN_NOTICE "big = %lx\n", big);
    if (!big)
        return -EIO; // AT&T

    return 0; // success
}

static void __exit tim_exit(void)
{
    free_pages(big, order);
    printk(KERN_INFO "Tim says, Goodbye world\n");
}

module_init(tim_init);
module_exit(tim_exit);
MODULE_LICENSE("GPL");

Inserting the module yields...

# insmod tim.ko
insmod: ERROR: could not insert module tim.ko: Input/output error
# dmesg | tail -n 33

[  176.137053] Hello Tim!
[  176.137056] ------------[ cut here ]------------
[  176.137062] WARNING: CPU: 4 PID: 2829 at mm/page_alloc.c:3198 __alloc_pages_nodemask+0xd14/0xe00()
[  176.137063] Modules linked in: tim(OE+) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables configfs vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_me mei irqbypass sb_edac ioatdma edac_core shpchp serio_raw input_leds lpc_ich dca acpi_pad 8250_fintek mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear
[  176.137094]  hid_generic usbhid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e aesni_intel raid1 aes_x86_64 isci lrw libsas ahci gf128mul ptp glue_helper ablk_helper cryptd psmouse hid libahci scsi_transport_sas pps_core wmi fjes
[  176.137105] CPU: 4 PID: 2829 Comm: insmod Tainted: G           OE   4.4.170 #1
[  176.137106] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.3 11/13/2018
[  176.137108]  0000000000000286 8ba89d23429d5749 ffff88100f5cba90 ffffffff8140a061
[  176.137110]  0000000000000000 ffffffff81cd89dd ffff88100f5cbac8 ffffffff810852d2
[  176.137112]  ffffffff821da620 0000000000000000 000000000000000f 000000000000000f
[  176.137113] Call Trace:
[  176.137118]  [<ffffffff8140a061>] dump_stack+0x63/0x82
[  176.137121]  [<ffffffff810852d2>] warn_slowpath_common+0x82/0xc0
[  176.137123]  [<ffffffff8108541a>] warn_slowpath_null+0x1a/0x20
[  176.137125]  [<ffffffff811a2504>] __alloc_pages_nodemask+0xd14/0xe00
[  176.137128]  [<ffffffff810ddaef>] ? msg_print_text+0xdf/0x1a0
[  176.137132]  [<ffffffff8117bc3e>] ? irq_work_queue+0x8e/0xa0
[  176.137133]  [<ffffffff810de04f>] ? console_unlock+0x20f/0x550
[  176.137137]  [<ffffffff811edbdc>] alloc_pages_current+0x8c/0x110
[  176.137139]  [<ffffffffc0024000>] ? 0xffffffffc0024000
[  176.137141]  [<ffffffff8119ca2e>] __get_free_pages+0xe/0x40
[  176.137143]  [<ffffffffc0024020>] tim_init+0x20/0x1000 [tim]
[  176.137146]  [<ffffffff81002125>] do_one_initcall+0xb5/0x200
[  176.137149]  [<ffffffff811f90c5>] ? kmem_cache_alloc_trace+0x185/0x1f0
[  176.137151]  [<ffffffff81196eb5>] do_init_module+0x5f/0x1cf
[  176.137154]  [<ffffffff81111b05>] load_module+0x22e5/0x2960
[  176.137156]  [<ffffffff8110e080>] ? __symbol_put+0x60/0x60
[  176.137159]  [<ffffffff81221710>] ? kernel_read+0x50/0x80
[  176.137161]  [<ffffffff811123c4>] SYSC_finit_module+0xb4/0xe0
[  176.137163]  [<ffffffff8111240e>] SyS_finit_module+0xe/0x10
[  176.137167]  [<ffffffff8186179b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[  176.137169] ---[ end trace 6aa0b905b8418c7b ]---
[  176.137170] big = 0

Curiously, trying it again yields...

# insmod tim.ko
insmod: ERROR: could not insert module tim.ko: Input/output error
...and dmesg just shows:

[  302.068396] Hello Tim!
[  302.068398] big = 0

Why is there no stack dump on the second and subsequent tries?

Loleta answered 8/6, 2019 at 16:21 Comment(3)
I'm trying to avoid bringing a kernel up in qemu/KVM, following the alloc_pages call, and hunting for a clue with gdb. – Loleta
Have you tried __get_dma_pages or __get_free_pages? (oreilly.com/library/view/linux-device-drivers/0596000081/…) I built a kernel module using __get_free_pages to allocate big chunks of memory for a project some years ago. – Desantis
Just tried __get_free_pages(GFP_KERNEL | __GFP_HIGHMEM, 15); any order over 10 fails. I have substantially edited my original post and added source and dmesg output for clarity. I have the creeping suspicion I'm missing something silly. – Loleta
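
One likely reading of the order-10 ceiling mentioned in the comments is that the buddy allocator behind alloc_pages and __get_free_pages only serves orders below MAX_ORDER. The user-space sketch below works through the arithmetic. It assumes 4 KiB pages and MAX_ORDER = 11, the usual x86-64 defaults for a 4.4 kernel; neither value is stated in the post, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumptions, not taken from the post: 4 KiB pages and MAX_ORDER = 11. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_MAX_ORDER 11U

/* Smallest order whose block (PAGE_SIZE << order) covers `size` bytes,
 * in the spirit of the kernel's get_order(). */
static inline unsigned int order_for_size(uint64_t size)
{
    unsigned int order = 0;
    uint64_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Number of bytes in one block of the given order. */
static inline uint64_t bytes_for_order(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/* The buddy allocator only hands out orders 0 .. MAX_ORDER - 1. */
static inline int buddy_can_serve(unsigned int order)
{
    return order < SKETCH_MAX_ORDER;
}
```

Under these assumptions 64 MB needs order 14, the module's order 15 asks for 128 MB, and the largest block the buddy allocator can return is order 10, or 4 MB. If the warning at mm/page_alloc.c:3198 is a once-only WARN for oversized orders, that would also account for the stack dump appearing only on the first attempt.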

The short version is that __GFP_DIRECT_RECLAIM (which is included in __GFP_RECLAIM) is necessary, because dma_alloc_contiguous is eventually called and it checks, via gfpflags_allow_blocking, that blocking is allowed. I used the usual GFP_KERNEL, which provides __GFP_RECLAIM | __GFP_IO | __GFP_FS. Before any of that, however, one must call dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)), with DMA_BIT_MASK(64) rather than DMA_BIT_MASK(32).

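/* Assumed here: paddr is a dma_addr_t * that receives the buffer's bus address. */
err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (err) {
    printk(KERN_INFO "[%s:probe] dma_set_mask_and_coherent returned: %d\n", DRIVER_NAME, err);
    return -EIO;
}
vaddr = dma_alloc_coherent(&pdev->dev, dbsize, paddr, GFP_KERNEL);
if (!vaddr) {
    printk(KERN_ALERT "[%s:probe] failed to allocate coherent buffer\n", DRIVER_NAME);
    return -ENOMEM;
}

/* Tell the card where to DMA from. iowrite32 carries only the low 32 bits;
 * with a 64-bit mask, upper_32_bits(*paddr) must also reach the card through
 * whatever register it provides for the high half. */
iowrite32(lower_32_bits(*paddr), ctx->bar0_base_addr + 0x140);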
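
To see why the 64-bit mask matters, here is a small user-space sketch that mirrors the kernel's DMA_BIT_MASK(n) macro and checks whether a buffer fits under a given mask. It also splits a 64-bit bus address into the two halves a pair of 32-bit register writes would carry. The function names and the sample address are hypothetical; this is an illustration of the arithmetic, not driver code.

```c
#include <stdint.h>

/* User-space mirror of the kernel's DMA_BIT_MASK(n). */
static inline uint64_t dma_bit_mask(unsigned int n)
{
    return (n == 64) ? ~0ULL : ((1ULL << n) - 1);
}

/* True if every byte of [addr, addr + size) is addressable under `mask`. */
static inline int range_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    return size != 0 && addr <= mask && (size - 1) <= mask - addr;
}

/* Low and high halves of a 64-bit bus address, as the kernel's
 * lower_32_bits() and upper_32_bits() would return them. */
static inline uint32_t lower32(uint64_t v)
{
    return (uint32_t)v;
}

static inline uint32_t upper32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

For a hypothetical buffer at 0x840000000 (33 GiB), range_fits_mask fails with dma_bit_mask(32) and succeeds with dma_bit_mask(64); the halves are 0x40000000 (low) and 0x8 (high).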

Allocating Unreasonably Large DMA Regions Using the CMA with Ubuntu 16.04 & 18.04:

  1. Rebuild Kernel

    1. Use uname -r to ascertain your current kernel version
    2. Issue apt install linux-source-$(uname -r) to fetch the kernel source
    3. copy /boot/config-$(uname -r) to /usr/src/linux-source-$(uname -r)/.config
    4. edit .config
      1. Locate CONFIG_DMA_CMA is not set
      2. change to CONFIG_DMA_CMA=y
    5. build kernel
      1. make -j[2 × # of cores]
      2. make -j[2 × # of cores] modules
      3. make install
    6. You have rebuilt the kernel
  2. Configure CMA to reserve RAM

    1. Edit /etc/default/grub
      1. Locate GRUB_CMDLINE_LINUX=""
      2. Change to GRUB_CMDLINE_LINUX="cma=33G"
      3. use your desired CMA reserved RAM in place of 33G
    2. Issue update-grub
    3. Reboot
    4. Issue dmesg | grep cma
      1. Look for Memory: 30788784K/67073924K available (14339K kernel code, 2370K rwdata, 4592K rodata, 2696K init, 5044K bss, 1682132K reserved, 34603008K cma-reserved)
      2. note: This example reserves 33G
    5. You have configured CMA to hold back RAM from the normal allocation subsystems
  3. Alter your kernel module (driver) source

    1. Inform the kernel that the card can address 64 bits:
    2. In your probe function, locate a line like dma_alloc_coherent(…
    3. A few lines before that you may find dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))
    4. Change this to dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))
    5. You have informed the kernel that the card in question is not restricted to low memory
    6. Allocate the buffer with dma_alloc_coherent(&pdev->dev, dbsize, paddr, GFP_KERNEL)
    7. dbsize may specify up to 32G
    8. Recompile your kernel module (driver) and test
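
As a sanity check on step 2, the cma=33G setting and the 34603008K cma-reserved figure in dmesg describe the same amount of memory. The sketch below parses a size with a K/M/G suffix in user space, loosely in the spirit of the kernel's memparse(); parse_size and bytes_to_kib are illustrative names, not kernel functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[K|M|G]" into bytes. Suffixes are binary multiples,
 * so "33G" means 33 * 2^30 bytes. Unknown suffixes are ignored. */
static inline uint64_t parse_size(const char *s)
{
    char *end;
    uint64_t value = strtoull(s, &end, 0);
    unsigned int shift = 0;

    switch (*end) {
    case 'G': case 'g': shift = 30; break;
    case 'M': case 'm': shift = 20; break;
    case 'K': case 'k': shift = 10; break;
    default: break;
    }
    return value << shift;
}

/* Convert bytes to KiB, the unit dmesg uses in the "Memory:" line. */
static inline uint64_t bytes_to_kib(uint64_t bytes)
{
    return bytes >> 10;
}
```

Here bytes_to_kib(parse_size("33G")) gives 34603008, matching the dmesg line in step 2.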
Loleta answered 14/2, 2020 at 18:17 Comment(3)
I can confirm that this works as advertised. Thank you so much for taking the time to answer your own question! – Indeclinable
It seems that CMA is ARM only, and there is no x86-64 support in RHEL7 / CentOS7. Is that true? Ubuntu has x86 CMA support though? – Shorts
I can confirm that this works for Ubuntu 22.04, kernel 5.15.64, x86. My case is slightly different in that the DMA address has to be 32-bit. I used the following cmdline in grub: "cma=500M@0-4G". dmesg reports: cma: Reserved 500 MiB at 0x0000000034400000 – Ithyphallic
