UPDATE: For anyone interested, here are step-by-step instructions and explanations on how to build a bare-metal USB stack, how to approach such a project and what you need to know at each step: STM32USB@GitHub

TL;DR: I have an STM32G441 and want to implement a USB driver without any HAL libraries, using only CMSIS: for the learning experience, to save space, and because what I want to do would require changing the HAL anyway.

But I can't get this thing to receive anything. I'm stuck trying to get the device address, which never reaches my code. The HAL middleware works just fine on the same hardware, so it's not a hardware issue.

What I'm doing

I enable the USB clock (correctly, I assume, because my logic analyzer shows the device sending ACKs), power up the USB peripheral as described in the datasheet, enable all the necessary interrupts and handle the reset event by initializing the BTable and endpoint 0. After that I expect to receive a CTR interrupt, which never arrives.

Reference Manual

Clock

The μC runs on a 25 MHz HSE clock. The USB peripheral runs on the PLL "Q" output at ~48 MHz (47.92 MHz); the RCC settings were verified with the CubeMX clock configurator. AHB runs at half speed because I get a bus-error hard fault if I run it at full speed, but that's another question. The system clock is set to 143.75 MHz.

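RCC->CR |= RCC_CR_HSEON | RCC_CR_HSION;
while ((RCC->CR & RCC_CR_HSERDY) == 0) {
}

// Configure PLL (R=143.75, Q=47.92): 25 MHz / M(2) * N(23) = 287.5 MHz VCO, R = /2, Q = /6
RCC->CR &= ~RCC_CR_PLLON;
while (RCC->CR & RCC_CR_PLLRDY) {
}
// Plain write instead of |=, so no reset-value bits (e.g. PLLN = 16) are merged into the new fields
RCC->PLLCFGR = RCC_PLLCFGR_PLLSRC_HSE | RCC_PLLCFGR_PLLM_0 | (23 << RCC_PLLCFGR_PLLN_Pos) | RCC_PLLCFGR_PLLQ_1
             | RCC_PLLCFGR_PLLREN | RCC_PLLCFGR_PLLQEN;
RCC->CR |= RCC_CR_PLLON;
while ((RCC->CR & RCC_CR_PLLRDY) == 0) {
}

// Select PLL as main clock, AHB/2 > otherwise Bus Error Hard Fault
RCC->CFGR |= RCC_CFGR_HPRE_3 | RCC_CFGR_SW_PLL;
while ((RCC->CFGR & RCC_CFGR_SWS) != (RCC_CFGR_SW_PLL << RCC_CFGR_SWS_Pos)) {
}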

// Select & Enable IO Clocks (PLL > USB, ADC; HSI16 > UART)
// CLK48SEL: 0b10 = PLL "Q"; 0b01 (CLK48SEL_0) is listed as reserved in RM0440, so verify this field
RCC->CCIPR = RCC_CCIPR_CLK48SEL_1 | RCC_CCIPR_ADC12SEL_1 | RCC_CCIPR_USART1SEL_1 | RCC_CCIPR_USART2SEL_1 | RCC_CCIPR_USART3SEL_1 | RCC_CCIPR_UART4SEL_1;
RCC->AHB2ENR |= RCC_AHB2ENR_ADC12EN | RCC_AHB2ENR_GPIOAEN | RCC_AHB2ENR_GPIOBEN | RCC_AHB2ENR_GPIOCEN;
RCC->APB1ENR1 |= RCC_APB1ENR1_USBEN | RCC_APB1ENR1_UART4EN | RCC_APB1ENR1_USART3EN | RCC_APB1ENR1_USART2EN;
RCC->APB2ENR |= RCC_APB2ENR_USART1EN;

// Enable DMAMUX & DMA1 Clock
RCC->AHB1ENR |= RCC_AHB1ENR_DMAMUX1EN | RCC_AHB1ENR_DMA1EN;
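To double-check the PLL numbers in the comment above, here is a small host-side C sketch (not target code) that recomputes the PLL outputs from M, N, R and Q, derives the register field encodings, and checks the USB clock against the ±0.25 % data-rate tolerance that USB 2.0 allows at full speed. The helper names are made up for illustration, and the field encodings (PLLM = M - 1, PLLQ/PLLR = divider / 2 - 1) are as I understand RM0440.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three PLL frequencies of interest. */
typedef struct {
    double vco_hz; /* f_VCO = f_in / M * N */
    double r_hz;   /* PLL "R" output -> SYSCLK */
    double q_hz;   /* PLL "Q" output -> 48 MHz domain (USB) */
} pll_out_t;

static pll_out_t pll_compute(double fin_hz, unsigned m, unsigned n, unsigned r, unsigned q)
{
    assert(m >= 1 && m <= 16);
    assert(r == 2 || r == 4 || r == 6 || r == 8);
    assert(q == 2 || q == 4 || q == 6 || q == 8);
    pll_out_t o;
    o.vco_hz = fin_hz / m * n;
    o.r_hz = o.vco_hz / r;
    o.q_hz = o.vco_hz / q;
    return o;
}

/* Field encodings: PLLM stores M - 1; PLLQ/PLLR store 00 = /2, 01 = /4, 10 = /6, 11 = /8. */
static unsigned pllm_field(unsigned m) { return m - 1; }
static unsigned pllqr_field(unsigned div) { return div / 2 - 1; }

/* USB full speed allows +/-0.25 % (2500 ppm) on the bit rate, i.e. on the 48 MHz clock. */
static int usb_clock_ok(double f_hz)
{
    double ppm = (f_hz - 48e6) / 48e6 * 1e6;
    if (ppm < 0)
        ppm = -ppm;
    return ppm <= 2500.0;
}
```

With the post's values this gives a 287.5 MHz VCO, 143.75 MHz SYSCLK and about 47.92 MHz on Q, roughly 1736 ppm below 48 MHz, which is inside the tolerance. `pllm_field(2) == 1` and `pllqr_field(6) == 2` correspond to `RCC_PLLCFGR_PLLM_0` and `RCC_PLLCFGR_PLLQ_1`.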

USB Memory

As far as I know, the USB BTable and endpoint buffers must be placed in the dedicated USB SRAM (packet memory), not in regular SRAM. I've added some linker directives to create a section for it, which seems to work according to the memory analyzer. __MEM2USB just converts an absolute address into an offset relative to the start of the USB SRAM.

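#define __USB_MEM __attribute__((section(".usbbuf")))
#define __USBBUF_BEGIN 0x40006000
// Absolute address -> offset inside the USB SRAM, as used by USB->BTABLE and the BTable entries
#define __MEM2USB(X) ((uint32_t)(uintptr_t)(X) - __USBBUF_BEGIN)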

First question: access is only allowed to be 16 bits wide. But unlike e.g. the STM32F103, there seems to be no need for padding. The memory viewer has trouble displaying this region because the region only supports WORD (16-bit) access while the tool reads it with DWORD (32-bit) accesses, but copying the memory allocated by the HAL word by word also shows no padding. Is that correct? If so, I should be able to use all 1024 bytes, rather than seeing 1024 bytes but only being able to use 512. This is also why __MEM2USB does not divide the address by 2.
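The two address conventions can be contrasted in a host-side sketch. It assumes, as observed above, that the G441 packet memory is contiguous at 0x40006000 (1 KiB), and it models the F103-style layout in which each 16-bit packet-memory word occupies a 32-bit slot in CPU address space. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PMA_BASE 0x40006000u /* same value as __USBBUF_BEGIN in the post */
#define PMA_SIZE 1024u       /* assumption: 1 KiB of packet memory on the G441 */

/* Contiguous layout (as observed on the G441): the offset written to the
 * BTable is simply the byte distance from the packet-memory base. */
static uint16_t pma_offset_contiguous(uint32_t cpu_addr)
{
    assert(cpu_addr >= PMA_BASE && cpu_addr < PMA_BASE + PMA_SIZE);
    return (uint16_t)(cpu_addr - PMA_BASE);
}

/* F103-style layout: 512 bytes of packet memory spread over a 1 KiB CPU window,
 * one 16-bit word per 32-bit slot, so the CPU distance has to be halved. */
static uint16_t pma_offset_padded(uint32_t cpu_addr)
{
    assert(cpu_addr >= PMA_BASE && cpu_addr < PMA_BASE + 1024u);
    assert((cpu_addr - PMA_BASE) % 4 == 0);
    return (uint16_t)((cpu_addr - PMA_BASE) / 2);
}
```

The same CPU address 0x40006080 therefore means packet-memory offset 0x80 in the contiguous layout but 0x40 in the padded one, which is why a padded-style macro would misplace every buffer on the G441.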

Then I create structures for the BTable and endpoint 0. The BTable ends up at 0x40006000 by default. Endpoint 0 gets an RX and a TX buffer of 64 bytes each, the maximum packet size for a full-speed control endpoint per the USB spec. The alignments are taken from the reference manual. This memory is not zeroed out automatically by the startup code.

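typedef struct {
    uint16_t ADDR_TX;  // offset of the TX buffer inside the USB SRAM
    uint16_t COUNT_TX; // number of bytes to send
    uint16_t ADDR_RX;  // offset of the RX buffer inside the USB SRAM
    uint16_t COUNT_RX; // BL_SIZE/NUM_BLOCK (buffer size) + received byte count
} USB_BTABLE_ENTRY;

// volatile: the hardware reads and writes these entries behind the compiler's back.
// Note: .usbbuf is not initialized by the startup code, so "= {0}" does not zero it at runtime.
__ALIGNED(8)
__USB_MEM
static volatile USB_BTABLE_ENTRY BTable[8] = {0};

__ALIGNED(2)
__USB_MEM
static volatile uint8_t EP0_Buf[2][64] = {0};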
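To make sure the compiler lays out `USB_BTABLE_ENTRY` exactly as the 8-byte descriptor the hardware expects, compile-time checks help. This host-side sketch mirrors the struct with `uint16_t` fields; `btable_entry_offset` is a hypothetical helper. With 8 entries the table takes 64 bytes, so buffers placed after it start at offset 64 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One buffer-descriptor-table entry per endpoint: four 16-bit fields. */
typedef struct {
    uint16_t ADDR_TX;
    uint16_t COUNT_TX;
    uint16_t ADDR_RX;
    uint16_t COUNT_RX;
} USB_BTABLE_ENTRY;

/* Fail the build if the compiler inserts any padding. */
_Static_assert(sizeof(USB_BTABLE_ENTRY) == 8, "BTable entry must be 8 bytes");
_Static_assert(offsetof(USB_BTABLE_ENTRY, COUNT_TX) == 2, "COUNT_TX at +2");
_Static_assert(offsetof(USB_BTABLE_ENTRY, ADDR_RX) == 4, "ADDR_RX at +4");
_Static_assert(offsetof(USB_BTABLE_ENTRY, COUNT_RX) == 6, "COUNT_RX at +6");

/* Packet-memory offset of endpoint ep's entry. USB->BTABLE ignores its
 * 3 LSBs, so the table itself must start on an 8-byte boundary. */
static unsigned btable_entry_offset(unsigned btable, unsigned ep)
{
    assert(btable % 8 == 0 && ep < 8);
    return (unsigned)(btable + ep * sizeof(USB_BTABLE_ENTRY));
}
```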

Initialization

First I enable the interrupts in the NVIC, then power up the peripheral and wait 1 μs until the clock is stable, as specified in the datasheet. Then I clear the reset state, clear pending interrupts, enable the interrupts and finally enable the internal pull-up to start enumeration.

NVIC_SetPriority(USB_HP_IRQn, 0);
NVIC_SetPriority(USB_LP_IRQn, 0);
NVIC_SetPriority(USBWakeUp_IRQn, 0);
NVIC_EnableIRQ(USB_HP_IRQn);
NVIC_EnableIRQ(USB_LP_IRQn);
NVIC_EnableIRQ(USBWakeUp_IRQn);

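USB->CNTR &= ~USB_CNTR_PDWN;

// Wait at least 1μs until clock is stable (t_STARTUP).
// CTRL = 1 enables SysTick with CLKSOURCE = 0 (HCLK/8), so LOAD = 100 is roughly 11μs at HCLK = 71.875MHz
SysTick->LOAD = 100;
SysTick->VAL = 0;
SysTick->CTRL = 1;
while ((SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk) == 0) {
}
SysTick->CTRL = 0;

USB->CNTR &= ~USB_CNTR_FRES; // leave the forced USB reset state
USB->ISTR = 0;               // clear all pending interrupt flags

USB->CNTR |= USB_CNTR_RESETM | USB_CNTR_CTRM | USB_CNTR_WKUPM | USB_CNTR_SUSPM | USB_CNTR_ESOFM;
USB->BCDR |= USB_BCDR_DPPU;  // enable the internal D+ pull-up so the host detects a full-speed device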
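The SysTick wait can be sanity-checked with a little arithmetic. This host-side sketch assumes the SysTick external reference is HCLK/8 (CLKSOURCE = 0, as selected by `CTRL = 1`) and HCLK = 143.75 MHz / 2 = 71.875 MHz from the clock setup; the helper names are hypothetical and the "LOAD + 1 ticks" count is an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate wait in microseconds for a given LOAD value: roughly LOAD + 1
 * ticks pass before COUNTFLAG is set when starting from VAL = 0. */
static double systick_delay_us(uint32_t load, double hclk_hz, int use_hclk_div8)
{
    double f = use_hclk_div8 ? hclk_hz / 8.0 : hclk_hz;
    return (load + 1.0) / f * 1e6;
}

/* Smallest LOAD value whose wait is at least min_us. */
static uint32_t systick_min_load(double min_us, double hclk_hz, int use_hclk_div8)
{
    double f = use_hclk_div8 ? hclk_hz / 8.0 : hclk_hz;
    double ticks = min_us * 1e-6 * f;
    uint32_t n = (uint32_t)ticks;
    if (n < ticks)
        n++; /* round up without <math.h> */
    return n > 0 ? n - 1 : 0;
}
```

With LOAD = 100 the wait is roughly 11 μs, comfortably above the 1 μs requirement; under these assumptions LOAD = 8 would already be enough.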

USB Reset

Now the host sends a reset, and the RESET interrupt fires as expected. In the reset handler I initialize the BTable and EP0. I set EP0 to VALID (ACK) on RX and NAK on TX, as other bare-metal USB examples and the HAL do. (The STAT bits toggle on write rather than being written directly, but the register is in a known state of 0x0000 because the hardware clears it on a USB reset.) Finally, I enable the function (EF) and reset the device address to 0.

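if ((USB->ISTR & USB_ISTR_RESET) != 0) {
    USB->ISTR = ~USB_ISTR_RESET; // ISTR flags are rc_w0: writing 0 clears a flag, writing 1 leaves it

    // Enable EP0
    USB->BTABLE = __MEM2USB(BTable);

    BTable[0].ADDR_TX = __MEM2USB(EP0_Buf[0]);
    BTable[0].COUNT_TX = 0;
    BTable[0].ADDR_RX = __MEM2USB(EP0_Buf[1]);
    BTable[0].COUNT_RX = (1 << 15) | (1 << 10); // BL_SIZE = 1, NUM_BLOCK = 1 -> 64-byte RX buffer

    // Control endpoint, STAT_TX = NAK (0b10), STAT_RX = VALID (0b11); STAT bits toggle on write,
    // which gives the intended result here because the hardware cleared EP0R on the USB reset
    USB->EP0R = USB_EP_CONTROL | (2 << 4) | (3 << 12);
    // Note: this plain write also clears SUSPM, WKUPM and ESOFM that were set during initialization
    USB->CNTR = USB_CNTR_CTRM | USB_CNTR_RESETM;

    USB->DADDR = USB_DADDR_EF; // enable the function with device address 0
}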
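The value `(1 << 15) | (1 << 10)` for `COUNT_RX` encodes a 64-byte buffer: BL_SIZE = 1 selects 32-byte blocks and NUM_BLOCK = 1 means two of them. Here is a host-side sketch of the encoding, with hypothetical helper names, based on my reading of the reference manual's BL_SIZE/NUM_BLOCK table (2-byte blocks up to 62 bytes, 32-byte blocks above).

```c
#include <assert.h>
#include <stdint.h>

#define COUNT_RX_BL_SIZE (1u << 15)    /* 1: 32-byte blocks, 0: 2-byte blocks */
#define COUNT_RX_NUM_BLOCK_Pos 10u     /* NUM_BLOCK occupies bits 14:10 */

/* Encode an RX buffer size into the BL_SIZE/NUM_BLOCK fields of COUNT_RX. */
static uint16_t count_rx_encode(unsigned size)
{
    if (size <= 62) {
        assert(size % 2 == 0);   /* 2-byte blocks: NUM_BLOCK = size / 2 */
        return (uint16_t)((size / 2) << COUNT_RX_NUM_BLOCK_Pos);
    }
    assert(size % 32 == 0 && size / 32 - 1 <= 31); /* 32-byte blocks: NUM_BLOCK = size / 32 - 1 */
    return (uint16_t)(COUNT_RX_BL_SIZE | ((size / 32 - 1) << COUNT_RX_NUM_BLOCK_Pos));
}

/* Decode the configured buffer size. COUNT_RX[9:0] (the received byte count,
 * written by the hardware) is ignored here. */
static unsigned count_rx_capacity(uint16_t count_rx)
{
    unsigned nb = (count_rx >> COUNT_RX_NUM_BLOCK_Pos) & 0x1Fu;
    return (count_rx & COUNT_RX_BL_SIZE) ? (nb + 1) * 32 : nb * 2;
}
```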

Debugging shows that the BTable is indeed at 0x40006000 and that the buffer addresses appear to be written correctly. I compared the EP0 register with a working HAL implementation, and at that point they are identical.
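Because the STAT bits toggle on write, the value to write depends on the current register contents. This host-side sketch (hypothetical names) computes the toggle bits as current XOR desired and models what the hardware does with them. It shows that the post's write yields TX = NAK and RX = VALID only when starting from the 0x0000 reset state.

```c
#include <assert.h>
#include <stdint.h>

/* STAT encodings shared by STAT_TX (bits 5:4) and STAT_RX (bits 13:12). */
enum { EP_DISABLED = 0, EP_STALL = 1, EP_NAK = 2, EP_VALID = 3 };

#define EP_STAT_TX_Pos 4u
#define EP_STAT_RX_Pos 12u
#define EP_STAT_MASK ((3u << EP_STAT_TX_Pos) | (3u << EP_STAT_RX_Pos))

/* Writing 1 flips a STAT bit and writing 0 leaves it, so the bits to write
 * are current XOR desired, restricted to the two STAT fields. */
static uint16_t ep_stat_toggle_bits(uint16_t current, unsigned tx, unsigned rx)
{
    uint16_t desired = (uint16_t)((tx << EP_STAT_TX_Pos) | (rx << EP_STAT_RX_Pos));
    return (uint16_t)((current ^ desired) & EP_STAT_MASK);
}

/* Host-side model of how the hardware applies a write to the STAT fields. */
static uint16_t ep_apply_toggle(uint16_t current, uint16_t written)
{
    return (uint16_t)(current ^ (written & EP_STAT_MASK));
}
```

On the target, a complete EP0R write also has to carry the read/write fields (EP_TYPE, EP_KIND, EA), write 1 to CTR_RX/CTR_TX so they are not cleared, and write 0 to the DTOG bits so they stay unchanged.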

Here I'm stuck

I expect the host to send the device address next (it doesn't; first it sends a suspend, a wakeup and then another reset), which should trigger the CTR interrupt (CTRM is set). The point is, it never does, and I don't know why. According to the logic analyzer, the host sends the request and the device ACKs it, but CTR is never triggered. Any ideas what else I can try or where to look?
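Once CTR does fire, the handler has to read the 8-byte SETUP packet from the EP0 RX buffer and decode it. This is a host-side sketch with hypothetical names; on the target the bytes would be copied out of `EP0_Buf[1]` with 16-bit accesses. Note that hosts often send GET_DESCRIPTOR (bRequest 6) before SET_ADDRESS (bRequest 5).

```c
#include <assert.h>
#include <stdint.h>

/* Standard 8-byte SETUP packet (USB 2.0, chapter 9); multi-byte fields are little-endian. */
typedef struct {
    uint8_t bmRequestType;
    uint8_t bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} usb_setup_t;

#define USB_REQ_SET_ADDRESS 0x05u
#define USB_REQ_GET_DESCRIPTOR 0x06u

static usb_setup_t usb_parse_setup(const uint8_t b[8])
{
    usb_setup_t s;
    s.bmRequestType = b[0];
    s.bRequest = b[1];
    s.wValue = (uint16_t)(b[2] | (b[3] << 8));
    s.wIndex = (uint16_t)(b[4] | (b[5] << 8));
    s.wLength = (uint16_t)(b[6] | (b[7] << 8));
    return s;
}

/* Returns the requested address (0..127) for a standard SET_ADDRESS, or -1 otherwise. */
static int usb_set_address_value(const usb_setup_t *s)
{
    if (s->bmRequestType == 0x00 && s->bRequest == USB_REQ_SET_ADDRESS && s->wLength == 0)
        return s->wValue & 0x7F;
    return -1;
}
```

Per USB 2.0 §9.4.6, the device must keep address 0 until the status stage of SET_ADDRESS has completed, so the new address is usually written to DADDR only after the zero-length IN status packet has been sent.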

Update

I've now compared the events seen by my implementation with the HAL's. My interrupt handler now sees the exact same events in the exact same order, and the USB registers contain exactly the same values for every request. I've also changed the BTable and USB SRAM layout to contain exactly the same values as the HAL's after the reset interrupt.

I had to implement SUSP and WKUP handling for this to work, which was probably one of the things that was missing. Now both behave exactly as in the HAL. It turns out that the problem is that I never receive a proper SOF packet. The HAL gets its first SOF directly after the second reset (HW-Reset > 2x ESOF > SUSP > WKUP > RESET > (Optional 1 ESOF) > SOF), while mine gets an ERR instead of the SOF.
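Comparing event sequences like this is easier with a small trace buffer. This host-side sketch, with hypothetical names, records raw ISTR snapshots, decodes the event flags (bit positions as in the STM32 USB device ISTR register) and finds the first point where two traces differ. On the target, `trace_push(USB->ISTR)` would run at the top of the interrupt handler, and `trace`/`trace_n` should then be declared `volatile`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ISTR event flags, highest bit first. */
static const struct { uint16_t bit; const char *name; } istr_flags[] = {
    {1u << 15, "CTR"}, {1u << 14, "PMAOVR"}, {1u << 13, "ERR"}, {1u << 12, "WKUP"},
    {1u << 11, "SUSP"}, {1u << 10, "RESET"}, {1u << 9, "SOF"}, {1u << 8, "ESOF"},
};

#define TRACE_LEN 64
static uint16_t trace[TRACE_LEN]; /* raw ISTR snapshots in arrival order */
static size_t trace_n;

static void trace_push(uint16_t istr)
{
    if (trace_n < TRACE_LEN)
        trace[trace_n++] = istr; /* drop further events once full instead of wrapping */
}

/* Write the names of the flags set in istr into out, separated by '|'. */
static void istr_decode(uint16_t istr, char *out, size_t cap)
{
    out[0] = '\0';
    for (size_t i = 0; i < sizeof istr_flags / sizeof istr_flags[0]; i++) {
        if (istr & istr_flags[i].bit) {
            if (out[0] != '\0')
                strncat(out, "|", cap - strlen(out) - 1);
            strncat(out, istr_flags[i].name, cap - strlen(out) - 1);
        }
    }
}

/* Index of the first snapshot that differs under mask, or -1 if both traces agree. */
static long trace_first_diff(const uint16_t *a, size_t na, const uint16_t *b, size_t nb, uint16_t mask)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if ((a[i] & mask) != (b[i] & mask))
            return (long)i;
    return na == nb ? -1 : (long)n;
}
```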

It looks like the error is not related to the USB registers or the USB SRAM. My next step is to compare every register I can think of as relevant between the two implementations. Maybe I forgot a clock?
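For that register comparison, one approach is to dump the same registers from both builds at the same point (for example RCC->CR, RCC->CFGR, RCC->PLLCFGR, RCC->CCIPR and the peripheral enable registers) and diff them bit by bit. This is a host-side sketch with hypothetical names; the values would come from the debugger or a UART dump, and the ones in the test are synthetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One captured register: its value in the HAL build and in the bare-metal build. */
typedef struct {
    const char *name;
    uint32_t hal;
    uint32_t mine;
} reg_pair_t;

/* Print every register whose values differ, including the differing bits,
 * and return how many registers differ. Pass out = NULL to only count. */
static size_t reg_diff(const reg_pair_t *regs, size_t n, FILE *out)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = regs[i].hal ^ regs[i].mine;
        if (x != 0) {
            diffs++;
            if (out)
                fprintf(out, "%-12s hal=0x%08lX mine=0x%08lX diff=0x%08lX\n", regs[i].name,
                        (unsigned long)regs[i].hal, (unsigned long)regs[i].mine, (unsigned long)x);
        }
    }
    return diffs;
}
```

The `diff` column points straight at the differing bit fields, which can then be looked up in the reference manual.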