I recently encountered an issue in a custom Linux kernel (2.6.31.5, x86) driver where copy_to_user would intermittently fail to copy any bytes to user space. It returned the same byte count that was passed to it, indicating that nothing had been copied. On code inspection we found that the driver was disabling interrupts while calling copy_to_user, which violates its contract. After correcting this, the issue stopped occurring. However, because the issue happened so infrequently, I need to prove that disabling interrupts was the cause.
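For reference, copy_to_user(to, from, n) returns the number of bytes it could not copy: 0 means full success, and n means nothing was copied, which is the failure described above. The sketch below is a user-space stand-in, not kernel code. mock_copy_to_user and mock_read are hypothetical names, and the "fault" is simulated by allowing only the first accessible bytes of the destination to be written.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for copy_to_user(). A fault is simulated
 * by letting only the first `accessible` bytes of the destination be written.
 * Like the real function, it returns the number of bytes NOT copied.
 */
static unsigned long mock_copy_to_user(void *to, const void *from,
                                       unsigned long n,
                                       unsigned long accessible)
{
    unsigned long done = n < accessible ? n : accessible;

    memcpy(to, from, done);
    return n - done;    /* 0 = everything copied, n = nothing copied */
}

/* Hypothetical read() handler showing the usual caller check. */
static long mock_read(char *ubuf, const char *kbuf, unsigned long n,
                      unsigned long accessible)
{
    if (mock_copy_to_user(ubuf, kbuf, n, accessible) != 0)
        return -EFAULT; /* any uncopied byte is reported as a bad address */
    return (long)n;
}
```

With accessible set to 0, mock_copy_to_user returns the full count, which mirrors the symptom seen in the driver.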
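In the code snippet below from arch/x86/lib/usercopy_32.c, the copy is done in up to three steps: rep; movsb copies enough bytes to align the destination to 8 bytes, rep; movsl copies the bulk as 4-byte doublewords, and a final rep; movsb copies the remaining bytes (sizes of 7 or less go straight to the last step). Each rep instruction repeats the number of times given in ECX, and size is set from ECX on exit, so ECX will be 0 if the movs? instructions run to completion. Because the returned value was not zero but equal to the full count, none of the movs? instructions can have copied anything, which is consistent with both the definition of copy_to_user and the observed behavior.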
```c
/* Generic arbitrary sized copy. */
#define __copy_user(to, from, size) \
do { \
    int __d0, __d1, __d2; \
    __asm__ __volatile__( \
        " cmp $7,%0\n" \
        " jbe 1f\n" \
        " movl %1,%0\n" \
        " negl %0\n" \
        " andl $7,%0\n" \
        " subl %0,%3\n" \
        "4: rep; movsb\n" \
        " movl %3,%0\n" \
        " shrl $2,%0\n" \
        " andl $3,%3\n" \
        " .align 2,0x90\n" \
        "0: rep; movsl\n" \
        " movl %3,%0\n" \
        "1: rep; movsb\n" \
        "2:\n" \
        ".section .fixup,\"ax\"\n" \
        "5: addl %3,%0\n" \
        " jmp 2b\n" \
        "3: lea 0(%3,%0,4),%0\n" \
        " jmp 2b\n" \
        ".previous\n" \
        ".section __ex_table,\"a\"\n" \
        " .align 4\n" \
        " .long 4b,5b\n" \
        " .long 0b,3b\n" \
        " .long 1b,2b\n" \
        ".previous" \
        : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2) \
        : "3"(size), "0"(size), "1"(to), "2"(from) \
        : "memory"); \
} while (0)
```
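To make the return-value reasoning concrete, here is a user-space C model of the value __copy_user leaves in ECX (and therefore in size), including the three fixup paths named in __ex_table. copy_user_residue and enum copy_phase are hypothetical names. The model assumes, rather than proves, that a faulting rep movs leaves ECX holding the iterations still to do. Under that assumption, a return equal to the full size means the fault hit before the first byte was written; a fault any later leaves a smaller residue.

```c
#include <stdint.h>

/* Which rep-movs instruction in __copy_user is assumed to fault. */
enum copy_phase {
    NO_FAULT,
    FAULT_HEAD_MOVSB,   /* label 4: alignment bytes    -> fixup 5 */
    FAULT_MOVSL,        /* label 0: 4-byte doublewords -> fixup 3 */
    FAULT_TAIL_MOVSB    /* label 1: trailing bytes     -> resume at 2 */
};

/*
 * Models the value left in ECX on exit. `done` is how many iterations of
 * the faulting rep completed before the fault (bytes for movsb, doublewords
 * for movsl). A rep with ECX == 0 touches no memory, so it cannot fault.
 */
static unsigned long copy_user_residue(unsigned long size, uintptr_t to,
                                       enum copy_phase fault,
                                       unsigned long done)
{
    unsigned long ecx, rest;

    if (size > 7) {
        ecx = (0UL - to) & 7;           /* movl %1,%0; negl %0; andl $7,%0 */
        rest = size - ecx;              /* subl %0,%3 */
        if (fault == FAULT_HEAD_MOVSB && ecx > 0)
            return (ecx - done) + rest; /* 5: addl %3,%0 */
        ecx = rest >> 2;                /* movl %3,%0; shrl $2,%0 */
        rest &= 3;                      /* andl $3,%3 */
        if (fault == FAULT_MOVSL && ecx > 0)
            return rest + 4 * (ecx - done); /* 3: lea 0(%3,%0,4),%0 */
        ecx = rest;                     /* movl %3,%0 */
    } else {
        ecx = size;                     /* jbe 1f: byte copy only */
    }
    if (fault == FAULT_TAIL_MOVSB && ecx > 0)
        return ecx - done;              /* ECX already holds the residue */
    return 0;                           /* every rep ran to completion */
}
```

For example, with to = 0x1003 and size = 100, the head movsb copies 5 bytes, so a fault on the first movsl iteration yields 95 rather than 100.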
The two explanations I have come up with are:
- When interrupts are disabled, the page fault does not occur, and the rep; movs? instruction is skipped without doing anything. The return value would then be ECX, i.e. the number of bytes not copied to user space, which matches both the definition and the observed behavior.
- The page fault does occur, but Linux cannot process it because interrupts are disabled, so the page fault handler skips the instruction (although I don't know how the handler would do this). In this case, too, ECX would remain unmodified and the return value would be consistent with what we saw.
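The __ex_table entries in the snippet pair each user-access instruction (labels 4, 0 and 1) with the code to resume at if it faults (labels 5, 3 and 2). On x86, when a kernel-mode access to user memory faults and the fault is not serviced, the fault handler looks up the faulting instruction address in this table (fixup_exception() in arch/x86/mm/extable.c) and jumps to the fixup instead of oopsing, which is one way the "skip" in the second idea could happen. The sketch below only models that lookup in user space. struct extable_entry and search_fixup are hypothetical names, and any addresses used with them are made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one exception-table entry: the address of an
 * instruction that may fault on a user access, and the address to resume
 * at if it does (cf. ".long 4b,5b" in the snippet).
 */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/*
 * Returns the fixup address for a faulting instruction address, or 0 if
 * there is no entry. Linear search for clarity; the kernel sorts its
 * table and searches it with a binary search.
 */
static uintptr_t search_fixup(const struct extable_entry *tbl, size_t n,
                              uintptr_t fault_ip)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].insn == fault_ip)
            return tbl[i].fixup;
    return 0;
}
```

In this model, a fault at the address of label 0 (rep; movsl) would resume at label 3, whose lea rebuilds the residual byte count in ECX before jumping to label 2.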
Can anyone point me to the sections in the Intel manuals that specify this behavior, or point me to any additional Linux source that could be helpful?