What is a bus error? Is it different from a segmentation fault?

17

355

What does the "bus error" message mean, and how does it differ from a segmentation fault?

Deweydewhirst answered 17/10, 2008 at 14:48 Comment(3)
I'd like to add a simple explanation for both: a segmentation fault means that you are trying to access memory that you are not allowed to (e.g. it's not part of your program). A bus error, however, usually means that you are trying to access memory that does not exist (e.g. you try to access an address at 12G but you only have 8G of memory), or that you have exceeded the limit of usable memory.Polychasium
On what platform did you see this? PC? Mac? x86? 32/64?Chancechancel
I got a bus error when I was passing a pointer to a string literal (read-only memory) to strtok. An example of what may happen when undefined behaviour occurs.Ardelia
329

Bus errors are rare nowadays on x86 and occur when your processor cannot even attempt the memory access requested, typically:

  • using a processor instruction with an address that does not satisfy its alignment requirements.

Segmentation faults occur when accessing memory which does not belong to your process. They are very common and are typically the result of:

  • using a pointer to something that was deallocated.
  • using an uninitialized hence bogus pointer.
  • using a null pointer.
  • overflowing a buffer.

PS: To be more precise, it is not manipulating the pointer itself that will cause issues. It's accessing the memory it points to (dereferencing).

Virescence answered 17/10, 2008 at 15:12 Comment(9)
They aren't rare; I'm just at Exercise 9 from How to Learn C the Hard Way and already encountered one...Cf
Another cause of bus errors (on Linux anyway) is when the operating system can't back a virtual page with physical memory (e.g. low-memory conditions or out of huge pages when using huge page memory.) Typically mmap (and malloc) just reserve the virtual address space, and the kernel assigns the physical memory on demand (so called soft page faults.) Make a large enough malloc, and then write to enough of it and you'll get a bus error.Noodle
for me the partition containing /var/cache was simply full askubuntu.com/a/915520/493379Brisesoleil
In my case, a method static_casted a void * parameter to an object that stores a callback (one attribute points to the object and the other to the method). Then the callback is called. However, what was passed as void * was something completely different and thus the method call caused the bus error.Scanlon
@Virescence Do you know the nature of bus errors. i.e. does the message on the ring bus have some mechanism where a stop on the ring also accepts a message that was sent by it but to whichever destination as it suggests that it has gone all the way round the ring and hasn't been accepted. I'm guessing the line fill buffer returns an error status and when it retires it flushes the pipeline and calls the correct exception microroutine. This basically requires that the memory controller accept all address in its range which would suggest that when the BARs etc are changed, it would have to internallyGummosis
change registers in the memory controller to exclude that address rangeGummosis
It seems like an awkward operation to me, as memory is initialised on boot before the BARs, so any update to a BAR would have to reflect itself in the memory controller, which would also mean it would then have to update its internal mapping to the RAM channel, module, rank, IC, chip, bank, row, column for all the other rangesGummosis
I've also seen this coming from my dotnet core app when the disk was full (using memory-mapped io). Maybe that's related to @Eloff's comment.Percussive
Can structure packing, and then accessing members at unaligned addresses, cause bus errors?Fluke
113

A segfault is accessing memory that you're not allowed to access. It's read-only, you don't have permission, etc...

A bus error is trying to access memory that can't possibly be there. You've used an address that's meaningless to the system, or the wrong kind of address for that operation.

Lifeless answered 17/10, 2008 at 14:55 Comment(0)
28

mmap minimal POSIX 7 example

"Bus error" happens when the kernel sends SIGBUS to a process.

A minimal example that produces it because ftruncate was forgotten:

#include <fcntl.h> /* O_ constants */
#include <unistd.h> /* ftruncate */
#include <sys/mman.h> /* mmap */

int main() {
    int fd;
    int *map;
    int size = sizeof(int);
    char *name = "/a";

    shm_unlink(name);
    fd = shm_open(name, O_RDWR | O_CREAT, (mode_t)0600);
    /* THIS is the cause of the problem. */
    /*ftruncate(fd, size);*/
    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    /* This is what generates the SIGBUS. */
    *map = 0;
}

Run with:

gcc -std=c99 main.c -lrt
./a.out

Tested in Ubuntu 14.04.

POSIX describes SIGBUS as:

Access to an undefined portion of a memory object.

The mmap spec says that:

References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.

And shm_open says that it generates objects of size 0:

The shared memory object has a size of zero.

So at *map = 0 we are touching past the end of the allocated object.
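
For contrast, here is a minimal sketch of the working order of operations: size the object with ftruncate first, then map it, then touch it. To keep it free of the -lrt dependency it uses a temporary file from mkstemp instead of shm_open; that substitution, the helper name write_int_through_mapping, and the /tmp path are assumptions made for illustration, not part of the original example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>   /* mkstemp */
#include <sys/mman.h> /* mmap, munmap */
#include <unistd.h>   /* ftruncate, close, unlink */

/* Hypothetical helper: map one int from a fresh temporary file, write
 * `value` through the mapping, and read it back into *out.
 * Returns 0 on success and -1 on any failure. */
int write_int_through_mapping(int value, int *out) {
    char path[] = "/tmp/busdemoXXXXXX"; /* assumed to be a writable location */
    size_t size = sizeof(int);
    int *map;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path); /* the file lives on until fd and the mapping are gone */

    /* Give the object a real size BEFORE touching the mapping. Without this
     * the file is 0 bytes long and the store below would raise SIGBUS. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    *map = value; /* safe: this page is now backed by the file */
    *out = *map;
    munmap(map, size);
    return 0;
}
```

Calling write_int_through_mapping(42, &got) should return 0 and leave 42 in got; commenting out the ftruncate call turns the store back into the SIGBUS case described above.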

Unaligned stack memory accesses in ARMv8 aarch64

This was mentioned at: What is a bus error? for SPARC, but here I will provide a more reproducible example.

All you need is a freestanding aarch64 program:

.global _start
_start:
asm_main_after_prologue:
    /* misalign the stack so it is no longer on a 16-byte boundary */
    add sp, sp, #-4
    /* access the stack */
    ldr w0, [sp]

    /* exit syscall in case SIGBUS does not happen */
    mov x0, 0
    mov x8, 93
    svc 0

That program then raises SIGBUS on Ubuntu 18.04 aarch64, Linux kernel 4.15.0 in a ThunderX2 server machine.

Unfortunately, I can't reproduce it in QEMU v4.0.0 user mode, and I'm not sure why.

The fault appears to be optional and controlled by the SCTLR_ELx.SA and SCTLR_EL1.SA0 fields. I have summarized the related docs a bit further here.

Stuffed answered 7/8, 2015 at 12:0 Comment(0)
11

I believe the kernel raises SIGBUS when an application exhibits data misalignment on the data bus. I think that since most[?] modern compilers for most processors pad / align the data for the programmers, the alignment troubles of yore are (at least) mitigated, and hence one does not see SIGBUS too often these days (AFAIK).

From: Here

Jarid answered 17/10, 2008 at 14:54 Comment(1)
Depends on the nasty tricks you're doing with your code. You can trigger a BUS error/Alignment Trap if you do something silly like do pointer math and then typecast for access to a problem mode (i.e. You set up an uint8_t array, add one, two, or three to the array's pointer and then typecast to a short, int, or long and try to access the offending result.) X86 systems will pretty much let you do this, albeit at a real performance penalty. SOME ARMv7 systems will let you do this- but most ARM, MIPS, Power, etc. will grouse at you over it.Coy
11

I agree with all the answers above. Here are my 2 cents regarding the BUS error:

A BUS error need not arise from the instructions within the program's code. This can happen when you are running a binary and during the execution, the binary is modified (overwritten by a build or deleted, etc.).

Verifying if this is the case

A simple way to check if this is the cause is by launching a couple of instances of the same binary from a build output directory, and running a build after they start. Both running instances would crash with a SIGBUS error shortly after the build has finished and replaced the binary (the one that both instances are currently running).

Underlying Reason

This is because the OS swaps memory pages and, in some cases, the binary might not be entirely loaded in memory. These crashes would occur when the OS tries to fetch the next page from the same binary, but the binary has changed since the last time it was read.

Shier answered 30/10, 2018 at 16:26 Comment(1)
Agreed, this is the most common cause of bus errors in my experience.Longo
10

On POSIX systems, you can also get the SIGBUS signal when a code page cannot be paged in for some reason.

Dichroism answered 18/10, 2008 at 17:52 Comment(2)
This often happens when I update the .so file while running the processColecolectomy
Another way this can happen is if you try to mmap a file larger than the size of /dev/shm.Reliquary
5

One classic instance of a bus error occurs on certain architectures, such as the SPARC (at least some SPARCs, maybe this has been changed), when you do a misaligned access. For instance:

unsigned char data[6];
*(unsigned int *) (data + 2) = 0xdeadf00d;

This snippet tries to write the 32-bit integer value 0xdeadf00d to an address that is (most likely) not properly aligned, and will generate a bus error on architectures that are "picky" in this regard. The Intel x86 is, by the way, not such an architecture. It would allow the access (albeit execute it more slowly).
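
A portable way to express the same store, without relying on the hardware tolerating misalignment, is to go through memcpy, which has no alignment requirement. (The cast in the snippet above is also undefined behaviour in C because of the aliasing rules, independent of whether the hardware traps.) A minimal sketch; the helper names store_u32 and load_u32 are made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at an arbitrarily aligned address. memcpy works
 * byte by byte as far as the language is concerned, so no aligned
 * 32-bit access is required from the hardware. */
void store_u32(unsigned char *dst, uint32_t value) {
    memcpy(dst, &value, sizeof value);
}

/* Load a 32-bit value from an arbitrarily aligned address. */
uint32_t load_u32(const unsigned char *src) {
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

With unsigned char data[6], store_u32(data + 2, 0xdeadf00d) writes bytes 2 to 5 in the machine's native byte order. Compilers typically turn these small fixed-size memcpy calls into a single load or store on platforms that permit it.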

Insolate answered 17/10, 2008 at 14:58 Comment(7)
Suppose I had data[8] instead; that is a multiple of 4 on a 32-bit architecture, so it is aligned. Would I still get the error? Also, is it a bad idea to do a data type conversion on pointers? Will it cause misalignment errors on a fragile architecture? Please elaborate, it will help me.Carreno
Heh. It's not so much type conversion as you're doing type conversion on a pointer that you've done pointer math on. Look carefully at the code above. The compiler has carefully dword aligned your pointer for data- and then you screw everything up on the compiler by offsetting the reference by TWO and typecasting to a very much needing to be dword aligned access on what's going to be a non-dword boundary.Coy
"Fragile" isn't the word I'd use for all of this. X86 machines and code have got people doing rather silly things for a while now, this being one of them. Rethink your code if you're having this sort of problem- it's not very performant on X86 to begin with.Coy
@Svartalf: On x86, word accesses on unaligned pointers are certainly slower than word accesses to aligned pointers, but at least historically they have been faster than simple code which unconditionally assembles things out of bytes, and they're certainly simpler than code which tries to use an optimal combination of varied-size operations. I wish the C standard would include means of packing/unpacking larger integer types to/from a sequence of smaller integers/characters so as to let the compiler use whatever approach is best on a given platform.Tightlipped
@Supercat: The thing is this- you get away with it on X86. You try this on ARM, MIPS, Power, etc. and you're going to get nasty things happening to you. On ARM less than Arch V7, you will have your code have an alignment failure- and on V7, you can, IF your runtime is set for it, handle it with a SEVERE performance hit. You just simply don't want to DO this. It's bad practices, to be blunt. :DCoy
@Svartalf: I'm well aware that platforms vary as to their treatment of unaligned accesses. I wish the C standard would define a means via which source code could specify "I want this type to behave as a pointer with __ alignment, to an unsigned integer stored using the __ lower bits of each of __ locations of type __, in __-first format", and let the compiler generate whatever machine code would be needed to accomplish that. A compiler may end up having to generate nasty inefficient code, but it wouldn't be any worse than what a programmer would have to write for portability. The difference...Tightlipped
...would be that on platforms where the stated requirements coincide with a natural processor behavior, the compiler could exploit that easily. If on e.g. x86 the programmer had requested MSB-first alignment, the processor may have to add byte-swap instructions after loads and before stores, but that would still be cheaper than four separate 8-bit stores.Tightlipped
5

A specific example of a bus error I just encountered while programming C on OS X:

#include <string.h>
#include <stdio.h>

int main(void)
{
    char buffer[120];
    fgets(buffer, sizeof buffer, stdin);
    strcat("foo", buffer);
    return 0;
}

In case you don't remember the docs, strcat appends the second argument to the first by modifying the first argument (flip the arguments and it works fine). On Linux this gives a segmentation fault (as expected), but on OS X it gives a bus error. Why? I really don't know.
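
If the intent was to produce "foo" followed by the input, the destination has to be a writable buffer that is large enough, not the literal itself. A minimal sketch under that assumption; the helper name prefix_with_foo is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "foo" + suffix in a caller-supplied, writable
 * buffer. Returns 0 on success, or -1 if the result (including the
 * terminating NUL) would not fit in dst_size bytes. */
int prefix_with_foo(char *dst, size_t dst_size, const char *suffix) {
    static const char prefix[] = "foo";

    /* sizeof prefix already counts the terminating NUL byte. */
    if (sizeof prefix + strlen(suffix) > dst_size)
        return -1;
    strcpy(dst, prefix); /* dst is writable, unlike the literal "foo" */
    strcat(dst, suffix);
    return 0;
}
```

In the program above this would be called with a second array, for example char out[128]; prefix_with_foo(out, sizeof out, buffer);, and the return value checked before using out.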

Frosting answered 8/10, 2014 at 16:7 Comment(2)
Probably stack overflow protection raises bus error.Dichroism
"foo" is stored in a read-only segment of memory, so it is impossible to write to it. It wouldn't be stack overflow protection, just memory write protection (this is a security hole if your program can rewrite itself).Endearment
3

I was getting a bus error when the root directory was at 100%.

Gentle answered 16/6, 2016 at 3:39 Comment(0)
3

Firstly, SIGBUS and SIGSEGV are not specific types of error but groups or families of errors. This is why you typically see a signal number (si_signo) and a signal code (si_code).

What exactly can cause them also depends on the OS and architecture.

Generally, we can say that a SIGSEGV is related to memory mappings (permissions, no mapping), i.e. an MMU error.

A SIGBUS is when the memory mapping succeeds and you hit an issue with the underlying memory system (out of memory, no memory at that location, alignment, SMMU prevents access, etc.), i.e. a bus error.

A SIGBUS can also occur with mmapped files if the file vanishes from the system, e.g. you mmap a file on removable media and it gets unplugged.

A good place to look on a platform is the siginfo.h header, to get an idea of the signal subtypes. For Linux, for example, this page provides an overview: https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/siginfo.h#L245

/*
 * SIGSEGV si_codes
 */
#define SEGV_MAPERR 1   /* address not mapped to object */
#define SEGV_ACCERR 2   /* invalid permissions for mapped object */
#define SEGV_BNDERR 3   /* failed address bound checks */
#ifdef __ia64__
# define __SEGV_PSTKOVF 4   /* paragraph stack overflow */
#else
# define SEGV_PKUERR    4   /* failed protection key checks */
#endif
#define SEGV_ACCADI 5   /* ADI not enabled for mapped object */
#define SEGV_ADIDERR    6   /* Disrupting MCD error */
#define SEGV_ADIPERR    7   /* Precise MCD exception */
#define SEGV_MTEAERR    8   /* Asynchronous ARM MTE error */
#define SEGV_MTESERR    9   /* Synchronous ARM MTE exception */
#define NSIGSEGV    9

/*
 * SIGBUS si_codes
 */
#define BUS_ADRALN  1   /* invalid address alignment */
#define BUS_ADRERR  2   /* non-existent physical address */
#define BUS_OBJERR  3   /* object specific hardware error */
/* hardware memory error consumed on a machine check: action required */
#define BUS_MCEERR_AR   4
/* hardware memory error detected in process but not consumed: action optional*/
#define BUS_MCEERR_AO   5
#define NSIGBUS     5

A final note: all signals can also be user generated, e.g. by kill. If a signal is user generated then the si_code is SI_USER, and other special sources get negative si_codes.

/*
 * si_code values
 * Digital reserves positive values for kernel-generated signals.
 */
#define SI_USER     0       /* sent by kill, sigsend, raise */
#define SI_KERNEL   0x80        /* sent by the kernel from somewhere */
#define SI_QUEUE    -1      /* sent by sigqueue */
#define SI_TIMER    -2      /* sent by timer expiration */
#define SI_MESGQ    -3      /* sent by real time mesq state change */
#define SI_ASYNCIO  -4      /* sent by AIO completion */
#define SI_SIGIO    -5      /* sent by queued SIGIO */
#define SI_TKILL    -6      /* sent by tkill system call */
#define SI_DETHREAD -7      /* sent by execve() killing subsidiary threads */
#define SI_ASYNCNL  -60     /* sent by glibc async name lookup completion */

#define SI_FROMUSER(siptr)  ((siptr)->si_code <= 0)
#define SI_FROMKERNEL(siptr)    ((siptr)->si_code > 0)
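
To see which of the si_code values listed above a crash actually carries, a process can install a handler with SA_SIGINFO and print the code before exiting. This is a minimal sketch that assumes a POSIX system and decodes only the POSIX-defined codes; the function names bus_code_name, segv_code_name and install_fault_reporter are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGBUS si_code to its symbolic name (POSIX-defined codes only). */
const char *bus_code_name(int code) {
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN";
    case BUS_ADRERR: return "BUS_ADRERR";
    case BUS_OBJERR: return "BUS_OBJERR";
    default:         return "other";
    }
}

/* Map a SIGSEGV si_code to its symbolic name (POSIX-defined codes only). */
const char *segv_code_name(int code) {
    switch (code) {
    case SEGV_MAPERR: return "SEGV_MAPERR";
    case SEGV_ACCERR: return "SEGV_ACCERR";
    default:          return "other";
    }
}

/* Handler: print the decoded si_code and terminate the process.
 * Only async-signal-safe calls are used, so write() instead of printf(). */
static void report_fault(int signo, siginfo_t *info, void *context) {
    const char *name = (signo == SIGBUS) ? bus_code_name(info->si_code)
                                         : segv_code_name(info->si_code);
    ssize_t ignored;

    (void)context;
    ignored = write(STDERR_FILENO, name, strlen(name));
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
    _exit(1); /* returning would just re-run the faulting instruction */
}

/* Install the reporter for SIGBUS and SIGSEGV. Returns 0 or -1. */
int install_fault_reporter(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
        return -1;
    return 0;
}
```

Calling install_fault_reporter() early in main is enough; a signal sent with kill arrives with si_code SI_USER and is reported as "other" by this sketch.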
Cask answered 4/7, 2021 at 14:26 Comment(0)
2

It depends on your OS, CPU, compiler, and possibly other factors.

In general, it means the CPU bus could not complete a command, or suffered a conflict, but that could mean a whole range of things, depending on the environment and code being run.

Asti answered 17/10, 2008 at 14:52 Comment(0)
2

It normally means an un-aligned access.

An attempt to access memory that isn't physically present would also give a bus error, but you won't see this if you're using a processor with an MMU and an OS that's not buggy, because you won't have any non-existent memory mapped to your process's address space.

Adactylous answered 17/10, 2008 at 14:57 Comment(1)
My i7 certainly has an MMU, but I still came across this error while learning C on OS X (passing uninitialized pointer to scanf). Does that mean that OS X Mavericks is buggy? What would have been the behavior on a non-buggy OS?Dibri
1

My reason for a bus error on Mac OS X was that I tried to allocate about 1 MB on the stack. This worked well in one thread, but with OpenMP it led to a bus error, because Mac OS X has a very limited stack size for non-main threads.
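
One way to sidestep the per-thread stack limit is to take large scratch buffers from the heap instead. The following is only a sketch of that idea, not the code from the original program; the function fill_and_sum and what it computes are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical worker that needs `count` doubles of scratch space.
 * The buffer comes from the heap, so its size is not limited by the
 * (possibly small) stack of the thread that calls this function.
 * Returns 0 on success and -1 if the allocation fails. */
int fill_and_sum(size_t count, double *sum_out) {
    /* calloc checks count * sizeof for overflow and zero-fills the block. */
    double *scratch = calloc(count, sizeof *scratch);
    double sum = 0.0;
    size_t i;

    if (scratch == NULL)
        return -1;
    for (i = 0; i < count; i++)
        scratch[i] = 1.0;
    for (i = 0; i < count; i++)
        sum += scratch[i];
    free(scratch);
    *sum_out = sum;
    return 0;
}
```

With count = 131072 the buffer is 1 MiB of doubles, roughly the size mentioned above, and the same call can be made from any thread.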

Indrawn answered 19/11, 2015 at 13:56 Comment(0)
0

For me, I accidentally triggered a "Bus Error" by not declaring that my assembly was heading back into the .text section. It might seem obvious but it had me stumped for a while.

Eg.

.globl _myGlobal # Allocate a 64-bit global with the value 2
.data
.align 3
_myGlobal:
.quad 2
.globl _main # Main function code
_main:
push %rbp

I was missing a .text directive when returning to code from data:

_myGlobal:
.quad 2
.text # <- This
.globl _main
_main:

Hope this ends up helpful to someone

Lamoree answered 21/6, 2021 at 20:57 Comment(0)
0

One notable cause is that SIGBUS is returned if you attempt to mmap a region of /dev/mem which userspace isn't allowed to access.

Gibeonite answered 25/10, 2021 at 21:52 Comment(0)
0

I was trying to free a string that was accidentally not heap-allocated (a string literal, which lives in static storage rather than on the heap):

#include <stdlib.h>

int main(void)
{
    char *str = "foo";
    free(str);
    return (EXIT_SUCCESS);
}

My fix was to strdup() the string literal, so that the pointer refers to heap memory that free() may release:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str = strdup("foo");
    free(str);
    return (EXIT_SUCCESS);
}
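
For the situation described in the comments on this answer, where several places assign to the same struct field and only some of the values are heap-allocated, one option is to make the field always own a heap copy. A minimal sketch under that assumption; struct record and both function names are hypothetical, and the copy is made with malloc and memcpy so it does not depend on strdup being declared.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct standing in for the one mentioned in the comments. */
struct record {
    char *name; /* always NULL or heap-allocated, so free() is always valid */
};

/* Replace the field with a private heap copy of `text`.
 * Returns 0 on success, or -1 (leaving the old value) if malloc fails. */
int record_set_name(struct record *r, const char *text) {
    size_t len = strlen(text) + 1; /* include the terminating NUL */
    char *copy = malloc(len);

    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    free(r->name); /* free(NULL) is a no-op */
    r->name = copy;
    return 0;
}

/* Release the field and leave it in a state that is safe to clear again. */
void record_clear(struct record *r) {
    free(r->name);
    r->name = NULL;
}
```

Callers can then pass literals, stack buffers or heap strings to record_set_name alike, and every later free operates on memory that came from malloc.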
Rainout answered 23/12, 2022 at 16:36 Comment(2)
Why would you duplicate it and then free it in the same function? Why wouldnt you just leave it on the stack and not free it?Pulsate
@user16217248 Because there were multiple spots assigning a string to the same field of a struct, and some of those were allocated and had to be freed in order to prevent having memory leaksRainout
-1

A typical buffer overflow that can result in a bus error is:

{
    char buf[255];
    sprintf(buf,"%s:%s\n", ifname, message);
}

Here, if the formatted output (ifname, the colon, message, and the newline) is longer than buf, sprintf writes past the end of buf, which may show up as a bus error.
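
A bounded variant uses snprintf, which never writes more than the given size and reports how long the full output would have been. A minimal sketch; the wrapper name format_line is made up for illustration:

```c
#include <stddef.h>
#include <stdio.h>

/* Format "ifname:message\n" into buf without ever writing past size bytes.
 * Returns 0 if the whole line fit, or -1 if it was truncated or
 * snprintf reported an error. */
int format_line(char *buf, size_t size, const char *ifname,
                const char *message) {
    int n = snprintf(buf, size, "%s:%s\n", ifname, message);

    /* n is the length the full output needs, excluding the NUL byte. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

On truncation the buffer still holds a NUL-terminated prefix of the line, so the caller can decide whether to log it, retry with a larger buffer, or drop it.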

Gravel answered 26/6, 2012 at 8:51 Comment(1)
Heh...if this were the case, you'd have BUS error concerns instead of the stack smashing exploits you read about all the time for Windows and other machines. BUS errors are caused by an attempt to access "memory" that the machine simply cannot access because the address is invalid. (Hence the term "BUS" error.) This can be due to a host of failings, including invalid alignments, and the like- so long as the processor can't place the address ON the bus lines.Coy
