I know this is an old question with a selected answer but didnt see anyone explain the answer to what is the difference between aligned and unaligned memory access...
Be it dram or sram or flash or other. Take an sram as a simple example it is built out of bits a specific sram will be built out of a fixed number of bits wide and a fixed number of rows deep. lets say 32 bits wide and several/many rows deep.
if I do a 32 bit write to address 0x0000 in this sram, the memory controller around this sram can simply do a single write cycle to row 0.
if I do a 32 bit write to address 0x0001 in this sram, assuming that is allowed, the controller will need to do a read of row 0, modify three of the bytes, preserving one, and write that to row 0, then read row 1 modify one byte leaving the other three as found and write that back. which bytes get modified or not have to do with endianness for the system.
The former is aligned and the latter unaligned, clearly a performance difference plus need the extra logic to be able to do the four memory cycles and merge the byte lanes.
If I were to read 32 bits from address 0x0000 then a single read of row 0, done. But read from 0x0001 and I have to do two reads row0 and row1 and depending on the system design just send those 64 bits back to the processor possibly two bus clocks instead of one. or the memory controller has the extra logic so that the 32 bits are aligned on the data bus in one bus cycle.
16 bit reads are a little better, a read from 0x0000, 0x0001 and 0x0002 would only be a read from row0 and could based on the system/processor design send those 32 bits back and the processor extracts them or shift them in the memory controller so that they land on specific byte lanes so the processor doesnt have to rotate around. One or the other has to if not both. A read from 0x0003 though is like above you have to read row 0 and row1 as one of your bytes is in each and then either send 64 bits back for the processor to extract or the memory controller combines the bits into one 32 bit bus response (assuming the bus between the processor and memory controller is 32 bits wide for these examples).
A 16 bit write though always ends up with at least one read-modify-write in this example sram, address 0x0000, 0x0001 and 0x0002 read row0 modify two bytes and write back. address 0x0003 read two rows modify one byte each and write back.
8 bit you only need to read one row containing that byte, writes though are a read-modify-write of one row.
The armv4 didnt like unaligned although you could disable the trap and the result is not like you would expect above, not important, current arms allow unaligned and give you the above behavior you can change a bit in a control register and then it will abort unaligned transfers. mips used to not allow, not sure what they do now. x86, 68K etc, was allowed and the memory controller may have had to do the most work.
The designs that dont permit it clearly are for performance and less logic at what some would say is a burden on the programmers others might say it is no extra work on the programmer or easier on the programmer. aligned or not you can also see why it can be better to not try to save any memory by making 8 bit variables but go ahead and burn a 32 bit word or whatever the natural size of a register or the bus is. It may help your performance at a small cost of some bytes. Not to mention the extra code the compiler would need to add to make the lets say 32 bit register mimic an 8 bit variable, masking and sometimes sign extension. Where using register native sizes those additional instructions are not required. You can also pack multiple things into a bus/memory wide location and do one memory cycle to collect or write them then use some extra instructions to manipulate between registers not costing ram and a possible wash on the number of instructions.
I dont agree that the compiler will always align the data right for the target, there are ways to break that. And if the target doesnt support unaligned you will hit the fault. Programmers would never need to talk about this if the compiler always did it right based on any legal code you could come up with, there would be no reason for this question unless it was for performance. if you dont control the void ptr address to be aligned or not then you have to use the mem2() unaligned access all the time or you have to do an if-then-else in your code based on the value of the ptr as nik pointed out. by declaring as void the C compiler now has no way to correctly deal with your alignment and it wont be guaranteed. if you take a char *prt and feed it to these functions all bets are off on the compiler getting it right without you adding extra code either buried in the mem2() function or outside these two functions. so as written in your question mem2() is the only correct answer.
DRAM say used in your desktop/laptop tends to be 64 or 72 (with ecc) bits wide, and every access to them is aligned. Even though the memory sticks are actually made up of 8 bit wide or 16 or 32 bit wide chips. (this may be changing with phones/tablets for various reasons) the memory controller and ideally at least one cache sits in front of this dram so that the unaligned or even aligned accesses that are smaller than the bus width read-modify-writes are dealt with in the cache sram which is way faster, and the dram accesses are all aligned full bus width accesses. If you have no cache in front of the dram and the controller is designed for full width accesses then that is the worst performance, if designed for lighting up the byte lanes separately (assuming 8 bit wide chips) then you dont have the read-modify-writes but a more complicated controller. if the typical use case is with a cache (if there is one in the design) then it may not make sense to have that additional work in the controller for each byte lane, but have it just know how to do full bus width sized transfers or multiples of.