Go to http://infocenter.arm.com. Along the left, under Contents, look for ARM architecture, and under that Reference Manuals. There used to be a single ARM ARM (ARM Architecture Reference Manual), but the family has grown to the point that they had to break it into, well, families.
The ARM ARMs are going to show you the instruction sets. What I think they call the ARMv5 manual is the old ARM ARM. In it you will find the ARM instructions (32 bit) and thumb instructions (16 bit). For each instruction they list which architectures support it, so you might see an ARMv5 instruction that is not supported by ARMv4 (ARMv4T being the architecture of the ARM7 family, like the popular ARM7TDMI core). Thumb instructions are supported by ARMv4T and newer, etc.
So there is the core 32 bit ARM instruction set, which you may already be used to, with new instructions added from time to time and bugs/restrictions fixed (ldr r0,[r0] for example), etc.
The floating point unit has had one or two overhauls. Most cores do not have an fpu, and where a core does offer one, that doesn't mean the chip vendor included it in the chip. The fpa is the older one, vfp is newer, and now there is the neon stuff. If you pay attention, these all fall into the generic coprocessor instruction category. But you don't have to know/use the coprocessor versions; they have aliases for everything.
There is/was this java/jazelle thing. Same story: some cores might have it as an option, which doesn't mean the vendor included it.
There are at least two sets of thumb2 extensions to the thumb instruction set. Before the thumb2 extensions, the thumb instructions were all 16 bit and had a one to one mapping to an ARM instruction. Makes sense: you only need an ARM core, the decoder translates the smaller instruction to an ARM instruction and feeds that to the core. All instructions are 16 bit except the branch (bl), and if you look at that pattern you can quite easily decode it as two separate 16 bit instructions.

So then they decided to make their microcontroller offering smaller. Instead of everyone just using the ARM7TDMI and paying for it in chip size and power, these thumb2 based microcontroller cores are thumb only: they do not support 32 bit ARM instructions, there is no ARM core that thumb instructions are translated to, etc. New core. ARMv6-M (a.k.a. Cortex-m0 and Cortex-m1) takes the thumb instruction set and adds a few 32 bit instructions to close the performance gap to ARM (thumb was smaller, yes, but a little slower than ARM if you compiled the same code for both; it took like 10-20% more instructions in my experiments to use thumb). In theory thumb2 (ARMv7-M) outperforms ARM when and where you can compare them. For whatever reason the Cortex-m3 came out first; it is ARMv7-M and has a bunch of 32 bit thumb2 instructions added to the thumb instruction set. I recently counted: ARMv6-M added like 20, and ARMv7-M has like 140-150 instructions added to the base thumb instruction set.

thumb2 is basically variable word length. And again, on the cortex-m series it is all you get (the bigger ARMv7 cores support thumb2 as well, alongside ARM). Looking at it, it is almost like they re-built the ARM instruction set under the name thumb. Not completely, but you get back a lot of ARM-like instructions: three registers instead of two, being able to reach the higher registers and use immediates, etc.

What this caused is a desire to write asm that assembles for both ARM and thumb/thumb2. So they came up with a unified syntax. You can write an instruction like
add r0,r1
If assembling for thumb, that is the instruction; if assembling for ARM, they will convert it to
add r0,r0,r1
for you, instead of giving you a syntax error. You have to specify that you are using the unified syntax (the .syntax unified directive), at least with the GNU binutils assembler (gas).
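Going back to the "variable word length" point: on a thumb2 core you can tell from the first halfword whether an instruction is 16 or 32 bits long. As I read the ARMv6-M/ARMv7-M manuals, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32 bit instruction, and anything else is a complete 16 bit instruction. Here is a minimal sketch of that rule in Python. The function names are made up for illustration, and it only measures length; it does not decode anything.

```python
# Sketch only: splits a stream of thumb/thumb2 halfwords into instructions
# using the "top five bits" rule. Function names are hypothetical.

def thumb_instruction_size(halfword):
    """Return 2 or 4, the size in bytes of the instruction that starts here."""
    top5 = (halfword >> 11) & 0x1F
    # 0b11101, 0b11110, 0b11111 mark the first half of a 32 bit instruction.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

def split_stream(halfwords):
    """Group a list of halfwords into tuples, one tuple per instruction."""
    out = []
    i = 0
    while i < len(halfwords):
        n = thumb_instruction_size(halfwords[i]) // 2  # halfwords to consume
        out.append(tuple(halfwords[i:i + n]))
        i += n
    return out

if __name__ == "__main__":
    # add r0,r1 ; bl (two halfwords) ; nop
    stream = [0x4408, 0xF000, 0xF800, 0xBF00]
    for insn in split_stream(stream):
        print(" ".join("%04X" % h for h in insn))
```

Note that the two halves of the old thumb bl fall into those same bit patterns, which is why that pair already looked like a 32 bit instruction before thumb2 came along.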
An equally important set of documents is the Technical Reference Manuals (TRMs), also at infocenter.arm.com. Each core has a TRM; actually, each rev of each core has a TRM. The extra cost items like L2 caches have their own TRMs too, one for each rev. It is important to find out which core the chip vendor bought/used and, if possible, the revision (rev 2.0 is r2p0, rev 1.0 is r1p0, etc.), as there are programming differences as well as errata differences between them. (Don't trust Linux as a reference! It is a huge mess; every time I look, yet another company has completely misunderstood and misapplied core/errata differences. It is a bit of a disaster at the moment.)

Sometimes the TRM includes instruction information, or paints a clearer picture of what that core supports and doesn't support. The ARM ARMs are generic, they cover the whole family or a number of families of cores, whereas the TRM is very specific to one core.

An example of confusion between the ARM ARM and the TRM: looking at the ARM ARM you might get the impression that you can use either the BE-32 or the BE-8 big endian mode. The reality is you have one or the other. ARMv6 and newer is BE-8, period, get used to it. ARMv5 and ARMv4 are BE-32, which before ARMv6 was just called big endian. I highly recommend NOT using big endian on an ARM, despite what you think you might gain from it. Go with the native mode and you will save yourself a ton of work and failure. I mention it from personal experience, trying to figure out why the bits described in an ARM ARM just didn't work in the core I was using.
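To make the BE-32 vs BE-8 difference a bit more concrete, here is a simplified model in Python. It assumes the usual description of the two modes: BE-32 is word invariant (an aligned word reads the same as in little endian, byte addresses within the word are flipped) and BE-8 is byte invariant (each byte stays at its address, words are assembled most significant byte first). The memory image and function names are invented for the example, and this is a sketch of the idea, not of any particular core's bus.

```python
import struct

# Hypothetical 4 byte memory image, indexed the way a little endian
# observer would address it (an assumption of this simplified model).
MEM = bytes([0x11, 0x22, 0x33, 0x44])

def le_load_word(mem, addr):
    """Little endian reference: byte at the lowest address is least significant."""
    return struct.unpack_from("<I", mem, addr)[0]

def be8_load_byte(mem, addr):
    """BE-8 (byte invariant): a byte lives at the same address as in little endian."""
    return mem[addr]

def be8_load_word(mem, addr):
    """BE-8: the word is assembled with the lowest address as most significant byte."""
    return struct.unpack_from(">I", mem, addr)[0]

def be32_load_byte(mem, addr):
    """BE-32 (word invariant): byte address bits [1:0] are flipped within the word."""
    return mem[addr ^ 3]

def be32_load_word(mem, addr):
    """BE-32: an aligned word reads the same value as in little endian."""
    return struct.unpack_from("<I", mem, addr)[0]

if __name__ == "__main__":
    print("LE    word 0x%08X" % le_load_word(MEM, 0))
    print("BE-8  word 0x%08X byte0 0x%02X" % (be8_load_word(MEM, 0), be8_load_byte(MEM, 0)))
    print("BE-32 word 0x%08X byte0 0x%02X" % (be32_load_word(MEM, 0), be32_load_byte(MEM, 0)))
```

In both big endian flavours byte 0 comes out as the most significant byte of the word at address 0, which is what makes them big endian. They disagree on which physical byte that is, and that is the kind of thing that bites you when the doc and the core don't match.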
A 64 bit core is somewhere in the development phase; I wouldn't be surprised if it is done and just waiting for someone to pull the trigger and use it. Actually the ARMv8 doc is available, downloading now.
Short answer: go to infocenter.arm.com. Under ARM Architecture you will find all the docs describing the different instruction sets, as well as the improvements/additions made to those instruction sets over time.