For armv7 ISA (and variants)
The NEON is a SIMD and parallel data processing unit for integer and floating point data and the VFP is a fully IEEE-754 compatible floating point unit. In particular on the A8, the NEON unit is much faster for just about everything, even if you don't have highly parallel data, since the VFP is non-pipelined.
So why would you ever use the VFP?!
The most major difference is that the VFP provides double precision floating point.
Secondly, there are some specialized instructions that that VFP offers that there are no equivalent implementations for in the NEON unit. SQRT comes to mind, perhaps some type conversions.
But the most important difference not mentioned in Cosmin's answer is that the NEON floating point pipeline is not entirely IEEE-754 compliant. The best description of the differences are in the FPSCR Register Description.
Because it is not IEEE-754 compliant, a compiler cannot generate these instructions unless you tell the compiler that you are not interested in full compliance. This can be done in several ways.
- Using an intrinsic function to force NEON usage, for example see the GCC Neon Intrinsic Function List.
- Ask the compiler, very nicely. Even newer GCC versions with
-mfpu=neon
will not generate floating point NEON instructions unless you also specify -funsafe-math-optimizations
.
For armv8+ ISA (and variants) [Update]
NEON is now fully IEE-754 compliant, and from a programmer (and compiler's) point of view, there is actually not too much difference. Double precision has been vectorized. From a micro-architecture point of view I kind of doubt they are even different hardware units. ARM does document scalar and vector instructions separately but both are part of "Advanced SIMD."