These are two different intrinsic names for the same machine instruction, a result of Intel and AMD each choosing their own. The instruction is the same on every CPU that supports it, and the intrinsics behave identically in C and C++.
The __popcnt*() builtins are for AMD's Advanced Bit Manipulation (ABM) instructions. See http://blogs.amd.com/developer/2007/09/26/barcelona-processor-feature-advanced-bit-manipulation-abm/
The _mm_popcnt_u*() intrinsics are for Intel's implementation, which isn't part of SSE4.2 per se, but was introduced around the same time. See http://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT
According to https://www.chessprogramming.org/Population_Count , both implementations are binary compatible, in spite of their different intrinsic names.
Intel's architecture manual states that:
Before an application attempts to use the POPCNT instruction, it must check that the
processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if
CPUID.01H:ECX.POPCNT[bit 23] = 1).
AMD's AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions says:
Support for the POPCNT instruction is indicated by ECX bit 23 (POPCNT) as returned by CPUID
function 0000_0001h. Software MUST check the CPUID bit once per program or library initialization
before using the POPCNT instruction, or inconsistent behavior may result.
I can't see any reason why popcnt would require the presence of SSE4.2, so I think that checking bit 23 of ECX is sufficient to determine popcnt's presence.
AMD's Barcelona, the first AMD CPU to have popcnt, didn't fully implement SSE4, so it's possible that the method Intel's architecture manual suggests for determining presence works on Intel CPUs but fails on AMD CPUs that do support popcnt.
Intel's current documentation for popcnt in their vol. 2 instruction-set reference manual only says "#UD If CPUID.01H:ECX.POPCNT [Bit 23] = 0", so the anti-competitive suggestion that would lead to software not taking advantage of popcnt on some AMD CPUs without SSE4.2 is gone.
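For illustration, the bit-23 check discussed above might look roughly like this. It is a minimal sketch, assuming GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; on MSVC you would use `__cpuid` from `<intrin.h>` instead. The helper name `cpu_has_popcnt` is made up for this example and is not part of any library.

```c
#include <cpuid.h>    /* GCC/Clang-specific header providing __get_cpuid (assumption: not MSVC) */
#include <stdbool.h>

/* Hypothetical helper: returns true if CPUID.01H:ECX.POPCNT [bit 23] is set.
 * It deliberately does NOT look at the SSE4.2 bit (bit 20). */
static bool cpu_has_popcnt(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if CPUID leaf 1 is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return ((ecx >> 23) & 1u) != 0;
}
```

Following AMD's wording, you would call something like this once at program or library initialization, cache the result, and fall back to a software bit count when it returns false.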