The link Chris gave describes some good methods for bit counting. I would suggest this method, as it is very fast and needs no loop, only bitwise operations, which should be easier to write in assembly.
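The linked method isn't reproduced in this text, so as a rough sketch I'm assuming it is the well-known parallel (SWAR) bit count. The function name `popcount32_swar` is mine, purely for illustration:

```c
#include <stdint.h>

/* Parallel (SWAR) bit count for a 32-bit value.
 * Assumption: this is the kind of loop-free method meant above.
 * The name popcount32_swar is hypothetical. */
uint32_t popcount32_swar(uint32_t v)
{
    /* Each 2-bit field now holds the number of set bits it contained. */
    v = v - ((v >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts into 4-bit fields. */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    /* Add neighbouring 4-bit counts into 8-bit fields. */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    /* The multiply sums the four byte counts into the top byte. */
    return (v * 0x01010101u) >> 24;
}
```

Every step is a shift, mask, add, subtract or multiply, so it should map almost line for line onto assembly. If the multiply is awkward on your target, the last step can be replaced with two more shift-and-add steps.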
Another way to get the assembly code is to write the code in C, compile it, and then look at the generated assembly. Most compilers can produce an assembly file (-S for gcc, but make sure to disable optimization with -O0 to get easier-to-understand code) or let you view the disassembled binary. It should point you in the right direction.
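For instance, you could feed the compiler something as simple as the loop version below. This is only a sketch: the file name `bitcount.c` and the function name `count_bits_loop` are made up for illustration, and you would need a gcc that targets MIPS to get MIPS output.

```c
/* bitcount.c (hypothetical file name)
 *
 * Compile to assembly instead of an object file with:
 *     gcc -S -O0 bitcount.c
 * which writes the assembly to bitcount.s.
 */

/* Straightforward loop: test the low bit, shift right, repeat. */
unsigned count_bits_loop(unsigned x)
{
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;   /* add 1 if the lowest bit is set */
        x >>= 1;       /* move the next bit into the lowest position */
    }
    return n;
}
```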
As an anecdote, I did some testing a while back on PowerPC (not MIPS, I know...) to find the quickest way to count bits in a 32-bit int. The method I linked was by far the best of the methods I tried, until I used a byte-sized lookup table and indexed it 4 times, once per byte. It would seem that the ALU work is slower than reading from the cache (this was with around a million numbers run through the algorithm).
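I no longer have that test code, so the following is only a sketch of what a byte-sized lookup table indexed 4 times could look like. The names `bits_in_byte` and `popcount32_table` are made up here, and the macros are just a compact way to fill in the 256 entries:

```c
#include <stdint.h>

/* bits_in_byte[i] is the number of set bits in the byte value i.
 * The B2/B4/B6 macros expand to all 256 entries at compile time.
 * Table and function names are hypothetical. */
static const uint8_t bits_in_byte[256] = {
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
    B6(0), B6(1), B6(1), B6(2)
};

/* Count the bits of a 32-bit value with four table lookups, one per byte. */
uint32_t popcount32_table(uint32_t v)
{
    return (uint32_t)bits_in_byte[v & 0xFFu]
         + bits_in_byte[(v >> 8) & 0xFFu]
         + bits_in_byte[(v >> 16) & 0xFFu]
         + bits_in_byte[v >> 24];
}
```

The table is only 256 bytes, so it should stay in cache when you run many values through it in a row. Whether that beats the bitwise version will depend on the CPU and on what else is competing for the cache, so it is worth measuring on your own target.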