This does not sound to be stack specific, but alignment in general. Perhaps think of the term integer multiple.
If you have items in memory that are a byte in size, units of 1, then lets just say they are all aligned. Things that are two bytes in size, then integers times 2 will be aligned, 0, 2, 4, 6, 8, etc. And non-integer multiples, 1, 3, 5, 7 will not be aligned. Items that are 4 bytes in size, integer multiples 0, 4, 8, 12, etc are aligned, 1,2,3,5,6,7, etc are not. Same goes for 8, 0,8,16,24 and 16 16,32,48,64, and so on.
What this means is you can look at the base address for the item and determine if it is aligned.
size in bytes, address in the form of
1, xxxxxxx
2, xxxxxx0
4, xxxxx00
8, xxxx000
16,xxx0000
32,xx00000
64,x000000
and so on
In the case of a compiler mixing in data with instructions in the .text segment it is fairly straightforward to align data as needed (well, depends on the architecture). But the stack is a runtime thing, the compiler cannot normally determine where the stack will be at run time. So at runtime if you have local variables that need to be aligned you would need to have the code adjust the stack programmatically.
Say for example you have two 8 byte items on the stack, 16 total bytes, and you really want them aligned (on 8 byte boundaries). On entry the function would subtract 16 from the stack pointer as usual to make room for these two items. But to align them there would need to be more code. If we wanted these two 8 byte items aligned on 8 byte boundaries and the stack pointer after subtracting 16 was 0xFF82, well the lower 3 bits are not 0 so it is not aligned. The lower three bits are 0b010. In a generic sense we want to subtract 2 from the 0xFF82 to get 0xFF80. How we determine it is a 2 would be by anding with 0b111 (0x7) and subtracting that amount. That means to alu operations an and and a subtract. But we can take a shortcut if we and with the ones complement value of 0x7 (~0x7 = 0xFFFF...FFF8) we get 0xFF80 using one alu operation (so long as the compiler and processor have a single opcode way to do that, if not it may cost you more than the and and subtract).
This appears to be what your program was doing. Anding with -16 is the same as anding with 0xFFFF....FFF0, resulting in an address that is aligned on a 16 byte boundary.
So to wrap this up, if you have something like a typical stack pointer that works its way down memory from higher addresses to lower addresses, then you want to
sp = sp & (~(n-1))
where n is the number of bytes to align (must be powers but that is okay most alignment usually involves powers of two). If you have say done a malloc (addresses increase from low to high) and want to align the address of something (remember to malloc more than you need by at least the alignment size) then
if(ptr&(~(n-)) { ptr = (ptr+n)&(~(n-1)); }
Or if you want just take the if out there and perform the add and mask every time.
many/most non-x86 architectures have alignment rules and requirements. x86 is overly flexible as far as the instruction set goes, but as far as execution goes you can/will pay a penalty for unaligned accesses on an x86, so even though you can do it you should strive to stay aligned as you would with any other architecture. Perhaps that is what this code was doing.