I just started to use SS2 optimization of image processing, but for the 3 channel 24 bit color images have no idea. My pix data arranged by BGR BGR BGR ... ,unsigned char 8-bi, so if I want to implement the Color2Gray with SSE2/SSE3/SSE4's instruction C/C++ fun ,how would I do? Does need to align(4/8/16) for my pix data? I have read article:http://supercomputingblog.com/windows/image-processing-with-sse/ But it is ARGB 4 channel 32-bit color,exactly process 4 color pix data every time. Thanks!
//Assume the original pixel:
unsigned char* pDataColor=(unsigned char*)malloc(src.width*src.height*3);//3
//init pDataColor every pix val
// The dst pixel:
unsigned char* pDataGray=(unsigned char*)malloc(src.width*src.height*1);//1
//RGB->Gray: Y=0.212671*R + 0.715160*G + 0.072169*B