I am very confused and could not find a convincing answer on the internet to the following question about data preprocessing for clustering.
According to the scikit-learn documentation, when we preprocess data with the library's built-in scaling function, and the data is formulated as an N x D
matrix where rows are the samples and columns are the features, we make the mean across the rows zero and, at the same time, the standard deviation across the rows unity, like the following:
X_scaled.mean(axis=0)
array([ 0., 0., 0.])
X_scaled.std(axis=0)
array([ 1., 1., 1.])
My question is: shouldn't we make the mean across the columns (features instead of samples) zero, and do the same for the standard deviation, since we are trying to standardize the features, not the samples? Websites and other resources always standardize across rows, but they never explain why.
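To make the question concrete, here is a minimal sketch I put together to check what axis=0 actually returns for an N x D array. The toy data is made up, and I standardize by hand with plain NumPy instead of calling the scikit-learn function, so this is only my assumption of what the library does, not its actual implementation.

```python
import numpy as np

# Toy data (made up for illustration): N = 4 samples (rows), D = 3 features (columns).
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
])

# axis=0 collapses the row dimension, so each result has one entry per column.
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# Hand-written standardization (assumed to mirror what the scaling function does).
X_scaled = (X - col_mean) / col_std

print(X_scaled.mean(axis=0).shape)               # one value per column (feature)
print(np.allclose(X_scaled.mean(axis=0), 0.0))   # means along axis=0 are ~0
print(np.allclose(X_scaled.std(axis=0), 1.0))    # stds along axis=0 are ~1
print(X_scaled.mean(axis=1).shape)               # axis=1 gives one value per row (sample)
```

If I read the shapes correctly, the axis=0 results have D entries (one per feature) and the axis=1 result has N entries (one per sample), which is part of what confuses me about the "across rows" wording.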