Multilabel classification typically means "many binary labels". With that definition in mind, cross entropy with softmax is not appropriate for multilabel classification. The document in the second link you provide talks about multiclass problems, not multilabel problems. Cross entropy with softmax is appropriate for multiclass classification. For multilabel classification a common choice is the sum of the binary cross entropies for each label. The binary cross entropy can be computed with `Logistic` in BrainScript or with `binary_cross_entropy` in Python.
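
As a minimal sketch of the multilabel case in the Python API (the input dimension and number of labels are made up for illustration): a sigmoid output layer produces one independent probability per label, and `binary_cross_entropy` compares it against a 0/1 target vector.

    import cntk as C

    input_dim = 100     # hypothetical feature dimension
    num_labels = 5      # hypothetical number of binary labels

    features = C.input_variable(input_dim)
    targets = C.input_variable(num_labels)   # each entry is 0 or 1

    # One sigmoid unit per label, so each label is predicted independently
    z = C.layers.Dense(num_labels, activation=C.sigmoid)(features)

    # Binary cross entropy between the predicted probabilities and the 0/1 targets
    loss = C.binary_cross_entropy(z, targets)
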
If, on the other hand, you have a problem with many multiclass labels, then you can use cross_entropy_with_softmax for each of them and CNTK will automatically sum all these loss values.
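
For that case, here is a sketch (again with made-up dimensions) of two multiclass heads sharing a hidden layer, whose `cross_entropy_with_softmax` losses are combined into a single training criterion; `cross_entropy_with_softmax` takes the raw logits and applies the softmax internally.

    import cntk as C

    features = C.input_variable(100)   # hypothetical feature dimension
    labels_a = C.input_variable(10)    # one-hot target for the first multiclass label
    labels_b = C.input_variable(4)     # one-hot target for the second multiclass label

    hidden = C.layers.Dense(64, activation=C.relu)(features)
    out_a = C.layers.Dense(10)(hidden)   # logits for the first label
    out_b = C.layers.Dense(4)(hidden)    # logits for the second label

    # Combine the per-label losses into one criterion
    loss = C.cross_entropy_with_softmax(out_a, labels_a) + \
           C.cross_entropy_with_softmax(out_b, labels_b)
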