"Strong regularization such as maxout or dropout is applied to obtain the best results on this dataset. In this paper, we use no maxout/dropout and just simply impose regularization via deep and thin architectures by design, without distracting from the focus on the difficulties of optimization. But combining with stronger regularization may improve results, which we will study in the future." [He et. al, Deep Residual Learning for Image Recognition]
I think the regularization the authors refer to, the kind applied directly within the ResNet architecture, comes from the batch norm layers that are sandwiched between every conv layer and its activation. While the authors don't discuss L2 regularization in this passage, their point about maxout and dropout ought to apply as well: BN layers have the effect of regularizing the network without imposing an explicit penalty, so L2 regularization isn't strictly necessary.
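As a minimal sketch of that ordering (assuming a PyTorch implementation; `conv_bn_relu` is just an illustrative helper name, not something from the paper or repo), the batch norm layer sits between each convolution and its ReLU:

```python
import torch.nn as nn

def conv_bn_relu(in_channels: int, out_channels: int, stride: int = 1) -> nn.Sequential:
    # Conv -> BN -> ReLU: batch norm is sandwiched between the convolution
    # and its activation, matching the ordering described above.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3,
                  stride=stride, padding=1, bias=False),  # bias is redundant before BN
        nn.BatchNorm2d(out_channels),                      # implicit regularization
        nn.ReLU(inplace=True),
    )
```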
That said, the option is there in case you want to try out stronger regularization.
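If you do want to try it, one common way to add explicit L2 regularization is the optimizer's weight decay term. A hypothetical PyTorch sketch (the tiny conv stack here is only a stand-in model):

```python
import torch
import torch.nn as nn

# Stand-in model: a single conv -> BN -> ReLU stage like the one above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)

# weight_decay applies an L2 penalty to the weights; set it to 0.0 to rely
# on batch norm's implicit regularization alone.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
```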