I believe the answer is dependent on the scenario.
Consider NN (neural network) as an operator F, so that F(input) = output. In the case where this relation is linear so that F(A * input) = A * output, then you might choose to either leave the input/output unnormalised in their raw forms, or normalise both to eliminate A. Obviously this linearity assumption is violated in classification tasks, or nearly any task that outputs a probability, where F(A * input) = 1 * output
In practice, normalisation allows non-fittable networks to be fittable, which is crucial to experimenters/programmers. Nevertheless, the precise impact of normalisation will depend not only on the network architecture/algorithm, but also on the statistical prior for the input and output.
What's more, NN is often implemented to solve very difficult problems in a black-box fashion, which means the underlying problem may have a very poor statistical formulation, making it hard to evaluate the impact of normalisation, causing the technical advantage (becoming fittable) to dominate over its impact on the statistics.
In statistical sense, normalisation removes variation that is believed to be non-causal in predicting the output, so as to prevent NN from learning this variation as a predictor (NN does not see this variation, hence cannot use it).