The reason is that clipping in general is a pretty hard constraint in a mathematical sense, not in a sense of implementation complexity. If you check original WGAN paper, you'll notice that clip procedure inputs model's weights and some hyperparameter c, which controls range for clipping.
If c is small then weights would be severely clipped to a tiny values range. The question is how to determine an appropriate c value. It depends on your model, dataset in a question, training procedure and so on and so forth. So why not to try soft penalizing instead of hard clipping? That's why WGAN-GP paper introduces additional constraint to a loss function that forces gradient's norm to be as much close to 1 as possible, avoiding hard collapsing to a predefined values.