Here's an argument for why it's O(N).
Assume it's a full heap, so every non-leaf node has two children. (It still works even if that's not the case, but it's more annoying.)
Put a coin on each node in the tree. Each time we do a swap, we're going to spend one of those coins. (Note that when elements swap in the heap, the coins don't swap with them.) If we run MAX-HEAPIFY, and there's any coins left over, that means we've done fewer swaps than there are nodes in the tree, and thus MAX-HEAPIFY performs O(N) swaps.
Claim: after MAX-HEAPIFY is done running, a heap will always have at least one path from the root to a leaf with coins on every node of the path.
Proof by induction: For a single-node heap, we don't need to do any swaps, so we don't need to spend any coins. Thus, the one node gets to keep its coin, and we have a full path from root to leaf (of length 1) with coin intact.
Now, assume we have a heap with left and right subheaps, and MAX-HEAPIFY has already run on both. By inductive hypothesis, each has at least one path from root to leaf with coins on it, so we have at least two root-to-leaf paths with coins, one for each child. The farthest the root would ever need to go in order to establish the MAX-HEAP property is to swap all the way to the bottom of the tree. Let's say it swaps down into the left subtree, and it swaps all the way to down to the bottom. For each swap, we need to spend the coin, so we spend it from the node that the root swapped to.
In doing this, we spent all the coins on one of the root-to-leaf paths, but remember we originally had at least two! Therefore, we still have a root-to-leaf path complete with coins after MAX-HEAPIFY runs on the whole heap. Therefore, MAX-HEAPIFY spent fewer coins than there are nodes in the tree. Therefore, the number of swaps is O(N). QED.