There are a few different factors at play here that, I believe, make it possible for all of the statements you've made to be true.
To see why this is, let's start by thinking about how a decrease-key in a d-ary heap works (we don't need to talk about binary heaps separately, since a binary heap is just a 2-ary heap). When performing a decrease-key, we change the priority of a node in the tree, then repeatedly swap it up with its parent until it either hits the root of the tree or its priority is no longer smaller than its parent's priority. The number of swaps we have to do is, in the worst case, given by the height of the d-ary heap. Since each layer of a d-ary heap is d times wider than the layer above it, the height of a d-ary heap is O(log_d n) = O(log n / log d). This means that if you increase the value of d, the height of the d-ary heap will decrease, so decrease-keys and insertions will take less time. If you think about an extreme case, if you have a 10^100-ary heap, the number of layers in the tree will be over 300 times fewer than in a binary heap (since log_2 10^100 ≈ 332), so a decrease-key or insert will be several hundred times faster.
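To make the swapping concrete, here's a rough sketch of that sift-up step for a d-ary min-heap stored in a flat list. The names (`heap`, `sift_up`, `parent`) are just my own illustration, not anything standard:

```python
def parent(i, d):
    """Index of the parent of node i in a d-ary heap (root at index 0)."""
    return (i - 1) // d

def sift_up(heap, i, d):
    """After decreasing heap[i], swap it up until the heap property holds."""
    while i > 0 and heap[i] < heap[parent(i, d)]:
        p = parent(i, d)
        heap[i], heap[p] = heap[p], heap[i]
        i = p
```

Note that the loop runs at most once per layer of the tree, which is where the O(log n / log d) bound comes from.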
On the other hand, think about how a dequeue operation would work. To perform a dequeue, we swap the last leaf for the root, then repeatedly do the following: we scan across all the children of the current node, and if any of them are smaller than the current node, we swap the current node with the smallest of its children. Each of these iterations will require O(d) total comparisons to find the smallest child, and the number of iterations is given by the number of layers in the tree, which we've seen earlier is O(log n / log d). This means that the cost of a dequeue in a d-ary heap is O(d log n / log d). Since d grows much faster than log d (exponentially faster, in fact), as we increase d, the asymptotic - and actual - cost of a dequeue starts to rise. For example, in a 10^100-ary heap, you might have to compare each node to 10^100 children at each step, which is going to take a really long time! Therefore, d-ary heaps, as d gets larger and larger, tend to have much slower dequeues than binary heaps.
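Here's a hypothetical sketch of that dequeue, again for a d-ary min-heap in a flat list (the names `sift_down` and `dequeue` are mine). The O(d) scan over the children is the `min` call:

```python
def sift_down(heap, i, d):
    """Swap heap[i] down with its smallest child until the heap property holds."""
    n = len(heap)
    while True:
        # Children of node i occupy indices d*i + 1 through d*i + d.
        first = d * i + 1
        if first >= n:
            break  # node i is a leaf
        # O(d) scan to find the index of the smallest child.
        smallest = min(range(first, min(first + d, n)), key=heap.__getitem__)
        if heap[smallest] >= heap[i]:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def dequeue(heap, d):
    """Remove and return the minimum: move the last leaf to the root, sift down."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        sift_down(heap, 0, d)
    return top
```

Each trip through the `while` loop costs O(d) comparisons, and there are O(log n / log d) trips, giving the O(d log n / log d) dequeue cost described above.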
Now on to your last question: how is it still possible that a 4-ary heap would outperform a binary heap given the information here? I'm going to be perfectly honest and say that I have no idea if this is true, but that it (a) probably depends on the hardware and (b) wouldn't surprise me. Keep in mind that all of the preceding analyses tried to bound the cost of the d-ary heap operations by looking at quantities like the number of layers in the heap and the number of swaps made. However, this leaves out a lot of other factors, such as the cost of finding parents and children and locality of reference. For the first of these, note that in a d-ary heap, you can find your parent node by dividing your index by d. When d is a power of two, this can be implemented with a simple, inexpensive bitshift (since n / 2^k = n >> k). For values of d that aren't powers of two, this requires a division, which (on some architectures) is more expensive than a bit shift. Additionally, there's the impact of locality of reference. Computers these days have multiple layers of caches in memory, and accessing memory that is in cache can be hundreds of times faster than accessing memory that isn't. As you increase the value of d in a d-ary heap, there are fewer layers in the tree, and the d children of a node sit next to each other in the array, giving better locality. Finding the sweet spot probably requires some experimentation, and if it happens to be that d = 4 is the best on your machine, then go for it!
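To illustrate the power-of-two trick: for d = 2^k, the parent computation (i - 1) / d turns into a right shift by k. This is just the index arithmetic, with a hypothetical name:

```python
def parent_shift(i, k):
    """Parent index in a d-ary heap where d = 2**k: (i - 1) >> k == (i - 1) // 2**k."""
    return (i - 1) >> k
```

In a 4-ary heap (k = 2), for example, the parent of index 5 is (5 - 1) >> 2 = 1. Whether the compiler already turns the division into a shift for you depends on the language and on whether d is a compile-time constant.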
EDIT: as @moreON pointed out, for d = 4, the number of layers in the heap goes down by a factor of two while the number of comparisons per layer goes up by a factor of two, so the total number of comparisons in a dequeue stays roughly the same. Meanwhile, the number of swaps is halved, the parent computation is still a cheap shift, and the four children of each node sit in adjacent memory, which is friendly to the cache. That makes it a good candidate to outperform a binary heap.
Hope this helps!