I think the main intuitive point is the one mentioned in the blog post you already linked:
"How do two random sets have a RI that is close to 1? The reason is due to the number of clusters. When there are a lot of clusters, there's a higher chance that a pair of items in both sets are in different clusters. This is still counted as a concordant event in the RI."
RI counts a pair of elements as a "success" if the two elements are in the same cluster in both partitions, or in different clusters in both partitions.
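A minimal brute-force sketch of this definition, assuming each partition is given as a list of cluster labels (the name `rand_index` is my own, not from the post or any library). It also checks the blog's claim: for two independent, uniformly random labelings into k clusters, a pair agrees with probability 1/k^2 + (1 - 1/k)^2, so the expected RI is 1 - 2/k + 2/k^2, which climbs toward 1 as k grows even though the labelings share no structure at all.

```python
import random
from fractions import Fraction
from itertools import combinations


def rand_index(x, y):
    """Fraction of element pairs on which label sequences x and y agree.

    A pair agrees if it is in the same cluster in both partitions or in
    different clusters in both partitions. Brute force: O(n^2) pairs.
    """
    n = len(x)
    agree = sum((x[i] == x[j]) == (y[i] == y[j])
                for i, j in combinations(range(n), 2))
    return Fraction(agree, n * (n - 1) // 2)


# Two independent, uniformly random labelings of the same 200 points.
rng = random.Random(0)
scores = {}
for k in (2, 10, 50):
    x = [rng.randrange(k) for _ in range(200)]
    y = [rng.randrange(k) for _ in range(200)]
    scores[k] = float(rand_index(x, y))
    print(k, round(scores[k], 3), "expected about", 1 - 2 / k + 2 / k**2)
```

Recent scikit-learn versions also provide `sklearn.metrics.rand_score` for the same quantity; the hand-rolled version is only meant to make the pair counting explicit.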
This notion of "success" can be inflated by chance simply by increasing the number of clusters in the partitions. For example, imagine a data set with 100 examples. Partition X divides it into 100 subsets, each with 1 data point. Partition Y divides it into 99 subsets: 98 with one data point each and 1 with two data points.
Plain RI would look almost perfect in this case. Any two points are always in different subsets in X, and they are also in different subsets in Y unless we happened to draw the two points from the special 99th subset that contains two items. Only 1 of the $\binom{100}{2} = 4950$ pairs disagrees, so RI = 4949/4950 ≈ 0.9998 (and if we make the data set larger than 100, we can make it arbitrarily close to 1).
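To check the arithmetic for this example, here is a small sketch that builds the two partitions as label lists and computes RI by brute force over all pairs (the helper names `rand_index` and `singleton_example` are hypothetical, chosen for illustration). The only disagreeing pair is points 0 and 1, so RI should equal 1 - 1/C(n, 2).

```python
from fractions import Fraction
from itertools import combinations


def rand_index(x, y):
    """Fraction of element pairs on which the two labelings agree."""
    n = len(x)
    agree = sum((x[i] == x[j]) == (y[i] == y[j])
                for i, j in combinations(range(n), 2))
    return Fraction(agree, n * (n - 1) // 2)


def singleton_example(n):
    """X: n singletons. Y: the same, except points 0 and 1 share a cluster."""
    x = list(range(n))             # every point gets its own label
    y = [0] + list(range(n - 1))   # [0, 0, 1, 2, ..., n-2]
    return x, y


for n in (10, 100, 1000):
    ri = rand_index(*singleton_example(n))
    print(n, ri, float(ri))        # n = 100 gives 4949/4950
```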
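But for ARI, every $n_{ij}$ term in the contingency table is 0 or 1 by definition (X puts every point in its own cluster), so $\sum_{ij}\binom{n_{ij}}{2} = 0$. Its chance-expected value is also 0, because X has no cluster with two or more points, so the numerator is exactly 0 and ARI = 0: agreement no better than chance, in sharp contrast to RI ≈ 1. This is basically driven by the fact that the only 'information' these partitions carry is the one subset of Y that has two data points ... so if X doesn't reproduce that, it has captured none of the relationships indicated by Y. (More generally, whenever every $n_{ij}$ is 0 or 1 and both partitions contain at least one cluster with two or more points, the numerator is strictly negative.)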
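A sketch of ARI in the usual Hubert and Arabie form, computed from contingency-table counts: with $a_i$ and $b_j$ the cluster sizes in X and Y, $E = \sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2} / \binom{n}{2}$ is the chance-expected value of $\sum_{ij}\binom{n_{ij}}{2}$, and ARI $= \left(\sum_{ij}\binom{n_{ij}}{2} - E\right) / \left(\tfrac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - E\right)$. The function name and the choice to raise an error in the 0/0 case are my own assumptions; scikit-learn's `sklearn.metrics.adjusted_rand_score` is a library alternative, though it treats degenerate inputs differently.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def adjusted_rand_index(x, y):
    """Hubert and Arabie's ARI from contingency-table counts (exact arithmetic).

    n_ij = number of points with label i in x and label j in y;
    a_i and b_j are the cluster sizes in x and y (row and column sums).
    """
    sum_nij = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_a = sum(comb(c, 2) for c in Counter(x).values())
    sum_b = sum(comb(c, 2) for c in Counter(y).values())
    expected = Fraction(sum_a * sum_b, comb(len(x), 2))
    max_index = Fraction(sum_a + sum_b, 2)
    if max_index == expected:
        # e.g. both partitions all singletons: ARI would be 0/0
        raise ValueError("ARI is undefined (0/0) for these partitions")
    return (sum_nij - expected) / (max_index - expected)


x = list(range(100))          # X: 100 singletons
y = [0] + list(range(99))     # Y: points 0 and 1 share label 0
# Every n_ij is 0 or 1, so sum_nij == 0; X has no pairs, so expected == 0 too.
print(adjusted_rand_index(x, y))  # 0
```

Relabeling clusters does not change the score: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.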
You can make this thought experiment more elaborate by thinking of X as 50 disjoint two-element subsets, and Y as a different collection of 50 disjoint two-element subsets. Here too, RI can look good purely by chance, because for most pairs the two elements are in different subsets in both X and Y. RI is only penalized for the pairs that belong together in X or in Y but not both (100 pairs, assuming X and Y share no pair), whereas the remaining $\binom{100}{2} - 100 = 4850$ pairs are marked as successfully placed into different groups in both X and Y, so RI = 4850/4950 ≈ 0.98. Again, simply making the data set larger pushes RI closer and closer to 1.
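The second thought experiment can be checked the same way. This sketch (hypothetical helper names, exact arithmetic with `Fraction`) builds X as the pairs (0,1), (2,3), ... and Y as the shifted pairs (1,2), (3,4), ..., (n-1,0), so the two partitions share no pair. RI is computed from the same contingency counts as ARI: a pair disagrees exactly when it is together in one partition but not the other.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def pair_sums(x, y):
    """Return (sum C(n_ij,2), sum C(a_i,2), sum C(b_j,2), C(n,2))."""
    s_ij = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    s_a = sum(comb(c, 2) for c in Counter(x).values())
    s_b = sum(comb(c, 2) for c in Counter(y).values())
    return s_ij, s_a, s_b, comb(len(x), 2)


def rand_index(x, y):
    s_ij, s_a, s_b, total = pair_sums(x, y)
    # Pairs together in x only, plus pairs together in y only, disagree.
    return 1 - Fraction(s_a + s_b - 2 * s_ij, total)


def adjusted_rand_index(x, y):
    s_ij, s_a, s_b, total = pair_sums(x, y)
    expected = Fraction(s_a * s_b, total)
    max_index = Fraction(s_a + s_b, 2)
    # Raises ZeroDivisionError in degenerate cases where max_index == expected.
    return (s_ij - expected) / (max_index - expected)


def shifted_pairs(n):
    """Hypothetical construction for even n: X and Y are disjoint pairings."""
    x = [i // 2 for i in range(n)]             # {0,1}, {2,3}, ...
    y = [((i + 1) % n) // 2 for i in range(n)]  # {n-1,0}, {1,2}, {3,4}, ...
    return x, y


for n in (100, 1000, 10000):
    x, y = shifted_pairs(n)
    print(n, float(rand_index(x, y)), float(adjusted_rand_index(x, y)))
```

For even n this construction gives RI = 1 - 2/(n-1) and ARI = -1/(n-2): as n grows, RI approaches 1 while ARI stays just below 0, which is the negative numerator case from the previous paragraph.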