The history of cache line sizes is a bit convoluted (as with many microarchitectural parameters). Originally, the cache line size was made to match the width of the processor's data bus. The thinking was that if a read or write went out on the bus anyway, it might as well fill the full bus width.
As caches got bigger, the sizes of cache lines increased for a few reasons:
- To take better advantage of spatial locality in certain cases.
- To keep indexing overhead low <--- this one is actually pretty important.
The larger the cache line size, the fewer lines you need to keep track of for a cache of the same total capacity. For larger caches (multi-MB) this can reduce the lookup/compare times.
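To make that concrete, here's a back-of-the-envelope sketch (the 1 MB capacity and 16-way associativity are just made-up illustration numbers, not any particular design) showing how the number of lines, and therefore the size of the tag array you have to store and search, drops as the line size grows:

```c
#include <stdio.h>

int main(void) {
    /* Made-up illustration values: a 1 MB, 16-way set-associative cache. */
    const unsigned capacity = 1u << 20;
    const unsigned ways     = 16;
    const unsigned line_sizes[] = {32, 64, 128};

    for (unsigned i = 0; i < 3; i++) {
        unsigned line  = line_sizes[i];
        unsigned lines = capacity / line;  /* total lines = tags the cache must store */
        unsigned sets  = lines / ways;     /* rows in the tag/data arrays             */
        printf("%3uB lines -> %6u lines to track, %4u sets\n", line, lines, sets);
    }
    return 0;
}
```

Doubling the line size halves the number of tags you store and index, which is part of why big LLCs lean toward bigger lines.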
There are also some performance advantages (depending on the workload) to a larger cache line size. But it's not entirely clear that it's always a win (look at Spec2k17, for example). Sometimes a larger cache line introduces more waste because the program has low spatial locality.
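As a hand-wavy illustration of that waste, imagine a loop that only touches one 4-byte value out of every 256 bytes (the array size and stride here are arbitrary): with 64B lines only 4 of every 64 bytes fetched get used, and with 128B lines only 4 of every 128.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n_bytes = 64u << 20;   /* 64 MB, comfortably bigger than the LLC */
    const size_t stride  = 256;         /* skip past whole cache lines each step  */
    int *data = calloc(n_bytes, 1);
    if (!data) return 1;

    long sum = 0;
    for (size_t off = 0; off + sizeof(int) <= n_bytes; off += stride)
        sum += data[off / sizeof(int)]; /* one useful word per line that gets filled */

    printf("sum = %ld\n", sum);
    free(data);
    return 0;
}
```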
Note that you don't need to have a single cache line size for all levels of cache. You could have 32B cache lines for the L1, 64B for the L2, and 128B for the L3/LLC if you wanted to. It's more work to keep track of partial lines, but it lets you utilize each level of cache effectively.
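One way that partial-line bookkeeping is sometimes handled is with sector (sub-block) valid/dirty bits. The layout below is purely a hypothetical sketch of the idea, not any real design:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch: a 128B L3 line built from four 32B sectors, so the L3 can
 * record which L1-sized chunks are actually present/dirty even though
 * the levels use different line sizes. Field layout is illustrative only. */
struct l3_line {
    uint64_t tag;
    uint8_t  sector_valid;   /* bit i set -> 32B sector i holds valid data */
    uint8_t  sector_dirty;   /* bit i set -> sector i modified vs. memory  */
    uint8_t  data[128];      /* four 32B sectors = one 128B L3 line        */
};

/* Is the 32B chunk containing byte offset 'off' present in this line? */
bool sector_present(const struct l3_line *line, unsigned off) {
    return line->sector_valid & (1u << (off / 32));
}
```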