ARC metadata is being evicted too early on Solaris ZFS

I have a Solaris 11.2 server running ZFS with the following configuration:

- 6x 4TB HDDs in raidz2 (approx. 14TB usable)
- 16GB RAM (ECC)
- E5-2670 (16 cores)
- No separate log (SLOG) or L2ARC cache devices
- No ZFS settings tweaks

Read and write performance are both blazing fast under iozone and in real-world usage, in excess of 700MB/sec sequential.

However, metadata reads are painfully slow. For example, one mount contains approx. 130k files, and running find on that directory takes approx. 5 minutes. As a workaround, I'm having to run find > files.txt once and then grep over the result (a 15MB file), as sketched below.
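
For illustration, the workaround looks roughly like this (the actual mount point isn't named above, so /tank/data is a placeholder):

find /tank/data > files.txt    # slow first pass: walks all ~130k inodes on disk
grep somefile files.txt        # later lookups scan the 15MB listing instead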

I am under the impression that metadata is kept in an in-memory LRU-style cache (the ARC) and is never evicted from it, so find should complete quickly. Or at least, that's the behaviour I would expect.

Are there any ZFS tweaks or other changes I could consider that would improve the performance of tools which rely heavily on metadata (such as find)? For example, could I force metadata to never be evicted from RAM?

Here are the ARC stats:

kstat -pn arcstats

zfs:0:arcstats:buf_size 79301640
zfs:0:arcstats:c        773376320
zfs:0:arcstats:c_max    16054530048
zfs:0:arcstats:c_min    67108864
zfs:0:arcstats:class    misc
zfs:0:arcstats:crtime   120.399677
zfs:0:arcstats:data_size        431609856
zfs:0:arcstats:deleted  412089
zfs:0:arcstats:demand_data_hits 523939
zfs:0:arcstats:demand_data_misses       10351
zfs:0:arcstats:demand_metadata_hits     2334880
zfs:0:arcstats:demand_metadata_misses   423667
zfs:0:arcstats:evict_l2_cached  0
zfs:0:arcstats:evict_l2_eligible        0
zfs:0:arcstats:evict_l2_ineligible      37581707264
zfs:0:arcstats:evict_prefetch   10717167616
zfs:0:arcstats:evicted_mfu      512746496
zfs:0:arcstats:evicted_mru      37068960768
zfs:0:arcstats:hash_chain_max   8
zfs:0:arcstats:hash_chains      53675
zfs:0:arcstats:hash_collisions  367438
zfs:0:arcstats:hash_elements    220238
zfs:0:arcstats:hash_elements_max        329038
zfs:0:arcstats:hits     4152502
zfs:0:arcstats:l2_abort_lowmem  0
zfs:0:arcstats:l2_cksum_bad     0
zfs:0:arcstats:l2_feeds 0
zfs:0:arcstats:l2_hdr_size      0
zfs:0:arcstats:l2_hits  0
zfs:0:arcstats:l2_imports       0
zfs:0:arcstats:l2_io_error      0
zfs:0:arcstats:l2_misses        434018
zfs:0:arcstats:l2_persistence_hits      0
zfs:0:arcstats:l2_read_bytes    0
zfs:0:arcstats:l2_rw_clash      0
zfs:0:arcstats:l2_size  0
zfs:0:arcstats:l2_write_bytes   0
zfs:0:arcstats:l2_writes_done   0
zfs:0:arcstats:l2_writes_error  0
zfs:0:arcstats:l2_writes_sent   0
zfs:0:arcstats:memory_throttle_count    8
zfs:0:arcstats:meta_limit       0
zfs:0:arcstats:meta_max 406053176
zfs:0:arcstats:meta_used        340106848
zfs:0:arcstats:mfu_ghost_hits   25036
zfs:0:arcstats:mfu_hits 1983081
zfs:0:arcstats:misses   1096150
zfs:0:arcstats:mru_ghost_hits   6868
zfs:0:arcstats:mru_hits 1063491
zfs:0:arcstats:mutex_miss       14476
zfs:0:arcstats:other_size       254914952
zfs:0:arcstats:p        47875874
zfs:0:arcstats:prefetch_behind_prefetch 505065
zfs:0:arcstats:prefetch_data_hits       1277533
zfs:0:arcstats:prefetch_data_misses     578922
zfs:0:arcstats:prefetch_joins   5131
zfs:0:arcstats:prefetch_meta_size       5890256
zfs:0:arcstats:prefetch_metadata_hits   16150
zfs:0:arcstats:prefetch_metadata_misses 83210
zfs:0:arcstats:prefetch_reads   73618
zfs:0:arcstats:prefetch_size    11386880
zfs:0:arcstats:rawdata_size     0
zfs:0:arcstats:size     771716704
zfs:0:arcstats:snaptime 321256.284490098

echo ::memstat | mdb -k

Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      726109              2.7G   17%
Defdump prealloc            258925           1011.4M    6%
ZFS Metadata                110595            432.0M    3%
ZFS File Data                43060            168.2M    1%
Anon                         44656            174.4M    1%
Exec and libs                 1015              3.9M    0%
Page cache                    5552             21.6M    0%
Free (cachelist)              5949             23.2M    0%
Free (freelist)            2583982              9.8G   62%
Total                      4185804             15.9G

echo ::arc | mdb -k

size                      =          822 MB
target size (c)           =          824 MB
target mru_size (p)       =           57 MB
c_min                     =           64 MB
c_max                     =        15310 MB
buf_size                  =           75 MB
data_size                 =          487 MB
other_size                =          256 MB
rawdata_size              =            0 MB
meta_used                 =          335 MB
meta_max                  =          387 MB
meta_limit                =            0 MB
memory_throttle_count     =            8
arc_no_grow               =            0
arc_tempreserve           =            0 MB
mfu_size                  =          213 MB
mru_size                  =          205 MB
Asked by Rosenfeld on 25/2/2018 at 13:25

Comments (11):
Penna: Post the find command that takes 5 minutes to run.
Rosenfeld: find > files.txt takes 5 minutes.
Jevon: The ARC is the RAM LRU cache for ZFS. Have you disabled it in some way? That's where metadata would be cached.
Rosenfeld: @Jevon Not AFAIK; I've pasted the kstat -pn arcstats output into the original post.
Rosenfeld: Okay, so if I run the find ./ command twice, the first run takes over 5 minutes, but the second completes in a few seconds. If I wait a few hours, though, it takes over 5 minutes again. I can also see the misses counter incrementing during the slow run, so my best guess is that metadata is being evicted from the ARC. Is there any way to force metadata to take priority over all other cached data?
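(For reference, one way to watch this live: kstat accepts an interval argument, so the relevant counters can be sampled every few seconds while find runs. The statistic names below are taken from the arcstats output above.)

kstat -p zfs:0:arcstats:demand_metadata_misses 5
kstat -p zfs:0:arcstats:evicted_mru 5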
Rosenfeld: I think I've found the problem: the ARC is not using anywhere near the maximum available memory and is artificially limiting itself to less than 1GB. But it's not clear why.
Penna: Post the contents of /etc/system.
Rosenfeld: /etc/system is empty, aside from the comments starting with *.
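(For reference, a quick way to confirm that, assuming the standard /etc/system convention that comment lines begin with *:)

grep -v '^\*' /etc/system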
Penna: This is the source code for how ZFS cleaned up metadata in the ARC back in the days of OpenSolaris: src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/… You can also just set the primarycache property to metadata, although that might kill other IO performance. You could also try setting zfs_prefetch_disable in /etc/system; the IO pattern on your system might be putting a lot of data into the ARC that is never read. See the sketch below. And what model of disks are you using?
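(For reference, the two suggestions above would look roughly like this; tank/data is a placeholder dataset name:)

zfs set primarycache=metadata tank/data

* in /etc/system, takes effect after a reboot:
set zfs:zfs_prefetch_disable = 1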
Rosenfeld: The disks are all the same model, "ATA-HGST HDN724040AL-A5E0-3.64TB". Changing primarycache=metadata did indeed resolve the problem, but that has obviously hurt other caching performance, because now only metadata is cached. Prefetch is something I'd like to keep enabled. The ideal situation would be for metadata to be given priority over all other data in the cache, but I can't find any config to make that happen.
Rosenfeld: One option I'd considered was manually assigning cache limits so metadata is always kept in memory, but it seems almost crazy that this isn't done by default.
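(For what it's worth, Solaris and illumos do expose ARC floor and ceiling tunables in /etc/system, which would at least stop the target size collapsing towards c_min as seen in the stats above; the values below are purely illustrative, and there is no widely documented knob that strictly pins metadata ahead of data:)

* keep the ARC target at or above 4GB (illustrative value)
set zfs:zfs_arc_min = 0x100000000
* cap the ARC at 12GB (illustrative value)
set zfs:zfs_arc_max = 0x300000000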
