I have a Solaris 11.2 server running ZFS with the following configuration:
6x 4TB HDDs in raidz2 (approx 14TB usable)
16GB RAM (ECC)
E5-2670 (16 cores)
No ARC or L2ARC
No zfs settings tweaks
Both read and write performance are blazing fast under both iozone and real world usage, in excess of 700MB/sec sequential.
However, metadata reads are painfully slow. For example, I have a mount which contains approx 130k files; running find on this dir takes approx 5 minutes. As a workaround, I'm having to use find > files.txt and then grep over that (it outputs a 15MB file).
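The workaround above can be sketched roughly as follows. This is only an illustration: the function names refresh_index and search_index and the paths in the usage comment are hypothetical, not something from my actual setup.

```shell
#!/bin/sh
# Sketch of the "find once, grep many times" workaround.
# Function names and paths are hypothetical.

# refresh_index DIR INDEX
# Walk DIR once (the slow, metadata-heavy step) and store the listing.
# The listing is written to a temp file first so that a half-written
# index is never searched.
refresh_index() {
    find "$1" -print > "$2.tmp" && mv "$2.tmp" "$2"
}

# search_index PATTERN INDEX
# Search the stored listing instead of walking the filesystem again.
search_index() {
    grep -- "$1" "$2"
}

# Example usage (paths are placeholders):
#   refresh_index /tank/data /var/tmp/files.txt
#   search_index '\.log$' /var/tmp/files.txt
```

The index obviously goes stale as files change, which is why this is only a stopgap.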
I am under the impression that metadata should be stored in the LRU (in RAM) and that it will never be evicted from the LRU, thus find should complete quickly. Or at least, that's the behaviour I would expect.
Are there any ZFS tweaks or other changes I could consider that would improve performance of tools which rely heavily on metadata (such as find)? For example, could I force metadata to never be evicted from RAM?
Here are the ARC stats:
kstat -pn arcstats
zfs:0:arcstats:buf_size 79301640
zfs:0:arcstats:c 773376320
zfs:0:arcstats:c_max 16054530048
zfs:0:arcstats:c_min 67108864
zfs:0:arcstats:class misc
zfs:0:arcstats:crtime 120.399677
zfs:0:arcstats:data_size 431609856
zfs:0:arcstats:deleted 412089
zfs:0:arcstats:demand_data_hits 523939
zfs:0:arcstats:demand_data_misses 10351
zfs:0:arcstats:demand_metadata_hits 2334880
zfs:0:arcstats:demand_metadata_misses 423667
zfs:0:arcstats:evict_l2_cached 0
zfs:0:arcstats:evict_l2_eligible 0
zfs:0:arcstats:evict_l2_ineligible 37581707264
zfs:0:arcstats:evict_prefetch 10717167616
zfs:0:arcstats:evicted_mfu 512746496
zfs:0:arcstats:evicted_mru 37068960768
zfs:0:arcstats:hash_chain_max 8
zfs:0:arcstats:hash_chains 53675
zfs:0:arcstats:hash_collisions 367438
zfs:0:arcstats:hash_elements 220238
zfs:0:arcstats:hash_elements_max 329038
zfs:0:arcstats:hits 4152502
zfs:0:arcstats:l2_abort_lowmem 0
zfs:0:arcstats:l2_cksum_bad 0
zfs:0:arcstats:l2_feeds 0
zfs:0:arcstats:l2_hdr_size 0
zfs:0:arcstats:l2_hits 0
zfs:0:arcstats:l2_imports 0
zfs:0:arcstats:l2_io_error 0
zfs:0:arcstats:l2_misses 434018
zfs:0:arcstats:l2_persistence_hits 0
zfs:0:arcstats:l2_read_bytes 0
zfs:0:arcstats:l2_rw_clash 0
zfs:0:arcstats:l2_size 0
zfs:0:arcstats:l2_write_bytes 0
zfs:0:arcstats:l2_writes_done 0
zfs:0:arcstats:l2_writes_error 0
zfs:0:arcstats:l2_writes_sent 0
zfs:0:arcstats:memory_throttle_count 8
zfs:0:arcstats:meta_limit 0
zfs:0:arcstats:meta_max 406053176
zfs:0:arcstats:meta_used 340106848
zfs:0:arcstats:mfu_ghost_hits 25036
zfs:0:arcstats:mfu_hits 1983081
zfs:0:arcstats:misses 1096150
zfs:0:arcstats:mru_ghost_hits 6868
zfs:0:arcstats:mru_hits 1063491
zfs:0:arcstats:mutex_miss 14476
zfs:0:arcstats:other_size 254914952
zfs:0:arcstats:p 47875874
zfs:0:arcstats:prefetch_behind_prefetch 505065
zfs:0:arcstats:prefetch_data_hits 1277533
zfs:0:arcstats:prefetch_data_misses 578922
zfs:0:arcstats:prefetch_joins 5131
zfs:0:arcstats:prefetch_meta_size 5890256
zfs:0:arcstats:prefetch_metadata_hits 16150
zfs:0:arcstats:prefetch_metadata_misses 83210
zfs:0:arcstats:prefetch_reads 73618
zfs:0:arcstats:prefetch_size 11386880
zfs:0:arcstats:rawdata_size 0
zfs:0:arcstats:size 771716704
zfs:0:arcstats:snaptime 321256.284490098
echo ::memstat | mdb -k
Page Summary                 Pages            Bytes %Tot
----------------- ---------------- ---------------- ----
Kernel                      726109             2.7G  17%
Defdump prealloc            258925          1011.4M   6%
ZFS Metadata                110595           432.0M   3%
ZFS File Data                43060           168.2M   1%
Anon                         44656           174.4M   1%
Exec and libs                 1015             3.9M   0%
Page cache                    5552            21.6M   0%
Free (cachelist)              5949            23.2M   0%
Free (freelist)            2583982             9.8G  62%
Total                      4185804            15.9G
echo ::arc|mdb -k
size = 822 MB
target size (c) = 824 MB
target mru_size (p) = 57 MB
c_min = 64 MB
c_max = 15310 MB
buf_size = 75 MB
data_size = 487 MB
other_size = 256 MB
rawdata_size = 0 MB
meta_used = 335 MB
meta_max = 387 MB
meta_limit = 0 MB
memory_throttle_count = 8
arc_no_grow = 0
arc_tempreserve = 0 MB
mfu_size = 213 MB
mru_size = 205 MB
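To put a number on the metadata misses in the kstat output above, the demand metadata hit ratio can be derived from the two demand_metadata_* counters. The following is a minimal sketch; the function name arc_meta_hit_ratio is made up for illustration, and on Solaris it may be necessary to use nawk instead of awk.

```shell
#!/bin/sh
# Sketch: compute the ARC demand metadata hit ratio (in percent) from
# `kstat -pn arcstats` output supplied on stdin.
# The function name is hypothetical.
arc_meta_hit_ratio() {
    awk '
        # kstat -p prints "module:instance:name:statistic<TAB>value"
        $1 ~ /:demand_metadata_hits$/   { hits = $2 }
        $1 ~ /:demand_metadata_misses$/ { misses = $2 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * hits / total
        }'
}

# Example usage on the server:
#   kstat -pn arcstats | arc_meta_hit_ratio
```

With the counters posted above (2334880 hits, 423667 misses) this works out to roughly 84.6%, so about one in six demand metadata lookups had to go to disk.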
[...] find command that takes 5 minutes to run. – Penna
find > files.txt - takes 5 minutes – Rosenfeld
[...] kstat -pn arcstats into original post. – Rosenfeld
[...] find ./ command twice, the first one takes >5 minutes, but the second one executes in a few seconds. But if I wait a few hours, it takes >5 minutes again. I can also see misses incrementing on the >5 minute run. So my best guess is that metadata is being evicted from the ARC. Is there any way to force metadata to take priority over all other cache data? – Rosenfeld
[...] /etc/system. – Penna
[...] primarycache property to metadata, although that might kill other IO performance. You can try setting zfs_prefetch_disable in /etc/system. The IO pattern on your system might be causing a lot of data that's never read getting put into the ARC. And what model disks are you using? – Penna
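For reference, the two suggestions in the last comment might look something like the sketch below. The dataset name tank/data and the helper function names are hypothetical, and the value 1 for zfs_prefetch_disable is an assumption (the comment only names the tunable). Both changes should be tested before being relied on, since the comment itself warns that primarycache=metadata may hurt other I/O.

```shell
#!/bin/sh
# Sketch of the two tweaks suggested in the comments.
# Dataset name and function names are hypothetical.

# Cache only metadata in the ARC for the given dataset; file data for
# that dataset will no longer be held in the ARC.
set_metadata_only_cache() {
    zfs set primarycache=metadata "$1"
}

# Print the line that would go into /etc/system to turn off ZFS
# file-level prefetch (assumed value 1 = disabled). /etc/system is
# only read at boot, so a reboot is needed for it to take effect.
prefetch_disable_line() {
    echo 'set zfs:zfs_prefetch_disable = 1'
}

# Example usage (run as root; dataset name is a placeholder):
#   set_metadata_only_cache tank/data
#   prefetch_disable_line >> /etc/system
```

The current setting can be checked with zfs get primarycache on the dataset, and reverted with zfs set primarycache=all.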