Context

We have a homegrown filesystem-backed caching library. We're currently having performance problems with one installation due to the large number of entries (up to 100,000). The problem: we store all filesystem entries in a single "cache directory", and very large directories perform poorly.
We're looking at spreading those entries over subdirectories, as git does: e.g. 100 subdirectories with ~1,000 entries each.
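For concreteness, here's a rough sketch (in Java, since that's our stack) of the kind of sharding we're considering. The fan-out of 100, the SHA-1 hash, and the class/method names are just illustrative assumptions, not our actual implementation, and it assumes the cache key is already a safe file name:

```java
import java.io.File;
import java.security.MessageDigest;

// Illustrative sketch only: map a cache key to a shard subdirectory,
// similar in spirit to git's objects/ab/... layout.
public class ShardedCachePath {

    // ~1,000 entries per subdirectory at 100,000 total entries
    private static final int FANOUT = 100;

    public static File entryFile(File cacheRoot, String cacheKey) {
        int bucket = (hash(cacheKey) & 0x7FFFFFFF) % FANOUT;
        File subDir = new File(cacheRoot, String.format("%02d", bucket));
        subDir.mkdirs(); // ensure the shard directory exists
        return new File(subDir, cacheKey);
    }

    private static int hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(key.getBytes("UTF-8"));
            // use the first four bytes of the digest as an int
            return ((digest[0] & 0xFF) << 24) | ((digest[1] & 0xFF) << 16)
                 | ((digest[2] & 0xFF) << 8)  | (digest[3] & 0xFF);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The point is that a given key always maps to the same shard, so single-entry lookups and writes only ever touch a ~1,000-entry directory instead of the 100,000-entry one.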
The question
I understand that smaller directory sizes will help with filesystem access.
But will "spreading into subdirectories" speed up traversing all entries, e.g. enumerating/reading all 100,000 entries? I.e. When we initialize/warm the cache from the FS store, we need to traversing all 100,000 entries (and deleting old entries) can take 10+ minutes.
Will "spreading the data" decrease this "traversal time". Additionally this "traversal" actually can/does delete stale entries (e.g older then N days) Will "spreading the data" improve delete times?
Additional Context

- NTFS
- Windows Server family OS (Server 2003, 2008)
- Java J2EE application
I/we would appreciate any schooling on filesystem scalability issues.
Thanks in advance.
will
p.s. I should comment that I have the tools and ability to test this myself, but figured I'd pick the hive mind for the theory and experience first.