How to fix a MarkLogic "File too large" forest merge error?

I'm running MarkLogic version 8.0-6.1.

The host OS is Red Hat Enterprise Linux Server release 6.8 (Santiago).

The data is stored on a local disk that has 90% free space.

The server runs fairly well but it throws the following error sporadically.

SVC-FILWRT: File write error: write '/var/opt/MarkLogic/Forests/clickstream-1/0000008a/ListData': File too large

Any thoughts on the root cause and possible fix?

Berkow answered 1/2, 2017 at 18:11 Comment(5)
What kind of file? Binary, text, XML, JSON? How large is the file? - Theosophy
The DB uses one forest. The one forest has 4 million XML files. The average size of each XML file is 3 KB. - Berkow
I believe the ListData file is essentially the goodies that make up the universal index. So what looks 'sporadic' could be related directly to re-indexing operations. EXT4 on RHEL 6.x has a single-file limit of 16 TB, so an actual issue with the file size of ListData itself seems quite unlikely. - Theosophy
How large is the ListData file, how much disk space does the entire forest use, how many stands are there, how many deleted fragments, and last but not least, what is the merge max size setting? - Giustino
I was storing 4 million docs in a single forest. I have since added a new forest, which seems to have resolved the problem. I don't have the exact values for the ListData file from before, since there are now 2 forests. However, there are currently 3 stands per forest. The original forest still has large ListData and TreeData files. The largest ListData file is 2.8 GB. The largest TreeData file is 3.7 GB. The original forest has 570,181 deleted fragments. The merge max size setting is the default value of 32768 MB. - Berkow
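To put the numbers from the comments side by side, here is a small arithmetic sketch. It only uses the values reported above (largest ListData and TreeData files, the default merge max size, and the 16 TB EXT4 single-file limit quoted earlier). The variable and function names are made up for illustration, and the sketch assumes binary units (1 GB = 1024 MB, 1 TB = 1024 GB). It compares only the two reported files, not the full on-disk size of a stand.

```python
# Values as reported in the comments; names are illustrative only.
MERGE_MAX_MB = 32768          # default merge max size setting
LARGEST_LISTDATA_GB = 2.8     # largest ListData file
LARGEST_TREEDATA_GB = 3.7     # largest TreeData file
EXT4_FILE_LIMIT_TB = 16       # single-file limit quoted in the comments

# Assumption: binary units throughout.
MB_PER_GB = 1024
GB_PER_TB = 1024

merge_max_gb = MERGE_MAX_MB / MB_PER_GB
ext4_limit_gb = EXT4_FILE_LIMIT_TB * GB_PER_TB


def share_of(limit_gb: float, size_gb: float) -> float:
    """Return size_gb as a fraction of limit_gb."""
    return size_gb / limit_gb


for name, size_gb in [("ListData", LARGEST_LISTDATA_GB),
                      ("TreeData", LARGEST_TREEDATA_GB)]:
    print(f"{name}: {size_gb} GB, "
          f"{share_of(merge_max_gb, size_gb):.1%} of merge max, "
          f"{share_of(ext4_limit_gb, size_gb):.4%} of the EXT4 file limit")
```

Both files come out far below the merge max size and the filesystem limit, which is consistent with the comment that the size of ListData alone seems an unlikely cause.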

Stands should normally not get that big. I can imagine two ways this could happen, though I'm not 100% certain either applies here:

  • You have upgraded a large database with a low number of forests from a version before merge max size was introduced, preventing MarkLogic from purging the deleted fragments straight away

  • You have run some large transactions, causing in-memory stands to exceed the merge max size, resulting in a similar situation once persisted to disk

This doesn't have to be a bad thing, unless of course you hit a file write error. Deleted fragments in such large stands may linger longer than usual, but if enough fragments get deleted, MarkLogic will eventually merge them out anyway.

If you would like to get rid of the large stands sooner, you could try putting the old forest into delete-only mode, forcing new updates to go elsewhere, and then 'touching' all documents inside that forest to get them migrated to one of the other forests. Once that forest contains only deleted fragments, you can simply take it out (unassign it from the database) and delete it. After that you could recreate it and assign the empty forest to the database again. That might trigger a rebalance, but it should settle down eventually, with more evenly balanced stands across all forests of your database.
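As a rough illustration of the first step (switching the old forest to delete-only), the sketch below builds, but does not send, a request against the MarkLogic Management REST API. Several things here are assumptions rather than something stated in this thread: that the Management API listens on port 8002, that forest properties are updated via PUT on /manage/v2/forests/{name}/properties, and that the property is called "updates-allowed" with the value "delete-only". Check these against the documentation for your MarkLogic version. The forest name clickstream-1 is taken from the path in the error message; the function name is hypothetical. Authentication is left out on purpose.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_delete_only_request(host: str, forest: str, port: int = 8002) -> Request:
    """Build (not send) a PUT that would set a forest to delete-only.

    Assumption: endpoint path and property name follow the MarkLogic
    Management REST API; verify them for your server version.
    """
    url = f"http://{host}:{port}/manage/v2/forests/{quote(forest)}/properties"
    body = json.dumps({"updates-allowed": "delete-only"}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Forest name taken from the path in the error message above.
req = build_delete_only_request("localhost", "clickstream-1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would still need credentials for a user with the appropriate admin rights, and the same change can be made through the Admin UI instead.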

Anyway, it is probably wise to use more than one forest from the start if you anticipate growth or large transactions.

For those who would like to dive deeper into the technical side, I'd recommend reading the Inside MarkLogic paper:

https://developer.marklogic.com/inside-marklogic

The Data Management section in particular is relevant to databases, forests, and stands.

HTH!

Giustino answered 25/5, 2018 at 19:19
