Ceph Bluestore checksums: What's the word on bitrot?

I'm getting ready to set up my first Ceph cluster (Luminous on Fedora) for production use. So far I've gone the route of running a single OSD per node on top of a large ZFS pool, so that I get checksum-on-read bitrot protection with automatic repair (when possible).

The reason I've done this is that everything I've read suggests Ceph doesn't really treat bitrot protection as one of its goals, even with Bluestore. Deep scrubbing works, but it obviously carries a heavy performance hit while running and, more importantly, leaves a window of time during which corrupt data can be read.

Today, though, I've read a few things about Bluestore's checksum-on-read behavior that suggest I may have been wrong. I cannot, however, find any documentation that authoritatively says "this is what this does".

So hopefully this is a good place to ask: can anybody say with confidence whether or not Bluestore's checksum mechanism provides bitrot detection and, with the help of other OSDs, automatic repair?

Flouncing answered 13/12, 2017 at 18:44

BlueStore very much has bitrot protection as one of its goals. It stores checksums for every block and validates them on reads. If they’re bad, it throws errors rather than returning known-bad data; that triggers the higher-level RADOS recovery mechanisms.
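For intuition, here's a minimal sketch of that checksum-on-read pattern. This is not BlueStore's actual code: BlueStore's default algorithm is crc32c (the `bluestore_csum_type` option), while this sketch uses zlib's crc32 as a stand-in, and the 4 KiB block size is just an assumption for the demo.

```python
import zlib

# A minimal sketch of checksum-on-read, not BlueStore's actual code.
BLOCK_SIZE = 4096  # assumed checksum granularity for this sketch

class BitrotError(IOError):
    pass

def write_block(data_map, csum_map, offset, data):
    """Store a block and record its checksum at write time."""
    data_map[offset] = data
    csum_map[offset] = zlib.crc32(data)

def read_block(data_map, csum_map, offset):
    """Verify the stored checksum before returning data.

    On mismatch, fail the read instead of returning corrupt bytes;
    in Ceph, that failed read is what lets RADOS repair the object
    from another OSD.
    """
    data = data_map[offset]
    if zlib.crc32(data) != csum_map[offset]:
        raise BitrotError(f"checksum mismatch at offset {offset}")
    return data

# Demo: a "bit flip" on disk is caught at read time.
data, csums = {}, {}
write_block(data, csums, 0, b"A" * BLOCK_SIZE)
data[0] = b"A" * (BLOCK_SIZE - 1) + b"B"  # simulate bitrot
try:
    read_block(data, csums, 0)
except BitrotError as e:
    print(e)  # checksum mismatch at offset 0
```

If I recall correctly, the algorithm can also be overridden per pool with `ceph osd pool set <pool-name> csum_type <algorithm>`, but check the BlueStore config reference for your release.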

Disproportionation answered 18/12, 2017 at 8:48
I did a bit of testing over the weekend based on this (zfsnas.com/2015/05/24/testing-bit-rot, adapted to Bluestore block devices). My finding was that it does indeed not only do what you said, but also (as one would expect) rewrite the bad data with good data from another OSD. Thanks! – Flouncing
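For anyone wanting to reproduce that kind of test, here's a rough sketch of the approach, assuming a disposable test cluster with the `rados` CLI available. The device path, pool, and object names are placeholders, and flipping bytes on a live OSD's block device is intentionally destructive, so never run this against hardware you care about.

```python
import os
import random
import subprocess

# Placeholders: use a throwaway test cluster only.
OSD_DEVICE = "/dev/sdX"   # hypothetical BlueStore block device
POOL, OBJECT = "testpool", "bitrot-canary"
PAYLOAD, READBACK = "/tmp/payload", "/tmp/readback"

# 1. Write a known object into the pool.
with open(PAYLOAD, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))
subprocess.run(["rados", "-p", POOL, "put", OBJECT, PAYLOAD], check=True)

# 2. Simulate bitrot: flip one byte at a random offset on the raw device.
#    (Stop the OSD first so its caches don't mask the corruption.)
with open(OSD_DEVICE, "r+b") as dev:
    size = dev.seek(0, os.SEEK_END)
    offset = random.randrange(size)
    dev.seek(offset)
    original = dev.read(1)
    dev.seek(offset)
    dev.write(bytes([original[0] ^ 0xFF]))

# 3. Read the object back. If the flip landed in this object's data,
#    BlueStore should detect the bad checksum and Ceph should serve
#    (and repair) from a healthy replica, so the payload still matches.
subprocess.run(["rados", "-p", POOL, "get", OBJECT, READBACK], check=True)
assert open(PAYLOAD, "rb").read() == open(READBACK, "rb").read()
print("read returned intact data")
```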
