I have an SSD that is regularly developing bad sectors. (I'd happily buy a Samsung Enterprise sieve.) In this case the bad sectors occur on read rather than on write, and the data they contained is lost. The system is rock solid, with server-grade components, ECC memory and BTRFS, so I have no doubt the issue lies with the SSD and that data is being lost. The easiest way to detect these sectors is a long SMART self-test, but that leaves me with an LBA number that is tricky to correlate to actual data, because LVM abstracts away the partition layout.
This article will focus on identifying which LV of an LVM setup is affected by the corruption.
First, get the LBA of the corrupt sector from smartctl, dmesg, etc. Multiply it by 512 or 4096 (check your device's sector size in smartctl) to get a byte offset.
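As a minimal sketch of that first step (the device name and LBA below are made up for illustration):

smartctl -t long /dev/sda    # start the long self-test
smartctl -a /dev/sda         # once it finishes, the self-test log lists the failing sector as LBA_of_first_error
echo $((123456789 * 512))    # convert the LBA to a byte offset, here for 512-byte sectors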
Next, run the following command:
pvdisplay --maps --units b /dev/sda1
(Replace /dev/sda1 with your PV.)
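The interesting parts of the output look roughly like this (abridged, with invented names and numbers):

  PE Size               4194304 B
  ...
  --- Physical Segments ---
  Physical extent 0 to 12799:
    Logical volume      /dev/vg0/root
    Logical extents     0 to 12799
  Physical extent 12800 to 15359:
    Logical volume      /dev/vg0/swap
    Logical extents     0 to 2559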
One caveat first: the LBA from SMART is relative to the start of the whole disk, while physical extents are counted from the start of the PV, so subtract the partition's starting offset (visible in fdisk -l) from your byte offset before going further. Then divide the result by the number stated next to PE Size, match it against the ranges stated under Physical extent 0 to x, and hey presto, you've got your affected logical volume. (Strictly speaking the PV also reserves a small metadata area at its head, pe_start, which can nudge the result by one extent near a boundary.)
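Put together, the arithmetic looks like this (all values are placeholders for your own):

lba=123456789      # LBA_of_first_error from smartctl
sector=512         # logical sector size
part_start=2048    # partition start sector, from fdisk -l
pe_size=4194304    # PE Size in bytes, from pvdisplay --units b
echo $(( (lba - part_start) * sector / pe_size ))    # approximate physical extent number, ignoring pe_start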
With some additional arithmetic you can probably figure out the offset relative to the LV, but for sanity's sake at this point I tend to just dd the corrupt LV to /dev/null to get an offset (the point where it hits a read error), or run a scrub if it's BTRFS.
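A sketch of that brute-force read, assuming the affected LV turned out to be /dev/vg0/root:

dd if=/dev/vg0/root of=/dev/null bs=4096 status=progress    # stops with an I/O error at the bad offset
btrfs scrub start -B /dev/vg0/root                          # or, on BTRFS, scrub in the foreground instead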
The process of linking a filesystem sector to a named file is left to the reader, as it differs between filesystems. This is a fantastic resource: https://wiki.archlinux.org/index.php/Identify_damaged_files
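As one example of that last step, on ext4 debugfs can map a filesystem block to an inode and then to a path. The block and inode numbers below are placeholders; divide the byte offset within the LV by the filesystem block size (typically 4096) to get the block number:

debugfs -R "icheck 30000" /dev/vg0/root    # block number -> inode
debugfs -R "ncheck 5678" /dev/vg0/root     # inode -> file path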
Comments
haha yes me too