Finding LVM LV affected by corruption

I have an SSD that is regularly developing bad sectors. (I'd happily buy a Samsung Enterprise sieve.) In this case the bad sectors show up on read, rather than on write, and the data contained on them is lost. The system is rock solid with server-grade components, ECC memory and BTRFS, so I have no doubt the issue is with the SSD. The easiest method to detect these is a long SMART self-test, but that leaves me with an LBA number that is tricky to correlate to actual data, because LVM abstracts away the on-disk layout.

This article will focus on identifying which LV of an LVM setup is affected by the corruption.

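First, get the LBA of your corrupt sector from smartctl/dmesg/etc. Multiply it by the device's logical sector size, 512 or 4096 (check the sector size line in smartctl -i), to get a byte offset. That offset is relative to the start of the whole disk, so if your PV lives on a partition rather than on the raw disk, also subtract the partition's starting offset in bytes (fdisk -l or /sys/block/sda/sda1/start will give you the start sector) to get an offset relative to the PV.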
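The conversion above can be sketched in Python as follows. This is a minimal illustration, not part of the original workflow: the function name and the example numbers are made up, and it assumes the partition start comes from /sys/block/<disk>/<partition>/start, which the kernel reports in 512-byte units regardless of the device's sector size.

```python
SYSFS_SECTOR = 512  # sysfs partition start values are in 512-byte units


def lba_to_pv_offset(lba, logical_sector_size=512, partition_start_sectors=0):
    """Return the byte offset of `lba` relative to the start of the PV.

    lba                     -- failing LBA as reported by the SMART self-test
    logical_sector_size     -- the device's logical sector size in bytes
    partition_start_sectors -- partition start from sysfs (0 if the PV is the raw disk)
    """
    disk_offset = lba * logical_sector_size
    partition_offset = partition_start_sectors * SYSFS_SECTOR
    pv_offset = disk_offset - partition_offset
    if pv_offset < 0:
        raise ValueError("LBA lies before the start of the partition")
    return pv_offset


# Hypothetical example: LBA 123456789, 512-byte sectors, partition starting at sector 2048.
print(lba_to_pv_offset(123456789, 512, 2048))
```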

Next, run the following command:

pvdisplay --maps --units b /dev/sda1

(Replace /dev/sda1 with your PV)

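Take the byte offset you got earlier, relative to the start of the PV rather than the whole disk, and subtract the PV's data area offset (the pe_start field, commonly 1 MiB; pvs -o +pe_start --units b /dev/sda1 prints it), since extent 0 begins after the LVM metadata. Divide the result by the number stated next to PE Size and round down to get a physical extent number.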
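As a rough sketch of that division in Python: the function name is hypothetical, all values are in bytes, and the 1 MiB default for pe_start is only an assumption that should be checked against what pvs reports for your PV.

```python
def offset_to_physical_extent(pv_offset, pe_size, pe_start=1048576):
    """Return the index of the physical extent that contains `pv_offset`.

    pv_offset -- byte offset relative to the start of the PV
    pe_size   -- "PE Size" from pvdisplay --units b, in bytes
    pe_start  -- offset of the first extent within the PV (assumed 1 MiB here)
    """
    if pv_offset < pe_start:
        raise ValueError("offset falls in the LVM metadata area, before extent 0")
    # Integer division rounds down, which is what we want for an extent index.
    return (pv_offset - pe_start) // pe_size


# Hypothetical example: 4 MiB extents, offset taken from an earlier calculation.
print(offset_to_physical_extent(63208827392, 4194304))
```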

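Match it with the ranges stated in Physical extent x to y, and hey presto: the Logical volume listed under the matching range is your affected logical volume. (If the matching range is marked FREE, the bad sector sits in unallocated space.)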
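If the PV has many segments, the matching can be automated. The sketch below parses text in the shape that pvdisplay --maps typically prints; the sample output, volume names and function name are invented for illustration, and the exact layout may differ between LVM versions, so treat the regular expressions as assumptions to verify against your own output.

```python
import re

SEGMENT_RE = re.compile(r"Physical extent (\d+) to (\d+):")
LV_RE = re.compile(r"Logical volume\s+(\S+)")


def find_lv_for_extent(maps_output, extent):
    """Return the LV path owning `extent`, "FREE" if unallocated, else None."""
    in_segment = False
    for line in maps_output.splitlines():
        seg = SEGMENT_RE.search(line)
        if seg:
            # A new segment header: check whether our extent falls inside it.
            in_segment = int(seg.group(1)) <= extent <= int(seg.group(2))
            continue
        if in_segment:
            lv = LV_RE.search(line)
            if lv:
                return lv.group(1)
            if line.strip() == "FREE":
                return "FREE"
    return None


# Invented sample in the assumed pvdisplay --maps layout.
SAMPLE = """\
  --- Physical Segments ---
  Physical extent 0 to 2559:
    Logical volume\t/dev/vg0/root
    Logical extents\t0 to 2559
  Physical extent 2560 to 20479:
    Logical volume\t/dev/vg0/data
    Logical extents\t0 to 17919
  Physical extent 20480 to 23845:
    FREE
"""

print(find_lv_for_extent(SAMPLE, 15069))
```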

With some additional arithmetic you can figure out the offset relative to the LV, but for sanity's sake at this point I tend to just dd the affected LV to /dev/null to get an offset (the point where the read fails), or run a scrub if it's BTRFS.
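For the curious, the additional arithmetic might look like the sketch below. It assumes a plain linear segment, where physical extents map one-to-one onto logical extents; striped, mirrored, RAID or thin volumes need different handling. The function and parameter names are hypothetical, with seg_first_pe and seg_first_le being the first numbers of the matching "Physical extent" and "Logical extents" ranges.

```python
def lv_byte_offset(pv_offset, pe_size, pe_start, seg_first_pe, seg_first_le):
    """Translate a PV-relative byte offset into a byte offset within the LV.

    Only valid for linear segments (assumption).
    """
    # Split the offset into an extent index and a position inside that extent.
    pe, within_extent = divmod(pv_offset - pe_start, pe_size)
    # Shift from the physical extent numbering to the logical extent numbering.
    le = seg_first_le + (pe - seg_first_pe)
    return le * pe_size + within_extent


# Hypothetical example: segment covering physical extents 2560.. mapped to logical extents 0..
print(lv_byte_offset(63208827392, 4194304, 1048576, 2560, 0))
```

Reading the LV at that offset (for instance with dd and a suitable skip= value) should then hit the same read error, which is a useful cross-check against the whole-LV dd approach.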

The process of linking a filesystem sector to a named file is left to the reader, as it differs between filesystems. This is a fantastic resource: https://wiki.archlinux.org/index.php/Identify_damaged_files

9th December 2020

Dark's Code Dump
