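A while ago a friend came to me in a panic: while reinstalling an operating system he had initialized two LVM devices on shared storage, and the data was gone.
The "crime scene":
Two servers formed an HA pair, both attached over Fibre Channel to the same shared storage. One HA node developed a problem and had to be reinstalled. The fibre cables were not unplugged during the reinstall, so when the installer initialized the disks it also initialized two partitions on the shared storage. The installation itself finished normally and nothing looked wrong at the time, because the other server was still running and could still access the two initialized LVM devices. A little over two hours later that server crashed, and when it came back up the two LVM devices were gone, along with the data.
Problem analysis:
Initializing the disks only wiped the LVM header at the start of each disk; the data on the disks was not erased. So it is enough to write the LVM header back to the start of the initialized disks, using a backup of the volume group metadata. Fortunately the server's system had been backed up, so the LVM metadata could be found.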
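Whenever a VG is created or changed, LVM by default automatically backs up the current volume group metadata to /etc/lvm/backup and keeps earlier versions under /etc/lvm/archive. To be safe, it is best to copy the files under /etc/lvm to another location at regular intervals.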
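A minimal sketch of such a periodic copy, assuming a POSIX shell and tar. The function name `backup_lvm_config` and the destination directory are illustrative and not part of LVM.

```shell
#!/bin/sh
# backup_lvm_config SRC_DIR DEST_DIR
# Archive SRC_DIR (for example /etc/lvm) into a timestamped tarball under
# DEST_DIR and print the path of the tarball. Hypothetical helper.
backup_lvm_config() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    out="$dest/lvm-config-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to SRC_DIR's parent
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$out"
}
```

It could be called from cron as `backup_lvm_config /etc/lvm /some/other/disk`, where the destination should not sit on the same storage as the volume group it describes.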
The following simulates the LVM recovery in a virtual machine.
Create the logical volume:
- [root@oracle ~]# pvcreate /dev/sdb1
- Physical volume "/dev/sdb1" successfully created
- [root@oracle ~]# vgcreate lanv /dev/sdb1
- Volume group "lanv" successfully created
- [root@oracle ~]# lvcreate -n lgl -L 500M lanv
- Logical volume "lgl" created
- [root@oracle lgl]# pvs
- PV VG Fmt Attr PSize PFree
- /dev/sdb1 lanv lvm2 a- 1016.00M 516.00M
- [root@oracle lgl]# lvs
- LV VG Attr LSize Origin Snap% Move Log Copy% Convert
- lgl lanv -wi-ao 500.00M
- [root@oracle lgl]# vgs
- VG #PV #LV #SN Attr VSize VFree
- lanv 1 1 0 wz--n- 1016.00M 516.00M
- [root@oracle ~]# df -h /lanv/lgl
- Filesystem Size Used Avail Use% Mounted on
- /dev/mapper/lanv-lgl 485M 11M 449M 3% /lanv/lgl
- [root@oracle ~]# cat /etc/fstab
- /dev/lanv/lgl /lanv/lgl ext3 defaults 0 0
- [root@oracle lgl]# pwd
- /lanv/lgl
- [root@oracle lgl]# echo "Hello World" >lgl.txt
- [root@oracle lgl]# cat lgl.txt
- Hello World
Delete the LVM information (here, by deleting the partition that holds the PV):
- [root@oracle lgl]# fdisk /dev/sdb
- Command (m for help): p
- Disk /dev/sdb: 1073 MB, 1073741824 bytes
- 255 heads, 63 sectors/track, 130 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sdb1 1 130 1044193+ 83 Linux
- Command (m for help): d
- Selected partition 1
- Command (m for help): p
- Disk /dev/sdb: 1073 MB, 1073741824 bytes
- 255 heads, 63 sectors/track, 130 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- Command (m for help): w
- The partition table has been altered!
- Calling ioctl() to re-read partition table.
- WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
- The kernel still uses the old table.
- The new table will be used at the next reboot.
- Syncing disks.
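At this point the LVM device is still accessible and can be read and written normally, because the kernel is still using the old partition table. After a while errors appear and the system crashes.
After the reboot the system reports errors. Recreate the partition with the same start and end as before: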
- [root@oracle backup]# fdisk /dev/sdb
- Command (m for help): p
- Disk /dev/sdb: 1073 MB, 1073741824 bytes
- 255 heads, 63 sectors/track, 130 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- Command (m for help): n
- Command action
- e extended
- p primary partition (1-4)
- p
- Partition number (1-4): 1
- First cylinder (1-130, default 1):
- Using default value 1
- Last cylinder or +size or +sizeM or +sizeK (1-130, default 130):
- Using default value 130
- Command (m for help): w
- The partition table has been altered!
- Calling ioctl() to re-read partition table.
- Syncing disks.
- [root@oracle backup]# fdisk -l
- Disk /dev/sda: 32.2 GB, 32212254720 bytes
- 255 heads, 63 sectors/track, 3916 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sda1 * 1 3655 29358756 83 Linux
- /dev/sda2 3656 3916 2096482+ 82 Linux swap / Solaris
- Disk /dev/sdb: 1073 MB, 1073741824 bytes
- 255 heads, 63 sectors/track, 130 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sdb1 1 130 1044193+ 83 Linux
vgcfgrestore can now be used to restore the LVM configuration from the backed-up metadata.
First look at the backed-up metadata:
- [root@oracle backup]# more lanv
- # Generated by LVM2 version 2.02.46-RHEL5 (2009-06-18): Sun Jul 8 19:44:21 2012
- contents = "Text Format Volume Group"
- version = 1
- description = "Created *after* executing 'vgs'"
- creation_host = "oracle" # Linux oracle 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64
- creation_time = 1341747861 # Sun Jul 8 19:44:21 2012
- lanv {
- id = "hlt8bK-xsoA-Z3GJ-iosl-7cwk-e5b1-TUx3OB"
- seqno = 4
- status = ["RESIZEABLE", "READ", "WRITE"]
- flags = []
- extent_size = 8192 # 4 Megabytes
- max_lv = 0
- max_pv = 0
- physical_volumes {
- pv0 {
- id = "qm6Oo3-92NX-E0ca-e0GV-2eA6-q2Jm-0CzGre"
- device = "/dev/sdb" # Hint only
- status = ["ALLOCATABLE"]
- flags = []
- dev_size = 2088387 # 1019.72 Megabytes
- pe_start = 384
- pe_count = 254 # 1016 Megabytes
- }
- }
- logical_volumes {
- lgl {
- id = "ifkQpI-bf7f-6SvS-8V9J-HVmO-wAZj-0jWY4B"
- status = ["READ", "WRITE", "VISIBLE"]
- flags = []
- segment_count = 1
- segment1 {
- start_extent = 0
- extent_count = 125 # 500 Megabytes
- type = "striped"
- stripe_count = 1 # linear
- stripes = [
- "pv0", 0
- ]
- }
- }
- }
- }
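The sizes in this metadata are counted in 512-byte sectors, and the comments LVM wrote next to them can be rechecked with a little shell arithmetic. The sketch below only copies numbers from the listing above; the helper name `sectors_to_mib` is illustrative.

```shell
#!/bin/sh
# Values copied from the metadata backup; all are in 512-byte sectors
# except the extent counts.
extent_size=8192
pe_start=384
pe_count=254
lv_extents=125
dev_size=2088387

# sectors_to_mib SECTORS: whole MiB contained in that many sectors
sectors_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

extent_mib=$(sectors_to_mib "$extent_size")
echo "extent size: ${extent_mib} MiB"                                   # 4
echo "PV capacity: $(( pe_count * extent_mib )) MiB"                    # 1016
echo "LV lgl size: $(( lv_extents * extent_mib )) MiB"                  # 500
echo "first extent starts $(( pe_start / 2 )) KiB into the PV"          # 192
echo "PV device: $(( dev_size / 2 )) KiB + $(( dev_size % 2 )) sector"  # 1044193 KiB + 1 sector
```

The last line matches the `1044193+` block count that fdisk prints for /dev/sdb1, which is a quick way to confirm that the recreated partition has the size the metadata expects.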
By default, vgcfgrestore looks for the metadata in /etc/lvm/backup to restore the volume group:
- [root@oracle ~]# vgcfgrestore lanv
- Couldn't find device with uuid 'qm6Oo3-92NX-E0ca-e0GV-2eA6-q2Jm-0CzGre'.
- Cannot restore Volume Group lanv with 1 PVs marked as missing.
- Restore failed.
Following the error message, recreate the PV with the original UUID:
- [root@oracle backup]# pvcreate --uuid qm6Oo3-92NX-E0ca-e0GV-2eA6-q2Jm-0CzGre /dev/sdb1
- Can't initialize physical volume "/dev/sdb1" of volume group "lanv" without -ff
- [root@oracle backup]# pvcreate --uuid qm6Oo3-92NX-E0ca-e0GV-2eA6-q2Jm-0CzGre /dev/sdb1 -ff
- Really INITIALIZE physical volume "/dev/sdb1" of volume group "lanv" [y/n]? y
- WARNING: Forcing physical volume creation on /dev/sdb1 of volume group "lanv"
- Physical volume "/dev/sdb1" successfully created
- [root@oracle backup]# pvs
- PV VG Fmt Attr PSize PFree
- /dev/sdb1 lvm2 -- 1019.72M 1019.72M
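The UUID does not have to be copied from the error message; it is also recorded in the metadata backup. The sketch below reads the first PV id from a backup file and prints, without running it, a pvcreate command that also passes `--restorefile`, an option of pvcreate that the session above did not use. The function names are illustrative, and a VG with several PVs would need each PV handled separately.

```shell
#!/bin/sh
# pv_uuid_from_backup FILE
# Print the id of the first PV listed in an LVM text-format backup.
pv_uuid_from_backup() {
    awk '
        /physical_volumes[ \t]*[{]/ { in_pvs = 1 }
        in_pvs && /id[ \t]*=/ {
            # keep only the text between the double quotes
            split($0, parts, "\"")
            print parts[2]
            exit
        }
    ' "$1"
}

# build_pvcreate_cmd FILE DEVICE
# Print (do not execute) the pvcreate command for recreating the PV.
build_pvcreate_cmd() {
    backup=$1
    dev=$2
    uuid=$(pv_uuid_from_backup "$backup")
    [ -n "$uuid" ] || { echo "no PV id found in $backup" >&2; return 1; }
    echo "pvcreate --uuid $uuid --restorefile $backup $dev"
}
```

For example, `build_pvcreate_cmd /etc/lvm/backup/lanv /dev/sdb1` prints the command so it can be reviewed before anything is written to the disk.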
Now run vgcfgrestore again:
- [root@oracle backup]# vgcfgrestore lanv
- Restored volume group lanv
- [root@oracle backup]# vgs
- VG #PV #LV #SN Attr VSize VFree
- lanv 1 1 0 wz--n- 1016.00M 516.00M
- [root@oracle backup]# lvs
- LV VG Attr LSize Origin Snap% Move Log Copy% Convert
- lgl lanv -wi--- 500.00M
- [root@oracle backup]# vgchange -ay lanv
- 1 logical volume(s) in volume group "lanv" now active
Check the result of the recovery:
- [root@oracle backup]# lvs
- LV VG Attr LSize Origin Snap% Move Log Copy% Convert
- lgl lanv -wi-a- 500.00M
- [root@oracle backup]# mount /dev/lanv/lgl /lanv/lgl/
- [root@oracle backup]# df -h /lanv/lgl/
- Filesystem Size Used Avail Use% Mounted on
- /dev/mapper/lanv-lgl 485M 11M 449M 3% /lanv/lgl
- [root@oracle backup]# cd /lanv/lgl/
- [root@oracle lgl]# ls
- lgl.txt
- [root@oracle lgl]# cat lgl.txt
- Hello World
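The LVM volume is now restored and the data in it is intact.
The recovery procedure is not hard; it comes down to two commands (pvcreate --uuid and vgcfgrestore). Still, this situation should be avoided in production as far as possible. For example, unplug all external storage before installing a system, so that data is not deleted by accident.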
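When unplugging the storage is not possible, a quick look at a device before initializing it can also help. An LVM2 PV carries a label whose magic string `LABELONE` sits in one of the first four 512-byte sectors. The sketch below checks for that string; the function name `has_lvm_label` is illustrative, and a negative result does not prove that a device is safe to wipe.

```shell
#!/bin/sh
# has_lvm_label DEVICE_OR_FILE
# Succeed if the first four 512-byte sectors contain the LVM2 label magic.
has_lvm_label() {
    dd if="$1" bs=512 count=4 2>/dev/null | LC_ALL=C grep -q 'LABELONE'
}
```

A possible use is `has_lvm_label /dev/sdb1 && echo "looks like an LVM PV, stop"`, run as root before any partitioning or formatting step.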