Diagnosing a VM Live-Migration Failure on a Public Cloud Platform

2014-10-22, by sak0
Category: cloud computing

It's been a while since I last handled a production issue. Yesterday I worked on a KVM live-migration failure that came up while servers were being taken out of service for a platform cutover; leaving a record here.

How the problem was tracked down:

1. Live migration of the VM failed; check the OpenStack logs:


    Warning: option deprecated, use lost_tick_policy property of kvm-pit instead.
    char device redirected to /dev/pts/17 (label charserial1)
    Unknown savevm section or instance '0000:06.0/virtio-blk' 0
    load of migration failed
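
The third line is the key message: when loading the incoming migration stream, the destination QEMU matches every savevm section against the devices it actually has, and here the source is sending state for a virtio-blk function at PCI address 0000:06.0 that the destination instance (started from the current libvirt XML, in which the disk has already been detached) never created, so the whole load is rejected. A simplified sketch of that lookup, paraphrased from qemu_loadvm_state() in savevm.c of roughly this QEMU version (not a verbatim excerpt):

    /* paraphrased from qemu_loadvm_state() in savevm.c (QEMU 2.1.x, abridged) */
    case QEMU_VM_SECTION_START:
    case QEMU_VM_SECTION_FULL:
        section_id  = qemu_get_be32(f);
        len         = qemu_get_byte(f);
        qemu_get_buffer(f, (uint8_t *)idstr, len);
        idstr[len]  = 0;                        /* e.g. "0000:06.0/virtio-blk" */
        instance_id = qemu_get_be32(f);
        version_id  = qemu_get_be32(f);

        /* look up a matching device registered on this (destination) side */
        se = find_se(idstr, instance_id);
        if (se == NULL) {
            fprintf(stderr, "Unknown savevm section or instance '%s' %d\n",
                    idstr, instance_id);
            ret = -EINVAL;                      /* -> "load of migration failed" */
            goto out;
        }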

2. Check the log of the corresponding KVM guest under the libvirt log directory on the destination host (typically /var/log/libvirt/qemu/<instance-name>.log); it records the same unrecognized virtio-blk device.

3. Because the migration failed, the VM is still sitting on the source host, so query the VM's device information from the QEMU monitor through libvirt.
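
For reference, one way to pull this information through libvirt is with HMP commands; the instance name below is a placeholder:

    virsh qemu-monitor-command <instance-name> --hmp 'info block'
    virsh qemu-monitor-command <instance-name> --hmp 'info pci'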


In the block-device records, the virtio-disk1 device is already gone.


Analysis:
At this point the problem is clear: at the libvirt level and in QEMU's block-layer records, the virtio-disk1 device has already been detached, yet it still shows up in QEMU's PCI records.
Typically, this state is produced here:


blockdev.c:

    int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
    {
        const char *id = qdict_get_str(qdict, "id");
        BlockDriverState *bs;

        bs = bdrv_find(id);
        if (!bs) {
            qerror_report(QERR_DEVICE_NOT_FOUND, id);
            return -1;
        }
        if (bdrv_in_use(bs)) {
            qerror_report(QERR_DEVICE_IN_USE, id);
            return -1;
        }

        /* quiesce block driver; prevent further io */
        bdrv_drain_all();
        bdrv_flush(bs);
        bdrv_close(bs);

        /* if we have a device attached to this BlockDriverState
         * then we need to make the drive anonymous until the device
         * can be removed.  If this is a drive with no device backing
         * then we can just get rid of the block driver state right here.
         */
        if (bdrv_get_attached_dev(bs)) {
            bdrv_make_anon(bs);

            /* Further I/O must not pause the guest */
            bdrv_set_on_error(bs, BLOCKDEV_ON_ERROR_REPORT,
                              BLOCKDEV_ON_ERROR_REPORT);
        } else {
            drive_uninit(drive_get_by_blockdev(bs));
        }

        return 0;
    }
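
Note the branch taken when a front end is still attached: do_drive_del() does not destroy the BlockDriverState, it only makes the drive anonymous, i.e. it drops the drive's name (virtio-disk1) from the block layer's list of named drives while the PCI front end keeps using it until the guest finishes the hot-unplug. That is exactly why step 3 above shows the drive gone from the block-device records while the device is still present in the PCI records. An abridged, paraphrased sketch of bdrv_make_anon() from the same source tree:

    /* block.c (QEMU 2.1.x, abridged): remove the drive's name so it no longer
     * appears in the named block-device list ("info block"), without tearing
     * down the BlockDriverState that the still-attached front end references */
    void bdrv_make_anon(BlockDriverState *bs)
    {
        if (bs->device_name[0] != '\0') {
            QTAILQ_REMOVE(&bdrv_states, bs, device_list);
        }
        bs->device_name[0] = '\0';
    }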
My take on the root cause: libvirt removes the device's front-end and back-end resources back to back. It should instead wait asynchronously for the callback confirming that the front-end device has really been removed, and only then release the back-end resources (I'll write up that analysis separately). Below is the flow in which QEMU handles the guest OS's response after the front-end device unplug is requested:


    #0  release_drive (obj=0x7ffffffd41d8, name=0x7ffff8bcb5b0 "drive", opaque=0x7ffff82cc3e0) at hw/core/qdev-properties-system.c:85
    #1  0x00007ffff7df8828 in object_property_del_all (obj=0x7ffffffd41d8) at qom/object.c:367
    #2  0x00007ffff7df8a96 in object_finalize (data=0x7ffffffd41d8) at qom/object.c:422
    #3  0x00007ffff7df95f1 in object_unref (obj=0x7ffffffd41d8) at qom/object.c:729
    #4  0x00007ffff7df89ac in object_unparent (obj=0x7ffffffd41d8) at qom/object.c:402
    #5  0x00007ffff7ce84d1 in bus_unparent (obj=0x7ffffffd4160) at hw/core/qdev.c:548
    #6  0x00007ffff7df896b in object_unparent (obj=0x7ffffffd4160) at qom/object.c:396
    #7  0x00007ffff7ce9c1d in device_unparent (obj=0x7ffffffd3800) at hw/core/qdev.c:1010
    #8  0x00007ffff7df896b in object_unparent (obj=0x7ffffffd3800) at qom/object.c:396
    #9  0x00007ffff7cb0190 in acpi_pcihp_eject_slot (s=0x7ffff8bf7e18, bsel=0, slots=32) at hw/acpi/pcihp.c:139
    #10 0x00007ffff7cb087f in pci_write (opaque=0x7ffff8bf7e18, addr=8, data=32, size=4) at hw/acpi/pcihp.c:277
    #11 0x00007ffff7b4115c in memory_region_write_accessor (mr=0x7ffff8bf8a28, addr=8, value=0x7fffe98969f8, size=4, shift=0, mask=4294967295)
        at /usr/local/src/qemu-2.1.2/memory.c:444
    #12 0x00007ffff7b412a9 in access_with_adjusted_size (addr=8, value=0x7fffe98969f8, size=4, access_size_min=1, access_size_max=4, access=
        0x7ffff7b410ba <memory_region_write_accessor>, mr=0x7ffff8bf8a28) at /usr/local/src/qemu-2.1.2/memory.c:481
    #13 0x00007ffff7b444d7 in memory_region_dispatch_write (mr=0x7ffff8bf8a28, addr=8, data=32, size=4) at /usr/local/src/qemu-2.1.2/memory.c:1138
    #14 0x00007ffff7b48020 in io_mem_write (mr=0x7ffff8bf8a28, addr=8, val=32, size=4) at /usr/local/src/qemu-2.1.2/memory.c:1976
    #15 0x00007ffff7aef749 in address_space_rw (as=0x7ffff833dc00, addr=44552, buf=0x7ffff7a3e000 " ", len=4, is_write=true) at /usr/local/src/qemu-2.1.2/exec.c:2077
    #16 0x00007ffff7b3d5e5 in kvm_handle_io (port=44552, data=0x7ffff7a3e000, direction=1, size=4, count=1) at /usr/local/src/qemu-2.1.2/kvm-all.c:1597
    #17 0x00007ffff7b3db69 in kvm_cpu_exec (cpu=0x7ffff8b08490) at /usr/local/src/qemu-2.1.2/kvm-all.c:1734
    #18 0x00007ffff7b234bc in qemu_kvm_cpu_thread_fn (arg=0x7ffff8b08490) at /usr/local/src/qemu-2.1.2/cpus.c:874
    #19 0x00007ffff0d2b9d1 in start_thread () from /lib64/libpthread.so.0
    #20 0x00007ffff0a78b5d in clone () from /lib64/libc.so.6
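
Reading the backtrace from the bottom up: a vCPU thread traps a guest port write (kvm_handle_io), the write lands in the ACPI PCI-hotplug eject register (pci_write -> acpi_pcihp_eject_slot), and only then, once the guest OS has acknowledged the unplug, does QEMU unparent and finalize the virtio-blk front end, with release_drive() finally detaching it from its backing drive. This is also the point at which QEMU raises the DEVICE_DELETED QMP event, and that event is the asynchronous callback libvirt should be waiting for before it tears down the back end. A heavily abridged, paraphrased sketch of the tail of device_unparent() in hw/core/qdev.c:

    static void device_unparent(Object *obj)
    {
        DeviceState *dev = DEVICE(obj);

        /* ... unrealize the device, unparent its child buses and drop the
         * parent-bus reference (omitted) ... */

        /* only send the event if the device had been completely realized */
        if (dev->pending_deleted_event) {
            gchar *path = object_get_canonical_path(OBJECT(dev));

            /* DEVICE_DELETED: tells management the hot-unplug really finished */
            qapi_event_send_device_deleted(!!dev->id, dev->id, path, &error_abort);
            g_free(path);
        }
    }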

If we do not wait for that acknowledgement and simply yank the disk out from underneath, we end up with exactly the inconsistency described above. Inside the VM an error has most likely already appeared; the customer just hasn't reported it yet.
