Lustre Errors -- Some Problems I Have Run Into
I won't say much about the Lustre filesystem itself here. Below are some Lustre problems I have run into at work, along with possible solutions; the list gets updated from time to time. # The English excerpts are quoted as-is rather than translated, to avoid distorting their original meaning.
1. lvbo_init failed for resource xxxx rc -2
You can see this message in two cases:
1) a valid race, when a client tries to get file info via the stat() syscall while the file is being unlinked at the same time;
2) some OST objects are lost, which typically happens after an unclean shutdown.
In the second case you need to:
1) stop the OST;
2) run e2fsck on the OST partition;
3) mount the partition as ldiskfs and run the ll_recover_lost_found_objs tool to move objects from "lost+found" back into their correct place;
4) umount the partition.
After that you can start the OST again; a command sketch of these steps follows below.
If you still see these messages after that procedure, you need to run lfsck to find which files are corrupted and restore them from backup.
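A minimal sketch of steps 1)-4), assuming the OST backing device is /dev/sdb, its normal Lustre mount point is /mnt/lustre/ost0, and /mnt/ost is a spare mount point (all three names are placeholders for your own site):
umount /mnt/lustre/ost0                             # 1) stop the OST
e2fsck -fy /dev/sdb                                 # 2) check the OST partition (use the Lustre-patched e2fsprogs)
mount -t ldiskfs /dev/sdb /mnt/ost                  # 3) mount the backing store as ldiskfs
ll_recover_lost_found_objs -d /mnt/ost/lost+found   #    move objects back from lost+found
umount /mnt/ost                                     # 4) umount the partition
mount -t lustre /dev/sdb /mnt/lustre/ost0           # start the OST again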
2. UUID xxxxxxx is not available for connect (no target)
You can see this message in two cases:
1) The device an OSS is using for an OST has become unavailable (i.e. ENODEV -- No such device). The question becomes: why? What kind of disk is the OST? You might want to look into your storage hardware's logs to see if there is any indication of trouble.
2) The messages are related to a client which tries to mount the SFS file system or tries to reconnect to the OST.
The client has both SFS server NIDs in its /etc/fstab entry, but only one SFS server has the OST mounted.
When the client tries to access the OST on the node which does not have the OST mounted, it will see these errors/messages.
They simply mean a client tried to access an OST (LUN) that is not mounted on this server.
Most of the time the OST (LUN) is mounted on the other SFS (heartbeat) node.
These messages are normal when SFS servers are configured for high availability (e.g. with heartbeat); the sketch below shows how to check where the OST actually is.
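A quick check, run on each OSS of the failover pair; only the node that really has the OST mounted will list it:
mount -t lustre        # show the Lustre targets mounted on this server
lctl dl                # list the local Lustre devices and their status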
3. RPC Debug messages
req@ffff81034d168850 x1448410413532362/t0(0) o101->34a4f130-6594-ba08-18a5-15126d38e40b@20.11.0.1@tcp:0/0 lens 552/2096 e 2 to 0 dl 1381608355 ref 2 fl Interpret:/0/0 rc 0/0
1) req@: is pulled from the ptlrpc_request structure used for an RPC by the DEBUG_REQ macro
2) ffff81034d168850 : memory address denoted by req@
3) x1448410413532362/t0(0): XID and Transaction Number(transno)
4) o101: opcode; o400 is the obd_ping request and o101 is the LDLM enqueue request
5) 34a4f130-6594-ba08-18a5-15126d38e40b@20.11.0.1@tcp:0/0: export or import target UUID, NID, and the portals for the request and reply buffers
6) lens: the request and reply buffer lengths
7) e: the number of early replies sent under adaptive timeouts
8) to: timeout; a logical zero or one depending on whether the request timed out
9) dl: deadline time
10) ref: reference count
11) fl: flags, indicating whether the request was resent, interrupted, complete, high priority, etc.
12) rc: request/reply flags and the request/reply status. The status is typically an errno, but higher numbers refer to Lustre-specific uses.
*) The transno, opcode, and reply status are the most useful entries to parse while examining the logs.
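For example, a rough sketch for seeing which opcodes dominate, assuming the debug buffer has been dumped to a file with lctl dk:
lctl dk /tmp/lustre.log                                            # dump (and clear) the kernel debug buffer
grep -o ' o[0-9]*->' /tmp/lustre.log | sort | uniq -c | sort -rn   # count requests per opcode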
4. LDLM Debug messages
This LDLM_ERROR macro is used whenever a server evicts a client and so it is quite common.
This macro uses the eye catcher '###' so it can be easily found as well.
(ldlm_lockd.c:357:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 20.3.21.19@tcp ns: mdt-ffff8104927e0000 lock: ffff81032b2f0240/0xbe90459d960b0a8a lrc: 3/0,0 mode: PR/PR res: 8589947646/1303 bits 0x3 rrc: 352 type: IBT flags: 0x4000020 remote: 0xe9142e8925e85238 expref: 26 pid: 27489 timeout: 8287289131
1) ns: namespace, which is essentially the lock domain for the storage target.
2) mode: granted and requested mode. The types are exclusive (EX), protective write (PW), protective read (PR), concurrent write (CW), concurrent read (CR), or null (NL).
3) res: inode and generation numbers for the resource on the ldiskfs backing store.
4) type: lock type: extent (EXT), ibits (IBT), or flock (FLK).
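A small sketch for pulling the evicted client NIDs out of the server logs, assuming the console messages end up in /var/log/messages:
grep 'lock callback timer expired' /var/log/messages | grep -o 'evicting client at [^ ]*' | sort | uniq -c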
5. mdt_handler.c:913:mdt_getattr_name_lock() Parent doesn't exist!
lustre/lustre/mdt/mdt_handler.c
A normal unlink/delete race; the message is harmless and the debug message level just needs to be lowered (see the sketch below for quieting the console).
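A hedged sketch: on the Lustre releases I have used, the console message mask is exposed as /proc/sys/lnet/printk (check the manual of your release for the exact flag names before changing it):
cat /proc/sys/lnet/printk                            # show the current console message mask
echo "error emerg console" > /proc/sys/lnet/printk   # example mask: keep errors, drop warnings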
6. Readonly
An OST filesystem goes read-only because of disk errors.
1) Before running fsck, first deactivate this OST on the MDS node:
lctl --device ggfs-OST0014-osc-MDT0000 deactivate
then umount the corresponding mount point.
2) Check the device with the Lustre e2fsck tool.
Run e2fsck in two steps: first e2fsck -fn /dev/sdc; if it reports only a few errors, run e2fsck -fp /dev/sdc directly; if it reports many errors,
back up the raw device first (see the sketch after these steps), then run e2fsck -fp /dev/sdc.
3) After backing up and repairing, reboot the machine and activate the corresponding device on the MDS (for a small number of errors, simply rebooting the machine is enough):
lctl --device ggfs-OST0014-osc-MDT0000 activate
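A sketch of the raw backup step mentioned in 2), assuming /dev/sdc is the affected OST device and /backup has enough free space:
dd if=/dev/sdc of=/backup/ggfs-OST0014.img bs=1M conv=noerror,sync   # raw image of the device before the repairing e2fsck pass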
7. Lustre: Service thread pid 8823 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later
A server thread is blocked and its watchdog timer has expired. Possible causes: waiting on some resource, a deadlock, or blocked RPC communication.
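The stack of the reported thread usually tells you what it is waiting on. A sketch (pid 8823 is taken from the example message; /proc/<pid>/stack requires a kernel that exposes it):
cat /proc/8823/stack            # kernel stack of the blocked service thread
echo t > /proc/sysrq-trigger    # or dump all task states/stacks to the console/dmesg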
8. Check which OSTs each I/O node has mounted
gg2425:~ # lctl get_param osc.*.ost_conn_uuid
osc.ggfs-OST0000-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.16@tcp
osc.ggfs-OST0001-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.16@tcp
osc.ggfs-OST0002-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.16@tcp
osc.ggfs-OST0003-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.16@tcp
osc.ggfs-OST0004-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.17@tcp
osc.ggfs-OST0005-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.17@tcp
osc.ggfs-OST0006-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.17@tcp
osc.ggfs-OST0007-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.17@tcp
osc.ggfs-OST0008-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.18@tcp
osc.ggfs-OST0009-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.18@tcp
osc.ggfs-OST000a-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.18@tcp
osc.ggfs-OST000b-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.4@tcp
osc.ggfs-OST000c-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.4@tcp
osc.ggfs-OST000d-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.19@tcp
osc.ggfs-OST000e-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.20@tcp
osc.ggfs-OST000f-osc-ffff8806262d3400.ost_conn_uuid=20.3.100.20@tcp
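To summarize the same output per server, a small sketch that counts how many OSTs this client sees on each OSS NID:
lctl get_param osc.*.ost_conn_uuid | awk -F= '{print $2}' | sort | uniq -c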