Troubleshooting the "Zabbix housekeeper processes more than 75% busy" alert

2014-08-15 fathermotherson
Category: System Operations

First, understand what the housekeeper is. The comments in the zabbix_server.conf configuration file describe it:


  ### Option: HousekeepingFrequency
  # How often Zabbix will perform housekeeping procedure (in hours).
  # Housekeeping is removing unnecessary information from history, alert, and alarms tables.
  #
  # Mandatory: no
  # Range: 1-24
  # Default:
  # HousekeepingFrequency=1

  ### Option: MaxHousekeeperDelete
  # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
  # [housekeeperid], [tablename], [field], [value].
  # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
  # will be deleted per one task in one housekeeping cycle.
  # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
  # If set to 0 then no limit is used at all. In this case you must know what you are doing!
  #
  # Mandatory: no
  # Range: 0-1000000
  # Default:
  # MaxHousekeeperDelete=500
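In short, the housekeeper cleans expired history data and the like out of the database. MaxHousekeeperDelete is a cap rather than a trigger threshold: for each task in the "housekeeper" table, at most this many rows are deleted in one housekeeping cycle, and anything left over waits for a later cycle.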
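
To make the cap concrete, here is a minimal sketch of the per-task limit as the config comment describes it. The function name and the input list are hypothetical and only model the arithmetic; this is not Zabbix source code.

```python
def rows_deleted_in_cycle(pending_rows_per_task, max_housekeeper_delete=500):
    """Return how many rows each housekeeper task removes in one cycle.

    pending_rows_per_task: rows still matching each task's
    [tablename], [field], [value] (hypothetical input for illustration).
    A limit of 0 means "no limit", as the config comment states.
    """
    if max_housekeeper_delete == 0:
        return list(pending_rows_per_task)
    return [min(rows, max_housekeeper_delete) for rows in pending_rows_per_task]


# Three tasks with 120, 500 and 2300 matching rows, default limit of 500:
print(rows_deleted_in_cycle([120, 500, 2300]))
# Same tasks with the limit disabled:
print(rows_deleted_in_cycle([120, 500, 2300], 0))
```

With the default limit, the third task would need several cycles to finish, which is why a very large backlog drains slowly.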

How can you see what the housekeeper has been doing? Check the server log:


  grep housekeeper /var/log/zabbix/zabbix_server.log
    4850:20140809:175626.071 executing housekeeper
    4850:20140809:181408.036 housekeeper [deleted 279622 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 1061.962644 sec, idle 1 hour(s)]
    4850:20140809:191408.037 executing housekeeper
    4850:20140809:192611.432 housekeeper [deleted 287033 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 723.394480 sec, idle 1 hour(s)]
    4850:20140809:202611.433 executing housekeeper
    4850:20140809:203638.243 housekeeper [deleted 266125 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 626.808964 sec, idle 1 hour(s)]
    4850:20140809:213638.244 executing housekeeper
    4850:20140809:215445.003 housekeeper [deleted 258097 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 1086.756768 sec, idle 1 hour(s)]
    4850:20140809:225445.004 executing housekeeper
    4850:20140809:230601.581 housekeeper [deleted 286602 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 676.576122 sec, idle 1 hour(s)]
  ....
  ....
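
Reading these summary lines by eye gets tedious, so here is a small sketch that pulls out the row count and duration of each run and derives a deletion rate. It assumes the log format shown above (other Zabbix versions may word the line differently); the function name is my own.

```python
import re

# Matches the summary line the housekeeper writes when a run finishes.
SUMMARY = re.compile(
    r"housekeeper \[deleted (?P<rows>\d+) hist/trends.*? in (?P<secs>[\d.]+) sec"
)


def summarize(log_lines):
    """Yield (rows_deleted, seconds, rows_per_second) for each finished run."""
    for line in log_lines:
        m = SUMMARY.search(line)
        if m:
            rows = int(m.group("rows"))
            secs = float(m.group("secs"))
            rate = rows / secs if secs else 0.0  # guard against a 0-second run
            yield rows, secs, rate


sample = [
    "  4850:20140809:175626.071 executing housekeeper",
    "  4850:20140809:181408.036 housekeeper [deleted 279622 hist/trends, "
    "0 items, 0 events, 0 sessions, 0 alarms, 0 audit items "
    "in 1061.962644 sec, idle 1 hour(s)]",
]
for rows, secs, rate in summarize(sample):
    print(f"{rows} rows in {secs:.0f}s ({rate:.0f} rows/s)")
```

In the log above each run takes roughly 10 to 18 minutes to delete under 300,000 rows, which already hints that the deletes are slow.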
As for how the housekeeper actually runs, here is an excerpt from the Zabbix forum:



  That is fine.
  Zabbix server housekeeper is doing all deletes in few stages:
  - in first is deleting from history* and trends* tables using clock key and it deletes ALL data from items older than specified in "Keep history" param,
  - in second stage is deleting rows of items of deleted items and deleted hosts (zabbix does not deletes all these data just when you click on delete but it adds all these items ids to 'housekeeper' table).
  - at the end it deletes items from events, acknowledgements, alarms tables
In addition, Monitor-Dashboard-Graphs in the web interface shows some performance data for the Zabbix server itself, for example the "Zabbix internal process busy %" graph.

Checking system performance on the Zabbix server itself shows that disk I/O is very high:


  [root@zabbix ~]# iostat -xm 1
  Linux 2.6.32-431.5.1.el6.x86_64 (zabbix)     08/14/2014     _x86_64_    (2 CPU)

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             8.57    0.00    4.23   11.05    0.13   76.02

  Device:  rrqm/s   wrqm/s    r/s      w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
  xvda       0.00   588.40   0.56   275.92   0.01   3.38    25.11     0.19   0.69   0.84  23.15
  xvdb       0.00    76.12   0.29    59.23   0.01   0.53    18.46     1.12  18.87   1.88  11.17
  dm-0       0.00     0.00   0.86   999.67   0.02   3.90     8.04     0.86   0.86   0.26  26.15
  dm-1       0.00     0.00   0.00     0.00   0.00   0.00     8.00     0.00  67.83   6.77   0.00

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            23.98    0.00   11.73   42.35    0.51   21.43

  Device:  rrqm/s   wrqm/s    r/s      w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
  xvda       0.00  2964.00  20.00  1292.00   0.31  16.39    26.07    13.19   9.43   0.60  78.90
  xvdb       0.00    37.00   0.00    36.00   0.00   0.14     8.00     4.02  59.64   6.64  23.90
  dm-0       0.00     0.00  22.00  4385.00   0.42  17.13     8.16    57.78  11.69   0.18  81.30
  dm-1       0.00     0.00   0.00     0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            18.46    0.00    6.67   34.36    0.00   40.51

  Device:  rrqm/s   wrqm/s    r/s      w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
  xvda       0.00  1849.00   4.00   826.00   0.14  10.69    26.72    11.90  15.43   1.01  83.90
  xvdb       0.00     0.00   0.00    83.00   0.00   0.47    11.57     6.01  94.92   3.66  30.40
  dm-0       0.00     0.00   1.00  2700.00   0.02  10.55     8.01    57.89  23.79   0.32  87.10
  dm-1       0.00     0.00   0.00     0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
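
To pick the saturated devices out of output like this automatically, a rough sketch follows. It assumes the column layout shown above, with a "Device:" header row; newer sysstat releases print different columns, so treat the parsing as illustrative. The 80% threshold is an arbitrary choice of mine.

```python
def busy_devices(iostat_text, util_threshold=80.0):
    """Return (device, await_ms, util_pct) for rows whose %util exceeds the threshold."""
    busy = []
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            columns = None          # a blank line ends the current device table
        elif fields[0] == "Device:":
            columns = fields        # remember the header of this report
        elif columns and len(fields) == len(columns):
            row = dict(zip(columns, fields))
            if float(row["%util"]) > util_threshold:
                busy.append((row["Device:"], float(row["await"]), float(row["%util"])))
    return busy


# The last report from the iostat run above, whitespace-collapsed.
sample = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 1849.00 4.00 826.00 0.14 10.69 26.72 11.90 15.43 1.01 83.90
xvdb 0.00 0.00 0.00 83.00 0.00 0.47 11.57 6.01 94.92 3.66 30.40
dm-0 0.00 0.00 1.00 2700.00 0.02 10.55 8.01 57.89 23.79 0.32 87.10
"""
print(busy_devices(sample))
```

For this sample it flags xvda and dm-0, the write-heavy devices in the report.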
Thanks to the members of a Linux chat group for sharing their hands-on experience:


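  "I suggest you optimize by following the document shared in the group; around 100k items shouldn't be much of a problem."
  广州-小王 15:44:26
  Partition the tables, then just drop partitions
  广州-小王 15:44:43
  The document explains it, it's simple: use a script to drop them on a schedule
  广州-小王 15:45:09
  Every table whose name starts with history can be partitioned; the events table apparently can't be partitioned directly since 2.0 because of foreign key constraints
  广州-小王 15:48:53
  The housekeeper is kind of useless; once your data volume reaches a certain level, it deletes more slowly than new data comes in..
  广州-Samma 15:38:33
  Look at what exactly the housekeeper is doing. If it writes to fairly small tables, those can be kept in memory.
  广州-Samma 15:39:21
  Or raise the housekeeper interval so that it only runs once every N hours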
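
The "drop partitions from a scheduled script" advice could look roughly like the sketch below. It only builds the SQL text and does not connect to a database. The daily partition naming scheme (pYYYY_MM_DD), the retention of 30 days and the function name are assumptions of mine, not taken from the group's document.

```python
from datetime import date, timedelta


def drop_partition_sql(table, today, keep_days):
    """Build the statement that drops the daily partition that is keep_days old.

    Assumes one partition per day named pYYYY_MM_DD (a hypothetical naming
    scheme; use whatever names your partitioning setup actually created).
    """
    expired = today - timedelta(days=keep_days)
    return f"ALTER TABLE {table} DROP PARTITION p{expired:%Y_%m_%d};"


# Tables whose names start with "history", as suggested in the chat.
for table in ("history", "history_uint"):
    print(drop_partition_sql(table, date(2014, 8, 15), keep_days=30))
```

Dropping a partition is a metadata operation on a whole chunk of data, so it avoids the row-by-row deletes that keep the housekeeper busy.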

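That's all for now; I'm off to do the optimization.
Optimization notes:
First, set up a dedicated database and use independent tablespaces (innodb_file_per_table in MySQL)

Then partition the MySQL tables by following this document on the official site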
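
As a rough idea of what partitioning a history table involves, the sketch below generates a MySQL statement that range-partitions a table by day on its clock column (a Unix timestamp). This is not necessarily the procedure in the referenced document: the partition names, the use of UTC midnights and the function name are my own assumptions.

```python
import calendar
from datetime import date, timedelta


def partition_by_day_sql(table, first_day, days):
    """Build a MySQL statement that range-partitions `table` on its clock column.

    Each daily partition holds rows with clock < midnight of the next day.
    Midnight is taken in UTC here, which is an assumption; the partition
    names (pYYYY_MM_DD) are likewise only illustrative.
    """
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        upper = calendar.timegm((day + timedelta(days=1)).timetuple())
        parts.append(f"  PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({upper})")
    body = ",\n".join(parts)
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n{body}\n);"


print(partition_by_day_sql("history", date(2014, 8, 15), 2))
```

Review the generated statement before running it: repartitioning a large existing table rewrites its data and can take a long time.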


References:
&page=2


