由mysql timestamp字段引发的一个系统bug

5750阅读 0评论2015-07-01 zxszcaijin
分类:Mysql/postgreSQL

   2015-06-30 13:11:00左右,memcache由于硬件故障,莫名重启,导致数据库访问量突然增大(按照之前的测试结果,远未到达系统瓶颈),且当时syscpu 远高于正常值。
监控信息如下:


当时的mysqld堆栈信息如下:

点击(此处)折叠或打开

  1. Thread 50 (Thread 0x2afb6591a700 (LWP 7020)):
  2. #0 0x0000003c34ef808f in __lll_unlock_wake_private () from /lib64/libc.so.6
  3. #1 0x0000003c34e9dccc in _L_unlock_2226 () from /lib64/libc.so.6
  4. #2 0x0000003c34e9db01 in __tz_convert () from /lib64/libc.so.6
  5. #3 0x0000000000783012 in Time_zone_system::gmt_sec_to_TIME(st_mysql_time*, long) const ()
  6. #4 0x00000000007c177b in Field_timestampf::get_date_internal(st_mysql_time*) ()
  7. #5 0x00000000007c23dd in Field_temporal_with_date::val_str(String*, String*) ()
  8. #6 0x0000000000672fa8 in Protocol_text::store(Field*) ()
  9. #7 0x00000000006722f5 in Protocol::send_result_set_row(List<Item>*) ()
  10. #8 0x00000000006c77af in select_send::send_data(List<Item>&) ()
  11. #9 0x00000000006db6b0 in end_send(JOIN*, st_join_table*, bool) ()
  12. #10 0x00000000006d6c4f in evaluate_join_record(JOIN*, st_join_table*) ()
  13. #11 0x00000000006d6fbb in sub_select(JOIN*, st_join_table*, bool) ()
  14. #12 0x00000000006d61e8 in JOIN::exec() ()
  15. #13 0x000000000071f0d5 in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, SQL_I_List<st_order>*, SQL_I_List<st_order>*, Item*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
  16. #14 0x000000000071f935 in handle_select(THD*, select_result*, unsigned long) ()
  17. #15 0x000000000057fe02 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
  18. #16 0x00000000006f935c in mysql_execute_command(THD*) ()
  19. #17 0x00000000006fe5e8 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
  20. #18 0x00000000006ffdc9 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
  21. #19 0x0000000000789a08 in threadpool_process_request(THD*) ()
  22. #20 0x000000000078a9dd in worker_main(void*) ()
  23. #21 0x0000000000b4a8c3 in pfs_spawn_thread ()
  24. #22 0x0000003c352079d1 in start_thread () from /lib64/libpthread.so.0
  25. #23 0x0000003c34ee88fd in clone () from /lib64/libc.so.6
从堆栈可以看到大部分请求都在调用glibc的时区转换函数,但该函数由于全局锁的原因,在并发环境下产生了热点竞争。

深入分析mysql时区相关的函数:

点击(此处)折叠或打开

  1. void
  2. Time_zone_system::gmt_sec_to_TIME(MYSQL_TIME *tmp, my_time_t t) const
  3. {
  4.   struct tm tmp_tm;
  5.   time_t tmp_t= (time_t)t;

  6.   localtime_r(&tmp_t, &tmp_tm);
  7.   localtime_to_TIME(tmp, &tmp_tm);
  8.   tmp->time_type= MYSQL_TIMESTAMP_DATETIME;
  9.   adjust_leap_second(tmp);
  10. }

继续进入localtime_r函数

点击(此处)折叠或打开

  1. struct tm *
  2. __tz_convert (const time_t *timer, int use_localtime, struct tm *tp)
  3. {
  4.   long int leap_correction;
  5.   int leap_extra_secs;

  6.   if (timer == NULL)
  7.     {
  8.       __set_errno (EINVAL);
  9.       return NULL;
  10.     }

  11.   __libc_lock_lock (tzset_lock);

  12.   /* Update internal database according to current TZ setting.
  13.      POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
  14.      This is a good idea since this allows at least a bit more parallelism. */
  15.   tzset_internal (tp == &_tmbuf && use_localtime, 1);

  16.   if (__use_tzfile)
  17.     __tzfile_compute (*timer, use_localtime, &leap_correction,
  18.               &leap_extra_secs, tp);
  19.   else
  20.     {
  21.       if (! __offtime (timer, 0, tp))
  22.     tp = NULL;
  23.       else
  24.     __tz_compute (*timer, tp, use_localtime);
  25.       leap_correction = 0L;
  26.       leap_extra_secs = 0;
  27.     }

  28.   if (tp)
  29.     {
  30.       if (! use_localtime)
  31.     {
  32.       tp->tm_isdst = 0;
  33.       tp->tm_zone = "GMT";
  34.       tp->tm_gmtoff = 0L;
  35.     }

  36.       if (__offtime (timer, tp->tm_gmtoff - leap_correction, tp))
  37.         tp->tm_sec += leap_extra_secs;
  38.       else
  39.     tp = NULL;
  40.     }

  41.   __libc_lock_unlock (tzset_lock);

  42.   return tp;
  43. }

很清楚的看到,在调用系统时区转换时,有全局锁__libc_lock_lock的保护,导致线程并发环境下,系统性能受限。
而在time_zone=system时,会调用该函数从而获取系统的时区。如果将time_zone='+8:00'则不会调用系统时区.
对此,真实的模拟了线上的case(分别对time_zone=system和time_zone='+8:00'做了压测):

点击(此处)折叠或打开

  1. mysqlslap --no-defaults -u$user -p$password --create-schema=DianPingTS -h$host -P$port --number-of-queries=1000000000 --concurrency=30 --query="select AddTime,UpdateTime from $table where addtime>='2015-07-01 15:21:35' and addtime<='2015-07-01 17:33:35';"
以下是测试结果:

由此可见,在time_zone='+8:00'在timestamp类型的大量转换情况下,性能能得到质的提升..

小伙伴们,如果也遇到该问题,只要将time_zone='+8:00' (注意地区在东8区哦)就可以了..^_^




上一篇:percona 5.6 flush_master_info如何保证数据一致性 ?
下一篇:MySQL MHA Failover 实现