# netstat -rn
Routing tables
Destination      Gateway          Flags Refs      Use  If    Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          10.209.3.62      UG       0        5  en2   -    -
10.209.3.0       10.209.3.45      UHSb     0        0  en2   -    -  =>
10.209.3/26      10.209.3.45      U        1        0  en2   -    -
10.209.3.45      127.0.0.1        UGHS     0        0  lo0   -    -
10.209.3.63      10.209.3.45      UHSb     0        0  en2   -    -
127/8            127.0.0.1        U        8    24709  lo0   -    -
192.168.2.0      192.168.2.45     UHSb     0        0  en2   -    -  =>
192.168.2/26     192.168.2.45     U        2     3240  en2   -    -
192.168.2.45     127.0.0.1        UGHS     0     5739  lo0   -    -
192.168.2.63     192.168.2.45     UHSb     0     1395  en2   -    -
192.168.3.0      192.168.3.45     UHSb     0        0  en0   -    -  =>
192.168.3/26     192.168.3.45     U        2     3869  en0   -    -
192.168.3.45     127.0.0.1        UGHS     0    13164  lo0   -    -
192.168.3.63     192.168.3.45     UHSb     0      768  en0   -    -
192.168.100.0    192.168.100.36   UHSb     0        0  en3   -    -  =>
192.168.100/26   192.168.100.36   U        1     7078  en3   -    -
192.168.100.36   127.0.0.1        UGHS     0     6343  lo0   -    -
192.168.100.63   192.168.100.36   UHSb     0        4  en3   -    -

Route Tree for Protocol Family 24 (Internet v6):
::1              ::1              UH       0      591  lo0   -    -
After the resource group was switched over, the routing table became:
# netstat -rn
Routing tables
Destination      Gateway          Flags Refs      Use  If    Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          10.209.3.62      U        0        0  en0   -    -
10.209.3.0       10.209.3.45      UHSb     0        0  en0   -    -  =>
10.209.3/26      10.209.3.45      U        0        1  en0   -    -
10.209.3.45      127.0.0.1        UGHS     0        1  lo0   -    -
10.209.3.63      10.209.3.45      UHSb     0        0  en0   -    -
127/8            127.0.0.1        U        4    25581  lo0   -    -
192.168.2.0      192.168.2.45     UHSb     0        0  en2   -    -  =>
192.168.2/26     192.168.2.45     U        0     3476  en2   -    -
192.168.2.45     127.0.0.1        UGHS     0     5850  lo0   -    -
The routing information above is taken from reference [1].
The main change is in the default route, which went from:
default 10.209.3.62 UG 0 5 en2 - -
to:
default 10.209.3.62 U 0 0 en0 - -
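In the Flags column of the AIX routing output, U means the route is up and G means the route goes through a gateway. After the switchover the default route moved from en2 to en0 and lost its G flag, so it is no longer marked as a gateway route.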
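A quick way to spot this condition is to pull the default route out of the netstat -rn output and check whether its Flags field still contains G. The following is a minimal sketch, assuming the column layout shown above (destination, gateway and flags as the first three fields). The helper names default_route_flags and has_gateway_flag are made up for illustration, and the two sample lines are the default routes quoted above.

```shell
#!/bin/sh
# default_route_flags (hypothetical helper): read `netstat -rn` output on
# stdin and print the Flags field of the first "default" entry.
# Assumes the column layout shown above: destination, gateway, flags.
default_route_flags() {
    awk '$1 == "default" { print $3; exit }'
}

# has_gateway_flag (hypothetical helper): succeed if the flags contain G.
has_gateway_flag() {
    case "$1" in
        *G*) return 0 ;;
        *)   return 1 ;;
    esac
}

# The two default-route lines quoted above, before and after the switchover.
before='default 10.209.3.62 UG 0 5 en2 - -'
after='default 10.209.3.62 U 0 0 en0 - -'

for sample in "$before" "$after"; do
    flags=$(printf '%s\n' "$sample" | default_route_flags)
    if has_gateway_flag "$flags"; then
        echo "flags=$flags: default route goes through a gateway"
    else
        echo "flags=$flags: default route has lost its G flag"
    fi
done
```

On a live system the same check would be fed from the real command, for example netstat -rn | default_route_flags.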
The HACMP log contains the following:
+filetrans_rg:clifconfig[207] ifconfig en10 delete 192.168.0.13
+filetrans_rg:cl_swap_IP_address[+1280] [[ -n ]]
+filetrans_rg:cl_swap_IP_address[+1303] /usr/es/sbin/cluster/.restore_routes
+filetrans_rg:.restore_routes[+9] date
+filetrans_rg:.restore_routes[+9] : Starting /usr/es/sbin/cluster/.restore_routes at Wed Mar 16 17:04:44 BEIST 2011
+filetrans_rg:.restore_routes[+11] cl_route_change default 127.0.0.1 192.168.0.254 inet
+filetrans_rg:cl_swap_IP_address[+1304] : Completed /usr/es/sbin/cluster/.restore_routes with return code 0.
+filetrans_rg:cl_swap_IP_address[+1304] [[ __AIX__ = __AIX__ ]]
+filetrans_rg:cl_swap_IP_address[+1305] enable_pmtu_gated
Setting tcp_pmtu_discover to 1
Setting udp_pmtu_discover to 1
+filetrans_rg:cl_swap_IP_address[+1308] cl_hats_adapter en10 -d 192.168.0.13 alias
+filetrans_rg:cl_hats_adapter[+50] [[ high = high ]]
+filetrans_rg:cl_hats_adapter[+50] version=1.40
+filetrans_rg:cl_hats_adapter[+51] +filetrans_rg:cl_hats_adapter[+51] cl_get_path
HA_DIR=es
+filetrans_rg:cl_hats_adapter[+52] +filetrans_rg:cl_hats_adapter[+52] cl_get_path -S
As the log shows, the script HACMP uses to restore routes is /usr/es/sbin/cluster/.restore_routes. Its contents are as follows:
#cat /usr/es/sbin/cluster/.restore_routes
#!/bin/ksh
#
# Script created by cl_swap_IP_address on Wed Mar 16 17:04:44 BEIST 2011
#
PATH=/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities:/usr/es/sbin/cluster/events:/usr/es/sbin/cluster/events/utils:/usr/es/sbin/cluster/events/cmd:/usr/es/sbin/cluster/diag:/usr/es/sbin/cluster/etc:/usr/es/sbin/cluster/sbin:/usr/es/sbin/cluster/cspoc:/usr/es/sbin/cluster/conversion:/usr/es/sbin/cluster/events/emulate:/usr/es/sbin/cluster/events/emulate/driver:/usr/es/sbin/cluster/events/emulate/utils:/usr/es/sbin/cluster/tguides/bin:/usr/es/sbin/cluster/tguides/classes:/usr/es/sbin/cluster/tguides/images:/usr/es/sbin/cluster/tguides/scripts:/usr/es/sbin/cluster/glvm/utils:/usr/es/sbin/cluster/wpar:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin
PS4='${GROUPNAME:++$GROUPNAME}:${PROGNAME:-${0##*/}}${PS4_TIMER:+($SECONDS)}${PS4_LOOP:+:$PS4_LOOP}[${ERRNO:+${PS4_FUNC:-}+}$LINENO] '
export VERBOSE_LOGGING=${VERBOSE_LOGGING:-"high"}
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
: Starting $0 at $(date)
#
cl_route_change default 127.0.0.1 192.168.0.254 inet
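In a long event log the route-related steps are easy to miss among the other trace lines. The sketch below filters a log for the lines that mention .restore_routes or cl_route_change. It is only an illustration: the helper name route_steps is made up, the sample input is three trace lines copied from the excerpt above, and the location of the log file on a real cluster is not given in this post, so substitute your own path.

```shell
#!/bin/sh
# route_steps (hypothetical helper): read an HACMP event log on stdin and
# print only the trace lines that mention .restore_routes or cl_route_change.
route_steps() {
    grep -E '(\.restore_routes|cl_route_change)'
}

# Sample input: three trace lines copied from the log excerpt above.
# Only the last two mention the route restoration step.
route_steps <<'EOF'
+filetrans_rg:clifconfig[207] ifconfig en10 delete 192.168.0.13
+filetrans_rg:cl_swap_IP_address[+1303] /usr/es/sbin/cluster/.restore_routes
+filetrans_rg:.restore_routes[+11] cl_route_change default 127.0.0.1 192.168.0.254 inet
EOF
```

Against a real log the function would be used as route_steps < your-log-file.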
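The route change itself is performed by the cl_route_change command, which is a binary. Searching IBM's site and Google for cl_route_change turns up results [2][3][4][5], and these articles confirm that this is an HACMP bug. It can be fixed either by applying efix IZ63775 or by upgrading to PowerHA 6 SP01.
I had never used an efix before, so I took the opportunity to try one out today. Here is the record: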
[/tmp/hacmp]#emgr -e IZ63775.epkg.Z
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Efix package file is: /tmp/hacmp/IZ63775.epkg.Z
MD5 generating command is /usr/bin/csum
MD5 checksum is 8ba66435963cf3318502e7953bfebf8a
Accessing efix metadata ...
Processing efix label "IZ63775" ...
Verifying efix control file ...

+-----------------------------------------------------------------------------+
Installp Prerequisite Verification
+-----------------------------------------------------------------------------+
Verifying prerequisite file ...
Checking prerequisites ...

Prerequisite Number: 1
   Fileset: cluster.es.server.events
   Minimal Level: 6.1.0.0
   Maximum Level: 6.1.0.0
   Actual Level: 6.1.0.0
   Type: PREREQ
   Requisite Met: yes

All prerequisites have been met.

+-----------------------------------------------------------------------------+
Processing APAR reference file
+-----------------------------------------------------------------------------+
APAR reference set to NONE.
Interim fix is not enabled for automatic removal.

+-----------------------------------------------------------------------------+
Efix Attributes
+-----------------------------------------------------------------------------+
LABEL:            IZ63775
PACKAGING DATE:   Fri Oct 23 12:22:46 CDT 2009
ABSTRACT:         Deflt route prblm in base HA 610
PACKAGER VERSION: 7
VUID:             00CCCC5B4C00102312104609
REBOOT REQUIRED:  no
BUILD BOOT IMAGE: no
PRE-REQUISITES:   yes
SUPERSEDE:        no
PACKAGE LOCKS:    no
E2E PREREQS:      no
FIX TESTED:       no
ALTERNATE PATH:   None
EFIX FILES:       1

Install Scripts:
   PRE_INSTALL:   no
   POST_INSTALL:  no
   PRE_REMOVE:    no
   POST_REMOVE:   no

File Number:      1
   LOCATION:      /usr/es/sbin/cluster/events/utils/cl_route_change
   FILE TYPE:     Standard (file or executable)
   INSTALLER:     installp
   SIZE:          76
   ACL:           DEFAULT
   CKSUM:         44210
   PACKAGE:       cluster.es.server.events
   MOUNT INST:    no

+-----------------------------------------------------------------------------+
Efix Description
+-----------------------------------------------------------------------------+
This is a fix to cl_route_change for a problem introduced in base PowerHA 610.

+-----------------------------------------------------------------------------+
Efix Lock Management
+-----------------------------------------------------------------------------+
Checking locks for file /usr/es/sbin/cluster/events/utils/cl_route_change ...

All files have passed lock checks.

+-----------------------------------------------------------------------------+
Space Requirements
+-----------------------------------------------------------------------------+
Checking space requirements ...

Space statistics (in 512 byte-blocks):
File system: /usr, Free: 16042168, Required: 1288, Deficit: 0.
File system: /tmp, Free: 7191192, Required: 2570, Deficit: 0.

+-----------------------------------------------------------------------------+
Efix Installation Setup
+-----------------------------------------------------------------------------+
Unpacking efix package file ...
Initializing efix installation ...

+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: INSTALLING

+-----------------------------------------------------------------------------+
File Archiving
+-----------------------------------------------------------------------------+
Saving all files that will be replaced ...
Save directory is: /usr/emgrdata/efixdata/IZ63775/save
File 1: Saving /usr/es/sbin/cluster/events/utils/cl_route_change as EFSAVE1 ...

+-----------------------------------------------------------------------------+
Efix File Installation
+-----------------------------------------------------------------------------+
Installing all efix files:
Installing efix file #1 (File: /usr/es/sbin/cluster/events/utils/cl_route_change) ...
/usr/sbin/emgr[160]: query: not found.

Total number of efix files installed is 1.
All efix files installed successfully.

+-----------------------------------------------------------------------------+
Package Locking
+-----------------------------------------------------------------------------+
Processing package locking for all files.
File 1: locking installp fileset cluster.es.server.events.

All package locks processed successfully.

+-----------------------------------------------------------------------------+
Reboot Processing
+-----------------------------------------------------------------------------+
Reboot is not required by this efix package.

+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: STABLE

+-----------------------------------------------------------------------------+
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log

EPKG NUMBER  LABEL           OPERATION          RESULT
===========  ==============  =================  ==============
1            IZ63775         INSTALL            SUCCESS

Return Status = SUCCESS

[/tmp]#emgr -l

ID  STATE LABEL      INSTALL TIME       ABSTRACT
=== ===== ========== ================== ======================================
1    S    IZ63775    03/16/11 16:08:28  Deflt route prblm in base HA 610

STATE codes:
 S = STABLE
 M = MOUNTED
 U = UNMOUNTED
 Q = REBOOT REQUIRED
 B = BROKEN
 I = INSTALLING
 R = REMOVING
 T = TESTED

[/tmp]#emgr -l
There is no efix data on this system.

[/tmp]#emgr -r -L IZ63775
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Accessing efix metadata ...
Processing efix label "IZ63775" ...

+-----------------------------------------------------------------------------+
Efix Attributes
+-----------------------------------------------------------------------------+
LABEL:            IZ63775
INSTALL DATE:     03/16/11 16:08:28
STATE:            STABLE
ABSTRACT:         Deflt route prblm in base HA 610
PACKAGER VERSION: 7
VUID:             00CCCC5B4C00102312104609
REBOOT REQUIRED:  no
BUILD BOOT IMAGE: no
PRE-REQUISITES:   yes
SUPERSEDE:        no
PACKAGE LOCKS:    no
E2E PREREQS:      no
FIX TESTED:       no
ALTERNATE PATH:   None
EFIX FILES:       1

Install Scripts:
   PRE_INSTALL:   no
   POST_INSTALL:  no
   PRE_REMOVE:    no
   POST_REMOVE:   no

File Number:      1
   LOCATION:      /usr/es/sbin/cluster/events/utils/cl_route_change
   FILE TYPE:     Standard (file or executable)
   INSTALLER:     installp
   SIZE:          76
   ACL:           DEFAULT
   CKSUM:         44210
   PACKAGE:       cluster.es.server.events
   MOUNT INST:    no

+-----------------------------------------------------------------------------+
Efix Description
+-----------------------------------------------------------------------------+
This is a fix to cl_route_change for a problem introduced in base PowerHA 610.

+-----------------------------------------------------------------------------+
Space Requirements
+-----------------------------------------------------------------------------+
Checking space requirements ...

Space statistics (in 512 byte-blocks):
File system: /usr, Free: 16041936, Required: 1247, Deficit: 0.

+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: REMOVING

+-----------------------------------------------------------------------------+
Package Locking
+-----------------------------------------------------------------------------+
Processing package unlocking for all files.
File 1: unlocking installp fileset cluster.es.server.events.

All package locks processed successfully.

+-----------------------------------------------------------------------------+
Efix File Removal
+-----------------------------------------------------------------------------+
Setting up for removal of efix files ...
Removing all efix files (in reverse order of installation):
Removing efix file #1 (File: /usr/es/sbin/cluster/events/utils/cl_route_change) ...

Total number of efix files removed is 1.

+-----------------------------------------------------------------------------+
Reboot Processing
+-----------------------------------------------------------------------------+
Reboot is not required by this efix package.

+-----------------------------------------------------------------------------+
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log

EFIX NUMBER  LABEL           OPERATION          RESULT
===========  ==============  =================  ==============
1            IZ63775         REMOVE             SUCCESS

Return Status = SUCCESS
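When the fix has to be checked on several nodes, reading the emgr -l listing by eye gets tedious. The sketch below extracts the STATE code for a given label from that listing. It assumes the layout shown in the session above (ID, STATE and LABEL as the first three whitespace-separated fields), and the helper name efix_state is made up for illustration. The sample listing is copied from the session, so the script runs without emgr being present.

```shell
#!/bin/sh
# efix_state (hypothetical helper): read `emgr -l` output on stdin and print
# the STATE code recorded for the given efix label, or nothing if absent.
# Assumes the listing layout shown above: ID, STATE, LABEL as the first
# three whitespace-separated fields.
efix_state() {
    awk -v label="$1" '$3 == label { print $2; exit }'
}

# Sample listing copied from the emgr -l output in the session above.
listing='ID  STATE LABEL      INSTALL TIME       ABSTRACT
=== ===== ========== ================== ======================================
1    S    IZ63775    03/16/11 16:08:28  Deflt route prblm in base HA 610'

state=$(printf '%s\n' "$listing" | efix_state IZ63775)
case "$state" in
    S)  echo "IZ63775 is installed and STABLE" ;;
    "") echo "IZ63775 is not installed" ;;
    *)  echo "IZ63775 is in state $state" ;;
esac
```

On a real node the listing would come from the command itself, for example emgr -l | efix_state IZ63775.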
The system engineer's three magic weapons are reboot, reinstall, and patch, and there really is something to that.
[1]
[2]
[3]
[4]
[5]