A conventional Solaris Cluster high-availability deployment requires at least two servers and one external storage array, and each server also needs dedicated heartbeat NICs. Only when the hardware meets these requirements is an HA configuration possible.
So how can a single T-series server with no external storage run a cluster? The answer is Oracle VM for SPARC virtualization. First, the hypervisor is used to carve the server into two guest domains. Second, the virtual switch (vsw) service provides the network interfaces (a T-series machine actually has enough physical NICs; we avoid them only to exercise the virtualization stack fully). Most importantly, the quorum device is implemented with the virtual disk (vdsk) service, which exports a single internal disk to both guest domains as shared "external" storage.
The test environment is a single T5140 with four 300 GB disks; format shows:
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@400/pci@0/pci@8/scsi@0/sd@0,0
1. c1t1d0
/pci@400/pci@0/pci@8/scsi@0/sd@1,0
2. c1t2d0
/pci@400/pci@0/pci@8/scsi@0/sd@2,0
3. c1t3d0
/pci@400/pci@0/pci@8/scsi@0/sd@3,0
Specify disk (enter its number): ^D
The plan: c1t0d0 is the control domain's system disk, c1t1d0 the system disk of node cluster1, c1t2d0 the system disk of node cluster2, and c1t3d0 the quorum device shared by cluster1 and cluster2.
The node names are cluster1 and cluster2; the cluster name is cluster-ldm.
Implementation steps for the virtual servers
1. Creating the control domain
A. Create the three default virtual services:
# ldm add-vds primary-vds primary                        (create the virtual disk service)
# ldm add-vcc port-range=5000-5100 primary-vcc primary   (create the virtual console service)
# ldm add-vsw net-dev=nxge0 primary-vsw primary          (create the virtual network switch service)
Run ldm list-services primary to verify that the three default services were created.
B. Configure the control domain
ldm set-vcpu 4 primary                    (set the control domain's CPU resources)
ldm set-memory 1g primary                 (set the control domain's memory)
ldm set-mau 0 primary                     (assign no cryptographic units to the control domain)
ldm add-config initial                    (save the configuration to the SP)
svcadm enable svc:/ldoms/vntsd:default    (start the virtual console service)
shutdown -y -g0 -i6                       (reboot the machine; the control domain is now in place)
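After the reboot, it is worth confirming the control domain's resources and the three services before creating any guest domains. A quick check with the standard ldm subcommands (output omitted here) might be:

```shell
# Verify the control domain now holds only the 4 vCPUs and 1 GB assigned above
ldm list
# Verify the primary-vds, primary-vcc and primary-vsw services exist
ldm list-services primary
```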
2. Creating guest domain cluster1
ldm add-domain cluster1                              (create the cluster1 domain)
ldm add-vcpu 12 cluster1                             (add CPU resources)
ldm add-memory 2G cluster1                           (add memory)
ldm add-vnet vnet1 primary-vsw cluster1              (add a virtual network interface)
ldm add-vdsdev /dev/dsk/c1t1d0s2 vol1@primary-vds    (export the OS system disk)
ldm add-vdisk bootdisk vol1@primary-vds cluster1
ldm set-var auto-boot\?=false cluster1
ldm set-var boot-device=bootdisk cluster1
ldm bind-domain cluster1                             (bind the resources)
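Once cluster1 is bound, vntsd assigns it a console port from the 5000-5100 range given to primary-vcc; the port appears in the CONS column of ldm list and can be reached with telnet from the control domain. A sketch (the port number 5000 is only an example; use whatever ldm list reports):

```shell
# Read the console port from the CONS column
ldm list cluster1
# Connect to the guest console through vntsd (port number is an example)
telnet localhost 5000
```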
3. Creating guest domain cluster2
ldm add-domain cluster2
ldm add-vcpu 12 cluster2
ldm add-memory 2G cluster2
ldm add-vnet vnet2 primary-vsw cluster2
ldm add-vdsdev /dev/dsk/c1t2d0s2 vol2@primary-vds
ldm add-vdisk bootdisk vol2@primary-vds cluster2
ldm set-var auto-boot\?=false cluster2
ldm set-var boot-device=bootdisk cluster2
ldm bind-domain cluster2
4. Installing the operating system
ldm add-vdsdev /opt/sun/sol-10-u10-ga-sparc-dvd.iso cdrom-iso@primary-vds
ldm add-vdisk cdrom cdrom-iso@primary-vds cluster1
This ISO image can now be used to install the operating system on cluster1. When that installation finishes, attach the ISO to cluster2 and install its operating system the same way.
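Moving the virtual CD-ROM from cluster1 to cluster2 can be sketched as follows. This assumes cluster1 must first be stopped and unbound (older LDoms releases do not support removing a vdisk from an active domain):

```shell
ldm stop cluster1
ldm unbind cluster1                                  # vdisk removal needs an inactive domain
ldm rm-vdisk cdrom cluster1                          # detach the ISO-backed vdisk
ldm bind cluster1
ldm add-vdisk cdrom cdrom-iso@primary-vds cluster2   # attach the same backend to cluster2
```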
Finally, save the complete configuration to the SP:
ldm add-config final-config-two-clusternode
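The configurations stored on the SP can be listed to confirm which one takes effect at the next power cycle:

```shell
# Lists the saved configurations (factory-default, initial,
# final-config-two-clusternode) with markers such as [current]
ldm list-config
```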
Implementing the quorum device
A quorum device is normally an external storage device that both nodes can reach at the same time (it can also be a device exported by a quorum server). On a single T-series machine with no external storage attached, the vdsk virtualization layer lets both guest domains access one internal disk simultaneously, in a DAS-like arrangement.
In this test the internal disk c1t3d0 serves as the quorum device; the steps are:
ldm stop cluster1;ldm stop cluster2
ldm add-vdsdev /dev/dsk/c1t3d0s2 vol1-share@primary-vds
ldm add-vdsdev -f /dev/dsk/c1t3d0s2 vol2-share@primary-vds
ldm add-vdisk vdisk1-share vol1-share@primary-vds cluster1
ldm add-vdisk vdisk2-share vol2-share@primary-vds cluster2
With these commands, a single internal disk is now shared by both guest nodes.
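From inside each guest domain the shared disk should now be visible alongside the boot disk; a quick non-interactive check to run on both nodes:

```shell
# Run inside cluster1 and inside cluster2: both should list a second
# virtual disk (c0d1) in addition to the boot disk (c0d0)
echo | format
```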
Implementing the heartbeat NICs
The heartbeat NICs can be virtual NICs backed by the vsw service:
ldm add-vnet vnet1-share1 primary-vsw cluster1
ldm add-vnet vnet1-share2 primary-vsw cluster1
ldm add-vnet vnet2-share1 primary-vsw cluster2
ldm add-vnet vnet2-share2 primary-vsw cluster2
Start both nodes:
ldm start cluster1
ldm start cluster2
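Before moving on to the cluster software, ldm list on the control domain should show all three domains active. A sketch of the expected state (console ports, flags and columns are illustrative, not exact output):

```shell
ldm list
# NAME      STATE   FLAGS   CONS    VCPU  MEMORY
# primary   active  -n-cv-  UART    4     1G
# cluster1  active  -n----  5000    12    2G
# cluster2  active  -n----  5001    12    2G
```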
安装cluster软件并进行cluster配置
将cluster软件分别ftp两个节点,解开后使用installer脚本进行安装然后进行手工设置,以下为两个节点的具体设置,注意红色字体部分:
1. Cluster1的具体设置
scinstall
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
* 5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option: 1
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option: 2
*** Establish Just the First Node of a New Cluster ***
This option is used to establish a new cluster using this machine as
the first node in that cluster.
Before you select this option, the Oracle Solaris Cluster framework
software must already be installed. Use the Oracle Solaris Cluster
installation media or the IPS packaging system to install Oracle
Solaris Cluster software.
Press Control-d at any time to return to the Main Menu.
Do you want to continue (yes/no) [yes]?
>>> Typical or Custom Mode <<<
This tool supports two modes of operation, Typical mode and Custom.
For most clusters, you can use Typical mode. However, you might need
to select the Custom mode option if not all of the Typical defaults
can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
Please select from one of the following options:
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]:
>>> Cluster Name <<<
Each cluster has a name assigned to it. The name can be made up of any
characters other than whitespace. Each cluster name should be unique
within the namespace of your enterprise.
What is the name of the cluster you want to establish? cluster-ldm
>>> Check <<<
This step allows you to run cluster check to verify that certain basic
hardware and software pre-configuration requirements have been met. If
cluster check detects potential problems with configuring this machine
as a cluster node, a report of violated checks is prepared and
available for display on the screen.
Do you want to run cluster check (yes/no) [yes]?
Running cluster check ...
initializing...
initializing xml output...
loading auxiliary data...
filtering out checks not marked with one of keywords: installtime
starting check run...
cluster1: S6708605.... starting: The /dev/rmt directory is missing.
cluster1: S6708605 passed
cluster1: S6708606.... starting: Multiple network interfaces on a single subn...
cluster1: S6708606 not applicable
cluster1: S6708642.... starting: /proc fails to mount periodically during reb...
searching /var/adm/messages
searching /var/adm/messages.0
cluster1: S6708642 passed
cluster1: S6708638.... starting: Node has insufficient physical memory.
cluster1: S6708638 passed
cluster1: S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...
cluster1: S6708496 passed
finished check run
finishing xml output...
Maximum severity of all violations: No Violations
Reports in: /var/cluster/logs/install/cluster_check/
cleaning up...
Press Enter to continue:
>>> Cluster Nodes <<<
This Oracle Solaris Cluster release supports a total of up to 16 nodes.
Please list the names of the other nodes planned for the initial
cluster configuration. List one node name per line. When finished,
type Control-D:
Node name (Control-D to finish): cluster1
Node name (Control-D to finish): cluster2
Node name (Control-D to finish): ^D
This is the complete list of nodes:
cluster1
cluster2
Is it correct (yes/no) [yes]?
>>> Cluster Transport Adapters and Cables <<<
Transport adapters are the adapters that attach to the private cluster
interconnect.
Select the first cluster transport adapter:
1) vnet1
2) vnet2
3) Other
Option: 1
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
Searching for any unexpected network traffic on "vnet1" ... done
Unexpected network traffic was seen on "vnet1".
"vnet1" may be cabled to a public network.
Do you want to use "vnet1" anyway (yes/no) [no]? yes
Select the second cluster transport adapter:
1) vnet1
2) vnet2
3) Other
Option: 2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
Searching for any unexpected network traffic on "vnet2" ... done
Unexpected network traffic was seen on "vnet2".
"vnet2" may be cabled to a public network.
Do you want to use "vnet2" anyway (yes/no) [no]? yes
Plumbing network address 172.16.0.0 on adapter vnet1 >> NOT DUPLICATE ... done
Plumbing network address 172.16.0.0 on adapter vnet2 >> NOT DUPLICATE ... done
/globaldevices is not mounted.
Cannot use "/globaldevices".
Do you want to use a lofi device instead and continue the installation (yes/no) [yes]?
>>> Quorum Configuration <<<
Every two-node cluster requires at least one quorum device. By
default, scinstall selects and configures a shared disk quorum device
for you.
This screen allows you to disable the automatic selection and
configuration of a quorum device.
You have chosen to turn on the global fencing. If your shared storage
devices do not support SCSI, such as Serial Advanced Technology
Attachment (SATA) disks, or if your shared disks do not support
SCSI-2, you must disable this feature.
If you disable automatic quorum device selection now, or if you intend
to use a quorum device that is not a shared disk, you must instead use
clsetup(1M) to manually configure quorum once both nodes have joined
the cluster for the first time.
Do you want to disable automatic quorum device selection (yes/no) [no]?
>>> Automatic Reboot <<<
Once scinstall has successfully initialized the Oracle Solaris Cluster
software for this machine, the machine must be rebooted. After the
reboot, this machine will be established as the first node in the new
cluster.
Do you want scinstall to reboot for you (yes/no) [yes]?
>>> Confirmation <<<
Your responses indicate the following options to scinstall:
scinstall -i \
-C cluster-ldm \
-F \
-G lofi \
-T node=cluster1,node=cluster2,authtype=sys \
-w netaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=64,maxprivatenets=10,numvirtualclusters=12 \
-A trtype=dlpi,name=vnet1 -A trtype=dlpi,name=vnet2 \
-B type=switch,name=switch1 -B type=switch,name=switch2 \
-m endpoint=:vnet1,endpoint=switch1 \
-m endpoint=:vnet2,endpoint=switch2 \
-P task=quorum,state=INIT
Are these the options you want to use (yes/no) [yes]?
Do you want to continue with this configuration step (yes/no) [yes]?
Initializing cluster name to "cluster-ldm" ... done
Initializing authentication options ... done
Initializing configuration for adapter "vnet1" ... done
Initializing configuration for adapter "vnet2" ... done
Initializing configuration for switch "switch1" ... done
Initializing configuration for switch "switch2" ... done
Initializing configuration for cable ... done
Initializing configuration for cable ... done
Initializing private network address options ... done
Setting the node ID for "cluster1" ... done (id=1)
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ... done
Adding cluster node entries to /etc/inet/hosts ... done
Configuring IP multipathing groups ...done
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done
Network routing has been disabled on this node by creating /etc/notrouter.
Having a cluster node act as a router is not supported by Oracle Solaris Cluster.
Please do not re-enable network routing.
Log file - /var/cluster/logs/install/scinstall.log.2234
Rebooting ...
2. Settings on cluster2
scinstall
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
* 5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option: 1
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option: 3
*** Add a Node to an Existing Cluster ***
This option is used to add this machine as a node in an already
established cluster. If this is a new cluster, there may only be a
single node which has established itself in the new cluster.
Before you select this option, the Oracle Solaris Cluster framework
software must already be installed. Use the Oracle Solaris Cluster
installation media or the IPS packaging system to install Oracle
Solaris Cluster software.
Press Control-d at any time to return to the Main Menu.
Do you want to continue (yes/no) [yes]?
>>> Typical or Custom Mode <<<
This tool supports two modes of operation, Typical mode and Custom.
For most clusters, you can use Typical mode. However, you might need
to select the Custom mode option if not all of the Typical defaults
can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
Please select from one of the following options:
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]:
>>> Sponsoring Node <<<
For any machine to join a cluster, it must identify a node in that
cluster willing to "sponsor" its membership in the cluster. When
configuring a new cluster, this "sponsor" node is typically the first
node used to build the new cluster. However, if the cluster is already
established, the "sponsoring" node can be any node in that cluster.
Already established clusters can keep a list of hosts which are able
to configure themselves as new cluster members. This machine should be
in the join list of any cluster which it tries to join. If the list
does not include this machine, you may need to add it by using
claccess(1CL) or other tools.
And, if the target cluster uses DES to authenticate new machines
attempting to configure themselves as new cluster members, the
necessary encryption keys must be configured before any attempt to
join.
What is the name of the sponsoring node? cluster1
>>> Cluster Name <<<
Each cluster has a name assigned to it. When adding a node to the
cluster, you must identify the name of the cluster you are attempting
to join. A sanity check is performed to verify that the "sponsoring"
node is a member of that cluster.
What is the name of the cluster you want to join? cluster-ldm
Attempting to contact "cluster1" ... done
Cluster name "cluster-ldm" is correct.
Press Enter to continue:
>>> Check <<<
This step allows you to run cluster check to verify that certain basic
hardware and software pre-configuration requirements have been met. If
cluster check detects potential problems with configuring this machine
as a cluster node, a report of violated checks is prepared and
available for display on the screen.
Do you want to run cluster check (yes/no) [yes]?
Running cluster check ...
initializing...
initializing xml output...
loading auxiliary data...
filtering out checks not marked with one of keywords: installtime
starting check run...
cluster2: S6708605.... starting: The /dev/rmt directory is missing.
cluster2: S6708605 passed
cluster2: S6708606.... starting: Multiple network interfaces on a single subn...
cluster2: S6708606 not applicable
cluster2: S6708642.... starting: /proc fails to mount periodically during reb...
searching /var/adm/messages
searching /var/adm/messages.0
cluster2: S6708642 passed
cluster2: S6708638.... starting: Node has insufficient physical memory.
cluster2: S6708638 passed
cluster2: S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...
cluster2: S6708496 passed
finished check run
finishing xml output...
Maximum severity of all violations: No Violations
Reports in: /var/cluster/logs/install/cluster_check/
cleaning up...
Press Enter to continue:
>>> Autodiscovery of Cluster Transport <<<
If you are using Ethernet or Infiniband adapters as the cluster
transport adapters, autodiscovery is the best method for configuring
the cluster transport.
Do you want to use autodiscovery (yes/no) [yes]?
Probing .....
The following connection was discovered:
cluster1:vnet1 switch1 cluster2:vnet1
Probes were sent out from all transport adapters configured for
cluster node "cluster1". But, they were only received on less than 2
of the network adapters on this machine ("cluster2"). This may be due
to any number of reasons, including improper cabling, an improper
configuration for "cluster1", or a switch which was confused by the
probes.
You can either attempt to correct the problem and try the probes again
or manually configure the transport. To correct the problem might
involve re-cabling, changing the configuration for "cluster1", or
fixing hardware. You must configure the transport manually to
configure tagged VLAN adapters and non tagged VLAN adapters on the
same private interconnect VLAN.
Do you want to try again (yes/no) [yes]? no
>>> Cluster Transport Adapters and Cables <<<
Transport adapters are the adapters that attach to the private cluster
interconnect.
Select the first cluster transport adapter:
1) vnet1
2) vnet2
3) Other
Option: 1
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
Select the second cluster transport adapter:
1) vnet1
2) vnet2
3) Other
Option: 2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
>>> Automatic Reboot <<<
Once scinstall has successfully initialized the Oracle Solaris Cluster
software for this machine, the machine must be rebooted. The reboot
will cause this machine to join the cluster for the first time.
Do you want scinstall to reboot for you (yes/no) [yes]?
>>> Confirmation <<<
Your responses indicate the following options to scinstall:
scinstall -i \
-C cluster-ldm \
-N cluster1 \
-A trtype=dlpi,name=vnet1 -A trtype=dlpi,name=vnet2 \
-m endpoint=:vnet1,endpoint=switch1 \
-m endpoint=:vnet2,endpoint=switch2
Are these the options you want to use (yes/no) [yes]?
Do you want to continue with this configuration step (yes/no) [yes]?
Checking device to use for global devices file system ... done
Adding node "cluster2" to the cluster configuration ... done
Adding adapter "vnet1" to the cluster configuration ... done
Adding adapter "vnet2" to the cluster configuration ... done
Adding cable to the cluster configuration ... done
Adding cable to the cluster configuration ... done
Copying the config from "cluster1" ... done
Copying the postconfig file from "cluster1" if it exists ... done
Setting the node ID for "cluster2" ... done (id=2)
Verifying the major number for the "did" driver with "cluster1" ... done
Checking for global devices global file system ... done
Updating vfstab ... done
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ... done
Adding cluster node entries to /etc/inet/hosts ... done
Configuring IP multipathing groups ...done
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done
Network routing has been disabled on this node by creating /etc/notrouter.
Having a cluster node act as a router is not supported by Oracle Solaris Cluster.
Please do not re-enable network routing.
Updating file ("ntp.conf.cluster") on node cluster1 ... done
Updating file ("hosts") on node cluster1 ... done
Log file - /var/cluster/logs/install/scinstall.log.2111
Rebooting ...
3. Configuring the quorum device
Run scdidadm -L to list the DID device mappings and confirm that d3 is the disk shared by both nodes:
1 cluster1:/dev/rdsk/c0d0 /dev/did/rdsk/d1
2 cluster2:/dev/rdsk/c0d0 /dev/did/rdsk/d2
3 cluster1:/dev/rdsk/c0d1 /dev/did/rdsk/d3
3 cluster2:/dev/rdsk/c0d1 /dev/did/rdsk/d3
Configure the quorum device: scconf -a -q globaldev=d3
Once that succeeds, run scconf -c -q reset to clear installmode.
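The quorum configuration can then be verified from either node; a sketch using the commands shipped with this Solaris Cluster generation:

```shell
# Show quorum votes per node and for the shared device d3
scstat -q
# The object-oriented equivalent in Solaris Cluster 3.2 and later
clquorum status
```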
The cluster setup is now complete. For the applications on top of it, such as an Oracle HA data service, refer to the relevant documentation.