Detailed steps for building a cluster on a single T-series server

1930 views, 0 comments | 2013-01-18 | zhangyudong1987
Category: System Operations

A Solaris Cluster high-availability setup normally requires at least two servers and one external storage array, and each server also needs dedicated heartbeat NICs. Only when the hardware meets these requirements is an HA configuration possible at all.


How, then, can a cluster be built on a single T-series server with no external storage? The answer is Oracle VM for SPARC virtualization. First, the server's hypervisor is used to carve out two guest domains. Second, the virtual switch (vsw) service provides the network interfaces (T-series machines actually have enough physical NICs; virtual ones are used here purely to exercise the virtualization stack). Most importantly, the quorum device is implemented with the virtual disk (vdsk) service, which shares a single internal disk with both guest domains as if it were shared external storage.

The test environment is a single T5140 with four 300 GB disks; format shows:

format

Searching for disks...done

AVAILABLE DISK SELECTIONS:

       0. c1t0d0

          /pci@400/pci@0/pci@8/scsi@0/sd@0,0

       1. c1t1d0

          /pci@400/pci@0/pci@8/scsi@0/sd@1,0

       2. c1t2d0

          /pci@400/pci@0/pci@8/scsi@0/sd@2,0

       3. c1t3d0

          /pci@400/pci@0/pci@8/scsi@0/sd@3,0

Specify disk (enter its number): ^D

The plan: c1t0d0 is the control domain's system disk, c1t1d0 the system disk for node cluster1, c1t2d0 the system disk for node cluster2, and c1t3d0 the quorum device shared by cluster1 and cluster2.

The node names are cluster1 and cluster2; the cluster name is cluster-ldm.

Virtual server partitioning steps

1.       Creating the control domain

A. Create the three default virtual services:

# ldm add-vds primary-vds primary    create the virtual disk service

# ldm add-vcc port-range=5000-5100 primary-vcc primary    create the virtual console service

# ldm add-vsw net-dev=nxge0 primary-vsw primary    create the virtual network switch service

Use ldm list-services primary to review and verify that the three default services were created.

B. Create the control domain

 ldm set-vcpu 4 primary    assign CPU resources to the control domain

 ldm set-memory 1g primary    assign memory to the control domain

 ldm set-mau 0 primary    release the crypto units (not used here)

 ldm add-config initial    save the configuration

svcadm enable svc:/ldoms/vntsd:default    start the virtual console service

 shutdown -y -g0 -i6    reboot the host; the control domain is now created
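Collected together, the control-domain steps above can be run as one script (a sketch; the service names and the nxge0 network device follow this article's examples and may differ on other machines):

```shell
#!/bin/sh
# Sketch of the control-domain setup described above; run as root in the
# primary domain. Service names and the nxge0 NIC match this article's setup.
ldm add-vds primary-vds primary                       # virtual disk service
ldm add-vcc port-range=5000-5100 primary-vcc primary  # virtual console service
ldm add-vsw net-dev=nxge0 primary-vsw primary         # virtual network switch
ldm list-services primary                             # verify all three services

ldm set-vcpu 4 primary          # limit the control domain to 4 vCPUs
ldm set-memory 1g primary       # and 1 GB of memory
ldm set-mau 0 primary           # release the crypto units
ldm add-config initial          # save this configuration to the SP
svcadm enable svc:/ldoms/vntsd:default   # start the virtual console daemon
shutdown -y -g0 -i6             # reboot to apply the delayed reconfiguration
```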

2.       Creating guest domain cluster1

ldm add-domain cluster1    create cluster1

 ldm add-vcpu 12 cluster1    add CPU resources

 ldm add-memory 2G cluster1    add memory

ldm add-vnet vnet1 primary-vsw cluster1    add a virtual network device

ldm add-vdsdev /dev/dsk/c1t1d0s2 vol1@primary-vds    add the OS system disk

 ldm add-vdisk bootdisk vol1@primary-vds cluster1

 ldm set-var auto-boot\?=false cluster1

 ldm set-var boot-device=bootdisk cluster1

 ldm bind-domain cluster1    bind the resources

3.       Creating guest domain cluster2

ldm add-domain cluster2

 ldm add-vcpu 12 cluster2

 ldm add-memory 2G cluster2

ldm add-vnet vnet2  primary-vsw cluster2

ldm add-vdsdev /dev/dsk/c1t2d0s2 vol2@primary-vds

ldm add-vdisk bootdisk vol2@primary-vds cluster2

ldm set-var auto-boot\?=false cluster2

ldm set-var boot-device=bootdisk cluster2

ldm bind-domain cluster2
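Before installing the operating systems, it is worth confirming that both guest domains are defined and bound. A quick check (the output below is illustrative, not captured from this system):

```shell
# List all domains; both guests should be in the "bound" state at this point.
ldm list
#   NAME      STATE   ...
#   primary   active
#   cluster1  bound
#   cluster2  bound

# Inspect one guest's resources (vcpu, memory, vnet, vdisk) in detail.
ldm list-bindings cluster1
```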

4.       Installing the operating system

ldm add-vdsdev /opt/sun/sol-10-u10-ga-sparc-dvd.iso cdrom-iso@primary-vds

ldm add-vdisk cdrom cdrom-iso@primary-vds cluster1

The ISO image can now be used to install the operating system in the cluster1 guest. When that installation finishes, reattach the ISO to cluster2 and install its operating system the same way.

When done, save the complete configuration to the SP:

ldm add-config final-config-two-clusternode
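To run the installer, start a guest and attach to its console through vntsd. Each console is assigned a TCP port from the 5000-5100 range configured earlier; the exact port appears in the CONS column of `ldm list`. A sketch for cluster1 (port 5000 here is an assumption; use whatever port `ldm list` reports):

```shell
ldm start cluster1       # power on the guest
ldm list                 # note cluster1's console port in the CONS column
telnet localhost 5000    # hypothetical port; use the value ldm list reports
# At the guest's OpenBoot "ok" prompt, boot from the virtual DVD image:
#   ok boot cdrom
```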

Implementing the quorum device

A quorum device must be an external storage device that both nodes can access at the same time (it can also be a device on a quorum server). On a single T-series machine with no external shared storage, the vdsk virtualization layer lets both guest domains access one internal disk simultaneously, DAS-style.

This test uses internal disk c1t3d0 as the quorum device, set up as follows:

ldm stop cluster1

ldm stop cluster2

ldm add-vdsdev /dev/dsk/c1t3d0s2 vol1-share@primary-vds

ldm add-vdsdev -f /dev/dsk/c1t3d0s2 vol2-share@primary-vds

ldm add-vdisk vdisk1-share vol1-share@primary-vds cluster1

ldm add-vdisk vdisk2-share vol2-share@primary-vds cluster2

This shares the same internal disk with both guest nodes.
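The -f flag on the second add-vdsdev is what permits exporting a backend that is already exported; both volumes point at the same slice. This can be confirmed from the control domain (a sketch):

```shell
# Both vol1-share and vol2-share should list /dev/dsk/c1t3d0s2 as their device.
ldm list-services primary | grep c1t3d0
```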

Implementing the heartbeat NICs

The heartbeat NICs can be virtual interfaces created through the vsw service:

ldm add-vnet vnet1-share1 primary-vsw cluster1

ldm add-vnet vnet1-share2 primary-vsw cluster1

ldm add-vnet vnet2-share1 primary-vsw cluster2

ldm add-vnet vnet2-share2 primary-vsw cluster2

Start both nodes:

ldm start cluster1

ldm start cluster2

Installing and configuring the cluster software

FTP the cluster software package to each node, unpack it, and install it with the installer script; then configure the cluster by hand. The interactive sessions on both nodes follow; note the values typed at the prompts:

1.       Configuring cluster1

scinstall

  *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node

        2) Configure a cluster to be JumpStarted from this install server

        3) Manage a dual-partition upgrade

        4) Upgrade this cluster node

      * 5) Print release information for this cluster node

      * ?) Help with menu options

      * q) Quit

    Option:  1

  *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster

        2) Create just the first node of a new cluster on this machine

        3) Add this machine as a node in an existing cluster

        ?) Help with menu options

        q) Return to the Main Menu

    Option:  2

  *** Establish Just the First Node of a New Cluster ***

    This option is used to establish a new cluster using this machine as

    the first node in that cluster.

    Before you select this option, the Oracle Solaris Cluster framework

    software must already be installed. Use the Oracle Solaris Cluster

    installation media or the IPS packaging system to install Oracle

    Solaris Cluster software.

    Press Control-d at any time to return to the Main Menu.

    Do you want to continue (yes/no) [yes]? 

  >>> Typical or Custom Mode <<<

    This tool supports two modes of operation, Typical mode and Custom.

    For most clusters, you can use Typical mode. However, you might need

    to select the Custom mode option if not all of the Typical defaults

    can be applied to your cluster.

    For more information about the differences between Typical and Custom

    modes, select the Help option from the menu.

    Please select from one of the following options:

        1) Typical

        2) Custom

        ?) Help

        q) Return to the Main Menu

    Option [1]: 

  >>> Cluster Name <<<

    Each cluster has a name assigned to it. The name can be made up of any

    characters other than whitespace. Each cluster name should be unique

    within the namespace of your enterprise.

    What is the name of the cluster you want to establish?  cluster-ldm

  >>> Check <<<

    This step allows you to run cluster check to verify that certain basic

    hardware and software pre-configuration requirements have been met. If

    cluster check detects potential problems with configuring this machine

    as a cluster node, a report of violated checks is prepared and

    available for display on the screen.

    Do you want to run cluster check (yes/no) [yes]? 

    Running cluster check ...

  initializing...

  initializing xml output...

  loading auxiliary data...

  filtering out checks not marked with one of keywords: installtime

  starting check run...

     cluster1:  S6708605.... starting:  The /dev/rmt directory is missing.          

     cluster1:  S6708605       passed

     cluster1:  S6708606.... starting:  Multiple network interfaces on a single subn...

     cluster1:   S6708606       not applicable

     cluster1:   S6708642.... starting:  /proc fails to mount periodically during reb...

        searching /var/adm/messages

        searching /var/adm/messages.0

     cluster1:   S6708642       passed

     cluster1:   S6708638.... starting:  Node has insufficient physical memory.     

     cluster1:   S6708638       passed

     cluster1:   S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...

     cluster1:   S6708496       passed

  finished check run

  finishing xml output...

  Maximum severity of all violations: No Violations

  Reports in: /var/cluster/logs/install/cluster_check/

  cleaning up...

   

Press Enter to continue: 

  >>> Cluster Nodes <<<

    This Oracle Solaris Cluster release supports a total of up to 16 nodes.

    Please list the names of the other nodes planned for the initial

    cluster configuration. List one node name per line. When finished,

    type Control-D:

    Node name (Control-D to finish):  cluster1

    Node name (Control-D to finish):  cluster2

    Node name (Control-D to finish):  ^D

    This is the complete list of nodes:

        cluster1

        cluster2

    Is it correct (yes/no) [yes]? 

  >>> Cluster Transport Adapters and Cables <<<

    Transport adapters are the adapters that attach to the private cluster

    interconnect.

    Select the first cluster transport adapter:

        1) vnet1

        2) vnet2

        3) Other

    Option:  1

    Will this be a dedicated cluster transport adapter (yes/no) [yes]? 

Searching for any unexpected network traffic on "vnet1" ... done

Unexpected network traffic was seen on "vnet1".

"vnet1" may be cabled to a public network.

Do you want to use "vnet1" anyway (yes/no) [no]?  yes

    Select the second cluster transport adapter:

        1) vnet1

        2) vnet2

        3) Other

    Option:  2

    Will this be a dedicated cluster transport adapter (yes/no) [yes]? 

    Searching for any unexpected network traffic on "vnet2" ... done

Unexpected network traffic was seen on "vnet2".

"vnet2" may be cabled to a public network.

Do you want to use "vnet2" anyway (yes/no) [no]?  yes

Plumbing network address 172.16.0.0 on adapter vnet1 >> NOT DUPLICATE ... done

Plumbing network address 172.16.0.0 on adapter vnet2 >> NOT DUPLICATE ... done

/globaldevices is not mounted.

Cannot use "/globaldevices".

Do you want to use a lofi device instead and continue the installation (yes/no) [yes]? 

 >>> Quorum Configuration <<<

    Every two-node cluster requires at least one quorum device. By

    default, scinstall selects and configures a shared disk quorum device

    for you.

    This screen allows you to disable the automatic selection and

    configuration of a quorum device.

    You have chosen to turn on the global fencing. If your shared storage

    devices do not support SCSI, such as Serial Advanced Technology

    Attachment (SATA) disks, or if your shared disks do not support

    SCSI-2, you must disable this feature.

    If you disable automatic quorum device selection now, or if you intend

    to use a quorum device that is not a shared disk, you must instead use

    clsetup(1M) to manually configure quorum once both nodes have joined

    the cluster for the first time.

    Do you want to disable automatic quorum device selection (yes/no) [no]? 

  >>> Automatic Reboot <<<

    Once scinstall has successfully initialized the Oracle Solaris Cluster

    software for this machine, the machine must be rebooted. After the

    reboot, this machine will be established as the first node in the new

    cluster.

    Do you want scinstall to reboot for you (yes/no) [yes]?  

  >>> Confirmation <<<

    Your responses indicate the following options to scinstall:

      scinstall -i \

           -C cluster-ldm \

           -F \

           -G lofi \

           -T node=cluster1,node=cluster2,authtype=sys \

           -w netaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=64,maxprivatenets=10,numvirtualclusters=12 \

           -A trtype=dlpi,name=vnet1 -A trtype=dlpi,name=vnet2 \

           -B type=switch,name=switch1 -B type=switch,name=switch2 \

           -m endpoint=:vnet1,endpoint=switch1 \

           -m endpoint=:vnet2,endpoint=switch2 \

           -P task=quorum,state=INIT

    Are these the options you want to use (yes/no) [yes]? 

    Do you want to continue with this configuration step (yes/no) [yes]? 

Initializing cluster name to "cluster-ldm" ... done

Initializing authentication options ... done

Initializing configuration for adapter "vnet1" ... done

Initializing configuration for adapter "vnet2" ... done

Initializing configuration for switch "switch1" ... done

Initializing configuration for switch "switch2" ... done

Initializing configuration for cable ... done

Initializing configuration for cable ... done

Initializing private network address options ... done

Setting the node ID for "cluster1" ... done (id=1)

Verifying that NTP is configured ... done

Initializing NTP configuration ... done

Updating nsswitch.conf ... done

Adding cluster node entries to /etc/inet/hosts ... done

Configuring IP multipathing groups ...done

Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done

Ensure network routing is disabled ... done

Network routing has been disabled on this node by creating /etc/notrouter.

Having a cluster node act as a router is not supported by Oracle Solaris Cluster.

Please do not re-enable network routing.

Log file - /var/cluster/logs/install/scinstall.log.2234

Rebooting ...

2.       Configuring cluster2

scinstall

  *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node

        2) Configure a cluster to be JumpStarted from this install server

        3) Manage a dual-partition upgrade

        4) Upgrade this cluster node

      * 5) Print release information for this cluster node

      * ?) Help with menu options

      * q) Quit

    Option:  1

  *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster

        2) Create just the first node of a new cluster on this machine

        3) Add this machine as a node in an existing cluster

        ?) Help with menu options

        q) Return to the Main Menu

    Option:  3

  *** Add a Node to an Existing Cluster ***

    This option is used to add this machine as a node in an already

    established cluster. If this is a new cluster, there may only be a

    single node which has established itself in the new cluster.

    Before you select this option, the Oracle Solaris Cluster framework

    software must already be installed. Use the Oracle Solaris Cluster

    installation media or the IPS packaging system to install Oracle

    Solaris Cluster software.

    Press Control-d at any time to return to the Main Menu.

    Do you want to continue (yes/no) [yes]? 

  >>> Typical or Custom Mode <<<

    This tool supports two modes of operation, Typical mode and Custom.

    For most clusters, you can use Typical mode. However, you might need

    to select the Custom mode option if not all of the Typical defaults

    can be applied to your cluster.

    For more information about the differences between Typical and Custom

    modes, select the Help option from the menu.

    Please select from one of the following options:

        1) Typical

        2) Custom

        ?) Help

        q) Return to the Main Menu

    Option [1]: 

  >>> Sponsoring Node <<<

    For any machine to join a cluster, it must identify a node in that

    cluster willing to "sponsor" its membership in the cluster. When

    configuring a new cluster, this "sponsor" node is typically the first

    node used to build the new cluster. However, if the cluster is already

    established, the "sponsoring" node can be any node in that cluster.

    Already established clusters can keep a list of hosts which are able

    to configure themselves as new cluster members. This machine should be

    in the join list of any cluster which it tries to join. If the list

    does not include this machine, you may need to add it by using

    claccess(1CL) or other tools.

    And, if the target cluster uses DES to authenticate new machines

    attempting to configure themselves as new cluster members, the

    necessary encryption keys must be configured before any attempt to

    join.

    What is the name of the sponsoring node?  cluster1

  >>> Cluster Name <<<

    Each cluster has a name assigned to it. When adding a node to the

    cluster, you must identify the name of the cluster you are attempting

    to join. A sanity check is performed to verify that the "sponsoring"

    node is a member of that cluster.

    What is the name of the cluster you want to join?  cluster-ldm

    Attempting to contact "cluster1" ... done

    Cluster name "cluster-ldm" is correct.

   

Press Enter to continue: 

 >>> Check <<<

    This step allows you to run cluster check to verify that certain basic

    hardware and software pre-configuration requirements have been met. If

    cluster check detects potential problems with configuring this machine

    as a cluster node, a report of violated checks is prepared and

    available for display on the screen.

    Do you want to run cluster check (yes/no) [yes]? 

    Running cluster check ...

  initializing...

  initializing xml output...

  loading auxiliary data...

  filtering out checks not marked with one of keywords: installtime

  starting check run...

     cluster2:   S6708605.... starting:  The /dev/rmt directory is missing.         

     cluster2:   S6708605       passed

     cluster2:  S6708606.... starting:  Multiple network interfaces on a single subn...

     cluster2:   S6708606       not applicable

     cluster2:   S6708642.... starting:  /proc fails to mount periodically during reb...

        searching /var/adm/messages

        searching /var/adm/messages.0

     cluster2:   S6708642       passed

     cluster2:   S6708638.... starting:  Node has insufficient physical memory.     

     cluster2:   S6708638       passed

     cluster2:   S6708496.... starting: Cluster node (3.1 or later) OpenBoot Prom (O...

     cluster2:   S6708496       passed

  finished check run

  finishing xml output...

  Maximum severity of all violations: No Violations

  Reports in: /var/cluster/logs/install/cluster_check/

  cleaning up...

   

Press Enter to continue: 

  >>> Autodiscovery of Cluster Transport <<<

    If you are using Ethernet or Infiniband adapters as the cluster

    transport adapters, autodiscovery is the best method for configuring

    the cluster transport.

    Do you want to use autodiscovery (yes/no) [yes]?  

    Probing .....

    The following connection was discovered:

        cluster1:vnet1  switch1  cluster2:vnet1

    Probes were sent out from all transport adapters configured for

    cluster node "cluster1". But, they were only received on less than 2

    of the network adapters on this machine ("cluster2"). This may be due

    to any number of reasons, including improper cabling, an improper

    configuration for "cluster1", or a switch which was confused by the

    probes.

    You can either attempt to correct the problem and try the probes again

    or manually configure the transport. To correct the problem might

    involve re-cabling, changing the configuration for "cluster1", or

    fixing hardware. You must configure the transport manually to

    configure tagged VLAN adapters and non tagged VLAN adapters on the

    same private interconnect VLAN.

    Do you want to try again (yes/no) [yes]?  no

  >>> Cluster Transport Adapters and Cables <<<

    Transport adapters are the adapters that attach to the private cluster

    interconnect.

    Select the first cluster transport adapter:

        1) vnet1

        2) vnet2

        3) Other

    Option:  1

    Will this be a dedicated cluster transport adapter (yes/no) [yes]? 

    Select the second cluster transport adapter:

        1) vnet1

        2) vnet2

        3) Other

    Option:  2

    Will this be a dedicated cluster transport adapter (yes/no) [yes]? 

  >>> Automatic Reboot <<<

    Once scinstall has successfully initialized the Oracle Solaris Cluster

    software for this machine, the machine must be rebooted. The reboot

    will cause this machine to join the cluster for the first time.

    Do you want scinstall to reboot for you (yes/no) [yes]? 

  >>> Confirmation <<<

    Your responses indicate the following options to scinstall:

      scinstall -i \

           -C cluster-ldm \

           -N cluster1 \

           -A trtype=dlpi,name=vnet1 -A trtype=dlpi,name=vnet2 \

           -m endpoint=:vnet1,endpoint=switch1 \

           -m endpoint=:vnet2,endpoint=switch2

    Are these the options you want to use (yes/no) [yes]? 

    Do you want to continue with this configuration step (yes/no) [yes]? 

Checking device to use for global devices file system ... done

Adding node "cluster2" to the cluster configuration ... done

Adding adapter "vnet1" to the cluster configuration ... done

Adding adapter "vnet2" to the cluster configuration ... done

Adding cable to the cluster configuration ... done

Adding cable to the cluster configuration ... done

Copying the config from "cluster1" ... done

Copying the postconfig file from "cluster1" if it exists ... done

Setting the node ID for "cluster2" ... done (id=2)

Verifying the major number for the "did" driver with "cluster1" ... done

Checking for global devices global file system ... done

Updating vfstab ... done

Verifying that NTP is configured ... done

Initializing NTP configuration ... done

Updating nsswitch.conf ... done

Adding cluster node entries to /etc/inet/hosts ... done

Configuring IP multipathing groups ...done

Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done

Ensure network routing is disabled ... done

Network routing has been disabled on this node by creating /etc/notrouter.

Having a cluster node act as a router is not supported by Oracle Solaris Cluster.

Please do not re-enable network routing.

Updating file ("ntp.conf.cluster") on node cluster1 ... done

Updating file ("hosts") on node cluster1 ... done

Log file - /var/cluster/logs/install/scinstall.log.2111

Rebooting ...

3.       Configuring the quorum device

Run scdidadm -L to list the DID devices and confirm that d3 maps to the shared disk on both nodes:

1        cluster1:/dev/rdsk/c0d0        /dev/did/rdsk/d1    

2        cluster2:/dev/rdsk/c0d0        /dev/did/rdsk/d2    

3        cluster1:/dev/rdsk/c0d1        /dev/did/rdsk/d3    

3        cluster2:/dev/rdsk/c0d1        /dev/did/rdsk/d3    

Configure the quorum device: scconf -a -q globaldev=d3

Once that succeeds, run scconf -c -q reset to clear installmode.
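After installmode is reset, the quorum configuration can be double-checked (a sketch; the expected result is one vote per node plus one for d3):

```shell
clquorum status   # newer command set: lists quorum devices and vote counts
scstat -q         # older command set: equivalent quorum vote summary
```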

The cluster setup is now complete. For applications on top of it, such as an Oracle database HA configuration, consult the relevant documentation.

 
