Recently I came across a number of documents online about building HA setups on Red Hat with Heartbeat or RHCM/RHCS. Most of them use GNBD, but they generally only give a brief description of the installation steps and say almost nothing about what GNBD is or what it does. Googling turned up little useful information either.
After working through some English documents, in particular the GFS and official Red Hat documentation, I finally got a rough picture of GNBD. GNBD stands for GFS Network Block Device, and it consists of two parts: the client side, which includes the gnbd.ko module and the gnbd_import, gnbd_recvd and gnbd_monitor programs, and the server side, which includes the gnbd_serv, gnbd_clusterd and gnbd_export programs. A GNBD server can export multiple block devices or GNBD files, and GNBD clients can import these exported block devices or files and then use them as local block devices. The biggest difference between GNBD and NBD (the network block device distributed with the Linux kernel) is that GNBD allows multiple clients to access the same block device or GNBD file on the server at the same time.
The GNBD documentation has been split into several pieces; the documents or man pages for gnbd_import, gnbd_export and fence_gnbd can each be found and read separately.
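As a rough sketch of how the two halves are used together (the device path, export name and server hostname below are made-up examples, not taken from any particular setup):

  # On the GNBD server: start the server daemon and export a block device
  gnbd_serv
  gnbd_export -v -e global_disk -d /dev/sda1

  # On a GNBD client: load the module and import the server's exports
  modprobe gnbd
  gnbd_import -v -i server1
  # The import should then appear as a local block device, e.g. /dev/gnbd/global_disk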
A brief introduction to GFS:
GFS is short for the Global File System. It is part of Red Hat's Linux Cluster and is a cluster file system. It allows the members of a cluster to use any block device on shared storage at the same time, for example shared storage provided over FC, iSCSI or NBD.
GNBD clients read and write these block devices just as they would local devices, but a lock module is also used to coordinate the I/O between them, so consistent reads and writes of the file system are maintained. GFS's greatest feature is its complete read/write consistency: data modified in the file system on any member is immediately visible on all other members of the cluster.
Appendix: Usage.txt:
Compilation, Installation and Notes
Get source
----------
cvs -d :pserver:cvs@sources.redhat.com:/cvs/dm checkout device-mapper
cvs -d :pserver:cvs@sources.redhat.com:/cvs/lvm2 checkout LVM2
cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout cluster
(Release tarballs of lvm2 and cluster are also available for download.)
Build and install
-----------------
cd cluster
./configure --kernel_src=/path/to/kernel
make; make install
Alternatively, the kernel components can be built by applying the kernel patches
from cman-kernel/patches, dlm-kernel/patches and gfs-kernel/patches to the
kernel source tree.
cd device-mapper
./configure
make; make install
cd LVM2
./configure --with-clvmd --with-cluster=shared
make; make install
LVM2/scripts/clvmd_fix_conf.sh /lib
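The script adjusts /etc/lvm/lvm.conf so that LVM2 uses the cluster locking library installed above. The result should look roughly like the following (a sketch only; the library file name and path can differ between versions):

  # /etc/lvm/lvm.conf (global section)
  locking_type = 2
  locking_library = "/lib/liblvm2clusterlock.so"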
Load kernel modules
-------------------
modprobe dm-mod
device-mapper/scripts/devmap_mknod.sh
modprobe gfs
modprobe lock_dlm
(Loading dm-mod as above is only needed if device-mapper was built as a module
rather than into the kernel.)
Startup procedure
-----------------
> ccsd - Starts the CCS daemon
> cman_tool join - Joins the cluster
> fence_tool join - Joins the fence domain (starts fenced)
> clvmd - Starts the CLVM daemon
> vgchange -aly - Activates LVM volumes (locally)
> mount -t gfs /dev/vg/lvol /mnt - Mounts a GFS file system
Shutdown procedure
------------------
> umount /mnt - Unmounts the GFS file system
> vgchange -aln - Deactivates LVM volumes (locally)
> killall clvmd - Stops the CLVM daemon
> fence_tool leave - Leaves the fence domain (stops fenced)
> cman_tool leave - Leaves the cluster
> killall ccsd - Stops the CCS daemon
Creating cluster.conf
---------------------
The cluster config file "cluster.conf" must be created manually.
Once created, cluster.conf should be placed in the /etc/cluster/ directory
on one cluster node. CCS daemon (ccsd) will take care of transferring it
to other nodes where it's needed.
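As a starting point, a minimal sketch of the file's shape (the cluster name, node name and fence device here are placeholders; see the fencing example later in this document for a more realistic fence setup):

  <?xml version="1.0"?>
  <cluster name="alpha" config_version="1">

  <cman>
  </cman>

  <clusternodes>
          <clusternode name="nd1" votes="1">
                  <fence>
                          <method name="single">
                                  <device name="human" nodename="nd1"/>
                          </method>
                  </fence>
          </clusternode>
  </clusternodes>

  <fencedevices>
          <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>

  </cluster>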
Creating CLVM logical volumes
-----------------------------
CLVM logical volumes are created with the standard LVM commands (pvcreate,
vgcreate, lvcreate.) A node must be running the CLVM system to use the LVM
commands. Running the CLVM system means successfully running the commands
above up through starting clvmd.
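For example, creating a volume group and logical volume with the standard commands (the device path and names are placeholders):

  pvcreate /dev/sda1
  vgcreate vg0 /dev/sda1
  lvcreate -L 10G -n lvol0 vg0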
Creating GFS file systems
-------------------------
GFS file systems are created with the gfs_mkfs command; to create one on a
CLVM logical volume, the CLVM system must be running (see previous section.)
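For example, to make a GFS file system with DLM locking (the cluster name, file system name, journal count and volume path are placeholders; one journal is needed per node that will mount the file system):

  gfs_mkfs -p lock_dlm -t alpha:gfs1 -j 3 /dev/vg0/lvol0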
Cluster startup tips
--------------------
When the fence domain is first created (the first "fence_tool join", i.e. the
first fenced daemon being started), any nodes not yet in the cluster will be
fenced. By default there's a delay of 6 seconds in this case to allow any nodes
unnecessarily flagged for fencing to join the cluster and avoid being fenced.
This delay can be increased by setting post_join_delay in cluster.conf, or on
the command line of fence_tool (or fenced).
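A sketch of the cluster.conf fragment (the 30-second value is only an illustration):

  <fence_daemon post_join_delay="30"/>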
The simplest way to avoid this startup fencing is to make sure all nodes have
joined the cluster before any of them run "fence_tool join". This can be
difficult if the startup steps are not done manually, though, which makes the
delay mentioned above especially helpful. If the user is certain all nodes are
in a clean (non-hung) state when starting up the cluster, the -c option can be
used with fence_tool/fenced to bypass any startup fencing. See the fenced and
fence_tool man pages for more information.
Startup fencing is mainly a concern when the fencing agents being used are
NPS/power/reboot based. When SAN-based fencing agents are used, an unnecessary
fencing operation during startup usually isn't disruptive (the unfencing done
by fence_tool is helpful in this case.)
Cluster shutdown tips
---------------------
Use "cman_tool leave remove" rather than plain "cman_tool leave" on all nodes
when shutting down an entire cluster (or most of it), or when shutting down a
node for an extended period. This automatically reduces the number of votes
needed for quorum as each node leaves and prevents the loss of quorum, which
could keep the last nodes from cleanly completing shutdown. Be aware that
lowering the quorum requirement in this way introduces potential split-brain
risks.
If plain "cman_tool leave" is used instead, the cluster can become inquorate
after enough nodes have left the cluster. Once the cluster is inquorate,
remaining members that have not yet completed "fence_tool leave" in the steps
above will be stuck. Operations such as unmounting gfs or leaving the fence
domain will block while the cluster is inquorate. They can continue and
complete only when quorum is regained. One way out is to restart the cluster
software on some of the nodes that have left so that the cluster regains quorum
and the stuck nodes can complete their shutdown. Another option is to forcibly
reduce the number of expected votes for the cluster, which allows the cluster
to become quorate again (see "cman_tool expected" in the cman_tool man page).
Cluster information
-------------------
Cluster state can be inspected in the following files:
/proc/cluster/status
/proc/cluster/nodes
/proc/cluster/services
Cluster config file - cluster.conf
----------------------------------
Consider, as an example, a three-node cluster. For the first node, two fencing
methods are configured in a cascade: if the first fencing method fails (power
cycling with an APC Masterswitch), the second is tried (port disable on a
Brocade FC switch). In this example, the node has dual paths to the storage so
the port on both paths must be disabled (the same idea applies to nodes with
dual power supplies.)
The second node has only a single fencing method, so no cascade fencing is
possible.
Manual fencing is configured for the third node. If a node with manual fencing
fails, a human must take notice (a message appears in the system log) and run
fence_ack_manual after resetting the failed node. (The node that actually
carries out fencing operations is the node with the lowest ID in the fence
domain.)
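A sketch of the corresponding clusternodes and fencedevices sections (IP addresses, logins, passwords and port numbers are placeholders, and the exact agent parameters should be checked against the fence agent man pages):

  <clusternodes>
          <clusternode name="nd1" votes="1">
                  <fence>
                          <method name="power">
                                  <device name="apc" port="1"/>
                          </method>
                          <method name="san">
                                  <device name="brocade" port="1"/>
                                  <device name="brocade" port="2"/>
                          </method>
                  </fence>
          </clusternode>
          <clusternode name="nd2" votes="1">
                  <fence>
                          <method name="power">
                                  <device name="apc" port="2"/>
                          </method>
                  </fence>
          </clusternode>
          <clusternode name="nd3" votes="1">
                  <fence>
                          <method name="manual">
                                  <device name="human" nodename="nd3"/>
                          </method>
                  </fence>
          </clusternode>
  </clusternodes>

  <fencedevices>
          <fencedevice name="apc" agent="fence_apc" ipaddr="10.0.0.1" login="apc" passwd="apc"/>
          <fencedevice name="brocade" agent="fence_brocade" ipaddr="10.0.0.2" login="admin" passwd="password"/>
          <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>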
Updating cluster.conf
---------------------
2. on one node, update /etc/cluster/cluster.conf, incrementing config_version
3. on this same node run "killall -HUP ccsd"
4. verify that the new cluster.conf exists on all nodes
5. on this same node run "cman_tool version -r <config_version>"
6. check /proc/cluster/status to verify the new config version
Multiple clusters
-----------------
If more than one cluster is in use on the network, a node can specify which
cluster it should join by giving the cluster name with the -c option on the
cman_tool command line. This forces CCS to select a cluster.conf with the same
cluster name. The node then joins this cluster.
[Note: If the -c option is not used, ccsd will first check the local copy of
cluster.conf to extract the cluster name and will only grab a remote copy of
cluster.conf if it has the same cluster name and a greater version number. If
a local copy of cluster.conf does not exist, ccsd may grab a cluster.conf for
a different cluster than intended -- cman_tool would then report an error that
the node is not listed in the file.
So if no local copy of cluster.conf exists (and there are other clusters
running), or if you wish to join a different cluster with a different
cluster.conf from what exists locally, you must specify the -c option.]
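For example, to join a cluster named "alpha" (the name is illustrative):

  cman_tool join -c alpha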
Two node clusters
-----------------
Ordinarily, the loss of quorum when one of two nodes fails prevents the
remaining node from continuing (if both nodes have one vote.) Some special
configuration options can be set to allow the one remaining node to continue
operating if the other fails. To do this only two nodes, each with one vote,
can be defined in cluster.conf. The two_node and expected_votes values must
then be set to 1 in the cman config section as follows.
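For example (attribute syntax as used elsewhere in cluster.conf):

  <cman two_node="1" expected_votes="1">
  </cman>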
Advanced Network Configuration
------------------------------
* Multihome
Nodes can be configured to use two network interfaces for cluster
communication; if one interface or network fails, the cluster should be able to
continue running with the one remaining. A node's name in cluster.conf is
always associated with the IP address on one network interface; "nd1" in the
sketch below. To use a second interface, a second hostname must be associated
with the IP address on that interface; "nd1-e1" in the sketch below. The second
hostname is specified in an "altname" section.
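A sketch of how this might look for node "nd1" (the exact placement of the altname element should be checked against the cman documentation for your version):

  <clusternode name="nd1" votes="1">
          <altname name="nd1-e1"/>
          ...
  </clusternode>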
* Multicast
cman can be configured to use multicast instead of broadcast (broadcast is used
by default if no multicast parameters are given.) To configure multicast when
one network interface is used, add one line under the cman section of
cluster.conf and another under the node's clusternode section; the multicast
address must be given in both places, together with the network interface name
given for the node.
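A sketch (the multicast address and interface name are placeholders, and the element names should be checked against the cman documentation for your version):

  <cman>
          <multicast addr="239.192.0.1"/>
  </cman>

  <clusternode name="nd1" votes="1">
          <multicast addr="239.192.0.1" interface="eth0"/>
          ...
  </clusternode>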
* IPv6
IPv6 addresses can also be used for cluster communication, but mixing IPv4 and
IPv6 is not allowed.
