Saturday, May 14, 2011

Red Hat Cluster Configuration

Create a yum repository file that points at the RHEL 5.6 installation media (automounted under /misc/cd):

cat > /etc/yum.repos.d/cluster.repo << 'EOF'
[Server]
name=Server
baseurl=file:///misc/cd/Server
enabled=1
gpgcheck=0

[Cluster]
name=Cluster
baseurl=file:///misc/cd/Cluster
enabled=1
gpgcheck=0

[ClusterStorage]
name=ClusterStorage
baseurl=file:///misc/cd/ClusterStorage
enabled=1
gpgcheck=0
EOF

Insert the RHEL 5.6 x86_64 media into your CD/DVD reader and run the following command to update the yum database:

yum update

If yum can't use the new repositories, check that the autofs service is up and running (or start it) with the following command:

service autofs restart
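Before installing anything, a quick sanity check (not part of the original steps) that the media is automounted where the repo file expects it and that yum can see the three repositories:

# accessing the paths triggers the automount
ls /misc/cd/Server /misc/cd/Cluster /misc/cd/ClusterStorage

# the Server, Cluster and ClusterStorage repositories should be listed
yum repolist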

At this point you can install all the packages needed to create and administer a cluster:

yum groupinstall -y "Cluster Storage" "Clustering"

If you have to use the iSCSI initiator (in this How-To I'll use it), you also have to install the following packages:

yum install -y iscsi-initiator-utils isns-utils

And configure the services to start at boot, then start them:

chkconfig iscsi on
chkconfig iscsid on

service iscsi start
service iscsid start
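Optionally, note this host's initiator name (IQN); some SAN targets restrict access per initiator, so it may need to be allowed on the SAN side. This is only a check, not a required step:

# the initiator IQN the SAN will see for this node
cat /etc/iscsi/initiatorname.iscsi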

In this How-To I'll use three systems, with the following IP addresses.
The two "rhel-cluster-nodeX" systems have two NICs, one for production and one for the high-availability check.

rhel-cluster-node1 (#node1)
10.1.1.10 db1dev.intersil.com db1dev
192.168.1.40 db1dev.san.local

rhel-cluster-node2 (#node2)
10.1.1.20 db2dev.intersil.com db2dev
192.168.1.42 db2dev.san.local

rhel-cluster-san
192.168.1.30

What I'm going to do is create a cluster with the 10.1.1.100 IP address, which serves the clustered service from the 10.1.1.10 and 10.1.1.20 machines, and use a GFS file system reachable via iSCSI from the SAN (192.168.1.30).

Discover the iSCSI targets on the SAN and log in to them:

iscsiadm -m discovery -t st -p 192.168.1.30

iscsiadm -m node -L all
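After the login, a quick check that the sessions are established and that the new LUNs have appeared as extra SCSI disks (device names depend on your setup; here they show up as /dev/sdb and /dev/sdc):

# list active iSCSI sessions
iscsiadm -m session

# the new LUNs should now appear as additional disks
fdisk -l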


For convenience, add the following lines to /etc/hosts on both cluster nodes:

#node1
10.1.1.10 db1dev.intersil.com db1dev
192.168.1.40 db1dev.san.local

#node2
10.1.1.20 db2dev.intersil.com db2dev
192.168.1.42 db2dev.san.local

#san
192.168.1.30 rhel-cluster-san
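Once the entries are in place on both nodes, a quick check (not in the original) that the SAN names resolve and the other node answers:

getent hosts db1dev.san.local db2dev.san.local

ping -c 1 db2dev.san.local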


Two LUNs have been presented from the SAN: one for the quorum disk and one for the data partition.

Make sure the mapped devices are the same on both nodes:

[root@db2dev ~]# fdisk -l

Disk /dev/sda: 11.1 GB, 11187257344 bytes

255 heads, 63 sectors/track, 1360 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 13 104391 83 Linux

/dev/sda2 14 1360 10819777+ 8e Linux LVM

Disk /dev/sdb: 1073 MB, 1073741824 bytes

34 heads, 61 sectors/track, 1011 cylinders

Units = cylinders of 2074 * 512 = 1061888 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 16.0 GB, 16039018496 bytes

64 heads, 32 sectors/track, 15296 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdc doesn't contain a valid partition table

[root@db1dev ~]# fdisk -l

Disk /dev/sda: 15.3 GB, 15384707072 bytes

255 heads, 63 sectors/track, 1870 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 13 104391 83 Linux

/dev/sda2 14 1870 14916352+ 8e Linux LVM

Disk /dev/sdb: 1073 MB, 1073741824 bytes

34 heads, 61 sectors/track, 1011 cylinders

Units = cylinders of 2074 * 512 = 1061888 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 16.0 GB, 16039018496 bytes

64 heads, 32 sectors/track, 15296 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Disk /dev/sdc doesn't contain a valid partition table

Now create the quorum disk

[root@db1dev ~]# mkqdisk -c /dev/sdb -l quorum

mkqdisk v0.6.0

Writing new quorum disk label 'quorum' to /dev/sdb.

WARNING: About to destroy all data on /dev/sdb; proceed [N/y] ? y

Initializing status block for node 1...

Initializing status block for node 2...

Initializing status block for node 3...

Initializing status block for node 4...

Initializing status block for node 5...

Initializing status block for node 6...

Initializing status block for node 7...

Initializing status block for node 8...

Initializing status block for node 9...

Initializing status block for node 10...

Initializing status block for node 11...

Initializing status block for node 12...

Initializing status block for node 13...

Initializing status block for node 14...

Initializing status block for node 15...

Initializing status block for node 16...
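To make sure the second node sees the same quorum disk, you can list the quorum-disk labels on each node (a check not shown in the original output):

# run on both nodes; both should report the 'quorum' label on the shared 1 GB LUN
mkqdisk -L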

Now proceed with creating a new Physical Volume, a new Volume Group and a new Logical Volume to use as shared storage for the cluster nodes, using the following commands:

pvcreate /dev/sdc

vgcreate vg0 /dev/sdc

lvcreate -L 10G -n lv0 vg0

You're done: you've created a new volume group "vg0" and a new logical volume "lv0". The "-L 10G" parameter is based on the size of my iSCSI shared storage, in this case a 16 GB LUN.
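A quick check of the new LVM objects (output will vary with your LUN sizes); on the other node, run a rescan so it also sees the new volume group:

pvs
vgs vg0
lvs vg0

# on the other node
vgscan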

At this point you are ready to create the clustered GFS file system on your device using the command below:

mkfs.gfs2 -p lock_dlm -t devcluster:sanvol1 -j 4 /dev/vg0/lv0

You're done: you've created a GFS2 file system with the "lock_dlm" locking protocol, for the cluster named "devcluster" and with the file-system name "sanvol1". With "-j 4" journals it can be used by a maximum of 4 nodes, and it lives on the /dev/vg0/lv0 device.
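A minimal sketch of mounting it, assuming a hypothetical mount point /mnt/gfs; note that the mount will only succeed on nodes that are members of the running "devcluster" cluster (cman up), since that is the cluster name embedded in the file system:

mkdir -p /mnt/gfs

mount -t gfs2 /dev/vg0/lv0 /mnt/gfs

# confirm the mount
mount | grep gfs2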

To administer Red Hat Clusters with Conga, run luci and ricci as follows :

On both systems, initialize the luci server using the luci_admin init command.

luci_admin init

This command creates the 'admin' user and sets its password. Follow the on-screen instructions and check for output like the following:

The admin password has been successfully set.
Generating SSL certificates…
The luci server has been successfully initialized

You must restart the luci server for the changes to take effect; run the following:

service luci restart

service ricci start

Configure automatic startup for ricci and luci on both systems:

chkconfig luci on
chkconfig ricci on

chkconfig qdiskd on

service qdiskd start   # on the first node


For correct cluster configuration and maintenance, you also have to start (and configure to start at boot) the following services:

chkconfig rgmanager on
chkconfig cman on
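Once the cluster has been created (next steps), a minimal sketch of bringing the stack up by hand and checking membership, quorum and services; cman should be started before rgmanager:

service cman start
service qdiskd start
service rgmanager start

# verify membership and quorum
cman_tool status

# verify nodes and services
clustat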

Now create the cluster. Open luci and work through the following steps:

1. Create the cluster.
2. Add the quorum disk.
3. Add a failover domain.
4. Add the shared fence devices: click "Add a Fence Device", then add the fence device on both nodes.
5. Add the resources: the IP resource and the GFS resource.
6. Add the services.
7. Add the volume for testing.
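With the service defined, a hedged sketch of testing failover from the command line; "dbsvc" is a hypothetical service name, substitute the one you created in luci:

# see where the service is currently running
clustat

# relocate it to the other node, then check again
clusvcadm -r dbsvc -m db2dev.intersil.com
clustat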

Install Heartbeat and build an HA (High Availability) cluster environment.

The environment of the two systems is shown below. Each has two NICs.

(1) db1dev.intersil.com[eth0:192.168.1.40] [eth1:10.1.1.10]

(2) db2dev.intersil.com[eth0:192.168.1.42] [eth1:10.1.1.20]

[1] Install HeartBeat first

yum -y install heartbeat

[root@db1dev ~]#

vi /etc/ha.d/authkeys

# define the authentication method (crc: no signing, fine on a trusted dedicated link)

auth 1

1 crc

[root@db1dev ~]#

chmod 600 /etc/ha.d/authkeys
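The authkeys file must be identical (and mode 600) on both nodes; a minimal sketch of copying it over, assuming root ssh access to db2dev:

# -p preserves the 600 permissions
scp -p /etc/ha.d/authkeys db2dev:/etc/ha.d/authkeys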

[2] Configuration for server (1)

[root@db1dev ~]#

vi /etc/ha.d/ha.cf

crm on

# debug log

debugfile /var/log/ha-debug

# log file

logfile /var/log/ha-log

# the way of output to syslog

logfacility local0

# keepalive

keepalive 2

# deadtime

deadtime 30

# deadping

deadping 40

# warntime

warntime 10

# initdead

initdead 60

# port

udpport 694

# interface and IP address of another Host

ucast eth1 10.1.1.20

# auto failback

auto_failback on

# node name (the name of "uname -n")

node db1dev.intersil.com

node db2dev.intersil.com

respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a default_ping_set

[3] Configuration for server (2). The only difference is the ucast section.

[root@db2dev ~]#

vi /etc/ha.d/ha.cf

crm on

debugfile /var/log/ha-debug

logfile /var/log/ha-log

logfacility local0

keepalive 2

deadtime 30

deadping 40

warntime 10

initdead 60

udpport 694

# interface and IP address of another Host

ucast eth1 10.1.1.10

auto_failback on

node db1dev.intersil.com

node db2dev.intersil.com

respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a default_ping_set

[4] Start Heartbeat on both servers

[root@db1dev ~]#

/etc/rc.d/init.d/heartbeat start

Starting High-Availability services:

[ OK ]

[root@db1dev ~]#

chkconfig heartbeat on
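The output above is from db1dev; repeat the same two commands on the second node:

[root@db2dev ~]#
/etc/rc.d/init.d/heartbeat start

[root@db2dev ~]#
chkconfig heartbeat on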

[5] Run crm_mon on both servers; if the following result is shown, Heartbeat is running normally. This completes the basic configuration of Heartbeat.

[root@db1dev ~]#

crm_mon -i 3

============

Last updated: Sun May 8 22:29:23 2011

Current DC: db2dev.intersil.com (9a3ff13e-d6a0-4c0d-b8fc-dc7580082d4c)

2 Nodes configured.

0 Resources configured.

============

Node: db2dev.intersil.com (9a3ff13e-d6a0-4c0d-b8fc-dc7580082d4c): online

Node: db1dev.intersil.com (44359bd0-1b7e-4127-8f5f-1d9c093ea26e): online