Sunday, June 5, 2011

Red Hat Cluster with Oracle Service Failover

Table of Contents

Install Red Hat Linux Operating System
Configure Network Bonding
Configuring Cluster
Setting up a High-Availability Cluster
Creating Your Cluster Using Conga
Setting up a Storage Cluster
Common Issues
Test the Cluster installation
Testing Fencing Mechanisms
Appendix A (dbstart and dbshut scripts)
Appendix B (sample cluster.conf)

Install Red Hat Linux Operating System

You will want to install the same OS version on both servers in your cluster. Please verify that the version of Oracle you wish to install is certified on that release of Red Hat.
See the Server Build installation guide for the required Linux packages, kernel settings and user requirements.

Configure Network Bonding

The next step is to configure the network interfaces. This setup bonds the dual public interface ports; because this is an HA system, you should plan for network redundancy.
To configure the bond0 device with the Ethernet interfaces eth0 and eth1 (and, similarly, the bond1 device), perform the following steps:
1. Add the following lines to /etc/modprobe.conf:
alias bond0 bonding
alias bond1 bonding
2. Create the channel bonding interface files ifcfg-bond0 and ifcfg-bond1, in the /etc/sysconfig/network-scripts/ directory:
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
 
DEVICE=bond0
IPADDR=132.158.201.177
NETMASK=255.255.252.0
GATEWAY=132.158.202.254
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100" 
 
 # cat /etc/sysconfig/network-scripts/ifcfg-bond1
 
DEVICE=bond1
IPADDR=132.158.201.187
 
NETMASK=255.255.252.0
GATEWAY=132.158.202.254
USERCTL=no
 
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100"
3. Configure the Ethernet slave interfaces in /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1:
ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
HWADDR=14:FE:B5:D8:7C:11
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
HWADDR=14:fe:b5:d8:7c:13
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
4. Restart the network service:
# service network restart
5. To check the bonding status, look at the following file:
# cat /proc/net/bonding/bond0
Note: Only mode 1 of Network Bonding is supported in Red Hat Cluster.
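The slave interfaces for bond1 are configured in the same way. As a sketch only (the actual NIC names depend on your hardware, so eth2 and eth3 here are assumptions), the files would look like this:
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no

# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no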

Configuring Cluster

This section discusses how to install and configure Red Hat Cluster Suite and Global File System on your Dell and Red Hat HA cluster system using Conga and CLI tools.
Conga is a configuration and management suite based on a server/agent model. You can access the management server, luci, using a standard web browser from anywhere on the network. Luci communicates with the client agent, ricci, on the nodes; it installs all required packages, synchronizes the cluster configuration file, and manages the storage cluster. Although other methods exist, such as system-config-cluster or writing the XML configuration file by hand, it is recommended that you use Conga to configure and manage your cluster.

This example assumes two nodes with RHEL 5.6 x86_64 installed, on which we want to create a cluster to provide high availability for Oracle services.
It also assumes that a Storage Area Network (SAN) accessible from both systems has free space on it.
First, install all needed packages on both systems.
To do this, create a cluster.repo file in /etc/yum.repos.d with the following commands:
touch /etc/yum.repos.d/cluster.repo
echo [Server] >> /etc/yum.repos.d/cluster.repo
echo name=Server >> /etc/yum.repos.d/cluster.repo
echo baseurl=file:///misc/cd/Server >> /etc/yum.repos.d/cluster.repo
echo enabled=1 >> /etc/yum.repos.d/cluster.repo
echo gpgcheck=0 >> /etc/yum.repos.d/cluster.repo
echo [Cluster] >> /etc/yum.repos.d/cluster.repo
echo name=Cluster >> /etc/yum.repos.d/cluster.repo
echo baseurl=file:///misc/cd/Cluster >> /etc/yum.repos.d/cluster.repo
echo enabled=1 >> /etc/yum.repos.d/cluster.repo
echo gpgcheck=0 >> /etc/yum.repos.d/cluster.repo
echo [ClusterStorage] >> /etc/yum.repos.d/cluster.repo
echo name=ClusterStorage >> /etc/yum.repos.d/cluster.repo
echo baseurl=file:///misc/cd/ClusterStorage >> /etc/yum.repos.d/cluster.repo
echo enabled=1 >> /etc/yum.repos.d/cluster.repo
echo gpgcheck=0 >> /etc/yum.repos.d/cluster.repo
Insert the RHEL 5.6 x86_64 media in your CD/DVD reader, and run the following command to update the yum database:
yum update
If yum cannot use the new repository, check that the autofs service is up and running (or start it) with the following command:
service autofs restart
At this point you can install all packages needed to create and administer a cluster:
yum groupinstall -y "Cluster Storage" "Clustering"
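As a quick sanity check, you can verify the key packages on both nodes. The package names below assume the standard RHEL 5 Clustering and Cluster Storage groups and may vary slightly by release:
rpm -q cman rgmanager ricci luci lvm2-cluster gfs2-utils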
The two "rhel-cluster-nodeX" systems have two network interfaces each: one for production traffic and one for the high-availability heartbeat.
# node1
132.158.201.177 PBOADQ1A.intersil.corp PBOADQ1A
132.158.201.187 node1.intersil.corp node1
# node2
132.158.201.179 PBOADQ1B.intersil.corp PBOADQ1B
132.158.201.188 node2.intersil.corp node2
# Virtual IP
132.158.201.181 PBOADQC1.intersil.corp PBOADQC1
The virtual IP address (132.158.201.181) is used for the clustered service, which can run on either PBOADQ1A (132.158.201.177) or PBOADQ1B (132.158.201.179).
Note: The hosts file entries should be the same on both nodes.

Setting up a High-Availability Cluster

  • Initialize the luci server and set the admin password on both nodes:
[root@PBOADQ1B ~]# luci_admin init
Initializing the luci server
Creating the 'admin' user
Enter password:
Confirm password:
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
You must restart the luci server for changes to take effect.
Run "service luci restart" to do so
  • Configure luci to start on boot:
[root@PBOADQ1B ~]# chkconfig luci on
  • Start the luci service:
[root@PBOADQ1B ~]# service luci start
Starting luci: Generating https SSL certificates...  done   [OK]
 
Point your web browser to https://pboadq1b.intersil.corp:8084 to access luci

Creating Your Cluster Using Conga

Conga automatically installs the software required for clustering on all cluster nodes.
Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:
https://pboadq1b.intersil.corp:8084
1) Log in with the default username (admin) and the password (redhat) created earlier.

2) Click on Cluster

3) Click on Create a New Cluster
4) Enter the cluster name, node names and passwords, click View SSL cert fingerprints to check for errors, and submit.
Note: The node hostname is the heartbeat name defined in the hosts file (for example, node1.intersil.corp), and it should be on a separate interface, preferably eth1.
5) Conga builds the cluster with the two nodes.
6) The cluster is created with configuration version 1.
7) The next step is to create the failover domains. Click on Failover Domains.
8) Click on Add a Failover Domain: a) enter the name of the failover domain, b) click Prioritized, c) select both nodes as members.
9) The next step is to configure the fence device. We will use the Intelligent Platform Management Interface (IPMI) as the fence device.
Note: DRAC6 is not supported by Red Hat Cluster, so we will use the Intelligent Platform Management Interface (IPMI) as the fence device.
If a cluster node is configured to be fenced by an integrated fence device, disable ACPI Soft-Off for that node. Disabling ACPI Soft-Off allows an integrated fence device to turn off a node immediately and completely, rather than attempting a clean shutdown:
[root@PBOADQ1A ~]# chkconfig acpid off
[root@PBOADQ1B ~]# chkconfig acpid off
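chkconfig only affects the next boot; assuming acpid is currently running, you can also stop it right away:
[root@PBOADQ1A ~]# service acpid stop
[root@PBOADQ1B ~]# service acpid stop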
Prerequisites
· OpenIPMI
· OpenIPMI-tools
We need to configure the LAN device for network access. First, take a look at the configuration:
[root@PBOADQ1A ~]# ipmitool lan print 1
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5
: User : MD2 MD5
: Operator : MD2 MD5
: Admin : MD2 MD5
: OEM :
IP Address Source : Static Address
IP Address : 132.158.201.178
Subnet Mask : 255.255.252.0
MAC Address : 14:fe:b5:d8:75:7d
SNMP Community String : public
IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP : 132.158.202.254
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
Cipher Suite Priv Max : aaaaaaaaaaaaaaa
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
[root@PBOADQ1B ~]# ipmitool lan print 1
Set in Progress : Set Complete
Auth Type Support : NONE MD2 MD5 PASSWORD
Auth Type Enable : Callback : MD2 MD5
: User : MD2 MD5
: Operator: MD2 MD5
: Admin : MD2 MD5
: OEM :
IP Address Source : Static Address
IP Address : 132.158.201.180
Subnet Mask : 255.255.252.0
MAC Address : 14:fe:b5:d8:7c:19
SNMP Community String : public
IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP : 132.158.202.254
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
Cipher Suite Priv Max : aaaaaaaaaaaaaaa
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
Note: The IP address is already set, so we only need to enable the LAN channel on both nodes.
[root@PBOADQ1A ~]# ipmitool lan set 1 user
 
[root@PBOADQ1A ~]# ipmitool lan set 1 access on
 
[root@PBOADQ1B ~]# ipmitool lan set 1 user
 
[root@PBOADQ1B ~]# ipmitool lan set 1 access on
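If the BMC LAN settings were not already configured, they can be set from the OS as well. This is a sketch only; substitute your own addresses (the values below are the node A IPMI settings shown above):
[root@PBOADQ1A ~]# ipmitool lan set 1 ipsrc static
[root@PBOADQ1A ~]# ipmitool lan set 1 ipaddr 132.158.201.178
[root@PBOADQ1A ~]# ipmitool lan set 1 netmask 255.255.252.0
[root@PBOADQ1A ~]# ipmitool lan set 1 defgw ipaddr 132.158.202.254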
 

User Configuration

First, we need to list the users and get the user ID of the administrative user for IPMI:
[root@PBOADQ1A ~]# ipmitool user list
ID Name Callin Link Auth IPMI Msg Channel Priv Limit
2 root true true true ADMINISTRATOR
[root@PBOADQ1B ~]# ipmitool user list
ID Name Callin Link Auth IPMI Msg Channel Priv Limit
2 root true true true ADMINISTRATOR
If you don't already know the password for the 'root' IPMI user, you can set it from the command line as well (it should NOT be the same as the system root password). Note the 2 in the command line; it matches the administrator/root user ID from the listing above:
[root@PBOADQ1A ~]# ipmitool user set password 2
Password for user 2: calvin
Password for user 2: calvin
[root@PBOADQ1B ~]#ipmitool user set password 2
Password for user 2: calvin
Password for user 2: calvin
Preliminary Testing
The easiest thing to check is the power status. Note: you CANNOT issue IPMI-over-LAN commands to a machine from itself, so perform the following test from a different machine than the one you have just configured.
 
[root@PBOADQ1A ~]# ipmitool -H 132.158.201.180 -I lan -U root -P calvin chassis power status
Chassis Power is on
 
[root@PBOADQ1B ~]# ipmitool -H 132.158.201.178 -I lan -U root -P calvin chassis power status
Chassis Power is on
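You can also test the fence agent itself before configuring it in luci. A sketch, assuming the fence_ipmilan agent shipped with the cluster packages; as above, run it against the other node's IPMI address and it should report the chassis power status:
[root@PBOADQ1A ~]# fence_ipmilan -a 132.158.201.180 -l root -p calvin -o status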
 
(10) To configure the fence device from luci, click on node1.intersil.corp.
(11) Click on Add a fence device to this level. Choose IPMI Lan as the fence type and provide the IP address, username and password for node1.intersil.corp.
(12) Next, configure the fence device for node2.intersil.corp: click on node2.intersil.corp, click Add a fence device to this level, select IPMI Lan, and provide the IP address, username and password.

(13) Setting up a Storage Cluster

This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes. Verify that the high-availability Red Hat cluster is running before setting up the storage cluster.
Note: Verify that the SAN is visible from both nodes and that each mapped device has the same name on both nodes.
Configuring shared storage consists of the following steps:
a) Configuring a Clustered Logical Volume (CLVM).
b) Configuring Global File System (GFS).
A total of 10 LUNs has been allocated for the cluster: one LUN for the quorum disk and nine LUNs for data partitions, with mount points /d01, /d02, /d03, /d04, /d05, /d06, /d07, /d08, /d09.
The mount point sizes for Prod/Dev/QA are as follows:

Type             Prod (GB)   Dev (GB)   QA (GB)
Control + Redo        50          50        50
Archives             420         420       420
Tables               250         250       250
Indexes              250         250       250
Backup               500         500         -
Staging              350           -         -

a) Configuring a Clustered Logical Volume (CLVM).
[root@PBOADQ1B ~]# fdisk -l
Note: It should show the same partitions on both nodes.
Now create the quorum disk
[root@PBOADQ1B ~]# mkqdisk -c /dev/emcpowera -l quorum
mkqdisk v0.6.0
Writing new quorum disk label 'quorum' to /dev/emcpowera
WARNING: About to destroy all data on /dev/emcpowera; proceed [N/y]? Y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
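To confirm that the label is visible, you can list quorum disks on both nodes; the same label should show up on each:
[root@PBOADQ1A ~]# mkqdisk -L
[root@PBOADQ1B ~]# mkqdisk -L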
Now proceed to create a new physical volume, a new volume group and a new logical volume to use as shared storage for the cluster nodes, using the following commands:
[root@PBOADQ1B ~]# pvcreate /dev/emcpowere
[root@PBOADQ1B ~]# vgcreate vg1 /dev/emcpowere
[root@PBOADQ1B ~]# lvcreate -L 50G -n lv1 vg1
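If the new volume group does not show up as clustered on the second node, make sure clustered LVM locking is enabled and clvmd is running on both nodes. This is a sketch only; your build may have done this already:
# lvmconf --enable-cluster
# service clvmd restart
# chkconfig clvmd on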


Note: We have to follow the same process for the remaining LUNs; each one gets its own volume group.
The volume groups created are vg1, vg2, vg3, vg4, vg5, vg6, vg7, vg8.
Now create the directories /d01, /d02, /d03, /d04, /d05, /d06, /d07, /d08, /d09 on both nodes.
b) Configuring Global File System (GFS).
Create the clustered GFS file system on the device using the command below:
mkfs.gfs2 -p lock_dlm -t PBOADQC1:sanvol1 -j 4 /dev/vg1/lv1
This creates a GFS2 file system with the lock_dlm locking protocol for the cluster named "PBOADQC1" and with the file system name "sanvol1".
Note: When creating multiple clustered GFS file systems, each one needs a unique file system name: sanvol1 for the first, sanvol2 for the next, and so on.
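Before handing the file systems over to the cluster, you can mount one manually on both nodes to confirm that GFS2 and the DLM are working; unmount it again afterwards, since the cluster service will manage the mounts:
# mount -t gfs2 /dev/vg1/lv1 /d01
# umount /d01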
(14) Install Oracle software on both nodes and create an Oracle database
Install the Oracle binaries on both servers. Do not create a database at this time; we will create the database after the binaries are installed. Make sure that your Oracle home is set to a local, non-shared directory; it is not advisable to install the Oracle binaries on a shared partition.
To create your database, you will need the shared storage LUNs mounted on both nodes in the cluster. Choose the shared storage mount point as the location for the Oracle files.
Note: The tnsnames.ora and listener.ora files on each server should be configured to use the virtual IP address for client connections.
A sample TNS entry for Oracle is below. Note that the "HOST" entry points to the hostname associated with the virtual IP.
APD2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1529))
(ADDRESS = (PROTOCOL = TCP)(HOST = PBOADQC1.intersil.corp)(PORT = 1529))
)
)
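A matching listener.ora entry would also reference the virtual hostname. This is a sketch only; the listener name is an assumption, but the HOST and PORT values follow the TNS entry above:
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1529))
      (ADDRESS = (PROTOCOL = TCP)(HOST = PBOADQC1.intersil.corp)(PORT = 1529))
    )
  )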
(15) Click on Resources, then click Add a Resource.
(16) Choose GFS file system. Enter the name, mount point, device and file system type, and select the Force unmount option. Repeat this step for each mount point.
(17) Click on Resources, then Add a Resource. Choose IP Address, enter the virtual IP, and click Submit.
(18) Click on Resources, then Add a Resource. Choose Script and enter the name and the full path to the script file.
Note: The script file must be present on both nodes.
The script file will look like this:
#!/bin/bash
# Simple Oracle start/stop script; the cluster calls it with start, stop or status
export ORACLE_HOME=/home/oracle/product/11.2.0
export ORACLE_SID=APD2
cd /home/oracle
export ORACLE_TRACE='T'
case $1 in
start)
    echo "Starting Oracle: `date`"
    su - oracle -c "/home/oracle/product/11.2.0/bin/dbstart $ORACLE_HOME $ORACLE_SID"
    ;;
stop)
    echo "Stopping Oracle: `date`"
    su - oracle -c "/home/oracle/product/11.2.0/bin/dbshut $ORACLE_HOME $ORACLE_SID"
    ;;
status)
    # No real status check is performed here; the cluster will always see the service as healthy
    echo "DB must be running. I don't know how to check. Can you please check it manually? Don't mind!!"
    ;;
esac
For the customized dbstart and dbshut scripts, please refer to Appendix A.
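A sketch of putting the script in place. The path /etc/cluster/oracle.sh is an assumption; use whatever full path you enter in luci, and make sure the same file exists on both nodes:
# vi /etc/cluster/oracle.sh        (paste the script above)
# chmod 755 /etc/cluster/oracle.sh
# scp /etc/cluster/oracle.sh node2:/etc/cluster/oracle.sh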
(19) Click on Services, then click Add a Service.
(20) Enter the service name, select Automatically start this service, select the failover domain, and choose Relocate as the recovery policy.
(21) Click Add a resource to this service. The first resource should be the GFS file system.
(22) Click Add a child. The next resource should be the virtual IP.
(23) Click Add a child. The next resource is the script.
(24) The service composition looks like this: first the file system, then the virtual IP, and then the script.
These changes are reflected in the /etc/cluster/cluster.conf file on each of the servers in the cluster. A sample cluster.conf is included in Appendix B. It is just a sample file and should be used as a reference only.
Verification Checklist

Item                                               Verified
Cluster and Cluster Storage                        Yes
Red Hat Cluster Suite installed and configured     Yes
Nodes participating in cluster                     Yes
Fencing configured on nodes                        Yes
Clustered logical volume                           Yes
Global File System                                 Yes
Oracle installed on both nodes                     Yes
Resources created                                  Yes
Services created                                   Yes

Common Issues

Networking Issues

Red Hat Clustering nodes use Multicast to communicate. Your switches must be configured to enable multicast addresses and support IGMP.

Troubleshooting Conga

The following sections describe issues you may encounter while creating the cluster initially, and possible workarounds.

Running luci on a Cluster Node

If you are using a cluster node also as a management node and running luci, you have to restart luci manually after the initial configuration. For example:
[root@node1]# service luci restart
 

Debugging problems with luci

luci can be started in debug mode by changing the settings in the /var/lib/luci/etc/zope.conf file. Change the debug-mode value to on and restart luci on the management node. After setting debug mode, the debug messages are directed to the /var/log/messages file.

Issues While Creating a Cluster Initially

If the following error appears when initially installing the cluster:
The following errors occurred:
Unable to add the key for node node1.intersil.corp to the trusted keys list.
Unable to add the key for node node2.intersil.corp to the trusted keys list.
Unable to connect to node2.intersil.corp: Unable to establish an SSL connection to node2.ha.lab:11111:
Client Socket (hostname, port, timeout): connect () failed
 
Unable to connect to node1.intersil.corp: Unable to establish an SSL connection to node1.ha.lab:11111:
Client Socket (hostname, port, timeout): connect () failed
This error occurs when the luci server cannot communicate with the ricci agent. Verify that ricci is installed and started on each node.

Configuration File Issues

Configuration errors manifest themselves as the following error in /var/log/messages:

"AIS Executive exiting (-9)"
Check for syntax errors in your /etc/cluster/cluster.conf file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.

Logical Volume Issues

It may be necessary to restart the clustered logical volume manager with the command:
[root@node1]# service clvmd restart
Ensure all nodes have a consistent view of the shared storage by running partprobe or by clicking reprobe storage in Conga. As a last resort, reboot all nodes, or select restart cluster in Conga.

Test the Cluster installation

Network Setup Test
Ping
· Verify that each host in the cluster configuration can ping all other hosts in the cluster on the public network
SSH
· Verify that ssh is operational between all nodes in the cluster
Loss of One Public Network Cable Test
· Remove one network cable from the primary host
· Network bonding should fail over at the network level without affecting the cluster
Loss of Network Test
· Remove all network cables from the primary host
· The host should reboot

Testing Fencing Mechanisms

Fence each node to ensure that fencing is working properly.
1. Watch the logs from node 1 with the following command:
[root@node1]# tail -f /var/log/messages
2. Fence the node 2 by executing the following command:
[root@node1]# fence_node {fully qualified hostname or ip address of node2}
3. View the logs on node1 and the console of node2. Node 1 should successfully fence node 2.


4. Continue to watch the messages file for status changes. You can also use the clustat tool to see the cluster's view of the nodes; the parameter -i 2 refreshes the output every two seconds:
[root@node1]# clustat -i 2
Unexpected System Reboot Test
· On the primary host, run "reboot -fn"
· Run clustat on the other node to verify the cluster status
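You can also relocate the Oracle service manually to confirm failover works end to end. A sketch, assuming the service was named oracle-svc when it was created in luci:
[root@node1]# clusvcadm -r oracle-svc -m node2.intersil.corp
[root@node1]# clustat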


Appendix – A

The customized dbstart and dbshut scripts are below:

dbstart

#!/bin/sh
#
# $Id: dbstart.sh 22-may-2008.05:05:45 arogers Exp $
# Copyright (c) 1991, 2008, Oracle. All rights reserved.
#

###################################
#
# usage: dbstart $ORACLE_HOME
#
# This script is used to start ORACLE from /etc/rc(.local).
# It should ONLY be executed as part of the system boot procedure.
#
# This script will start all databases listed in the oratab file
# whose third field is a "Y".  If the third field is set to "Y" and
# there is no ORACLE_SID for an entry (the first field is a *),
# then this script will ignore that entry.
#
# This script requires that ASM ORACLE_SID's start with a +, and
# that non-ASM instance ORACLE_SID's do not start with a +.
#
# If ASM instances are to be started with this script, it cannot
# be used inside an rc*.d directory, and should be invoked from
# rc.local only. Otherwise, the CSS service may not be available
# yet, and this script will block init from completing the boot
# cycle.
#
# If you want dbstart to auto-start a single-instance database that uses
# an ASM server that is auto-started by CRS (this is the default behavior
# for an ASM cluster), you must change the database's ORATAB entry to use
# a third field of "W" and the ASM's ORATAB entry to use a third field of "N".
# These values specify that dbstart auto-starts the database only after
# the ASM instance is up and running.
#
# Note:
# Use ORACLE_TRACE=T for tracing this script.
#
# The progress log for each instance bringup plus Error and Warning message[s]
# are logged in file $ORACLE_HOME/startup.log. The error messages related to
# instance bringup are also logged to syslog (system log module).
# The Listener log is located at $ORACLE_HOME_LISTNER/listener.log
#
# On all UNIX platforms except SOLARIS
# ORATAB=/etc/oratab
#
# To configure, update ORATAB with Instances that need to be started up
#    Entries are of the form:
#    $ORACLE_SID:$ORACLE_HOME::
#    An example entry:
#    main:/usr/lib/oracle/emagent_10g:Y
#
# Overall algorithm:
# 1) Bring up all ASM instances with 'Y' entry in status field in oratab entry
# 2) Bring up all Database instances with 'Y' entry in status field in
#    oratab entry
# 3) If there are Database instances with 'W' entry in status field
#    then
#      iterate over all ASM instances (irrespective of 'Y' or 'N') AND
#      wait for all of them to be started
#    fi
# 4) Bring up all Database instances with 'W' entry in status field in
#    oratab entry
#
#####################################

LOGMSG="logger -puser.alert -s "

trap 'exit' 1 2 3

# for script tracing
case $ORACLE_TRACE in
  T) set -x ;;
esac

# Set path if path not set (if called from /etc/rc)
SAVE_PATH=/bin:/usr/bin:/etc:${PATH} ; export PATH
SAVE_LLP=$LD_LIBRARY_PATH

# First argument is used to bring up Oracle Net Listener
ORACLE_HOME_LISTNER=$1
if [ ! $ORACLE_HOME_LISTNER ] ; then
  echo "ORACLE_HOME_LISTNER is not SET, unable to auto-start Oracle Net Listener"
  echo "Usage: $0 ORACLE_HOME"
else
  LOG=$ORACLE_HOME_LISTNER/listener.log

  # Set the ORACLE_HOME for the Oracle Net Listener, it gets reset to
  # a different ORACLE_HOME for each entry in the oratab.
  export ORACLE_HOME=$ORACLE_HOME_LISTNER

  # Start Oracle Net Listener
  if [ -x $ORACLE_HOME_LISTNER/bin/tnslsnr ] ; then
    echo "$0: Starting Oracle Net Listener" >> $LOG 2>&1
    $ORACLE_HOME_LISTNER/bin/lsnrctl start $2 >> $LOG 2>&1 &
    VER10LIST=`$ORACLE_HOME_LISTNER/bin/lsnrctl version | grep "LSNRCTL for " | cut -d' ' -f5 | cut -d'.' -f1`
    export VER10LIST
  else
    echo "Failed to auto-start Oracle Net Listener using $ORACLE_HOME_LISTNER/bin/tnslsnr"
  fi
fi

# Set this in accordance with the platform
ORATAB=/etc/oratab
if [ ! $ORATAB ] ; then
  echo "$ORATAB not found"
  exit 1;
fi

# Checks Version Mismatch between Listener and Database Instance.
# A version 10 listener is required for an Oracle Database 10g database.
# Previous versions of the listener are not supported for use with an Oracle
# Database 10g database. However, it is possible to use a version 10 listener
# with previous versions of the Oracle database.
checkversionmismatch() {
  if [ $VER10LIST ] ; then
    VER10INST=`sqlplus -V | grep "Release " | cut -d' ' -f3 | cut -d'.' -f1`
    if [ $VER10LIST -lt $VER10INST ] ; then
      $LOGMSG "Listener version $VER10LIST NOT supported with Database version $VER10INST"
      $LOGMSG "Restart Oracle Net Listener using an alternate ORACLE_HOME_LISTNER:"
      $LOGMSG "lsnrctl start"
    fi
  fi
}

# Starts a Database Instance
startinst() {
  # Called programs use same database ID
  export ORACLE_SID

  # Put $ORACLE_HOME/bin into PATH and export.
  PATH=$ORACLE_HOME/bin:${SAVE_PATH} ; export PATH
  # add for bug # 652997
  LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${SAVE_LLP} ; export LD_LIBRARY_PATH
  PFILE=${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora
  SPFILE=${ORACLE_HOME}/dbs/spfile${ORACLE_SID}.ora
  SPFILE1=${ORACLE_HOME}/dbs/spfile.ora

  echo ""
  echo "$0: Starting up database \"$ORACLE_SID\""
  date
  echo ""

  checkversionmismatch

  # See if it is a V6 or V7 database
  VERSION=undef
  if [ -f $ORACLE_HOME/bin/sqldba ] ; then
    SQLDBA=sqldba
    VERSION=`$ORACLE_HOME/bin/sqldba command=exit | awk '
      /SQL\*DBA: (Release|Version)/ {split($3, V, ".") ;
      print V[1]}'`
    case $VERSION in
      "6") ;;
      *) VERSION="internal" ;;
    esac
  else
    if [ -f $ORACLE_HOME/bin/svrmgrl ] ; then
      SQLDBA=svrmgrl
      VERSION="internal"
    else
      SQLDBA="sqlplus /nolog"
    fi
  fi

  STATUS=1
  if [ -f $ORACLE_HOME/dbs/sgadef${ORACLE_SID}.dbf ] ; then
    STATUS="-1"
  fi
  if [ -f $ORACLE_HOME/dbs/sgadef${ORACLE_SID}.ora ] ; then
    STATUS="-1"
  fi
  pmon=`ps -ef | grep -w "ora_pmon_$ORACLE_SID"  | grep -v grep`
  if [ "$pmon" != "" ] ; then
    STATUS="-1"
    $LOGMSG "Warning: ${INST} \"${ORACLE_SID}\" already started."
  fi

  if [ $STATUS -eq -1 ] ; then
    $LOGMSG "Warning: ${INST} \"${ORACLE_SID}\" possibly left running when system went down (system crash?)."
    $LOGMSG "Action: Notify Database Administrator."
    case $VERSION in
      "6")  sqldba "command=shutdown abort" ;;
      "internal")  $SQLDBA $args <connect internal
shutdown abort
EOF
        ;;
      *)  $SQLDBA $args <<EOF
connect / as sysdba
shutdown abort
quit
EOF
        ;;
    esac

    if [ $? -eq 0 ] ; then
      STATUS=1
    else
      $LOGMSG "Error: ${INST} \"${ORACLE_SID}\" NOT started."
    fi
  fi

  if [ $STATUS -eq 1 ] ; then
    if [ -e $SPFILE -o -e $SPFILE1 -o -e $PFILE ] ; then
      case $VERSION in
        "6")  sqldba command=startup ;;
        "internal")  $SQLDBA <connect internal
startup
EOF
          ;;
        *)  $SQLDBA <<EOF
connect / as sysdba
startup
quit
EOF
          ;;
      esac

      if [ $? -eq 0 ] ; then
        echo ""
        echo "$0: ${INST} \"${ORACLE_SID}\" warm started."
      else
        $LOGMSG ""
        $LOGMSG "Error: ${INST} \"${ORACLE_SID}\" NOT started."
      fi
    else
      $LOGMSG ""
      $LOGMSG "No init file found for ${INST} \"${ORACLE_SID}\"."
      $LOGMSG "Error: ${INST} \"${ORACLE_SID}\" NOT started."
    fi
  fi
}

# Starts an ASM Instance
startasminst() {
  # Called programs use same database ID
  export ORACLE_SID
  ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
  # Called scripts use same home directory
  export ORACLE_HOME

  # For ASM instances, we have a dependency on the CSS service.
  # Wait here for it to become available before instance startup.

  # Is the 10g install intact? Are all necessary binaries present?
  if [ ! -x $ORACLE_HOME/bin/crsctl ]; then
    $LOGMSG "$ORACLE_HOME/bin/crsctl not found when attempting to start"
    $LOGMSG "  ASM instance $ORACLE_SID."

  else
    COUNT=0
    $ORACLE_HOME/bin/crsctl check css
    RC=$?
    while [ "$RC" != "0" ];
      do
      COUNT=`expr $COUNT + 1`
      if [ $COUNT = 15 ] ; then
        # 15 tries with 20 sec interval => 5 minutes timeout
        $LOGMSG "Timed out waiting to start ASM instance $ORACLE_SID"
        $LOGMSG "  CSS service is NOT available."
        exit $COUNT
      fi
      $LOGMSG "Waiting for Oracle CSS service to be available before starting "
      $LOGMSG " ASM instance $ORACLE_SID. Wait $COUNT."
      sleep 20
      $ORACLE_HOME/bin/crsctl check css
      RC=$?
    done
  fi
  startinst
}

# Start of dbstartup script
#
# Loop for every entry in oratab file and and try to start
# that ORACLE.
#
# ASM instances need to be started before 'Database instances'
# ASM instance is identified with '+' prefix in ORACLE_SID
# Following loop brings up ASM instance[s]

cat $ORATAB | while read LINE
do
case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
    # same as NULL SID - ignore this entry
    ORACLE_SID=""
    continue
  fi
  # Proceed only if last field is 'Y'.
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "Y" ] ; then
    # If ASM instances
    if [ `echo $ORACLE_SID | cut -b 1` = '+' ]; then
      INST="ASM instance"
      ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
      # Called scripts use same home directory
      export ORACLE_HOME
      # file for logging script's output
      LOG=$ORACLE_HOME/startup.log
      touch $LOG
      chmod a+r $LOG
      echo "Processing $INST \"$ORACLE_SID\": log file $ORACLE_HOME/startup.log"
      startasminst >> $LOG 2>&1
    fi
  fi
  ;;
esac
done

# exit if there was any trouble bringing up ASM instance[s]
if [ "$?" != "0" ] ; then
  exit 2
fi

#
# Following loop brings up 'Database instances'
#
cat $ORATAB | while read LINE
do
case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
    # same as NULL SID - ignore this entry
    ORACLE_SID=""
    continue
  fi
  # Proceed only if last field is 'Y'.
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "Y" ] ; then
    # If non-ASM instances
    if [ `echo $ORACLE_SID | cut -b 1` != '+' ]; then
      INST="Database instance"
      ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
      # Called scripts use same home directory
      export ORACLE_HOME
      # file for logging script's output
      LOG=$ORACLE_HOME/startup.log
      touch $LOG
      chmod a+r $LOG
      echo "Processing $INST \"$ORACLE_SID\": log file $ORACLE_HOME/startup.log"
      startinst >> $LOG 2>&1
    fi
  fi
  ;;
esac
done

#
# Following loop brings up 'Database instances' that have wait state 'W'
#
cat $ORATAB | while read LINE
do
case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
    # same as NULL SID - ignore this entry
    ORACLE_SID=""
    continue
  fi
  # Proceed only if last field is 'W'.
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "W" ] ; then
    W_ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
    # DB instances with 'W' (wait state) have a dependency on ASM instances via CRS.
    # Wait here for 'all' ASM instances to become available.
    cat $ORATAB | while read LINE
    do
    case $LINE in
      \#*)                ;;        #comment-line in oratab
      *)
      ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
      if [ "$ORACLE_SID" = '*' ] ; then
        # same as NULL SID - ignore this entry
        ORACLE_SID=""
        continue
      fi
      if [ `echo $ORACLE_SID | cut -b 1` = '+' ]; then
        INST="ASM instance"
        ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
        if [ -x $ORACLE_HOME/bin/srvctl ] ; then
          COUNT=0
          NODE=`olsnodes -l`
          RNODE=`srvctl status asm -n $NODE | grep "$ORACLE_SID is running"`
          RC=$?
          while [ "$RC" != "0" ]; # wait until this comes up!
          do
            COUNT=$((COUNT+1))
            if [ $COUNT = 5 ] ; then
              # 5 tries with 60 sec interval => 5 minutes timeout
              $LOGMSG "Error: Timed out waiting on CRS to start ASM instance $ORACLE_SID"
              exit $COUNT
            fi
            $LOGMSG "Waiting for Oracle CRS service to start ASM instance $ORACLE_SID"
            $LOGMSG "Wait $COUNT."
            sleep 60
            RNODE=`srvctl status asm -n $NODE | grep "$ORACLE_SID is running"`
            RC=$?
          done
        else
          $LOGMSG "Error: \"${W_ORACLE_SID}\" has dependency on ASM instance \"${ORACLE_SID}\""
          $LOGMSG "Error: Need $ORACLE_HOME/bin/srvctl to check this dependency"
        fi
      fi     # asm instance
      ;;
    esac
    done # innner while
  fi
  ;;
esac
done # outer while

# by now all the ASM instances have come up and we can proceed to bring up
# DB instance with 'W' wait status

cat $ORATAB | while read LINE
do
case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
    # same as NULL SID - ignore this entry
    ORACLE_SID=""
    continue
  fi
  # Proceed only if last field is 'W'.
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "W" ] ; then
    INST="Database instance"
    if [ `echo $ORACLE_SID | cut -b 1` = '+' ]; then
      $LOGMSG "Error: ${INST} \"${ORACLE_SID}\" NOT started"
      $LOGMSG "Error: incorrect usage: 'W' not allowed for ASM instances"
      continue
    fi
    ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
    # Called scripts use same home directory
    export ORACLE_HOME
    # file for logging script's output
    LOG=$ORACLE_HOME/startup.log
    touch $LOG
    chmod a+r $LOG
    echo "Processing $INST \"$ORACLE_SID\": log file $ORACLE_HOME/startup.log"
    startinst >> $LOG 2>&1
  fi
  ;;
esac
done

dbshut

#!/bin/sh
#
# $Id: dbshut.sh 22-may-2008.05:19:31 arogers Exp $
# Copyright (c) 1991, 2008, Oracle. All rights reserved.
#

###################################
#
# usage: dbshut $ORACLE_HOME
#
# This script is used to shutdown ORACLE from /etc/rc(.local).
# It should ONLY be executed as part of the system boot procedure.
#
# This script will shutdown all databases listed in the oratab file
# whose third field is a "Y" or "W".  If the third field is set to "Y" and
# there is no ORACLE_SID for an entry (the first field is a *),
# then this script will ignore that entry.
#
# This script requires that ASM ORACLE_SID's start with a +, and
# that non-ASM instance ORACLE_SID's do not start with a +.
#
# Note:
# Use ORACLE_TRACE=T for tracing this script.
# Oracle Net Listener is also shutdown using this script.
#
# The progress log for each instance shutdown is logged in file
# $ORACLE_HOME/shutdown.log.
#
# On all UNIX platforms except SOLARIS
# ORATAB=/etc/oratab
#
# To configure, update ORATAB with Instances that need to be shutdown
#    Entries are of the form:
#    $ORACLE_SID:$ORACLE_HOME::
#    An example entry:
#    main:/usr/lib/oracle/emagent_10g:Y
#
#####################################

trap 'exit' 1 2 3
case $ORACLE_TRACE in
  T) set -x ;;
esac

# Set path if path not set (if called from /etc/rc)
SAVE_PATH=/bin:/usr/bin:/etc:${PATH} ; export PATH
SAVE_LLP=$LD_LIBRARY_PATH

# First argument is used to bring down Oracle Net Listener
ORACLE_HOME_LISTNER=$1
if [ ! $ORACLE_HOME_LISTNER ] ; then
  echo "$ORACLE_HOME_LISTNER is not SET, unable to auto-stop Oracle Net Listener"
  echo "ORACLE_HOME_LISTNER is not SET, unable to auto-stop Oracle Net Listener"
  echo "Usage: $0 ORACLE_HOME"
else
  LOG=$ORACLE_HOME_LISTNER/listener.log

  # Set the ORACLE_HOME for the Oracle Net Listener, it gets reset to
  # a different ORACLE_HOME for each entry in the oratab.
  export ORACLE_HOME=$ORACLE_HOME_LISTNER

  # Stop Oracle Net Listener
  if [ -f $ORACLE_HOME_LISTNER/bin/tnslsnr ] ; then
    echo "$0: Stoping Oracle Net Listener" >> $LOG 2>&1
    $ORACLE_HOME_LISTNER/bin/lsnrctl stop $2 >> $LOG 2>&1 &
  else
    echo "Failed to auto-stop Oracle Net Listener using $ORACLE_HOME_LISTNER/bin/tnslsnr"
  fi
fi

# Set this in accordance with the platform
ORATAB=/etc/oratab
if [ ! $ORATAB ] ; then
  echo "$ORATAB not found"
  exit 1;
fi

# Stops an instance
stopinst() {
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
    ORACLE_SID=""
  fi
# Called programs use same database ID
  export ORACLE_SID
  ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
# Called scripts use same home directory
  export ORACLE_HOME
# Put $ORACLE_HOME/bin into PATH and export.
  PATH=$ORACLE_HOME/bin:${SAVE_PATH} ; export PATH
# add for bug 652997
  LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${SAVE_LLP} ; export LD_LIBRARY_PATH
  PFILE=${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora

# See if it is a V6 or V7 database
  VERSION=undef
  if [ -f $ORACLE_HOME/bin/sqldba ] ; then
    SQLDBA=sqldba
    VERSION=`$ORACLE_HOME/bin/sqldba command=exit | awk '
      /SQL\*DBA: (Release|Version)/ {split($3, V, ".") ;
      print V[1]}'`
    case $VERSION in
      "6") ;;
      *) VERSION="internal" ;;
    esac
  else
    if [ -f $ORACLE_HOME/bin/svrmgrl ] ; then
      SQLDBA=svrmgrl
      VERSION="internal"
    else
      SQLDBA="sqlplus /nolog"
    fi
  fi

  case $VERSION in
    "6")  sqldba command=shutdown ;;
    "internal")  $SQLDBA <connect internal
shutdown immediate
EOF
     ;;
     *)  $SQLDBA <<EOF
connect / as sysdba
shutdown immediate
quit
EOF
     ;;
  esac

  if test $? -eq 0 ; then
    echo "${INST} \"${ORACLE_SID}\" shut down."
  else
    echo "${INST} \"${ORACLE_SID}\" not shut down."
  fi
}

#
# Loop for every entry in oratab file and and try to shut down
# that ORACLE
#
# Following loop shuts down 'Database Instance[s]' with 'Y' entry

cat $ORATAB | while read LINE
do
  case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
      # NULL SID - ignore
      ORACLE_SID=""
      continue
  fi
  # Proceed only if last field is 'Y' or 'W'
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "Y" ] ; then
    if [ `echo $ORACLE_SID | cut -b 1` != '+' ]; then
      INST="Database instance"
      ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
      LOG=$ORACLE_HOME/shutdown.log
      echo "Processing $INST \"$ORACLE_SID\": log file $LOG"
      stopinst >> $LOG 2>&1
    fi
  fi
  ;;
  esac
done

#
# Following loop shuts down 'Database Instance[s]' with 'W' entry
#
cat $ORATAB | while read LINE
do
  case $LINE in
  \#*)                ;;        #comment-line in oratab
  *)
  ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
  if [ "$ORACLE_SID" = '*' ] ; then
      # NULL SID - ignore
      ORACLE_SID=""
      continue
  fi
  # Proceed only if last field is 'Y' or 'W'
  if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "W" ] ; then
    if [ `echo $ORACLE_SID | cut -b 1` != '+' ]; then
      INST="Database instance"
      ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
      LOG=$ORACLE_HOME/shutdown.log
      echo "Processing $INST \"$ORACLE_SID\": log file $LOG"
      stopinst >> $LOG 2>&1
    fi
  fi
  ;;
  esac
done

#
# Following loop shuts down 'ASM Instance[s]'
#

cat $ORATAB | while read LINE
do
  case $LINE in
    \#*)                ;;        #comment-line in oratab
    *)
    ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`
    if [ "$ORACLE_SID" = '*' ] ; then
      # NULL SID - ignore
      ORACLE_SID=""
      continue
    fi
    # Proceed only if last field is 'Y'.
    # Entries whose last field is not Y or N are not DB entry, ignore them.
    if [ "`echo $LINE | awk -F: '{print $NF}' -`" = "Y" ] ; then
      if [ `echo $ORACLE_SID | cut -b 1` = '+' ]; then
        INST="ASM instance"
        ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`
        LOG=$ORACLE_HOME/shutdown.log
        echo "Processing $INST \"$ORACLE_SID\": log file $LOG"
        stopinst >> $LOG 2>&1
      fi
    fi
  ;;
  esac
done

Appendix – B

The sample /etc/cluster/cluster.conf file is attached below:
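The original attachment is not reproduced here, so the following is a minimal sketch assembled from the steps above. The script path (/etc/cluster/oracle.sh), the resource and service names, and the single /d01 file system shown are assumptions; a Conga-generated file will list all nine GFS2 resources, may use different attribute values, and would need quorum-disk settings added if the qdisk created earlier is used (the two_node setting below is the plain two-node default).

<?xml version="1.0"?>
<cluster name="PBOADQC1" config_version="2">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node1.intersil.corp" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.intersil.corp" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="132.158.201.178" login="root" passwd="calvin"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="132.158.201.180" login="root" passwd="calvin"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="oracle-fd" ordered="1" restricted="0">
        <failoverdomainnode name="node1.intersil.corp" priority="1"/>
        <failoverdomainnode name="node2.intersil.corp" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <clusterfs name="d01" device="/dev/vg1/lv1" mountpoint="/d01" fstype="gfs2" force_unmount="1"/>
      <ip address="132.158.201.181" monitor_link="1"/>
      <script name="oracle" file="/etc/cluster/oracle.sh"/>
    </resources>
    <service name="oracle-svc" autostart="1" domain="oracle-fd" recovery="relocate">
      <clusterfs ref="d01">
        <ip ref="132.158.201.181">
          <script ref="oracle"/>
        </ip>
      </clusterfs>
    </service>
  </rm>
</cluster>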
