How to Startup an Oracle Exadata Machine

Posted on June 19, 2020 by Zed DBA Standard 3

There maybe times when you are required to fully shutdown an Oracle Exadata Machine, for example for maintenance.

For instructions on how to shutdown an Oracle Exadata Machine, please refer to my blog post:
How to Shutdown an Oracle Exadata Machine

Once shutdown, you will need to be able to re-start which this blog post will detail.

Below is the My Oracle Support note used to carry out the startup:
Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (Doc ID 1093890.1)

1. Pre-requisites

Ensure you have all the compute nodes and storage cells ILOM addresses and correct passwords. Otherwise you will not be able to remotely power back on and will require a physical power on using the power button on the front panels.

2. Power on first Compute Node

You can power on the first compute node via the ilom via ssh or WebILOM. I prefer the ssh method shown below:

[AnwarZ@v1proxy1 ~]$ ssh root@v1ex1dbadm01-ilom
Password:

Oracle(R) Integrated Lights Out Manager

Version 4.0.4.37 r130617

Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: v1ex1dbadm01-ilom

-> show /SYSTEM

/System
Targets:
Open_Problems (0)
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
BIOS
Log

Properties:
health = OK
health_details = -
open_problems_count = 0
type = Rack Mount
model = Exadata X5-2
qpart_id = XXXXXX
part_number = Exadata X5-2
serial_number = XXXXXXXXXX
component_model = ORACLE SERVER X5-2
component_part_number = XXXXXXX
component_serial_number = XXXXXXXXXX
system_identifier = Exadata Database Machine X5-2 XXXXXXXXXX
system_fw_version = 4.0.4.37
primary_operating_system = Not Available
primary_operating_system_detail = Comprehensive System monitoring is not available. Ensure the host is
running with the Hardware Management Pack. For details go to
http://www.oracle.com/goto/ilom-redirect/hmp-osa
host_primary_mac_address = xx:xx:xx:xx:xx:xx
ilom_address = x.x.x.x
ilom_mac_address = xx:xx:xx:xx:xx:xx
locator_indicator = Off
power_state = Off
actual_power_consumption = 22 watts
action = (Cannot show property)

Commands:
cd
reset
set
show
start
stop

-> start /SYSTEM
Are you sure you want to start /System (y/n)? y
Starting /System

-> show /SYSTEM

/System
Targets:
Open_Problems (0)
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
BIOS
Log

Properties:
health = OK
health_details = -
open_problems_count = 0
type = Rack Mount
model = Exadata X5-2
qpart_id = XXXXXX
part_number = Exadata X5-2
serial_number = XXXXXXXXXX
component_model = ORACLE SERVER X5-2
component_part_number = XXXXXXX
component_serial_number = XXXXXXXXXX
system_identifier = Exadata Database Machine X5-2 XXXXXXXXXX
system_fw_version = 4.0.4.37
primary_operating_system = Not Available
primary_operating_system_detail = Comprehensive System monitoring is not available. Ensure the host is
running with the Hardware Management Pack. For details go to
http://www.oracle.com/goto/ilom-redirect/hmp-osa
host_primary_mac_address = xx:xx:xx:xx:xx:xx
ilom_address = x.x.x.x
ilom_mac_address = xx:xx:xx:xx:xx:xx
locator_indicator = Off
power_state = On
actual_power_consumption = 220 watts
action = (Cannot show property)

Commands:
cd
reset
set
show
start
stop

-> exit
Connection to v1ex1dbadm01-ilom closed.
[AnwarZ@v1proxy1 ~]$

3. Power on all Storage Cells

login as: root
root@v1ex1dbadm01's password:
Last login: Wed Jun 10 09:21:41 IST 2020 from v1ex1dbadm01.v1.com on ssh
Last login: Wed Jun 10 17:31:31 2020 from x.x.x.x
[root@v1ex1dbadm01 ~]# uptime
17:31:37 up 1 min, 1 user, load average: 3.37, 1.22, 0.44
[root@v1ex1dbadm01 ~]# export HISTIGNORE='*'
[root@v1ex1dbadm01 ~]# for host in `cat /opt/oracle.SupportTools/onecommand/cell_group`; do
> echo ${host}: `ipmitool -I lanplus -H ${host}-ilom -U root -P XXXXXXXX chassis power on`
> done
v1ex1celadm01: Chassis Power Control: Up/On
v1ex1celadm02: Chassis Power Control: Up/On
v1ex1celadm03: Chassis Power Control: Up/On
[root@v1ex1dbadm01 ~]# export HISTIGNORE=''
[root@v1ex1dbadm01 ~]#

Please Note: the HISTIGNORE is used, so the password isn’t kept in history.

Check the storage cell services are up:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'hostname; uptime'
v1ex1celadm01: v1ex1celadm01.v1.com
v1ex1celadm01: 18:10:21 up 32 min, 0 users, load average: 1.86, 1.95, 2.03
v1ex1celadm02: v1ex1celadm02.v1.com
v1ex1celadm02: 18:10:21 up 32 min, 0 users, load average: 1.47, 1.82, 1.97
v1ex1celadm03: v1ex1celadm03.v1.com
v1ex1celadm03: 18:10:22 up 32 min, 0 users, load average: 1.51, 1.85, 2.01
[root@v1ex1dbadm01 ~]#

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e 'list cell detail'"
v1ex1celadm01: name: v1ex1celadm01
v1ex1celadm01: accessLevelPerm: remoteLoginEnabled
v1ex1celadm01: bbuStatus: normal
v1ex1celadm01: cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
v1ex1celadm01: cpuCount: 32/32
v1ex1celadm01: diagHistoryDays: 7
v1ex1celadm01: eighthRack: FALSE
v1ex1celadm01: fanCount: 8/8
v1ex1celadm01: fanStatus: normal
v1ex1celadm01: flashCacheMode: WriteBack
v1ex1celadm01: httpsAccess: ALL
v1ex1celadm01: id: XXXXXXXXXX
v1ex1celadm01: interconnectCount: 2
v1ex1celadm01: interconnect1: ib0
v1ex1celadm01: interconnect2: ib1
v1ex1celadm01: iormBoost: 0.0
v1ex1celadm01: ipaddress1: x.x.x.x/22
v1ex1celadm01: ipaddress2: x.x.x.x/22
v1ex1celadm01: kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
v1ex1celadm01: locatorLEDStatus: off
v1ex1celadm01: makeModel: Oracle Corporation ORACLE SERVER X5-2L High Capacity
v1ex1celadm01: memoryGB: 94
v1ex1celadm01: metricHistoryDays: 7
v1ex1celadm01: notificationMethod: snmp
v1ex1celadm01: notificationPolicy: critical,warning,clear
v1ex1celadm01: offloadGroupEvents:
v1ex1celadm01: powerCount: 2/2
v1ex1celadm01: powerStatus: normal
v1ex1celadm01: ramCacheMaxSize: 0
v1ex1celadm01: ramCacheMode: Auto
v1ex1celadm01: ramCacheSize: 0
v1ex1celadm01: releaseImageStatus: success
v1ex1celadm01: releaseVersion: 19.2.7.0.0.191012
v1ex1celadm01: rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
v1ex1celadm01: releaseTrackingBug: 30393131
v1ex1celadm01: rollbackVersion: 18.1.18.0.0.190709
v1ex1celadm01: snmpSubscriber: host=x.x.x.x,port=162,community=public,type=ASR,asrmPort=16161
v1ex1celadm01: host=x.x.x.x,port=161,community=V1
v1ex1celadm01: host=x.x.x.x,port=161,community=V1
v1ex1celadm01: status: online
v1ex1celadm01: temperatureReading: 22.0
v1ex1celadm01: temperatureStatus: normal
v1ex1celadm01: upTime: 0 days, 0:33
v1ex1celadm01: usbStatus: normal
v1ex1celadm01: cellsrvStatus: running
v1ex1celadm01: msStatus: running
v1ex1celadm01: rsStatus: running
v1ex1celadm02: name: v1ex1celadm02
...
v1ex1celadm02: status: online
v1ex1celadm02: temperatureReading: 22.0
v1ex1celadm02: temperatureStatus: normal
v1ex1celadm02: upTime: 0 days, 0:33
v1ex1celadm02: usbStatus: normal
v1ex1celadm02: cellsrvStatus: running
v1ex1celadm02: msStatus: running
v1ex1celadm02: rsStatus: running
v1ex1celadm03: name: v1ex1celadm03
...
v1ex1celadm03: status: online
v1ex1celadm03: temperatureReading: 22.0
v1ex1celadm03: temperatureStatus: normal
v1ex1celadm03: upTime: 0 days, 0:33
v1ex1celadm03: usbStatus: normal
v1ex1celadm03: cellsrvStatus: running
v1ex1celadm03: msStatus: running
v1ex1celadm03: rsStatus: running
[root@v1ex1dbadm01 ~]#

4. Power on remaining Compute Nodes

Power on remaining compute nodes via ipmitool:

[root@v1ex1dbadm01 ~]# export HISTIGNORE='*'
[root@v1ex1dbadm01 ~]# ipmitool -I lanplus -H v1ex1dbadm02-ilom -U root -P XXXXXXXX chassis power on
Chassis Power Control: Up/On
[root@v1ex1dbadm01 ~]# export HISTIGNORE=''
[root@v1ex1dbadm01 ~]#

If half or full rack, then the following can be used:

for host in `cat dbs_group_all_but_first`; do
echo ${host}: `ipmitool -H ${host}-ilom -U root -P XXXXXXXX chassis power on`
done

Check compute nodes are up:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root 'hostname; uptime'
v1ex1dbadm01: v1ex1dbadm01.v1.com
v1ex1dbadm01: 18:21:55 up 12 min, 1 user, load average: 0.22, 0.89, 1.54
v1ex1dbadm02: v1ex1dbadm02.v1.com
v1ex1dbadm02: 18:21:55 up 3 min, 0 users, load average: 2.44, 1.58, 0.66
[root@v1ex1dbadm01 ~]#

5. Re-enable clusterware autostart

Re-enable clusterware autostart via dcli:

[root@v1ex1dbadm01 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl enable crs
v1ex1dbadm01: CRS-4622: Oracle High Availability Services autostart is enabled.
v1ex1dbadm02: CRS-4622: Oracle High Availability Services autostart is enabled.
[root@v1ex1dbadm01 ~]#

6. Restart Grid Infrastructure on the cluster

Start clusterware on first compute node:

[root@v1ex1dbadm01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@v1ex1dbadm01 ~]#

Now logon the remaining compute nodes and restart clusterware:

login as: root
root@v1ex1dbadm02's password:
Last login: Wed Jun 10 18:26:56 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 10 18:30:56 2020 from x.x.x.x
[root@v1ex1dbadm02 ~]# . oraenv
ORACLE_SID = [root] ? +ASM2
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm02 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@v1ex1dbadm02 ~]#

Wait a few minutes and check clusterware is all up as shown below:

[root@v1ex1dbadm02 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATAC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
...
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.RECOC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.asm
ONLINE ONLINE v1ex1dbadm01 Started,STABLE
ONLINE ONLINE v1ex1dbadm02 Started,STABLE
...
ora.net1.network
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.ons
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
...
1 ONLINE ONLINE v1ex1dbadm02 Open,STABLE
ora.oc4j
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan1.vip
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.scan2.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan3.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
--------------------------------------------------------------------------------
[root@v1ex1dbadm02 ~]#

6. Restart OEM Agent

Optionally if you have an OEM agent (most likely), restart as follows:

[oracle@v1ex1dbadm01 ~]$ cd /u01/app/agent/agent_13.3.0.0.0/bin
[oracle@v1ex1dbadm01 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
[oracle@v1ex1dbadm01 bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Starting agent ............................ started.
[oracle@v1ex1dbadm01 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.3.0.0.0
OMS Version : 13.3.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/agent/agent_inst
Agent Log Directory : /u01/app/agent/agent_inst/sysman/log
Agent Binaries : /u01/app/agent/agent_13.3.0.0.0
Core JAR Location : /u01/app/agent/agent_13.3.0.0.0/jlib
Agent Process ID : 122257
Parent Process ID : 122120
Agent URL : https://v1ex1dbadm01.v1.com:3872/emd/main/
Local Agent URL in NAT : https://v1ex1dbadm01.v1.com:3872/emd/main/
Repository URL : https://v1oem.v1.com:4903/empbs/upload
Started at : 2020-06-17 15:40:59
Started by user : oracle
Operating System : Linux version 4.1.12-124.30.1.el7uek.x86_64 (amd64)
Number of Targets : 43
Last Reload : (none)
Last successful upload : 2020-06-17 15:41:23
Last attempted upload : 2020-06-17 15:41:23
Total Megabytes of XML files uploaded so far : 0.1
Number of XML files pending upload : 5
Size of XML files pending upload(MB) : 0.02
Available disk space on upload filesystem : 21.34%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2020-06-17 15:41:18
Last successful heartbeat to OMS : 2020-06-17 15:41:18
Next scheduled heartbeat to OMS : 2020-06-17 15:42:20

---------------------------------------------------------------
Agent is Running and Ready
[oracle@v1ex1dbadm01 bin]$

Now on any other compute nodes:

[oracle@v1ex1dbadm01 bin]$ ssh v1ex1dbadm02
Last login: Wed Jun 17 12:37:20 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 17 15:43:07 2020 from x.x.x.x
[oracle@v1ex1dbadm02 ~]$ cd /u01/app/agent/agent_13.3.0.0.0/bin
[oracle@v1ex1dbadm02 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
[oracle@v1ex1dbadm02 bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Starting agent .................................... started.
[oracle@v1ex1dbadm02 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.3.0.0.0
OMS Version : 13.3.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/agent/agent_inst
Agent Log Directory : /u01/app/agent/agent_inst/sysman/log
Agent Binaries : /u01/app/agent/agent_13.3.0.0.0
Core JAR Location : /u01/app/agent/agent_13.3.0.0.0/jlib
Agent Process ID : 189737
Parent Process ID : 189513
Agent URL : https://v1ex1dbadm02.v1.com:3872/emd/main/
Local Agent URL in NAT : https://v1ex1dbadm02.v1.com:3872/emd/main/
Repository URL : https://v1oem.v1.com:4903/empbs/upload
Started at : 2020-06-17 15:44:01
Started by user : oracle
Operating System : Linux version 4.1.12-124.30.1.el7uek.x86_64 (amd64)
Number of Targets : 37
Last Reload : (none)
Last successful upload : 2020-06-17 15:44:44
Last attempted upload : 2020-06-17 15:44:45
Total Megabytes of XML files uploaded so far : 0.17
Number of XML files pending upload : 1
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 28.51%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2020-06-17 15:44:28
Last successful heartbeat to OMS : 2020-06-17 15:44:28
Next scheduled heartbeat to OMS : 2020-06-17 15:45:28

---------------------------------------------------------------
Agent is Running and Ready
[oracle@v1ex1dbadm02 bin]$

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to Shutdown an Oracle Exadata Machine

Posted on June 19, 2020 by Zed DBA Standard 5

There maybe times when you are required to fully shutdown an Oracle Exadata Machine, for example for maintenance.

Below is the My Oracle Support note used to carry out the shutdown:
Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (Doc ID 1093890.1)

1. Pre-requisites

2. Disable clusterware autostart

First we need to stop clusterware restarting up on reboot. So logon to your first compute node and disable via dcli (more info on dcli can be found in this blog post) using your correct crs home:

login as: root
root@x.x.x.x's password:
Last login: Wed Jun 10 08:45:30 IST 2020 from x.x.x.x on pts/0
Last login: Wed Jun 10 09:07:17 2020 from x.x.x.x
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl config crs
v1ex1dbadm01: CRS-4622: Oracle High Availability Services autostart is enabled.
v1ex1dbadm02: CRS-4622: Oracle High Availability Services autostart is enabled.
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl disable crs
v1ex1dbadm01: CRS-4621: Oracle High Availability Services autostart is disabled.
v1ex1dbadm02: CRS-4621: Oracle High Availability Services autostart is disabled.
[root@v1ex1dbadm01 ~]#

3. Stop Grid Infrastructure on the cluster

Next we stop clusterware cluster-wide gracefully:

[root@v1ex1dbadm01 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl check crs
v1ex1dbadm01: CRS-4638: Oracle High Availability Services is online
v1ex1dbadm01: CRS-4537: Cluster Ready Services is online
v1ex1dbadm01: CRS-4529: Cluster Synchronization Services is online
v1ex1dbadm01: CRS-4533: Event Manager is online
v1ex1dbadm02: CRS-4638: Oracle High Availability Services is online
v1ex1dbadm02: CRS-4537: Cluster Ready Services is online
v1ex1dbadm02: CRS-4529: Cluster Synchronization Services is online
v1ex1dbadm02: CRS-4533: Event Manager is online
[root@v1ex1dbadm01 ~]#

[root@v1ex1dbadm01 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATAC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
...
ora.RECOC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.asm
ONLINE ONLINE v1ex1dbadm01 Started,STABLE
ONLINE ONLINE v1ex1dbadm02 Started,STABLE
...
ora.net1.network
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.ons
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
...
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan1.vip
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.scan2.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan3.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
--------------------------------------------------------------------------------
[root@v1ex1dbadm01 ~]#

[root@v1ex1dbadm01 ~]# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'v1ex1dbadm01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'v1ex1dbadm01'
...
CRS-2677: Stop of 'ora.ons' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.net1.network' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.net1.network' on 'v1ex1dbadm01' succeeded 
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'v1ex1dbadm01' has completed 
CRS-2677: Stop of 'ora.crsd' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.ctssd' on 'v1ex1dbadm02' 
CRS-2673: Attempting to stop 'ora.evmd' on 'v1ex1dbadm02' 
CRS-2673: Attempting to stop 'ora.storage' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.storage' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.asm' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.crsd' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.ctssd' on 'v1ex1dbadm01' 
CRS-2673: Attempting to stop 'ora.evmd' on 'v1ex1dbadm01' 
CRS-2673: Attempting to stop 'ora.storage' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.storage' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.asm' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.ctssd' on 'v1ex1dbadm02' succeeded 
CRS-2677: Stop of 'ora.evmd' on 'v1ex1dbadm02' succeeded 
CRS-2677: Stop of 'ora.evmd' on 'v1ex1dbadm01' succeeded 
CRS-2677: Stop of 'ora.ctssd' on 'v1ex1dbadm01' succeeded 
CRS-2677: Stop of 'ora.asm' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.cssd' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.cssd' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.diskmon' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.diskmon' on 'v1ex1dbadm02' succeeded
CRS-2677: Stop of 'ora.asm' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.cssd' on 'v1ex1dbadm01'
CRS-2677: Stop of 'ora.cssd' on 'v1ex1dbadm01' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'v1ex1dbadm01'
CRS-2677: Stop of 'ora.diskmon' on 'v1ex1dbadm01' succeeded
[root@v1ex1dbadm01 ~]#

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl stat res -t
v1ex1dbadm01: CRS-4535: Cannot communicate with Cluster Ready Services
v1ex1dbadm01: CRS-4000: Command Status failed, or completed with errors.
v1ex1dbadm02: CRS-4535: Cannot communicate with Cluster Ready Services
v1ex1dbadm02: CRS-4000: Command Status failed, or completed with errors.
[root@v1ex1dbadm01 ~]#

4. Power off Storage Cells

Now that clusterware is down including ASM, we can power down the storage cells by first shutting down the cell services:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e alter cell shutdown services all"
v1ex1celadm01:
v1ex1celadm01: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm01: The SHUTDOWN of services was successful.
v1ex1celadm02:
v1ex1celadm02: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm02: The SHUTDOWN of services was successful.
v1ex1celadm03:
v1ex1celadm03: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm03: The SHUTDOWN of services was successful.
[root@v1ex1dbadm01 ~]#

Now the storage cells are shutdown, we can power them off:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root poweroff

v1ex1celadm02:Connection to v1ex1celadm02 closed by remote host.

v1ex1celadm01:Connection to v1ex1celadm01 closed by remote host.

v1ex1celadm03:Connection to v1ex1celadm03 closed by remote host.

[root@v1ex1dbadm01 ~]#

5. Power off Compute Nodes

As we are on the first compute node, we can power this off as shown below:

[root@v1ex1dbadm01 ~]# poweroff

Now we power off the remaining compute node by logging on via ssh:

login as: root
root@x.x.x.x's password:
Last login: Wed Jun 10 08:45:25 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 10 09:03:41 2020 from x.x.x.x
[root@v1ex1dbadm02 ~]# poweroff

If you have a half or full rack and wish to power off all compute nodes, you can use:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group_all_but_first -l root poweroff

Then power off the first node:

[root@v1ex1dbadm01 ~]# poweroff

Carry out your maintenance and when completed, you can restart the Oracle Exadata Machine by follow my blog post:
How to Startup an Oracle Exadata Machine

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to obtain the serial numbers on an Oracle Exadata Machine

Posted on May 31, 2020 by Zed DBA Standard Reply

You may be required to obtain serial numbers from an Oracle Exadata Machine, for example to confirm correct hardware for part replacement like disk controller battery (X4-2 and older) or disk, etc.

Below is how to obtain serials for each component.

Exadata Machine

To obtain the serial number of the Exadata Machine itself:

[root@v1ex1dbadm01 ~]# ipmitool sunoem cli "show /SP system_identifier" | grep "system_identifier ="
system_identifier = Exadata Database Machine X5-2 AK00XXXXXX
[root@v1ex1dbadm01 ~]#

Compute Nodes

To obtain the serial number of the compute nodes via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root dmidecode -s system-serial-number
v1ex1dbadm01: 1514NMXXXX
v1ex1dbadm02: 1514NMXXXX
[root@v1ex1dbadm01 ~]#

Storage Cells

To obtain the serial number of the storage cells via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root dmidecode -s system-serial-number
v1ex1celadm01: 1515NMXXXX
v1ex1celadm02: 1515NMXXXX
v1ex1celadm03: 1515NMXXXX
[root@v1ex1dbadm01 ~]#

InfiniBands

To obtain the serial number of the InfiniBands via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/ib_group -l root showfruinfo | grep -a Sun_Serial_Number
v1ex1sw-iba01: Sun_Serial_Number : XXXXXXT+1512RRXXXX
v1ex1sw-iba01: Sun_Serial_Number : AK0029XXXX
v1ex1sw-ibb01: Sun_Serial_Number : XXXXXXT+1512RRXXXX
v1ex1sw-ibb01: Sun_Serial_Number : AK0029XXXX
[root@v1ex1dbadm01 ~]#

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Exadata OL7 session disconnects after 10 minutes

Posted on May 21, 2020 by Zed DBA Standard 2

When upgrading to Exadata software 19c (release 19.1.0.0.0 and above) the compute nodes (database servers) upgrade to Oracle Linux 7. As part of this upgrade the sshd ServerAliveInterval settings are changed to a value of 600 for STIG (Security Technical Implementation Guide) purposes as detailed in the My Oracle Support Note below:

Changed sshd setting “Clientaliveinterval” after updating Exadata Database Nodes (domU, dom0 and physical) (Doc ID 2501968.1)

“When updating Exadata Database nodes (dom0, domu and physical) running either Oracle Linux 6 or Oracle Linux 7, “sshd Clientaliveinterval” settings are changed to a value of 600 for STIG purposes via unpublished bug 28204681.

This will result in your ssh connection being closed after being idle for 600 seconds while before this would not happen before 86400 seconds passed. While for the same security reasons, it’s not recommended to undo this change, it will be the choice of the operator and he/she is free to do so.”

This means your connections to the Exadata Machines disconnect after 10 minutes of inactivity 😦 :

[AnwarZ@v1proxy1 ~]$ date;ssh oracle@v1ex1dbadm01;date
Thu May 21 15:27:31 IST 2020
oracle@v1ex1dbadm01's password:
Last login: Thu May 21 15:27:31 IST 2020 from x.x.x.x on pts/0
Last login: Thu May 21 15:27:40 2020 from x.x.x.x
[oracle@v1ex1dbadm01 ~]$ Connection to x.x.x.x closed by remote host.
Connection to x.x.x.x closed.
[AnwarZ@v1proxy1 ~]$date
Thu May 21 15:37:40 IST 2020
[AnwarZ@v1proxy1 ~]$

As per the MOS note, the recommendation is to not change ClientAliveInterval on the compute nodes but to use the flags options ServerAliveInterval and ServerAliveCountMax on the ssh connection as shown below:

[AnwarZ@v1proxy1 ~]$ date;ssh -o ServerAliveInterval=550 -o ServerAliveCountMax=157 oracle@v1ex1dbadm01;date
Thu May 21 15:41:29 IST 2020
oracle@v1ex1dbadm01's password:
Last login: Thu May 21 15:27:40 IST 2020 from x.x.x.x on pts/0
Last login: Thu May 21 15:41:37 2020 from x.x.x.x
[oracle@v1ex1dbadm01 ~]$ date
Thu May 21 15:55:10 IST 2020
[oracle@v1ex1dbadm01 ~]$

This session didn’t disconnect and a manual ‘date‘ show it’s greater then 10 minutes 🙂

This is because the ServerAliveInterval=550 ensure that a null packet is sent every 550 seconds from the client side, this ensures the server will not disconnect the session as this is less then the ClientAliveInterval=600 on the compute nodes. The ServerAliveCountMax is multiplied with the ServerAliveInterval value to determine the maximum amount of time the session can be idle before disconnecting the session back in line with the previous standard of 86400.

Alternatively if you are using program like putty you can set in the settings to the same affect:

putty-keep-alive

It also appears from the MOS note, that this can affect OL6 on higher Exadata releases when the STIG recommendations were implemented. In which case same workaround can be used.

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Prevent crontab jobs overlapping using flock

Posted on May 31, 2019 by Zed DBA Standard 2

As an Oracle DBA, you may find yourself in the situation where you have crontab jobs overlapping.

For example an RMAN backup takes longer then normal, then overlaps with another RMAN database backup leading to more resources being consumed:

00 00 * * * /home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1
00 01 * * * /home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1

If the PROD1 backup takes longer then 1 hour, then it will contend with the PROD2 backup when it starts.

Another more recent example for myself, is when I was copying archive logs to AWS for an Oracle Database Standard Edition migration using a manual standby, thus needed to manually transfer archive logs using rsync every 15 minutes:

0,15,30,45 * * * * /home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log

Ran fine most the time but when there was significant churn in the database, the crontab job would overlap causing several rsync 😦

The easiest solution is to wrap the crontab job in flock 🙂

Using flock

flock is a linux utility that can uses a lock file to determine if the process is already running. The syntax I use is:

flock -x <lockfile> -c '<command>'

The “-x” is to obtain exclude lock and the “-c” is the command to run.

Flock examples

Backup example

00 00 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1'
00 01 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1'

Now when the backup for PROD2 starts flock will check for the lock and will see if exist and will not run the command until the backup for PROD1 is completed 🙂

Archive log copy example

0,15,30,45 * * * * flock -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now when the job runs, an exclusive lock is taken an hence when it runs again in 15 minutes if there an existing run, then it will not run the command until the previous one is completed 🙂 This will essentially queue the copies instead of them overlapping causing several rsync, which just exacerbate the issue.

Advance use of flock

Timeout

You can add “-w <seconds>“, which is the amount of time to wait for exclusive lock before exiting without running command. for example:

0,15,30,45 * * * * flock -w 300 -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now flock will wait 5 minutes for the previous archive log copy job to complete before exiting without running the command for that run 🙂

Viewing lock

If you want to see the lock taken by flock, you can run :

[oracle@dc1sbxdb001 ~]$ fuser -v /home/oracle/copy_Arch_to_AWS.lock
                                    USER   PID    ACCESS COMMAND
/home/oracle/copy_Arch_to_AWS.lock: oracle 341039 f....  flock
                                    oracle 341040 f....  rsync

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Zed DBA's Oracle Blog

The Oracle DBA Specialist – OCM, OCP, OCE, OCS, ACE

Category Archives: Linux

How to Startup an Oracle Exadata Machine

1. Pre-requisites

2. Power on first Compute Node

3. Power on all Storage Cells

4. Power on remaining Compute Nodes

5. Re-enable clusterware autostart

6. Restart Grid Infrastructure on the cluster

6. Restart OEM Agent

How to Shutdown an Oracle Exadata Machine

1. Pre-requisites

2. Disable clusterware autostart

3. Stop Grid Infrastructure on the cluster

4. Power off Storage Cells

5. Power off Compute Nodes

How to obtain the serial numbers on an Oracle Exadata Machine

Exadata Machine

Compute Nodes

Storage Cells

InfiniBands

Exadata OL7 session disconnects after 10 minutes

Prevent crontab jobs overlapping using flock

Using flock

Flock examples

Backup example

Archive log copy example

Advance use of flock

Timeout

Viewing lock