How to restart InfiniBand Subnet Manager Exadata

You may encounter a scenario where the subnet manager is down on an InfiniBand switch in your Exadata. Restarting it is very simple:

[root@v1ex1dbadm01 ~]# ssh v1ex1sw-ibb01
You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented
config files.
To view the list of documented commands, use "help" at linux prompt.
[root@v1ex1sw-ibb01 ~]# enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond-daemon [ OK ]
[root@v1ex1sw-ibb01 ~]# ps -ef | grep opensm
root 8075 1 0 09:59 ? 00:00:00 /usr/sbin/opensm --daemon
root 8224 7909 0 09:59 pts/0 00:00:00 grep opensm
[root@v1ex1sw-ibb01 ~]#
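The check at the end of the transcript can be wrapped in a small helper. This is only a sketch: `opensm_running` is a hypothetical function name, and it assumes the daemon shows up as `/usr/sbin/opensm` in `ps -ef` output, as it does above.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and succeeds only if
# an opensm daemon line is present. The grep process itself is ignored.
opensm_running() {
    awk '/\/usr\/sbin\/opensm/ && !/grep/ { found = 1 } END { exit !found }'
}

# Usage on the switch (not run here):
#   ps -ef | opensm_running && echo "subnet manager is up" || echo "subnet manager is down"
```

Filtering in awk rather than with `grep opensm` avoids the false positive from the grep process that appears in the transcript.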

More information can be found here:
Sun Datacenter InfiniBand Switch 36 – Enable the Subnet Manager

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to Startup an Oracle Exadata Machine

There may be times when you are required to fully shut down an Oracle Exadata Machine, for example for maintenance.

For instructions on how to shut down an Oracle Exadata Machine, please refer to my blog post:
How to Shutdown an Oracle Exadata Machine

Once it is shut down, you will need to be able to restart it, which this blog post details.

Below is the My Oracle Support note used to carry out the startup:
Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (Doc ID 1093890.1)

1. Pre-requisites

Ensure you have the ILOM addresses and correct passwords for all the compute nodes and storage cells. Otherwise you will not be able to power them back on remotely and will need to press the power button on each front panel.

2. Power on first Compute Node

You can power on the first compute node through its ILOM, via either ssh or WebILOM. I prefer the ssh method shown below:

[AnwarZ@v1proxy1 ~]$ ssh root@v1ex1dbadm01-ilom
Password:

Oracle(R) Integrated Lights Out Manager

Version 4.0.4.37 r130617

Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: v1ex1dbadm01-ilom

-> show /SYSTEM

/System
Targets:
Open_Problems (0)
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
BIOS
Log

Properties:
health = OK
health_details = -
open_problems_count = 0
type = Rack Mount
model = Exadata X5-2
qpart_id = XXXXXX
part_number = Exadata X5-2
serial_number = XXXXXXXXXX
component_model = ORACLE SERVER X5-2
component_part_number = XXXXXXX
component_serial_number = XXXXXXXXXX
system_identifier = Exadata Database Machine X5-2 XXXXXXXXXX
system_fw_version = 4.0.4.37
primary_operating_system = Not Available
primary_operating_system_detail = Comprehensive System monitoring is not available. Ensure the host is
running with the Hardware Management Pack. For details go to
http://www.oracle.com/goto/ilom-redirect/hmp-osa
host_primary_mac_address = xx:xx:xx:xx:xx:xx
ilom_address = x.x.x.x
ilom_mac_address = xx:xx:xx:xx:xx:xx
locator_indicator = Off
power_state = Off
actual_power_consumption = 22 watts
action = (Cannot show property)

Commands:
cd
reset
set
show
start
stop

-> start /SYSTEM
Are you sure you want to start /System (y/n)? y
Starting /System

-> show /SYSTEM

/System
Targets:
Open_Problems (0)
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
BIOS
Log

Properties:
health = OK
health_details = -
open_problems_count = 0
type = Rack Mount
model = Exadata X5-2
qpart_id = XXXXXX
part_number = Exadata X5-2
serial_number = XXXXXXXXXX
component_model = ORACLE SERVER X5-2
component_part_number = XXXXXXX
component_serial_number = XXXXXXXXXX
system_identifier = Exadata Database Machine X5-2 XXXXXXXXXX
system_fw_version = 4.0.4.37
primary_operating_system = Not Available
primary_operating_system_detail = Comprehensive System monitoring is not available. Ensure the host is
running with the Hardware Management Pack. For details go to
http://www.oracle.com/goto/ilom-redirect/hmp-osa
host_primary_mac_address = xx:xx:xx:xx:xx:xx
ilom_address = x.x.x.x
ilom_mac_address = xx:xx:xx:xx:xx:xx
locator_indicator = Off
power_state = On
actual_power_consumption = 220 watts
action = (Cannot show property)

Commands:
cd
reset
set
show
start
stop

-> exit
Connection to v1ex1dbadm01-ilom closed.
[AnwarZ@v1proxy1 ~]$
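To confirm the power state without reading through the whole property list, the `show /SYSTEM` output can be filtered. This is a minimal sketch: `ilom_power_state` is a hypothetical helper, and it assumes the output has been captured as text (for example pasted into a file) in the `name = value` layout shown above.

```shell
# Hypothetical helper: reads ILOM `show /SYSTEM` output on stdin and prints
# the value of the power_state property (for example "On" or "Off").
ilom_power_state() {
    awk -F' = ' '{ key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
                 key == "power_state" { print $2 }'
}

# Usage (not run here), with the ILOM output saved to a file:
#   ilom_power_state < show_system.txt
```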

3. Power on all Storage Cells

Log in to the first compute node and power on all the storage cells as shown below:

login as: root
root@v1ex1dbadm01's password:
Last login: Wed Jun 10 09:21:41 IST 2020 from v1ex1dbadm01.v1.com on ssh
Last login: Wed Jun 10 17:31:31 2020 from x.x.x.x
[root@v1ex1dbadm01 ~]# uptime
17:31:37 up 1 min, 1 user, load average: 3.37, 1.22, 0.44
[root@v1ex1dbadm01 ~]# export HISTIGNORE='*'
[root@v1ex1dbadm01 ~]# for host in `cat /opt/oracle.SupportTools/onecommand/cell_group`; do
> echo ${host}: `ipmitool -I lanplus -H ${host}-ilom -U root -P XXXXXXXX chassis power on`
> done
v1ex1celadm01: Chassis Power Control: Up/On
v1ex1celadm02: Chassis Power Control: Up/On
v1ex1celadm03: Chassis Power Control: Up/On
[root@v1ex1dbadm01 ~]# export HISTIGNORE=''
[root@v1ex1dbadm01 ~]#

Please Note: the HISTIGNORE is used, so the password isn’t kept in history.
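An alternative to toggling HISTIGNORE is to keep the password off the command line altogether. The sketch below relies on the `-E` option of ipmitool, which reads the password from the `IPMI_PASSWORD` environment variable. The function name `power_on_group` is hypothetical, and it assumes each ILOM is reachable as `<host>-ilom`, as in the transcript above.

```shell
# Power on every host listed in a group file through its ILOM.
# Assumptions: each ILOM is reachable as "<host>-ilom", and IPMI_PASSWORD is
# exported so that `ipmitool -E` can read it instead of taking -P.
power_on_group() {
    group_file=$1
    while read -r host; do
        [ -n "$host" ] || continue   # skip blank lines
        result=$(ipmitool -I lanplus -H "${host}-ilom" -U root -E chassis power on < /dev/null)
        echo "${host}: ${result}"
    done < "$group_file"
}

# Usage (bash, not run here):
#   read -rs -p 'ILOM password: ' IPMI_PASSWORD; export IPMI_PASSWORD
#   power_on_group /opt/oracle.SupportTools/onecommand/cell_group
#   unset IPMI_PASSWORD
```

With this approach the password appears neither in the shell history nor in the process list.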

Check the storage cell services are up:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'hostname; uptime'
v1ex1celadm01: v1ex1celadm01.v1.com
v1ex1celadm01: 18:10:21 up 32 min, 0 users, load average: 1.86, 1.95, 2.03
v1ex1celadm02: v1ex1celadm02.v1.com
v1ex1celadm02: 18:10:21 up 32 min, 0 users, load average: 1.47, 1.82, 1.97
v1ex1celadm03: v1ex1celadm03.v1.com
v1ex1celadm03: 18:10:22 up 32 min, 0 users, load average: 1.51, 1.85, 2.01
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e 'list cell detail'"
v1ex1celadm01: name: v1ex1celadm01
v1ex1celadm01: accessLevelPerm: remoteLoginEnabled
v1ex1celadm01: bbuStatus: normal
v1ex1celadm01: cellVersion: OSS_19.2.7.0.0_LINUX.X64_191012
v1ex1celadm01: cpuCount: 32/32
v1ex1celadm01: diagHistoryDays: 7
v1ex1celadm01: eighthRack: FALSE
v1ex1celadm01: fanCount: 8/8
v1ex1celadm01: fanStatus: normal
v1ex1celadm01: flashCacheMode: WriteBack
v1ex1celadm01: httpsAccess: ALL
v1ex1celadm01: id: XXXXXXXXXX
v1ex1celadm01: interconnectCount: 2
v1ex1celadm01: interconnect1: ib0
v1ex1celadm01: interconnect2: ib1
v1ex1celadm01: iormBoost: 0.0
v1ex1celadm01: ipaddress1: x.x.x.x/22
v1ex1celadm01: ipaddress2: x.x.x.x/22
v1ex1celadm01: kernelVersion: 4.1.12-124.30.1.el7uek.x86_64
v1ex1celadm01: locatorLEDStatus: off
v1ex1celadm01: makeModel: Oracle Corporation ORACLE SERVER X5-2L High Capacity
v1ex1celadm01: memoryGB: 94
v1ex1celadm01: metricHistoryDays: 7
v1ex1celadm01: notificationMethod: snmp
v1ex1celadm01: notificationPolicy: critical,warning,clear
v1ex1celadm01: offloadGroupEvents:
v1ex1celadm01: powerCount: 2/2
v1ex1celadm01: powerStatus: normal
v1ex1celadm01: ramCacheMaxSize: 0
v1ex1celadm01: ramCacheMode: Auto
v1ex1celadm01: ramCacheSize: 0
v1ex1celadm01: releaseImageStatus: success
v1ex1celadm01: releaseVersion: 19.2.7.0.0.191012
v1ex1celadm01: rpmVersion: cell-19.2.7.0.0_LINUX.X64_191012-1.x86_64
v1ex1celadm01: releaseTrackingBug: 30393131
v1ex1celadm01: rollbackVersion: 18.1.18.0.0.190709
v1ex1celadm01: snmpSubscriber: host=x.x.x.x,port=162,community=public,type=ASR,asrmPort=16161
v1ex1celadm01: host=x.x.x.x,port=161,community=V1
v1ex1celadm01: host=x.x.x.x,port=161,community=V1
v1ex1celadm01: status: online
v1ex1celadm01: temperatureReading: 22.0
v1ex1celadm01: temperatureStatus: normal
v1ex1celadm01: upTime: 0 days, 0:33
v1ex1celadm01: usbStatus: normal
v1ex1celadm01: cellsrvStatus: running
v1ex1celadm01: msStatus: running
v1ex1celadm01: rsStatus: running
v1ex1celadm02: name: v1ex1celadm02
...
v1ex1celadm02: status: online
v1ex1celadm02: temperatureReading: 22.0
v1ex1celadm02: temperatureStatus: normal
v1ex1celadm02: upTime: 0 days, 0:33
v1ex1celadm02: usbStatus: normal
v1ex1celadm02: cellsrvStatus: running
v1ex1celadm02: msStatus: running
v1ex1celadm02: rsStatus: running
v1ex1celadm03: name: v1ex1celadm03
...
v1ex1celadm03: status: online
v1ex1celadm03: temperatureReading: 22.0
v1ex1celadm03: temperatureStatus: normal
v1ex1celadm03: upTime: 0 days, 0:33
v1ex1celadm03: usbStatus: normal
v1ex1celadm03: cellsrvStatus: running
v1ex1celadm03: msStatus: running
v1ex1celadm03: rsStatus: running
[root@v1ex1dbadm01 ~]#
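The three status lines that matter in that long listing are cellsrvStatus, msStatus and rsStatus. As a rough sketch, the dcli output can be filtered so that only problems are printed. `cell_services_down` is a hypothetical helper and assumes the `host: attribute: value` layout shown above.

```shell
# Hypothetical helper: reads dcli output of "cellcli -e 'list cell detail'"
# on stdin and prints any cell service that is not "running".
# Exits non-zero if at least one such service is found.
cell_services_down() {
    awk '$2 ~ /^(cellsrvStatus|msStatus|rsStatus):$/ && $3 != "running" {
             print $1, $2, $3
             bad = 1
         }
         END { exit bad + 0 }'
}

# Usage (not run here):
#   dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root \
#       "cellcli -e 'list cell detail'" | cell_services_down && echo "all cell services running"
```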

4. Power on remaining Compute Nodes

Power on remaining compute nodes via ipmitool:

[root@v1ex1dbadm01 ~]# export HISTIGNORE='*'
[root@v1ex1dbadm01 ~]# ipmitool -I lanplus -H v1ex1dbadm02-ilom -U root -P XXXXXXXX chassis power on
Chassis Power Control: Up/On
[root@v1ex1dbadm01 ~]# export HISTIGNORE=''
[root@v1ex1dbadm01 ~]#

If you have a half or full rack, the following loop can be used:

for host in `cat dbs_group_all_but_first`; do
echo ${host}: `ipmitool -H ${host}-ilom -U root -P XXXXXXXX chassis power on`
done

Check compute nodes are up:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root 'hostname; uptime'
v1ex1dbadm01: v1ex1dbadm01.v1.com
v1ex1dbadm01: 18:21:55 up 12 min, 1 user, load average: 0.22, 0.89, 1.54
v1ex1dbadm02: v1ex1dbadm02.v1.com
v1ex1dbadm02: 18:21:55 up 3 min, 0 users, load average: 2.44, 1.58, 0.66
[root@v1ex1dbadm01 ~]#

5. Re-enable clusterware autostart

Re-enable clusterware autostart via dcli:

[root@v1ex1dbadm01 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl enable crs
v1ex1dbadm01: CRS-4622: Oracle High Availability Services autostart is enabled.
v1ex1dbadm02: CRS-4622: Oracle High Availability Services autostart is enabled.
[root@v1ex1dbadm01 ~]#

6. Restart Grid Infrastructure on the cluster

Start clusterware on first compute node:

[root@v1ex1dbadm01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@v1ex1dbadm01 ~]#

Now log on to the remaining compute nodes and restart clusterware:

login as: root
root@v1ex1dbadm02's password:
Last login: Wed Jun 10 18:26:56 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 10 18:30:56 2020 from x.x.x.x
[root@v1ex1dbadm02 ~]# . oraenv
ORACLE_SID = [root] ? +ASM2
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm02 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@v1ex1dbadm02 ~]#

Wait a few minutes and check clusterware is all up as shown below:

[root@v1ex1dbadm02 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATAC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
...
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.RECOC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.asm
ONLINE ONLINE v1ex1dbadm01 Started,STABLE
ONLINE ONLINE v1ex1dbadm02 Started,STABLE
...
ora.net1.network
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.ons
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
...
1 ONLINE ONLINE v1ex1dbadm02 Open,STABLE
ora.oc4j
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan1.vip
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.scan2.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan3.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
--------------------------------------------------------------------------------
[root@v1ex1dbadm02 ~]#
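Scanning that table by eye gets tedious on larger clusters. The sketch below prints only the resource instances whose Target is ONLINE but whose State is something else. `crs_not_online` is a hypothetical helper, and it assumes the column layout of `crsctl stat res -t` shown above.

```shell
# Hypothetical helper: reads `crsctl stat res -t` output on stdin and prints
# every resource instance whose Target is ONLINE but whose State is not.
crs_not_online() {
    awk '/^ora\./ { res = $1; next }
         # Local resources: "ONLINE ONLINE <server> <details>"
         $1 == "ONLINE" && $2 != "ONLINE" { print res, $2, $3 }
         # Cluster resources: "<n> ONLINE ONLINE <server> <details>"
         $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" { print res, $3, $4 }'
}

# Usage (not run here):
#   crsctl stat res -t | crs_not_online
```

No output means every resource with an ONLINE target has reached the ONLINE state.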

7. Restart OEM Agent

Optionally, if you have an OEM agent (most likely), restart it as follows:

[oracle@v1ex1dbadm01 ~]$ cd /u01/app/agent/agent_13.3.0.0.0/bin
[oracle@v1ex1dbadm01 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
[oracle@v1ex1dbadm01 bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Starting agent ............................ started.
[oracle@v1ex1dbadm01 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.3.0.0.0
OMS Version : 13.3.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/agent/agent_inst
Agent Log Directory : /u01/app/agent/agent_inst/sysman/log
Agent Binaries : /u01/app/agent/agent_13.3.0.0.0
Core JAR Location : /u01/app/agent/agent_13.3.0.0.0/jlib
Agent Process ID : 122257
Parent Process ID : 122120
Agent URL : https://v1ex1dbadm01.v1.com:3872/emd/main/
Local Agent URL in NAT : https://v1ex1dbadm01.v1.com:3872/emd/main/
Repository URL : https://v1oem.v1.com:4903/empbs/upload
Started at : 2020-06-17 15:40:59
Started by user : oracle
Operating System : Linux version 4.1.12-124.30.1.el7uek.x86_64 (amd64)
Number of Targets : 43
Last Reload : (none)
Last successful upload : 2020-06-17 15:41:23
Last attempted upload : 2020-06-17 15:41:23
Total Megabytes of XML files uploaded so far : 0.1
Number of XML files pending upload : 5
Size of XML files pending upload(MB) : 0.02
Available disk space on upload filesystem : 21.34%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2020-06-17 15:41:18
Last successful heartbeat to OMS : 2020-06-17 15:41:18
Next scheduled heartbeat to OMS : 2020-06-17 15:42:20

---------------------------------------------------------------
Agent is Running and Ready
[oracle@v1ex1dbadm01 bin]$

Now on any other compute nodes:

[oracle@v1ex1dbadm01 bin]$ ssh v1ex1dbadm02
Last login: Wed Jun 17 12:37:20 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 17 15:43:07 2020 from x.x.x.x
[oracle@v1ex1dbadm02 ~]$ cd /u01/app/agent/agent_13.3.0.0.0/bin
[oracle@v1ex1dbadm02 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
[oracle@v1ex1dbadm02 bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
Starting agent .................................... started.
[oracle@v1ex1dbadm02 bin]$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 13.3.0.0.0
OMS Version : 13.3.0.0.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/agent/agent_inst
Agent Log Directory : /u01/app/agent/agent_inst/sysman/log
Agent Binaries : /u01/app/agent/agent_13.3.0.0.0
Core JAR Location : /u01/app/agent/agent_13.3.0.0.0/jlib
Agent Process ID : 189737
Parent Process ID : 189513
Agent URL : https://v1ex1dbadm02.v1.com:3872/emd/main/
Local Agent URL in NAT : https://v1ex1dbadm02.v1.com:3872/emd/main/
Repository URL : https://v1oem.v1.com:4903/empbs/upload
Started at : 2020-06-17 15:44:01
Started by user : oracle
Operating System : Linux version 4.1.12-124.30.1.el7uek.x86_64 (amd64)
Number of Targets : 37
Last Reload : (none)
Last successful upload : 2020-06-17 15:44:44
Last attempted upload : 2020-06-17 15:44:45
Total Megabytes of XML files uploaded so far : 0.17
Number of XML files pending upload : 1
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 28.51%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2020-06-17 15:44:28
Last successful heartbeat to OMS : 2020-06-17 15:44:28
Next scheduled heartbeat to OMS : 2020-06-17 15:45:28

---------------------------------------------------------------
Agent is Running and Ready
[oracle@v1ex1dbadm02 bin]$
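The status, start, status sequence above can be condensed into one function per node. This is a sketch only: `ensure_agent_running` is a hypothetical name, and the default `EMCTL` path is an assumption copied from the agent home in the transcript, so adjust it to your own installation.

```shell
# Path to emctl; the default is an assumption taken from the transcript above.
EMCTL=${EMCTL:-/u01/app/agent/agent_13.3.0.0.0/bin/emctl}

# Start the agent only if `emctl status agent` does not already report it
# as running and ready.
ensure_agent_running() {
    if "$EMCTL" status agent | grep 'Agent is Running and Ready' > /dev/null; then
        echo "agent already running"
    else
        "$EMCTL" start agent
    fi
}

# Usage as the oracle user (not run here):
#   ensure_agent_running
```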

How to Shutdown an Oracle Exadata Machine

There may be times when you are required to fully shut down an Oracle Exadata Machine, for example for maintenance.

Below is the My Oracle Support note used to carry out the shutdown:
Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (Doc ID 1093890.1)

1. Pre-requisites

Ensure you have the ILOM addresses and correct passwords for all the compute nodes and storage cells. Otherwise you will not be able to power them back on remotely and will need to press the power button on each front panel.

2. Disable clusterware autostart

First we need to stop clusterware from starting automatically on reboot. Log on to your first compute node and disable autostart via dcli (more info on dcli can be found in this blog post), using your correct CRS home:

login as: root
root@x.x.x.x's password:
Last login: Wed Jun 10 08:45:30 IST 2020 from x.x.x.x on pts/0
Last login: Wed Jun 10 09:07:17 2020 from x.x.x.x
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl config crs
v1ex1dbadm01: CRS-4622: Oracle High Availability Services autostart is enabled.
v1ex1dbadm02: CRS-4622: Oracle High Availability Services autostart is enabled.
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl disable crs
v1ex1dbadm01: CRS-4621: Oracle High Availability Services autostart is disabled.
v1ex1dbadm02: CRS-4621: Oracle High Availability Services autostart is disabled.
[root@v1ex1dbadm01 ~]#
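Before moving on, it is worth confirming that no node still has autostart enabled. A minimal sketch, assuming the `host: CRS-nnnn: message` layout shown above, where CRS-4622 means enabled and CRS-4621 means disabled. `crs_autostart_enabled_hosts` is a hypothetical helper name.

```shell
# Hypothetical helper: reads dcli output of `crsctl config crs` on stdin and
# prints the hosts where autostart is still enabled (message CRS-4622).
crs_autostart_enabled_hosts() {
    awk -F': ' '$2 == "CRS-4622" { print $1 }'
}

# Usage (not run here):
#   dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root \
#       /u01/app/12.1.0.2/grid/bin/crsctl config crs | crs_autostart_enabled_hosts
```

An empty result means autostart is disabled on every node in the group.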

3. Stop Grid Infrastructure on the cluster

Next we stop clusterware cluster-wide gracefully:

[root@v1ex1dbadm01 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl check crs
v1ex1dbadm01: CRS-4638: Oracle High Availability Services is online
v1ex1dbadm01: CRS-4537: Cluster Ready Services is online
v1ex1dbadm01: CRS-4529: Cluster Synchronization Services is online
v1ex1dbadm01: CRS-4533: Event Manager is online
v1ex1dbadm02: CRS-4638: Oracle High Availability Services is online
v1ex1dbadm02: CRS-4537: Cluster Ready Services is online
v1ex1dbadm02: CRS-4529: Cluster Synchronization Services is online
v1ex1dbadm02: CRS-4533: Event Manager is online
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATAC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
...
ora.RECOC1.dg
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.asm
ONLINE ONLINE v1ex1dbadm01 Started,STABLE
ONLINE ONLINE v1ex1dbadm02 Started,STABLE
...
ora.net1.network
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
ora.ons
ONLINE ONLINE v1ex1dbadm01 STABLE
ONLINE ONLINE v1ex1dbadm02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE v1ex1dbadm01 STABLE
...
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan1.vip
1 ONLINE ONLINE v1ex1dbadm02 STABLE
ora.scan2.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
ora.scan3.vip
1 ONLINE ONLINE v1ex1dbadm01 STABLE
--------------------------------------------------------------------------------
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'v1ex1dbadm01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'v1ex1dbadm01'
...
CRS-2677: Stop of 'ora.ons' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.net1.network' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.net1.network' on 'v1ex1dbadm01' succeeded 
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'v1ex1dbadm01' has completed 
CRS-2677: Stop of 'ora.crsd' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.ctssd' on 'v1ex1dbadm02' 
CRS-2673: Attempting to stop 'ora.evmd' on 'v1ex1dbadm02' 
CRS-2673: Attempting to stop 'ora.storage' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.storage' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.asm' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.crsd' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.ctssd' on 'v1ex1dbadm01' 
CRS-2673: Attempting to stop 'ora.evmd' on 'v1ex1dbadm01' 
CRS-2673: Attempting to stop 'ora.storage' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.storage' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.asm' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.ctssd' on 'v1ex1dbadm02' succeeded 
CRS-2677: Stop of 'ora.evmd' on 'v1ex1dbadm02' succeeded 
CRS-2677: Stop of 'ora.evmd' on 'v1ex1dbadm01' succeeded 
CRS-2677: Stop of 'ora.ctssd' on 'v1ex1dbadm01' succeeded 
CRS-2677: Stop of 'ora.asm' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.cssd' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.cssd' on 'v1ex1dbadm02' succeeded 
CRS-2673: Attempting to stop 'ora.diskmon' on 'v1ex1dbadm02' 
CRS-2677: Stop of 'ora.diskmon' on 'v1ex1dbadm02' succeeded
CRS-2677: Stop of 'ora.asm' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'v1ex1dbadm01' 
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'v1ex1dbadm01' succeeded 
CRS-2673: Attempting to stop 'ora.cssd' on 'v1ex1dbadm01'
CRS-2677: Stop of 'ora.cssd' on 'v1ex1dbadm01' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'v1ex1dbadm01'
CRS-2677: Stop of 'ora.diskmon' on 'v1ex1dbadm01' succeeded
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root /u01/app/12.1.0.2/grid/bin/crsctl stat res -t
v1ex1dbadm01: CRS-4535: Cannot communicate with Cluster Ready Services
v1ex1dbadm01: CRS-4000: Command Status failed, or completed with errors.
v1ex1dbadm02: CRS-4535: Cannot communicate with Cluster Ready Services
v1ex1dbadm02: CRS-4000: Command Status failed, or completed with errors.
[root@v1ex1dbadm01 ~]#

4. Power off Storage Cells

Now that clusterware is down including ASM, we can power down the storage cells by first shutting down the cell services:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e alter cell shutdown services all"
v1ex1celadm01:
v1ex1celadm01: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm01: The SHUTDOWN of services was successful.
v1ex1celadm02:
v1ex1celadm02: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm02: The SHUTDOWN of services was successful.
v1ex1celadm03:
v1ex1celadm03: Stopping the RS, CELLSRV, and MS services...
v1ex1celadm03: The SHUTDOWN of services was successful.
[root@v1ex1dbadm01 ~]#

Now that the cell services are shut down, we can power off the storage cells:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root poweroff

v1ex1celadm02:Connection to v1ex1celadm02 closed by remote host.

v1ex1celadm01:Connection to v1ex1celadm01 closed by remote host.

v1ex1celadm03:Connection to v1ex1celadm03 closed by remote host.

[root@v1ex1dbadm01 ~]#

5. Power off Compute Nodes

As we are on the first compute node, we can power this off as shown below:

[root@v1ex1dbadm01 ~]# poweroff

Now we power off the remaining compute node by logging on via ssh:

login as: root
root@x.x.x.x's password:
Last login: Wed Jun 10 08:45:25 IST 2020 from x.x.x.x on ssh
Last login: Wed Jun 10 09:03:41 2020 from x.x.x.x
[root@v1ex1dbadm02 ~]# poweroff

If you have a half or full rack and wish to power off all compute nodes, you can use:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group_all_but_first -l root poweroff

Then power off the first node:

[root@v1ex1dbadm01 ~]# poweroff

Carry out your maintenance and, when completed, you can restart the Oracle Exadata Machine by following my blog post:
How to Startup an Oracle Exadata Machine

How to obtain the serial numbers on an Oracle Exadata Machine

You may be required to obtain serial numbers from an Oracle Exadata Machine, for example to confirm the correct hardware for a part replacement such as a disk controller battery (X4-2 and older) or a disk.

Below is how to obtain serials for each component.

Exadata Machine

To obtain the serial number of the Exadata Machine itself:

[root@v1ex1dbadm01 ~]# ipmitool sunoem cli "show /SP system_identifier" | grep "system_identifier ="
system_identifier = Exadata Database Machine X5-2 AK00XXXXXX
[root@v1ex1dbadm01 ~]#

Compute Nodes

To obtain the serial number of the compute nodes via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root dmidecode -s system-serial-number
v1ex1dbadm01: 1514NMXXXX
v1ex1dbadm02: 1514NMXXXX
[root@v1ex1dbadm01 ~]#

Storage Cells

To obtain the serial number of the storage cells via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root dmidecode -s system-serial-number
v1ex1celadm01: 1515NMXXXX
v1ex1celadm02: 1515NMXXXX
v1ex1celadm03: 1515NMXXXX
[root@v1ex1dbadm01 ~]#
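If the serial numbers need to go into a support request or a spreadsheet, the `host: serial` lines from the two dmidecode commands above can be converted to CSV. This is a sketch with a hypothetical helper name, `dcli_to_csv`. It assumes one `host: value` pair per line, so it is not suited to the multi-colon showfruinfo output from the InfiniBand switches.

```shell
# Hypothetical helper: turns dcli "host: value" lines into "host,value" CSV.
dcli_to_csv() {
    awk -F': ' 'NF >= 2 { print $1 "," $2 }'
}

# Usage (not run here):
#   dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root \
#       dmidecode -s system-serial-number | dcli_to_csv
```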

InfiniBand Switches

To obtain the serial numbers of the InfiniBand switches via dcli:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/ib_group -l root showfruinfo | grep -a Sun_Serial_Number
v1ex1sw-iba01: Sun_Serial_Number : XXXXXXT+1512RRXXXX
v1ex1sw-iba01: Sun_Serial_Number : AK0029XXXX
v1ex1sw-ibb01: Sun_Serial_Number : XXXXXXT+1512RRXXXX
v1ex1sw-ibb01: Sun_Serial_Number : AK0029XXXX
[root@v1ex1dbadm01 ~]#

 

Exadata OL7 session disconnects after 10 minutes

When upgrading to Exadata software 19c (release 19.1.0.0.0 and above), the compute nodes (database servers) are upgraded to Oracle Linux 7.  As part of this upgrade the sshd ClientAliveInterval setting is changed to a value of 600 for STIG (Security Technical Implementation Guide) purposes, as detailed in the My Oracle Support Note below:

Changed sshd setting “Clientaliveinterval” after updating Exadata Database Nodes (domU, dom0 and physical) (Doc ID 2501968.1)

When updating Exadata Database nodes (dom0, domu and physical) running either Oracle Linux 6 or Oracle Linux 7, “sshd Clientaliveinterval” settings are changed to a value of 600 for STIG purposes via unpublished bug 28204681.

This will result in your ssh connection being closed after being idle for 600 seconds while before this would not happen before 86400 seconds passed. While for the same security reasons, it’s not recommended to undo this change, it will be the choice of the operator and he/she is free to do so.

This means your connections to the Exadata Machines disconnect after 10 minutes of inactivity 😦 :

[AnwarZ@v1proxy1 ~]$ date;ssh oracle@v1ex1dbadm01;date
Thu May 21 15:27:31 IST 2020
oracle@v1ex1dbadm01's password:
Last login: Thu May 21 15:27:31 IST 2020 from x.x.x.x on pts/0
Last login: Thu May 21 15:27:40 2020 from x.x.x.x
[oracle@v1ex1dbadm01 ~]$ Connection to x.x.x.x closed by remote host.
Connection to x.x.x.x closed.
[AnwarZ@v1proxy1 ~]$date
Thu May 21 15:37:40 IST 2020
[AnwarZ@v1proxy1 ~]$

As per the MOS note, the recommendation is not to change ClientAliveInterval on the compute nodes, but to set the client-side options ServerAliveInterval and ServerAliveCountMax on the ssh connection as shown below:

[AnwarZ@v1proxy1 ~]$ date;ssh -o ServerAliveInterval=550 -o ServerAliveCountMax=157 oracle@v1ex1dbadm01;date
Thu May 21 15:41:29 IST 2020
oracle@v1ex1dbadm01's password:
Last login: Thu May 21 15:27:40 IST 2020 from x.x.x.x on pts/0
Last login: Thu May 21 15:41:37 2020 from x.x.x.x
[oracle@v1ex1dbadm01 ~]$ date
Thu May 21 15:55:10 IST 2020
[oracle@v1ex1dbadm01 ~]$

This session didn’t disconnect, and a manual ‘date‘ shows that more than 10 minutes have passed 🙂

This works because ServerAliveInterval=550 makes the client send a keepalive message through the encrypted channel every 550 seconds. As this is less than the ClientAliveInterval=600 on the compute nodes, the server never sees the session idle for long enough to disconnect it.  ServerAliveCountMax is the number of consecutive keepalives that may go unanswered before the client itself gives up, so 550 x 157 = 86350 seconds brings that tolerance back in line with the previous standard of 86400.
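To avoid typing the two options on every connection, the same values can live in the client's `~/.ssh/config`. The sketch below derives ServerAliveCountMax from the interval by integer division and prints a stanza to review. `ssh_keepalive_stanza` is a hypothetical helper, and the host pattern in the usage line is an assumption based on the hostnames in this post.

```shell
# Print an ssh_config stanza whose keepalive interval stays under the
# server's ClientAliveInterval. Arguments: host pattern, keepalive interval
# in seconds, and the total number of seconds to tolerate.
ssh_keepalive_stanza() {
    pattern=$1
    interval=$2
    total=$3
    count=$(( total / interval ))   # integer division, e.g. 86400 / 550 = 157
    printf 'Host %s\n    ServerAliveInterval %s\n    ServerAliveCountMax %s\n' \
        "$pattern" "$interval" "$count"
}

# Usage (review the output before appending it to your own ~/.ssh/config):
#   ssh_keepalive_stanza 'v1ex1dbadm*' 550 86400
```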

Alternatively, if you are using a program like PuTTY, you can configure keepalives in its settings to the same effect:

[Screenshot: PuTTY keepalive settings]

It also appears from the MOS note that this can affect OL6 on later Exadata releases in which the STIG recommendations were implemented.  In that case, the same workaround can be used.

 
