ODC Appreciation Day : Oracle dcli Utility

What is ODC Appreciation Day?

The Oracle Developer Community (ODC) Appreciation Day formally known as OTN Appreciation Day, is a great initiative by Tim Hall aka Oracle-Base.com.  Where we take the opportunity to say thanks to the Oracle Developer Community #ThanksODC.

More info on Oracle Developer Community can be found here:
About Oracle Developer Community

When is it?

This year, it is on Thursday 11th October 2018.

You can see my previous post here:
2017 – ODC Appreciation Day : Oracle Exadata Database Machine

You can see a summary of previous years blog posts here:
2016 – OTN Appreciation Day : Summary
2017 – ODC Appreciation Day 2017 : It’s a Wrap (#ThanksODC)

My Contribution : Oracle dcli Utility

When thinking of a subject, Oracle’s dcli Utility on Oracle Exadata Database Machine came to mind due to the frequent use 🙂

What is dcli Utility?

Distributed Command Line Interface (dcli), which it’s main purpose is to execute commands on storage cells on Exadata in parallel.  Which actually is just a Python script.  Those who don’t know Exadata, it’s an Engineered Systems which includes storage in the form of storage cells i.e. servers that have multiple disks utilised by Automatic Storage Management (ASM).  However, the smaller of offer still has 3 storage cells that can go up to 18 storage cells in a rack (elastic configuration).  More info on Exadata can be found here on the latest datasheet (at time of writing):
Exadata X7-2 Datasheet

So as you can imagine, executing commands on 3 servers is tidiuos enough, let alone 18, hence the power and usefulness of dcli!  I don’t just use it for storage cells but compute nodes (database servers), as well as the InfiniBand switches 🙂

More info on dcli can be found in the Exadata Documentation:
Exadata System Software User’s Guide -> 9 Using the dcli Utility

Example usage

Quickly see the version of Exadata Software on your Exadata Machine:

Storage Cells:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root imageinfo | grep "Kernel version\|Active image version"
v1ex1celadm01: Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
v1ex1celadm01: Active image version: 18.1.7.0.0.180821
v1ex1celadm02: Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
v1ex1celadm02: Active image version: 18.1.7.0.0.180821
v1ex1celadm03: Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
v1ex1celadm03: Active image version: 18.1.7.0.0.180821
[root@v1ex1dbadm01 ~]#

Compute Nodes:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root imageinfo | grep "Kernel version\|Image version"
v1ex1dbadm01: Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
v1ex1dbadm01: Image version: 18.1.7.0.0.180821
v1ex1dbadm02: Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
v1ex1dbadm02: Image version: 18.1.7.0.0.180821
[root@v1ex1dbadm01 ~]#

InfiniBands:

[root@v1ex1dbadm01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/ib_group -l root version | grep "version"
v1ex1sw-iba01: SUN DCS 36p version: 2.2.9-3
v1ex1sw-iba01: BIOS version: SUN0R100
v1ex1sw-ibb01: SUN DCS 36p version: 2.2.9-3
v1ex1sw-ibb01: BIOS version: SUN0R100
[root@v1ex1dbadm01 ~]#

The usage are endless 🙂

When I get a chance, I will create a more in depth blog post about dcli including, how to setup, etc.  I will add the link here, for ease of reference.

Finally Happy ODC Appreciation Day! #ThanksODC #ThanksOTN 🙂

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Resolving Slow Performance, Skipped Checks and Timeouts on Exa Check (exachk)

Background

For more information with regards to Exa Check, please read the following post:
How to use Oracle Exadata Database Machine Exa Check (exachk)

Slow Performance, Skipped Checks and Timeouts

When running the latest exachk (at time of writing, version 18.3.0_20180808), you may notice it takes a long time to run compared to the past.  This is due to the vast amount of additional checks carried out by the tool.  Due to this, you may also notice you get timeout issues reported in the report:

Killed Processes

exachk found that below commands were killed during the run, so some checks might have failed to execute properly. Refer to the “Slow Performance, Skipped Checks, and Timeouts” section of the user guide for corrective actions.

Killed check Manage ASM Audit File Directory Growth with cron (CHECK-ID: 9DEBED7B8DAB583DE040E50A1EC01BA0) at v1ex2dbadm01 because it timed out
Killed check Manage ASM Audit File Directory Growth with cron (CHECK-ID: 9DEBED7B8DAB583DE040E50A1EC01BA0) at v1ex2dbadm02 because it timed out

 

If you refer to the documentation “Slow Performance, Skipped Checks, and Timeouts“, you’ll see there are various parameters you can set in your environment to increase the default timeouts, which I have done below:

[root@v1ex2dbadm01 exachk]# export RAT_TIMEOUT=300
[root@v1ex2dbadm01 exachk]# export RAT_ROOT_TIMEOUT=900
[root@v1ex2dbadm01 exachk]# export RAT_PASSWORDCHECK_TIMEOUT=10
[root@v1ex2dbadm01 exachk]# export RAT_PROMPT_TIMEOUT=30
[root@v1ex2dbadm01 exachk]# export RAT_PROMPT_WAIT_TIMEOUT=60
[root@v1ex2dbadm01 exachk]# export RAT_REMOTE_RUN_TIMEOUT=10800
[root@v1ex2dbadm01 exachk]#
[root@v1ex2dbadm01 exachk]# env | grep RAT
RAT_ROOT_TIMEOUT=900
RAT_PROMPT_TIMEOUT=30
RAT_TIMEOUT=300
RAT_REMOTE_RUN_TIMEOUT=10800
RAT_PASSWORDCHECK_TIMEOUT=10
RAT_PROMPT_WAIT_TIMEOUT=60
[root@v1ex2dbadm01 exachk]#

Now when you run exachk, it will wait longer before killing processes.

In addition, if you run the “-dbparallelmax” option, you will increase the number of slave processes for database checks:

[root@v1ex2dbadm01 exachk]# ./exachk -dbparallelmax

PLEASE NOTE: This will consume more resources but will run quicker, so use with caution.  Alternatively you can run with “-dbparallel” with a acceptable number of processes and increase as per your requirements.

Now you should not have any timeouts and if you still do, then you will need to review the parameters above and increase again.  Alternatively raise an Support Request with Oracle Support.

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)