Resolving Slow Performance, Skipped Checks and Timeouts on Exa Check (exachk)

Background

For more information with regards to Exa Check, please read the following post:
How to use Oracle Exadata Database Machine Exa Check (exachk)

Slow Performance, Skipped Checks and Timeouts

When running the latest exachk (at time of writing, version 18.3.0_20180808), you may notice it takes a long time to run compared to the past.  This is due to the vast amount of additional checks carried out by the tool.  Due to this, you may also notice you get timeout issues reported in the report:

Killed Processes

exachk found that below commands were killed during the run, so some checks might have failed to execute properly. Refer to the “Slow Performance, Skipped Checks, and Timeouts” section of the user guide for corrective actions.

Killed check Manage ASM Audit File Directory Growth with cron (CHECK-ID: 9DEBED7B8DAB583DE040E50A1EC01BA0) at v1ex2dbadm01 because it timed out
Killed check Manage ASM Audit File Directory Growth with cron (CHECK-ID: 9DEBED7B8DAB583DE040E50A1EC01BA0) at v1ex2dbadm02 because it timed out

 

If you refer to the documentation “Slow Performance, Skipped Checks, and Timeouts“, you’ll see there are various parameters you can set in your environment to increase the default timeouts, which I have done below:

[root@v1ex2dbadm01 exachk]# export RAT_TIMEOUT=300
[root@v1ex2dbadm01 exachk]# export RAT_ROOT_TIMEOUT=900
[root@v1ex2dbadm01 exachk]# export RAT_PASSWORDCHECK_TIMEOUT=10
[root@v1ex2dbadm01 exachk]# export RAT_PROMPT_TIMEOUT=30
[root@v1ex2dbadm01 exachk]# export RAT_PROMPT_WAIT_TIMEOUT=60
[root@v1ex2dbadm01 exachk]# export RAT_REMOTE_RUN_TIMEOUT=10800
[root@v1ex2dbadm01 exachk]#
[root@v1ex2dbadm01 exachk]# env | grep RAT
RAT_ROOT_TIMEOUT=900
RAT_PROMPT_TIMEOUT=30
RAT_TIMEOUT=300
RAT_REMOTE_RUN_TIMEOUT=10800
RAT_PASSWORDCHECK_TIMEOUT=10
RAT_PROMPT_WAIT_TIMEOUT=60
[root@v1ex2dbadm01 exachk]#

Now when you run exachk, it will wait longer before killing processes.

In addition, if you run the “-dbparallelmax” option, you will increase the number of slave processes for database checks:

[root@v1ex2dbadm01 exachk]# ./exachk -dbparallelmax

PLEASE NOTE: This will consume more resources but will run quicker, so use with caution.  Alternatively you can run with “-dbparallel” with a acceptable number of processes and increase as per your requirements.

Now you should not have any timeouts and if you still do, then you will need to review the parameters above and increase again.  Alternatively raise an Support Request with Oracle Support.

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Advertisements

Extending the root LVM Partition on Exadata

On an Oracle Exadata Database Machine, the ‘/’ (root) is defaulted to a size of 30Gb, which can easily fill up.  Luckily this is just a Logical Volume and there’s normally lots of space available on the Logical Volume Group which is usually untapped.

Extending ‘/’

Identify how much space is used and free on ‘/’ using df:

[root@v1ex1dbadm01 ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G   22G  6.2G  79% /
[root@v1ex1dbadm01 ~]#

Display the current logical volume configuration using the lvs command:

[root@v1ex1dbadm01 ~]# lvs -o lv_name,lv_path,vg_name,lv_size
  LV                 Path                            VG      LSize
  LVDbOra1           /dev/VGExaDb/LVDbOra1           VGExaDb 200.00g
  LVDbSwap1          /dev/VGExaDb/LVDbSwap1          VGExaDb  24.00g
  LVDbSys1           /dev/VGExaDb/LVDbSys1           VGExaDb  30.00g
  LVDbSys2           /dev/VGExaDb/LVDbSys2           VGExaDb  30.00g
  LVDoNotRemoveOrUse /dev/VGExaDb/LVDoNotRemoveOrUse VGExaDb   1.00g
[root@v1ex1dbadm01 ~]#

PLEASE NOTE: On Exadata there are 2 SYS volumes, of which one is active and the other inactive.  These are used when patching the compute node, as one is a backup of the current and is used for rollback purposes.

Check the online resize option is available using the tune2fs command:

[root@v1ex1dbadm01 ~]# tune2fs -l /dev/VGExaDb/LVDbSys1 | grep resize_inode
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
[root@v1ex1dbadm01 ~]# tune2fs -l /dev/VGExaDb/LVDbSys2 | grep resize_inode
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
[root@v1ex1dbadm01 ~]#

If not available then the file system needs to be unmounted before resizing.  Refer to documentation:

Extending the root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1

Verify there’s space available in the Logical Volume Group using vgdisplay command:

[root@v1ex1dbadm01 ~]# vgdisplay -s
  "VGExaDb" 1.63 TiB  [285.00 GiB used / 1.36 TiB free]
[root@v1ex1dbadm01 ~]#

Finally if there’s enough space, then extend the Logical Volumes using lvextend command:

[root@v1ex1dbadm01 ~]# lvextend -L +70G /dev/VGExaDb/LVDbSys1
  Size of logical volume VGExaDb/LVDbSys1 changed from 30.00 GiB (7680 extents) to 100.00 GiB (25600 extents).
  Logical volume LVDbSys1 successfully resized.
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# lvextend -L +70G /dev/VGExaDb/LVDbSys2
  Size of logical volume VGExaDb/LVDbSys2 changed from 30.00 GiB (7680 extents) to 100.00 GiB (25600 extents).
  Logical volume LVDbSys2 successfully resized.
[root@v1ex1dbadm01 ~]#

Followed by a resize of the file system using resize2fs command:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys1
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/VGExaDb/LVDbSys1 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 7
Performing an on-line resize of /dev/VGExaDb/LVDbSys1 to 26214400 (4k) blocks.
The filesystem on /dev/VGExaDb/LVDbSys1 is now 26214400 blocks long.
[root@v1ex1dbadm01 ~]#

The inactive SYS volume may give you errors as shown below:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys2
resize2fs 1.43-WIP (20-Jun-2013)
Please run 'e2fsck -f /dev/VGExaDb/LVDbSys2' first.
[root@v1ex1dbadm01 ~]#

In which case, you just need to run the command to check the file system for error that may have occurred with journal-ling after unclear shutdown:

[root@v1ex1dbadm01 ~]# e2fsck -f /dev/VGExaDb/LVDbSys2
e2fsck 1.43-WIP (20-Jun-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VGExaDb/LVDbSys2: 111629/1966080 files (0.1% non-contiguous), 5031185/7864320 blocks
[root@v1ex1dbadm01 ~]#

Now re-run the resize of the file system using resize2fs command:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys2
resize2fs 1.43-WIP (20-Jun-2013)
Resizing the filesystem on /dev/VGExaDb/LVDbSys2 to 26214400 (4k) blocks.
The filesystem on /dev/VGExaDb/LVDbSys2 is now 26214400 blocks long.
[root@v1ex1dbadm01 ~]#

You should now see ‘/’ with additional 70Gb less formatting:

[root@v1ex1dbadm01 ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       99G   22G   72G  24% /
[root@v1ex1dbadm01 ~]#

Also see the Logical Volumes are now 100Gb from 30Gb:

[root@v1ex1dbadm01 ~]# lvs -o lv_name,lv_path,vg_name,lv_size
  LV                 Path                            VG      LSize
  LVDbOra1           /dev/VGExaDb/LVDbOra1           VGExaDb 200.00g
  LVDbSwap1          /dev/VGExaDb/LVDbSwap1          VGExaDb  24.00g
  LVDbSys1           /dev/VGExaDb/LVDbSys1           VGExaDb 100.00g
  LVDbSys2           /dev/VGExaDb/LVDbSys2           VGExaDb 100.00g
  LVDoNotRemoveOrUse /dev/VGExaDb/LVDoNotRemoveOrUse VGExaDb   1.00g
[root@v1ex1dbadm01 ~]#

Documentation for reference:
Extending the root LVM Partition on Systems Running Oracle Exadata System Software Release 11.2.3.2.1 or Later

Related Post:
Extending a Non-root LVM Partition on Exadata

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Oracle Database Backup Service Fails with: ORA-19511: – KBHS-00715: HTTP error occurred ‘oracle-error’ – ORA-29024

I discovered an Oracle Cloud Database Backup failing with:

Starting backup at 2018/09/08 20:00:04
current log archived
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 09/08/2018 20:00:07
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27023: skgfqsbi: media manager protocol error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
   KBHS-00715: HTTP error occurred 'oracle-error'
KBHS-00712: ORA-29024 received from local HTTP service

Recovery Manager complete.

Upon investigation I found the following metalink note:
RMAN Backup to Oracle Database Backup Cloud Service fails with KBHS-00715 ORA-29024 (Doc ID 2360941.1)

The “ORA-29024” is raised due to incorrect certificate chain.  This issue was investigated in Bug 27402663, however no fix is needed.  The later library versions will by default include the trusted certificate workaround for the issue.

So the solution is to re-install the cloud backup module with the “-trustedCerts” option:

[oracle@V1LOEM ~]$ cd /u01/oracle_stage/cloud/
[oracle@V1LOEM cloud]$ . oraenv
ORACLE_SID = [oracle] ? EMREPOS
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 is /u01/app/oracle
[oracle@V1LOEM cloud]$ java -jar opc_install.jar -host https://em2.storage.oraclecloud.com/v1/Storage-aXXX -opcId 'oraclecloudbackup@version1.com' -opcPass 'xxx' -walletDir '/u01/oracle/opc_wallet' -libDir $ORACLE_HOME/lib -debug -trustedCerts
Oracle Database Cloud Backup Module Install Tool, build 2017-05-04
Debug: os.name        = Linux
Debug: os.arch        = amd64
Debug: os.version     = 3.8.13-98.1.2.el6uek.x86_64
Debug: file.separator = /
Debug: Platform = PLATFORM_LINUX64
Debug: OPC Account Verification: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><account name="Storage-aXXX"><container><name>oracle-data-storagea-1</name><count>944</count><bytes>52740693928</bytes><accountId><id>XXX</id></accountId><deleteTimestamp>0.0</deleteTimestamp><containerId><id>XXX</id></containerId></container></account>
Oracle Database Cloud Backup Module credentials are valid.
Debug: Certificate Success:
       Subject  : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
       Validity : Fri Nov 10 00:00:00 GMT 2006 - Mon Nov 10 00:00:00 GMT 2031
       Issuer   : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Oracle Database Cloud Backup Module wallet created in directory /u01/oracle/opc_wallet.
Oracle Database Cloud Backup Module initialization file /u01/app/oracle/product/12.1.0/dbhome_1/dbs/opcEMREPOS.ora created.
Downloading Oracle Database Cloud Backup Module Software Library from file opc_linux64.zip.
Debug: Temp zip file = /tmp/opc_linux648852138548086808899.zip
Debug: Downloaded 27342262 bytes in 13 seconds.
Debug: Transfer rate was 2103250 bytes/second.
Download complete.
Debug: Delete RC = true

Now test the latest Cloud Backup Module:

[oracle@V1LOEM cloud]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Sep 10 13:10:17 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: EMREPOS (DBID=XXX)

RMAN> delete obsolete recovery window of 8 days device type sbt;

using target database control file instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=794 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=785 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=407 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=1169 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=416 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=1167 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=782 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_8
channel ORA_SBT_TAPE_8: SID=1164 device type=SBT_TAPE
channel ORA_SBT_TAPE_8: Oracle Database Backup Service Library VER=12.2.0.2
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           18192  30-AUG-18
  Backup Piece       18192  30-AUG-18          pptbsjl2_1_1
...
  Backup Piece       18213  01-SEP-18          qutc1sae_1_1

Do you really want to delete the above objects (enter YES or NO)? no

RMAN> exit

Recovery Manager complete.
[oracle@V1LOEM cloud]$

Everything working again 🙂

Now you have the latest library version, by default you have the workaround and can now omit the -trustedCerts option:

[oracle@V1LOEM cloud]$ java -jar opc_install.jar -host https://em2.storage.oraclecloud.com/v1/Storage-aXXX -opcId 'oraclecloudbackup@version1.com' -opcPass 'xxx' -walletDir '/u01/oracle/opc_wallet' -libDir $ORACLE_HOME/lib -debug
Oracle Database Cloud Backup Module Install Tool, build 2017-05-04
Debug: os.name = Linux
Debug: os.arch = amd64
Debug: os.version = 3.8.13-98.1.2.el6uek.x86_64
Debug: file.separator = /
Debug: Platform = PLATFORM_LINUX64
Debug: OPC Account Verification: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><account name="Storage-aXXX"><container><name>oracle-data-storagea-1</name><count>944</count><bytes>52740693928</bytes><accountId><id>XXX</id></accountId><deleteTimestamp>0.0</deleteTimestamp><containerId><id>XXX</id></containerId></container></account>
Oracle Database Cloud Backup Module credentials are valid.
Debug: Certificate Success:
Subject : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Validity : Fri Nov 10 00:00:00 GMT 2006 - Mon Nov 10 00:00:00 GMT 2031
Issuer : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Oracle Database Cloud Backup Module wallet created in directory /u01/oracle/opc_wallet.
Oracle Database Cloud Backup Module initialization file /u01/app/oracle/product/12.1.0/dbhome_1/dbs/opcEMREPOS.ora created.
Downloading Oracle Database Cloud Backup Module Software Library from file opc_linux64.zip.
Debug: Temp zip file = /tmp/opc_linux647954567264150845107.zip
Debug: Downloaded 27342262 bytes in 15 seconds.
Debug: Transfer rate was 1822817 bytes/second.
Download complete.
Debug: Delete RC = true
[oracle@V1LOEM cloud]$

Now test again, to ensure still working:

[oracle@V1LOEM cloud]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Sep 10 13:12:45 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database: EMREPOS (DBID=XXX)

RMAN> delete obsolete recovery window of 8 days device type sbt;

using target database control file instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=794 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=785 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=407 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=1169 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=416 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=1167 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=782 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_8
channel ORA_SBT_TAPE_8: SID=1164 device type=SBT_TAPE
channel ORA_SBT_TAPE_8: Oracle Database Backup Service Library VER=12.2.0.2
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set 18192 30-AUG-18
Backup Piece 18192 30-AUG-18 pptbsjl2_1_1
...
Backup Piece 18213 01-SEP-18 qutc1sae_1_1

Do you really want to delete the above objects (enter YES or NO)? no

RMAN> exit

Recovery Manager complete.
[oracle@V1LOEM cloud]$

Related Posts

Oracle Database Backup Service Fails with: ORA-19511: – KBHS-00715: HTTP error occurred ‘oracle-error’ – ORA-28750

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Session using a database link hangs on “SQL*Net more data from dblink”

I have a client who recently move a database server from on-premise to a Cloud provider.  A database on this database server had a database link to their E-business database in the Oracle Cloud.  Since the move, any sessions in the database that use the database link to the E-business database would hang if the query was to return large dataset.

Below is selecting from dual over the database link that worked:

SQL> set timing on
SQL> set autotrace on
SQL> select * from dual@ebs;

D
-
X

Elapsed: 00:00:00.13

Execution Plan
----------------------------------------------------------
Plan hash value: 272002086

----------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |     1 |     2 |     2   (0)| 00:00:01 |        |
|   1 |  TABLE ACCESS FULL     | DUAL |     1 |     2 |     2   (0)| 00:00:01 | PWGPSI |
----------------------------------------------------------------------------------------

Note
-----
   - fully remote statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
        511  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

But selecting from a table hangs:

SQL> select * from XXX_PER_PEOPLE_F@ebs;

Hangs!......

The session in the source database shows the wait “SQL*Net more data from dblink”:

       SID USERNAME       STATUS   MACHINE                         TERMINAL                       PROGRAM                                          EVENT                                        WAIT_CLASS                   WAIT_TIME SECONDS_IN_WAIT STATE

---------- -------------- -------- ------------------------------- ------------------------------ ------------------------------------------------ -------------------------------------------- ---------------------------- --------- --------------- -------------------

       144 XXXXXXXXX      ACTIVE   XXX                             unknown                        sqlplus@xxxxx.xxxxx.xxx (TNS V1-V3)              SQL*Net more data from dblink                Network                      0         341             WAITING

The session in the target database shows the wait “SQL*Net message from client”:

SID USERNAME EVENT                       SQL_ID   MACHINE       PROGRAM                        PROCESS 
--- -------- --------------------------- -------- ------------- ------------------------------ ---------- 
35  XXX      SQL*Net message from client XXX      XXX.XXX.XXX   oracle@XXX.XXX.XXX (TNS V1-V3) 15340

I traced the source session and could see the session hangs waiting for more data from the target (EBS) database:

WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920328443464

*** 2018-07-17 11:13:47.934

WAIT #2: nam='SQL*Net message from client' ela= 11336000 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920339779586
WAIT #0: nam='single-task message' ela= 179653 p1=0 p2=0 p3=0 obj#=-1 tim=1495920339960679
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920339989423
WAIT #0: nam='SQL*Net message from dblink' ela= 29064 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018526
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018982
WAIT #0: nam='SQL*Net message from dblink' ela= 30534 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049542
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049793
WAIT #0: nam='SQL*Net message from dblink' ela= 42171 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340091996
WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=17 p3=0 obj#=-1 tim=1495920340092053
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340092253
WAIT #0: nam='SQL*Net message from dblink' ela= 28448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120720
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120854
WAIT #0: nam='SQL*Net message from dblink' ela= 30884 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151761
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151816
WAIT #0: nam='SQL*Net message from dblink' ela= 28590 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180420
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180491
WAIT #0: nam='SQL*Net message from client' ela= 207 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180713
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180787
WAIT #0: nam='SQL*Net message from dblink' ela= 28635 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209434
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209472
WAIT #0: nam='SQL*Net message from client' ela= 70 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209554
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209601
WAIT #0: nam='SQL*Net message from dblink' ela= 29357 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340238969
WAIT #0: nam='SQL*Net more data from dblink' ela= 6 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239133
WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239196
WAIT #0: nam='SQL*Net more data from dblink' ela= 124 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340239340
WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239409
WAIT #0: nam='SQL*Net more data from dblink' ela= 93 driver id=1413697536 #bytes=3 p3=0 obj#=-1 tim=1495920340239528
WAIT #0: nam='SQL*Net more data from dblink' ela= 79 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239632
WAIT #0: nam='SQL*Net more data from dblink' ela= 28330 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268062
WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268111
WAIT #0: nam='SQL*Net more data from dblink' ela= 168 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268297
WAIT #0: nam='SQL*Net more data from dblink' ela= 55 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268378
WAIT #0: nam='SQL*Net more data from dblink' ela= 69 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268483
WAIT #0: nam='SQL*Net more data from dblink' ela= 47 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268553
Hangs!...

After raising a SR with My Oracle Support (MOS) for the database we manage and for the one Oracle manage in the Oracle Cloud, it was concluded the package size (Session Data Unit) was larger then the allowable for a network component between the two databases.  They referred to MOS note:

Query on “bigtable” from remote Client hangs (certain queries or fields) (Doc ID 2104257.1)

“SYMPTOMS

Certain queries are hanging when run from some remote Clients.  However, other (smaller) queries are successful.

This is especially evident on queries that require more than 2kb of data to be returned.
Some examples:

select * from v$database; –> hangs
select count(*) from v$database; –> works

DESCRIBE with large data results –> hangs
DESCRIBE with small data results –> works

This might also show up with Database Links (DBLINK) as well.

CAUSE

A Network “security” device or setting (possibly local especially on a Microsoft Windows machine) is preventing or “altering” larger TCP packets from being transported across the network.
This in turn is causing the Client to wait on the Server for the data from the query, and the Server to wait on the Client (which thinks part of the packet is still on the way).

1. Check for settings like the DF (“Don’t Fragment”) bit being set.
2. Check for ALG SQL settings being enabled.

*Note: these causes are all external to Oracle so provided only as potential causes.

SOLUTION

Workaround
~~~~~~~~~
As a workaround (or test to prove this is or is not the issue) lower the SQL*Net SDU from the default size of 8192 to 1400 (see reference below for more details on this setting):

1. Add the following single line to the sqlnet.ora file on BOTH ends of the communication:
DEFAULT_SDU_SIZE = 1400

2. Restart the Listener(s) servicing the Database in question, make a new connection from the Client, and test the query that was hanging.

3. If this corrects the issue and allows queries to complete, then there is a network / system device or setting causing fragmentation, detention, or alteration of SQL packets mid-stream.”

Oracle Support set the SDU in the sqlnet.ora file on the target database server:
DEFAULT_SDU_SIZE = 1400

More info on DEFAULT_SDU_SIZE can be found here:

Database Net Services Reference -> 5 Parameters for the sqlnet.ora File -> DEFAULT_SDU_SIZE

And then restarted the listener.  I did the same for the source database server and then the session no longer hanged 🙂 :

SQL> select * from XXX_PER_PEOPLE_F@ebs;
...

1069 rows selected.

Elapsed: 00:00:02.57

Update

28/08/2018 The issue wasn’t resolved and the MTU of the network cards on both the source and target servers was also changed to 1400.  It could possibly been resolved by reducing the SDU further but it was decided to change the MTU on network cards.  In most cases the SDU change will fix the issue, but otherwise the MTU on network card can also resolve the issue as in this case.

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Stop password for user accounts expiring on Exadata

Depending on how the Oracle Exadata Machine was setup, the password for user accounts can expire thus requiring the password to be changed.

This has the knock on effect of the crontab not being accessible and more importantly jobs do not run:

[oracle@v1ex1dbadm01 ~]$ crontab -l

Authentication token is no longer valid; new one required
You (oracle) are not allowed to access to (crontab) because of pam configuration.
[oracle@v1ex1dbadm01 ~]$

You can check the pam configuration for the password expiry as shown below as the root user:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : Dec 11, 2017
Password expires : Mar 11, 2018
Password inactive : never
Account expires : never
Minimum number of days between password change : 1
Maximum number of days between password change : 90
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

We can see the password expired on the 11th March 2018, hence why the crontab jobs are not running since then.

To change, so the password doesn’t expire, use chage as shown below:

[root@v1ex1dbadm01 ~]# chage -d 9999 -E -1 -m 0 -M -1 oracle

The manual page for chage explains the switches:

-d, --lastday LAST_DAY
 Set the number of days since January 1st, 1970 when the password was last changed. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). If the LAST_DAY is set to 0 the user is forced to change his password on the next log on.

-E, --expiredate EXPIRE_DATE
 Set the date or number of days since January 1, 1970 on which the user´s account will no longer be accessible. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). A user whose account is locked must contact the system administrator before being able to use the system again.

Passing the number -1 as the EXPIRE_DATE will remove an account expiration date.

-m, --mindays MIN_DAYS
 Set the minimum number of days between password changes to MIN_DAYS. A value of zero for this field indicates that the user may change his/her password at any time.

-M, --maxdays MAX_DAYS
 Set the maximum number of days during which a password is valid. When MAX_DAYS plus LAST_DAY is less than the current day, the user will be required to change his/her password before being able to use his/her account. This occurrence can be planned for in advance by use of the -W option, which provides the user with advance warning.

Passing the number -1 as MAX_DAYS will remove checking a password´s validity.

Now, when re-checking the password expiry, you can see it’s changed to ‘never‘:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : May 01, 2008
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : -1
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

And we didn’t need to change the password for the user and the crontab job work again 🙂

This doesn’t just apply to Exadata but to Linux.

See Related MOS Note:
Expiry of user accounts on Oracle Linux 5 (Doc ID 2327855.1)

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to use the Oracle Exadata Diagnostics Collection Tool (sundiag.sh)

What is the Oracle Exadata Diagnostics Collection Tool sundiag.sh

Very often when creating a Support Request (SR) for an issue on an Oracle Exadata Database Machine, you’ll need to run the script “sundiag.sh“.  Which is the “Oracle Exadata Database Machine – Diagnostics Collection Tool“.

The tool collects a lot of diagnostics information that assist the support analyst in diagnosing your problem, such as failed hardware like a failed disk, etc.

More information can be found on My Oracle Support (MOS) Note:
SRDC – EEST Sundiag (Doc ID 1683842.1)
Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues (Doc ID 761868.1)

How to run the Diagnostics Collection Tool

To run “sundiag.sh“, is very simple as shown below:

[root@v1ex1dbadm01 ~]# /opt/oracle.SupportTools/sundiag.sh
Oracle Exadata Database Machine - Diagnostics Collection Tool
Gathering Linux information
Skipping collection of OSWatcher/ExaWatcher logs, Cell Metrics and Traces
Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.
/var/log/exadatatmp/sundiag_v1ex1dbadm01_xxxxxxxxxx_2018_01_17_13_49
Gathering dbms information
Generating diagnostics tarball and removing temp directory
==============================================================================
Done. The report files are bzip2 compressed in /var/log/exadatatmp/sundiag_v1ex1dbadm01_xxxxxxxxxx_2018_01_17_13_49.tar.bz2
==============================================================================
[root@v1ex1dbadm01 ~]#

For more advanced collections, use the option switches to override default behaviour as shown in the help:

[root@v1ex1dbadm01 ~]# /opt/oracle.SupportTools/sundiag.sh -h

Oracle Exadata Database Machine - Diagnostics Collection Tool

Version: 12.1.2.3.3.161109

By default sundiag will collect OSWatcher/ExaWatcher, Cell Metrics and traces,
if there was an alert in the last 7 days. If there is more than one alert, latest
alert is chosen to set the time range for data collection.
Time range is 8hrs prior to and 1hr after the latest alert, for the total of 9 hrs
e.g: latest alert timestamp = 2014-03-29T01:20:04-05:00
 echo Time range = 2014-03-28_16:00:00 and 2014-03-29_01:00:00
User can also specify time ranges (as explained in usage below), which takes
precedence over default behavior of checking for alerts

Usage: /opt/oracle.SupportTools/sundiag.sh [ilom | snapshot] [osw <time ranges>]
 osw - This argument when used expects value of one or more comma separated
 time ranges. OSWatcher/ExaWatcher, cell metrics and traces will be gathered
 in those time ranges.
 The format for time range(s) is <from>-<to>,<from>-<to> and so on without spaces
 where <from> and <to> format is <date>_<time>
 <date> and <time> format should be any valid format that can be recognized by
 'date' command. The command 'date -d <date>' or 'date -d <time>' should be valid
 e.g: /opt/oracle.SupportTools/sundiag.sh osw 2014/03/31_15:00:00-2014/03/31_18:00:00
 Note: Total time range should not exceed 9 hrs. Only the time ranges that
 fall within this limit are considered for the collection of above data
 ilom - User level ILOM data gathering option via ipmitool, in place of
 separately using root login to get ILOM snapshot over the network.
 snapshot - Collects node ILOM snapshot- requires host root password for ILOM
 to send snapshot data over the network.
[root@v1ex1dbadm01 ~]#

Then just upload the bzip2 file to your SR on MOS.

I tend to run this as part of my SR creation and upload to save time.

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Install Oracle’s VirtualBox

If you don’t have the luxury of having a server readily available but you want to do some research and development or training on Oracle, then Oracle’s VirtualBox is a perfect solution.

Prerequisites

To be able to use VirtualBox, you need to disable Hyper-V as it blocks all other Hyper Visors from calling VT hardware.  See my post blog on how to do this:
Disabling Microsoft’s Hyper-V to use Oracle’s VirtualBox

Download

You can download the latest VirtualBox from:
https://www.virtualbox.org/
https://www.virtualbox.org/wiki/Downloads

At the time of writing this blog, the latest is:
http://download.virtualbox.org/virtualbox/5.1.28/VirtualBox-5.1.28-117968-Win.exe

Install VirtualBox

To install, launch the VirtualBox executable i.e. VirtualBox-5.1.28-117968-Win.exe with a user with admin rights:

VirtualBox_Install_Step1

Click ‘Next‘.

VirtualBox_Install_Step2

Click ‘Next‘.

VirtualBox_Install_Step3

Change options as you wish, otherwise click ‘Next‘.

VirtualBox_Install_Step4

Accept the warning and click ‘Yes‘.

VirtualBox_Install_Step5

Click ‘Install‘.

VirtualBox_Install_Step6

You will see the progress, wait until you see the following:

VirtualBox_Install_Step7

Click ‘Install‘.

VirtualBox_Install_Step8

Once finished, click ‘Finish‘ and VirtualBox will load:

VirtualBox_Install_Step9

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)