Prevent crontab jobs overlapping using flock

As an Oracle DBA, you may find yourself in the situation where you have crontab jobs overlapping.

For example an RMAN backup takes longer then normal, then overlaps with another RMAN database backup leading to more resources being consumed:

00 00 * * * /home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1
00 01 * * * /home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1

If the PROD1 backup takes longer then 1 hour, then it will contend with the PROD2 backup when it starts.

Another more recent example for myself, is when I was copying archive logs to AWS for an Oracle Database Standard Edition migration using a manual standby, thus needed to manually transfer archive logs using rsync every 15 minutes:

0,15,30,45 * * * * /home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log

Ran fine most the time but when there was significant churn in the database, the crontab job would overlap causing several rsync 😦

The easiest solution is to wrap the crontab job in flock 🙂

Using flock

flock is a linux utility that can uses a lock file to determine if the process is already running.  The syntax I use is:

flock -x <lockfile> -c '<command>'

The “-x” is to obtain exclude lock and the “-c” is the command to run.

Flock examples

Backup example

00 00 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1'
00 01 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1'

Now when the backup for PROD2 starts flock will check for the lock and will see if exist and will not run the command until the backup for PROD1 is completed 🙂

Archive log copy example

0,15,30,45 * * * * flock -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now when the job runs, an exclusive lock is taken an hence when it runs again in 15 minutes if there an existing run, then it will not run the command until the previous one is completed 🙂  This will essentially queue the copies instead of them overlapping causing several rsync, which just exacerbate the issue.

Advance use of flock

Timeout

You can add “-w <seconds>“, which is the amount of time to wait for exclusive lock before exiting without running command. for example:

0,15,30,45 * * * * flock -w 300 -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now flock will wait 5 minutes for the previous archive log copy job to complete before exiting without running the command for that run 🙂

Viewing lock

If you want to see the lock taken by flock, you can run :

[oracle@dc1sbxdb001 ~]$ fuser -v /home/oracle/copy_Arch_to_AWS.lock
                                    USER   PID    ACCESS COMMAND
/home/oracle/copy_Arch_to_AWS.lock: oracle 341039 f....  flock
                                    oracle 341040 f....  rsync

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to create an sosreport on Oracle Linux

When creating a SR for an issues on Oracle Linux, for example in an Exadata environment, you are quite often enough asked to run an sosreport.

What is sosreport?

“The “sosreport” is a tool to collect troubleshooting data on an Oracle Linux system. It generates a compressed tarball of debugging information that gives an overview of the most important logs and configuration of a Linux system, to be sent to Oracle Support.

Among other things, the sosreport includes information about the installed rpm versions, syslog, network configuration, mounted filesystems, disk partition details, loaded kernel modules and status of all services

It has a plugin-based architecture that enables features to be enabled or disabled, and additional functionality added.”

How To Collect Sosreport on Oracle Linux (Doc ID 1500235.1)

Why support needs sosreport?

“The sosreport collects system information from an Oracle Linux system by capturing various log files, configuration files and command outputs that helps in diagnosing a problem faster.

Since this collects most of the commonly sort information while troubleshooting problems, collecting a sosreport helps in reducing the number of iterations of data request from the customer.

The logs, configuration files and related command outputs provides a better picture about the system environment and thus it is very helpful for cases about Root cause analysis and on going issues.

The sosreport helps the support to identify configuration errors and make proactive recommendations too.”

How To Collect Sosreport on Oracle Linux (Doc ID 1500235.1)

How to use

To use, simple run “sosreport”:

[root@v1ex1dbadm01 ~]# sosreport

sosreport (version 3.2)

This command will collect diagnostic and configuration information from
this Oracle Linux system and installed applications.

An archive containing the collected information will be generated in
/tmp/sos.9gvK0N and may be provided to a Oracle USA support
representative.

Any information provided to Oracle USA will be treated in accordance
with the published support policies at:

http://linux.oracle.com/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [v1ex1dbadm01.v1.com]: Z Anwar
Please enter the case id that you are generating this report for []: 3-XXXXXXX1234

Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

Running 70/70: xfs...
Creating compressed archive...

Your sosreport has been generated and saved in:
/tmp/sosreport-ZAnwar.3-XXXXXXX1234-20181004103417.tar.xz

The checksum is: 04d1a2b728216ba79df6cc38f801de6d

Please send this file to your support representative.

[root@v1ex1dbadm01 ~]#

You will then have a tar file at the end, which you can upload to your SR for your support engineer to analysis.

If you don’t have sosreport installed, then install the sos package:

[root@v1ex1dbadm01 ~]# yum install sos

References

More info, can be found in the following MOS note:
  • How To Collect Sosreport on Oracle Linux (Doc ID 1500235.1)
  • SRDC – How To Collect Sosreport on Oracle Linux and Oracle VM (Doc ID 1928183.1)

Related Posts

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Extending the root LVM Partition on Exadata

On an Oracle Exadata Database Machine, the ‘/’ (root) is defaulted to a size of 30Gb, which can easily fill up.  Luckily this is just a Logical Volume and there’s normally lots of space available on the Logical Volume Group which is usually untapped.

Extending ‘/’

Identify how much space is used and free on ‘/’ using df:

[root@v1ex1dbadm01 ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G   22G  6.2G  79% /
[root@v1ex1dbadm01 ~]#

Display the current logical volume configuration using the lvs command:

[root@v1ex1dbadm01 ~]# lvs -o lv_name,lv_path,vg_name,lv_size
  LV                 Path                            VG      LSize
  LVDbOra1           /dev/VGExaDb/LVDbOra1           VGExaDb 200.00g
  LVDbSwap1          /dev/VGExaDb/LVDbSwap1          VGExaDb  24.00g
  LVDbSys1           /dev/VGExaDb/LVDbSys1           VGExaDb  30.00g
  LVDbSys2           /dev/VGExaDb/LVDbSys2           VGExaDb  30.00g
  LVDoNotRemoveOrUse /dev/VGExaDb/LVDoNotRemoveOrUse VGExaDb   1.00g
[root@v1ex1dbadm01 ~]#

PLEASE NOTE: On Exadata there are 2 SYS volumes, of which one is active and the other inactive.  These are used when patching the compute node, as one is a backup of the current and is used for rollback purposes.

Check the online resize option is available using the tune2fs command:

[root@v1ex1dbadm01 ~]# tune2fs -l /dev/VGExaDb/LVDbSys1 | grep resize_inode
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
[root@v1ex1dbadm01 ~]# tune2fs -l /dev/VGExaDb/LVDbSys2 | grep resize_inode
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
[root@v1ex1dbadm01 ~]#

If not available then the file system needs to be unmounted before resizing.  Refer to documentation:

Extending the root LVM Partition on Systems Running Oracle Exadata System Software Earlier than Release 11.2.3.2.1

Verify there’s space available in the Logical Volume Group using vgdisplay command:

[root@v1ex1dbadm01 ~]# vgdisplay -s
  "VGExaDb" 1.63 TiB  [285.00 GiB used / 1.36 TiB free]
[root@v1ex1dbadm01 ~]#

Finally if there’s enough space, then extend the Logical Volumes using lvextend command:

[root@v1ex1dbadm01 ~]# lvextend -L +70G /dev/VGExaDb/LVDbSys1
  Size of logical volume VGExaDb/LVDbSys1 changed from 30.00 GiB (7680 extents) to 100.00 GiB (25600 extents).
  Logical volume LVDbSys1 successfully resized.
[root@v1ex1dbadm01 ~]#
[root@v1ex1dbadm01 ~]# lvextend -L +70G /dev/VGExaDb/LVDbSys2
  Size of logical volume VGExaDb/LVDbSys2 changed from 30.00 GiB (7680 extents) to 100.00 GiB (25600 extents).
  Logical volume LVDbSys2 successfully resized.
[root@v1ex1dbadm01 ~]#

Followed by a resize of the file system using resize2fs command:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys1
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/VGExaDb/LVDbSys1 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 7
Performing an on-line resize of /dev/VGExaDb/LVDbSys1 to 26214400 (4k) blocks.
The filesystem on /dev/VGExaDb/LVDbSys1 is now 26214400 blocks long.
[root@v1ex1dbadm01 ~]#

The inactive SYS volume may give you errors as shown below:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys2
resize2fs 1.43-WIP (20-Jun-2013)
Please run 'e2fsck -f /dev/VGExaDb/LVDbSys2' first.
[root@v1ex1dbadm01 ~]#

In which case, you just need to run the command to check the file system for error that may have occurred with journal-ling after unclear shutdown:

[root@v1ex1dbadm01 ~]# e2fsck -f /dev/VGExaDb/LVDbSys2
e2fsck 1.43-WIP (20-Jun-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VGExaDb/LVDbSys2: 111629/1966080 files (0.1% non-contiguous), 5031185/7864320 blocks
[root@v1ex1dbadm01 ~]#

Now re-run the resize of the file system using resize2fs command:

[root@v1ex1dbadm01 ~]# resize2fs /dev/VGExaDb/LVDbSys2
resize2fs 1.43-WIP (20-Jun-2013)
Resizing the filesystem on /dev/VGExaDb/LVDbSys2 to 26214400 (4k) blocks.
The filesystem on /dev/VGExaDb/LVDbSys2 is now 26214400 blocks long.
[root@v1ex1dbadm01 ~]#

You should now see ‘/’ with additional 70Gb less formatting:

[root@v1ex1dbadm01 ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       99G   22G   72G  24% /
[root@v1ex1dbadm01 ~]#

Also see the Logical Volumes are now 100Gb from 30Gb:

[root@v1ex1dbadm01 ~]# lvs -o lv_name,lv_path,vg_name,lv_size
  LV                 Path                            VG      LSize
  LVDbOra1           /dev/VGExaDb/LVDbOra1           VGExaDb 200.00g
  LVDbSwap1          /dev/VGExaDb/LVDbSwap1          VGExaDb  24.00g
  LVDbSys1           /dev/VGExaDb/LVDbSys1           VGExaDb 100.00g
  LVDbSys2           /dev/VGExaDb/LVDbSys2           VGExaDb 100.00g
  LVDoNotRemoveOrUse /dev/VGExaDb/LVDoNotRemoveOrUse VGExaDb   1.00g
[root@v1ex1dbadm01 ~]#

Documentation for reference:
Extending the root LVM Partition on Systems Running Oracle Exadata System Software Release 11.2.3.2.1 or Later

Related Post:
Extending a Non-root LVM Partition on Exadata

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Oracle Database Backup Service Fails with: ORA-19511: – KBHS-00715: HTTP error occurred ‘oracle-error’ – ORA-29024

I discovered an Oracle Cloud Database Backup failing with:

Starting backup at 2018/09/08 20:00:04
current log archived
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 09/08/2018 20:00:07
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27023: skgfqsbi: media manager protocol error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
   KBHS-00715: HTTP error occurred 'oracle-error'
KBHS-00712: ORA-29024 received from local HTTP service

Recovery Manager complete.

Upon investigation I found the following metalink note:
RMAN Backup to Oracle Database Backup Cloud Service fails with KBHS-00715 ORA-29024 (Doc ID 2360941.1)

The “ORA-29024” is raised due to incorrect certificate chain.  This issue was investigated in Bug 27402663, however no fix is needed.  The later library versions will by default include the trusted certificate workaround for the issue.

So the solution is to re-install the cloud backup module with the “-trustedCerts” option:

[oracle@V1LOEM ~]$ cd /u01/oracle_stage/cloud/
[oracle@V1LOEM cloud]$ . oraenv
ORACLE_SID = [oracle] ? EMREPOS
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 is /u01/app/oracle
[oracle@V1LOEM cloud]$ java -jar opc_install.jar -host https://em2.storage.oraclecloud.com/v1/Storage-aXXX -opcId 'oraclecloudbackup@version1.com' -opcPass 'xxx' -walletDir '/u01/oracle/opc_wallet' -libDir $ORACLE_HOME/lib -debug -trustedCerts
Oracle Database Cloud Backup Module Install Tool, build 2017-05-04
Debug: os.name        = Linux
Debug: os.arch        = amd64
Debug: os.version     = 3.8.13-98.1.2.el6uek.x86_64
Debug: file.separator = /
Debug: Platform = PLATFORM_LINUX64
Debug: OPC Account Verification: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><account name="Storage-aXXX"><container><name>oracle-data-storagea-1</name><count>944</count><bytes>52740693928</bytes><accountId><id>XXX</id></accountId><deleteTimestamp>0.0</deleteTimestamp><containerId><id>XXX</id></containerId></container></account>
Oracle Database Cloud Backup Module credentials are valid.
Debug: Certificate Success:
       Subject  : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
       Validity : Fri Nov 10 00:00:00 GMT 2006 - Mon Nov 10 00:00:00 GMT 2031
       Issuer   : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Oracle Database Cloud Backup Module wallet created in directory /u01/oracle/opc_wallet.
Oracle Database Cloud Backup Module initialization file /u01/app/oracle/product/12.1.0/dbhome_1/dbs/opcEMREPOS.ora created.
Downloading Oracle Database Cloud Backup Module Software Library from file opc_linux64.zip.
Debug: Temp zip file = /tmp/opc_linux648852138548086808899.zip
Debug: Downloaded 27342262 bytes in 13 seconds.
Debug: Transfer rate was 2103250 bytes/second.
Download complete.
Debug: Delete RC = true

Now test the latest Cloud Backup Module:

[oracle@V1LOEM cloud]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Sep 10 13:10:17 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: EMREPOS (DBID=XXX)

RMAN> delete obsolete recovery window of 8 days device type sbt;

using target database control file instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=794 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=785 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=407 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=1169 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=416 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=1167 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=782 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_8
channel ORA_SBT_TAPE_8: SID=1164 device type=SBT_TAPE
channel ORA_SBT_TAPE_8: Oracle Database Backup Service Library VER=12.2.0.2
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set           18192  30-AUG-18
  Backup Piece       18192  30-AUG-18          pptbsjl2_1_1
...
  Backup Piece       18213  01-SEP-18          qutc1sae_1_1

Do you really want to delete the above objects (enter YES or NO)? no

RMAN> exit

Recovery Manager complete.
[oracle@V1LOEM cloud]$

Everything working again 🙂

Now you have the latest library version, by default you have the workaround and can now omit the -trustedCerts option:

[oracle@V1LOEM cloud]$ java -jar opc_install.jar -host https://em2.storage.oraclecloud.com/v1/Storage-aXXX -opcId 'oraclecloudbackup@version1.com' -opcPass 'xxx' -walletDir '/u01/oracle/opc_wallet' -libDir $ORACLE_HOME/lib -debug
Oracle Database Cloud Backup Module Install Tool, build 2017-05-04
Debug: os.name = Linux
Debug: os.arch = amd64
Debug: os.version = 3.8.13-98.1.2.el6uek.x86_64
Debug: file.separator = /
Debug: Platform = PLATFORM_LINUX64
Debug: OPC Account Verification: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><account name="Storage-aXXX"><container><name>oracle-data-storagea-1</name><count>944</count><bytes>52740693928</bytes><accountId><id>XXX</id></accountId><deleteTimestamp>0.0</deleteTimestamp><containerId><id>XXX</id></containerId></container></account>
Oracle Database Cloud Backup Module credentials are valid.
Debug: Certificate Success:
Subject : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Validity : Fri Nov 10 00:00:00 GMT 2006 - Mon Nov 10 00:00:00 GMT 2031
Issuer : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Oracle Database Cloud Backup Module wallet created in directory /u01/oracle/opc_wallet.
Oracle Database Cloud Backup Module initialization file /u01/app/oracle/product/12.1.0/dbhome_1/dbs/opcEMREPOS.ora created.
Downloading Oracle Database Cloud Backup Module Software Library from file opc_linux64.zip.
Debug: Temp zip file = /tmp/opc_linux647954567264150845107.zip
Debug: Downloaded 27342262 bytes in 15 seconds.
Debug: Transfer rate was 1822817 bytes/second.
Download complete.
Debug: Delete RC = true
[oracle@V1LOEM cloud]$

Now test again, to ensure still working:

[oracle@V1LOEM cloud]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Sep 10 13:12:45 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database: EMREPOS (DBID=XXX)

RMAN> delete obsolete recovery window of 8 days device type sbt;

using target database control file instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=794 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=785 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=407 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=1169 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=416 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=1167 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=782 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: Oracle Database Backup Service Library VER=12.2.0.2
allocated channel: ORA_SBT_TAPE_8
channel ORA_SBT_TAPE_8: SID=1164 device type=SBT_TAPE
channel ORA_SBT_TAPE_8: Oracle Database Backup Service Library VER=12.2.0.2
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set 18192 30-AUG-18
Backup Piece 18192 30-AUG-18 pptbsjl2_1_1
...
Backup Piece 18213 01-SEP-18 qutc1sae_1_1

Do you really want to delete the above objects (enter YES or NO)? no

RMAN> exit

Recovery Manager complete.
[oracle@V1LOEM cloud]$

Related Posts

Oracle Database Backup Service Fails with: ORA-19511: – KBHS-00715: HTTP error occurred ‘oracle-error’ – ORA-28750

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Stop password for user accounts expiring on Exadata

Depending on how the Oracle Exadata Machine was setup, the password for user accounts can expire thus requiring the password to be changed.

This has the knock on effect of the crontab not being accessible and more importantly jobs do not run:

[oracle@v1ex1dbadm01 ~]$ crontab -l

Authentication token is no longer valid; new one required
You (oracle) are not allowed to access to (crontab) because of pam configuration.
[oracle@v1ex1dbadm01 ~]$

You can check the pam configuration for the password expiry as shown below as the root user:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : Dec 11, 2017
Password expires : Mar 11, 2018
Password inactive : never
Account expires : never
Minimum number of days between password change : 1
Maximum number of days between password change : 90
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

We can see the password expired on the 11th March 2018, hence why the crontab jobs are not running since then.

To change, so the password doesn’t expire, use chage as shown below:

[root@v1ex1dbadm01 ~]# chage -d 9999 -E -1 -m 0 -M -1 oracle

The manual page for chage explains the switches:

-d, --lastday LAST_DAY
 Set the number of days since January 1st, 1970 when the password was last changed. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). If the LAST_DAY is set to 0 the user is forced to change his password on the next log on.

-E, --expiredate EXPIRE_DATE
 Set the date or number of days since January 1, 1970 on which the user´s account will no longer be accessible. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). A user whose account is locked must contact the system administrator before being able to use the system again.

Passing the number -1 as the EXPIRE_DATE will remove an account expiration date.

-m, --mindays MIN_DAYS
 Set the minimum number of days between password changes to MIN_DAYS. A value of zero for this field indicates that the user may change his/her password at any time.

-M, --maxdays MAX_DAYS
 Set the maximum number of days during which a password is valid. When MAX_DAYS plus LAST_DAY is less than the current day, the user will be required to change his/her password before being able to use his/her account. This occurrence can be planned for in advance by use of the -W option, which provides the user with advance warning.

Passing the number -1 as MAX_DAYS will remove checking a password´s validity.

Now, when re-checking the password expiry, you can see it’s changed to ‘never‘:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : May 01, 2008
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : -1
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

And we didn’t need to change the password for the user and the crontab job work again 🙂

This doesn’t just apply to Exadata but to Linux.

See Related MOS Note:
Expiry of user accounts on Oracle Linux 5 (Doc ID 2327855.1)

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to use the Oracle Exadata Diagnostics Collection Tool (sundiag.sh)

What is the Oracle Exadata Diagnostics Collection Tool sundiag.sh

Very often when creating a Support Request (SR) for an issue on an Oracle Exadata Database Machine, you’ll need to run the script “sundiag.sh“.  Which is the “Oracle Exadata Database Machine – Diagnostics Collection Tool“.

The tool collects a lot of diagnostics information that assist the support analyst in diagnosing your problem, such as failed hardware like a failed disk, etc.

More information can be found on My Oracle Support (MOS) Note:
SRDC – EEST Sundiag (Doc ID 1683842.1)
Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues (Doc ID 761868.1)

How to run the Diagnostics Collection Tool

To run “sundiag.sh“, is very simple as shown below:

[root@v1ex1dbadm01 ~]# /opt/oracle.SupportTools/sundiag.sh
Oracle Exadata Database Machine - Diagnostics Collection Tool
Gathering Linux information
Skipping collection of OSWatcher/ExaWatcher logs, Cell Metrics and Traces
Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.
/var/log/exadatatmp/sundiag_v1ex1dbadm01_xxxxxxxxxx_2018_01_17_13_49
Gathering dbms information
Generating diagnostics tarball and removing temp directory
==============================================================================
Done. The report files are bzip2 compressed in /var/log/exadatatmp/sundiag_v1ex1dbadm01_xxxxxxxxxx_2018_01_17_13_49.tar.bz2
==============================================================================
[root@v1ex1dbadm01 ~]#

For more advanced collections, use the option switches to override default behaviour as shown in the help:

[root@v1ex1dbadm01 ~]# /opt/oracle.SupportTools/sundiag.sh -h

Oracle Exadata Database Machine - Diagnostics Collection Tool

Version: 12.1.2.3.3.161109

By default sundiag will collect OSWatcher/ExaWatcher, Cell Metrics and traces,
if there was an alert in the last 7 days. If there is more than one alert, latest
alert is chosen to set the time range for data collection.
Time range is 8hrs prior to and 1hr after the latest alert, for the total of 9 hrs
e.g: latest alert timestamp = 2014-03-29T01:20:04-05:00
 echo Time range = 2014-03-28_16:00:00 and 2014-03-29_01:00:00
User can also specify time ranges (as explained in usage below), which takes
precedence over default behavior of checking for alerts

Usage: /opt/oracle.SupportTools/sundiag.sh [ilom | snapshot] [osw <time ranges>]
 osw - This argument when used expects value of one or more comma separated
 time ranges. OSWatcher/ExaWatcher, cell metrics and traces will be gathered
 in those time ranges.
 The format for time range(s) is <from>-<to>,<from>-<to> and so on without spaces
 where <from> and <to> format is <date>_<time>
 <date> and <time> format should be any valid format that can be recognized by
 'date' command. The command 'date -d <date>' or 'date -d <time>' should be valid
 e.g: /opt/oracle.SupportTools/sundiag.sh osw 2014/03/31_15:00:00-2014/03/31_18:00:00
 Note: Total time range should not exceed 9 hrs. Only the time ranges that
 fall within this limit are considered for the collection of above data
 ilom - User level ILOM data gathering option via ipmitool, in place of
 separately using root login to get ILOM snapshot over the network.
 snapshot - Collects node ILOM snapshot- requires host root password for ILOM
 to send snapshot data over the network.
[root@v1ex1dbadm01 ~]#

Then just upload the bzip2 file to your SR on MOS.

I tend to run this as part of my SR creation and upload to save time.

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

How to update OPatch

When applying patches, such as PSUs or one-offs, you may need to update OPatch to meet the minimum OPatch version.  It is also recommended to update OPatch when applying any patch.

To see your current OPatch version:

[oracle@v1ex1dbadm01 ~]$ export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
[oracle@v1ex1dbadm01 ~]$ $ORACLE_HOME/OPatch/opatch version
OPatch Version: 12.1.0.1.3

OPatch succeeded.
[oracle@v1ex1dbadm01 ~]$

Backup existing OPatch:

[oracle@v1ex1dbadm01 ~]$ cd $ORACLE_HOME
[oracle@v1ex1dbadm01 dbhome_1]$ tar -cvf OPatch_backup.tar OPatch/*
OPatch/datapatch
OPatch/datapatch.bat
OPatch/docs/
...
OPatch/oplan/README.txt
OPatch/oplan/README.html
OPatch/oplan/oplan
[oracle@v1ex1dbadm01 dbhome_1]$

Check the backup of OPatch:

[oracle@v1ex1dbadm01 dbhome_1]$ ls -lh | grep OPatch_backup.tar
-rw-rw-r--. 1 oracle oracle 6.7M Aug 29 11:51 OPatch_backup.tar
[oracle@v1ex1dbadm01 dbhome_1]$

Remove the existing OPatch:

[oracle@v1ex1dbadm01 dbhome_1]$ rm -rf OPatch

Unzip the latest OPatch:

[oracle@v1ex1dbadm01 dbhome_1]$ unzip -d $ORACLE_HOME ~/sw/p6880880_122010_Linux-x86-64.zip
Archive: /home/oracle/sw/p6880880_122010_Linux-x86-64.zip
creating: /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/
inflating: /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/datapatch
...
inflating: /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/docs/cversion.txt
inflating: /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/docs/FAQ
inflating: /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/opatch.bat
[oracle@v1ex1dbadm01 dbhome_1]$

Which can be found here:
OPatch – Where Can I Find the Latest Version of OPatch(6880880)? [Video] (Doc ID 224346.1)
OPATCH PLACEHOLDER Patch 6880880

To see your newOPatch version:

[oracle@v1ex1dbadm01 dbhome_1]$ $ORACLE_HOME/OPatch/opatch version
OPatch Version: 12.2.0.1.9

OPatch succeeded.
[oracle@v1ex1dbadm01 dbhome_1]$

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)