Prevent crontab jobs overlapping using flock

As an Oracle DBA, you may find yourself in the situation where you have crontab jobs overlapping.

For example, an RMAN backup takes longer than normal and overlaps with another RMAN database backup, so both compete for resources:

00 00 * * * /home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1
00 01 * * * /home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1

If the PROD1 backup takes longer than 1 hour, it will contend with the PROD2 backup when that starts.

A more recent example of my own: I was copying archive logs to AWS for an Oracle Database Standard Edition migration using a manual standby, so I needed to transfer the archive logs myself using rsync every 15 minutes:

0,15,30,45 * * * * /home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log

This ran fine most of the time, but when there was significant churn in the database, the crontab job would overlap with itself, leaving several rsync processes running at once 😦

The easiest solution is to wrap the crontab job in flock 🙂

Using flock

flock is a Linux utility that uses a lock file to determine whether the process is already running. The syntax I use is:

flock -x <lockfile> -c '<command>'

The “-x” obtains an exclusive lock and the “-c” specifies the command to run.
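
To see this behaviour outside of cron, here is a minimal sketch that runs two commands against the same lock file. The temporary lock and log files and the `sleep` stand-in for a long job are assumptions made for the demo, not part of my real jobs:

```shell
#!/bin/bash
# Demo only: throwaway files, not real backup paths.
lockfile=$(mktemp)
log=$(mktemp)
export log

# First job: takes the exclusive lock, "works" for 1 second, then logs.
flock -x "$lockfile" -c 'sleep 1; echo first >> "$log"' &
sleep 0.5   # give the first job time to take the lock

# Second job: same lock file, so flock waits here until the first job is done.
flock -x "$lockfile" -c 'echo second >> "$log"'

wait
order=$(paste -sd' ' "$log")
echo "order of completion: $order"
rm -f "$lockfile" "$log"
```

The second command is started while the first still holds the lock, yet it is only logged after the first has finished.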

Flock examples

Backup example

00 00 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1'
00 01 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1'

Now when the backup for PROD2 starts, flock checks the lock file; if the lock is still held, it will not run the command until the backup for PROD1 has completed 🙂

Archive log copy example

0,15,30,45 * * * * flock -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now when the job runs, it takes an exclusive lock. When it runs again 15 minutes later and the previous run is still going, flock will not run the command until the previous one has completed 🙂 This essentially queues the copies instead of letting them overlap and start several rsync processes, which just exacerbates the issue.
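
If you would rather skip a run than queue it, flock also has a “-n” (non-blocking) option, which I don't use in the crontab entries above. A rough sketch of the difference, again with a temporary lock file and `sleep` standing in for a copy that is still running:

```shell
#!/bin/bash
lockfile=$(mktemp)   # throwaway lock file for the demo

# Stand-in for a copy that is still running and holding the lock.
flock -x "$lockfile" -c 'sleep 2' &
sleep 0.5            # give it time to take the lock

# -n: fail straight away if the lock is held, instead of waiting in a queue.
if flock -n -x "$lockfile" -c 'echo "copy ran"'; then
    result="ran"
else
    result="skipped"
fi
echo "second run: $result"

wait
rm -f "$lockfile"
```

Whether queuing or skipping is the better fit depends on the job. For the archive log copy, either works as the next run will pick up whatever is outstanding.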

Advanced use of flock

Timeout

You can add “-w <seconds>”, which is the amount of time to wait for the exclusive lock before exiting without running the command. For example:

0,15,30,45 * * * * flock -w 300 -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now flock will wait up to 5 minutes for the previous archive log copy job to complete; if it has not completed by then, flock exits without running the command for that run 🙂
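
The timeout can be tried out on the command line with much shorter times. This is only a sketch: the temporary lock file, the 3 second `sleep` and the 1 second wait are made-up values for the demo:

```shell
#!/bin/bash
lockfile=$(mktemp)   # throwaway lock file for the demo

# Stand-in for a previous run that holds the lock for 3 seconds.
flock -x "$lockfile" -c 'sleep 3' &
sleep 0.5            # give it time to take the lock

# Wait at most 1 second for the lock, then give up without running the command.
flock -w 1 -x "$lockfile" -c 'echo "copy ran"'
status=$?
echo "exit status of timed-out flock: $status"

wait
rm -f "$lockfile"
```

When the timeout is reached, flock exits with a non-zero status (1 by default), so a wrapper script can tell a skipped run apart from a successful one.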

Viewing lock

If you want to see the lock taken by flock, you can run:

[oracle@dc1sbxdb001 ~]$ fuser -v /home/oracle/copy_Arch_to_AWS.lock
                                    USER   PID    ACCESS COMMAND
/home/oracle/copy_Arch_to_AWS.lock: oracle 341039 f....  flock
                                    oracle 341040 f....  rsync

 

If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)

Stop password for user accounts expiring on Exadata

Depending on how the Oracle Exadata Machine was set up, the passwords for user accounts can expire, requiring them to be changed.

This has the knock-on effect of the crontab not being accessible and, more importantly, jobs not running:

[oracle@v1ex1dbadm01 ~]$ crontab -l

Authentication token is no longer valid; new one required
You (oracle) are not allowed to access to (crontab) because of pam configuration.
[oracle@v1ex1dbadm01 ~]$

You can check the password expiry settings for the account with chage, as the root user:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : Dec 11, 2017
Password expires : Mar 11, 2018
Password inactive : never
Account expires : never
Minimum number of days between password change : 1
Maximum number of days between password change : 90
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

We can see the password expired on 11th March 2018, which is why the crontab jobs have not run since then.
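
The expiry date is simply the last password change plus the maximum number of days. As a quick sanity check, the arithmetic can be reproduced with date. This sketch assumes GNU date (as found on Oracle Linux) and uses the figures from the chage output above:

```shell
#!/bin/bash
# Figures taken from the chage -l output above.
last_change="2017-12-11"   # "Last password change"
max_days=90                # "Maximum number of days between password change"

# GNU date does the calendar arithmetic; -u avoids daylight-saving surprises.
expires=$(date -u -d "$last_change +$max_days days" +%F)
echo "Password expires: $expires"
```

This gives 2018-03-11, matching the “Password expires” line reported by chage.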

To change this so the password doesn’t expire, use chage as shown below:

[root@v1ex1dbadm01 ~]# chage -d 9999 -E -1 -m 0 -M -1 oracle

The manual page for chage explains the switches:

-d, --lastday LAST_DAY
 Set the number of days since January 1st, 1970 when the password was last changed. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). If the LAST_DAY is set to 0 the user is forced to change his password on the next log on.

-E, --expiredate EXPIRE_DATE
 Set the date or number of days since January 1, 1970 on which the user's account will no longer be accessible. The date may also be expressed in the format YYYY-MM-DD (or the format more commonly used in your area). A user whose account is locked must contact the system administrator before being able to use the system again.

Passing the number -1 as the EXPIRE_DATE will remove an account expiration date.

-m, --mindays MIN_DAYS
 Set the minimum number of days between password changes to MIN_DAYS. A value of zero for this field indicates that the user may change his/her password at any time.

-M, --maxdays MAX_DAYS
 Set the maximum number of days during which a password is valid. When MAX_DAYS plus LAST_DAY is less than the current day, the user will be required to change his/her password before being able to use his/her account. This occurrence can be planned for in advance by use of the -W option, which provides the user with advance warning.

Passing the number -1 as MAX_DAYS will remove checking a password's validity.

Now, when re-checking the password expiry, you can see it’s changed to ‘never’:

[root@v1ex1dbadm01 ~]# chage -l oracle
Last password change : May 01, 2008
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : -1
Number of days of warning before password expires : 7
[root@v1ex1dbadm01 ~]#

And we didn’t need to change the password for the user, and the crontab jobs work again 🙂

This doesn’t just apply to Exadata but to Linux in general.
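
If you want to catch this before cron stops working, the “Password expires” line can be pulled out of the chage output in a script. The following is only a sketch: `password_expires` is a hypothetical helper name, and the sample text is copied from the output earlier in this post so the example runs without root access:

```shell
#!/bin/bash
# Hypothetical helper: reads `chage -l` output on stdin and prints the value
# of the "Password expires" line.
password_expires() {
    awk -F'[ \t]*:[ \t]*' '$1 == "Password expires" { print $2 }'
}

# Sample input copied from the output earlier in this post. On a real system
# you would instead run (as root): chage -l oracle | password_expires
sample='Last password change : Dec 11, 2017
Password expires : Mar 11, 2018
Password inactive : never'

expiry=$(printf '%s\n' "$sample" | password_expires)
if [ "$expiry" = "never" ]; then
    echo "password never expires"
else
    echo "password expires: $expiry"
fi
```

The same helper could be run for each account that owns crontab jobs, flagging any that do not report ‘never’.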

See Related MOS Note:
Expiry of user accounts on Oracle Linux 5 (Doc ID 2327855.1)
