As an Oracle DBA, you may find yourself in the situation where you have crontab jobs overlapping.
For example, an RMAN backup takes longer than normal and overlaps with another RMAN database backup, leading to more resources being consumed:
00 00 * * * /home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1
00 01 * * * /home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1
If the PROD1 backup takes longer than 1 hour, it will contend with the PROD2 backup when that starts.
Another, more recent example of my own: I was copying archive logs to AWS for an Oracle Database Standard Edition migration using a manual standby, so I needed to transfer the archive logs manually with rsync every 15 minutes:
0,15,30,45 * * * * /home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log
This ran fine most of the time, but when there was significant churn in the database, the crontab job would overlap with itself, causing several rsync processes to run at once 😦
The easiest solution is to wrap the crontab job in flock 🙂
flock is a Linux utility that uses a lock file to determine whether the process is already running. The syntax I use is:
flock -x <lockfile> -c '<command>'
The “-x” obtains an exclusive lock and the “-c” is the command to run.
00 00 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1'
00 01 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1'
Now when the backup for PROD2 starts, flock will try to take the lock, see that it is already held, and will not run the command until the backup for PROD1 has completed 🙂
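To see this serialising behaviour without touching a real backup, here is a minimal sketch you can run in a scratch shell. The temporary lock file, the temporary output file and the "PROD1"/"PROD2" echo commands are stand-ins I made up for the demo, not the real backup script:

```shell
#!/bin/bash
# Demo only: two fake "backups" share one lock file, like the crontab entries above.
lock=$(mktemp)   # hypothetical stand-in for backup.lock
out=$(mktemp)    # hypothetical file that records the order of events

# First job: takes the exclusive lock and "works" for 3 seconds in the background.
flock -x "$lock" -c "echo 'PROD1 start' >> '$out'; sleep 3; echo 'PROD1 end' >> '$out'" &

sleep 1   # give the first job time to take the lock

# Second job: same lock file, so flock blocks here until the first job finishes.
flock -x "$lock" -c "echo 'PROD2 start' >> '$out'; echo 'PROD2 end' >> '$out'"

wait          # reap the background job
cat "$out"    # show the recorded order
```

The "PROD2 start" line should only appear after "PROD1 end", even though the second flock was launched while the first job was still sleeping.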
Archive log copy example
0,15,30,45 * * * * flock -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'
Now when the job runs, an exclusive lock is taken. When it runs again 15 minutes later, if there is an existing run, it will not run the command until the previous one has completed 🙂 This essentially queues the copies instead of letting them overlap and start several rsync processes, which just exacerbates the issue.
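Queueing is what the plain “-x” form gives you. If you would rather skip a run entirely when the previous one is still going, flock also has a non-blocking “-n” option. It is not something I use in the crontab entries above, so treat the following as a rough sketch. The temporary lock file and the echo command are stand-ins for the real lock file and copy script:

```shell
#!/bin/bash
# Demo only: show that -n gives up straight away when the lock is already held.
lock=$(mktemp)   # hypothetical stand-in for copy_Arch_to_AWS.lock

# Simulate a copy that is still running: hold the lock for 2 seconds.
flock -x "$lock" -c 'sleep 2' &
sleep 1   # make sure the background job has the lock

# -n (non-blocking): fail immediately instead of waiting for the lock.
if flock -n -x "$lock" -c 'echo "copy ran"'; then
    result="ran"
else
    result="skipped"
fi
echo "second run: $result"

wait   # reap the background job
```

With “-n”, a run that finds the lock held is dropped rather than queued, so whether that suits you depends on whether the next scheduled run will pick up the outstanding work anyway.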
Advanced use of flock
You can add “-w <seconds>”, which is the amount of time to wait for the exclusive lock before exiting without running the command. For example:
0,15,30,45 * * * * flock -w 300 -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'
Now flock will wait 5 minutes for the previous archive log copy job to complete before exiting without running the command for that run 🙂
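If you want to try the timeout without waiting 5 minutes, here is a small sketch with the numbers shrunk down. The temporary lock file, the 3 second sleep and the 1 second wait are values I picked for the demo. As far as I know, flock exits with status 1 by default when the “-w” timeout is reached:

```shell
#!/bin/bash
# Demo only: a short -w timeout against a lock that is held for longer.
lock=$(mktemp)   # hypothetical stand-in for copy_Arch_to_AWS.lock

# Hold the lock for 3 seconds in the background (stand-in for a long copy).
flock -x "$lock" -c 'sleep 3' &
sleep 1   # make sure the background job has the lock

# Wait at most 1 second for the lock; the holder needs longer, so flock gives up
# and the command is never run.
flock -w 1 -x "$lock" -c 'echo "copy ran"'
status=$?
echo "flock exit status: $status"

wait   # reap the background job
```

A non-zero status here means the run was abandoned, which you could use in a wrapper script to log that a copy was skipped.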
If you want to see the lock taken by flock, you can run:
[oracle@dc1sbxdb001 ~]$ fuser -v /home/oracle/copy_Arch_to_AWS.lock
                     USER        PID ACCESS COMMAND
/home/oracle/copy_Arch_to_AWS.lock:
                     oracle    341039 f.... flock
                     oracle    341040 f.... rsync
Zed DBA (Zahid Anwar)