Prevent crontab jobs overlapping using flock

As an Oracle DBA, you may find yourself in the situation where crontab jobs overlap.

For example, an RMAN backup takes longer than normal and then overlaps with another RMAN database backup, leading to more resources being consumed:

00 00 * * * /home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1
00 01 * * * /home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1

If the PROD1 backup takes longer than 1 hour, then it will contend with the PROD2 backup when it starts.

Another, more recent example for me: I was copying archive logs to AWS for an Oracle Database Standard Edition migration using a manual standby, so I needed to manually transfer archive logs using rsync every 15 minutes:

0,15,30,45 * * * * /home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log

This ran fine most of the time, but when there was significant churn in the database, the crontab job would overlap with itself, causing several rsync processes to run at once 😦

The easiest solution is to wrap the crontab job in flock 🙂

Using flock

flock is a Linux utility that uses a lock file to determine if the process is already running.  The syntax I use is:

flock -x <lockfile> -c '<command>'

The “-x” obtains an exclusive lock and the “-c” specifies the command to run.
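
As a quick illustration of the blocking behaviour (a minimal sketch using a throwaway lock file /tmp/flock_demo.lock, which is just an example path), run the first command in one terminal and the second in another while the first is still running:

# Terminal 1: take an exclusive lock on the lock file and hold it for 30 seconds
flock -x /tmp/flock_demo.lock -c 'echo "terminal 1 has the lock"; sleep 30'

# Terminal 2: flock waits here until terminal 1 releases the lock, then runs the command
flock -x /tmp/flock_demo.lock -c 'echo "terminal 2 now has the lock"'

flock creates the lock file if it doesn't already exist, so no setup is needed beforehand.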

Flock examples

Backup example

00 00 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD1 >> /home/oracle/backups/logs/db_backup_PROD1.log 2>&1'
00 01 * * * flock -x /home/oracle/scripts/backups/backup.lock -c '/home/oracle/scripts/backups/db_backup.sh PROD2 >> /home/oracle/backups/logs/db_backup_PROD2.log 2>&1'

Now when the backup for PROD2 starts, flock will check the lock file, see that the lock is already held, and will not run the command until the backup for PROD1 has completed 🙂

Archive log copy example

0,15,30,45 * * * * flock -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now when the job runs, an exclusive lock is taken, and hence when it runs again in 15 minutes, if the previous run is still in progress, it will not run the command until that run has completed 🙂  This essentially queues the copies instead of them overlapping and spawning several rsync processes, which just exacerbates the issue.
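
If you want to see this queuing for yourself before relying on it in cron, here is a minimal simulation (using sleep as a stand-in for the copy script and the example lock file /tmp/copy_demo.lock) that fires three runs at once and lets flock serialise them:

# Fire three overlapping runs against the same lock file; each "copy" takes 10 seconds,
# so the timestamps show them completing one after another rather than in parallel
for i in 1 2 3; do
  flock -x /tmp/copy_demo.lock -c "echo run $i started at \$(date +%T); sleep 10; echo run $i finished at \$(date +%T)" &
done
wait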

Advanced use of flock

Timeout

You can add “-w <seconds>”, which is the amount of time to wait for the exclusive lock before exiting without running the command. For example:

0,15,30,45 * * * * flock -w 300 -x /home/oracle/copy_Arch_to_AWS.lock -c '/home/oracle/copy_Arch_to_AWS.sh > /home/oracle/copy_Arch_to_AWS.log'

Now flock will wait up to 5 minutes for the previous archive log copy job to complete; if it still hasn't finished by then, flock exits without running the command for that run 🙂
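
When the wait times out, flock exits with a non-zero status (1 by default), so you can check whether a run was skipped. A quick test, again using the example lock file /tmp/flock_demo.lock:

# Hold the lock for 60 seconds in the background
flock -x /tmp/flock_demo.lock -c 'sleep 60' &

# This invocation only waits 5 seconds for the lock, so it gives up without running the command
flock -w 5 -x /tmp/flock_demo.lock -c 'echo "this never runs"'
echo "flock exit status: $?"   # prints 1, meaning the lock was not obtained in time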

Viewing the lock

If you want to see the lock taken by flock, you can run:

[oracle@dc1sbxdb001 ~]$ fuser -v /home/oracle/copy_Arch_to_AWS.lock
                                    USER   PID    ACCESS COMMAND
/home/oracle/copy_Arch_to_AWS.lock: oracle 341039 f....  flock
                                    oracle 341040 f....  rsync


If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this Oracle blog via WordPress/e-mail.

Thanks

Zed DBA (Zahid Anwar)