Running scheduled tasks in SeedDMS

Any system that manages as much data as SeedDMS does will sooner or later need to run certain tasks regularly, e.g. to clean up, to update data, or simply to check for changes that occurred in the past. One of the more obvious operations in SeedDMS is checking for expired documents. But there are others, like informing users about reviews or approvals that are due, or updating the full text index. None of them would ever be done without an external trigger, because a web application is not a constantly running process that does all of the above at recurring intervals. A document in SeedDMS which expired some time ago will not change its status to expired unless a user accesses that document and forces SeedDMS to check the expiration date again. If you looked at the database, you would see the document remaining in its old state. In most cases this is just fine, because nobody actually cares about the status of a document unless it is being accessed. But there are other cases where it makes a difference. The full text index, which also stores the status of a document, will not be aware of expired documents unless it is updated regularly. That’s why the so-called scheduler was added in SeedDMS 6.

The scheduler

The scheduler in SeedDMS manages those tasks which need to be run regularly. It is much like a cron daemon on Unix systems, and the way a task is scheduled has actually been borrowed from crond.

SeedDMS 6 already ships with a small number of tasks, which need to be configured and activated to be of any use.

  • finding expired documents and informing the owner
  • updating or recreating the full text index
  • checking for an incorrect checksum of a document
  • checking for missing preview images
  • checking for upcoming events in the calendar and informing the owner of the event

Many of the available extensions add more tasks of this kind, e.g. to import mail attachments, check for due revisions, empty the trash can, etc.

If you have ever looked into the directory utils of your SeedDMS installation, you will have found some PHP scripts which do exactly what the tasks above do. In the past those scripts had to be called individually by a cron job. Hence, each task needed its own cron job. The scheduler combines all tasks into one cron job, which just calls utils/seeddms-schedulercli. All the remaining configuration and activation of a task can be done within SeedDMS.

Configuring the scheduler

The scheduler itself is not a constantly running process either. It must be run by a cron job as the same user that runs your web server:

*/5 * * * * /home/seeddms/utils/seeddms-schedulercli --mode=run

In this particular case it is run every 5 minutes. The time between the runs could be shorter or longer, but keep in mind that this is also the minimum time between two runs of a task. A task configured to run every minute will still run only every 5 minutes if that is how often the scheduler is run.
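To see why the scheduler interval caps how often a task runs, the following sketch simulates the situation in plain shell. It assumes a simplified model in which a task becomes due a fixed number of minutes after its last run. SeedDMS itself works with cron expressions, and the function name actual_runs is made up for this illustration.

```shell
# Simplified model, not SeedDMS code: print the minutes at which a task
# actually runs when the scheduler itself is only started at fixed intervals.
# $1 = task interval, $2 = scheduler interval, $3 = horizon (all in minutes)
actual_runs() {
    task=$1
    sched=$2
    horizon=$3
    next_due=0
    t=0
    out=""
    while [ "$t" -lt "$horizon" ]; do
        # A task can only start when the scheduler runs and the task is due.
        if [ "$t" -ge "$next_due" ]; then
            out="$out $t"
            next_due=$((t + task))
        fi
        t=$((t + sched))
    done
    # Strip the leading blank before printing the list of minutes.
    printf '%s\n' "${out# }"
}
```

Calling actual_runs 1 5 20 lists the minutes 0, 5, 10 and 15: the task configured for every minute is started only once per scheduler run.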

The scheduler will not run unless a user cli_scheduler exists in SeedDMS. This user will be the one used to access the documents and folders. Hence, it should have sufficient access rights.

Though the scheduler script seeddms-schedulercli is usually called from a cron job, it can also be called on the command line. This can be very helpful for debugging. In that case it’s worth having a look at the different options which can be passed on the command line. Just run the script with the option -h to get a list of possible command line parameters. Always run the scheduler as the same user as your web server, otherwise files created by any of the tasks may not be changeable later (e.g. the full text index if recreated). On Debian you would usually run

sudo -u www-data utils/seeddms-schedulercli --mode=run

If SeedDMS’ configuration file cannot be found, just specify the full path of the configuration file with the command line option --config.
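A debugging run with an explicit configuration file might be wrapped as sketched below. The path /home/seeddms/conf/settings.xml and the function name run_scheduler are assumptions made for this example, and the form --config=PATH merely mirrors --mode=run. Check the output of -h for the exact syntax your version expects.

```shell
# Sketch only: run the scheduler once with an explicit configuration file.
# Call it from the SeedDMS installation directory, e.g.
#   run_scheduler /home/seeddms/conf/settings.xml   (assumed path)
run_scheduler() {
    config=$1
    # Fail early with a clear message instead of letting the scheduler guess.
    if [ ! -f "$config" ]; then
        echo "configuration file not found: $config" >&2
        return 1
    fi
    # Run as the web server user (www-data on Debian) so that files created
    # by the tasks stay writable for the web application.
    sudo -u www-data utils/seeddms-schedulercli --mode=run --config="$config"
}
```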

What if you can’t run it as a cronjob

There are cases where you cannot run the scheduler by a cron daemon. That’s why the page op/op.Cron.php exists. Calling this page with the parameter mode=run is identical to running utils/seeddms-schedulercli --mode=run. The result of that call is a JSON data object with information about each task. You may evaluate it, but often it’s sufficient to check for the HTTP status code 200 to see whether the execution of the scheduler was successful. The page uses basic authentication and requires you to log in as user cli_scheduler.
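Such a call could be made with curl from any machine that can reach your SeedDMS installation. The sketch below assumes that mode=run is passed as a query parameter. The base URL https://dms.example.org, the variable SCHEDULER_PASSWORD and the function names are placeholders, not part of SeedDMS.

```shell
# Build the URL of the cron page from the base URL of the installation.
cron_url() {
    # ${1%/} drops one trailing slash so the result has no double slash.
    printf '%s/op/op.Cron.php?mode=run' "${1%/}"
}

# Succeeds only if the given HTTP status code is 200.
run_succeeded() {
    [ "$1" = "200" ]
}

# Call the page with basic authentication as cli_scheduler and print only
# the HTTP status code; the JSON body is discarded in this sketch.
# $1 = base URL, $2 = password of the user cli_scheduler
trigger_scheduler() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --user "cli_scheduler:$2" "$(cron_url "$1")"
}

# Usage (placeholders, not executed here):
#   status=$(trigger_scheduler "https://dms.example.org" "$SCHEDULER_PASSWORD")
#   run_succeeded "$status" || echo "scheduler run failed: HTTP $status" >&2
```

To evaluate the JSON result as well, drop --output /dev/null and process the body with a tool of your choice.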

Configuring a task

Each task itself is configured in SeedDMS. Just click on the button ‘Scheduler’ in the admin area. The page will list all available task classes. It’s important to recognize the difference between a task class and its instantiation. It’s very much like objects and classes in object-oriented programming. The task classes shipped with SeedDMS all start with core::. Extensions should use their extension name as prefix followed by ::.

Scheduler

The first step to run a task is choosing one of the task classes and clicking on the + button. It will open a form next to the list of task classes. The first four fields of the form are identical for all tasks. The name and the frequency are mandatory. The frequency follows the same syntax as the first five fields of a crontab entry. It also understands the terms @daily and @hourly, which mean exactly what they say. The fourth parameter Disabled can be used to activate or deactivate a task. If there are more fields in the form, then they are specific to the task. Those extra parameters explain why there can be more than one task derived from the same task class. They do different things depending on the parameters. A common scenario is to have two tasks based on the class core::indexingdocs: one running more frequently with the parameter recreate not being checked, and a second one, running e.g. once a week, to recreate the whole full text index.
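The frequency syntax can be illustrated with a small check written in shell. This is not the parser SeedDMS uses. It only expands the two shortcuts to their usual cron equivalents and verifies that a value has five fields. The function name normalize_frequency and the example values in the comments are made up for this illustration.

```shell
# Example values for the frequency field (illustrative only):
#   */15 * * * *   every 15 minutes, e.g. for an incremental index update
#   0 3 * * 0      Sundays at 03:00, e.g. for recreating the index
#   @daily         once a day, @hourly once an hour

# Expand @daily/@hourly to their customary cron form and check that the
# value consists of five whitespace-separated fields.
normalize_frequency() {
    freq=$1
    case "$freq" in
        @hourly) freq='0 * * * *' ;;
        @daily)  freq='0 0 * * *' ;;
    esac
    # awk counts the fields, which avoids shell globbing of '*'.
    fields=$(printf '%s\n' "$freq" | awk '{ print NF }')
    if [ "$fields" -ne 5 ]; then
        echo "expected five fields, got $fields: $freq" >&2
        return 1
    fi
    printf '%s\n' "$freq"
}
```

The check is deliberately shallow: it does not validate the range of each field, so SeedDMS may still reject a value that passes here.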

Once you have created the first task, it will appear in another table below the task classes. Tasks which are activated have a green background; all other tasks are white. Besides the name, description, class and frequency of a task, the table also shows the time of the last and next run. Tasks which have been deactivated may show a time for the next run that lies far in the past. If such a task is activated again, it will be run the next time the scheduler is run.

Conclusion

Most installations of SeedDMS get along without tasks, but sooner or later your users will request reminders, status reports, monitoring, etc., or your daily administrative duties will become too boring. Then it’s time to automate things with periodically run tasks. A good starting point might be the example extension in SeedDMS.