Non-file dependencies
uptodate
Apart from file dependencies, you can extend doit to support other ways to determine if a task is up-to-date through the attribute uptodate.
This can be used in cases where you need to perform some kind of calculation to determine if the task is up-to-date or not.
uptodate is a list where each element can be True, False, None, a callable or a command (string).
False indicates that the task is NOT up-to-date
True indicates that the task is up-to-date
None values will just be ignored. This is used when the value is dynamically calculated
Note
An uptodate value equal to True does not override other up-to-date checks. It is one more way to check if a task is not up-to-date.
i.e. if uptodate==True but a file_dep changes, the task is still considered not up-to-date.
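A None entry is handy when a check is decided while the task dictionary is being built. The sketch below assumes a hypothetical FORCE_BUILD environment variable (it is not part of doit): when it is set, the list contains False and the task always runs; otherwise the entry is None and is ignored, leaving the decision to file_dep.

```python
import os

def task_compile():
    # FORCE_BUILD is a hypothetical environment variable used for illustration
    force = os.environ.get('FORCE_BUILD') == '1'
    return {
        'actions': ['echo compiling'],
        'file_dep': ['main.c'],
        # False -> task is never up-to-date; None -> entry is ignored
        'uptodate': [False if force else None],
    }
```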
If an uptodate item is a string it will be executed on the shell.
If the process exits with the code 0, it is considered as up-to-date.
All other values are considered as not up-to-date.
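For instance, a task could delegate the check to a shell command. This is a minimal sketch that assumes a POSIX shell; the file name stamp.txt and the task name are made up for illustration.

```python
def task_refresh_stamp():
    return {
        'actions': ['touch stamp.txt'],
        # `test -f` exits with 0 only when the file exists, so the task
        # is considered up-to-date as long as stamp.txt is present
        'uptodate': ['test -f stamp.txt'],
    }
```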
uptodate elements can also be a callable that will be executed at runtime (not when the task is being created).
The section custom-uptodate will explain in detail how to extend doit by writing your own callables for uptodate. These callables will typically compare a value at the present time with a value calculated on the last successful execution.
Note
There is no guarantee uptodate callables or commands will be executed.
doit short-circuits the checks: if it is already determined that the task is not up-to-date, it will not execute the remaining uptodate checks.
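One practical consequence is that a check should not rely on side effects of a later check. The following toy model (plain Python written for this page, not the actual doit code) mimics the short-circuit behaviour described in the note: evaluation stops at the first check that returns False. All function names are hypothetical.

```python
def first_failure_stops(checks):
    """Toy model: evaluate checks in order, stop at the first False."""
    executed = []
    for check in checks:
        executed.append(check.__name__)
        if check() is False:
            return False, executed
    return True, executed

def db_unchanged():
    return False      # already enough to mark the task as not up-to-date

def expensive_check():
    return True       # never reached in this example

result, executed = first_failure_stops([db_unchanged, expensive_check])
print(result, executed)
```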
doit includes several implementations to be used as uptodate.
They are all included in module doit.tools and will be discussed in detail later:
result_dep: check if the result of another task has changed
run_once: execute a task only once (used for tasks without dependencies)
timeout: indicate that a task should “expire” after a certain time interval
config_changed: check for changes in a “configuration” string or dictionary
check_timestamp_unchanged(): check access, status change/create or modify timestamp of a given file/directory
doit up-to-date definition
A task is not up-to-date if any of:
an uptodate item is (or evaluates to) False
a file is added to or removed from file_dep
a file_dep changed since last successful execution
a target path does not exist
a task has neither a file_dep nor an uptodate item equal to True
It means that if a task does not explicitly define any input (dependency) it will never be considered up-to-date.
Note that since a target represents an output of the task, a missing target is enough to determine that a task is not up-to-date. But its existence by itself is not enough to mark a task up-to-date.
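The rules above can be condensed into a small decision function. This is only a toy model written for this page, assuming the file and target checks have already been reduced to booleans; the function and parameter names are hypothetical and it is not how doit is implemented.

```python
def is_up_to_date(uptodate_results, has_file_dep, file_dep_changed, target_missing):
    """Toy model of the rules listed above.

    uptodate_results: already-evaluated uptodate items (True, False or None)
    file_dep_changed: True if a file_dep was added, removed or modified
    """
    results = [r for r in uptodate_results if r is not None]  # None is ignored
    if any(r is False for r in results):
        return False
    if file_dep_changed or target_missing:
        return False
    if not has_file_dep and True not in results:
        return False   # no explicit input: never up-to-date
    return True

# targets only: re-executed every time
print(is_up_to_date([], False, False, False))        # False
# targets plus uptodate [True]: up-to-date while the target exists
print(is_up_to_date([True], False, False, False))    # True
print(is_up_to_date([True], False, False, True))     # False
```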
In some situations, it is useful to define a task with targets but no dependencies. If you want to re-execute this task only when targets are missing you must explicitly add a dependency: you could add an uptodate with a True value or use run_once() to force at least one execution managed by doit. Example:
def task_touch():
    return {
        'actions': ['touch foo.txt'],
        'targets': ['foo.txt'],
        # force doit to always mark the task
        # as up-to-date (unless target removed)
        'uptodate': [True],
        }
Apart from file_dep and uptodate, used to determine if a task is up-to-date or not, doit also includes other kinds of dependencies (introduced below) to help you combine tasks so they are executed in the appropriate order.
uptodate API
This section will explain how to extend doit by writing an uptodate implementation. So unless you need to write an uptodate implementation you can skip this.
Let’s start with a trivial example. uptodate is a function that returns a boolean value.
def fake_get_value_from_db():
    return 5

def check_outdated():
    total = fake_get_value_from_db()
    return total > 10


def task_put_more_stuff_in_db():
    def put_stuff(): pass
    return {'actions': [put_stuff],
            'uptodate': [check_outdated],
            }
You could also execute this function in the task-creator and pass the value to uptodate. The advantage of just passing the callable is that this check will not be executed at all if the task was not selected to be executed.
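The difference between the two approaches can be seen without running doit at all. In this sketch (the call log and the task names are made up) the first task-creator evaluates the check as soon as it is called, whereas the second only stores a reference to it.

```python
calls = []

def check_outdated():
    calls.append('checked')       # record every evaluation
    return False

def task_eager():
    # check runs as soon as the task-creator is called (load time)
    return {'actions': ['echo eager'], 'uptodate': [check_outdated()]}

def task_lazy():
    # check is only stored; doit decides later whether to call it
    return {'actions': ['echo lazy'], 'uptodate': [check_outdated]}

eager = task_eager()
lazy = task_lazy()
print(len(calls))   # 1: only the eager version evaluated the check
```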
Example: run-once implementation
Most of the time an uptodate implementation will compare the current value of something with the value it had last time the task was executed.
We already saw how tasks can save values by returning a dict from their actions. But usually the “value” we want to check is independent from the task actions. So the first step is to add a callable to the task so it can save some extra values. These values are not used by the task itself, they are only used for dependency checking.
The Task has a property called value_savers that contains a list of callables. These callables should return a dict that will be saved together with other task values. The value_savers will be executed after all actions.
The second step is to actually compare the saved value with its “current” value.
The uptodate callable can take two positional parameters, task and values. The callable can also be represented by a tuple (callable, args, kwargs).
The task parameter gives you access to the task object. So you have access to its metadata and the opportunity to modify the task itself!
values is a dictionary with the computed values saved in the last successful execution of the task.
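As a sketch of how these pieces fit together, the check below receives an extra argument through the tuple form and stores it with a value saver. The names version_unchanged and task_build and the version strings are hypothetical. The direct calls at the end use a stand-in object instead of a real Task, and they assume that task and values are passed ahead of the extra arguments.

```python
from types import SimpleNamespace

def version_unchanged(task, values, current_version):
    def save_version():
        return {'version': current_version}
    task.value_savers.append(save_version)
    # up-to-date only if the version saved last time matches
    return values.get('version') == current_version

def task_build():
    return {
        'actions': ['echo building'],
        # tuple form: (callable, args, kwargs)
        'uptodate': [(version_unchanged, ['1.2'], {})],
    }

# simulate the call with a stand-in for the task object
fake_task = SimpleNamespace(value_savers=[])
print(version_unchanged(fake_task, {'version': '1.1'}, '1.2'))  # False
print(version_unchanged(fake_task, {'version': '1.2'}, '1.2'))  # True
```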
Let’s take a look at the run_once implementation.
def run_once(task, values):
    def save_executed():
        return {'run-once': True}
    task.value_savers.append(save_executed)
    return values.get('run-once', False)
The function save_executed returns a dict. In this case it is not checking for any value because it just checks if the task was ever executed.
On the next line we use the task parameter, adding save_executed to task.value_savers. So whenever this task is executed this task value ‘run-once’ will be saved.
Finally the return value should be a boolean to indicate if the task is up-to-date or not. Remember that the ‘values’ parameter contains the dict with the values saved from the last successful execution of the task. So it just checks if this task was executed before by looking for the run-once entry in values.
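To see the two halves working together, the sketch below replays two consecutive runs by hand. The SimpleNamespace object is only a stand-in for the real task object, and the manual merging of saved values imitates what the text describes doit doing after a successful execution.

```python
from types import SimpleNamespace

def run_once(task, values):
    def save_executed():
        return {'run-once': True}
    task.value_savers.append(save_executed)
    return values.get('run-once', False)

# first run: nothing was saved yet, so the task is not up-to-date
task = SimpleNamespace(value_savers=[])
saved = {}
first = run_once(task, saved)

# after the actions succeed, every value saver is executed and stored
for saver in task.value_savers:
    saved.update(saver())

# second run: the saved entry is found, so the task is up-to-date
task = SimpleNamespace(value_savers=[])
second = run_once(task, saved)
print(first, second)   # False True
```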
Example: timeout implementation
Let’s look at another example, timeout. The main difference is that we actually pass the parameter timeout_limit. Here we present a simplified version that only accepts integers (seconds) as a parameter.
import time as time_module

class timeout(object):
    def __init__(self, timeout_limit):
        self.limit_sec = timeout_limit

    def __call__(self, task, values):
        def save_now():
            return {'success-time': time_module.time()}
        task.value_savers.append(save_now)
        last_success = values.get('success-time', None)
        if last_success is None:
            return False
        return (time_module.time() - last_success) < self.limit_sec
This is a class-based implementation where the objects are made callable by implementing a __call__ method.
On __init__ we just save the timeout_limit as an attribute.
The __call__ is very similar to the run-once implementation. First it defines a function (save_now) that is registered into task.value_savers. Then it compares the current time with the time that was saved on the last successful execution.
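A quick way to check the logic is to feed the callable hand-made values. The sketch repeats the simplified class from above (renamed SimpleTimeout here so it is not confused with the one in doit.tools) and uses a stand-in task object; the 100-second-old timestamp is arbitrary.

```python
import time
from types import SimpleNamespace

class SimpleTimeout:
    def __init__(self, timeout_limit):
        self.limit_sec = timeout_limit

    def __call__(self, task, values):
        def save_now():
            return {'success-time': time.time()}
        task.value_savers.append(save_now)
        last_success = values.get('success-time', None)
        if last_success is None:
            return False
        return (time.time() - last_success) < self.limit_sec

task = SimpleNamespace(value_savers=[])
old_run = {'success-time': time.time() - 100}   # last success 100s ago

print(SimpleTimeout(3600)(task, old_run))  # True: still within one hour
print(SimpleTimeout(60)(task, old_run))    # False: expired after 60s
print(SimpleTimeout(60)(task, {}))         # False: never executed
```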
Example: result_dep implementation
The result_dep is more complicated due to two factors. It needs to modify the task’s task_dep. And it needs to check the saved values and metadata from a task different from the one where it is being applied.
A result_dep implies that its dependency is also a task_dep. We have seen that the callable takes a task parameter that we used to modify the task object. The problem is that modifying task_dep when the callable gets called would be “too late” according to the way doit works. When an object is passed to uptodate and this object’s class has a method named configure_task, it will be called during the task creation.
The base class dependency.UptodateCalculator gives access to an attribute named tasks_dict containing a dictionary with all task objects where the key is the task name (this is used to get all sub-tasks from a task-group). And also a method called get_val to access the saved values and results from any task.
See the result_dep source.
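The configure_task hook can be illustrated with a stripped-down class. This is not the result_dep implementation: it only shows the hook adding a task_dep at creation time, the class name needs_task is hypothetical, and a stand-in object replaces the real task. Looking up the saved result of the other task would additionally require UptodateCalculator, which is left out here.

```python
from types import SimpleNamespace

class needs_task:
    """Hypothetical uptodate object that also registers a task_dep."""
    def __init__(self, dep_name):
        self.dep_name = dep_name

    def configure_task(self, task):
        # called while the task is being created, early enough
        # for the dependency to influence execution order
        task.task_dep.append(self.dep_name)

    def __call__(self, task, values):
        # placeholder check: never reports the task as up-to-date
        return False

checker = needs_task('version')
fake_task = SimpleNamespace(task_dep=[], value_savers=[])
checker.configure_task(fake_task)
print(fake_task.task_dep)   # ['version']
```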
task-dependency
It is used to enforce that tasks are executed in the desired order.
By default tasks are executed in the same order as they were defined in the dodo file. To define a dependency on another task use the task name (whatever comes after task_ in the function name) in the “task_dep” attribute.
Note
A task-dependency only indicates that another task should be “executed” before itself. The task-dependency might not really be executed if it is up-to-date.
Note
task-dependencies are not used to determine if a task is up-to-date or not. If a task defines only task-dependency it will always be executed.
In this example we make sure we include a file with the latest revision number of the mercurial repository in the tar file.
def task_tar():
    return {'actions': ["tar -cf foo.tar *"],
            'task_dep':['version'],
            'targets':['foo.tar']}

def task_version():
    return {'actions': ["hg tip --template '{rev}' > revision.txt"]}
$ doit
. version
. tar
groups
You can define a group of tasks by adding tasks as dependencies and setting its actions to None.
def task_foo():
    return {'actions': ["echo foo"]}

def task_bar():
    return {'actions': ["echo bar"]}

def task_mygroup():
    return {'actions': None,
            'task_dep': ['foo', 'bar']}
Note that tasks are never executed twice in the same “run”.
setup-task
Some tasks may require some kind of environment setup. In this case they can define a list of “setup” tasks.
the setup-task will be executed only if the task is to be executed (not up-to-date)
setup-tasks are just normal tasks that follow all other task behavior
Note
A task-dependency is executed before checking if the task is up-to-date. A setup-task is executed after checking if the task is up-to-date, and only if the task is not up-to-date and will be executed.
teardown
A task may also define ‘teardown’ actions. These actions are executed after all tasks have finished their execution. They are executed in the reverse of the order in which their tasks were executed.
Example:
### task setup env. good for functional tests!
DOIT_CONFIG = {'verbosity': 2,
               'default_tasks': ['withenvX', 'withenvY']}

def start(name):
    print("start %s" % name)
def stop(name):
    print("stop %s" % name)

def task_setup_sample():
    for name in ('setupX', 'setupY'):
        yield {'name': name,
               'actions': [(start, (name,))],
               'teardown': [(stop, (name,))],
               }

def task_withenvX():
    for fin in ('a','b','c'):
        yield {'name': fin,
               'actions':['echo x %s' % fin],
               'setup': ['setup_sample:setupX'],
               }

def task_withenvY():
    return {'actions':['echo y'],
            'setup': ['setup_sample:setupY'],
            }
$ doit withenvX
. setup_sample:setupX
start setupX
. withenvX:c
x c
. withenvX:b
x b
. withenvX:a
x a
stop setupX
$ doit withenvY
. setup_sample:setupY
start setupY
. withenvY
y
stop setupY
saving computed values
Tasks can save computed values by returning a dictionary from their python-actions. The values must be JSON encodable.
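A minimal sketch of a python-action that saves values; the task name, function name and sample text are invented for illustration. json.dumps is used here only to show that the returned dict satisfies the JSON requirement.

```python
import json

def count_words():
    text = "doit tasks can save values"
    # the returned dict is what gets saved as the task values
    return {'words': len(text.split()), 'chars': len(text)}

def task_count():
    return {'actions': [count_words]}

saved = count_words()
print(json.dumps(saved))
```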
A cmd-action can also save its output. But for this you will need to explicitly import CmdAction and set its save_out parameter with the name used to save the output in values.
from doit.action import CmdAction

def task_save_output():
    return {
        'actions': [CmdAction("echo x1", save_out='out')],
        }
# The task values will contain: {'out': u'x1'}
These values can be used on uptodate and getargs. Check those sections for examples.
getargs
getargs provides a way to use values computed from one task in another task. The values are taken from “saved computed values” (returned dict from a python-action).
For cmd-action use dictionary-based string formatting. Formatting style is controlled by the action_string_formatting key in DOIT_CONFIG (see keywords on cmd-action string).
For python-action the action callable parameter names must match the keys from getargs.
getargs is a dictionary where the key is the argument name used on actions, and the value is a tuple with 2 strings: task name, “value name”.
DOIT_CONFIG = {
    'default_tasks': ['use_cmd', 'use_python'],
    'action_string_formatting': 'both',
}

def task_compute():
    def comp():
        return {'x':5,'y':10, 'z': 20}
    return {'actions': [(comp,)]}


def task_use_cmd():
    return {'actions': ['echo x={x}', # new-style formatting
                        'echo z=%(z)s'], # old-style formatting
            'getargs': {'x': ('compute', 'x'),
                        'z': ('compute', 'z')},
            'verbosity': 2,
            }


def task_use_python():
    return {'actions': [show_getargs],
            'getargs': {'x': ('compute', 'x'),
                        'y': ('compute', 'z')},
            'verbosity': 2,
            }
def show_getargs(x, y):
    print("this is x: {}".format(x))
    print("this is y: {}".format(y))
If the values are being passed on to a python-action, you can pass the whole dict by specifying the value name as None.
def task_compute():
    def comp():
        return {'x':5,'y':10, 'z': 20}
    return {'actions': [(comp,)]}


def show_getargs(values):
    print(values)

def task_args_dict():
    return {'actions': [show_getargs],
            'getargs': {'values': ('compute', None)},
            'verbosity': 2,
            }
If a group-task is used, the values from all its sub-tasks are passed as a dict.
def task_compute():
    def comp(x):
        return {'x':x}
    yield {'name': '5',
           'actions': [ (comp, [5]) ]
           }
    yield {'name': '7',
           'actions': [ (comp, [7]) ]
           }


def show_getargs(values):
    print(values)
    assert sum(v['x'] for v in values.values()) == 12

def task_args_dict():
    return {'actions': [show_getargs],
            'getargs': {'values': ('compute', None)},
            'verbosity': 2,
            }
Note
getargs creates an implicit setup-task.
calculated-dependencies
Calculation of dependencies might be an expensive operation, so not suitable to be done on load time by task-creators. For this situation it is better to delegate the calculation of dependencies to another task. The task calculating dependencies must have a python-action returning a dictionary with file_dep, task_dep, uptodate or another calc_dep.
Note
An alternative way (and often easier) to have task attributes that rely on other tasks execution is to use delayed tasks.
In the example below mod_deps prints on the screen all direct dependencies of a module. The dependencies themselves are calculated by the task get_dep (note: get_dep has a fake implementation where the results are taken from a dict).
DOIT_CONFIG = {'verbosity': 2}

MOD_IMPORTS = {'a': ['b','c'],
               'b': ['f','g'],
               'c': [],
               'f': ['a'],
               'g': []}


def print_deps(mod, dependencies):
    print("%s -> %s" % (mod, dependencies))
def task_mod_deps():
    """task that depends on all direct imports"""
    for mod in MOD_IMPORTS.keys():
        yield {'name': mod,
               'actions': [(print_deps,(mod,))],
               'file_dep': [mod],
               'calc_dep': ["get_dep:%s" % mod],
               }

def get_dep(mod):
    # fake implementation
    return {'file_dep': MOD_IMPORTS[mod]}
def task_get_dep():
    """get direct dependencies for each module"""
    for mod in MOD_IMPORTS.keys():
        yield {'name': mod,
               'actions':[(get_dep,[mod])],
               'file_dep': [mod],
               }