# Shell Tools

Jump right in with an example.
## dryrun

`dryrun` runs a command only when the `$DRYRUN` environment variable is not set; when it is set, the command is printed but not run.[^1] Also see `try`. `dryrun` is comparable to `make -n`.
```
$ echo hi > myfile
$ export DRYRUN=
$ dryrun rm myfile # (1)!
$ cat myfile
cat: myfile: No such file or directory
```

1. Nothing is printed, and `rm` runs silently, as if `dryrun` was not there: `DRYRUN` is empty, so it counts as unset.
It's worth noting that bash allows an environment variable to be set and scoped to a single command by prefacing the call with `var=val`. For `dryrun`-enabled scripts and functions, this means starting with `DRYRUN=1` for the "just print" version: `DRYRUN=1 dryrun rm myfile` prints `rm myfile` but does not run it, and a following `echo "$DRYRUN"` prints an empty line, showing `$DRYRUN` is no longer set (it was only set for the call where it was explicitly declared).
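The behavior above can be sketched in a few lines. This is a minimal stand-in, assuming only what's described here; the real `dryrun` may handle messaging differently:

```shell
# minimal dryrun: print the command when DRYRUN is set, otherwise run it
dryrun() {
  if [ -n "${DRYRUN:-}" ]; then
    echo "$@" >&2   # dry run: show what would have run
  else
    "$@"            # no DRYRUN: run the command unchanged
  fi
}
```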
## drytee

`drytee` works like `dryrun`, but for capturing output: it writes stdin to a file unless `$DRYRUN` is set. It's like the command `tee`, but writes to standard error instead when the user wants a dry run.
```
$ echo hi | drytee myfile
$ cat myfile
hi # (1)!
$ DRYRUN=1
$ echo bye | drytee myfile
# bye
# would be written to myfile
$ cat myfile
hi # (2)!
```

1. `myfile` was written ("hi") because `DRYRUN` is not set.
2. `myfile` is unchanged; `bye` was not written.
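A sketch of the same idea for `drytee`; the `# ` prefix and the `head` truncation are assumptions made here to mirror the output above:

```shell
# minimal drytee: write stdin to "$1" unless DRYRUN is set,
# in which case show a truncated, "# "-prefixed copy on stderr
drytee() {
  if [ -n "${DRYRUN:-}" ]; then
    head | sed 's/^/# /' >&2            # assumed truncation: first 10 lines
    echo "# would be written to $1" >&2
  else
    cat > "$1"                          # actually write the file
  fi
}
```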
## warn

`warn` could be written `echo "$@" >&2`. It simply writes its arguments to standard error (file descriptor 2) instead of standard output, which keeps messages out of shell captures into a variable or a file. For example, in `a=$(warn 'oh no'; echo 'results')`, "oh no" is seen on the terminal because it's written to stderr, while "results" on stdout is captured into `$a`.

A contrived example: a function that gives a warning that doesn't end up in the output (but still potentially notifies the user):
```bash
# print n lines, sequentially numbered
filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}
```
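Calling it shows the split: the warning reaches the terminal while only the numbers are captured (both functions are redefined inline here so the snippet runs standalone):

```shell
warn() { echo "$@" >&2; }   # one-line stand-in for the real warn

# print n lines, sequentially numbered (as above)
filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}

a=$(filelines 1)   # the WARNING line goes to the terminal, not into $a
echo "$a"          # prints just: 1
```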
## waitforjobs

`waitforjobs` tracks the number of forked child processes. It waits `SLEEPTIME` and polls the count until there are fewer than `MAXJOBS` jobs running. It uses shell job control facilities and is useful on local, single-user, or small servers. On HPC, you'd use `sbatch` from e.g. slurm or torque. Other alternatives include `bq` and `task-spooler`. GNU Parallel and Make also have job dispatching facilities.

The typical pattern is a loop that forks a long-running command (a `sleep` can stand in for something more useful to parallelize) and calls `waitforjobs` after each fork. The loop will finish with `MAXJOBS-1` jobs still running, so a final `wait` is needed to wait for those; `wait` won't show the notifications every `SLEEPTIME`, though, so consider `waitforjobs -p 1` instead.
### Arguments

`-c auto` is worth exploring in more detail. Using this option, a temporary file like `/tmp/host-user-basename.jobcfg` is created. Modifying the sleep and job settings in that file will affect the `waitforjobs` process watching it: you can change the number of cores to use in real time!
## iffmain

In a script where `main_function` is a defined function, `iffmain` is used at the end like `eval "$(iffmain main_function)"`.

Defensive shell scripting calls for `set -euo pipefail`, but running that on the command line (e.g. via `source`) will break other scripts and your normal interactive shell.[^2] `iffmain` is modeled after the python idiom `if __name__ == "__main__"`. When the script is not sourced, it toggles the ideal settings and sets a standard `trap` to notify on error.
### Sourcing

Using `iffmain` makes it easier to write bash scripts that are primarily functions. Scripts styled this way are easy to source, which makes them reusable and testable. See Bash Test Driven Development.
### Template

`iffmain` generates shell code that looks like

```bash
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  MAINFUNCNAME "$@"
  exit $?
fi
```
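To see the guard in action, here's a throwaway script with that template inlined by hand, as a stand-in for what `eval`-ing `iffmain`'s output would produce (`greet.bash` and its functions are made up for this demo):

```shell
tmp=$(mktemp /tmp/greet-XXXXXX.bash)
cat > "$tmp" <<'EOF'
greet() { echo "hello $1"; }
main()  { greet "${1:-world}"; }

# hand-inlined iffmain template: only runs when executed, not sourced
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  main "$@"
  exit $?
fi
EOF

bash "$tmp"        # executed: runs main, prints "hello world"
bash "$tmp" bash   # prints "hello bash"
```

Sourcing the same file (`source "$tmp"`) only defines `greet` and `main`; nothing runs.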
## Example Script

As an example, we'll use `drytee`, `dryrun`, and `waitforjobs` in the script `tat2_all.bash` to

- run `tat2` (`tat2_single`) on a collection of bold files,
- in parallel (`all_parallel`), and
- with a few checks (`input_checks`) beforehand.

We'll support

- printing what the script would do instead of actually doing it (`dryrun` and `drytee`), and
- using hygienic shell settings (e.g. `set -euo pipefail`) only when run as a file but not when sourced.[^3]
1. `drytee` writes to the specified file unless `DRYRUN` is set; then it truncates the output and writes it to stderr.
2. `dryrun` echos everything after it to stderr if `DRYRUN` is set. Otherwise, it runs the command.
3. `waitforjobs` watches the children of the current process and sleeps until there are fewer than 10 running.
4. `iffmain` generates bash code. It runs `set -euo pipefail` and the specified function only if the file is not sourced -- e.g. run as `bash tat2_all.bash` or `./tat2_all.bash`.[^3]
5. `warn` sends a message to stderr so it doesn't get included in any eval/capture -- `a=$(warn 'oh no'; echo 'yes')` yields `a="yes"`.
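The full script isn't reproduced here, but a rough sketch of the shape described above, reusing the function names from the text (the `fd_calc` and `tat2` calls, the censor-file naming, and the glob are stand-ins), might look like:

```shell
#!/usr/bin/env bash
# tat2_all.bash (sketch) -- assumes dryrun, drytee, waitforjobs,
# warn, iffmain, fd_calc, and tat2 are all available

input_checks() {
  command -v tat2 >/dev/null || { warn "cannot find tat2!"; return 1; }
}

create_censor() {
  # bold nifti in, motion censor file out; shown-but-skipped when DRYRUN is set
  local bold="$1" censor="${1%_bold.nii.gz}_fdcen.1D"
  fd_calc "$bold" | drytee "$censor"
}

tat2_single() {
  local bold="$1" censor="${1%_bold.nii.gz}_fdcen.1D"
  create_censor "$bold"
  dryrun tat2 "$bold" -censor "$censor"
}

all_parallel() {
  local bold
  for bold in sub-*/ses-*/func/*_bold.nii.gz; do
    tat2_single "$bold" &
    waitforjobs
  done
  wait
}

main() { input_checks && all_parallel; }

if command -v iffmain >/dev/null; then
  eval "$(iffmain main)"
fi
```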
### In Use

Suppose we have files like

```
sub-1
└── ses-1
    └── func
        ├── sub-1_ses-1_func_task-rest_bold.nii.gz
        └── sub-1_ses-1_func_task-rest_motion.txt
```

If we set `DRYRUN`, we'll see what the script would do: a "dry run".
```
# 1
# 1
# 1
# 0
# 1 # (1)!
# would be written to sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D # (2)!
tat2 sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz -censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D # (3)!
```

1. Output of `fd_calc`, truncated by `drytee`, prefixed with `#\t`, and sent to stderr.
2. `drytee` also mentions what file it would have created. This file still does not exist.
3. `dryrun` shows but does not run the `tat2` command.
### Source/Debug

Because the bash file is only functions and `iffmain` does not run when sourced, we can debug with `source`. Here we'll run the `create_censor` function defined in `tat2_all.bash` to check that it does what we expect.

```bash
source tat2_all.bash
create_censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz
cat sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
```
[^1]: `dryrun`'s name is taken from the rsync `--dry-run` option. `perl-rename` aliases `--dry-run` with `--just-print`.
[^2]: `set -e` ("exit on an error") is especially disruptive: one mistyped command and your interactive shell closes itself.
[^3]: Sourcing a shell script is useful for running same-file tests with bats and/or for embedding the current file in other scripts to reuse function definitions. See [Sourcing](#sourcing).