Shell Tools
Jump right in with an example.
dryrun
`dryrun` runs a command only when the `$DRYRUN` environment variable is not set.[^1]
Also see `try`, comparable to `make -n`.
```
$ echo hi > myfile
$ export DRYRUN=
$ dryrun rm myfile # (1)!
$ cat myfile
cat: myfile: No such file or directory
```

1. nothing is printed. `rm` runs silently, as if `dryrun` was not there (an empty `DRYRUN` counts as unset).
It's worth noting that bash allows an environment variable to be set and scoped to a single command by prefacing the call with `var=val`. For `dryrun`-enabled scripts and functions, this means starting with `DRYRUN=1` for the "just print" version: the command (e.g. `rm myfile`) is printed but not run, and afterwards `$DRYRUN` is not set; it was set only for the call where it was explicitly declared.
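To make that concrete, here is a minimal sketch (an illustrative reimplementation, not the tool's actual source) together with a one-command `DRYRUN=1` call:

```shell
# Illustrative minimal dryrun: print the command to stderr when DRYRUN is
# non-empty, otherwise run it as given.
dryrun() {
  if [ -n "${DRYRUN:-}" ]; then
    echo "$@" >&2
  else
    "$@"
  fi
}

touch /tmp/dryrun_demo_file
DRYRUN=1 dryrun rm /tmp/dryrun_demo_file   # printed, not run: the file survives
[ -e /tmp/dryrun_demo_file ] && echo "file still exists"
dryrun rm /tmp/dryrun_demo_file            # DRYRUN is unset again here: rm runs
[ -e /tmp/dryrun_demo_file ] || echo "file removed"
```

Because `DRYRUN=1` prefixes a single call, the variable is restored (unset) as soon as that call returns.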
drytee
`drytee` works like `dryrun`, but for output: it captures stdin and writes it to a file unless `$DRYRUN` is set. It's like the command `tee`, except that it writes to standard error instead when the user wants a dry run.
```
$ echo hi | drytee myfile
$ cat myfile
hi # (1)!
$ DRYRUN=1
$ echo bye | drytee myfile
# bye
# would be written to myfile
$ cat myfile
hi # (2)!
```

1. `myfile` was written ("hi") b/c `DRYRUN` is not set
2. `myfile` is unchanged; `bye` was not written
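The session above could be reproduced with a minimal stand-in (an illustrative sketch, not the tool's actual source; it omits the output truncation the real tool performs):

```shell
# Illustrative minimal drytee: write stdin to the named file, or (when DRYRUN
# is set) echo a commented preview and a note to stderr instead.
drytee() {
  if [ -n "${DRYRUN:-}" ]; then
    sed 's/^/# /' >&2
    echo "# would be written to $1" >&2
  else
    tee "$1" >/dev/null
  fi
}

echo hi | drytee /tmp/drytee_demo    # DRYRUN unset: file is written
cat /tmp/drytee_demo                 # hi
DRYRUN=1
echo bye | drytee /tmp/drytee_demo   # "# bye" and a note go to stderr
cat /tmp/drytee_demo                 # still hi; bye was not written
```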
warn
`warn` could be written `echo "$@" >&2`. It simply writes its arguments to standard error (fd 2) instead of standard output. This is useful to avoid shell capture into a variable or a file: with `a=$(warn 'oh no'; echo 'results')`, 'oh no' is seen on the terminal because it's written to stderr, while "results" on stdout is captured into `$a`.
A contrived example of giving a warning that doesn't end up in the output (but still potentially notifies the user):
```bash
# create a file of n lines, sequentially numbered
filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}
```
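Assuming the one-line `warn` from the text, we can check that the warning stays out of captured output:

```shell
warn() { echo "$@" >&2; }   # the one-liner from the text

filelines(){
  n="$1"
  [ "$n" -lt 2 ] && warn "# WARNING: n=$n < 2. limited output"
  printf "%s\n" $(seq 1 "$n")
}

a=$(warn 'oh no'; echo 'results')   # 'oh no' goes to the terminal (stderr)
echo "$a"                           # results

out=$(filelines 1 2>/dev/null)      # warning discarded; stdout captured
echo "$out"                         # 1
```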
waitforjobs
waitforjobs tracks the number of forked child processes. It waits `SLEEPTIME` and polls the count until there are fewer than `MAXJOBS` jobs running. It uses shell job control facilities and is useful for local, single-user, or small servers. On HPC, you'd use `sbatch` from e.g. slurm or torque. Other alternatives include `bq` and task-spooler. GNU Parallel and Make also have job-dispatching facilities.
1. `sleep` here is a stand-in for a more useful long-running command to be parallelized.
2. `waitforjobs` will exit the final loop with MAXJOBS-1 jobs still running. The trailing `wait` will wait for those (but won't have the notifications every SLEEPTIME). Could consider `waitforjobs -p 1` instead.
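The usage pattern those notes describe can be sketched with a minimal polling stand-in for `waitforjobs` (the real tool also reports counts while waiting and supports options like `-p`):

```shell
MAXJOBS=3
# Minimal stand-in: poll until fewer than MAXJOBS background jobs are running.
waitforjobs() {
  while [ "$(jobs -rp | wc -l)" -ge "$MAXJOBS" ]; do
    sleep 0.1
  done
}

for i in $(seq 1 6); do
  sleep 0.5 &        # stand-in for a long-running command
  waitforjobs        # block until a job slot frees up
done
wait                 # wait for the final MAXJOBS-1 still running
echo "all jobs finished"
```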
When running locally, the output shows the running-job count each time `waitforjobs` polls.
Arguments
`-c auto` is worth exploring in more detail. With this option, a temporary file like `/tmp/host-user-basename.jobcfg` is created. Modifying the sleep and job settings in that file will affect the `waitforjobs` process watching it: you can change the number of cores to use in real time!
iffmain
In a script where `main_function` is a defined function, `iffmain` is used at the end by evaluating the shell code it generates.
Defensive shell scripting calls for `set -euo pipefail`, but running that (e.g. via `source`) on the command line will break other scripts and the normal interactive shell.[^2] `iffmain` is modeled after the python idiom `if __name__ == "__main__":`. When the script is not sourced, it toggles the ideal settings and sets a standard trap to notify on error.
Sourcing
Using `iffmain` makes it easier to write bash scripts that are primarily functions. Scripts styled this way are easy to source and test.
A bash file that can be sourced can be reused and tested. See Bash Test Driven Development.
Template
`iffmain` generates shell code that looks like

```bash
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  MAINFUNCNAME "$@"
  exit $?
fi
```
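To see the `caller` check in action, we can write the template into a toy script (the file name and `main` are illustrative) and compare executing it with sourcing it:

```shell
# Write a toy script that uses the generated template verbatim.
cat > /tmp/iffmain_demo.bash <<'EOF'
main() { echo "ran main"; }
if [[ "$(caller)" == "0 "* ]]; then
  set -euo pipefail
  trap 'e=$?; [ $e -ne 0 ] && echo "$0 exited in error $e"' EXIT
  main "$@"
  exit $?
fi
EOF

bash /tmp/iffmain_demo.bash     # executed as a file: runs main
source /tmp/iffmain_demo.bash   # sourced: the if-block is skipped...
type main >/dev/null && echo "main is now defined"   # ...but main is available
```

When executed, `caller` at the top level reports frame `0`, so the block runs; when sourced, `caller` reports the line of the `source` call instead, so only the function definitions take effect.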
Example Script
As an example, we'll use `drytee`, `dryrun`, and `waitforjobs` in the script `tat2_all.bash` to

- run `tat2` (`tat2_single`) on a collection of bold files
- in parallel (`all_parallel`), and
- do a few checks (`input_checks`) beforehand.

We'll support

- printing what the script would do instead of actually doing it (`dryrun` and `drytee`), and
- using hygienic shell settings (e.g. `set -euo pipefail`) only when run as a file but not when sourced[^3]
- `drytee` writes to the specified file unless `DRYRUN` is set; then it truncates the output and writes it to stderr.
- `dryrun` echos everything after it to stderr if `DRYRUN` is set. Otherwise, it runs the command.
- `waitforjobs` watches the children of the current process and sleeps until there are fewer than 10 running.
- `iffmain` generates bash code. It runs `set -euo pipefail` and the specified function only if the file is not sourced -- e.g. `bash tat2_all.bash` or `./tat2_all.bash`[^3]
- `warn` sends a message to `stderr` so it doesn't get included in any eval/capture -- `a=$(warn 'oh no'; echo 'yes')` yields `a="yes"`
In Use
Suppose we have files like

```
sub-1
└── ses-1
    └── func
        ├── sub-1_ses-1_func_task-rest_bold.nii.gz
        └── sub-1_ses-1_func_task-rest_motion.txt
```
If we set DRYRUN, we'll see what the script would do: a "dry run".
```
# 1
# 1
# 1
# 0
# 1 # (1)!
# would be written to sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D # (2)!
tat2 sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz -censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D # (3)!
```

1. output of `fd_calc`, `drytee`-truncated, prefixed with `#\t` and sent to stderr
2. `drytee` also mentions what file it would have created. This file still does not exist.
3. `dryrun` shows but does not run the `tat2` command.
Source/Debug
Because the bash file contains only functions and the `iffmain` block does not run when sourced, we can debug with `source`.
Here we'll run the `create_censor` function defined in `tat2_all.bash` to check that it does what we expect.
```bash
source tat2_all.bash
create_censor sub-1/ses-1/func/sub-1_ses-1_func_task-rest_bold.nii.gz
cat sub-1/ses-1/func/sub-1_ses-1_func_task-rest_fdcen.1D
```
[^1]: `dryrun`'s name is taken from the rsync `--dry-run` option. `perl-rename` aliases `--dry-run` with `--just-print`.
[^2]: `set -e` "exit on an error" is especially disruptive. One typo'd command and your interactive shell closes itself.
[^3]: Sourcing a shell script is useful for running same-file tests with bats and/or embedding the current file in other scripts to reuse function definitions. See [Sourcing](#sourcing).