How to make Bash scripts faster?


DIY commands

We already know that many commands shipped with the operating system as standalone binaries (basename, dirname, etc.) either have a corresponding built-in command or can easily be replaced with a single shell expansion. In other cases, we need to try a bit harder to reproduce the functionality we need. Consider the following example:
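
As a reminder of what such a replacement looks like, here is a minimal sketch that uses parameter expansion in place of basename and dirname. The variable names and the example path are illustrative assumptions. The expansions also do not cover every edge case the external commands handle, such as trailing slashes or paths without any slash.

```shell
#!/usr/bin/env bash
path="/usr/local/lib/libfoo.so"   # hypothetical example value

base="${path##*/}"   # remove the longest prefix ending in '/', like basename
dir="${path%/*}"     # remove the shortest suffix starting with '/', like dirname

echo "${base}"       # libfoo.so
echo "${dir}"        # /usr/local/lib
```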

time for ((i=0; i<10000; i++)); do
    DIR=$(realpath -e "${HOME}")      # External command
done >/dev/null

real    0m9,044s
user    0m6,833s
sys     0m2,502s

realpath is an external command which returns an absolute, physical (i.e. without symbolic links) path for the path provided as a parameter. For directories, this goal can also be achieved using a combination of two built-in commands: cd and pwd:

time for ((i=0;i<10000;i++)); do
    DIR=$(cd "${HOME}" && pwd -P)  # Two built-in commands
                                   # Still one subprocess
done >/dev/null

real    0m4,083s
user    0m2,963s
sys     0m1,387s

In this case, command substitution is not a mere convenience: running cd in a subshell guarantees that the current working directory of the parent process remains unchanged. Using built-in commands instead of an external one saves us more than 50% of the time (about 4.1 s instead of 9.0 s). But can we somehow get rid of the subprocess as well?
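
The isolation provided by the subshell can be checked with a short sketch. The variable names before and inside are illustrative, and / is used only because it exists on every system:

```shell
#!/usr/bin/env bash
before="${PWD}"

# cd runs inside the command substitution, i.e. in a subshell,
# so it cannot change the working directory of this script.
inside="$(cd / && pwd -P)"

echo "subshell was in: ${inside}"
echo "parent is still in: ${PWD}"
```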

Re-implementing realpath without a subshell requires an extra command to return to the directory the process was in prior to cd. This can be done using either cd - or the pushd/popd commands:

cd "${HOME}" && real_home="${PWD}" && cd -

or if we care about resolving symbolic links:

set -P
cd "${HOME}" && real_home="${PWD}" && cd -
set +P
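
Two variations on the same idea, given as a sketch rather than a recommendation: the -P option of cd resolves symbolic links for a single call, and pushd/popd return through the directory stack instead of cd -. The scratch directory, the symlink, and all variable names below are assumptions made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch setup: a real directory and a symlink pointing at it.
tmp="$(mktemp -d)"
mkdir "${tmp}/real"
ln -s "${tmp}/real" "${tmp}/link"

# Variant 1: cd -P resolves symlinks for this one call only,
# so the set -P / set +P pair is not needed.
cd -P "${tmp}/link" && via_cd="${PWD}" && cd - >/dev/null

# Variant 2: pushd/popd remember the previous directory on a stack.
# Both print the stack, so their output is discarded.
set -P
pushd "${tmp}/link" >/dev/null && via_pushd="${PWD}" && popd >/dev/null
set +P

echo "${via_cd}"      # path ends in /real, not /link
echo "${via_pushd}"

rm -rf -- "${tmp}"    # clean up the scratch directory
```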

How much faster is our built-in version?

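time for ((i=0; i<10000; i++)); do
    set -P                                       # Resolve symbolic links in cd
    cd "${HOME}" && real_home="${PWD}" && cd -   # Built-ins only, no subprocess
    set +P
done >/dev/null                                  # cd - prints the directory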

real    0m0,100s
user    0m0,064s
sys     0m0,036s

While the performance improvement is definitely satisfying, what suffers here is readability and reusability. Fortunately, both of these issues can be resolved with a single feature.

Functions

Functions in Bash are basically user-defined compound commands. They have human-readable names, parameters, an exit status, and even their own redirections. Functions are executed within the current shell environment (i.e. they don’t require a subshell), which makes them promising candidates for “custom built-ins”.

An important feature of functions is that, unlike command substitutions, they can modify variables from the outer scope. This is often used to pass a result to the caller (as the return statement only sets the exit status):

my_realpath() {
    # The path is the only argument; the result goes to the global REALPATH
    cd "${1}" && REALPATH="${PWD}" && cd -
} >/dev/null

The downside of this approach is that we need to ensure the global variable is not used for anything else in the script, which might prove difficult in some cases. A more flexible way is passing by reference: the name of the output variable is passed as an argument and bound to a local name reference (local -n, available since Bash 4.3):

my_realpath() {
    local -n dir="${1}"
    cd "${2}" && dir="${PWD}" && cd -
} >/dev/null
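
A short usage sketch for the function above, repeated here so the snippet runs on its own. The variable names result and before, and the argument /, are illustrative. One caveat to keep in mind: the caller should not pass a variable that is itself named dir, since that collides with the name reference inside the function and Bash is likely to complain about a circular name reference.

```shell
#!/usr/bin/env bash
my_realpath() {
    local -n dir="${1}"     # dir is now an alias for the caller's variable
    cd "${2}" && dir="${PWD}" && cd -
} >/dev/null                # hides the directory printed by cd -

before="${PWD}"

# The exit status of the function tells us whether cd succeeded.
if my_realpath result "/"; then
    echo "resolved: ${result}"      # resolved: /
fi
echo "still in: ${PWD}"
```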

Finally, let’s check the performance of this solution:

time for ((i=0; i<10000; i++)); do
    my_realpath DIR "${HOME}"
done

real    0m0,164s
user    0m0,132s
sys     0m0,032s

It turns out to be a bit slower than the inlined version (due to the overhead of calling a function), but still more than an order of magnitude faster than command substitution (0.164 s versus 4.083 s).
