DIY commands
We already know that many commands included in the operating system as standalone binaries (basename, dirname, etc.) either have a corresponding built-in command or can be easily replaced with a single shell expansion. In other cases, we might need to try a bit harder to reproduce the functionality we need. Consider the following example:
time for ((i=0; i<10000; i++)); do
DIR=$(realpath -e "${HOME}") # External command
done >/dev/null
real 0m9,044s
user 0m6,833s
sys 0m2,502s
realpath is an external command which returns an absolute, physical (i.e. without symbolic links) path for the path provided as a parameter. For directories, this goal can also be achieved using a combination of two built-in commands, cd and pwd:
time for ((i=0;i<10000;i++)); do
DIR=$(cd "${HOME}" && pwd -P) # Two built-in commands
# Still one subprocess
done >/dev/null
real 0m4,083s
user 0m2,963s
sys 0m1,387s
In this case, command substitution is not a mere convenience: running the commands in a subshell guarantees that the working directory of the parent process remains unchanged. Replacing the external command with built-ins cuts the run time by more than 50%. But can we somehow get rid of the subprocess?
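To make the isolation concrete, here is a minimal sketch showing that a cd inside a command substitution does not leak into the calling shell. The variable names and the use of "/" as the target directory are our own choices for illustration.

```bash
#!/usr/bin/env bash
# Sketch: cd inside $(...) runs in a subshell, so the caller's
# working directory stays where it was.

before="${PWD}"

# "/" is used only because it exists on every system.
resolved=$(cd / && pwd -P)

after="${PWD}"

echo "resolved: ${resolved}"
echo "before:   ${before}"
echo "after:    ${after}"
```

The before and after lines should be identical, while resolved holds the path computed in the subshell.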
Re-implementing realpath without a subshell requires an extra command to return to the directory the process was in prior to cd. This can be done using either cd - or the pushd/popd commands:
cd "${HOME}" && real_home="${PWD}" && cd -
or if we care about resolving symbolic links:
set -P
cd "${HOME}" && real_home="${PWD}" && cd -
set +P
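The pushd/popd alternative mentioned above is not shown, so here is a hedged sketch of it, together with a variant that uses cd -P to resolve symbolic links for a single call instead of toggling the shell option. The temporary directory, the symlink and all variable names are assumptions made up for this demonstration.

```bash
#!/usr/bin/env bash
# Sketch: two subshell-free alternatives to the snippets above.

# Throwaway directory with a symlink, purely for demonstration.
demo=$(mktemp -d)
mkdir "${demo}/real"
ln -s real "${demo}/link"
start="${PWD}"

# Variant 1: pushd/popd instead of cd -.
# Both print the directory stack, hence the redirections.
pushd "${demo}/link" >/dev/null && logical_dir="${PWD}" && popd >/dev/null

# Variant 2: cd -P resolves symbolic links for this call only,
# so an existing set -P / set +P choice is left untouched.
cd -P "${demo}/link" && physical_dir="${PWD}" && cd - >/dev/null

echo "logical:  ${logical_dir}"
echo "physical: ${physical_dir}"

rm -r "${demo}"
```

With default shell options, the first path should end in /link and the second in /real. We have not timed these variants, so the measurements in this section apply only to the original snippets.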
How much faster is our built-in version?
time for ((i=0; i<10000; i++)); do
set -P
cd "${HOME}" && real_home="${PWD}" && cd -
set +P
done >/dev/null
real 0m0,100s
user 0m0,064s
sys 0m0,036s
While the performance improvement is definitely satisfying, what suffers here is readability and reusability. Fortunately, both of these issues can be resolved with a single feature.
Functions
Functions in Bash are basically user-defined compound commands. They have human-readable names, parameters, exit codes and even their own redirections. Functions are executed within the current environment (i.e. they don’t require a subshell), which makes them promising candidates for “custom built-ins”.
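The features listed above can be sketched in a few lines. The function name bump, the counter variable and the status value 3 are hypothetical and serve only as an illustration.

```bash
#!/usr/bin/env bash
# Sketch: parameters, exit status, a redirection attached to the
# definition, and execution in the current shell.

counter=0

bump() {
    echo "bumping by ${1}"     # swallowed by the redirection below
    (( counter += ${1} ))      # current shell, so the change persists
    return 3                   # explicit exit status
} >/dev/null                   # applies to every call of bump

status=0
bump 5 || status=$?

echo "counter=${counter} status=${status}"   # prints: counter=5 status=3
```

Had bump been called as $(bump 5), the change to counter would have been lost together with the subshell.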
An important feature of functions is that, unlike command substitutions, they can modify variables from the outer scope. This is often used to pass a result to the caller (as the return statement applies only to status codes):
my_realpath() {
cd "${1}" && REALPATH="${PWD}" && cd -
} >/dev/null
The downside of this approach is that we need to ensure the global variable is not used for anything else in the script, which might prove difficult in some cases. A more flexible way is to pass the result by reference: the name of the output variable is passed as an argument and bound to a local name reference (local -n):
my_realpath() {
local -n dir="${1}"
cd "${2}" && dir="${PWD}" && cd -
} >/dev/null
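A short usage sketch of the by-reference version follows. It repeats the function so that it can be run on its own; the variable names result and start and the choice of "/" as the target are ours. Note that local -n requires Bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Sketch: calling the nameref-based function from above.

my_realpath() {
    local -n dir="${1}"        # dir becomes an alias for the caller's variable
    cd "${2}" && dir="${PWD}" && cd -
} >/dev/null

start="${PWD}"

my_realpath result /           # "/" is just an example directory
echo "result=${result}"        # prints: result=/

# Caveat: an output variable that is itself called "dir" clashes with
# the local nameref, and Bash complains about a circular name reference.
```

One common way around the caveat is to give the local nameref an unlikely name (for example a prefixed one), so that callers are free to pick any output variable.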
Finally, let’s check the performance of this solution:
time for ((i=0; i<10000; i++)); do
my_realpath DIR "${HOME}"
done
real 0m0,164s
user 0m0,132s
sys 0m0,032s
It turns out to be a bit slower than inlining (due to the overhead of calling a function), but still roughly 25 times faster than the command substitution version.