Linux Fu: Bash Strings


If you are a traditional programmer, using bash for scripting may seem limiting sometimes, but for certain tasks, bash can be very productive. It turns out that some of the limits of bash are really limits of older shells, and people code to those limits to stay compatible. Still other perceived issues arise because some of the advanced functions in bash are arcane or confusing.

Strings are a good example. You don’t think of bash as a string manipulation language, but it has many powerful ways to handle strings. In fact, it may have too many ways, since the functionality winds up in more than one place. Of course, you can also call out to programs, and sometimes it is just easier to make a call to an awk or Python script to do the heavy lifting.

But let’s stick with bash-isms for handling strings. Obviously, you can put a string in a shell variable and pull it back out. I am going to assume you know how string interpolation and quoting work. In other words, this should make sense:

echo "Your path is $PATH and the current directory is ${PWD}"

The Long and the Short

Suppose you want to know the length of a string. That’s a pretty basic string operation. In bash, you can write ${#var} to find the length of $var:


#!/bin/bash
echo -n "Project Name? "
read -r PNAME
if (( ${#PNAME} > 16 ))
then
   echo "Error: Project name longer than 16 characters"
else
   echo "${PNAME} it is!"
fi

The “((” forms an arithmetic context, which is why you can get away with an unquoted greater-than sign here. If you don’t mind using expr, which is an external program, there are at least two more ways to get there. Note that length and match are GNU extensions to expr, so they may not be available on every system:


echo ${#STR}
expr length "${STR}"
expr match "${STR}" '.*'

Of course, if you allow yourself to call outside of bash, you could use awk or anything else to do this, too, but we’ll stick with expr as it is relatively lightweight.

Swiss Army Knife

In fact, expr can do a lot of string manipulations in addition to length and match. You can pull a substring from a string using substr. It is often handy to use index to find a particular character in the string first. The expr program uses 1 as the first character of the string. So, for example:


#!/bin/bash
echo -n "Full path? "
read -r FFN
LAST_SLASH=0
SLASH=$( expr index "$FFN" / ) # find first slash
while (( SLASH != 0 ))
do
   let LAST_SLASH=$LAST_SLASH+$SLASH  # point at next slash
   SLASH=$( expr index "${FFN:$LAST_SLASH}" / )  # look for another
done
# now LAST_SLASH points to last slash
echo -n "Directory: "
expr substr "$FFN" 1 "$LAST_SLASH"
echo -or-
echo "${FFN:0:$LAST_SLASH}"
# Yes, I know about dirname but this is an example

Enter a full path (like /foo/bar/hackaday) and the script will find the last slash and print the name up to and including the last slash using two different methods. This script makes use of expr but also uses the syntax for bash’s built-in substring extraction, which starts at index zero. For example, if the variable FOO contains “Hackaday”:

  • ${FOO} -> Hackaday
  • ${FOO:1} -> ackaday
  • ${FOO:5:3} -> day

The first number is an offset and the second is a length if it is positive. You can also make either of the numbers negative, although you need a space after the colon if the offset is negative. The last character of the string is at index -1, for example. A negative length is shorthand for an absolute position from the end of the string. So:

  • ${FOO: -3} -> day
  • ${FOO:1:-4} -> ack
  • ${FOO: -8:-4} -> Hack

Of course, either or both numbers could be variables, as you can see in the example.
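
As a rough illustration of offsets and lengths working together, here is a minimal sketch that slices a fixed-width date stamp into fields. The variable names and the sample value are made up for this example, and the negative length form assumes a reasonably recent bash (4.2 or later, as far as I know).

```shell
#!/bin/bash
# Hypothetical example: split a fixed-width date stamp into fields.
STAMP=20240131
YEAR=${STAMP:0:4}      # offset 0, length 4
MONTH=${STAMP:4:2}     # offset 4, length 2
DAY=${STAMP: -2}       # last two characters (note the space after the colon)
MIDDLE=${STAMP:4:-2}   # from offset 4 up to 2 characters before the end
echo "$YEAR-$MONTH-$DAY"   # prints 2024-01-31
```

Here MIDDLE ends up the same as MONTH, which shows that a negative length is really an end position counted from the back of the string.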

Less is More

Sometimes you don’t want to find something, you just want to get rid of it. bash has lots of ways to remove substrings using fixed strings or glob-based pattern matching. There are four variations. A # deletes from the front of the string and a % deletes from the back; a single character removes the shortest possible match and a doubled character removes the longest. Consider this:


TSTR=my.first.file.txt
echo ${TSTR%.*} # prints my.first.file
echo ${TSTR%%.*}  # prints my
echo ${TSTR#*fi}  # prints rst.file.txt
echo ${TSTR##*fi} # prints le.txt
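
A common use for these four operators is picking a path apart without calling dirname or basename. The sketch below assumes a made-up path in a variable named FULL; the other variable names are equally hypothetical.

```shell
#!/bin/bash
# Hypothetical example: take a path apart with the deletion operators.
FULL=/foo/bar/archive.tar.gz
DIR=${FULL%/*}         # shortest match from the back:  /foo/bar
FILE=${FULL##*/}       # longest match from the front:  archive.tar.gz
BASE=${FILE%%.*}       # longest match from the back:   archive
EXT=${FILE#*.}         # shortest match from the front: tar.gz
LAST_EXT=${FILE##*.}   # longest match from the front:  gz
echo "$DIR | $FILE | $BASE | $EXT | $LAST_EXT"
```

This is not a drop-in replacement for dirname: if the string contains no slash at all, the pattern never matches and DIR is simply the whole string.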

Transformation

Of course, sometimes you don’t want to delete as much as you want to replace some string with another string. You can use a single slash to replace the first instance of a search string or two slashes to replace globally. You can also omit the replacement string and you’ll get another way to delete parts of strings. One other trick is to add a # or % after the slash to anchor the match to the start or end of the string, much like with a deletion.


TSTR=my.first.file.txt
echo ${TSTR/fi/Fi}   # my.First.file.txt
echo ${TSTR//fi/Fi}  # my.First.File.txt
echo ${TSTR/#*./PREFIX-} # PREFIX-txt  (note: always longest match)
echo ${TSTR/%.*/.backup}  # my.backup (note: always longest match)
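
To make the variations concrete, here is a small sketch that cleans up a file name. The file name and variable names are invented for the example.

```shell
#!/bin/bash
# Hypothetical example: tidy up a file name with pattern replacement.
NAME="my first file.txt"
UNDERSCORED=${NAME// /_}            # replace every space: my_first_file.txt
NO_SPACES=${NAME// /}               # empty replacement deletes: myfirstfile.txt
MARKDOWN=${UNDERSCORED/%.txt/.md}   # anchored at the end: my_first_file.md
DRAFT=${UNDERSCORED/#my/our}        # anchored at the start: our_first_file.txt
echo "$UNDERSCORED $NO_SPACES $MARKDOWN $DRAFT"
```

Remember that the search string is a glob pattern, not a regular expression, so the dot in .txt is just a literal dot.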

Miscellaneous

Some of the more common ways to manipulate strings in bash have to do with dealing with parameters. Suppose you have a script that expects a variable called OTERM to be set but you want to be sure:


REALTERM=${OTERM:-vt100}

Now REALTERM will have the value of OTERM or the string “vt100” if there was nothing in OTERM. Sometimes you want to set OTERM itself so while you could assign to OTERM instead of REALTERM, there is an easier way. Use := instead of the :- sequence. If you do that, you don’t necessarily need an assignment at all, although you can use one if you like:


echo ${OTERM:=vt100}  # now OTERM is vt100 if it was empty before

You can also reverse the sense so that you replace the value only if the main value is not empty, although that’s not as generally useful:


echo ${DEBUG:+"Debug mode is ON"}  # the reverse of :-; prints the text only if DEBUG is non-empty, no assignment

A more drastic measure lets you print an error message to stderr and abort a non-interactive shell:


REALTERM=${OTERM:?"Error. Please set OTERM before calling this script"}
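
These expansions work on positional parameters too, which makes them handy for optional function arguments. The following is only a sketch; the function describe_term and its arguments are hypothetical. It also shows one detail worth knowing: without the colon, the default applies only when the variable is unset, not when it is set but empty.

```shell
#!/bin/bash
# Hypothetical helper: optional arguments with defaults.
describe_term() {
   local term=${1:-vt100}           # default when $1 is unset or empty
   local note=${2:+" (note: $2)"}   # expands only when $2 is non-empty
   echo "Terminal: ${term}${note}"
}

describe_term                  # Terminal: vt100
describe_term xterm            # Terminal: xterm
describe_term xterm color      # Terminal: xterm (note: color)

# With and without the colon:
EMPTY=""
WITH_COLON=${EMPTY:-fallback}     # empty counts as missing: fallback
WITHOUT_COLON=${EMPTY-fallback}   # set but empty is kept: (empty string)
```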

Just in Case

Converting things to upper or lower case is fairly simple. A ^ converts to upper case and a comma converts to lower case; a single operator changes only the first character, while a doubled one changes every matching character. You can provide a glob pattern that matches a single character. If you omit it, it is the same as ?, which matches any character. Here are the obligatory examples:


NAME="joe Hackaday"

echo ${NAME^} # prints Joe Hackaday (first match of any character)
echo ${NAME^^} # prints JOE HACKADAY (all of any character)
echo ${NAME^^[a]} # prints joe HAckAdAy (all a characters)
echo ${NAME,,} # prints joe hackaday (all characters)
echo ${NAME,} # prints joe Hackaday (first character matched and didn't convert)
NAME="Joe Hackaday"
echo ${NAME,,[A-H]} # prints Joe hackaday (apply pattern to all characters and convert A-H to lowercase)

Recent versions of bash (5.1 and later, according to the manual) can also convert to upper and lower case using ${VAR@U} and ${VAR@L}, and can capitalize just the first character using ${VAR@u}, but your mileage may vary.
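
One place the case operators earn their keep is comparing strings without regard to case. Here is a minimal sketch with two made-up helper functions, same_word and capitalize; it assumes bash 4.0 or later, where the ^ and comma operators are available.

```shell
#!/bin/bash
# Hypothetical helpers built on the case-conversion operators.

# Succeeds if the two arguments match, ignoring case.
same_word() {
   [[ ${1,,} == "${2,,}" ]]
}

# Lowercase everything, then uppercase the first character.
capitalize() {
   local lower=${1,,}
   echo "${lower^}"
}

if same_word "Hackaday" "HACKADAY"
then
   echo "Same word"
fi
capitalize "hACKADAY"    # prints Hackaday
```

Quoting the right-hand side of == keeps it from being treated as a glob pattern, so the comparison is a plain string match.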

Pass the Test

You might assume that a standard test calls a program:


if [ "$f" -eq 0 ]
then ...

If you do an ls on /usr/bin, you’ll see an executable actually named “[” used as a shorthand for the test program. In practice, bash provides both [ and test as builtins, so the external program is normally not the one that runs. Beyond that, bash has its own extended test in the form of two brackets:


if [[ $f == 0 ]]
then ...

That test built-in can handle regular expressions using =~ so that’s another option for matching strings:


if [[ "$NAME" =~ [hH]a.k ]] ...
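
A regular expression match can also capture pieces of the string. After a successful =~ test, bash fills the BASH_REMATCH array: element 0 holds the whole match and the following elements hold the parenthesized groups. The sketch below uses an invented version string and variable names; storing the pattern in a variable first is a common way to sidestep quoting headaches.

```shell
#!/bin/bash
# Hypothetical example: pull fields out of a string with =~ and BASH_REMATCH.
VERSION="bash-5.2.15"
RE='^([a-z]+)-([0-9]+)\.([0-9]+)'   # name, major, minor

if [[ $VERSION =~ $RE ]]
then
   PROG=${BASH_REMATCH[1]}
   MAJOR=${BASH_REMATCH[2]}
   MINOR=${BASH_REMATCH[3]}
   echo "$PROG major=$MAJOR minor=$MINOR"   # prints bash major=5 minor=2
else
   echo "No match"
fi
```

Note that $RE is deliberately unquoted on the right side of =~, because quoting it would make bash treat it as a literal string rather than a pattern.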

Choose Wisely

Of course, if you are doing a slew of text processing, maybe you don’t need to be using bash. Even if you are, don’t forget you can always leverage other programs like tr, awk, sed, and many others to do things like this. Sure, performance won’t be as good — probably — but if you are worried about performance why are you writing a script?

Unless you just swear off scripting altogether, it is nice to have some of these tricks in your back pocket. Use them wisely.