Long arguments and getopt

I recently needed to adapt a script that recrawls a site with Nutch. One of my design goals was to use the same command line options as the Fetchtool (one of the steps I had to take to recrawl a site).

It became apparent fairly quickly that bash’s built-in ‘getopts’ didn’t support long command line arguments, so I had to fall back on getopt.

Here is the portion of the script that parses the command line arguments:

set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
[ $# -eq 0 ] && usage

while [ $# -gt 0 ]
do
    case "$1" in
       --depth)   depth=$2;shift;;
       --adddays) adddays=$2;shift;;
       --topN)    topN=$2;shift;;
       -h)        usage;;
       --)        shift;break;;
       -*)        usage;;
       *)         break;;            #better be the crawl directory
    esac
    shift
done

Deconstructing this bit by bit:

set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
[ $# -eq 0 ] && usage

‘set --’ replaces the existing positional parameters with the result of getopt.
The call to getopt works like this:

  • -n$0, sets the name getopt uses when it reports errors to the name of the script (so warnings come back nicely from getopt)
  • -u, leaves the output unquoted, so the shell can split it into words directly (this breaks if an argument contains whitespace)
  • -a, allows long options to start with a single ‘-’ (they usually start with two, ‘--’)
  • --longoptions="depth: adddays: topN:", declares the long options. In this case I have 3 (depth, adddays, and topN). The
    trailing colon indicates that the option requires an argument.
  • "h", the short options (-h)
  • "$@", the arguments passed into the script

The ‘||’ at the end of the first line is meant to call my usage function if getopt reports an error (a non-zero return code). The second line makes sure we get at least one argument back.
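One caveat worth checking on your own system: as far as I can tell, the exit status of a ‘set -- ...’ line is the status of set itself, not of the command inside the backticks, so the ‘||’ may never fire. The sketch below uses false as a stand-in for a failing getopt call; the function names are made up for illustration and are not part of the original script.

```shell
# Illustrative only: `false` stands in for a getopt call that fails.

# The status tested by `||` here is the status of `set`, which succeeds,
# so the failure of the command substitution goes unnoticed.
masked_status() {
    set -- `false` || return 1
    return 0
}

# An assignment made only of a command substitution returns that command's
# status, so the failure is visible before `set` runs.
captured_status() {
    args=`false` || return 1
    set -- $args
    return 0
}

masked_status   && echo "masked: failure not detected"
captured_status || echo "captured: failure detected"
```

Applied to the script, this would mean storing the getopt output in a variable first, testing that assignment with ‘|| usage’, and only then running set -- on the variable.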

To help understand what goes on next, let's run that command at the shell:

$ getopt -nrecrawl.sh -u -a --longoptions="depth: adddays: topN:" "h" -depth 5 -adddays 10 -topN 3 -h -x
recrawl.sh: unrecognized option `-x'
 --depth 5 --adddays 10 --topN 3 -h --

A few items of note. The first is the warning message we get because ‘x’ is an unknown option (notice it is prefaced by what we supplied to the -n argument). The second is getopt's normalized version of my command line: each long option now starts with two dashes, and ‘--’ marks the end of the options.
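To see what ‘set --’ does with a line like that, here is a small sketch that does not need getopt at all. It hard-codes the output shown above and prints the resulting positional parameters; show_params is a hypothetical helper, not part of the original script.

```shell
# Hypothetical helper: split a getopt-style output string into positional
# parameters and print each one with its index.
show_params() {
    # Unquoted on purpose: the shell splits the string on whitespace, which is
    # how set -- `getopt -u ...` ends up filling $1, $2, and so on.
    set -- $1
    i=1
    for arg in "$@"; do
        printf '$%d=%s\n' "$i" "$arg"
        i=$((i + 1))
    done
}

show_params " --depth 5 --adddays 10 --topN 3 -h --"
```

Each option and each value ends up in its own parameter, which is what the parsing loop relies on. It is also why unquoted output only works when no value contains whitespace.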

Now to interpret the results:

while [ $# -gt 0 ]
do
    case "$1" in
       --depth)   depth=$2;shift;;
...
       -h)        usage;;
       --)        shift;break;;
       -*)        usage;;
       *)         break;;            #better be the crawl directory
    esac
    shift
done
  

The while loop walks through each of the arguments returned from getopt. If an option requires a value, I use $2 to snag that value and assign it to a variable, then shift once more to skip over it. When we are done with an argument, we shift past it and move on to the next one. A few special cases exist:

  • -*) matches any unknown option and prints the usage statement
  • *) matches any other argument (in this case it is our required directory)
  • --) is the end-of-options marker from getopt; it has to come before -*) in the case statement, otherwise -*) would match it first
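The loop can be tried on its own by wrapping it in a function and feeding it arguments that are already in getopt's normalized form. This is only a sketch: parse_normalized, the crawl_dir variable, the usage text and the /data/crawl path are assumptions made for the example, and usage here prints and returns instead of exiting so it is safe to run in an interactive shell.

```shell
# Hypothetical stand-in for the script's usage function.
usage() {
    echo "usage: recrawl.sh [-depth N] [-adddays N] [-topN N] crawl_dir" >&2
}

# Hypothetical wrapper around the loop; expects arguments as getopt emits them.
parse_normalized() {
    depth= adddays= topN= crawl_dir=
    while [ $# -gt 0 ]
    do
        case "$1" in
            --depth)   depth=$2; shift;;      # extra shift skips the value
            --adddays) adddays=$2; shift;;
            --topN)    topN=$2; shift;;
            -h)        usage; return 1;;
            --)        shift; break;;         # end of options
            -*)        usage; return 1;;      # any other option is unknown
            *)         break;;                # first non-option argument
        esac
        shift
    done
    crawl_dir=$1                              # whatever is left after the options
}

parse_normalized --depth 5 --adddays 10 --topN 3 -- /data/crawl
echo "depth=$depth adddays=$adddays topN=$topN crawl_dir=$crawl_dir"
```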

The script then goes on to verify the parameters (for example, that the directory exists) and does the crawl... but that is for another day.
