a complicated 'wc' script for mu + notes (Repost...)

From: Alfie Costa (agcosta@gis.net)
Date: Fri Mar 10 2000 - 19:15:12 CET

Here's another 'wc' script. It's based on the old mu-awk one.

Advantages over the old 'wc': 1) It understands the various 'wc' command line
switches, in most any order. The output looks like GNU wc, with totals and
hyphens. There are four columns, from left to right: lines, words, chars,
filename. Lines, words and chars are output in the same columns every time,
which is hopefully more obvious than letting them slide leftward when a count
is left out. For programs calling 'wc' for output, it shouldn't matter as the
columns are separated only by whitespace.

2) Takes standard IO or file lists, or both. Standard IO can also be named
explicitly with a hyphen.

3) Better character counts. Attached to this message is 'nullfile.gz', which
is 170 bytes compressed, 128K of nulls uncompressed. The old mu-awk 'wc'
doesn't understand that kind of file.
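
The fixed-column output from point 1 can be sketched as below. This is only an
illustration, not the script's own code: 'show_line' is a made-up name, and it
assumes a shell with a 'printf' command, where the script pipes through awk's
printf instead.

```shell
# show_line LINES WORDS CHARS FILENAME
# Empty arguments are allowed: the field is padded with spaces, so the
# remaining counts stay in their own columns.
show_line()
{
    printf '%7s%10s%12s %s\n' "$1" "$2" "$3" "$4"
}

show_line 12 34 567 foo.txt   # all three counts
show_line 12 "" "" foo.txt    # lines only; the other two columns stay blank
```

Because each field has a fixed width, a caller that splits on whitespace still
gets the counts it asked for, in order.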

Disadvantages: 1) Bigger, can't be helped, more features. Chopping out the
comments or abbreviating the variable names could reduce it some.

2) Uglier, it's another ash script, and has too many if-thens. There are weird
kludges. Any hints? A getopts command would probably help. (getopts might
make an interesting script.)

3) If you only want to count words and lines, it's as fast as the old one: it's
the same code. If you want to count chars from standard IO, it's slower
because it writes a temp file, and does an 'ls -o', and gets the temp file size
which is the same thing as counting the chars. Is there a better way? (In
ash, that is.) For named files, it's not too bad, as it doesn't need to make
any temp files. It still runs 'ls' once per file, which might be improved.

4) The line count and word count use the same awk code as the old script, so
on a big binary file awk may say it has only 1 word and 1 line; this seems
unlikely to be right. Haven't tested this.
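
The temp-file trick from point 3 can be sketched on its own as follows. The
function name and the temp file path are hypothetical, and the sketch reads
the size from field 5 of 'ls -ln' rather than field 4 of 'ls -o' as the script
does; that swap is an assumption made so the field position does not depend on
the 'ls' in use.

```shell
# count_chars: print the number of bytes arriving on standard input.
count_chars()
{
    tmp=/tmp/count_chars.$$                # hypothetical temp file name
    trap 'rm -f "$tmp"; exit 2' 1 2 3 15   # don't leave the file behind
    cat > "$tmp" || { rm -f "$tmp"; return 2; }
    ls -ln "$tmp" | awk '{ print $5 }'     # field 5 of 'ls -ln' is the size
    rm -f "$tmp"
    trap - 1 2 3 15
}

# Nulls are counted like any other byte, since only the file size is read.
printf 'ab\000\000cd\n' | count_chars
```

The cost is the one noted above: every byte of input is written to disk before
it is counted.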

Notes: 1) GNU wc gives an error message if you ask it to count a directory.
When my script sees a directory, it simply ignores it.

2) The command line options can be in any order. Example:

'ls -l | wc foo.txt t* -l - "/mnt/c/win/name with spaces.exe" '

This will count lines (-l), of foo.txt, t*, standard IO, (which would be the
output of 'ls -l'), and finally a vfat (or Win95) filename with spaces. Then
it gives a total.

The Code:

Various kludges, tricks, or whatever may be worth mentioning...

There are some functions which are supposed to make things more compact and
easier to read. Not sure if they really do.

The command line parsing is complicated. There are two 'for' loops. The first
one looks for switches with hyphens and sets variables; it checks each and
every argument on the command line. This loop also checks for one error (more
than one stdIO hyphen), as well as for the help switch. After the first loop,
the script checks if any of the flags are set; if they're not, it sets all
three of them. The second loop uses the 'case' statement to look for anything
that's more than one character and begins with a hyphen. The switches all get
dumped; everything else is assumed to be a filename and is kept. There are
quotes within quotes to preserve vfat filenames, which can have spaces in them.
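
For comparison with the two loops described above, here is roughly what the
getopts approach wondered about earlier might look like. It is a sketch only:
it assumes a shell that has a 'getopts' builtin, and 'parse_switches' is a
made-up name.

```shell
# parse_switches ARGS...
# Sets checkC/checkL/checkW to 0 (the script's "on" value) for each switch
# seen, then prints the three flags and how many leading arguments were
# consumed as switches.
parse_switches()
{
    checkC= checkL= checkW=
    OPTIND=1
    while getopts clwa opt "$@"
    do
        case "$opt" in
            c) checkC=0 ;;
            l) checkL=0 ;;
            w) checkW=0 ;;
            a) checkC=0 checkL=0 checkW=0 ;;
            *) echo "Usage: wc [-clw | -a] [filename]" >&2; return 2 ;;
        esac
    done
    echo "C=$checkC L=$checkL W=$checkW skip=`expr $OPTIND - 1`"
}

parse_switches -lw foo.txt
parse_switches -c -l foo.txt bar.txt
```

One difference worth noting: getopts stops at the first argument that is not a
switch, so 'wc foo.txt -l' would not be handled this way, while the two-loop
version finds switches wherever they are.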

Here's the quotes voodoo:

Z*) opts="$opts"" "\""$b"\" ;;

This is what decodes it:

eval set dummyoption $opts

I couldn't get it to work without the 'eval', but there may be a way. The
'dummyoption' and the 'shift' that follows it are a kludge. It may happen that
the first option is a hyphen or '-', which has a special meaning to 'set', but
only when the hyphen is the first thing 'set' sees on the command line. The
'dummyoption' nullifies that special meaning, and the 'shift' then throws the
'dummyoption' away.
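
The quoting and 'eval' steps can be tried in isolation with a sketch like the
one below. 'keep_files' is a made-up name and the angle brackets are only there
to make the argument boundaries visible. It shares the limits of the trick: a
name containing a double quote, a dollar sign or a backquote would be
reinterpreted by 'eval'.

```shell
# keep_files ARGS...
# Drops anything that looks like a switch (a hyphen plus at least one more
# character), rebuilds the argument list from a quoted string, and prints
# each surviving argument on its own line inside <>.
keep_files()
{
    opts=
    for b in "$@"
    do
        case "Z$b" in
            Z-?*) ;;                      # a switch: drop it
            Z*) opts="$opts \"$b\"" ;;    # anything else: keep it, quoted
        esac
    done
    eval set dummyoption $opts   # 'eval' re-reads the embedded quotes
    shift                        # drop the placeholder
    for f in "$@"
    do
        echo "<$f>"
    done
}

keep_files - -l "name with spaces.exe" foo.txt
```

The lone hyphen survives as an ordinary argument because 'dummyoption', not the
hyphen, is the first thing 'set' sees.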

Here's another odd bit:

if [ -z "$checkC$checkW$checkL" ]

This checks if all three variables are null. It's shorter than having three
sets of '-z's and '&&'s.
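
A small sketch of the two forms side by side, with made-up function names, to
show that they agree:

```shell
# all_null A B C: succeed only when every argument is the empty string.
all_null()
{
    [ -z "$1$2$3" ]
}

# The long-hand version of the same test.
all_null_long()
{
    [ -z "$1" ] && [ -z "$2" ] && [ -z "$3" ]
}

all_null "" "" "" && echo "no switches given"
all_null "" 0 ""  || echo "at least one switch given"
```

The short form works because the concatenation of several strings is empty
only when each of them is empty.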

This line is interesting:

filename=${NoFiles:-"-"} # display a hyphen or not?

It's for when there's standard IO input. If there's a hyphen on the command
line, a "-" should show up in the filename column. If there's no hyphen, then
the filename column should be blank. $NoFiles is set earlier to a space if
there's no hyphen (and no files), otherwise it's a null. The above line is the
same thing as:

if [ -z "$NoFiles" ] # if NoFiles is a null
then filename="-"
else filename=" "
fi

...only it's one line long, but harder to read of course.
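
The ${NoFiles:-"-"} expansion used in the script can be checked with a short
sketch. 'pick_name' is a made-up name and the square brackets only make the
blank visible.

```shell
# pick_name NOFILES: print the filename column used for standard input.
pick_name()
{
    NoFiles=$1
    filename=${NoFiles:-"-"}   # use "-" when NoFiles is null or unset
    echo "[$filename]"
}

pick_name ""    # a hyphen was on the command line
pick_name " "   # no files and no hyphen
```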


Impossible! Maybe!


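# rustique wc (3/8/00 by A. Costa)
# writes a temp file to count stdIO chars, uses awk and ls...
# (NB: Currently formatted to 4 spaces per tab.)

# Functions

Help() # (header restored; name taken from the calls in pass 1)
{
    echo "Usage: wc [-clw | -a] [filename]"
    exit 1 # (assumed) stop after the usage message
}

CleanUp() # get rid of temp files if necessary
{
    [ -w "$stdIOfile" ] && rm $stdIOfile 2>/dev/null
    exit 2
}

ShowLine() # syntax: Showline lines# words# chars# filename
{
    echo $1:$2:$3:"$4" | awk -F: '{printf "%7s%10s%12s %s\n", $1, $2, $3, $4}'
}

CheckHyphen() # (header restored; name taken from the call in pass 1)
{
    if [ $hyphen ] # Chastise user?...
    then
        echo "error: only one stdIO hyphen allowed." >& 2
        exit 2 # (assumed)
    fi
    hyphen=1 # (assumed) remember that a hyphen has been seen
}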

#Parse options...

for b in "$@" # Pass 1, get options, wherever they are...
do
    case "Z$b" in
        Z-) CheckHyphen;;
        Z-d) set -x;; # debug mode
        Z-a) checkC=0 checkL=0 checkW=0;;
        Z-c) checkC=0;;
        Z-w) checkW=0;;
        Z-l) checkL=0;;
        Z-cw|Z-wc) checkC=0 checkW=0;;
        Z-cl|Z-lc) checkC=0 checkL=0;;
        Z-lw|Z-wl) checkL=0 checkW=0;;
        Z-h|Z-?*) Help ;;
        Z*) ;;
    esac
done

if [ -z "$checkC$checkW$checkL" ] # no options?
then
    checkC=0 checkL=0 checkW=0 # the default
fi

for b in "$@" # Pass 2, remove all options from command line...
do
    case "Z$b" in
        Z-?*) ;;
        Z*) opts="$opts"" "\""$b"\" ;; # for vfat filenames with spaces
                                # 'eval' is needed to parse $opts
    esac
done
eval set dummyoption $opts # new commandline has no switches...
shift # remove dummyoption

if [ "$1" = "" ] # no filenames? Use standard I/O
then
    NoFiles=" " # the filename of no file
    set dummyoption - ; shift
fi

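stdIOfile=/tmp/wc.$$ # (assumed) temp file name; the original assignment is missing from this copy
n=0 # (assumed) start the file counter at zero

for f in "$@"
do
    if [ "$f" = "-" ] # stdIO?
    then
        trap 'CleanUp' 1 2 3 15
        if cat > $stdIOfile
        then # all's well
            f=$stdIOfile # (assumed) count the temp file from here on
            filename=${NoFiles:-"-"} # display a hyphen or not?
        else
            CleanUp # (assumed)
        fi
    else
        filename="$f" # (assumed)
        if [ ! -r "$f" ] # is the file readable?
        then
            echo "error: can't read \"$f\" " >& 2
            exit 2
        else # skip any directories...
            [ -d "$f" ] && continue
        fi
    fi

    # get how many chars it is..
    if [ "$checkC" = "0" ]
    then
        c=`ls -o "$f" | awk '{ print $4 }'`
        cSum=`expr "${cSum:-0}" + $c` # an unset total counts as zero
    fi

    # check words and lines.
    # this first "if-then" is a wrapper, so awk is only called once per file.
    if [ -n "$checkW$checkL" ] # either flag set? ('-eq 0' fails on an empty string)
    then
        TmpWL=`cat "$f" | awk 'BEGIN { w=0 } { w+=NF } END { print NR, w }'`

        if [ "$checkW" = "0" ]
        then
            w=`echo $TmpWL | awk '{ print $2 }'`
            wSum=`expr "${wSum:-0}" + $w`
        fi

        if [ "$checkL" = "0" ]
        then
            lines=`echo $TmpWL | awk '{ print $1 }'`
            linesSum=`expr "${linesSum:-0}" + $lines`
        fi
    fi
    ShowLine "$lines" "$w" "$c" "$filename"
    n=`expr "$n" + 1`
done

[ "$n" -gt "1" ] && ShowLine "$linesSum" "$wSum" "$cSum" total
rm -f "$stdIOfile" # (assumed) normal-exit cleanup of the temp file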


   ---- File information -----------
     File: Nullfile.gz
     Date: 5 Mar 2000, 12:41
     Size: 170 bytes.
     Type: Unknown
