NAME

sort - sort or sequence check text files


SYNOPSIS

sort [-bdfinruD] [-t str|-: regexp] [+pos1 [-pos2]] [-k pos1[,pos2]] [file ...]


DESCRIPTION

The sort program sorts the lines of one or more text files. Comparisons are based on one or more sort keys extracted from each line of input. If no sort keys are explicitly specified, the entire line is used as the key. By default, the comparison is made lexicographically, using the ordering specified by the current locale (if any).


Options

The following global options control the operation of sort:

-u

Output only a single line for each set of lines having equal keys. (``unique'' output)

-t STRING

Set the field separator to STRING. The specified field separator is not included in the fields themselves.

The space between the -t specifier and STRING is optional if and only if STRING consists of a single character.

-: REGEXP

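Set the field separator to REGEXP, which should be a Perl regular expression. Occurrences of / (forward slash) in REGEXP must be quoted. The string matched by REGEXP is normally not included in the fields themselves, but if REGEXP contains capturing parentheses, the text matched by each parenthesized group becomes an additional field (see the last example under EXAMPLES).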

-D

Enables debugging output. The behavior of this option is subject to change.

The following options override the default ordering rules, and may also be attached to specific keys (see +POS1 [-POS2] and -k below). When they appear independently of any key specification, the requested ordering rules are applied globally to all sort keys. When attached to a specific key, the ordering options override all global ordering options for that key.

-b

Ignore leading whitespace when determining the start and the end of each input field.

-d

Ignore everything except letters, digits and whitespace characters. (``dictionary'' order)

-f

Fold upper case letters to lower case.

-i

Ignore non-printable characters. (Requires the POSIX module to be present in the Perl installation.)

-n

Compare fields numerically. (An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value.)

In some versions of sort, -n implies -b (which matters if you use character position offsets). With this version, if you want -b, use -b.

-r

Reverse the sense of comparisons (and therefore the order of sorting).
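The ordering options above can be read as transformations applied to a key before it is compared. The following is a minimal Python sketch of that reading, not the program's own Perl code. The name transform_key is hypothetical, -i and locale handling are left out, -b is simplified to stripping leading whitespace from the key, and treating a key with no digits as 0 under -n is an assumption.

```python
import re

def transform_key(key, opts=""):
    """Hypothetical helper: apply -b, -d, -f and -n style rules to one key."""
    if "b" in opts:
        key = key.lstrip()                      # -b: skip leading whitespace
    if "d" in opts:
        # -d: keep only letters, digits and whitespace
        key = "".join(c for c in key if c.isalnum() or c.isspace())
    if "f" in opts:
        key = key.lower()                       # -f: fold upper case to lower case
    if "n" in opts:
        # -n: optional blanks, optional minus sign, digits with optional point
        text = re.match(r"\s*(-?\d*\.?\d*)", key).group(1)
        try:
            return float(text)
        except ValueError:
            return 0.0                          # assumption: no digits counts as 0
    return key

lines = ["10 apples", "9 pears", "-2 plums"]
numeric = sorted(lines, key=lambda s: transform_key(s, "n"))
# -r only flips the final order, so it maps onto reverse=True
reverse = sorted(lines, key=lambda s: transform_key(s, "n"), reverse=True)
print(numeric)   # ['-2 plums', '9 pears', '10 apples']
```

A lexicographic sort of the same lines would place "10 apples" before "9 pears", which is the usual reason to ask for -n.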

Finally, there are two, mutually confusing, ways of specifying sort keys:

+POS1 [-POS2]

Specifies the starting position, POS1, and optionally the ending position, POS2, of a sort key. POS2 denotes the first position not to be included in the sort key. A missing POS2 argument indicates that the key should include all fields until the end of the line.

Each of POS1 and POS2 is of the form M[.N], followed by zero or more of the option letters b, d, f, i, n and r. M is a non-negative integer specifying the field. If present, N is a non-negative integer specifying the character offset into the Mth field. Both M and N are counted from 0; thus, 1.2 specifies the third character of the second field. If .N is omitted, the position refers to the start of the field, so 2 is equivalent to 2.0.

The option letters, if present, specify options to be used for the current sort key; if no letters are specified, the global sort options are used.

The -POS2 argument must immediately follow the +POS1 argument. For example, sort +1 -n -2 produces an error, because -n comes between the two positions.
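The zero-based, end-exclusive convention of +POS1 -POS2 behaves much like a half-open slice. The Python sketch below illustrates this under simplifying assumptions: the line has already been split into fields, positions are given as (field, offset) pairs, and a key spanning several fields is modelled as the plain concatenation of those fields without separators. The name old_style_key is hypothetical, and the sketch does not claim to mirror how sort builds its keys internally.

```python
def old_style_key(fields, pos1, pos2=None):
    """Key selected by '+POS1 [-POS2]'; fields and offsets count from 0."""
    def flat(pos):
        m, n = pos
        # Characters in all earlier fields, plus the offset into field m
        return sum(len(f) for f in fields[:m]) + n

    text = "".join(fields)      # assumption: separators are not part of the key
    end = len(text) if pos2 is None else flat(pos2)   # POS2 is exclusive
    return text[flat(pos1):end]

fields = ["alpha", "beta", "gamma", "delta"]
print(old_style_key(fields, (1, 0), (2, 0)))   # +1 -2      -> beta
print(old_style_key(fields, (1, 2), (1, 3)))   # +1.2 -1.3  -> t
print(old_style_key(fields, (2, 1), (2, 4)))   # +2.1 -2.4  -> amm
```

The second call picks out position 1.2 alone, which is the third character of the second field, as described above.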

-k POS1[,POS2]

Specifies the starting position, POS1, and optionally the ending position, POS2, of a sort key. POS2 denotes the last position to be included in the sort key. A missing POS2 argument indicates that the key should include all fields until the end of the line.

Each of POS1 and POS2 is of the form M[.N], followed by zero or more of the option letters b, d, f, i, n and r. M is a non-negative integer specifying the field. If present, N is a non-negative integer specifying the character offset into the Mth field. Both M and N are counted from 1; thus, 1.2 specifies the second character of the first field. If the character offset of POS1 is omitted, the position refers to the start of the field, so -k 2,... is equivalent to -k 2.1,... . If the character offset of POS2 is omitted, the position refers to the end of the field.

As a special case, if the character offset of POS2 is zero, it is taken to refer to the end of the specified field, just as if it were omitted. Thus, -k ...,2 is equivalent to -k ...,2.0.

The option letters, if present, specify options to be used for the current sort key; if no letters are specified, the global sort options are used.
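Because the two notations differ both in where counting starts and in whether POS2 is included, translating between them is easy to get wrong. The Python sketch below converts a +POS1 [-POS2] pair into the equivalent -k argument using only the rules stated above. The function names are hypothetical, and the positions are passed without their leading + or - sign.

```python
import re

_POS = re.compile(r"^(\d+)(?:\.(\d+))?([bdfinr]*)$")

def _parse(pos):
    """Split 'M[.N][letters]' into (field, offset, letters)."""
    m = _POS.match(pos)
    if m is None:
        raise ValueError(f"bad position: {pos!r}")
    return int(m.group(1)), int(m.group(2) or 0), m.group(3)

def old_to_new(pos1, pos2=None):
    """Translate '+POS1 [-POS2]' into the argument of an equivalent -k."""
    f, o, letters = _parse(pos1)
    # Start: shift field and offset from 0-based to 1-based
    start = f"{f + 1}" if o == 0 else f"{f + 1}.{o + 1}"
    start += letters
    if pos2 is None:
        return start
    f2, o2, letters2 = _parse(pos2)
    if o2 == 0:
        if f2 == 0:
            raise ValueError("key would be empty")
        # Exclusive start of field f2 (0-based) == inclusive end of field f2 (1-based)
        end = f"{f2}"
    else:
        # Exclusive 0-based offset o2 == inclusive 1-based offset o2
        end = f"{f2 + 1}.{o2}"
    return f"{start},{end}{letters2}"

print(old_to_new("1", "2"))       # 2,2
print(old_to_new("2.1", "2.4"))   # 3.2,3.4
print(old_to_new("0b", "1"))      # 1b,1
```

These outputs match the paired commands listed under EXAMPLES.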


EXAMPLES

        sort +1 -2
        sort -k 2,2
Either example sorts lexicographically by the second field of each line.

        sort +1 -2 +3 -5
        sort -k 2,2 -k 4,5
        sort +1 -2 -k 4,5
Sorts lexicographically by the second, fourth and fifth fields of each
line.  (More verbosely, to compare two lines, we first compare their
second fields.  If the two second fields are lexicographically equal,
we compare the fourth fields.  If the fourth fields are equal, we compare
the fifth fields.  If the fifth fields are also equal, the lines are
considered equal.)
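
The comparison cascade described above is the same thing as comparing tuples of keys. A minimal Python sketch, assuming whitespace-separated fields, with made-up sample data and a hypothetical key_fields helper; Python compares strings by code point rather than by locale, so this only illustrates the order in which keys are consulted.

```python
def key_fields(line, wanted=(2, 4, 5)):
    """Return the wanted fields (numbered from 1) as a tuple of sort keys."""
    fields = line.split()
    # A field the line does not have is treated as an empty string
    return tuple(fields[i - 1] if i <= len(fields) else "" for i in wanted)

lines = [
    "r1 b x d 2",
    "r2 a x d 9",
    "r3 b x c 7",
    "r4 b x d 1",
]

# Tuples compare element by element: field 2, then field 4, then field 5
result = sorted(lines, key=key_fields)
for line in result:
    print(line)
# r2 a x d 9
# r3 b x c 7
# r4 b x d 1
# r1 b x d 2
```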

        sort -n +2 +0b -1
        sort -n -k 3 -k 1b,1
Sorts numerically by the fields of each line starting from the third
(i.e., 3rd, 4th, 5th, ...).  If two lines compare as equal, compare the
first field lexicographically, ignoring any leading whitespace.

        sort -n +2.1 -2.4
        sort -n -k 3.2,3.4
Numerically compares the second through fourth characters of the third
field.

        sort -t: +2n /etc/passwd
Splits the lines of the file /etc/passwd into colon-separated
fields, and sorts numerically on fields starting from the third field.

        sort -: '(\d+)' +3n -4 +2 -3 +1n -2
Separates the input into fields consisting alternately of either all
non-digits or all digits.  (The regular expression instructs sort
to use fields separated by fields of digits.  Therefore, the first
field will be non-digits, but may be empty.)  Sort numerically by
the fourth field (second numeric field), lexicographically by the
third field (second non-numeric field), and numerically by the second
field (first numeric field).
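
The effect of the parentheses in the separator can be reproduced with Python's re.split, which, like Perl's split, returns the text matched by a capturing group as extra list elements. This is a sketch with made-up input. Note that Python keeps a trailing empty field that Perl's split would drop, and that the hypothetical sort_key helper assumes every line contains at least two runs of digits.

```python
import re

def split_fields(line):
    """Split on runs of digits, keeping each run as a field of its own."""
    return re.split(r"(\d+)", line)

def sort_key(line):
    f = split_fields(line)
    # +3n -4, then +2 -3, then +1n -2 (fields counted from 0)
    return (int(f[3]), f[2], int(f[1]))

print(split_fields("a10b2"))    # ['a', '10', 'b', '2', '']
print(split_fields("12ab3"))    # ['', '12', 'ab', '3', '']

lines = ["a10b2", "a9b2", "a10a2", "z1c1"]
result = sorted(lines, key=sort_key)
print(result)                   # ['z1c1', 'a10a2', 'a9b2', 'a10b2']
```

The second split shows the case mentioned above, where a line that starts with digits has an empty first field.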


ENVIRONMENT

These environment variables affect the execution of sort:

LC_COLLATE

Determine the locale for ordering rules.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- versus multi-byte characters in arguments and input files) and the behaviour of character classification for the -b, -d, -f, -i and -n options.

See locale for more information about localization and Perl.


BUGS

No bugs in sort are currently known.


AUTHOR

Albert Dvornik, <bert@mit.edu>


COPYRIGHT and LICENSE

This program is copyright (c) Albert Dvornik 1999.

This program is free and open software. You may use, modify, distribute, and sell this program (and any modified variants) in any way you wish, provided you do not restrict others from doing the same.