sort - sort or sequence check text files
sort [-bdfinruD] [-t str|-: regexp] [+pos1 [-pos2]] [-k pos1[,pos2]] [file ...]
The sort program sorts the lines of one or more text files. Comparisons are
based on one or more sort keys extracted from each line of input. If no sort
keys are explicitly specified, the entire line is used. By default, the
comparison is made lexicographically, using the ordering specified by the
current locale (if any).
The following global options control the operation of sort:
-u
    Output only a single line for each set of lines having equal keys ("unique" output).
-t STRING
    Set the field separator to STRING. The specified field separator is not included in the fields themselves.
    The space between the -t specifier and STRING is optional if and only if STRING consists of a single character.
-: REGEXP
    Set the field separator to REGEXP, which should be a Perl regular expression. Occurrences of / (forward slash) in REGEXP must be quoted. The string matched by REGEXP is normally not included in the fields themselves, with two exceptions:
        If REGEXP contains parenthesized subexpressions, the data matched by those subexpressions will be treated as additional fields. (See the EXAMPLES section.)
        If REGEXP uses lookbehind or lookahead (see perlre), the matched data is left as a part of the field preceding or following the match, respectively.
-D
    Enable debugging output. The behavior of this option is subject to change.
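The two special cases of -: can be approximated with Python's re.split, which treats capture groups and zero-width matches much as described above. This is an illustrative sketch only: split_fields is a hypothetical helper, not part of sort, and Python's regular expressions differ from Perl's in some details (splitting on zero-width matches needs Python 3.7 or later, and Python keeps trailing empty fields).

```python
import re

def split_fields(line, regexp):
    """Split a line into fields on regexp (hypothetical helper, not sort's own code)."""
    return re.split(regexp, line)

# Plain separator: the matched text is dropped from the fields.
print(split_fields("a,b,c", r","))             # ['a', 'b', 'c']

# Parenthesized subexpression: the matched digits become fields of their own.
print(split_fields("abc123def45ghi", r"(\d+)"))  # ['abc', '123', 'def', '45', 'ghi']

# Lookbehind: the comma stays attached to the preceding field.
print(split_fields("a,b,c", r"(?<=,)"))        # ['a,', 'b,', 'c']

# Lookahead: the comma stays attached to the following field.
print(split_fields("a,b,c", r"(?=,)"))         # ['a', ',b', ',c']
```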
The following options override the default ordering rules, and may also be attached to specific keys (see +pos1... and -k). When they appear independent of key field specifications, the requested field ordering rules are applied globally to all sort keys. When attached to a specific key, the ordering options override all global ordering options for that key.
-b
    Ignore leading whitespace when determining the start and the end of each input field.
-d
    Ignore everything except letters, digits and whitespace characters ("dictionary" order).
-f
    Fold upper case letters to lower case.
-i
    Ignore non-printable characters. (Requires the POSIX module to be present in the Perl installation.)
-n
    Compare fields numerically. (An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value.)
    In some versions of sort, -n implies -b (which matters if you use character position offsets). With this version, if you want -b, you must specify -b explicitly.
-r
    Reverse the sense of comparisons (and therefore the order of sorting).
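The ordering options can be thought of as transformations applied to a key before it is compared. The following Python sketch models that idea under stated assumptions: make_key is a hypothetical name, the order in which the transformations are applied is a guess, and Python's character classes (isalnum, isprintable) are Unicode-based rather than locale-based, so results may differ from sort for unusual input.

```python
import re

def make_key(text, opts=""):
    """Return a comparison key for text under option letters b, d, f, i, n.

    Hypothetical model of the option descriptions, not sort's actual code.
    """
    if "b" in opts:                       # -b: ignore leading whitespace
        text = text.lstrip()
    if "d" in opts:                       # -d: keep letters, digits, whitespace
        text = "".join(c for c in text if c.isalnum() or c.isspace())
    if "i" in opts:                       # -i: drop non-printable characters
        text = "".join(c for c in text if c.isprintable())
    if "f" in opts:                       # -f: fold upper case to lower case
        text = text.lower()
    if "n" in opts:                       # -n: initial numeric string only
        prefix = re.match(r"\s*(-?\d*\.?\d*)", text).group(1)
        try:
            return float(prefix)
        except ValueError:                # no digits at all: assume zero
            return 0.0
    return text

lines = ["10", "9", " 2x", "-1.5"]
print(sorted(lines, key=lambda s: make_key(s, "n")))
# ['-1.5', ' 2x', '9', '10']

# -r corresponds to reversing the comparison.
print(sorted(lines, key=lambda s: make_key(s, "n"), reverse=True))
# ['10', '9', ' 2x', '-1.5']
```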
Finally, there are two, mutually confusing, ways of specifying sort keys:
+POS1 [-POS2]
    Specifies the starting position, POS1, and optionally the ending position, POS2, of a sort key. POS2 denotes the first position not to be included in the sort key. A missing POS2 argument indicates that the key should include all fields until the end of the line.
    Each of POS1 and POS2 is of the form M[.N], followed by zero or more of the option letters b, d, f, i, n and r. M is a non-negative integer specifying the field. If present, N is a non-negative integer specifying the character offset into the Mth field. Both M and N are counted from 0; thus, 1.2 specifies the third character of the second field. If .N is omitted, the position refers to the start of the field, so 2 is equivalent to 2.0.
    The option letters, if present, specify options to be used for the current sort key; if no letters are specified, the global sort options are used.
    The -POS2 argument must immediately follow the +POS1 argument. A command such as sort +1 -n -2 will produce an error.
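The zero-based, end-exclusive positions of the +POS1 -POS2 form can be illustrated with a small Python sketch. It assumes the line has already been split into a list of fields, and it returns the key as a list of field fragments; extract_key is a hypothetical helper, and how sort itself joins or compares multi-field keys is not modeled here.

```python
def extract_key(fields, pos1, pos2=None):
    """Return the pieces of fields selected by +POS1 [-POS2].

    pos1 and pos2 are (M, N) pairs counted from 0; pos2 is the first
    position NOT included. Hypothetical model, not sort's actual code.
    """
    m1, n1 = pos1
    if pos2 is None:
        pieces = list(fields[m1:])            # through the end of the line
    else:
        m2, n2 = pos2
        pieces = list(fields[m1:m2])          # whole fields before field m2
        if n2 > 0 and m1 <= m2 < len(fields):
            pieces.append(fields[m2][:n2])    # first n2 characters of field m2
    if pieces:
        pieces[0] = pieces[0][n1:]            # skip n1 characters of the first field
    return pieces

fields = ["alpha", "bravo", "charlie", "delta"]
print(extract_key(fields, (1, 0), (2, 0)))    # +1 -2     -> ['bravo']
print(extract_key(fields, (2, 1), (2, 4)))    # +2.1 -2.4 -> ['har']
print(extract_key(fields, (2, 0)))            # +2        -> ['charlie', 'delta']
```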
-k POS1[,POS2]
    Specifies the starting position, POS1, and optionally the ending position, POS2, of a sort key. POS2 denotes the last position to be included in the sort key. A missing POS2 argument indicates that the key should include all fields until the end of the line.
    Each of POS1 and POS2 is of the form M[.N], followed by zero or more of the option letters b, d, f, i, n and r. M is a non-negative integer specifying the field. If present, N is a non-negative integer specifying the character offset into the Mth field. Both M and N are counted from 1; thus, 1.2 specifies the second character of the first field. If the character offset of POS1 is omitted, the position refers to the start of the field, so -k 2,... is equivalent to -k 2.1,... . If the character offset of POS2 is omitted, the position refers to the end of the field.
    As a special case, if the character offset of POS2 is zero, it is taken to refer to the end of the specified field, just as if it were omitted. Thus, -k ...,2 is equivalent to -k ...,2.0 .
    The option letters, if present, specify options to be used for the current sort key; if no letters are specified, the global sort options are used.
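Because the two notations differ only in where counting starts and in whether the end position is included, a -k specification can be rewritten mechanically in the + form. The Python sketch below shows one way to do that conversion; k_to_plus and its helpers are hypothetical names, and the treatment of a zero character offset in POS1 (handled like an omitted offset) is an assumption.

```python
import re

_POS = re.compile(r"(\d+)(?:\.(\d+))?([bdfinr]*)")

def _parse(pos):
    """Split 'M[.N][letters]' into (M, N or None, letters)."""
    match = _POS.fullmatch(pos)
    if match is None:
        raise ValueError(f"bad position: {pos!r}")
    field, offset, flags = match.groups()
    return int(field), (None if offset is None else int(offset)), flags

def _fmt(sign, field, offset, flags):
    """Format a position, leaving out a zero character offset."""
    text = f"{sign}{field}"
    if offset:
        text += f".{offset}"
    return text + flags

def k_to_plus(spec):
    """Convert the argument of -k into the equivalent +POS1 [-POS2] words."""
    start, _, end = spec.partition(",")
    m, n, flags = _parse(start)
    if m < 1:
        raise ValueError("fields in -k are counted from 1")
    # POS1: shift field and offset from one-based to zero-based.
    args = [_fmt("+", m - 1, (n or 1) - 1, flags)]
    if end:
        m, n, flags = _parse(end)
        if not n:
            # End of field M (inclusive) is the start of zero-based field M.
            args.append(_fmt("-", m, 0, flags))
        else:
            # Last included character N becomes exclusive offset N.
            args.append(_fmt("-", m - 1, n, flags))
    return args

print(k_to_plus("2,2"))       # ['+1', '-2']
print(k_to_plus("3.2,3.4"))   # ['+2.1', '-2.4']
print(k_to_plus("1b,1"))      # ['+0b', '-1']
```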
EXAMPLES

sort +1 -2
sort -k 2,2
    Either example sorts lexicographically by the second field of each line.

sort +1 -2 +3 -5
sort -k 2,2 -k 4,5
sort +1 -2 -k 4,5
    Sorts lexicographically by the second, fourth and fifth field of each line. (More verbosely, to compare two lines, we first compare their second field. If the two second fields are lexicographically equal, we compare the fourth field. If the fourth fields are equal, compare the fifth field. If the fifth fields are also equal, the lines are considered equal.)

sort -n +2 +0b -1
sort -n -k 3 -k 1b,1
    Sorts numerically by the fields of each line starting from the third (i.e., 3rd, 4th, 5th, ...). If two lines compare as equal, compare the first field lexicographically, ignoring any leading whitespace.

sort -n +2.1 -2.4
sort -n -k 3.2,3.4
    Numerically compares the second through fourth characters of the third field.

sort.pl -t: +2n /etc/passwd
    Splits the lines of the file /etc/passwd into colon-separated fields, and sorts numerically on fields starting from the third field.

sort -: '(\d+)' +3n -4 +2 -3 +1n -2
    Separates the input into fields consisting alternately of either all non-digits or all digits. (The regular expression instructs sort to use fields separated by fields of digits. Therefore, the first field will be non-digits, but may be empty.) Sort numerically by the fourth field (second numeric field), lexicographically by the third field (second non-numeric field), and numerically by the second field (first numeric field).
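The last example above combines regular-expression splitting with several keys of different types. A rough Python equivalent is sketched below, comparing a tuple of keys in priority order. sort_key is a hypothetical name, missing fields are assumed to compare as empty (or zero for numeric keys), and any last-resort comparison sort may apply to otherwise equal lines is not modeled.

```python
import re

def sort_key(line):
    """Key tuple for: sort -: '(\\d+)' +3n -4 +2 -3 +1n -2 (illustrative model)."""
    fields = re.split(r"(\d+)", line)   # alternating non-digit / digit fields

    def field(i):
        # Assumption: a field past the end of the line is treated as empty.
        return fields[i] if i < len(fields) else ""

    def number(text):
        # Assumption: an empty numeric field counts as zero.
        return int(text) if text else 0

    # Fields are counted from 0 in the + notation.
    return (number(field(3)), field(2), number(field(1)))

lines = ["a10b2", "a9b2", "a1c1", "z5b2"]
print(sorted(lines, key=sort_key))
# ['a1c1', 'z5b2', 'a9b2', 'a10b2']
```

Here "a1c1" comes first because its fourth field (1) is the smallest; the other three lines tie on the fourth and third fields and are ordered by the numeric value of the second field (5, 9, 10).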
These environment variables affect the execution of sort:
LC_COLLATE
    Determine the locale for ordering rules.
LC_CTYPE
    Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- versus multi-byte characters in arguments and input files) and the behaviour of character classification for the -b, -d, -f, -i and -n options.
See locale for more information about localization and Perl.
No bugs in sort are currently known.
Albert Dvornik, <bert@mit.edu>
This program is copyright (c) Albert Dvornik 1999.
This program is free and open software. You may use, modify, distribute, and sell this program (and any modified variants) in any way you wish, provided you do not restrict others from doing the same.