.\" rsc cfa37a7 2004-04-10 18:53:55 +0000
.TH SORT 1
.SH NAME
sort \- sort and/or merge files
.SH SYNOPSIS
.B sort
[
.BI -cmuMbdfginrwt x
]
[
.BI + pos1
[
.BI - pos2
] ...
] ...
[
.B -k
.I pos1
[
.I ,pos2
]
] ...
[
.B -o
.I output
]
[
.B -T
.I dir
\&...
]
[
.I option
\&...
]
[
.I file
\&...
]
.SH DESCRIPTION
.I Sort\^
sorts
lines of all the
.I files
together and writes the result on
the standard output.
If no input files are named, the standard input is sorted.
.PP
The default sort key is an entire line.
Default ordering is
lexicographic by runes.
The ordering is affected globally by the following options,
one or more of which may appear.
.TP
.B -M
Compare as months.
The first three
non-white-space characters
of the field
are folded
to upper case
and compared,
so that
.L JAN
precedes
.LR FEB ,
etc.
Invalid fields
sort before
.LR JAN .
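.IP
As an illustration only, the month rule might be coded in C
along the following lines.
This is a sketch, not the code in
.BR sort.c ;
the names
.I monthnum
and
.I monthcmp
are hypothetical, case folding is
.SM ASCII
only, and a field with fewer than three characters after its
leading blanks is assumed to be invalid.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of -M: skip leading blanks, fold the
   next three characters to upper case (ASCII only) and look
   them up.  Returns 1 for JAN through 12 for DEC, and 0 for
   an invalid field, so invalid fields sort before JAN. */
static int monthnum(const char *s)
{
	static const char *months[12] = {
		"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
		"JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
	};
	char key[4];
	int i;

	assert(s != NULL);
	while(isblank((unsigned char)*s))
		s++;
	for(i = 0; i < 3; i++){
		if(s[i] == 0)
			return 0;	/* too short: invalid */
		key[i] = toupper((unsigned char)s[i]);
	}
	key[3] = 0;
	for(i = 0; i < 12; i++)
		if(strcmp(key, months[i]) == 0)
			return i + 1;
	return 0;
}

/* Compare two fields as months, with strcmp-style results. */
static int monthcmp(const char *a, const char *b)
{
	return monthnum(a) - monthnum(b);
}
```
.EE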
.TP
.B -b
Ignore leading white space (spaces and tabs) in field comparisons.
.TP
.B -d
`Phone directory' order:
only letters,
accented letters,
digits and white space
are significant in comparisons.
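.IP
A rough C sketch of directory order skips insignificant characters
while comparing.
It is restricted to
.SM ASCII,
so accented letters are not handled; the names
.I significant
and
.I dircmp
are hypothetical and the code is not taken from
.BR sort.c .
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of -d for ASCII text: only letters,
   digits, spaces and tabs take part in the comparison. */
static int significant(unsigned char c)
{
	return isalnum(c) || isblank(c);
}

static int dircmp(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	for(;;){
		while(*a && !significant((unsigned char)*a))
			a++;
		while(*b && !significant((unsigned char)*b))
			b++;
		if(*a != *b || *a == 0)
			return (unsigned char)*a - (unsigned char)*b;
		a++;
		b++;
	}
}
```
.EE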
.TP
.B -f
Fold lower case
letters onto upper case.
Accented characters are folded to their
non-accented upper case form.
.TP
.B -i
Ignore characters outside the
.SM ASCII
range 040-0176 (octal; space through tilde)
in non-numeric comparisons.
.TP
.B -w
Like
.BR -i ,
but ignore only tabs and spaces.
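.IP
The
.B -i
and
.B -w
filters can be sketched the same way in C.
This is an illustration, not the code in
.BR sort.c ;
the names
.I ignored
and
.I filtcmp
are hypothetical, and the sketch works on bytes.
Every byte of a multi-byte
.SM UTF-8
character lies above octal 0176, so under
.B -i
such characters are dropped whole.
.EX
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: with wflag set only spaces and tabs
   (octal 011) are ignored; otherwise, as for -i, every byte
   outside octal 040 to 0176 is ignored. */
static int ignored(unsigned char c, int wflag)
{
	if(wflag)
		return c == ' ' || c == 011;
	return c < 040 || c > 0176;
}

static int filtcmp(const char *a, const char *b, int wflag)
{
	assert(a != NULL && b != NULL);
	for(;;){
		while(*a && ignored((unsigned char)*a, wflag))
			a++;
		while(*b && ignored((unsigned char)*b, wflag))
			b++;
		if(*a != *b || *a == 0)
			return (unsigned char)*a - (unsigned char)*b;
		a++;
		b++;
	}
}
```
.EE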
.TP
.B -n
An initial numeric string,
consisting of optional white space,
optional plus or minus sign,
and zero or more digits with optional decimal point,
is sorted by arithmetic value.
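.IP
The value of such a string might be computed as in this C sketch.
The names
.I numval
and
.I numcmp
are hypothetical, the code is not from
.BR sort.c ,
and a string with no digits is assumed to have value 0.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of the -n value: optional blanks, an
   optional sign, then digits with at most one decimal point.
   Parsing stops at the first other character; a string with
   no digits is assumed to have value 0. */
static double numval(const char *s)
{
	double v = 0, scale = 1;
	int neg = 0, point = 0;

	assert(s != NULL);
	while(isblank((unsigned char)*s))
		s++;
	if(*s == '+' || *s == '-')
		neg = *s++ == '-';
	for(;; s++){
		if(isdigit((unsigned char)*s)){
			if(point){
				scale /= 10;
				v += (*s - '0') * scale;
			}else
				v = v*10 + (*s - '0');
		}else if(*s == '.' && !point)
			point = 1;
		else
			break;
	}
	return neg ? -v : v;
}

/* Order two fields by numeric value, like strcmp. */
static int numcmp(const char *a, const char *b)
{
	double x = numval(a), y = numval(b);

	return (x > y) - (x < y);
}
```
.EE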
.TP
.B -g
Like
.BR -n ,
but the numeric string may also include an
.BR e -style
exponent.
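.IP
In C, the standard
.I strtod
function parses numbers of this shape, so a comparison might be
sketched as follows.
The name
.I gcmp
is hypothetical.
Note that
.I strtod
also accepts hexadecimal numbers,
.L inf
and
.LR nan ,
which this page does not mention, so the sketch is not an exact model of
.BR -g .
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of -g: strtod reads optional white
   space, a sign, digits with a decimal point, and an e-style
   exponent.  When no number is present it returns 0, so such
   fields compare equal to 0. */
static int gcmp(const char *a, const char *b)
{
	double x, y;

	assert(a != NULL && b != NULL);
	x = strtod(a, NULL);
	y = strtod(b, NULL);
	return (x > y) - (x < y);
}
```
.EE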
.TP
.B -r
Reverse the sense of comparisons.
.TP
.BI -t x\^
`Tab character' separating fields is
.IR x .
.PP
The notation
.BI + "pos1\| " - pos2\^
restricts a sort key to a field beginning at
.I pos1\^
and ending just before
.IR pos2 .
.I Pos1\^
and
.I pos2\^
each have the form
.IB m . n\f1,
optionally followed by one or more of the flags
.BR Mbdfginr ,
where
.I m\^
gives the number of fields to skip from the beginning of the line and
.I n\^
gives the number of characters to skip after that.
If any flags are present they override all the global
ordering options for this key.
A missing
.BI \&. n\^
means
.BR \&.0 ;
a missing
.BI - pos2\^
means the end of the line.
Under the
.BI -t x\^
option, fields are strings separated by
.IR x ;
otherwise fields are
non-empty strings separated by white space.
White space before a field
is part of the field, except under option
.BR -b .
A
.B b
flag may be attached independently to
.I pos1
and
.IR pos2 .
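.PP
The positioning rules above can be illustrated with a C sketch.
It is not taken from
.BR sort.c ;
the function
.I position
and its parameters are hypothetical, and it assumes that under the
.B b
flag the leading blanks are skipped before the
.I n
characters are counted.
It skips
.I m
fields, then the blanks if asked, then
.I n
characters, and returns a pointer to the resulting place in the line.
A key then runs from the place given by
.I pos1
up to, but not including, the place given by
.IR pos2 ,
or to the end of the line.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of locating +m.n in a line.  tab is the
   -t character, or 0 when fields are separated by white space;
   bflag is the b flag.  Blanks are assumed to be skipped before
   the n characters are counted.  Running off the end of the
   line yields a pointer to its terminating NUL. */
static const char *position(const char *line, int m, int n,
	char tab, int bflag)
{
	const char *p = line;

	assert(line != NULL && m >= 0 && n >= 0);
	while(m-- > 0){
		if(tab != 0){
			/* -t: a field ends at the next separator */
			while(*p && *p != tab)
				p++;
			if(*p)
				p++;	/* step over the separator */
		}else{
			/* leading blanks belong to the field */
			while(isblank((unsigned char)*p))
				p++;
			while(*p && !isblank((unsigned char)*p))
				p++;
		}
	}
	if(bflag)
		while(isblank((unsigned char)*p))
			p++;
	while(n-- > 0 && *p)
		p++;
	return p;
}
```
.EE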
.PP
The notation
.B -k
.IR pos1 [, pos2 ]
is how POSIX
.I sort
defines fields:
.I pos1
and
.I pos2
have the same form as above but different meanings.
The value of
.I m\^
is origin 1 instead of origin 0,
and a missing
.BI \&. n\^
in
.I pos2
means the end of the field.
For example,
.L "-k 2,2"
and
.L "+1 -2"
both select the second field.
.PP
When there are multiple sort keys, later keys
are compared only after all earlier keys
compare equal.
Lines that otherwise compare equal are ordered
with all bytes significant.
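.PP
The rule for multiple keys amounts to the following C sketch,
in which each key is represented by a comparison function.
The names
.IR keycmp ,
.I linecmp
and
.I bylen
are hypothetical, and
.I bylen
is only a stand-in key for the illustration.
.EX
```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of multi-key ordering.  keys[] holds one
   comparison function per key, in command-line order. */
typedef int (*keycmp)(const char *, const char *);

static int linecmp(const char *a, const char *b,
	const keycmp keys[], int nkeys)
{
	int i, r;

	assert(nkeys >= 0);
	for(i = 0; i < nkeys; i++)
		if((r = keys[i](a, b)) != 0)
			return r;	/* the first unequal key decides */
	return strcmp(a, b);	/* all keys equal: every byte counts */
}

/* Example key, for illustration only: shorter lines first. */
static int bylen(const char *a, const char *b)
{
	size_t x = strlen(a), y = strlen(b);

	return (x > y) - (x < y);
}
```
.EE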
.PP
These option arguments are also understood:
.TP \w'\fL-z\fIrecsize\fLXX'u
.B -c
Check that the single input file is sorted according to the ordering rules;
give no output unless the file is out of sort.
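.IP
A check of this kind reduces to comparing each line with its
predecessor, as in this hypothetical C sketch.
The name
.I disorder
is invented here, and
.I cmp
stands for whatever ordering the options select;
.I strcmp
is used only as an example.
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>	/* strcmp, one possible cmp */

/* Hypothetical sketch of -c: return 0 if lines is in order
   under cmp, otherwise the 1-based number of the first line
   that sorts before its predecessor. */
static size_t disorder(const char *lines[], size_t n,
	int (*cmp)(const char *, const char *))
{
	size_t i;

	assert(n == 0 || lines != NULL);
	for(i = 1; i < n; i++)
		if(cmp(lines[i-1], lines[i]) > 0)
			return i + 1;
	return 0;
}
```
.EE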
.TP
.B -m
Merge; assume the input files are already sorted.
.TP
.B -u
Suppress all but one in each
set of equal lines.
Ignored bytes
and bytes outside keys
do not participate in
this comparison.
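.IP
On sorted input, suppressing duplicates might look like this
hypothetical C sketch.
It keeps the first line of each run of equal lines;
.I cmp
must compare keys only, and the names
.I uniqlines
and
.I firstchar
are invented for the illustration.
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of -u on sorted lines: keep the first
   line of each run that cmp reports equal, compacting the
   array in place.  Returns the number of lines kept. */
static size_t uniqlines(const char *lines[], size_t n,
	int (*cmp)(const char *, const char *))
{
	size_t i, out = 0;

	assert(n == 0 || lines != NULL);
	for(i = 0; i < n; i++)
		if(out == 0 || cmp(lines[out-1], lines[i]) != 0)
			lines[out++] = lines[i];
	return out;
}

/* Example key for illustration: only the first byte counts. */
static int firstchar(const char *a, const char *b)
{
	return strncmp(a, b, 1);
}
```
.EE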
.TP
.B -o
The next argument is the name of an output file
to use instead of the standard output.
This file may be the same as one of the inputs.
.TP
.BI -T dir
Put temporary files in
.I dir
rather than in
.BR /tmp .
.ne 4
.SH EXAMPLES
.TP
.L sort -u +0f +0 list
Print in alphabetical order all the unique spellings
in a list of words
where capitalized words differ from uncapitalized.
.TP
.L sort -t: +1 /adm/users
Print the users file
sorted by user name
(the second colon-separated field).
.TP
.L sort -umM dates
Print the first instance of each month in an already sorted file.
Options
.B -um
with just one input file make the choice of a
unique representative from a set of equal lines predictable.
.TP
.L
grep -n '^' input | sort -t: +1f +0n | sed 's/[0-9]*://'
A stable sort: input lines that compare equal will
come out in their original order.
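.IP
The same decorate, sort and strip idea can be sketched in C.
Each line is paired with its original position;
.I qsort
(which is not guaranteed to be stable) orders the pairs by a
case-folded comparison, like
.BR +1f ,
and breaks ties by position, like
.BR +0n .
All names here are hypothetical, and folding is
.SM ASCII
only.
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: pair each line with its position, sort
   by an ASCII case-folded key, break ties by position, then
   drop the positions again. */
struct rec {
	const char *line;
	size_t pos;
};

static int foldcmp(const char *a, const char *b)
{
	int x, y;

	do{
		x = toupper((unsigned char)*a++);
		y = toupper((unsigned char)*b++);
	}while(x == y && x != 0);
	return x - y;
}

static int reccmp(const void *va, const void *vb)
{
	const struct rec *a = va, *b = vb;
	int r = foldcmp(a->line, b->line);

	if(r != 0)
		return r;
	return (a->pos > b->pos) - (a->pos < b->pos);
}

static void stablesort(const char *lines[], size_t n)
{
	struct rec *r;
	size_t i;

	if(n == 0)
		return;
	r = malloc(n * sizeof *r);
	assert(r != NULL);
	for(i = 0; i < n; i++){
		r[i].line = lines[i];
		r[i].pos = i;
	}
	qsort(r, n, sizeof *r, reccmp);
	for(i = 0; i < n; i++)
		lines[i] = r[i].line;
	free(r);
}
```
.EE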
.SH FILES
.BI /tmp/sort. <pid>.<ordinal>
.SH SOURCE
.\" rsc c3674de 2005-01-11 17:37:33 +0000
.B \*9/src/cmd/sort.c
.SH SEE ALSO
.IR uniq (1),
.IR look (1)
.SH DIAGNOSTICS
.I Sort
comments and exits with non-null status for various trouble
conditions and for disorder discovered under option
.BR -c .
.SH BUGS
An external null character can be confused
with an internally generated end-of-field character.
The result can make a sub-field not sort
less than a longer field.
.PP
Some of the options, e.g.
.B -i
and
.BR -M ,
are hopelessly provincial.