| <head> |
| <title>utf(7) - Plan 9 from User Space</title> |
| <meta content="text/html; charset=utf-8" http-equiv=Content-Type> |
| </head> |
| <body bgcolor=#ffffff> |
| <table border=0 cellpadding=0 cellspacing=0 width=100%> |
| <tr height=10><td> |
| <tr><td width=20><td> |
| <tr><td width=20><td><b>UTF(7)</b><td align=right><b>UTF(7)</b> |
| <tr><td width=20><td colspan=2> |
| <br> |
| <p><font size=+1><b>NAME </b></font><br> |
| |
| <table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td> |
| |
| UTF, Unicode, ASCII, rune – character set and format<br> |
| |
| </table> |
| <p><font size=+1><b>DESCRIPTION </b></font><br> |
| |
| <table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td> |
| |
| The Plan 9 character set and representation are based on the Unicode |
| Standard and on the ISO multibyte UTF-8 encoding (Universal Character |
| Set Transformation Format, 8 bits wide). The Unicode Standard |
| represents its characters in 16 bits; UTF-8 represents such values |
| in an 8-bit byte stream. Throughout this |
| manual, UTF-8 is shortened to UTF. |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
| |
| In Plan 9, a <i>rune</i> is a 16-bit quantity representing a Unicode |
| character. Internally, programs may store characters as runes. |
| However, any external manifestation of textual information, in |
| files or at the interface between programs, uses a machine-independent, |
| byte-stream encoding called UTF. |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
| |
| UTF is designed so that the 7-bit ASCII set (hexadecimal values 00 |
| to 7F) appears only as itself in the encoding. Runes with |
| values above 7F appear as sequences of two or more bytes with |
| values only from 80 to FF. |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
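As a small illustration of this property, the sketch below counts occurrences of an ASCII byte in UTF text by plain byte comparison. Every byte of a multibyte sequence lies in 80 to FF, so the comparison cannot match inside a non-ASCII rune. The name count_ascii is hypothetical and is not part of any Plan 9 library.

```c
#include <stddef.h>

/*
 * Count occurrences of the ASCII byte c (00 to 7F) in n bytes of UTF text.
 * count_ascii is a hypothetical helper, not a Plan 9 library routine.
 * No decoding is needed: bytes 00 to 7F appear only as themselves.
 */
size_t
count_ascii(const unsigned char *s, size_t n, unsigned char c)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (s[i] == c)
			count++;
	return count;
}
```

For example, rune hexadecimal 012F has the same low bits as the ASCII slash (2F), but its two-byte encoding C4 AF contains no byte equal to 2F, so a search for a slash does not report it.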
| |
| The UTF encoding of the Unicode Standard is backward compatible |
| with ASCII: programs presented only with ASCII work on Plan 9 |
| even if not written to deal with UTF, as do programs that deal |
| with uninterpreted byte streams. However, programs that perform |
| semantic processing on ASCII graphic characters must convert |
| from UTF to runes in order to work properly with non-ASCII input. |
| See <a href="../man3/rune.html"><i>rune</i>(3)</a>. |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
| |
| With numbers written in binary, a rune x is converted to a multibyte |
| UTF sequence as follows: |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
| |
| 01. x in [00000000.0bbbbbbb] → 0bbbbbbb<br> |
| 10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb<br> |
| 11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb<br> |
| |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
| |
| Conversion 01 provides a one-byte sequence that spans the ASCII |
| character set in a compatible way. Conversions 10 and 11 represent |
| higher-valued characters as sequences of two or three bytes with |
| the high bit set. Plan 9 does not support the 4-, 5-, and 6-byte |
| sequences proposed by X/Open. When there are |
| multiple ways to encode a value, for example rune 0, the shortest |
| encoding is used. |
| <table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table> |
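The three conversions can be sketched in C roughly as follows. This is a minimal illustration, not the library code documented in rune(3): the type name Rune16 and the function name rune_to_utf are hypothetical, and the caller is assumed to supply a buffer of at least three bytes.

```c
#include <stddef.h>

typedef unsigned short Rune16;	/* hypothetical name for a 16-bit rune */

/*
 * Encode rune x into buf (assumed to hold at least 3 bytes) and return
 * the number of bytes written. The range checks select the shortest of
 * the three conversions.
 */
int
rune_to_utf(unsigned char *buf, Rune16 x)
{
	if (x <= 0x7F) {
		/* conversion 01: 0bbbbbbb */
		buf[0] = (unsigned char)x;
		return 1;
	}
	if (x <= 0x7FF) {
		/* conversion 10: 110bbbbb, 10bbbbbb */
		buf[0] = (unsigned char)(0xC0 | (x >> 6));
		buf[1] = (unsigned char)(0x80 | (x & 0x3F));
		return 2;
	}
	/* conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb */
	buf[0] = (unsigned char)(0xE0 | (x >> 12));
	buf[1] = (unsigned char)(0x80 | ((x >> 6) & 0x3F));
	buf[2] = (unsigned char)(0x80 | (x & 0x3F));
	return 3;
}
```

Under these assumptions, rune hexadecimal 00E9 encodes as the two bytes C3 A9, and rune 20AC as the three bytes E2 82 AC.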
| |
| In the inverse mapping, any sequence except those described above |
| is incorrect and is converted to rune hexadecimal 0080.<br> |
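The inverse mapping might be sketched as below. The names utf_to_rune and BadRune are hypothetical. Two details are assumptions rather than statements from this page: an incorrect sequence consumes exactly one byte, and a sequence that is longer than the shortest encoding of its value is treated as incorrect.

```c
#include <stddef.h>

enum { BadRune = 0x80 };	/* rune that incorrect input converts to */

/*
 * Decode one rune from the n bytes at s (n >= 1) into *r and return the
 * number of bytes consumed. Incorrect input yields BadRune and consumes
 * one byte (an assumption of this sketch).
 */
int
utf_to_rune(unsigned short *r, const unsigned char *s, size_t n)
{
	unsigned c0, c1, c2, x;

	c0 = s[0];
	if (c0 < 0x80) {			/* 0bbbbbbb */
		*r = (unsigned short)c0;
		return 1;
	}
	if (n >= 2 && (s[1] & 0xC0) == 0x80) {
		c1 = s[1] & 0x3F;
		if ((c0 & 0xE0) == 0xC0) {	/* 110bbbbb, 10bbbbbb */
			x = ((c0 & 0x1F) << 6) | c1;
			if (x > 0x7F) {		/* reject non-shortest form */
				*r = (unsigned short)x;
				return 2;
			}
		} else if ((c0 & 0xF0) == 0xE0 && n >= 3 && (s[2] & 0xC0) == 0x80) {
			c2 = s[2] & 0x3F;	/* 1110bbbb, 10bbbbbb, 10bbbbbb */
			x = ((c0 & 0x0F) << 12) | (c1 << 6) | c2;
			if (x > 0x7FF) {	/* reject non-shortest form */
				*r = (unsigned short)x;
				return 3;
			}
		}
	}
	*r = BadRune;
	return 1;
}
```

With this sketch, the bytes C3 A9 decode to rune 00E9, while a lone continuation byte such as A9, a truncated sequence such as E2 82, or the two-byte form C0 80 of rune 0 all decode to rune 0080.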
| |
| </table> |
| <p><font size=+1><b>SEE ALSO </b></font><br> |
| |
| <table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td> |
| |
| <a href="../man1/ascii.html"><i>ascii</i>(1)</a>, <a href="../man1/tcs.html"><i>tcs</i>(1)</a>, <a href="../man3/rune.html"><i>rune</i>(3)</a>, <i>The Unicode Standard</i>.<br> |
| |
| </table> |
| |
| <td width=20> |
| <tr height=20><td> |
| </table> |
| </body></html> |