|  | .TH VENTI 7 | 
|  | .SH NAME | 
|  | venti \- archival storage server | 
|  | .SH DESCRIPTION | 
|  | Venti is a block storage server intended for archival data. | 
|  | In a Venti server, the SHA1 hash of a block's contents acts | 
|  | as the block identifier for read and write operations. | 
|  | This approach enforces a write-once policy, preventing | 
|  | accidental or malicious destruction of data.  In addition, | 
|  | duplicate copies of a block are coalesced, reducing the | 
|  | consumption of storage and simplifying the implementation | 
|  | of clients. | 
|  | .PP | 
|  | This manual page documents the basic concepts of | 
|  | block storage using Venti as well as the Venti network protocol. | 
|  | .PP | 
|  | .IR Venti (1) | 
|  | documents some simple clients. | 
|  | .IR Vac (1), | 
|  | .IR vacfs (4), | 
|  | and | 
|  | .IR vbackup (8) | 
|  | are more complex clients. | 
|  | .PP | 
|  | .IR Venti (3) | 
|  | describes a C library interface for accessing | 
|  | Venti servers and manipulating Venti data structures. | 
|  | .PP | 
|  | .IR Venti (8) | 
|  | describes the programs used to run a Venti server. | 
|  | .PP | 
|  | .SS "Scores | 
|  | The SHA1 hash that identifies a block is called its | 
|  | .IR score . | 
|  | The score of the zero-length block is called the | 
|  | .IR "zero score" . | 
|  | .PP | 
|  | Scores may have an optional | 
|  | .IB label : | 
|  | prefix, typically used to | 
|  | describe the format of the data. | 
|  | For example, | 
|  | .IR vac (1) | 
|  | uses a | 
|  | .B vac: | 
|  | prefix, while | 
|  | .IR vbackup (8) | 
|  | uses prefixes corresponding to the file system | 
|  | types: | 
|  | .BR ext2: , | 
|  | .BR ffs: , | 
|  | and so on. | 
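A minimal sketch of separating an optional label from a score. It assumes the score is written as 40 hexadecimal digits (the 20-byte SHA1 hash in hex), which this page does not state explicitly; the helper name `splitscore` is hypothetical and is not part of venti(3).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20 };	/* bytes in a SHA1 hash */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: parse "[label:]hex" into label and score.
 * Assumes the textual score is exactly 40 hex digits.
 * Returns 0 on success, -1 on malformed input.
 */
int
splitscore(const char *s, char *label, size_t nlabel, unsigned char score[ScoreSize])
{
	const char *colon, *hex;
	size_t n;
	int i, hi, lo;

	assert(nlabel > 0);
	colon = strrchr(s, ':');
	hex = colon ? colon + 1 : s;
	n = colon ? (size_t)(colon - s) : 0;
	if(n >= nlabel || strlen(hex) != 2*ScoreSize)
		return -1;
	memcpy(label, s, n);
	label[n] = '\0';
	for(i = 0; i < ScoreSize; i++){
		hi = hexval((unsigned char)hex[2*i]);
		lo = hexval((unsigned char)hex[2*i+1]);
		if(hi < 0 || lo < 0)
			return -1;
		score[i] = (unsigned char)(hi<<4 | lo);
	}
	return 0;
}
```

An unlabeled score yields an empty label, so callers can treat both forms uniformly.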
|  | .SS "Files and Directories | 
|  | Venti accepts blocks up to 56 kilobytes in size. | 
|  | By convention, Venti clients use hash trees of blocks to | 
|  | represent arbitrary-size data | 
|  | .IR files . | 
|  | The data to be stored is split into fixed-size | 
|  | blocks and written to the server, producing a list | 
|  | of scores. | 
|  | The resulting list of scores is split into fixed-size pointer | 
|  | blocks (using only an integral number of scores per block) | 
|  | and written to the server, producing a smaller list | 
|  | of scores. | 
|  | The process continues, eventually ending with the | 
|  | score for the hash tree's top-most block. | 
|  | Each file stored this way is summarized by | 
|  | a | 
|  | .B VtEntry | 
|  | structure recording the top-most score, the depth | 
|  | of the tree, the data block size, and the pointer block size. | 
|  | One or more | 
|  | .B VtEntry | 
|  | structures can be concatenated | 
|  | and stored as a special file called a | 
|  | .IR directory . | 
|  | In this | 
|  | manner, arbitrary trees of files can be constructed | 
|  | and stored. | 
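The arithmetic behind the hash tree can be sketched as follows. The function name `treedepth` is hypothetical, and counting depth as the number of pointer-block levels above the data blocks is an assumption about how the depth is recorded.

```c
#include <assert.h>
#include <stdint.h>

enum { ScoreSize = 20 };	/* bytes per score */

/*
 * Number of pointer-block levels needed above the data blocks
 * so that a single top-most block remains.
 */
int
treedepth(uint64_t size, uint32_t dsize, uint32_t psize)
{
	uint64_t nblock, perptr;
	int depth;

	assert(dsize > 0 && psize >= 2*ScoreSize);
	perptr = psize / ScoreSize;	/* integral number of scores per pointer block */
	nblock = (size + dsize - 1) / dsize;	/* data blocks, rounding up */
	depth = 0;
	while(nblock > 1){
		nblock = (nblock + perptr - 1) / perptr;
		depth++;
	}
	return depth;
}
```

With 8192-byte data and pointer blocks, each pointer block holds 409 scores, so a file of up to 409 data blocks needs one pointer level and a file of 410 data blocks needs two.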
|  | .PP | 
|  | Scores passed between programs conventionally refer | 
|  | to | 
|  | .B VtRoot | 
|  | blocks, which contain descriptive information | 
|  | as well as the score of a directory block containing a small number | 
|  | of directory entries. | 
|  | .PP | 
|  | Conventionally, programs do not mix data and directory entries | 
|  | in the same file.  Instead, they keep two separate files, one with | 
|  | directory entries and one with metadata referencing those | 
|  | entries by position. | 
|  | Keeping this parallel representation is a minor annoyance | 
|  | but makes it possible for general programs like | 
|  | .I venti/copy | 
|  | (see | 
|  | .IR venti (1)) | 
|  | to traverse the block tree without knowing the specific details | 
|  | of any particular program's data. | 
|  | .SS "Block Types | 
|  | To allow programs to traverse these structures without | 
|  | needing to understand their higher-level meanings, | 
|  | Venti tags each block with a type.  The types are: | 
|  | .PP | 
|  | .nf | 
|  | .ft L | 
|  | VtDataType     000  \f1data\fL | 
|  | VtDataType+1   001  \fRscores of \fPVtDataType\fR blocks\fL | 
|  | VtDataType+2   002  \fRscores of \fPVtDataType+1\fR blocks\fL | 
|  | \fR\&...\fL | 
|  | VtDirType      010  VtEntry\fR structures\fL | 
|  | VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL | 
|  | VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL | 
|  | \fR\&...\fL | 
|  | VtRootType     020  VtRoot\fR structure\fL | 
|  | .fi | 
|  | .PP | 
|  | The octal numbers listed are the type numbers used | 
|  | by the commands below. | 
|  | (For historical reasons, the type numbers used on | 
|  | disk and on the wire are different from the above. | 
|  | They do not distinguish | 
|  | .BI VtDataType+ n | 
|  | blocks from | 
|  | .BI VtDirType+ n | 
|  | blocks.) | 
|  | .SS "Zero Truncation | 
|  | To avoid storing the same short data blocks padded with | 
|  | differing numbers of zeros, Venti clients working with fixed-size | 
|  | blocks conventionally | 
|  | `zero truncate' the blocks before writing them to the server. | 
|  | For example, if a 1024-byte data block contains the | 
|  | 11-byte string | 
|  | .RB ` hello " " world ' | 
|  | followed by 1013 zero bytes, | 
|  | a client would store only the 11-byte block. | 
|  | When the client later read the block from the server, | 
|  | it would append zero bytes to the end as necessary to | 
|  | reach the expected size. | 
|  | .PP | 
|  | When truncating pointer blocks | 
|  | .RB ( VtDataType+ \fIn | 
|  | and | 
|  | .BI VtDirType+ n | 
|  | blocks), | 
|  | trailing zero scores are removed | 
|  | instead of trailing zero bytes. | 
|  | .PP | 
|  | Because of the truncation convention, | 
|  | any file consisting entirely of zero bytes, | 
|  | no matter what its length, will be represented by the zero score: | 
|  | the data blocks contain all zeros and are thus truncated | 
|  | to the empty block, and the pointer blocks contain all zero scores | 
|  | and are thus also truncated to the empty block, | 
|  | and so on up the hash tree. | 
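The truncation convention can be sketched as below. The function names are hypothetical (venti(3) documents the library's own routines), and the sketch takes "zero score" to mean the score of the zero-length block, as defined under Scores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20 };

/* SHA1 of the empty block: the zero score. */
static const unsigned char zeroscore[ScoreSize] = {
	0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55,
	0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09,
};

/* Length of a data block after dropping trailing zero bytes. */
size_t
truncdata(const unsigned char *buf, size_t n)
{
	while(n > 0 && buf[n-1] == 0)
		n--;
	return n;
}

/* Length of a pointer block after dropping trailing zero scores. */
size_t
truncptr(const unsigned char *buf, size_t n)
{
	n -= n % ScoreSize;	/* ignore any partial score at the end */
	while(n >= ScoreSize && memcmp(buf+n-ScoreSize, zeroscore, ScoreSize) == 0)
		n -= ScoreSize;
	return n;
}

/* Undo truncation of a data block read from the server: pad with zero bytes. */
void
padzero(unsigned char *buf, size_t n, size_t want)
{
	assert(n <= want);
	memset(buf+n, 0, want-n);
}
```

A pointer block read back from the server would presumably be extended with copies of the zero score rather than with zero bytes, mirroring how it was truncated.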
|  | .SS Network Protocol | 
|  | A Venti session begins when a | 
|  | .I client | 
|  | connects to the network address served by a Venti | 
|  | .IR server ; | 
|  | the conventional address is | 
|  | .BI tcp! server !venti | 
|  | (the | 
|  | .B venti | 
|  | port is 17034). | 
|  | Both client and server begin by sending a version | 
|  | string of the form | 
|  | .BI venti- versions - comment \en \fR. | 
|  | The | 
|  | .I versions | 
|  | field is a list of acceptable versions separated by | 
|  | colons. | 
|  | The protocol described here is version | 
|  | .BR 02 . | 
|  | The client is responsible for choosing a common | 
|  | version and sending it in the | 
|  | .B VtThello | 
|  | message, described below. | 
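One way a client might pick the common version is sketched here. The helper names are hypothetical, and preferring 04 over 02 when both are offered is an assumption, not something this page requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a peer's version string of the form
 * "venti-versions-comment\n", report whether the colon-separated
 * versions field lists want (for example "02").
 */
int
hasversion(const char *line, const char *want)
{
	const char *p, *end, *q;
	size_t n;

	if(strncmp(line, "venti-", 6) != 0)
		return 0;
	p = line + 6;
	end = strchr(p, '-');	/* versions field ends at the next '-' */
	if(end == NULL)
		return 0;
	n = strlen(want);
	while(p < end){
		q = memchr(p, ':', (size_t)(end - p));
		if(q == NULL)
			q = end;
		if((size_t)(q - p) == n && memcmp(p, want, n) == 0)
			return 1;
		p = q + 1;
	}
	return 0;
}

/* Pick the version to send in VtThello, or NULL if none is shared. */
const char*
chooseversion(const char *line)
{
	static const char *mine[] = { "04", "02" };	/* assumed preference order */
	size_t i;

	for(i = 0; i < sizeof mine/sizeof mine[0]; i++)
		if(hasversion(line, mine[i]))
			return mine[i];
	return NULL;
}
```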
|  | .PP | 
|  | After the initial version exchange, the client transmits | 
|  | .I requests | 
|  | .RI ( T-messages ) | 
|  | to the server, which subsequently returns | 
|  | .I replies | 
|  | .RI ( R-messages ) | 
|  | to the client. | 
|  | The combined act of transmitting (receiving) a request | 
|  | of a particular type, and receiving (transmitting) its reply | 
|  | is called a | 
|  | .I transaction | 
|  | of that type. | 
|  | .PP | 
|  | Each message consists of a sequence of bytes. | 
|  | Two-byte fields hold unsigned integers represented | 
|  | in big-endian order (most significant byte first). | 
|  | Data items of variable lengths are represented by | 
|  | a one-byte field specifying a count, | 
|  | .IR n , | 
|  | followed by | 
|  | .I n | 
|  | bytes of data. | 
|  | Text strings are represented similarly, | 
|  | using a two-byte count with | 
|  | the text itself stored as a UTF-encoded sequence | 
|  | of Unicode characters (see | 
|  | .IR utf (7)). | 
|  | Text strings are not | 
|  | .SM NUL\c | 
|  | -terminated: | 
|  | .I n | 
|  | counts the bytes of UTF data, which include no final | 
|  | zero byte. | 
|  | The | 
|  | .SM NUL | 
|  | character is illegal in text strings in the Venti protocol. | 
|  | The maximum string length in Venti is 1024 bytes. | 
|  | .PP | 
|  | Each Venti message begins with a two-byte size field | 
|  | specifying the length in bytes of the message, | 
|  | not including the length field itself. | 
|  | The next byte is the message type, one of the constants | 
|  | in the enumeration in the include file | 
|  | .BR <venti.h> . | 
|  | The next byte is an identifying | 
|  | .IR tag , | 
|  | used to match responses to requests. | 
|  | The remaining bytes are parameters of different sizes. | 
|  | In the message descriptions, the number of bytes in a field | 
|  | is given in brackets after the field name. | 
|  | The notation | 
|  | .IR parameter [ n ] | 
|  | where | 
|  | .I n | 
|  | is not a constant represents a variable-length parameter: | 
|  | .IR n [1] | 
|  | followed by | 
|  | .I n | 
|  | bytes of data forming the | 
|  | .IR parameter . | 
|  | The notation | 
|  | .IR string [ s ] | 
|  | (using a literal | 
|  | .I s | 
|  | character) | 
|  | is shorthand for | 
|  | .IR s [2] | 
|  | followed by | 
|  | .I s | 
|  | bytes of UTF-8 text. | 
|  | The notation | 
|  | .IR parameter [] | 
|  | where | 
|  | .I parameter | 
|  | is the last field in the message represents a | 
|  | variable-length field that comprises all remaining | 
|  | bytes in the message. | 
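As an illustration of the string[s] encoding, the following sketch writes a two-byte big-endian count followed by the text, with no final zero byte. The names `put16`, `putstring`, and `MaxString` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MaxString = 1024 };	/* maximum string length in the protocol */

/* Store a two-byte unsigned integer, most significant byte first. */
static void
put16(unsigned char *p, unsigned v)
{
	p[0] = (unsigned char)(v >> 8);
	p[1] = (unsigned char)v;
}

/*
 * Encode string[s]: s[2] followed by s bytes of text, no final NUL.
 * Returns the number of bytes written, or -1 if the string exceeds
 * MaxString or does not fit in buf.
 */
long
putstring(unsigned char *buf, size_t nbuf, const char *s)
{
	size_t n = strlen(s);

	assert(buf != NULL);
	if(n > MaxString || n + 2 > nbuf)
		return -1;
	put16(buf, (unsigned)n);
	memcpy(buf + 2, s, n);
	return (long)(n + 2);
}
```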
|  | .PP | 
|  | All Venti RPC messages are prefixed with a field | 
|  | .IR size [2] | 
|  | giving the length of the message that follows | 
|  | (not including the | 
|  | .I size | 
|  | field itself). | 
|  | The message bodies are: | 
|  | .ta \w'\fLVtTgoodbye 'u | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtThello | 
|  | .IR tag [1] | 
|  | .IR version [ s ] | 
|  | .IR uid [ s ] | 
|  | .IR strength [1] | 
|  | .IR crypto [ n ] | 
|  | .IR codec [ n ] | 
|  | .br | 
|  | .B VtRhello | 
|  | .IR tag [1] | 
|  | .IR sid [ s ] | 
|  | .IR rcrypto [1] | 
|  | .IR rcodec [1] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtTping | 
|  | .IR tag [1] | 
|  | .br | 
|  | .B VtRping | 
|  | .IR tag [1] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtTread | 
|  | .IR tag [1] | 
|  | .IR score [20] | 
|  | .IR type [1] | 
|  | .IR pad [1] | 
|  | .IR count [2] | 
|  | .br | 
|  | .B VtRread | 
|  | .IR tag [1] | 
|  | .IR data [] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtTwrite | 
|  | .IR tag [1] | 
|  | .IR type [1] | 
|  | .IR pad [3] | 
|  | .IR data [] | 
|  | .br | 
|  | .B VtRwrite | 
|  | .IR tag [1] | 
|  | .IR score [20] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtTsync | 
|  | .IR tag [1] | 
|  | .br | 
|  | .B VtRsync | 
|  | .IR tag [1] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtRerror | 
|  | .IR tag [1] | 
|  | .IR error [ s ] | 
|  | .IP | 
|  | .ne 2v | 
|  | .B VtTgoodbye | 
|  | .IR tag [1] | 
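To make the layout concrete, here is a sketch that packs a version 02 VtTread message. The function name `packtread` is hypothetical; the message type is passed in by the caller because this page does not give the numeric values from <venti.h>, and writing the pad byte as zero is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
	ScoreSize = 20,
	TreadBody = 1 + 1 + ScoreSize + 1 + 1 + 2,	/* msgtype tag score type pad count */
};

/*
 * Pack size[2] msgtype[1] tag[1] score[20] type[1] pad[1] count[2].
 * msgtype is the VtTread constant from <venti.h>; blocktype is the
 * protocol (on-the-wire) block type.
 * Returns the total number of bytes written, or 0 if buf is too small.
 */
size_t
packtread(unsigned char *buf, size_t nbuf, unsigned char msgtype,
	unsigned char tag, const unsigned char score[ScoreSize],
	unsigned char blocktype, unsigned count)
{
	unsigned char *p = buf;

	assert(count <= 0xffff);
	if(nbuf < 2 + TreadBody)
		return 0;
	*p++ = TreadBody >> 8;	/* size excludes the size field itself */
	*p++ = TreadBody & 0xff;
	*p++ = msgtype;
	*p++ = tag;
	memcpy(p, score, ScoreSize);
	p += ScoreSize;
	*p++ = blocktype;
	*p++ = 0;	/* pad[1]; zero is an assumption */
	*p++ = (unsigned char)(count >> 8);
	*p++ = (unsigned char)count;
	return (size_t)(p - buf);
}
```

The body is 26 bytes, so the complete message including the size field occupies 28 bytes.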
|  | .PP | 
|  | Each T-message has a one-byte | 
|  | .I tag | 
|  | field, chosen and used by the client to identify the message. | 
|  | The server will echo the request's | 
|  | .I tag | 
|  | field in the reply. | 
|  | Clients should arrange that no two outstanding | 
|  | messages have the same tag field so that responses | 
|  | can be distinguished. | 
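A client could track outstanding tags with a small table, as in this sketch. The names are hypothetical, and it assumes all 256 values of the one-byte tag are usable; this page does not say whether any are reserved.

```c
#include <assert.h>

enum { Ntag = 256 };	/* a one-byte tag has 256 possible values */

typedef struct Tags Tags;
struct Tags {
	unsigned char inuse[Ntag];
};

/* Reserve an unused tag; returns -1 if all tags are outstanding. */
int
tagalloc(Tags *t)
{
	int i;

	for(i = 0; i < Ntag; i++)
		if(!t->inuse[i]){
			t->inuse[i] = 1;
			return i;
		}
	return -1;
}

/* Release a tag once its reply has arrived. */
void
tagfree(Tags *t, int tag)
{
	assert(tag >= 0 && tag < Ntag);
	t->inuse[tag] = 0;
}
```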
|  | .PP | 
|  | The type of an R-message will either be one greater than | 
|  | the type of the corresponding T-message or | 
|  | .BR VtRerror , | 
|  | indicating that the request failed. | 
|  | In the latter case, the | 
|  | .I error | 
|  | field contains a string describing the reason for failure. | 
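The rule for matching a reply to its request can be sketched as a small check. The names are hypothetical; the numeric message types, including the error type, are passed in by the caller because their values come from <venti.h> and are not listed on this page.

```c
#include <assert.h>

enum { ReplyOK, ReplyError, ReplyMismatch };

/*
 * Classify a reply against the request it claims to answer.
 * rerror is the error message type from <venti.h>.
 */
int
checkreply(unsigned ttype, unsigned ttag, unsigned rtype, unsigned rtag, unsigned rerror)
{
	assert(ttype != rerror);
	if(rtag != ttag)
		return ReplyMismatch;	/* reply belongs to another request */
	if(rtype == rerror)
		return ReplyError;	/* request failed; error string follows */
	if(rtype == ttype + 1)
		return ReplyOK;
	return ReplyMismatch;
}
```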
|  | .PP | 
|  | Venti connections must begin with a | 
|  | .B hello | 
|  | transaction. | 
|  | The | 
|  | .B VtThello | 
|  | message contains the protocol | 
|  | .I version | 
|  | that the client has chosen to use. | 
|  | The fields | 
|  | .IR strength , | 
|  | .IR crypto , | 
|  | and | 
|  | .IR codec | 
|  | could be used to add authentication, encryption, | 
|  | and compression to the Venti session | 
|  | but are currently ignored. | 
|  | The | 
|  | .I rcrypto | 
|  | and | 
|  | .I rcodec | 
|  | fields in the | 
|  | .B VtRhello | 
|  | response are similarly ignored. | 
|  | The | 
|  | .IR uid | 
|  | and | 
|  | .IR sid | 
|  | fields are intended to be the identity | 
|  | of the client and server but, given the lack of | 
|  | authentication, should be treated only as advisory. | 
|  | The initial | 
|  | .B hello | 
|  | should be the only | 
|  | .B hello | 
|  | transaction during the session. | 
|  | .PP | 
|  | The | 
|  | .B ping | 
|  | message has no effect and | 
|  | is used mainly for debugging. | 
|  | Servers should respond immediately to pings. | 
|  | .PP | 
|  | The | 
|  | .B read | 
|  | message requests a block with the given | 
|  | .I score | 
|  | and | 
|  | .IR type . | 
|  | Use | 
|  | .I vttodisktype | 
|  | and | 
|  | .I vtfromdisktype | 
|  | (see | 
|  | .IR venti (3)) | 
|  | to convert between a block type enumeration value | 
|  | .RB ( VtDataType , | 
|  | etc.) | 
|  | and the | 
|  | .I type | 
|  | used on disk and in the protocol. | 
|  | The | 
|  | .I count | 
|  | field specifies the maximum expected size | 
|  | of the block. | 
|  | The | 
|  | .I data | 
|  | in the reply is the block's contents. | 
|  | .PP | 
|  | The | 
|  | .B write | 
|  | message writes a new block of the given | 
|  | .I type | 
|  | with contents | 
|  | .I data | 
|  | to the server. | 
|  | The response includes the | 
|  | .I score | 
|  | to use to read the block, | 
|  | which should be the SHA1 hash of | 
|  | .IR data . | 
|  | .PP | 
|  | The Venti server may buffer written blocks in memory, | 
|  | waiting until after responding to the | 
|  | .B write | 
|  | message before writing them to | 
|  | permanent storage. | 
|  | The server will delay the response to a | 
|  | .B sync | 
|  | message until after all blocks in earlier | 
|  | .B write | 
|  | messages have been written to permanent storage. | 
|  | .PP | 
|  | The | 
|  | .B goodbye | 
|  | message ends a session.  There is no | 
|  | .BR VtRgoodbye : | 
|  | upon receiving the | 
|  | .BR VtTgoodbye | 
|  | message, the server terminates the connection. | 
|  | .PP | 
|  | Version | 
|  | .B 04 | 
|  | of the Venti protocol is similar to version | 
|  | .B 02 | 
|  | (described above) | 
|  | but has two changes to accommodate larger payloads. | 
|  | First, it replaces the leading 2-byte packet size with | 
|  | a 4-byte size. | 
|  | Second, the | 
|  | .I count | 
|  | in the | 
|  | .B VtTread | 
|  | packet may be either 2 or 4 bytes; | 
|  | the total packet length distinguishes the two cases. | 
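A server-side sketch of telling the two VtTread layouts apart by length follows. The function name `treadcount` is hypothetical, and decoding the 4-byte count in big-endian order is an assumption extrapolated from the rule for two-byte fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ScoreSize = 20 };

/*
 * Extract count from a VtTread body (the bytes after the size field):
 * msgtype[1] tag[1] score[20] type[1] pad[1] count[2 or 4].
 * Returns 0 on success, -1 if the length matches neither layout.
 */
int
treadcount(const unsigned char *body, size_t n, uint32_t *count)
{
	const unsigned char *p;

	assert(body != NULL && count != NULL);
	if(n != 26 && n != 28)
		return -1;
	p = body + 1 + 1 + ScoreSize + 1 + 1;	/* skip to the count field */
	if(n == 26)	/* two-byte count */
		*count = (uint32_t)p[0]<<8 | p[1];
	else	/* four-byte count, assumed big-endian */
		*count = (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16
			| (uint32_t)p[2]<<8 | p[3];
	return 0;
}
```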
|  | .SH SEE ALSO | 
|  | .IR venti (1), | 
|  | .IR venti (3), | 
|  | .IR venti (8) | 
|  | .br | 
|  | Sean Quinlan and Sean Dorward, | 
|  | ``Venti: a new approach to archival storage'', | 
|  | .I "Usenix Conference on File and Storage Technologies" , | 
|  | 2002. |