|  | .TH VENTI 8 | 
|  | .SH NAME | 
|  | venti \- archival storage server | 
|  | .SH SYNOPSIS | 
|  | .in +0.25i | 
|  | .ti -0.25i | 
|  | .B venti/venti | 
|  | [ | 
|  | .B -Ldrs | 
|  | ] | 
|  | [ | 
|  | .B -a | 
|  | .I address | 
|  | ] | 
|  | [ | 
|  | .B -B | 
|  | .I blockcachesize | 
|  | ] | 
|  | [ | 
|  | .B -c | 
|  | .I config | 
|  | ] | 
|  | [ | 
|  | .B -C | 
|  | .I lumpcachesize | 
|  | ] | 
|  | [ | 
|  | .B -h | 
|  | .I httpaddress | 
|  | ] | 
|  | [ | 
|  | .B -I | 
|  | .I indexcachesize | 
|  | ] | 
|  | [ | 
|  | .B -W | 
|  | .I webroot | 
|  | ] | 
|  | .SH DESCRIPTION | 
|  | .I Venti | 
|  | is a SHA1-addressed archival storage server. | 
|  | See | 
|  | .IR venti (7) | 
|  | for a full introduction to the system. | 
|  | This page documents the structure and operation of the server. | 
|  | .PP | 
|  | A venti server requires multiple disks or disk partitions, | 
|  | each of which must be properly formatted before the server | 
|  | can be run. | 
|  | .SS Disk | 
|  | The venti server maintains three disk structures, typically | 
|  | stored on raw disk partitions: | 
|  | the append-only | 
|  | .IR "data log" , | 
|  | which holds, in sequential order, | 
|  | the contents of every block written to the server; | 
|  | the | 
|  | .IR index , | 
|  | which helps locate a block in the data log given its score; | 
|  | and optionally the | 
|  | .IR "bloom filter" , | 
|  | a concise summary of which scores are present in the index. | 
|  | The data log is the primary storage. | 
|  | To improve the robustness, it should be stored on | 
|  | a device that provides RAID functionality. | 
|  | The index and the bloom filter are optimizations | 
|  | employed to access the data log efficiently and can be rebuilt | 
|  | if lost or damaged. | 
|  | .PP | 
|  | The data log is logically split into sections called | 
|  | .IR arenas , | 
|  | typically sized for easy offline backup | 
|  | (e.g., 500MB). | 
|  | A data log may comprise many disks, each storing | 
|  | one or more arenas. | 
|  | Such disks are called | 
|  | .IR "arena partitions" . | 
|  | Arena partitions are filled in the order given in the configuration. | 
|  | .PP | 
|  | The index is logically split into block-sized pieces called | 
|  | .IR buckets , | 
|  | each of which is responsible for a particular range of scores. | 
|  | An index may be split across many disks, each storing many buckets. | 
|  | Such disks are called | 
|  | .IR "index sections" . | 
|  | .PP | 
|  | The index must be sized so that no bucket is full. | 
|  | When a bucket fills, the server must be shut down and | 
|  | the index made larger. | 
|  | Since scores appear random, each bucket will contain | 
|  | approximately the same number of entries. | 
|  | Index entries are 40 bytes long.  Assuming that a typical block | 
|  | being written to the server is 8192 bytes and compresses to 4096 | 
|  | bytes, the active index is expected to be about 1% of | 
|  | the active data log. | 
|  | Storing smaller blocks increases the relative index footprint; | 
|  | storing larger blocks decreases it. | 
|  | To allow variation in both block size and the random distribution | 
|  | of scores to buckets, the suggested index size is 5% of | 
|  | the active data log. | 
|  | .PP | 
|  | The (optional) bloom filter is a large bitmap that is stored on disk but | 
|  | also kept completely in memory while the venti server runs. | 
|  | It helps the venti server efficiently detect scores that are | 
|  | .I not | 
|  | already stored in the index. | 
|  | The bloom filter starts out zeroed. | 
|  | Each score recorded in the bloom filter is hashed to choose | 
|  | .I nhash | 
|  | bits to set in the bloom filter. | 
|  | A score is definitely not stored in the index of any of its | 
|  | .I nhash | 
|  | bits are not set. | 
|  | The bloom filter thus has two parameters: | 
|  | .I nhash | 
|  | (maximum 32) | 
|  | and the total bitmap size | 
|  | (maximum 512MB, 2\s-2\u32\d\s+2 bits). | 
|  | .PP | 
|  | The bloom filter should be sized so that | 
|  | .I nhash | 
|  | \(mu | 
|  | .I nblock | 
|  | \(<= | 
|  | 0.7 \(mu | 
|  | .IR b , | 
|  | where | 
|  | .I nblock | 
|  | is the expected number of blocks stored on the server | 
|  | and | 
|  | .I b | 
|  | is the bitmap size in bits. | 
|  | The false positive rate of the bloom filter when sized | 
|  | this way is approximately 2\s-2\u\-\fInblock\fR\d\s+2. | 
|  | .I Nhash | 
|  | less than 10 are not very useful; | 
|  | .I nhash | 
|  | greater than 24 are probably a waste of memory. | 
|  | .I Fmtbloom | 
|  | (see | 
|  | .IR venti-fmt (8)) | 
|  | can be given either | 
|  | .I nhash | 
|  | or | 
|  | .IR nblock ; | 
|  | if given | 
|  | .IR nblock , | 
|  | it will derive an appropriate | 
|  | .IR nhash . | 
|  | .SS Memory | 
|  | Venti can make effective use of large amounts of memory | 
|  | for various caches. | 
|  | .PP | 
|  | The | 
|  | .I "lump cache | 
|  | holds recently-accessed venti data blocks, which the server refers to as | 
|  | .IR lumps . | 
|  | The lump cache should be at least 1MB but can profitably be much larger. | 
|  | The lump cache can be thought of as the level-1 cache: | 
|  | read requests handled by the lump cache can | 
|  | be served instantly. | 
|  | .PP | 
|  | The | 
|  | .I "block cache | 
|  | holds recently-accessed | 
|  | .I disk | 
|  | blocks from the arena partitions. | 
|  | The block cache needs to be able to simultaneously hold two blocks | 
|  | from each arena plus four blocks for the currently-filling arena. | 
|  | The block cache can be thought of as the level-2 cache: | 
|  | read requests handled by the block cache are slower than those | 
|  | handled by the lump cache, since the lump data must be extracted | 
|  | from the raw disk blocks and possibly decompressed, but no | 
|  | disk accesses are necessary. | 
|  | .PP | 
|  | The | 
|  | .I "index cache | 
|  | holds recently-accessed or prefetched | 
|  | index entries. | 
|  | The index cache needs to be able to hold index entries | 
|  | for three or four arenas, at least, in order for prefetching | 
|  | to work properly.  Each index entry is 50 bytes. | 
|  | Assuming 500MB arenas of | 
|  | 128,000 blocks that are 4096 bytes each after compression, | 
|  | the minimum index cache size is about 6MB. | 
|  | The index cache can be thought of as the level-3 cache: | 
|  | read requests handled by the index cache must still go | 
|  | to disk to fetch the arena blocks, but the costly random | 
|  | access to the index is avoided. | 
|  | .PP | 
|  | The size of the index cache determines how long venti | 
|  | can sustain its `burst' write throughput, during which time | 
|  | the only disk accesses on the critical path | 
|  | are sequential writes to the arena partitions. | 
|  | For example, if you want to be able to sustain 10MB/s | 
|  | for an hour, you need enough index cache to hold entries | 
|  | for 36GB of blocks.  Assuming 8192-byte blocks, | 
|  | you need room for almost five million index entries. | 
|  | Since index entries are 50 bytes each, you need 250MB | 
|  | of index cache. | 
|  | If the background index update process can make a single | 
|  | pass through the index in an hour, which is possible, | 
|  | then you can sustain the 10MB/s indefinitely (at least until | 
|  | the arenas are all filled). | 
|  | .PP | 
|  | The | 
|  | .I "bloom filter | 
|  | requires memory equal to its size on disk, | 
|  | as discussed above. | 
|  | .PP | 
|  | A reasonable starting allocation is to | 
|  | divide memory equally (in thirds) between | 
|  | the bloom filter, the index cache, and the lump and block caches; | 
|  | the third of memory allocated to the lump and block caches | 
|  | should be split unevenly, with more (say, two thirds) | 
|  | going to the block cache. | 
|  | .SS Network | 
|  | The venti server announces two network services, one | 
|  | (conventionally TCP port | 
|  | .BR venti , | 
|  | 17034) serving | 
|  | the venti protocol as described in | 
|  | .IR venti (7), | 
|  | and one serving HTTP | 
|  | (conventionally TCP port | 
|  | .BR http , | 
|  | 80). | 
|  | .PP | 
|  | The venti web server provides the following | 
|  | URLs for accessing status information: | 
|  | .TF "\fL/storage" | 
|  | .PD | 
|  | .TP | 
|  | .B /index | 
|  | A summary of the usage of the arenas and index sections. | 
|  | .TP | 
|  | .B /xindex | 
|  | An XML version of | 
|  | .BR /index . | 
|  | .TP | 
|  | .B /storage | 
|  | Brief storage totals. | 
|  | .TP | 
|  | .BI /set | 
|  | Disable the values of all variables. | 
|  | Variables are: | 
|  | .BR compress , | 
|  | whether or not to compress blocks | 
|  | (for debugging); | 
|  | .BR logging , | 
|  | whether to write entries to the debugging logs; | 
|  | .BR stats , | 
|  | whether to collect run-time statistics; | 
|  | .BR icachesleeptime , | 
|  | the time in milliseconds between successive updates | 
|  | of megabytes of the index cache; | 
|  | .BR arenasumsleeptime , | 
|  | the time in milliseconds between reads while | 
|  | checksumming an arena in the background. | 
|  | The two sleep times should be (but are not) managed by venti; | 
|  | they exist to provide more experience with their effects. | 
|  | The other variables exist only for debugging and | 
|  | performance measurement. | 
|  | .TP | 
|  | .BI /set?name= variable | 
|  | Show the current setting of | 
|  | .IR variable . | 
|  | .TP | 
|  | .BI /set?name= variable &value= value | 
|  | Set | 
|  | .I variable | 
|  | to | 
|  | .IR value . | 
|  | .TP | 
|  | .BI /graph/ name / param / param / \fR... | 
|  | A PNG image graphing the named run-time statistic over time. | 
|  | The details of names and parameters are undocumented; | 
|  | see | 
|  | .B httpd.c | 
|  | in the venti sources. | 
|  | .TP | 
|  | .B /log | 
|  | A list of all debugging logs present in the server's memory. | 
|  | .TP | 
|  | .BI /log/ name | 
|  | The contents of the debugging log with the given | 
|  | .IR name . | 
|  | .TP | 
|  | .B /flushicache | 
|  | Force venti to begin flushing the index cache to disk. | 
|  | The request response will not be sent until the flush | 
|  | has completed. | 
|  | .TP | 
|  | .B /flushdcache | 
|  | Force venti to begin flushing the arena block cache to disk. | 
|  | The request response will not be sent until the flush | 
|  | has completed. | 
|  | .PD | 
|  | .PP | 
|  | Requests for other files are served by consulting a | 
|  | directory named in the configuration file | 
|  | (see | 
|  | .B webroot | 
|  | below). | 
|  | .SS Configuration File | 
|  | A venti configuration file | 
|  | enumerates the various index sections and | 
|  | arenas that constitute a venti system. | 
|  | The components are indicated by the name of the file, typically | 
|  | a disk partition, in which they reside.  The configuration | 
|  | file is the only location that file names are used.  Internally, | 
|  | venti uses the names assigned when the components were formatted | 
|  | with | 
|  | .I fmtarenas | 
|  | or | 
|  | .I fmtisect | 
|  | (see | 
|  | .IR venti-fmt (8)). | 
|  | In particular, only the configuration needs to be | 
|  | changed if a component is moved to a different file. | 
|  | .PP | 
|  | The configuration file consists of lines in the form described below. | 
|  | Lines starting with | 
|  | .B # | 
|  | are comments. | 
|  | .TF "\fLindex\fI name " | 
|  | .PD | 
|  | .TP | 
|  | .BI index " name | 
|  | Names the index for the system. | 
|  | .TP | 
|  | .BI arenas " file | 
|  | .I File | 
|  | is an arena partition, formatted using | 
|  | .IR fmtarenas . | 
|  | .TP | 
|  | .BI isect " file | 
|  | .I File | 
|  | is an index section, formatted using | 
|  | .IR fmtisect . | 
|  | .TP | 
|  | .BI bloom " file | 
|  | .I File | 
|  | is a bloom filter, formatted using | 
|  | .IR fmtbloom . | 
|  | .PD | 
|  | .PP | 
|  | After formatting a venti system using | 
|  | .IR fmtindex , | 
|  | the order of arenas and index sections should not be changed. | 
|  | Additional arenas can be appended to the configuration; | 
|  | run | 
|  | .I fmtindex | 
|  | with the | 
|  | .B -a | 
|  | flag to update the index. | 
|  | .PP | 
|  | The configuration file also holds configuration parameters | 
|  | for the venti server itself. | 
|  | These are: | 
|  | .TF "\fLhttpaddr\fI netaddr " | 
|  | .TP | 
|  | .BI mem " size | 
|  | lump cache size | 
|  | .TP | 
|  | .BI bcmem " size | 
|  | block cache size | 
|  | .TP | 
|  | .BI icmem " size | 
|  | index cache size | 
|  | .TP | 
|  | .BI addr " netaddr | 
|  | network address to announce venti service | 
|  | (default | 
|  | .BR tcp!*!venti ) | 
|  | .TP | 
|  | .BI httpaddr " netaddr | 
|  | network address to announce HTTP service | 
|  | (default | 
|  | .BR tcp!*!http ) | 
|  | .TP | 
|  | .B queuewrites | 
|  | queue writes in memory | 
|  | (default is not to queue) | 
|  | .TP | 
|  | .BI webroot " dir | 
|  | directory tree containing files for | 
|  | .IR venti 's | 
|  | internal HTTP server to consult for unrecognized URLs | 
|  | .PD | 
|  | .PP | 
|  | The units for the various cache sizes above can be specified by appending a | 
|  | .LR k , | 
|  | .LR m , | 
|  | or | 
|  | .LR g | 
|  | (case-insensitive) | 
|  | to indicate kilobytes, megabytes, or gigabytes respectively. | 
|  | .PP | 
|  | The | 
|  | .I file | 
|  | name in the configuration lines above can be of the form | 
|  | .IB file : lo - hi | 
|  | to specify a range of the file. | 
|  | .I Lo | 
|  | and | 
|  | .I hi | 
|  | are specified in bytes but can have the usual | 
|  | .BI k , | 
|  | .BI m , | 
|  | or | 
|  | .B g | 
|  | suffixes. | 
|  | Either | 
|  | .I lo | 
|  | or | 
|  | .I hi | 
|  | may be omitted. | 
|  | This notation eliminates the need to | 
|  | partition raw disks on non-Plan 9 systems. | 
|  | .SS Command Line | 
|  | Many of the options to Venti duplicate parameters that | 
|  | can be specified in the configuration file. | 
|  | The command line options override those found in a | 
|  | configuration file. | 
|  | Additional options are: | 
|  | .TF "\fL-c\fI config" | 
|  | .PD | 
|  | .TP | 
|  | .BI -c " config | 
|  | The server configuration file | 
|  | (default | 
|  | .BR venti.conf ) | 
|  | .TP | 
|  | .B -d | 
|  | Produce various debugging information on standard error. | 
|  | Implies | 
|  | .BR -s . | 
|  | .TP | 
|  | .B -L | 
|  | Enable logging.  By default all logging is disabled. | 
|  | Logging slows server operation considerably. | 
|  | .TP | 
|  | .B -r | 
|  | Allow only read access to the venti data. | 
|  | .TP | 
|  | .B -s | 
|  | Do not run in the background. | 
|  | Normally, | 
|  | the foreground process will exit once the Venti server | 
|  | is initialized and ready for connections. | 
|  | .PD | 
|  | .SH EXAMPLE | 
|  | A simple configuration: | 
|  | .IP | 
|  | .EX | 
|  | % cat venti.conf | 
|  | index main | 
|  | isect /tmp/disks/isect0 | 
|  | isect /tmp/disks/isect1 | 
|  | arenas /tmp/disks/arenas | 
|  | bloom /tmp/disks/bloom | 
|  | mem 10M | 
|  | bcmem 20M | 
|  | icmem 30M | 
|  | % | 
|  | .EE | 
|  | .PP | 
|  | Format the index sections, the arena partition, | 
|  | the bloom filter, and | 
|  | finally the main index: | 
|  | .IP | 
|  | .EX | 
|  | % venti/fmtisect isect0. /tmp/disks/isect0 | 
|  | % venti/fmtisect isect1. /tmp/disks/isect1 | 
|  | % venti/fmtarenas arenas0. /tmp/disks/arenas & | 
|  | % venti/fmtbloom /tmp/disks/bloom & | 
|  | % wait | 
|  | % venti/fmtindex venti.conf | 
|  | % | 
|  | .EE | 
|  | .PP | 
|  | Start the server and check the storage statistics: | 
|  | .IP | 
|  | .EX | 
|  | % venti/venti | 
|  | % hget http://$sysname/storage | 
|  | .EE | 
|  | .SH SOURCE | 
|  | .B \*9/src/cmd/venti/srv | 
|  | .SH "SEE ALSO" | 
|  | .IR venti (1), | 
|  | .IR venti (3), | 
|  | .IR venti (7), | 
|  | .IR venti-backup (8) | 
|  | .IR venti-fmt (8) | 
|  | .br | 
|  | Sean Quinlan and Sean Dorward, | 
|  | ``Venti: a new approach to archival storage'', | 
|  | .I "Usenix Conference on File and Storage Technologies" , | 
|  | 2002. | 
|  | .SH BUGS | 
|  | Setting up a venti server is too complicated. | 
|  | .PP | 
|  | Venti should not require the user to decide how to | 
|  | partition its memory usage. | 
|  | .PP | 
|  | Users of shells other than | 
|  | .IR rc (1) | 
|  | will not be able to use the program names shown. | 
|  | One solution is to define | 
|  | .B "V=$PLAN9/bin/venti" | 
|  | and then substitute | 
|  | .B $V/ | 
|  | for | 
|  | .B venti/ | 
|  | in the paths above. |