.TH REGEXP 3
.SH NAME
regcomp, regcomplit, regcompnl, regexec, regsub, rregexec, rregsub, regerror \- regular expression
.SH SYNOPSIS
.B #include <u.h>
.br
.B #include <libc.h>
.br
.B #include <regexp.h>
.PP
.ta \w'\fLRegprog 'u
.B
Reprog *regcomp(char *exp)
.PP
.B
Reprog *regcomplit(char *exp)
.PP
.B
Reprog *regcompnl(char *exp)
.PP
.nf
.B
int regexec(Reprog *prog, char *string, Resub *match, int msize)
.PP
.nf
.B
void regsub(char *source, char *dest, int dlen, Resub *match, int msize)
.PP
.nf
.B
int rregexec(Reprog *prog, Rune *string, Resub *match, int msize)
.PP
.nf
.B
void rregsub(Rune *source, Rune *dest, int dlen, Resub *match, int msize)
.PP
.B
void regerror(char *msg)
.SH DESCRIPTION
.I Regcomp
compiles a
regular expression and returns
a pointer to the generated description.
The space is allocated by
.IR malloc (3)
and may be released by
.IR free .
Regular expressions are exactly as in
.IR regexp (7).
.PP
.I Regcomplit
is like
.I regcomp
except that all characters are treated literally.
.I Regcompnl
is like
.I regcomp
except that the
.B .
metacharacter matches all characters, including newlines.
.PP
.I Regexec
matches a null-terminated
.I string
against the compiled regular expression in
.IR prog .
If it matches,
.I regexec
returns
.B 1
and fills in the array
.I match
with character pointers to the substrings of
.I string
that correspond to the
parenthesized subexpressions of
.IR exp :
.BI match[ i ].s.sp
points to the beginning and
.BI match[ i ].e.ep
points just beyond
the end of the
.IR i th
substring.
(Subexpression
.I i
begins at the
.IR i th
left parenthesis, counting from 1.)
Pointers in
.B match[0]
pick out the substring that corresponds to
the whole regular expression.
Unused elements of
.I match
are filled with zeros.
Matches involving
.LR * ,
.LR + ,
and
.L ?
are extended as far as possible.
The number of array elements in
.I match
is given by
.IR msize .
The structure of elements of
.I match
is:
.IP
.EX
typedef struct {
	union {
		char *sp;
		Rune *rsp;
	} s;
	union {
		char *ep;
		Rune *rep;
	} e;
} Resub;
.EE
.LP
If
.B match[0].s.sp
is nonzero on entry,
.I regexec
starts matching at that point within
.IR string .
If
.B match[0].e.ep
is nonzero on entry,
the last character matched is the one
preceding that point.
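.PP
Because the end pointer points just beyond the match, the length of a
substring is the difference of the two pointers.
The following is a minimal sketch of reading the fields, not library code.
It assumes a local copy of the
Resub
layout and a stand-in
Rune
typedef so that it compiles without the library; the names
substr
and
demo
are hypothetical, and the match array is filled in by hand where a real
program would call regexec.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Rune;	/* assumption: stand-in for the library's Rune */

/* Local copy of the Resub layout shown above, so the sketch stands alone. */
typedef struct {
	union { char *sp; Rune *rsp; } s;
	union { char *ep; Rune *rep; } e;
} Resub;

/*
 * substr (hypothetical helper): copy the text delimited by m into buf,
 * truncating to fit n bytes and always terminating with a zero byte.
 * Returns the number of bytes copied; 0 for an unused (zeroed) element.
 */
static size_t
substr(Resub *m, char *buf, size_t n)
{
	size_t len;

	assert(m != NULL && buf != NULL);
	if(n == 0)
		return 0;
	buf[0] = 0;
	if(m->s.sp == NULL || m->e.ep == NULL)
		return 0;
	len = (size_t)(m->e.ep - m->s.sp);	/* ep is one past the end */
	if(len > n-1)
		len = n-1;
	memcpy(buf, m->s.sp, len);
	buf[len] = 0;
	return len;
}

/*
 * demo (hypothetical): the array is filled by hand to look like the result
 * of matching (.+)=(.+) against "key=value"; no library call is made.
 */
static void
demo(char *key, char *val, size_t n)
{
	static char string[] = "key=value";
	Resub match[3];

	memset(match, 0, sizeof match);
	match[0].s.sp = string;   match[0].e.ep = string+9;	/* whole match */
	match[1].s.sp = string;   match[1].e.ep = string+3;	/* first group */
	match[2].s.sp = string+4; match[2].e.ep = string+9;	/* second group */
	substr(&match[1], key, n);
	substr(&match[2], val, n);
}
```

Checking for a zero pointer before subtracting matters because unused elements are zero-filled rather than set to an empty range.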
.PP
.I Regsub
places in
.I dest
a substitution instance of
.I source
in the context of the last
.I regexec
performed using
.IR match .
Each instance of
.BI \e n\f1,
where
.I n
is a digit, is replaced by the
string delimited by
.BI match[ n ].s.sp
and
.BI match[ n ].e.ep\f1.
Each instance of
.L &
is replaced by the string delimited by
.B match[0].s.sp
and
.BR match[0].e.ep .
The substitution is always null-terminated and
truncated to fit into
.I dlen
bytes.
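.PP
The substitution rules above can be illustrated with a small standalone
sketch.
This is not the library's implementation; the name
sketch_regsub
is hypothetical, the
Resub
layout is copied locally, and two behaviors are assumptions the text above
does not settle: a reference to an unused or out-of-range element expands
to nothing, and a backslash not followed by a digit is copied unchanged.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Rune;	/* assumption: stand-in for the library's Rune */

/* Local copy of the Resub layout from this page. */
typedef struct {
	union { char *sp; Rune *rsp; } s;
	union { char *ep; Rune *rep; } e;
} Resub;

/*
 * sketch_regsub (hypothetical): expand source into dest.  A backslash
 * followed by a digit n selects match[n]; & selects match[0].  The result
 * is truncated to dlen bytes, including the terminating zero byte.
 */
static void
sketch_regsub(char *source, char *dest, int dlen, Resub *match, int msize)
{
	char *d, *end, *p;
	int i;

	assert(source != NULL && dest != NULL);
	if(dlen <= 0)
		return;
	d = dest;
	end = dest + dlen - 1;	/* reserve one byte for the terminator */
	while(*source != 0 && d < end){
		i = -1;
		if(*source == '\\' && source[1] >= '0' && source[1] <= '9'){
			i = source[1] - '0';
			source += 2;
		}else if(*source == '&'){
			i = 0;
			source++;
		}
		if(i < 0){
			*d++ = *source++;	/* ordinary character */
			continue;
		}
		/* assumption: unused or out-of-range elements expand to nothing */
		if(i >= msize || match[i].s.sp == NULL || match[i].e.ep == NULL)
			continue;
		for(p = match[i].s.sp; p < match[i].e.ep && d < end; p++)
			*d++ = *p;
	}
	*d = 0;
}
```

With a match array describing the groups of "key=value", a source that swaps the two groups around the equals sign would yield "value=key", cut short if the destination is too small to hold it.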
.PP
.IR Regerror ,
called whenever an error is detected in
.IR regcomp ,
writes the string
.I msg
on the standard error file and exits.
.I Regerror
can be replaced to perform
special error processing.
If the user-supplied
.I regerror
returns rather than exits,
.I regcomp
will return 0.
.PP
.I Rregexec
and
.I rregsub
are variants of
.I regexec
and
.I regsub
that use strings of
.B Runes
instead of strings of
.BR chars .
With these routines, the
.B s.rsp
and
.B e.rep
fields of the
.I match
array elements should be used.
.SH SOURCE
.B \*9/src/libregexp
.SH "SEE ALSO"
.IR grep (1)
.SH DIAGNOSTICS
.I Regcomp
returns
.B 0
for an illegal expression
or other failure.
.I Regexec
returns 0
if
.I string
is not matched.
.SH BUGS
There is no way to specify or match a NUL character; NULs terminate patterns and strings.