<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from gxxint.texi on 27 August 1999 -->

<TITLE>G++ internals - Mangling</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gxxint_1.html">first</A>, <A HREF="gxxint_14.html">previous</A>, <A HREF="gxxint_16.html">next</A>, <A HREF="gxxint_16.html">last</A> section, <A HREF="gxxint_toc.html">table of contents</A>.
<P><HR><P>


<H2><A NAME="SEC20" HREF="gxxint_toc.html#TOC20">Function name mangling for C++ and Java</A></H2>

<P>
Both C++ and Java provide overloaded functions and methods,
which are functions or methods with the same name but different parameter lists.
Selecting the correct version is done at compile time.
Though the overloaded functions have the same name in the source code,
they need to be translated into different assembler-level names,
since typical assemblers and linkers cannot handle overloading.
This process of encoding the parameter types with the method name
into a unique name is called <EM>name mangling</EM>. The inverse
process is called <EM>demangling</EM>.

</P>
<P>
It is convenient that C++ and Java use compatible mangling schemes,
since this makes life easier for tools such as gdb, and it eases
integration between C++ and Java.

</P>
<P>
Note there is also a standard "Java Native Interface" (JNI) which
implements a different calling convention, and uses a different
mangling scheme. The JNI is a rather abstract ABI so Java can call methods
written in C or C++;
we are concerned here with a lower-level interface primarily
intended for methods written in Java, but that can also be used for C++
(and less easily C).

</P>


<H3><A NAME="SEC21" HREF="gxxint_toc.html#TOC21">Method name mangling</A></H3>

<P>
C++ mangles a method by emitting the function name, followed by <CODE>__</CODE>,
followed by encodings of any method qualifiers (such as <CODE>const</CODE>),
followed by the mangling of the method's class,
followed by the mangling of the parameters, in order.

</P>
<P>
For example, <CODE>Foo::bar(int, long) const</CODE> is mangled
as <SAMP>`bar__C3Fooil'</SAMP>.

</P>
<P>
For a constructor, the method name is left out.
That is, <CODE>Foo::Foo(int, long)</CODE> is mangled
as <SAMP>`__3Fooil'</SAMP>. (A constructor cannot be declared
<CODE>const</CODE>, so no <SAMP>`C'</SAMP> qualifier appears.)

</P>
<P>
GNU Java does the same.

</P>
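<P>
As a rough illustration of the rule above, the following Python sketch
assembles a mangled method name from its parts. The helper names
<CODE>mangle_simple_name</CODE> and <CODE>mangle_method</CODE> are hypothetical
and are not part of G++. Parameter codes are passed in already mangled,
and only plain ASCII class names are handled.
</P>

```python
def mangle_simple_name(name: str) -> str:
    """Length-prefixed encoding of a plain ASCII identifier, e.g. Foo -> 3Foo."""
    return f"{len(name)}{name}"


def mangle_method(class_name: str, method_name: str, param_codes: str,
                  is_const: bool = False, is_constructor: bool = False) -> str:
    """Emit name, '__', qualifiers, class, then the parameter codes in order.

    Hypothetical helper: it only models the 'const' qualifier and leaves
    the method name out for constructors.
    """
    prefix = "" if is_constructor else method_name
    qualifiers = "C" if is_const else ""
    return f"{prefix}__{qualifiers}{mangle_simple_name(class_name)}{param_codes}"


# Foo::bar(int, long) const
print(mangle_method("Foo", "bar", "il", is_const=True))  # bar__C3Fooil
```

<P>
Calling <CODE>mangle_method("Foo", "Foo", "il", is_constructor=True)</CODE>
drops the method name and yields a string that starts directly with
<CODE>__</CODE>.
</P>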


<H3><A NAME="SEC22" HREF="gxxint_toc.html#TOC22">Primitive types</A></H3>

<P>
The C++ types <CODE>int</CODE>, <CODE>long</CODE>, <CODE>short</CODE>, <CODE>char</CODE>,
and <CODE>long long</CODE> are mangled as <SAMP>`i'</SAMP>, <SAMP>`l'</SAMP>,
<SAMP>`s'</SAMP>, <SAMP>`c'</SAMP>, and <SAMP>`x'</SAMP>, respectively.
The corresponding unsigned types have <SAMP>`U'</SAMP> prefixed
to the mangling. The type <CODE>signed char</CODE> is mangled <SAMP>`Sc'</SAMP>.

</P>
<P>
The C++ and Java floating-point types <CODE>float</CODE> and <CODE>double</CODE>
are mangled as <SAMP>`f'</SAMP> and <SAMP>`d'</SAMP> respectively.

</P>
<P>
The C++ <CODE>bool</CODE> type and the Java <CODE>boolean</CODE> type are
mangled as <SAMP>`b'</SAMP>.

</P>
<P>
The C++ <CODE>wchar_t</CODE> and the Java <CODE>char</CODE> types are
mangled as <SAMP>`w'</SAMP>.

</P>
<P>
The Java integral types <CODE>byte</CODE>, <CODE>short</CODE>, <CODE>int</CODE>
and <CODE>long</CODE> are mangled as <SAMP>`c'</SAMP>, <SAMP>`s'</SAMP>, <SAMP>`i'</SAMP>,
and <SAMP>`x'</SAMP>, respectively.

</P>
<P>
C++ code that has included <CODE>javatypes.h</CODE> will mangle
the typedefs <CODE>jbyte</CODE>, <CODE>jshort</CODE>, <CODE>jint</CODE>
and <CODE>jlong</CODE> as <SAMP>`c'</SAMP>, <SAMP>`s'</SAMP>, <SAMP>`i'</SAMP>,
and <SAMP>`x'</SAMP>, respectively. (This has not been implemented yet.)

</P>
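<P>
The mappings in this section can be collected into lookup tables. The
Python sketch below is only an illustration: the table and function names
are hypothetical, and C++ types must be spelled out in full (for example
<CODE>unsigned int</CODE>, not plain <CODE>unsigned</CODE>).
</P>

```python
# C++ integral types; these are the ones that may take the 'U' prefix.
CXX_INTEGRAL = {"int": "i", "long": "l", "short": "s", "char": "c",
                "long long": "x"}
# Remaining C++ primitive types described in this section.
CXX_OTHER = {"float": "f", "double": "d", "bool": "b", "wchar_t": "w"}
# Java primitive types; note that Java 'long' shares 'x' with C++ 'long long'.
JAVA_PRIMITIVES = {"byte": "c", "short": "s", "int": "i", "long": "x",
                   "float": "f", "double": "d", "boolean": "b", "char": "w"}


def mangle_cxx_primitive(type_name: str) -> str:
    """Return the mangling code for a fully spelled C++ primitive type."""
    if type_name == "signed char":
        return "Sc"
    if type_name.startswith("unsigned "):
        return "U" + CXX_INTEGRAL[type_name[len("unsigned "):]]
    if type_name in CXX_INTEGRAL:
        return CXX_INTEGRAL[type_name]
    return CXX_OTHER[type_name]


def mangle_java_primitive(type_name: str) -> str:
    """Return the mangling code for a Java primitive type."""
    return JAVA_PRIMITIVES[type_name]


print(mangle_cxx_primitive("unsigned long long"))  # Ux
print(mangle_java_primitive("long"))               # x
```

<P>
Keeping separate tables for the two languages makes the one asymmetry
visible: the name <CODE>long</CODE> maps to <SAMP>`l'</SAMP> in C++ but to
<SAMP>`x'</SAMP> in Java.
</P>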


<H3><A NAME="SEC23" HREF="gxxint_toc.html#TOC23">Mangling of simple names</A></H3>

<P>
A simple class, package, template, or namespace name is
encoded as the number of characters in the name, followed by
the actual characters. Thus the class <CODE>Foo</CODE>
is encoded as <SAMP>`3Foo'</SAMP>.

</P>
<P>
If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
or the initial character is a digit, then the name is
mangled as a sequence of encoded Unicode letters.
A Unicode encoding starts with a <SAMP>`U'</SAMP> to indicate
that Unicode escapes are used, followed by the number of
bytes used by the Unicode encoding, followed by the bytes
representing the encoding. ASCII letters and
non-initial digits are encoded without change. However, all
other characters (including underscore and initial digits) are
translated into a sequence starting with an underscore,
followed by the big-endian 4-hex-digit lower-case encoding of the character.

</P>
<P>
If a method name contains Unicode-escaped characters, the
entire mangled method name is followed by a <SAMP>`U'</SAMP>.

</P>
<P>
For example, the method <CODE>X\u0319::M\u002B(int)</CODE> is encoded as
<SAMP>`M_002b__U6X_0319iU'</SAMP>.

</P>
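<P>
The escaping rules above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the G++ implementation: all
function names are hypothetical, names are assumed to be non-empty, and
every character is assumed to fit in four hex digits. The layout of the
method name (escaped, but without its own <SAMP>`U'</SAMP> and length) is
inferred from the example above.
</P>

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
ASCII_DIGITS = set(string.digits)
PLAIN = ASCII_LETTERS | ASCII_DIGITS | {"_"}


def needs_unicode_mangling(name: str) -> bool:
    """True if the name starts with a digit or has a non-alphanumeric char."""
    return name[0] in ASCII_DIGITS or any(ch not in PLAIN for ch in name)


def escape_name(name: str) -> str:
    """Keep ASCII letters and non-initial digits; escape everything else."""
    out = []
    for i, ch in enumerate(name):
        if ch in ASCII_LETTERS or (ch in ASCII_DIGITS and i > 0):
            out.append(ch)
        else:
            out.append("_%04x" % ord(ch))  # underscore + 4 lower-case hex digits
    return "".join(out)


def mangle_name(name: str) -> str:
    """Mangle a class, package, template, or namespace name."""
    if needs_unicode_mangling(name):
        escaped = escape_name(name)
        return "U%d%s" % (len(escaped), escaped)
    return "%d%s" % (len(name), name)


def mangle_method(class_name: str, method_name: str, param_codes: str) -> str:
    """Mangle a method, appending 'U' if the method name needed escapes."""
    if needs_unicode_mangling(method_name):
        return (escape_name(method_name) + "__" + mangle_name(class_name)
                + param_codes + "U")
    return method_name + "__" + mangle_name(class_name) + param_codes


print(mangle_name("Foo"))                        # 3Foo
print(mangle_method("X\u0319", "M\u002B", "i"))  # M_002b__U6X_0319iU
```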


<H3><A NAME="SEC24" HREF="gxxint_toc.html#TOC24">Pointer and reference types</A></H3>

<P>
A C++ pointer type is mangled as <SAMP>`P'</SAMP> followed by the
mangling of the type pointed to.

</P>
<P>
A C++ reference type is mangled as <SAMP>`R'</SAMP> followed by the
mangling of the type referenced.

</P>
<P>
A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
as <SAMP>`P'</SAMP> followed by the mangling of the class name.

</P>
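<P>
Because pointer and reference manglings are simple prefixes, they compose
by string concatenation. The short Python sketch below illustrates this;
the function names are hypothetical, and the pointed-to type is passed in
as an already mangled code.
</P>

```python
def mangle_simple_name(name: str) -> str:
    """Length-prefixed encoding of a plain ASCII class name."""
    return f"{len(name)}{name}"


def mangle_pointer(pointee_code: str) -> str:
    """'P' followed by the mangling of the type pointed to."""
    return "P" + pointee_code


def mangle_reference(referent_code: str) -> str:
    """'R' followed by the mangling of the type referenced."""
    return "R" + referent_code


def mangle_java_object_reference(class_name: str) -> str:
    """A Java object reference is treated like a C++ pointer to the class."""
    return mangle_pointer(mangle_simple_name(class_name))


print(mangle_pointer("i"))                          # Pi     (int *)
print(mangle_pointer(mangle_pointer("c")))          # PPc    (char **)
print(mangle_reference(mangle_simple_name("Foo")))  # R3Foo  (Foo &)
print(mangle_java_object_reference("Foo"))          # P3Foo
```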


<H3><A NAME="SEC25" HREF="gxxint_toc.html#TOC25">Qualified names</A></H3>

<P>
Both C++ and Java allow a class to be lexically nested inside another
class. C++ also supports namespaces (not yet implemented by G++).
Java also supports packages.

</P>
<P>
These are all mangled the same way: First the letter <SAMP>`Q'</SAMP>
indicates that we are emitting a qualified name.
That is followed by the number of parts in the qualified name.
If that number is 9 or less, it is emitted with no delimiters.
Otherwise, an underscore is written before and after the count.
Then follows each part of the qualified name, as described above.

</P>
<P>
For example, <CODE>Foo::\u0319::Bar</CODE> is encoded as
<SAMP>`Q33FooU5_03193Bar'</SAMP>.

</P>
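<P>
A minimal Python sketch of the qualified-name rule follows. The function
names are hypothetical; each part is mangled with the simple-name and
Unicode rules described earlier, and a name with a single part is assumed
to be emitted as that part alone, without <SAMP>`Q'</SAMP>.
</P>

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def mangle_part(name: str) -> str:
    """Mangle one component as a simple or Unicode-escaped name."""
    plain = all(ch in LETTERS or ch in DIGITS or ch == "_" for ch in name)
    if plain and name[0] not in DIGITS:
        return f"{len(name)}{name}"
    escaped = "".join(
        ch if ch in LETTERS or (ch in DIGITS and i > 0) else f"_{ord(ch):04x}"
        for i, ch in enumerate(name)
    )
    return f"U{len(escaped)}{escaped}"


def mangle_qualified(parts: list) -> str:
    """'Q', the part count (delimited by '_' above 9), then each part."""
    if len(parts) == 1:
        return mangle_part(parts[0])  # assumption: no 'Q' for a single part
    count = str(len(parts)) if len(parts) <= 9 else f"_{len(parts)}_"
    return "Q" + count + "".join(mangle_part(p) for p in parts)


print(mangle_qualified(["Foo", "\u0319", "Bar"]))     # Q33FooU5_03193Bar
print(mangle_qualified(["java", "lang", "String"]))  # Q34java4lang6String
```

<P>
The underscore delimiters for counts above 9 keep the count from running
into the length prefix of the first part.
</P>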


<H3><A NAME="SEC26" HREF="gxxint_toc.html#TOC26">Templates</A></H3>

<P>
A class template instantiation is encoded as the letter <SAMP>`t'</SAMP>,
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters. If a template parameter is a type, it is written
as a <SAMP>`Z'</SAMP> followed by the encoding of the type.

</P>
<P>
A function template specialization (either an instantiation or an
explicit specialization) is encoded by an <SAMP>`H'</SAMP> followed by the
encoding of the template parameters, as described above, followed by
an <SAMP>`_'</SAMP>, the encoding of the argument types of the template function (not the
specialization), another <SAMP>`_'</SAMP>, and the return type. (Like the
argument types, the return type is the return type of the function
template, not the specialization.) Template parameters in the argument
and return types are encoded by an <SAMP>`X'</SAMP> for type parameters, or a
<SAMP>`Y'</SAMP> for constant parameters, and an index indicating their position
in the template parameter list declaration.

</P>
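<P>
The class template rule can be sketched in Python as shown below. The
function name is hypothetical, only type parameters are handled, the
parameter count is assumed to be written as a plain decimal number, and
each argument is passed in as an already mangled type code. Function
template specializations are not covered.
</P>

```python
def mangle_class_template(template_name: str, type_arg_codes: list) -> str:
    """'t', the length-prefixed template name, the count, then 'Z' + type each."""
    args = "".join("Z" + code for code in type_arg_codes)
    return f"t{len(template_name)}{template_name}{len(type_arg_codes)}{args}"


# A hypothetical Pair<int, double>
print(mangle_class_template("Pair", ["i", "d"]))  # t4Pair2ZiZd
```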


<H3><A NAME="SEC27" HREF="gxxint_toc.html#TOC27">Arrays</A></H3>

<P>
C++ array types are mangled by emitting <SAMP>`A'</SAMP>, followed by
the length of the array, followed by an <SAMP>`_'</SAMP>, followed by
the mangling of the element type. Of course, normally
array parameter types decay into pointer types, so you
don't see this.

</P>
<P>
Java arrays are objects. A Java type <CODE>T[]</CODE> is mangled
as if it were the C++ type <CODE>JArray&#60;T&#62;</CODE>.
For example, <CODE>java.lang.String[]</CODE> is encoded as
<SAMP>`Pt6JArray1ZPQ34java4lang6String'</SAMP>.

</P>
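<P>
The Java array rule combines the pointer, template, and qualified-name
encodings. The Python sketch below shows the composition; the function
names are hypothetical and element types are passed in as already mangled
codes. The C++ helper takes the description above at face value and has
not been checked against compiler output.
</P>

```python
def mangle_cxx_array(length: int, element_code: str) -> str:
    """'A', the length as described in the text, '_', then the element type."""
    return f"A{length}_{element_code}"


def mangle_java_array(element_code: str) -> str:
    """Java T[] is a reference (pointer) to the class JArray<T>."""
    jarray_instantiation = "t" + "6JArray" + "1" + "Z" + element_code
    return "P" + jarray_instantiation


# java.lang.String[]: the element type is a reference to java.lang.String.
string_ref = "P" + "Q3" + "4java" + "4lang" + "6String"
print(mangle_java_array(string_ref))  # Pt6JArray1ZPQ34java4lang6String
```

<P>
Since the result is itself a type code, nested Java arrays follow by
applying <CODE>mangle_java_array</CODE> to its own output.
</P>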


<H3><A NAME="SEC28" HREF="gxxint_toc.html#TOC28">Table of demangling code characters</A></H3>

<P>
The following special characters are used in mangling:

</P>
<DL COMPACT>

<DT><SAMP>`A'</SAMP>
<DD>
Indicates a C++ array type.

<DT><SAMP>`b'</SAMP>
<DD>
Encodes the C++ <CODE>bool</CODE> type,
and the Java <CODE>boolean</CODE> type.

<DT><SAMP>`c'</SAMP>
<DD>
Encodes the C++ <CODE>char</CODE> type, and the Java <CODE>byte</CODE> type.

<DT><SAMP>`C'</SAMP>
<DD>
A modifier to indicate a <CODE>const</CODE> type.
Also used to indicate a <CODE>const</CODE> member function
(in which case it precedes the encoding of the method's class).

<DT><SAMP>`d'</SAMP>
<DD>
Encodes the C++ and Java <CODE>double</CODE> types.

<DT><SAMP>`e'</SAMP>
<DD>
Indicates extra unknown arguments <CODE>...</CODE>.

<DT><SAMP>`f'</SAMP>
<DD>
Encodes the C++ and Java <CODE>float</CODE> types.

<DT><SAMP>`F'</SAMP>
<DD>
Used to indicate a function type.

<DT><SAMP>`H'</SAMP>
<DD>
Used to indicate a template function.

<DT><SAMP>`i'</SAMP>
<DD>
Encodes the C++ and Java <CODE>int</CODE> types.

<DT><SAMP>`J'</SAMP>
<DD>
Indicates a complex type.

<DT><SAMP>`l'</SAMP>
<DD>
Encodes the C++ <CODE>long</CODE> type.

<DT><SAMP>`P'</SAMP>
<DD>
Indicates a pointer type. Followed by the type pointed to.

<DT><SAMP>`Q'</SAMP>
<DD>
Used to mangle qualified names, which arise from nested classes.
Should also be used for namespaces (?).
In Java used to mangle package-qualified names, and inner classes.

<DT><SAMP>`r'</SAMP>
<DD>
Encodes the GNU C++ <CODE>long double</CODE> type.

<DT><SAMP>`R'</SAMP>
<DD>
Indicates a reference type. Followed by the referenced type.

<DT><SAMP>`s'</SAMP>
<DD>
Encodes the C++ and Java <CODE>short</CODE> types.

<DT><SAMP>`S'</SAMP>
<DD>
A modifier that indicates that the following integer type is signed.
Only used with <CODE>char</CODE>.

Also used as a modifier to indicate a static member function.

<DT><SAMP>`t'</SAMP>
<DD>
Indicates a template instantiation.

<DT><SAMP>`T'</SAMP>
<DD>
A back reference to a previously seen type.

<DT><SAMP>`U'</SAMP>
<DD>
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.

<DT><SAMP>`v'</SAMP>
<DD>
Encodes the C++ and Java <CODE>void</CODE> types.

<DT><SAMP>`V'</SAMP>
<DD>
A modifier for a <CODE>volatile</CODE> type or method.

<DT><SAMP>`w'</SAMP>
<DD>
Encodes the C++ <CODE>wchar_t</CODE> type, and the Java <CODE>char</CODE> types.

<DT><SAMP>`x'</SAMP>
<DD>
Encodes the GNU C++ <CODE>long long</CODE> type, and the Java <CODE>long</CODE> type.

<DT><SAMP>`X'</SAMP>
<DD>
Encodes a template type parameter, when part of a function type.

<DT><SAMP>`Y'</SAMP>
<DD>
Encodes a template constant parameter, when part of a function type.

<DT><SAMP>`Z'</SAMP>
<DD>
Used for template type parameters.

</DL>

<P>
The letters <SAMP>`G'</SAMP>, <SAMP>`M'</SAMP>, <SAMP>`O'</SAMP>, and <SAMP>`p'</SAMP>
also seem to be used for obscure purposes ...

</P>
</BODY>
</HTML>