1 \input texinfo
2 @setfilename bfdint.info
3
4 @settitle BFD Internals
5 @iftex
6 @title{BFD Internals}
7 @author{Ian Lance Taylor}
8 @author{Cygnus Solutions}
9 @end iftex
10
11 @node Top
12 @top BFD Internals
13 @raisesections
14 @cindex bfd internals
15
16 This document describes some BFD internal information which may be
17 helpful when working on BFD. It is very incomplete.
18
19 This document is not updated regularly, and may be out of date. It was
20 last modified on $Date$.
21
22 The initial version of this document was written by Ian Lance Taylor
23 @email{ian@@cygnus.com}.
24
25 @menu
26 * BFD glossary:: BFD glossary
27 * BFD guidelines:: BFD programming guidelines
28 * BFD generated files:: BFD generated files
29 * BFD multiple compilations:: Files compiled multiple times in BFD
30 * BFD relocation handling:: BFD relocation handling
31 * Index:: Index
32 @end menu
33
34 @node BFD glossary
35 @section BFD glossary
36 @cindex glossary for bfd
37 @cindex bfd glossary
38
39 This is a short glossary of some BFD terms.
40
41 @table @asis
42 @item a.out
43 The a.out object file format. The original Unix object file format.
44 Still used on SunOS, though not Solaris. Supports only three sections.
45
46 @item archive
47 A collection of object files produced and manipulated by the @samp{ar}
48 program.
49
50 @item BFD
51 The BFD library itself. Also, each object file, archive, or executable
52 opened by the BFD library has the type @samp{bfd *}, and is sometimes
53 referred to as a bfd.
54
55 @item COFF
56 The Common Object File Format. Used on Unix SVR3. Used by some
57 embedded targets, although ELF is normally better.
58
59 @item DLL
60 A shared library on Windows.
61
62 @item dynamic linker
63 When a program linked against a shared library is run, the dynamic
64 linker will locate the appropriate shared library and arrange to somehow
65 include it in the running image.
66
67 @item dynamic object
68 Another name for an ELF shared library.
69
70 @item ECOFF
71 The Extended Common Object File Format. Used on Alpha Digital Unix
72 (formerly OSF/1), as well as Ultrix and Irix 4. A variant of COFF.
73
74 @item ELF
75 The Executable and Linking Format. The object file format used on most
76 modern Unix systems, including GNU/Linux, Solaris, Irix, and SVR4. Also
77 used on many embedded systems.
78
79 @item executable
80 A program, with instructions and symbols, and perhaps dynamic linking
81 information. Normally produced by a linker.
82
83 @item NLM
84 NetWare Loadable Module. Used to describe the format of an object which
85 may be loaded into NetWare, which is some kind of PC based network server
86 program.
87
88 @item object file
89 A binary file including machine instructions, symbols, and relocation
90 information. Normally produced by an assembler.
91
92 @item object file format
93 The format of an object file. Typically object files and executables
94 for a particular system are in the same format, although executables
95 will not contain any relocation information.
96
97 @item PE
98 The Portable Executable format. This is the object file format used for
99 Windows (specifically, Win32) object files. It is based closely on
100 COFF, but has a few significant differences.
101
102 @item PEI
103 The Portable Executable Image format. This is the object file format
104 used for Windows (specifically, Win32) executables. It is very similar
105 to PE, but includes some additional header information.
106
107 @item relocations
108 Information used by the linker to adjust section contents. Also called
109 relocs.
110
111 @item section
112 Object files and executables are composed of sections. Sections have
113 optional data and optional relocation information.
114
115 @item shared library
116 A library of functions which may be used by many executables without
117 actually being linked into each executable. There are several different
118 implementations of shared libraries, each having slightly different
119 features.
120
121 @item symbol
122 Each object file and executable may have a list of symbols, often
123 referred to as the symbol table. A symbol is basically a name and an
124 address. There may also be some additional information like the type of
125 symbol, although the type of a symbol is normally something simple like
126 function or object, and should not be confused with the more complex C
127 notion of type. Typically every global function and variable in a C
128 program will have an associated symbol.
129
130 @item Win32
131 The current Windows API, implemented by Windows 95 and later and Windows
132 NT 3.51 and later, but not by Windows 3.1.
133
134 @item XCOFF
135 The eXtended Common Object File Format. Used on AIX. A variant of
136 COFF, with a completely different symbol table implementation.
137 @end table
138
139 @node BFD guidelines
140 @section BFD programming guidelines
141 @cindex bfd programming guidelines
142 @cindex programming guidelines for bfd
143 @cindex guidelines, bfd programming
144
145 There is a lot of poorly written and confusing code in BFD. New BFD
146 code should be written to a higher standard. Merely because some BFD
147 code is written in a particular manner does not mean that you should
148 emulate it.
149
150 Here are some general BFD programming guidelines:
151
152 @itemize @bullet
153 @item
154 Follow the GNU coding standards.
155
156 @item
157 Avoid global variables. We ideally want BFD to be fully reentrant, so
158 that it can be used in multiple threads. All uses of global or static
159 variables interfere with that. Initialized constant variables are OK,
160 and they should be explicitly marked with @code{const}. Instead of global
161 variables, use data attached to a BFD or to a linker hash table.
162
163 @item
164 All externally visible functions should have names which start with
165 @samp{bfd_}. All such functions should be declared in some header file,
166 typically @file{bfd.h}. See, for example, the various declarations near
167 the end of @file{bfd-in.h}, which mostly declare functions required by
168 specific linker emulations.
169
170 @item
171 All functions which need to be visible from one file to another within
172 BFD, but should not be visible outside of BFD, should start with
173 @samp{_bfd_}. Although external names beginning with @samp{_} are
174 prohibited by the ANSI standard, in practice this usage will always
175 work, and it is required by the GNU coding standards.
176
177 @item
178 Always remember that people can compile using @samp{--enable-targets} to build
179 several, or all, targets at once. It must be possible to link together
180 the files for all targets.
181
182 @item
183 BFD code should compile with few or no warnings using @samp{gcc -Wall}.
184 Some warnings are OK, like the absence of certain function declarations
185 which may or may not be declared in system header files. Warnings about
186 ambiguous expressions and the like should always be fixed.
187 @end itemize
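
As a rough illustration of the guideline about global variables, the following sketch keeps mutable state in a structure that the caller passes in and restricts file-scope data to @code{const} tables. Every name here (@samp{demo_file}, @samp{demo_note_reloc}, @samp{demo_format_name}) is hypothetical and is not part of BFD. Real code would attach such data to a BFD or to a linker hash table.

```c
#include <stddef.h>

/* Hypothetical per-file state.  In BFD this kind of data would hang
   off a bfd or a linker hash table, not a global variable.  */
struct demo_file
{
  const char *name;
  unsigned long reloc_count;	/* Would otherwise tempt a static counter.  */
};

/* Read-only tables are acceptable when explicitly marked const.  */
static const char *const demo_format_names[] = { "a.out", "coff", "elf" };

/* All mutable state is reached through F, so two threads working on
   two different demo_file objects do not interfere.  */
unsigned long
demo_note_reloc (struct demo_file *f)
{
  return ++f->reloc_count;
}

/* Look up a format name, returning NULL for an index out of range.  */
const char *
demo_format_name (size_t idx)
{
  size_t n = sizeof demo_format_names / sizeof demo_format_names[0];

  return idx < n ? demo_format_names[idx] : NULL;
}
```

Because nothing is stored at file scope except constant data, the functions are reentrant in the sense the guideline asks for.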
188
189 @node BFD generated files
190 @section BFD generated files
191 @cindex generated files in bfd
192 @cindex bfd generated files
193
194 BFD contains several automatically generated files. This section
195 describes them. Some files are created at configure time, when you
196 configure BFD. Some files are created at make time, when you build
197 BFD. Some files are automatically rebuilt at make time, but only if
198 you configure with the @samp{--enable-maintainer-mode} option. Some
199 files live in the object directory---the directory from which you run
200 configure---and some live in the source directory. All files that live
201 in the source directory are checked into the CVS repository.
202
203 @table @file
204 @item bfd.h
205 @cindex @file{bfd.h}
206 @cindex @file{bfd-in3.h}
207 Lives in the object directory. Created at make time from
208 @file{bfd-in2.h} via @file{bfd-in3.h}. @file{bfd-in3.h} is created at
209 configure time from @file{bfd-in2.h}. There are automatic dependencies
210 to rebuild @file{bfd-in3.h} and hence @file{bfd.h} if @file{bfd-in2.h}
211 changes, so you can normally ignore @file{bfd-in3.h}, and just think
212 about @file{bfd-in2.h} and @file{bfd.h}.
213
214 @file{bfd.h} is built by replacing a few strings in @file{bfd-in2.h}.
215 To see them, search for @samp{@@} in @file{bfd-in2.h}. They mainly
216 control whether BFD is built for a 32 bit target or a 64 bit target.
217
218 @item bfd-in2.h
219 @cindex @file{bfd-in2.h}
220 Lives in the source directory. Created from @file{bfd-in.h} and several
221 other BFD source files. If you configure with the
222 @samp{--enable-maintainer-mode} option, @file{bfd-in2.h} is rebuilt
223 automatically when a source file changes.
224
225 @item elf32-target.h
226 @itemx elf64-target.h
227 @cindex @file{elf32-target.h}
228 @cindex @file{elf64-target.h}
229 Live in the object directory. Created from @file{elfxx-target.h}.
230 These files are versions of @file{elfxx-target.h} customized for either
231 a 32 bit ELF target or a 64 bit ELF target.
232
233 @item libbfd.h
234 @cindex @file{libbfd.h}
235 Lives in the source directory. Created from @file{libbfd-in.h} and
236 several other BFD source files. If you configure with the
237 @samp{--enable-maintainer-mode} option, @file{libbfd.h} is rebuilt
238 automatically when a source file changes.
239
240 @item libcoff.h
241 @cindex @file{libcoff.h}
242 Lives in the source directory. Created from @file{libcoff-in.h} and
243 @file{coffcode.h}. If you configure with the
244 @samp{--enable-maintainer-mode} option, @file{libcoff.h} is rebuilt
245 automatically when a source file changes.
246
247 @item targmatch.h
248 @cindex @file{targmatch.h}
249 Lives in the object directory. Created at make time from
250 @file{config.bfd}. This file is used to map configuration triplets into
251 BFD target vector variable names at run time.
252 @end table
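
The run time mapping that @file{targmatch.h} supports can be pictured with the sketch below. It assumes a table of glob patterns paired with target vector names and uses the POSIX @samp{fnmatch} function to match a configuration triplet. The table entries, the structure @samp{demo_targmatch}, and the function @samp{demo_find_vector} are invented for illustration. The real generated file and the code that consumes it may differ in detail.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical pairing of a triplet pattern with a vector name.  */
struct demo_targmatch
{
  const char *triplet;	/* Glob pattern for a configuration triplet.  */
  const char *vector;	/* Name of the target vector variable.  */
};

/* Illustrative entries only; a generated table would come from
   config.bfd.  */
static const struct demo_targmatch demo_targmatch_table[] =
{
  { "i[3-7]86-*-linux*", "demo_elf32_i386_vec" },
  { "sparc-*-sunos*", "demo_sunos_big_vec" },
  { NULL, NULL }
};

/* Return the vector name for TRIPLET, or NULL if nothing matches.  */
const char *
demo_find_vector (const char *triplet)
{
  const struct demo_targmatch *m;

  for (m = demo_targmatch_table; m->triplet != NULL; m++)
    if (fnmatch (m->triplet, triplet, 0) == 0)
      return m->vector;
  return NULL;
}
```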
253
254 @node BFD multiple compilations
255 @section Files compiled multiple times in BFD
256 Several files in BFD are compiled multiple times. By this I mean that
257 there are header files which contain function definitions. These header
258 files are included by other files, and thus the functions are compiled
259 once per file which includes them.
260
261 Preprocessor macros are used to control the compilation, so that each
262 time the files are compiled the resulting functions are slightly
263 different. Naturally, if they weren't different, there would be no
264 reason to compile them multiple times.
265
266 This is not a particularly good programming technique, and future BFD
267 work should avoid it, for the following reasons.
268
269 @itemize @bullet
270 @item
271 Since this technique is rarely used, even experienced C programmers find
272 it confusing.
273
274 @item
275 It is difficult to debug programs which use BFD, since there is no way
276 to describe which version of a particular function you are looking at.
277
278 @item
279 Programs which use BFD wind up incorporating two or more slightly
280 different versions of the same function, which wastes space in the
281 executable.
282
283 @item
284 This technique is never required nor is it especially efficient. It is
285 always possible to use statically initialized structures holding
286 function pointers and magic constants instead.
287 @end itemize
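
The alternative mentioned in the last item, statically initialized structures holding function pointers and magic constants, might look something like the following sketch. The names @samp{demo_size_ops}, @samp{demo_ops_32}, @samp{demo_ops_64} and @samp{demo_read_word} are hypothetical, and little endian byte order is assumed purely to keep the example short.

```c
#include <stdint.h>

/* One structure per variant replaces one compilation per variant.  */
struct demo_size_ops
{
  unsigned int arch_size;	/* Magic constant: 32 or 64.  */
  unsigned int word_bytes;	/* Magic constant: 4 or 8.  */
  uint64_t (*get_word) (const unsigned char *p);
};

/* Shared helper: read N little endian bytes starting at P.  */
static uint64_t
demo_get_le (const unsigned char *p, unsigned int n)
{
  uint64_t v = 0;

  while (n-- > 0)
    v = (v << 8) | p[n];
  return v;
}

static uint64_t
demo_get_32 (const unsigned char *p)
{
  return demo_get_le (p, 4);
}

static uint64_t
demo_get_64 (const unsigned char *p)
{
  return demo_get_le (p, 8);
}

const struct demo_size_ops demo_ops_32 = { 32, 4, demo_get_32 };
const struct demo_size_ops demo_ops_64 = { 64, 8, demo_get_64 };

/* Generic code is compiled once and takes the variant as data.  */
uint64_t
demo_read_word (const struct demo_size_ops *ops, const unsigned char *p)
{
  return ops->get_word (p);
}
```

Each function now exists exactly once under a single name, so a debugger can identify it unambiguously.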
288
289 The following is a list of the files which are compiled multiple times.
290
291 @table @file
292 @item aout-target.h
293 @cindex @file{aout-target.h}
294 Describes a few functions and the target vector for a.out targets. This
295 is used by individual a.out targets with different definitions of
296 @samp{N_TXTADDR} and similar a.out macros.
297
298 @item aoutf1.h
299 @cindex @file{aoutf1.h}
300 Implements standard SunOS a.out files. In principle it supports 64 bit
301 a.out targets based on the preprocessor macro @samp{ARCH_SIZE}, but
302 since all known a.out targets are 32 bits, this code may or may not
303 work. This file is only included by a few other files, and it is
304 difficult to justify its existence.
305
306 @item aoutx.h
307 @cindex @file{aoutx.h}
308 Implements basic a.out support routines. This file can be compiled for
309 either 32 or 64 bit support. Since all known a.out targets are 32 bits,
310 the 64 bit support may or may not work. I believe the original
311 intention was that this file would only be included by @samp{aout32.c}
312 and @samp{aout64.c}, and that other a.out targets would simply refer to
313 the functions it defined. Unfortunately, some other a.out targets
314 started including it directly, leading to a somewhat confused state of
315 affairs.
316
317 @item coffcode.h
318 @cindex @file{coffcode.h}
319 Implements basic COFF support routines. This file is included by every
320 COFF target. It implements code which handles COFF magic numbers as
321 well as various hook functions called by the generic COFF functions in
322 @file{coffgen.c}. This file is controlled by a number of different
323 macros, and more are added regularly.
324
325 @item coffswap.h
326 @cindex @file{coffswap.h}
327 Implements COFF swapping routines. This file is included by
328 @file{coffcode.h}, and thus by every COFF target. It implements the
329 routines which swap COFF structures between internal and external
330 format. The main control for this file is the external structure
331 definitions in the files in the @file{include/coff} directory. A COFF
332 target file will include one of those files before including
333 @file{coffcode.h} and thus @file{coffswap.h}. There are a few other
334 macros which affect @file{coffswap.h} as well, mostly describing whether
335 certain fields are present in the external structures.
336
337 @item ecoffswap.h
338 @cindex @file{ecoffswap.h}
339 Implements ECOFF swapping routines. This is like @file{coffswap.h}, but
340 for ECOFF. It is included by the ECOFF target files (of which there are
341 only two). The control is the preprocessor macro @samp{ECOFF_32} or
342 @samp{ECOFF_64}.
343
344 @item elfcode.h
345 @cindex @file{elfcode.h}
346 Implements ELF functions that use external structure definitions. This
347 file is included by two other files: @file{elf32.c} and @file{elf64.c}.
348 It is controlled by the @samp{ARCH_SIZE} macro which is defined to be
349 @samp{32} or @samp{64} before including it. The @samp{NAME} macro is
350 used internally to give the functions different names for the two target
351 sizes.
352
353 @item elfcore.h
354 @cindex @file{elfcore.h}
355 Like @file{elfcode.h}, but for functions that are specific to ELF core
356 files. This is included only by @file{elfcode.h}.
357
358 @item elflink.h
359 @cindex @file{elflink.h}
360 Like @file{elfcode.h}, but for functions used by the ELF linker. This
361 is included only by @file{elfcode.h}.
362
363 @item elfxx-target.h
364 @cindex @file{elfxx-target.h}
365 This file is the source for the generated files @file{elf32-target.h}
366 and @file{elf64-target.h}, one of which is included by every ELF target.
367 It defines the ELF target vector.
368
369 @item freebsd.h
370 @cindex @file{freebsd.h}
371 Presumably intended to be included by all FreeBSD targets, but in fact
372 there is only one such target, @samp{i386-freebsd}. This defines a
373 function used to set the right magic number for FreeBSD, as well as
374 various macros, and includes @file{aout-target.h}.
375
376 @item netbsd.h
377 @cindex @file{netbsd.h}
378 Like @file{freebsd.h}, except that there are several files which include
379 it.
380
381 @item nlm-target.h
382 @cindex @file{nlm-target.h}
383 Defines the target vector for a standard NLM target.
384
385 @item nlmcode.h
386 @cindex @file{nlmcode.h}
387 Like @file{elfcode.h}, but for NLM targets. This is only included by
388 @file{nlm32.c} and @file{nlm64.c}, both of which define the macro
389 @samp{ARCH_SIZE} to an appropriate value. There are no 64 bit NLM
390 targets anyhow, so this is sort of useless.
391
392 @item nlmswap.h
393 @cindex @file{nlmswap.h}
394 Like @file{coffswap.h}, but for NLM targets. This is included by each
395 NLM target, but I think it winds up compiling to the exact same code for
396 every target, and as such is fairly useless.
397
398 @item peicode.h
399 @cindex @file{peicode.h}
400 Provides swapping routines and other hooks for PE targets.
401 @file{coffcode.h} will include this rather than @file{coffswap.h} for a
402 PE target. This defines PE specific versions of the COFF swapping
403 routines, and also defines some macros which control @file{coffcode.h}
404 itself.
405 @end table
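
To make the role of @samp{ARCH_SIZE} and @samp{NAME} more concrete, here is a small sketch of the token pasting idea. It is not taken from @file{elfcode.h}. The macro @samp{DEMO_BODY} stands in for the contents of a header that would be included twice, and the generated function names are hypothetical.

```c
/* Stand-in for the body of a header that is compiled once per size.
   It refers to NAME and ARCH_SIZE, which the includer must define.  */
#define DEMO_BODY \
  unsigned int NAME (demo, word_bytes) (void) { return ARCH_SIZE / 8; }

/* First "inclusion": produces demo32_word_bytes.  */
#define ARCH_SIZE 32
#define NAME(x, y) x##32##_##y
DEMO_BODY
#undef NAME
#undef ARCH_SIZE

/* Second "inclusion": produces demo64_word_bytes.  */
#define ARCH_SIZE 64
#define NAME(x, y) x##64##_##y
DEMO_BODY
#undef NAME
#undef ARCH_SIZE
```

The same source text yields two functions whose names differ only in the pasted size, which is the situation that makes such code hard to follow in a debugger.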
406
407 @node BFD relocation handling
408 @section BFD relocation handling
409 @cindex bfd relocation handling
410 @cindex relocations in bfd
411
412 The handling of relocations is one of the more confusing aspects of BFD.
413 Relocation handling has been implemented in various different ways, all
414 somewhat incompatible, none perfect.
415
416 @menu
417 * BFD relocation concepts:: BFD relocation concepts
418 * BFD relocation functions:: BFD relocation functions
419 * BFD relocation future:: BFD relocation future
420 @end menu
421
422 @node BFD relocation concepts
423 @subsection BFD relocation concepts
424
425 A relocation is an action which the linker must take when linking. It
426 describes a change to the contents of a section. The change is normally
427 based on the final value of one or more symbols. Relocations are
428 created by the assembler when it creates an object file.
429
430 Most relocations are simple. A typical simple relocation is to set 32
431 bits at a given offset in a section to the value of a symbol. This type
432 of relocation would be generated for code like @code{int *p = &i;} where
433 @samp{p} and @samp{i} are global variables. A relocation for the symbol
434 @samp{i} would be generated such that the linker would initialize the
435 area of memory which holds the value of @samp{p} to the value of the
436 symbol @samp{i}.
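
A minimal sketch of such a simple relocation follows, assuming a little endian target and a section held in memory as a byte array. The function name @samp{demo_apply_abs32} is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Set the 32 bits at OFFSET in CONTENTS to VALUE, the final value of
   the symbol, using little endian byte order.  */
void
demo_apply_abs32 (unsigned char *contents, size_t offset, uint32_t value)
{
  contents[offset + 0] = value & 0xff;
  contents[offset + 1] = (value >> 8) & 0xff;
  contents[offset + 2] = (value >> 16) & 0xff;
  contents[offset + 3] = (value >> 24) & 0xff;
}
```

For the @code{int *p = &i;} case, @samp{offset} would be the position of @samp{p} within its section and @samp{value} the final address of @samp{i}.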
437
438 Slightly more complex relocations may include an addend, which is a
439 constant to add to the symbol value before using it. In some cases a
440 relocation will require adding the symbol value to the existing contents
441 of the section in the object file. In others the relocation will simply
442 replace the contents of the section with the symbol value. Some
443 relocations are PC relative, so that the value to be stored in the
444 section is the difference between the value of a symbol and the final
445 address of the section contents.
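
The variations described above can be summarized in one small function. This is only a sketch. The structure @samp{demo_reloc} and its fields are hypothetical, 32 bit arithmetic is assumed, and the PC relative case is taken to mean subtracting the final address of the field being relocated.

```c
#include <stdint.h>

/* Hypothetical description of one relocation.  */
struct demo_reloc
{
  uint32_t offset;	/* Where in the section the result is stored.  */
  uint32_t addend;	/* Constant added to the symbol value.  */
  int add_existing;	/* Nonzero: add to the existing contents.  */
  int pc_relative;	/* Nonzero: subtract the address of the field.  */
};

/* Compute the value to store.  EXISTING is what the object file
   already holds at the field, and SECTION_ADDRESS is the final
   address of the start of the section.  */
uint32_t
demo_reloc_value (const struct demo_reloc *r, uint32_t symbol_value,
		  uint32_t existing, uint32_t section_address)
{
  uint32_t v = symbol_value + r->addend;

  if (r->add_existing)
    v += existing;
  if (r->pc_relative)
    v -= section_address + r->offset;
  return v;
}
```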
446
447 In general, relocations can be arbitrarily complex. For
448 example, relocations used in dynamic linking systems often require the
449 linker to allocate space in a different section and use the offset
450 within that section as the value to store. In the IEEE object file
451 format, relocations may involve arbitrary expressions.
452
453 When doing a relocateable link, the linker may or may not have to do
454 anything with a relocation, depending upon the definition of the
455 relocation. Simple relocations generally do not require any special
456 action.
457
458 @node BFD relocation functions
459 @subsection BFD relocation functions
460
461 In BFD, each section has an array of @samp{arelent} structures. Each
462 structure has a pointer to a symbol, an address within the section, an
463 addend, and a pointer to a @samp{reloc_howto_struct} structure. The
464 howto structure has a bunch of fields describing the reloc, including a
465 type field. The type field is specific to the object file format
466 backend; none of the generic code in BFD examines it.
467
468 Originally, the function @samp{bfd_perform_relocation} was supposed to
469 handle all relocations. In theory, many relocations would be simple
470 enough to be described by the fields in the howto structure. For those
471 that weren't, the howto structure included a @samp{special_function}
472 field to use as an escape.
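
The intended design can be sketched as follows. These are simplified, hypothetical stand-ins (@samp{demo_arelent}, @samp{demo_howto}, @samp{demo_perform_relocation}), not the real BFD declarations. They keep only the fields mentioned above and assume little endian fields of at most four bytes.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_arelent;

/* Escape hook: handles a relocation the plain fields cannot describe.  */
typedef int (*demo_special_fn) (const struct demo_arelent *rel,
				unsigned char *contents);

struct demo_howto
{
  unsigned int type;		/* Backend specific; generic code ignores it.  */
  unsigned int size_bytes;	/* Width of the field, at most 4 here.  */
  demo_special_fn special_function;	/* NULL for simple relocations.  */
};

struct demo_symbol
{
  const char *name;
  uint32_t value;
};

struct demo_arelent
{
  const struct demo_symbol *sym;	/* Symbol the relocation refers to.  */
  size_t address;			/* Offset within the section.  */
  uint32_t addend;
  const struct demo_howto *howto;
};

/* Example special function: store only the high 16 bits of the value.  */
int
demo_store_high16 (const struct demo_arelent *rel, unsigned char *contents)
{
  uint32_t hi = (rel->sym->value + rel->addend) >> 16;

  contents[rel->address] = hi & 0xff;
  contents[rel->address + 1] = (hi >> 8) & 0xff;
  return 0;
}

/* Generic routine: defer to the escape if present, otherwise store
   symbol value plus addend in a field of size_bytes bytes.  */
int
demo_perform_relocation (const struct demo_arelent *rel,
			 unsigned char *contents)
{
  uint32_t v;
  unsigned int i;

  if (rel->howto->special_function != NULL)
    return rel->howto->special_function (rel, contents);

  v = rel->sym->value + rel->addend;
  for (i = 0; i < rel->howto->size_bytes; i++)
    contents[rel->address + i] = (v >> (8 * i)) & 0xff;
  return 0;
}
```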
473
474 While this seems plausible, a look at @samp{bfd_perform_relocation}
475 shows that it failed. The function has odd special cases. Some of the
476 fields in the howto structure, such as @samp{pcrel_offset}, were not
477 adequately documented.
478
479 The linker uses @samp{bfd_perform_relocation} to do all relocations when
480 the input and output file have different formats (e.g., when generating
481 S-records). The generic linker code, which is used by all targets which
482 do not define their own special purpose linker, uses
483 @samp{bfd_get_relocated_section_contents}, which for most targets turns
484 into a call to @samp{bfd_generic_get_relocated_section_contents}, which
485 calls @samp{bfd_perform_relocation}. So @samp{bfd_perform_relocation}
486 is still widely used, which makes it difficult to change, since it is
487 difficult to test all possible cases.
488
489 The assembler used @samp{bfd_perform_relocation} for a while. This
490 turned out to be the wrong thing to do, since
491 @samp{bfd_perform_relocation} was written to handle relocations on an
492 existing object file, while the assembler needed to create relocations
493 in a new object file. A new function, @samp{bfd_install_relocation},
494 was created as a copy of @samp{bfd_perform_relocation}, and the
495 assembler was changed to use it instead.
496
497 Unfortunately, the work did not progress any farther, so
498 @samp{bfd_install_relocation} remains a simple copy of
499 @samp{bfd_perform_relocation}, with all the odd special cases and
500 confusing code. This again is difficult to change, because again any
501 change can affect any assembler target, and so is difficult to test.
502
503 The new linker, when using the same object file format for all input
504 files and the output file, does not convert relocations into
505 @samp{arelent} structures, so it can not use
506 @samp{bfd_perform_relocation} at all. Instead, users of the new linker
507 are expected to write a @samp{relocate_section} function which will
508 handle relocations in a target specific fashion.
509
510 There are two helper functions for target specific relocation:
511 @samp{_bfd_final_link_relocate} and @samp{_bfd_relocate_contents}.
512 These functions use a howto structure, but they @emph{do not} use the
513 @samp{special_function} field. Since the functions are normally called
514 from target specific code, the @samp{special_function} field adds
515 little; any relocations which require special handling can be handled
516 without calling those functions.
517
518 So, if you want to add a new target, or add a new relocation to an
519 existing target, you need to do the following:
520 @itemize @bullet
521 @item
522 Make sure you clearly understand what the contents of the section should
523 look like after assembly, after a relocateable link, and after a final
524 link. Make sure you clearly understand the operations the linker must
525 perform during a relocateable link and during a final link.
526
527 @item
528 Write a howto structure for the relocation. The howto structure is
529 flexible enough to represent any relocation which should be handled by
530 setting a contiguous bitfield in the destination to the value of a
531 symbol, possibly with an addend, possibly adding the symbol value to the
532 value already present in the destination.
533
534 @item
535 Change the assembler to generate your relocation. The assembler will
536 call @samp{bfd_install_relocation}, so your howto structure has to be
537 able to handle that. You may need to set the @samp{special_function}
538 field to handle assembly correctly. Be careful to ensure that any code
539 you write to handle the assembler will also work correctly when doing a
540 relocateable link. For example, see @samp{bfd_elf_generic_reloc}.
541
542 @item
543 Test the assembler. Consider the cases of relocation against an
544 undefined symbol, a common symbol, a symbol defined in the object file
545 in the same section, and a symbol defined in the object file in a
546 different section. These cases may not all be applicable for your
547 reloc.
548
549 @item
550 If your target uses the new linker, which is recommended, add any
551 required handling to the target specific relocation function. In simple
552 cases this will just involve a call to @samp{_bfd_final_link_relocate}
553 or @samp{_bfd_relocate_contents}, depending upon the definition of the
554 relocation and whether the link is relocateable or not.
555
556 @item
557 Test the linker. Test the case of a final link. If the relocation can
558 overflow, use a linker script to force an overflow and make sure the
559 error is reported correctly. Test a relocateable link, whether the
560 symbol is defined or undefined in the relocateable output. For both the
561 final and relocateable link, test the case when the symbol is a common
562 symbol, when the symbol looked like a common symbol but became a defined
563 symbol, when the symbol is defined in a different object file, and when
564 the symbol is defined in the same object file.
565
566 @item
567 In order for linking to another object file format, such as S-records,
568 to work correctly, @samp{bfd_perform_relocation} has to do the right
569 thing for the relocation. You may need to set the
570 @samp{special_function} field to handle this correctly. Test this by
571 doing a link in which the output object file format is S-records.
572
573 @item
574 Using the linker to generate relocateable output in a different object
575 file format is impossible in the general case, so you generally don't
576 have to worry about that. Linking input files of different object file
577 formats together is quite unusual, but if you're really dedicated you
578 may want to consider testing this case, both when the output object file
579 format is the same as your format, and when it is different.
580 @end itemize
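
The kind of relocation a howto structure can describe, a contiguous bitfield set from a symbol value with an optional in-place addend, can be sketched as below, together with the overflow check that the linker test should exercise. The structure @samp{demo_field_howto} and the function @samp{demo_insert_field} are hypothetical. Only unsigned overflow is checked, and the field is assumed to fit in a 32 bit word.

```c
#include <stdint.h>

/* Hypothetical, pared-down description of a bitfield relocation.  */
struct demo_field_howto
{
  unsigned int rightshift;	/* Low bits dropped from the value.  */
  unsigned int bitsize;		/* Width of the field in bits.  */
  unsigned int bitpos;		/* Position of the field in the word.  */
  int add_existing;		/* Nonzero: add to the field's old value.  */
};

/* Insert VALUE into *WORD as described by H.  Return 0 on success,
   or -1 without modifying *WORD if the result does not fit.  */
int
demo_insert_field (const struct demo_field_howto *h, uint32_t *word,
		   uint32_t value)
{
  uint32_t field_mask
    = h->bitsize >= 32 ? 0xffffffffu : (1u << h->bitsize) - 1;
  uint32_t dst_mask = field_mask << h->bitpos;
  uint32_t v = value >> h->rightshift;

  if (h->add_existing)
    v += (*word & dst_mask) >> h->bitpos;
  if (v > field_mask)
    return -1;			/* Overflow: report it, do not truncate.  */
  *word = (*word & ~dst_mask) | (v << h->bitpos);
  return 0;
}
```

A test that forces an overflow should see the error return and an unmodified word, which mirrors the requirement that the linker report the error correctly and not store a truncated value.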
581
582 @node BFD relocation future
583 @subsection BFD relocation future
584
585 Clearly the current BFD relocation support is in bad shape. A
586 wholesale rewrite would be very difficult, because it would require
587 thorough testing of every BFD target. So some sort of incremental
588 change is required.
589
590 My vague thoughts on this would involve defining a new, clearly defined,
591 howto structure. Some mechanism would be used to determine which type
592 of howto structure was being used by a particular format.
593
594 The new howto structure would clearly define the relocation behaviour in
595 the case of an assembly, a relocateable link, and a final link. At
596 least one special function would be defined as an escape, and it might
597 make sense to define more.
598
599 One or more generic functions similar to @samp{bfd_perform_relocation}
600 would be written to handle the new howto structure.
601
602 This should make it possible to write a generic version of the relocate
603 section functions used by the new linker. The target specific code
604 would provide some mechanism (a function pointer or an initial
605 conversion) to convert target specific relocations into howto
606 structures.
607
608 Ideally it would be possible to use this generic relocate section
609 function for the generic linker as well. That is, it would replace the
610 @samp{bfd_generic_get_relocated_section_contents} function which is
611 currently normally used.
612
613 For the special case of ELF dynamic linking, more consideration needs to
614 be given to writing ELF specific but ELF target generic code to handle
615 special relocation types such as GOT and PLT.
616
617 @node Index
618 @unnumberedsec Index
619 @printindex cp
620
621 @contents
622 @bye