Commit | Line | Data |
---|---|---|
ae6cd60f KR |
1 | \input texinfo |
2 | @setfilename internals.info | |
582ffe70 KR |
3 | @node Assembler Internals |
4 | @chapter Assembler Internals | |
5 | @cindex internals | |
6 | ||
ae6cd60f KR |
7 | This documentation is not ready for prime time yet. Not even close. It's not |
8 | so much documentation as random blathering of mine intended to be notes to | |
9 | myself that may eventually be turned into real documentation. | |
10 | ||
11 | I take no responsibility for any negative effect it may have on your | |
12 | professional, personal, or spiritual life. Read it at your own risk. Caveat | |
13 | emptor. Delete before reading. Abandon all hope, ye who enter here. | |
14 | ||
15 | However, enhancements will be gratefully accepted. | |
16 | ||
582ffe70 KR |
17 | @menu |
18 | * Data types:: Data types | |
19 | @end menu | |
20 | ||
21 | @node foo | |
22 | @section foo | |
23 | ||
24 | BFD_ASSEMBLER | |
25 | BFD, MANY_SECTIONS, BFD_HEADERS | |
26 | ||
27 | ||
28 | @node Data types | |
29 | @section Data types | |
30 | @cindex internals, data types | |
31 | ||
ae6cd60f | 32 | @subsection Symbols |
582ffe70 KR |
33 | @cindex internals, symbols |
34 | @cindex symbols, internal | |
35 | ||
36 | ... `local' symbols ... flags ... | |
37 | ||
ae6cd60f KR |
38 | The definition for @code{struct symbol}, also known as @code{symbolS}, is |
39 | located in @file{struc-symbol.h}. Symbol structures can contain the following | |
40 | fields: | |
582ffe70 KR |
41 | |
42 | @table @code | |
43 | @item sy_value | |
ae6cd60f KR |
44 | This is an @code{expressionS} that describes the value of the symbol. It might |
45 | refer to another symbol; if so, its true value may not be known until | |
46 | @code{foo} is called. | |
582ffe70 | 47 | |
ae6cd60f KR |
48 | More generally, however, ... undefined? ... or an offset from the start of a |
49 | frag pointed to by the @code{sy_frag} field. | |
582ffe70 KR |
50 | |
51 | @item sy_resolved | |
ae6cd60f KR |
52 | This field is non-zero if the symbol's value has been completely resolved. It |
53 | is used during the final pass over the symbol table. | |
582ffe70 KR |
54 | |
55 | @item sy_resolving | |
56 | This field is used to detect loops while resolving the symbol's value. | |
57 | ||
58 | @item sy_used_in_reloc | |
ae6cd60f KR |
59 | This field is non-zero if the symbol is used by a relocation entry. If a local |
60 | symbol is used in a relocation entry, it must be possible to redirect those | |
61 | relocations to other symbols, or this symbol cannot be removed from the final | |
62 | symbol list. | |
582ffe70 KR |
63 | |
64 | @item sy_next | |
65 | @itemx sy_previous | |
ae6cd60f KR |
66 | These pointers to other @code{symbolS} structures describe a singly or doubly |
67 | linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the | |
68 | @code{sy_previous} field will be omitted.) These fields should be accessed | |
69 | with @code{symbol_next} and @code{symbol_previous}. | |
582ffe70 KR |
70 | |
71 | @item sy_frag | |
72 | This points to the @code{fragS} that this symbol is attached to. | |
73 | ||
74 | @item sy_used | |
ae6cd60f KR |
75 | Whether the symbol is used as an operand or in an expression. Note: Not all of |
76 | the backends keep this information accurate; backends which use this bit are | |
77 | responsible for setting it when a symbol is used in backend routines. | |
582ffe70 KR |
78 | |
79 | @item bsym | |
ae6cd60f KR |
80 | If @code{BFD_ASSEMBLER} is defined, this points to the @code{asymbol} that will |
81 | be used in writing the object file. | |
582ffe70 KR |
82 | |
83 | @item sy_name_offset | |
ae6cd60f KR |
84 | (Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of |
85 | the symbol's name in the symbol table of the object file. On some formats, | |
86 | this will start at position 4, with position 0 reserved for unnamed symbols. | |
87 | This field is not used until @code{write_object_file} is called. | |
582ffe70 KR |
88 | |
89 | @item sy_symbol | |
ae6cd60f KR |
90 | (Only used if @code{BFD_ASSEMBLER} is not defined.) This is the |
91 | format-specific symbol structure, as it would be written into the object file. | |
582ffe70 KR |
92 | |
93 | @item sy_number | |
ae6cd60f KR |
94 | (Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol |
95 | number, for use in constructing relocation table entries. | |
582ffe70 KR |
96 | |
97 | @item sy_obj | |
ae6cd60f KR |
98 | This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by |
99 | that name is defined in @file{obj-format.h}, this field is not defined. | |
582ffe70 KR |
100 | |
101 | @item sy_tc | |
ae6cd60f KR |
102 | This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro |
103 | by that name is defined in @file{targ-cpu.h}, this field is not defined. | |
582ffe70 KR |
104 | |
105 | @item TARGET_SYMBOL_FIELDS | |
ae6cd60f KR |
106 | If this macro is defined, it defines additional fields in the symbol structure. |
107 | This macro is obsolete, and should be replaced when possible by uses of | |
108 | @code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}. | |
582ffe70 KR |
109 | |
110 | @end table | |
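
The way @code{sy_resolving} catches loops can be sketched as below. This is only an illustration under stated assumptions: @code{struct toy_symbol}, its @code{addend}, @code{refers_to} and @code{value} members, and the function @code{toy_resolve} are hypothetical stand-ins and not the assembler's real definitions; only the two flag names come from the field list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for symbolS.  Only the names
   sy_resolved and sy_resolving are taken from the text.  */
struct toy_symbol {
  long addend;                  /* constant part of the value */
  struct toy_symbol *refers_to; /* symbol the value depends on, or NULL */
  long value;                   /* valid once sy_resolved is set */
  unsigned sy_resolved : 1;
  unsigned sy_resolving : 1;
};

/* Resolve SYM, following references to other symbols.
   Returns 0 on success, -1 if a dependency loop is found.  */
static int toy_resolve(struct toy_symbol *sym)
{
  long base = 0;

  assert(sym != NULL);
  if (sym->sy_resolved)
    return 0;
  if (sym->sy_resolving)
    return -1;                  /* came back to a symbol still in progress */

  sym->sy_resolving = 1;
  if (sym->refers_to != NULL) {
    if (toy_resolve(sym->refers_to) != 0) {
      sym->sy_resolving = 0;
      return -1;
    }
    base = sym->refers_to->value;
  }
  sym->value = base + sym->addend;
  sym->sy_resolving = 0;
  sym->sy_resolved = 1;
  return 0;
}
```

A symbol that is reached again while its own flag is still set must be part of a cycle, which is why a single in-progress bit is enough here.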
111 | ||
ae6cd60f | 112 | Access symbol fields through macros such as @code{S_SET_SEGMENT}, @code{S_SET_VALUE}, @code{S_GET_VALUE}, and @code{S_GET_SEGMENT}. |
582ffe70 | 113 | |
ae6cd60f | 114 | @subsection Expressions |
582ffe70 KR |
115 | @cindex internals, expressions |
116 | @cindex expressions, internal | |
117 | ||
118 | Expressions are stored as a combination of operator, symbols, blah. | |
119 | ||
ae6cd60f | 120 | @subsection Fixups |
582ffe70 KR |
121 | @cindex internals, fixups |
122 | @cindex fixups | |
123 | ||
ae6cd60f | 124 | @subsection Frags |
582ffe70 KR |
125 | @cindex internals, frags |
126 | @cindex frags | |
127 | ||
ae6cd60f KR |
128 | The frag is the basic unit for storing section contents. |
129 | ||
130 | @table @code | |
131 | ||
132 | @item fr_address | |
133 | The address of the frag. This is not set until the assembler rescans the list | |
134 | of all frags after the entire input file is parsed. The function | |
135 | @code{relax_segment} fills in this field. | |
136 | ||
137 | @item fr_next | |
138 | Pointer to the next frag in this (sub)section. | |
139 | ||
140 | @item fr_fix | |
141 | Fixed number of characters we know we're going to emit to the output file. May | |
142 | be zero. | |
143 | ||
144 | @item fr_var | |
145 | Variable number of characters we may output, after the initial @code{fr_fix} | |
146 | characters. May be zero. | |
147 | ||
148 | @item fr_symbol | |
149 | @itemx fr_offset | |
150 | Foo. | |
151 | ||
152 | @item fr_opcode | |
153 | Points to the lowest-addressed byte of the opcode, for use in relaxation. | |
154 | ||
155 | @item line | |
156 | Holds line-number info. | |
157 | ||
158 | @item fr_type | |
159 | Relaxation state. This field indicates the interpretation of @code{fr_offset}, | |
160 | @code{fr_symbol} and the variable-length tail of the frag, as well as the | |
161 | treatment it gets in various phases of processing. It does not affect the | |
162 | initial @code{fr_fix} characters; they are always supposed to be output | |
163 | verbatim (fixups aside). See below for specific values this field can have. | |
164 | ||
165 | @item fr_subtype | |
166 | Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is | |
167 | assumed to be an index into @code{md_relax_table} for the generic relaxation | |
168 | code to process. (@xref{Relaxation}.) If @code{md_relax_frag} is defined, | |
169 | this field is available for any use by the CPU-specific code. | |
170 | ||
171 | @item align_mask | |
172 | @itemx align_offset | |
173 | These fields are not used yet. They are intended to keep track of the | |
174 | alignment of the current frag within its section, even if the exact offset | |
175 | isn't known. In many cases, we should be able to avoid creating extra frags | |
176 | when @code{.align} directives are given; instead, the number of bytes needed | |
177 | may be computable when the @code{.align} directive is processed. Hmm. Is this | |
178 | the right place for these, or should they be in the @code{frchainS} structure? | |
179 | ||
180 | @item fr_pcrel_adjust | |
181 | @itemx fr_bsr | |
182 | These fields are only used in the NS32k configuration. But since @code{struct | |
183 | frag} is defined before the CPU-specific header files are included, they must | |
184 | unconditionally be defined. | |
185 | ||
186 | @item fr_literal | |
187 | Declared as a one-character array, this last field grows arbitrarily large to | |
188 | hold the actual contents of the frag. | |
189 | ||
190 | @end table | |
191 | ||
192 | These are the possible relaxation states, provided in the enumeration type | |
193 | @code{relax_stateT}, and the interpretations they represent for the other | |
194 | fields: | |
195 | ||
196 | @table @code | |
197 | ||
198 | @item rs_align | |
199 | The start of the following frag should be aligned on some boundary. In this | |
200 | frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes. | |
201 | (For example, if alignment on an 8-byte boundary were desired, @code{fr_offset} | |
202 | would have a value of 3.) The variable characters indicate the fill pattern to | |
203 | be used. (More than one?) | |
204 | ||
205 | @item rs_broken_word | |
206 | This indicates that ``broken word'' processing should be done. @xref{Broken | |
207 | Words,,Broken Words}. If broken word processing is not necessary on the target | |
208 | machine, this enumerator value will not be defined. | |
209 | ||
210 | @item rs_fill | |
211 | The variable characters are to be repeated @code{fr_offset} times. If | |
212 | @code{fr_offset} is 0, this frag has a length of @code{fr_fix}. | |
213 | ||
214 | @item rs_machine_dependent | |
215 | Displacement relaxation is to be done on this frag. The target is indicated by | |
216 | @code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the | |
217 | particular machine-specific addressing mode desired. @xref{Relaxation}. | |
218 | ||
219 | @item rs_org | |
220 | The start of the following frag should be pushed back to some specific offset | |
221 | within the section. (Some assemblers use the value as an absolute address; the | |
222 | @sc{gnu} assembler does not handle final absolute addresses; it requires that | |
223 | the linker set them.) The offset is given by @code{fr_symbol} and | |
224 | @code{fr_offset}; one character from the variable-length tail is used as the | |
225 | fill character. | |
226 | ||
227 | @end table | |
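
The arithmetic implied by the @code{rs_fill} and @code{rs_align} descriptions can be sketched as follows. The helper names @code{toy_fill_size} and @code{toy_align_padding} are hypothetical, and the sketch assumes that the variable part of an @code{rs_fill} frag is @code{fr_var} bytes long and that alignment padding is counted from a known address; it is not taken from the assembler sources.

```c
#include <assert.h>

/* Size of an rs_fill frag: the fixed part, plus the fr_var-byte
   pattern repeated fr_offset times.  With fr_offset == 0 this is
   just fr_fix, matching the description above.  */
static unsigned long toy_fill_size(unsigned long fr_fix,
                                   unsigned long fr_var,
                                   unsigned long fr_offset)
{
  return fr_fix + fr_var * fr_offset;
}

/* Padding needed at ADDRESS so that the following frag starts on a
   (1 << fr_offset)-byte boundary; fr_offset is the base-2 logarithm
   of the alignment, as in an rs_align frag.  */
static unsigned long toy_align_padding(unsigned long address,
                                       unsigned fr_offset)
{
  unsigned long alignment;

  assert(fr_offset < sizeof(unsigned long) * 8);
  alignment = 1UL << fr_offset;
  return (alignment - (address & (alignment - 1))) & (alignment - 1);
}
```

For the 8-byte example above (@code{fr_offset} of 3), an address of 13 needs 3 bytes of padding and an address of 16 needs none.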
228 | ||
229 | A chain of frags is built up for each subsection. The data structure | |
230 | describing a chain is called a @code{frchainS}, and contains the following | |
231 | fields: | |
232 | ||
233 | @table @code | |
234 | @item frch_root | |
235 | Points to the first frag in the chain. May be null if there are no frags in | |
236 | this chain. | |
237 | @item frch_last | |
238 | Points to the last frag in the chain, or null if there are none. | |
239 | @item frch_next | |
240 | Next in the list of @code{frchainS} structures. | |
241 | @item frch_seg | |
242 | Indicates the section this frag chain belongs to. | |
243 | @item frch_subseg | |
244 | Subsection (subsegment) number of this frag chain. | |
245 | @item fix_root, fix_tail | |
246 | (Defined only if @code{BFD_ASSEMBLER} is defined.) Point to first and last | |
247 | @code{fixS} structures associated with this subsection. | |
248 | @item frch_obstack | |
249 | Not currently used. Intended to be used for frag allocation for this | |
250 | subsection. This should reduce frag generation caused by switching sections. | |
251 | @end table | |
252 | ||
253 | A @code{frchainS} corresponds to a subsection; each section has a list of | |
254 | @code{frchainS} records associated with it. In most cases, only one subsection | |
255 | of each section is used, so the list will only be one element long, but any | |
256 | processing of frag chains should be prepared to deal with multiple chains per | |
257 | section. | |
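
Being "prepared to deal with multiple chains per section" might look like the sketch below, which counts the frags of one section across all of its chains. The @code{toy_} structures are hypothetical cut-down versions that reuse the field names listed above; @code{frch_seg} is shown as a plain @code{int} purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for fragS and frchainS.  */
struct toy_frag {
  struct toy_frag *fr_next;
};

struct toy_frchain {
  struct toy_frag *frch_root;
  struct toy_frag *frch_last;
  struct toy_frchain *frch_next;
  int frch_seg;                 /* stand-in for the real section type */
  int frch_subseg;
};

/* Count the frags belonging to section SEG, visiting every chain
   (subsection) rather than assuming there is only one.  */
static size_t toy_count_frags(const struct toy_frchain *chains, int seg)
{
  size_t n = 0;
  const struct toy_frchain *c;
  const struct toy_frag *f;

  for (c = chains; c != NULL; c = c->frch_next) {
    if (c->frch_seg != seg)
      continue;
    /* An empty chain has both pointers null.  */
    assert((c->frch_root == NULL) == (c->frch_last == NULL));
    for (f = c->frch_root; f != NULL; f = f->fr_next)
      n++;
  }
  return n;
}
```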
258 | ||
259 | After the input files have been completely processed, and no more frags are to | |
260 | be generated, the frag chains are joined into one per section for further | |
261 | processing. After this point, it is safe to operate on one chain per section. | |
262 | ||
263 | @node Broken Words | |
264 | @subsection Broken Words | |
582ffe70 KR |
265 | @cindex internals, broken words |
266 | @cindex broken words | |
267 | @cindex promises, promises | |
268 | ||
ae6cd60f KR |
269 | The ``broken word'' idea derives from the fact that some compilers, including |
270 | @code{gcc}, will sometimes emit switch tables specifying 16-bit @code{.word} | |
271 | displacements to branch targets, and branch instructions that load entries from | |
272 | that table to compute the target address. If this is done on a 32-bit machine, | |
273 | there is a chance (at least with really large functions) that the displacement | |
274 | will not fit in 16 bits. Thus the ``broken word'' idea is well named, since | |
275 | there is an implied promise that the 16-bit field will in fact hold the | |
276 | specified displacement. | |
277 | ||
278 | If the ``broken word'' processing is enabled, and a situation like this is | |
279 | encountered, the assembler will insert a jump instruction into the instruction | |
280 | stream, close enough to be reached with the 16-bit displacement. This jump | |
281 | instruction will transfer to the real desired target address. Thus, as long as | |
282 | the @code{.word} value really is used as a displacement to compute an address | |
283 | to jump to, the net effect will be correct (minus a very small efficiency | |
284 | cost). If @code{.word} directives with label differences for values are used | |
285 | for other purposes, however, things may not work properly. I think there is a | |
286 | command-line option to turn on warnings when a broken word is discovered. | |
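
The decision described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names are hypothetical, the 16-bit field is treated as signed, and the location of the inserted jump is simply passed in rather than chosen by the assembler.

```c
#include <assert.h>

/* Does DISP fit in a signed 16-bit field?  (Signedness is assumed.)  */
static int toy_fits_word(long disp)
{
  return disp >= -32768L && disp <= 32767L;
}

/* Value to store in a .word holding TARGET - BASE.  If the direct
   displacement does not fit, use the displacement to TRAMPOLINE
   instead, a nearby inserted jump that transfers to TARGET.  */
static long toy_word_value(long base, long target, long trampoline)
{
  long direct = target - base;
  long via;

  if (toy_fits_word(direct))
    return direct;
  via = trampoline - base;
  assert(toy_fits_word(via));   /* the jump must itself be in reach */
  return via;
}
```

This also shows why other uses of such @code{.word} values can go wrong: once the trampoline is used, the stored value is no longer the true difference between the two labels.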
287 | ||
288 | This code is turned off by the @code{WORKING_DOT_WORD} macro. It isn't needed | |
289 | if @code{.word} emits a value large enough to contain an address (or, more | |
290 | correctly, any possible difference between two addresses). | |
291 | ||
582ffe70 KR |
292 | @node What Happens? |
293 | @section What Happens? | |
294 | ||
ae6cd60f KR |
295 | Blah blah blah, initialization, argument parsing, file reading, whitespace |
296 | munging, opcode parsing and lookup, operand parsing. Now it's time to write | |
297 | the output file. | |
582ffe70 KR |
298 | |
299 | In @code{BFD_ASSEMBLER} mode, processing of relocations and symbols and | |
ae6cd60f | 300 | creation of the output file is initiated by calling @code{write_object_file}. |
582ffe70 KR |
301 | |
302 | @node Target Dependent Definitions | |
303 | @section Target Dependent Definitions | |
304 | ||
ae6cd60f KR |
305 | @subsection Format-specific definitions |
306 | ||
307 | @defmac obj_sec_sym_ok_for_reloc (section) | |
308 | (@code{BFD_ASSEMBLER} only.) | |
309 | Is it okay to use this section's section-symbol in a relocation entry? If not, | |
310 | a new internal-linkage symbol is generated and emitted if such a relocation | |
311 | entry is needed. (Default: Always use a new symbol.) | |
312 | ||
313 | @end defmac | |
582ffe70 | 314 | |
ae6cd60f | 315 | @defmac obj_adjust_symtab |
582ffe70 | 316 | (@code{BFD_ASSEMBLER} only.) |
ae6cd60f KR |
317 | If this macro is defined, it is invoked just before setting the symbol table of |
318 | the output BFD. Any finalizing changes needed in the symbol table should be | |
319 | done here. For example, in the COFF support, if there is no @code{.file} | |
320 | symbol defined already, one is generated at this point. If no such adjustments | |
321 | are needed, this macro need not be defined. | |
322 | ||
323 | @end defmac | |
582ffe70 KR |
324 | |
325 | @defmac EMIT_SECTION_SYMBOLS | |
326 | (@code{BFD_ASSEMBLER} only.) | |
327 | Should section symbols be included in the symbol list if they're used in | |
ae6cd60f KR |
328 | relocations? Some formats can generate section-relative relocations, and thus |
329 | don't need symbols emitted for them. (Default: 1.) | |
330 | @end defmac | |
331 | ||
332 | @defmac obj_frob_file | |
333 | Any final cleanup needed before writing out the BFD may be done here. For | |
334 | example, ECOFF formats (and MIPS ELF format) may do some work on the MIPS-style | |
335 | symbol table with its integrated debug information. The symbol table should | |
336 | not be modified at this time. | |
337 | @end defmac | |
338 | ||
339 | @subsection CPU-specific definitions | |
340 | ||
341 | @node Relaxation | |
342 | @subsubsection Relaxation | |
343 | @cindex Relaxation | |
344 | ||
345 | If @code{md_relax_frag} isn't defined, the assembler will perform some | |
346 | relaxation on @code{rs_machine_dependent} frags based on the frag subtype and | |
347 | the displacement to some specified target address. The basic idea is that many | |
348 | machines have different addressing modes for instructions that can specify | |
349 | different ranges of values, with successive modes able to access wider ranges, | |
350 | including the entirety of the previous range. Smaller ranges are assumed to be | |
351 | more desirable (perhaps the instruction requires one word instead of two or | |
352 | three); if this is not the case, don't describe the smaller-range, inferior | |
353 | mode. | |
354 | ||
355 | The @code{fr_subtype} field of a frag is an index into a CPU-specific | |
356 | relaxation table. That table entry indicates the range of values that can be | |
357 | stored, the number of bytes that will have to be added to the frag to | |
358 | accommodate the addressing mode, and the index of the next entry to examine if | |
359 | the value to be stored is outside the range accessible by the current | |
360 | addressing mode. The @code{fr_symbol} field of the frag indicates what symbol | |
361 | is to be accessed; the @code{fr_offset} field is added in. | |
362 | ||
363 | If the @code{fr_pcrel_adjust} field is set, which currently should only happen | |
364 | for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to | |
365 | compute an adjustment to be made to the displacement. | |
366 | ||
367 | The value fitted by the relaxation code is always assumed to be a displacement | |
368 | from the current frag. (More specifically, from @code{fr_fix} bytes into the | |
369 | frag.) This seems kinda silly. What about fitting small absolute values? I | |
370 | suppose @code{md_assemble} is supposed to take care of that, but if the operand | |
371 | is a difference between symbols, it might not be able to, if the difference was | |
372 | not computable yet. | |
373 | ||
374 | The end of the relaxation sequence is indicated by a ``next'' value of 0. This | |
375 | is kinda silly too, since it means that the first entry in the table can't be | |
376 | used. I think -1 would make a more logical sentinel value. | |
377 | ||
378 | The table @code{md_relax_table} from @file{targ-cpu.c} describes the relaxation | |
379 | modes available. Currently this must always be provided, even on machines for | |
380 | which this type of relaxation isn't possible or practical. Probably fewer than | |
381 | half the machines gas supports use it; it ought to be made conditional on some | |
382 | CPU-specific macro. Also, the table must currently be declared @code{const}; on | |
383 | some machines, though, it might make sense to keep it writeable, so it can be | |
384 | modified depending on which CPU of a family is specified. For example, in the | |
385 | m68k family, the 68020 has some addressing modes that are not available on the | |
386 | 68000. | |
387 | ||
388 | The relaxation table type contains these fields: | |
389 | ||
390 | @table @code | |
391 | @item long rlx_forward | |
392 | Forward reach, must be non-negative. | |
393 | @item long rlx_backward | |
394 | Backward reach, must be zero or negative. | |
395 | @item rlx_length | |
396 | Length in bytes of this addressing mode. | |
397 | @item rlx_more | |
398 | Index of the next-longer relax state, or zero if there is no ``next'' | |
399 | relax state. | |
400 | @end table | |
401 | ||
402 | The relaxation is done in @code{relax_segment} in @file{write.c}. The | |
403 | difference in the length fields between the original mode and the one finally | |
404 | chosen by the relaxing code is taken as the size by which the current frag will | |
405 | be increased in size. For example, if the initial relaxing mode has a length | |
406 | of 2 bytes, and because of the size of the displacement, it gets upgraded to a | |
407 | mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes. | |
408 | (The initial two bytes should have been part of the fixed portion of the frag, | |
409 | since it is already known that they will be output.) This growth must be | |
410 | effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field | |
411 | by the appropriate size, and fill in the appropriate bytes of the frag. | |
412 | (Enough space for the maximum growth should have been allocated in the call to | |
413 | @code{frag_var} as the second argument.) | |
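
The table walk and the growth calculation described above can be sketched as follows. This is not the code in @file{write.c}: the structure, the table contents, and the function names are hypothetical, the field types for @code{rlx_length} and @code{rlx_more} are assumed, and the sample ranges and lengths are made up to reproduce the 2-byte to 6-byte example.

```c
#include <assert.h>

/* Field names follow the table above; the types of rlx_length and
   rlx_more are assumptions.  */
struct toy_relax_type {
  long rlx_forward;             /* forward reach, non-negative */
  long rlx_backward;            /* backward reach, zero or negative */
  unsigned char rlx_length;     /* bytes used by this addressing mode */
  unsigned char rlx_more;       /* next-longer state, 0 if none */
};

/* Made-up table.  Entry 0 is unusable because 0 is the sentinel.  */
static const struct toy_relax_type toy_relax_table[] = {
  { 0, 0, 0, 0 },
  { 127, -128, 2, 2 },          /* state 1: short form, 2 bytes */
  { 32767, -32768, 4, 3 },      /* state 2: word form, 4 bytes */
  { 0, 0, 6, 0 },               /* state 3: long form, end of sequence */
};

/* Starting from STATE, follow rlx_more until DISP is in range or
   the sequence ends.  */
static int toy_relax_state(int state, long disp)
{
  assert(state != 0);
  for (;;) {
    const struct toy_relax_type *t = &toy_relax_table[state];
    int fits = disp >= t->rlx_backward && disp <= t->rlx_forward;

    if (fits || t->rlx_more == 0)
      return state;
    state = t->rlx_more;
  }
}

/* Number of bytes by which the frag is assumed to grow.  */
static int toy_relax_growth(int start, long disp)
{
  int final = toy_relax_state(start, disp);

  return toy_relax_table[final].rlx_length
         - toy_relax_table[start].rlx_length;
}
```

With this table, a displacement of 100000 starting from state 1 ends in state 3, so the frag grows by 6 - 2 = 4 bytes, as in the example in the text.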
414 | ||
415 | If relocation records are needed, they should be emitted by | |
416 | @code{md_estimate_size_before_relax}. | |
417 | ||
418 | These are the machine-specific definitions associated with the relaxation | |
419 | mechanism: | |
420 | ||
421 | @deftypefun int md_estimate_size_before_relax (fragS *@var{frag}, segT @var{sec}) | |
422 | This function should examine the target symbol of the supplied frag and correct | |
423 | the @code{fr_subtype} of the frag if needed. When this function is called, if | |
424 | the symbol has not yet been defined, it will not become defined later; however, | |
425 | its value may still change if the section it is in gets relaxed. | |
426 | ||
427 | Usually, if the symbol is in the same section as the frag (given by the | |
428 | @var{sec} argument), the narrowest likely relaxation mode is stored in | |
429 | @code{fr_subtype}, and that's that. | |
430 | ||
431 | If the symbol is undefined, or in a different section (and therefore moveable | |
432 | to an arbitrarily large distance), the largest available relaxation mode is | |
433 | specified, @code{fix_new} is called to produce the relocation record, | |
434 | @code{fr_fix} is increased to include the relocated field (remember, this | |
435 | storage was allocated when @code{frag_var} was called), and @code{frag_wane} is | |
436 | called to convert the frag to an @code{rs_fill} frag with no variant part. | |
437 | Sometimes changing addressing modes may also require rewriting the instruction. | |
438 | It can be accessed via @code{fr_opcode} or @code{fr_fix}. | |
439 | ||
440 | Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not | |
441 | called. I'm not sure, but I think this is to keep @code{fr_fix} referring to | |
442 | an earlier byte, and @code{fr_subtype} set to @code{rs_machine_dependent} so | |
443 | that @code{md_convert_frag} will get called. | |
444 | @end deftypefun | |
445 | ||
446 | @deftypevar relax_typeS md_relax_table [] | |
447 | This is the table. | |
448 | @end deftypevar | |
449 | ||
450 | @defmac md_relax_frag (@var{frag}) | |
451 | ||
452 | This macro, if defined, overrides all of the processing described above. It's | |
453 | only defined for the MIPS target CPU, and there it doesn't do anything; it's | |
454 | used solely to disable the relaxing code and free up the @code{fr_subtype} | |
455 | field for use by the CPU-specific code. | |
456 | ||
457 | @end defmac | |
458 | ||
459 | @defmac tc_frob_file | |
460 | Like @code{obj_frob_file}, this macro handles miscellaneous last-minute | |
461 | cleanup. Currently only used on PowerPC/POWER support, for setting up a | |
462 | @code{.debug} section. This macro should not cause the symbol table to be | |
463 | modified. | |
464 | ||
465 | @end defmac | |
582ffe70 KR |
466 | |
467 | @node Source File Summary | |
468 | @section Source File Summary | |
469 | ||
ae6cd60f KR |
470 | @subsection File Format Descriptions |
471 | ||
472 | @subheading a.out | |
473 | ||
474 | The @code{a.out} format is described by @file{obj-aout.*}. | |
475 | ||
476 | @subheading b.out | |
477 | ||
478 | The @code{b.out} format, described by @file{obj-bout.*}, is similar to | |
479 | @code{a.out} format, except for a few additional fields in the file header | |
480 | describing section alignment and address. | |
481 | ||
482 | @subheading COFF | |
582ffe70 KR |
483 | |
484 | Originally, @file{obj-coff} was a purely non-BFD version, and | |
ae6cd60f KR |
485 | @file{obj-coffbfd} was created to use BFD for low-level byte-swapping. When |
486 | the @code{BFD_ASSEMBLER} conversion started, the first COFF target to be | |
487 | converted was using @file{obj-coff}, and the two files had diverged somewhat, | |
488 | and I didn't feel like first converting the support of that target over to use | |
489 | the low-level BFD interface. | |
490 | ||
491 | So @file{obj-coff} got converted, and to simplify certain things, | |
492 | @file{obj-coffbfd} got ``merged'' in with a brute-force approach. | |
493 | Specifically, preprocessor conditionals testing for @code{BFD_ASSEMBLER} | |
494 | effectively split the @file{obj-coff} files into the two separate versions. It | |
495 | isn't pretty. They will be merged more thoroughly, and eventually only the | |
496 | higher-level interface will be used. | |
497 | ||
498 | @subheading ECOFF | |
499 | ||
500 | All ECOFF configurations use BFD for writing object files. | |
501 | ||
502 | @subheading ELF | |
503 | ||
504 | ELF is a fairly reasonable format, without many of the deficiencies the other | |
505 | object file formats have. (It's got some of its own, but not as bad as the | |
506 | others.) All ELF configurations use BFD for writing object files. | |
507 | ||
508 | @subheading EVAX | |
509 | ||
510 | This is the format used on VMS. Yes, someone has actually written BFD support | |
511 | for it. The code hasn't been integrated yet though. | |
512 | ||
513 | @subheading HP300? | |
514 | ||
515 | @subheading IEEE? | |
516 | ||
517 | @subheading SOM | |
518 | ||
519 | @subheading XCOFF | |
520 | ||
521 | The XCOFF configuration is based on the COFF configuration (using the | |
522 | higher-level BFD interface). In fact, it uses the same files in the assembler. | |
523 | ||
524 | @subheading VMS | |
525 | ||
526 | This is the old Vax VMS support. It doesn't use BFD. | |
527 | ||
528 | @subsection Processor Descriptions | |
582ffe70 | 529 | |
ae6cd60f KR |
530 | Foo: a29k, alpha, h8300, h8500, hppa, i386, i860, i960, m68k, m88k, mips, |
531 | ns32k, ppc, sh, sparc, tahoe, vax, z8k. | |
532 | ||
533 | @node M68k | |
534 | @subsubsection M68k | |
535 | ||
536 | The operand syntax handling is atrocious. There is no clear specification of | |
537 | the operand syntax. I'm looking into using a Bison grammar to replace much of | |
538 | it. | |
539 | ||
540 | Operands on the 68k series processors can have two displacement values | |
541 | specified, plus a base register and a (possibly scaled) index register of which | |
542 | only some bits might be used. Thus a single 68k operand requires up to two | |
543 | expressions, two register numbers, and size and scale factors. The | |
544 | @code{struct m68k_op} type also includes a field indicating the mode of the | |
545 | operand, and an @code{error} field indicating a problem encountered while | |
546 | parsing the operand. | |
547 | ||
548 | An instruction on the 68k may have up to 6 operands, although most of them have | |
549 | to be simple register operands. Up to 11 (16-bit) words may be required to | |
550 | express the instruction. | |
551 | ||
552 | A @code{struct m68k_exp} expression contains an @code{expressionS}, pointers to | |
553 | the first and last characters of the input that produced the expression, an | |
554 | indication of the section to which the expression belongs, and a size field. | |
555 | I'm not sure what the size field describes. | |
556 | ||
557 | @subsubheading M68k addressing modes | |
558 | ||
559 | Many instructions use the low six bits of the first instruction word to | |
560 | describe the location of the operand, or how to compute the location. The six | |
561 | bits are typically split into three for a ``mode'' and three for a ``register'' | |
562 | value. The interpretation of these values is as follows: | |
563 | ||
564 | @example | |
565 | Mode Register Operand addressing mode | |
566 | 0 Dn data register | |
567 | 1 An address register | |
568 | 2 An indirect | |
569 | 3 An indirect, post-increment | |
570 | 4 An indirect, pre-decrement | |
571 | 5 An indirect with displacement | |
572 | 6 An indirect with optional displacement and index; | |
573 | may involve multiple indirections and two | |
574 | displacements | |
575 | 7 0 16-bit address follows | |
576 | 7 1 32-bit address follows | |
577 | 7 2 PC indirect with displacement | |
578 | 7 3 PC indirect with optional displacements and index | |
579 | 7 4 immediate 16- or 32-bit | |
580 | 7 5,6,7 Reserved | |
581 | @end example | |
582 | ||
583 | On the 68000 and 68010, support for modes 6 and 7.3 is incomplete; the | |
584 | displacement must fit in 8 bits, and no scaling or index suppression is | |
585 | permitted. | |
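
Splitting those six bits and looking them up in the table above could be sketched like this. The type and function names are hypothetical, and the sketch assumes the mode occupies the upper three of the six bits with the register in the lower three; the text above does not state the order.

```c
#include <assert.h>

/* Hypothetical holder for the two three-bit fields.  */
struct toy_ea {
  unsigned mode;
  unsigned reg;
};

/* Split the low six bits of the first instruction word.  Assumes
   the mode is in bits 5..3 and the register in bits 2..0.  */
static struct toy_ea toy_decode_ea(unsigned first_word)
{
  struct toy_ea ea;

  ea.mode = (first_word >> 3) & 7u;
  ea.reg = first_word & 7u;
  return ea;
}

/* Describe the addressing mode, following the table above.  */
static const char *toy_ea_name(struct toy_ea ea)
{
  static const char *const by_mode[7] = {
    "data register",
    "address register",
    "indirect",
    "indirect, post-increment",
    "indirect, pre-decrement",
    "indirect with displacement",
    "indirect with optional displacement and index",
  };
  static const char *const mode7[5] = {
    "16-bit address follows",
    "32-bit address follows",
    "PC indirect with displacement",
    "PC indirect with optional displacements and index",
    "immediate 16- or 32-bit",
  };

  assert(ea.mode <= 7u && ea.reg <= 7u);
  if (ea.mode < 7u)
    return by_mode[ea.mode];
  return ea.reg < 5u ? mode7[ea.reg] : "reserved";
}
```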
586 | ||
587 | @subsubheading M68k relaxation modes | |
588 | ||
589 | The relaxation modes used on the 68k are: | |
590 | ||
591 | @table @code | |
592 | @item ABRANCH | |
593 | Case @samp{g} except when @code{BCC68000} is applicable. | |
594 | @item FBRANCH | |
595 | Coprocessor branches. | |
596 | @item PCREL | |
597 | Mode 7.2 -- program counter indirect with 16-bit displacement. This is | |
598 | available on all processors. Widens to 32-bit absolute. Used only if the | |
599 | original code used @code{ABSL} mode, and the CPU is not a 68000 or 68010. | |
600 | (Why? Those processors support mode 7.2.) | |
601 | @item BCC68000 | |
602 | A conditional branch instruction, on the 68000 or 68010. These instructions | |
603 | support only 16-bit displacements on these processors. If a larger | |
604 | displacement is needed, the condition is negated and turned into a short branch | |
605 | around a jump instruction to the specified target. This jump will have a | |
606 | long absolute addressing mode. | |
607 | @item DBCC | |
608 | Like @code{BCC68000}, but for @code{dbCC} (decrement and branch on condition) | |
609 | instructions. | |
610 | @item PCLEA | |
611 | Not currently used?? Short form is mode 7.2 (program counter indirect, 16-bit | |
612 | displacement); long form is 7.3/0x0170 (program counter indirect, suppressed | |
613 | index register, 32-bit displacement). Used in progressive-930331 for mode | |
614 | @code{AOFF} with a PC-relative addressing mode and a displacement that won't | |
615 | fit in 16 bits, or which is variable and is not specified to have a size other | |
616 | than long. | |
617 | @item PCINDEX | |
618 | Newly added. PC indirect with index. An 8-bit displacement is supported on | |
619 | the 68000 and 68010, wider displacements on later processors. | |
620 | ||
621 | Well, actually, I haven't added it yet. I need to soon, though. It fixes a | |
622 | bug reported by a customer. | |
623 | @end table | |
624 | ||
625 | @subsection ``Emulation'' Descriptions | |
626 | ||
627 | These are the @file{te-*.h} files. | |
ed307a20 KR |
628 | |
629 | @node Foo | |
630 | @section Foo | |
631 | ||
632 | @subsection Warning and Error Messages | |
633 | ||
634 | @deftypefun int had_warnings (void) | |
635 | @deftypefunx int had_errors (void) | |
636 | ||
ae6cd60f KR |
637 | Returns non-zero if any warnings or errors, respectively, have been printed |
638 | during this invocation. | |
ed307a20 KR |
639 | |
640 | @end deftypefun | |
641 | ||
ae6cd60f | 642 | @deftypefun void as_perror (const char *@var{gripe}, const char *@var{filename}) |
ed307a20 KR |
643 | |
644 | Displays a BFD or system error, then clears the error status. | |
645 | ||
646 | @end deftypefun | |
647 | ||
648 | @deftypefun void as_tsktsk (const char *@var{format}, ...) | |
649 | @deftypefunx void as_warn (const char *@var{format}, ...) | |
650 | @deftypefunx void as_bad (const char *@var{format}, ...) | |
651 | @deftypefunx void as_fatal (const char *@var{format}, ...) | |
652 | ||
ae6cd60f KR |
653 | These functions display messages about something amiss with the input file, or |
654 | internal problems in the assembler itself. The current file name and line | |
655 | number are printed, followed by the supplied message, formatted using | |
656 | @code{vfprintf}, and a final newline. | |
657 | ||
658 | An error indicated by @code{as_bad} will result in a non-zero exit status when | |
659 | the assembler has finished. Calling @code{as_fatal} will result in immediate | |
660 | termination of the assembler process. | |
ed307a20 KR |
661 | |
662 | @end deftypefun | |
663 | ||
664 | @deftypefun void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...) | |
665 | @deftypefunx void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...) | |
666 | ||
ae6cd60f KR |
667 | These variants permit specification of the file name and line number, and are |
668 | used when a problem is detected while reprocessing information that was saved | |
669 | during an earlier part of the file. For example, fixups are processed after | |
670 | all input has been read, but messages about fixups should refer to the | |
671 | original file name and line number to which they apply. | |
ed307a20 KR |
672 | |
673 | @end deftypefun | |
674 | ||
675 | @deftypefun void fprint_value (FILE *@var{file}, valueT @var{val}) | |
676 | @deftypefunx void sprint_value (char *@var{buf}, valueT @var{val}) | |
677 | ||
ae6cd60f KR |
678 | These functions are helpful for converting a @code{valueT} value into printable |
679 | format, in case it's wider than the types that @code{*printf} can handle. If the | |
680 | type is narrow enough, a decimal number will be produced; otherwise, it will be | |
681 | in hexadecimal (FIXME: currently without `0x' prefix). The value itself is not | |
682 | examined to make this determination. | |
ed307a20 KR |
683 | |
684 | @end deftypefun | |
ae6cd60f KR |
685 | |
686 | @node Writing a new target | |
687 | @section Writing a new target | |
688 | ||
689 | @node Test suite | |
690 | @section Test suite | |
691 | @cindex test suite | |
692 | ||
693 | The test suite is kind of lame for most processors. Often it only checks to | |
694 | see if a couple of files can be assembled without the assembler reporting any | |
695 | errors. For more complete testing, write a test which either examines the | |
696 | assembler listing, or runs @code{objdump} and examines its output. For the | |
697 | latter, the Tcl procedure @code{run_dump_test} may come in handy. It takes the | |
698 | base name of a file, and looks for @file{@var{file}.d}. This file should | |
699 | contain as its initial lines a set of variable settings in @samp{#} comments, | |
700 | in the form: | |
701 | ||
702 | @example | |
703 | #@var{varname}: @var{value} | |
704 | @end example | |
705 | ||
706 | The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case | |
707 | it specifies the options to be passed to the specified programs. Exactly one | |
708 | of @code{objdump} or @code{nm} must be specified, as that also specifies which | |
709 | program to run after the assembler has finished. If @var{varname} is | |
710 | @code{source}, it specifies the name of the source file; otherwise, | |
711 | @file{@var{file}.s} is used. If @var{varname} is @code{name}, it specifies the | |
712 | name of the test to be used in the @code{pass} or @code{fail} messages. | |
713 | ||
714 | The non-commented parts of the file are interpreted as regular expressions, one | |
715 | per line. Blank lines in the @code{objdump} or @code{nm} output are skipped, | |
716 | as are blank lines in the @file{.d} file; the other lines are tested to see if | |
717 | the regular expression matches the program output. If it does not, the test | |
718 | fails. | |
719 | ||
720 | Note that this means the tests must be modified if the @code{objdump} output | |
721 | style is changed. | |
722 | ||
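As a rough illustration of the format described above, a @file{.d} file for an
@code{objdump}-based test might look something like the following. The test
name, the source file name @file{add.s}, and the expected-output patterns are
made up for this sketch; real patterns depend on the target and on the
@code{objdump} options used.

@example
#objdump: -d
#name: hypothetical add test
#source: add.s

.*: +file format .*

Disassembly of section \.text:
@end example

The first three lines are variable settings in @samp{#} comments. The
remaining non-blank lines are regular expressions, which is why the literal
dot in the section name is escaped.
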
723 | @bye | |
724 | @c Local Variables: | |
725 | @c fill-column: 79 | |
726 | @c End: |