Add AMD bdver3 support.

diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi

index 43bee797317896b219bbd2cf33cf36baf64be4b2..4ee8d7a23124795c07f4eae4862bf4b29e489407 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -1,6 +1,10 @@
-@c Copyright (C) 1991, 92, 93, 94, 95, 97, 1998 Free Software Foundation, Inc.
+@c Copyright 1991, 1992, 1993, 1994, 1995, 1997, 1998, 1999, 2000,
+@c 2001, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011
+@c Free Software Foundation, Inc.
  @c This is part of the GAS manual.
  @c For copying conditions, see the file as.texinfo.
+@c man end
+
  @ifset GENERIC
  @page
  @node i386-Dependent
@@ -12,7 +16,7 @@
  @end ifclear
  
  @cindex i386 support
-@cindex i80306 support
+@cindex i80386 support
  @cindex x86-64 support
  
  The i386 version @code{@value{AS}} supports both the original Intel 386
@@ -21,14 +25,18 @@ extending the Intel architecture to 64-bits.
  
  @menu
  * i386-Options::                Options
-* i386-Syntax::                 AT&T Syntax versus Intel Syntax
+* i386-Directives::             X86 specific directives
+* i386-Syntax::                 Syntactical considerations
  * i386-Mnemonics::              Instruction Naming
  * i386-Regs::                   Register Naming
  * i386-Prefixes::               Instruction Prefixes
  * i386-Memory::                 Memory References
-* i386-jumps::                  Handling of Jump Instructions
+* i386-Jumps::                  Handling of Jump Instructions
  * i386-Float::                  Floating Point
  * i386-SIMD::                   Intel's MMX and AMD's 3DNow! SIMD Operations
+* i386-LWP::                    AMD's Lightweight Profiling Instructions
+* i386-BMI::                    Bit Manipulation Instructions
+* i386-TBM::                    AMD's Trailing Bit Manipulation Instructions
  * i386-16bit::                  Writing 16-bit Code
  * i386-Arch::                   Specifying an x86 CPU architecture
  * i386-Bugs::                   AT&T Syntax bugs
@@ -46,24 +54,237 @@ extending the Intel architecture to 64-bits.
  The i386 version of @code{@value{AS}} has a few machine
  dependent options:
  
-@table @code
+@c man begin OPTIONS
+@table @gcctabopt
  @cindex @samp{--32} option, i386
  @cindex @samp{--32} option, x86-64
+@cindex @samp{--x32} option, i386
+@cindex @samp{--x32} option, x86-64
  @cindex @samp{--64} option, i386
  @cindex @samp{--64} option, x86-64
-@item --32 | --64
-Select the word size, either 32 bits or 64 bits. Selecting 32-bit
-implies Intel i386 architecture, while 64-bit implies AMD x86-64
-architecture.
+@item --32 | --x32 | --64
+Select the word size, either 32 bits or 64 bits.  @samp{--32}
+implies Intel i386 architecture, while @samp{--x32} and @samp{--64}
+imply AMD x86-64 architecture with 32-bit or 64-bit word-size
+respectively.
  
  These options are only available with the ELF object file format, and
  require that the necessary BFD support has been included (on a 32-bit
  platform you have to add --enable-64-bit-bfd to configure enable 64-bit
  usage and use x86-64 as target platform).
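+
+For illustration, assuming the BFD support described above is present,
+the following command lines assemble a hypothetical source file
+@file{foo.s} into 32-bit, x32, and 64-bit ELF objects respectively (the
+output file names are also hypothetical):
+
+@smallexample
+as --32 -o foo32.o foo.s
+as --x32 -o foox32.o foo.s
+as --64 -o foo64.o foo.s
+@end smallexample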
+
+@item -n
+By default, x86 GAS replaces multiple nop instructions used for
+alignment within code sections with multi-byte nop instructions such
+as @samp{leal 0(%esi,1),%esi}.  This switch disables the optimization.
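+
+For example, the alignment below explicitly requests @samp{nop}
+(0x90) padding after @samp{ret}.  By default that padding is emitted as
+multi-byte nop instructions; with @option{-n} it is left as a run of
+single-byte @samp{nop} instructions:
+
+@smallexample
+        ret
+        .p2align 4,0x90
+@end smallexample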
+
+@cindex @samp{--divide} option, i386
+@item --divide
+On SVR4-derived platforms, the character @samp{/} is treated as a comment
+character, which means that it cannot be used in expressions.  The
+@samp{--divide} option turns @samp{/} into a normal character.  This does
+not disable @samp{/} at the beginning of a line starting a comment, or
+affect using @samp{#} for starting a comment.
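+
+For example, on such a target the following statement is accepted only
+when @samp{--divide} is given.  Without it, everything from the
+@samp{/} onward is treated as a comment, which leaves the statement
+incomplete:
+
+@smallexample
+        movl $(64/4), %eax
+@end smallexample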
+
+@cindex @samp{-march=} option, i386
+@cindex @samp{-march=} option, x86-64
+@item -march=@var{CPU}[+@var{EXTENSION}@dots{}]
+This option specifies the target processor.  The assembler will
+issue an error message if an attempt is made to assemble an instruction
+which will not execute on the target processor.  The following
+processor names are recognized: 
+@code{i8086},
+@code{i186},
+@code{i286},
+@code{i386},
+@code{i486},
+@code{i586},
+@code{i686},
+@code{pentium},
+@code{pentiumpro},
+@code{pentiumii},
+@code{pentiumiii},
+@code{pentium4},
+@code{prescott},
+@code{nocona},
+@code{core},
+@code{core2},
+@code{corei7},
+@code{l1om},
+@code{k1om},
+@code{k6},
+@code{k6_2},
+@code{athlon},
+@code{opteron},
+@code{k8},
+@code{amdfam10},
+@code{bdver1},
+@code{bdver2},
+@code{bdver3},
+@code{btver1},
+@code{btver2},
+@code{generic32} and
+@code{generic64}.
+
+In addition to the basic instruction set, the assembler can be told to 
+accept various extension mnemonics.  For example,
+@code{-march=i686+sse4+vmx} extends @var{i686} with @var{sse4} and
+@var{vmx}.  The following extensions are currently supported:
+@code{8087},
+@code{287},
+@code{387},
+@code{no87},
+@code{mmx},
+@code{nommx},
+@code{sse},
+@code{sse2},
+@code{sse3},
+@code{ssse3},
+@code{sse4.1},
+@code{sse4.2},
+@code{sse4},
+@code{nosse},
+@code{avx},
+@code{avx2},
+@code{adx},
+@code{rdseed},
+@code{prfchw},
+@code{noavx},
+@code{vmx},
+@code{vmfunc},
+@code{smx},
+@code{xsave},
+@code{xsaveopt},
+@code{aes},
+@code{pclmul},
+@code{fsgsbase},
+@code{rdrnd},
+@code{f16c},
+@code{bmi2},
+@code{fma},
+@code{movbe},
+@code{ept},
+@code{lzcnt},
+@code{hle},
+@code{rtm},
+@code{invpcid},
+@code{clflush},
+@code{lwp},
+@code{fma4},
+@code{xop},
+@code{cx16},
+@code{syscall},
+@code{rdtscp},
+@code{3dnow},
+@code{3dnowa},
+@code{sse4a},
+@code{sse5},
+@code{svme},
+@code{abm} and
+@code{padlock}.
+Note that rather than extending a basic instruction set, the extension
+mnemonics starting with @code{no} revoke the respective functionality.
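+
+For illustration, using a hypothetical input file @file{foo.s}, the
+first command line below accepts only instructions available on
+@code{bdver3} processors.  The second starts from the @code{pentium4}
+instruction set and then revokes SSE support:
+
+@smallexample
+as --64 -march=bdver3 -o foo.o foo.s
+as --32 -march=pentium4+nosse -o foo.o foo.s
+@end smallexample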
+
+When the @code{.arch} directive is used with @option{-march}, the
+@code{.arch} directive takes precedence.
+
+@cindex @samp{-mtune=} option, i386
+@cindex @samp{-mtune=} option, x86-64
+@item -mtune=@var{CPU}
+This option specifies a processor to optimize for. When used in
+conjunction with the @option{-march} option, only instructions
+of the processor specified by the @option{-march} option will be
+generated.
+
+Valid @var{CPU} values are identical to the processor list of
+@option{-march=@var{CPU}}.
+
+@cindex @samp{-msse2avx} option, i386
+@cindex @samp{-msse2avx} option, x86-64
+@item -msse2avx
+This option specifies that the assembler should encode SSE instructions
+with the VEX prefix.
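+
+For example, with @option{-msse2avx} the SSE instruction below receives
+the same VEX encoding as the AVX instruction in the comment.  The
+destination register is repeated as the first source operand:
+
+@smallexample
+        addps   %xmm1, %xmm0    # encoded as vaddps %xmm1, %xmm0, %xmm0
+@end smallexample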
+
+@cindex @samp{-msse-check=} option, i386
+@cindex @samp{-msse-check=} option, x86-64
+@item -msse-check=@var{none}
+@itemx -msse-check=@var{warning}
+@itemx -msse-check=@var{error}
+These options control whether the assembler should check SSE instructions.
+@option{-msse-check=@var{none}} makes the assembler not check SSE
+instructions, which is the default.  @option{-msse-check=@var{warning}}
+makes the assembler issue a warning for any SSE instruction.
+@option{-msse-check=@var{error}} makes the assembler issue an error
+for any SSE instruction.
+
+@cindex @samp{-mavxscalar=} option, i386
+@cindex @samp{-mavxscalar=} option, x86-64
+@item -mavxscalar=@var{128}
+@itemx -mavxscalar=@var{256}
+These options control how the assembler should encode scalar AVX
+instructions.  @option{-mavxscalar=@var{128}} encodes scalar
+AVX instructions with 128-bit vector length, which is the default.
+@option{-mavxscalar=@var{256}} encodes scalar AVX instructions
+with 256-bit vector length.
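+
+For example, the scalar AVX instruction below is encoded with VEX.L=0
+(bytes @samp{c5 f2 58 c2}) by default.  Under
+@option{-mavxscalar=@var{256}} it is encoded with VEX.L=1 (bytes
+@samp{c5 f6 58 c2}):
+
+@smallexample
+        vaddss  %xmm2, %xmm1, %xmm0
+@end smallexample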
+
+@cindex @samp{-mmnemonic=} option, i386
+@cindex @samp{-mmnemonic=} option, x86-64
+@item -mmnemonic=@var{att}
+@itemx -mmnemonic=@var{intel}
+This option specifies the instruction mnemonics used for matching
+instructions.  The @code{.att_mnemonic} and @code{.intel_mnemonic}
+directives take precedence.
+
+@cindex @samp{-msyntax=} option, i386
+@cindex @samp{-msyntax=} option, x86-64
+@item -msyntax=@var{att}
+@itemx -msyntax=@var{intel}
+This option specifies the instruction syntax used when processing
+instructions.  The @code{.att_syntax} and @code{.intel_syntax}
+directives take precedence.
+
+@cindex @samp{-mnaked-reg} option, i386
+@cindex @samp{-mnaked-reg} option, x86-64
+@item -mnaked-reg
+This option specifies that registers don't require a @samp{%} prefix.
+The @code{.att_syntax} and @code{.intel_syntax} directives take precedence.
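+
+For illustration, suppose a hypothetical source file @file{foo.s}
+contains the Intel-syntax line @samp{add eax, 4}, written without
+register prefixes.  The following command line would then assemble that
+line as @samp{addl $4, %eax}:
+
+@smallexample
+as --32 -msyntax=intel -mnaked-reg -o foo.o foo.s
+@end smallexample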
+
+@end table
+@c man end
+
+@node i386-Directives
+@section x86 specific Directives
+
+@cindex machine directives, x86
+@cindex x86 machine directives
+@table @code
+
+@cindex @code{lcomm} directive, COFF
+@item .lcomm @var{symbol} , @var{length}[, @var{alignment}]
+Reserve @var{length} (an absolute expression) bytes for a local common
+denoted by @var{symbol}.  The section and value of @var{symbol} are
+those of the new local common.  The addresses are allocated in the bss
+section, so that at run-time the bytes start off zeroed.  Since
+@var{symbol} is not declared global, it is normally not visible to
+@code{@value{LD}}.  The optional third parameter, @var{alignment},
+specifies the desired alignment of the symbol in the bss section.
+
+This directive is only available for COFF based x86 targets.
+
+@c FIXME: Document other x86 specific directives ?  Eg: .code16gcc,
+@c .largecomm
+
  @end table
  
  @node i386-Syntax
-@section AT&T Syntax versus Intel Syntax
+@section i386 Syntactical Considerations
+@menu
+* i386-Variations::           AT&T Syntax versus Intel Syntax
+* i386-Chars::                Special Characters
+@end menu
+
+@node i386-Variations
+@subsection AT&T Syntax versus Intel Syntax
  
  @cindex i386 intel_syntax pseudo op
  @cindex intel_syntax pseudo op, i386
@@ -119,9 +340,9 @@ operands are prefixed by @samp{*}; they are undelimited in Intel syntax.
  AT&T and Intel syntax use the opposite order for source and destination
  operands.  Intel @samp{add eax, 4} is @samp{addl $4, %eax}.  The
  @samp{source, dest} convention is maintained for compatibility with
-previous Unix assemblers.  Note that instructions with more than one
-source operand, such as the @samp{enter} instruction, do @emph{not} have
-reversed order.  @ref{i386-Bugs}.
+previous Unix assemblers.  Note that @samp{bound}, @samp{invlpga}, and
+instructions with 2 immediate operands, such as the @samp{enter}
+instruction, do @emph{not} have reversed order.  @ref{i386-Bugs}.
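+
+For example, @samp{enter} keeps the same operand order as in Intel
+syntax:
+
+@smallexample
+        enter   $16, $0         # Intel: enter 16, 0
+@end smallexample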
  
  @cindex mnemonic suffixes, i386
  @cindex sizes operands, i386
@@ -139,6 +360,9 @@ this by prefixing memory operands (@emph{not} the instruction mnemonics) with
  Intel @samp{mov al, byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T
  syntax.
  
+In 64-bit code, @samp{movabs} can be used to encode the @samp{mov}
+instruction with a 64-bit displacement or immediate operand.
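+
+For example, the first line below moves a 64-bit immediate into a
+register.  The second loads @samp{%eax} from a 64-bit absolute address:
+
+@smallexample
+        movabs  $0x123456789abcdef0, %rax  # 64-bit immediate
+        movabs  0x123456789abcdef0, %eax   # 64-bit absolute address
+@end smallexample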
+
  @cindex return instructions, i386
  @cindex i386 jump, call, return
  @cindex return instructions, x86-64
@@ -161,6 +385,29 @@ The AT&T assembler does not provide support for multiple section
  programs.  Unix style systems expect all programs to be single sections.
  @end itemize
  
+@node i386-Chars
+@subsection Special Characters
+
+@cindex line comment character, i386
+@cindex i386 line comment character
+The presence of a @samp{#} appearing anywhere on a line indicates the
+start of a comment that extends to the end of that line.
+
+If a @samp{#} appears as the first character of a line then the whole
+line is treated as a comment, but in this case the line can also be a
+logical line number directive (@pxref{Comments}) or a preprocessor
+control command (@pxref{Preprocessing}).
+
+If the @option{--divide} command line option has not been specified
+then the @samp{/} character appearing anywhere on a line also
+introduces a line comment.
+
+@cindex line separator, i386
+@cindex statement separator, i386
+@cindex i386 line separator
+The @samp{;} character can be used to separate statements on the same
+line.
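+
+For example:
+
+@smallexample
+# This whole line is a comment.
+        movl    $1, %eax        # a comment after an instruction
+        incl    %eax ; incl %ebx  # two statements on one line
+@end smallexample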
+
  @node i386-Mnemonics
  @section Instruction Naming
  
@@ -197,6 +444,14 @@ thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
  @samp{wq} (from word to quadruple word), and @samp{lq} (from long to
  quadruple word).
  
+@cindex encoding options, i386
+@cindex encoding options, x86-64
+
+Different encoding options can be specified via an optional mnemonic
+suffix.  The @samp{.s} suffix swaps the 2 register operands in encoding
+when moving from one register to another.  The @samp{.d8} or @samp{.d32}
+suffix prefers an 8-bit or 32-bit displacement in encoding.
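+
+For example, in 32-bit code the following instructions use the
+alternate encodings noted in the comments:
+
+@smallexample
+        mov.s    %eax, %ebx       # 8b d8 instead of 89 c3
+        mov.d32  4(%eax), %ebx    # 8b 98 04 00 00 00 instead of 8b 58 04
+@end smallexample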
+
  @cindex conversion instructions, i386
  @cindex i386 conversion instructions
  @cindex conversion instructions, x86-64
@@ -221,7 +476,7 @@ The Intel-syntax conversion instructions
  (x86-64 only),
  
  @item
-@samp{cdo} --- sign-extend quad in @samp{%rax} to octuple in
+@samp{cqo} --- sign-extend quad in @samp{%rax} to octuple in
  @samp{%rdx:%rax} (x86-64 only),
  @end itemize
  
@@ -238,6 +493,21 @@ Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
  AT&T syntax, but are @samp{call far} and @samp{jump far} in Intel
  convention.
  
+@section AT&T Mnemonic versus Intel Mnemonic
+
+@cindex i386 mnemonic compatibility
+@cindex mnemonic compatibility, i386
+
+@code{@value{AS}} supports assembly using Intel mnemonics.
+@code{.intel_mnemonic} selects Intel mnemonics with Intel syntax, and
+@code{.att_mnemonic} switches back to the usual AT&T mnemonics with AT&T
+syntax for compatibility with the output of @code{@value{GCC}}.
+Several x87 instructions, @samp{fadd}, @samp{fdiv}, @samp{fdivp},
+@samp{fdivr}, @samp{fdivrp}, @samp{fmul}, @samp{fsub}, @samp{fsubp},
+@samp{fsubr} and @samp{fsubrp}, are implemented in the AT&T System V/386
+assembler with different mnemonics from those in the Intel IA32
+specification.  @code{@value{GCC}} generates those instructions with
+AT&T mnemonics.
+
  @node i386-Regs
  @section Register Naming
  
@@ -488,7 +758,7 @@ the default absolute addressing.
  Other addressing modes remain unchanged in x86-64 architecture, except
  registers used are 64-bit instead of 32-bit.
  
-@node i386-jumps
+@node i386-Jumps
  @section Handling of Jump Instructions
  
  @cindex jump optimization, i386
@@ -498,11 +768,11 @@ registers used are 64-bit instead of 32-bit.
  Jump instructions are always optimized to use the smallest possible
  displacements.  This is accomplished by using byte (8-bit) displacement
  jumps whenever the target is sufficiently close.  If a byte displacement
-is insufficient a long (32-bit) displacement is used.  We do not support
+is insufficient a long displacement is used.  We do not support
  word (16-bit) displacement jumps in 32-bit mode (i.e. prefixing the jump
  instruction with the @samp{data16} instruction prefix), since the 80386
  insists upon masking @samp{%eip} to 16 bits after the word displacement
-is added.
+is added (@pxref{i386-Arch}).
  
  Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
  @samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in byte
@@ -588,7 +858,7 @@ then stores the result in the 4 byte location @samp{mem})
  @code{@value{AS}} supports Intel's MMX instruction set (SIMD
  instructions for integer data), available on Intel's Pentium MMX
  processors and Pentium II processors, AMD's K6 and K6-2 processors,
-Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!
+Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!@:
  instruction set (SIMD instructions for 32-bit floating point data)
  available on AMD's K6-2 processor and possibly others in the future.
  
@@ -604,6 +874,55 @@ as the floating point stack.
  See Intel and AMD documentation, keeping in mind that the operand order in
  instructions is reversed from the Intel syntax.
  
+@node i386-LWP
+@section AMD's Lightweight Profiling Instructions
+
+@cindex LWP, i386
+@cindex LWP, x86-64
+
+@code{@value{AS}} supports AMD's Lightweight Profiling (LWP)
+instruction set, available on AMD's Family 15h (Orochi) processors.
+
+LWP enables applications to collect and manage performance data, and
+react to performance events.  The collection of performance data
+requires no context switches.  LWP runs in the context of a thread and
+so several counters can be used independently across multiple threads.
+LWP can be used in both 64-bit and legacy 32-bit modes.
+
+For detailed information on the LWP instruction set, see the
+@cite{AMD Lightweight Profiling Specification} available at
+@uref{http://developer.amd.com/cpu/LWP,Lightweight Profiling Specification}.
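+
+The following is a sketch of 32-bit code.  It assumes that @samp{%eax}
+holds the address of an LWP control block that was set up elsewhere.
+The first instruction loads that address to enable LWP, and the second
+reads back the address of the current control block:
+
+@smallexample
+        llwpcb  %eax            # load LWP control block address
+        slwpcb  %eax            # store current control block address
+@end smallexample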
+
+@node i386-BMI
+@section Bit Manipulation Instructions
+
+@cindex BMI, i386
+@cindex BMI, x86-64
+
+@code{@value{AS}} supports the Bit Manipulation (BMI) instruction set.
+
+The BMI instruction set provides instructions implementing individual
+bit manipulation operations such as isolation, masking, setting, or
+resetting.
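+
+For example (the comments give the result in C-like notation):
+
+@smallexample
+        andn    %ecx, %ebx, %eax  # %eax = ~%ebx & %ecx
+        blsr    %ecx, %eax        # %eax = %ecx & (%ecx - 1)
+@end smallexample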
+
+@c Need to add a specification citation here when available.
+
+@node i386-TBM
+@section AMD's Trailing Bit Manipulation Instructions
+
+@cindex TBM, i386
+@cindex TBM, x86-64
+
+@code{@value{AS}} supports AMD's Trailing Bit Manipulation (TBM)
+instruction set, available on AMD's BDVER2 processors (Trinity and
+Viperfish).
+
+TBM instructions provide instructions implementing individual bit
+manipulation operations such as isolating, masking, setting, resetting,
+complementing, and operations on trailing zeros and ones.
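+
+For example (the comments give the result in C-like notation):
+
+@smallexample
+        blcfill %ecx, %eax      # %eax = %ecx & (%ecx + 1)
+        blsfill %ecx, %eax      # %eax = %ecx | (%ecx - 1)
+@end smallexample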
+
+@c Need to add a specification citation here when available.
+
  @node i386-16bit
  @section Writing 16-bit Code
  
@@ -620,8 +939,9 @@ or 64-bit x86-64 code depending on the default configuration,
  it also supports writing code to run in real mode or in 16-bit protected
  mode code segments.  To do this, put a @samp{.code16} or
  @samp{.code16gcc} directive before the assembly language instructions to
-be run in 16-bit mode.  You can switch @code{@value{AS}} back to writing
-normal 32-bit code with the @samp{.code32} directive.
+be run in 16-bit mode.  You can switch @code{@value{AS}} to writing
+32-bit code with the @samp{.code32} directive or 64-bit code with the
+@samp{.code64} directive.
  
  @samp{.code16gcc} provides experimental support for generating 16-bit
  code from gcc, and differs from @samp{.code16} in that @samp{call},
@@ -651,7 +971,7 @@ value @samp{4} onto the stack, decrementing @samp{%esp} by 2.
  @end smallexample
  
  The same code in a 16-bit code section would generate the machine
-opcode bytes @samp{6a 04} (ie. without the operand size prefix), which
+opcode bytes @samp{6a 04} (i.e., without the operand size prefix), which
  is correct since the processor default operand size is assumed to be 16
  bits in a 16-bit code section.
  
@@ -685,24 +1005,62 @@ register is @samp{%st(i)}.
  @cindex x86-64 arch directive
  
  @code{@value{AS}} may be told to assemble for a particular CPU
-architecture with the @code{.arch @var{cpu_type}} directive.  This
+(sub-)architecture with the @code{.arch @var{cpu_type}} directive.  This
  directive enables a warning when gas detects an instruction that is not
  supported on the CPU specified.  The choices for @var{cpu_type} are:
  
  @multitable @columnfractions .20 .20 .20 .20
  @item @samp{i8086} @tab @samp{i186} @tab @samp{i286} @tab @samp{i386}
  @item @samp{i486} @tab @samp{i586} @tab @samp{i686} @tab @samp{pentium}
-@item @samp{pentiumpro} @tab @samp {pentium4} @tab @samp {k6} @tab @samp {athlon}
-@item @samp{sledgehammer}
+@item @samp{pentiumpro} @tab @samp{pentiumii} @tab @samp{pentiumiii} @tab @samp{pentium4}
+@item @samp{prescott} @tab @samp{nocona} @tab @samp{core} @tab @samp{core2}
+@item @samp{corei7} @tab @samp{l1om} @tab @samp{k1om}
+@item @samp{k6} @tab @samp{k6_2} @tab @samp{athlon} @tab @samp{k8}
+@item @samp{amdfam10} @tab @samp{bdver1} @tab @samp{bdver2} @tab @samp{bdver3}
+@item @samp{btver1} @tab @samp{btver2}
+@item @samp{generic32} @tab @samp{generic64}
+@item @samp{.mmx} @tab @samp{.sse} @tab @samp{.sse2} @tab @samp{.sse3}
+@item @samp{.ssse3} @tab @samp{.sse4.1} @tab @samp{.sse4.2} @tab @samp{.sse4}
+@item @samp{.avx} @tab @samp{.vmx} @tab @samp{.smx} @tab @samp{.ept}
+@item @samp{.clflush} @tab @samp{.movbe} @tab @samp{.xsave} @tab @samp{.xsaveopt}
+@item @samp{.aes} @tab @samp{.pclmul} @tab @samp{.fma} @tab @samp{.fsgsbase}
+@item @samp{.rdrnd} @tab @samp{.f16c} @tab @samp{.avx2} @tab @samp{.bmi2}
+@item @samp{.lzcnt} @tab @samp{.invpcid} @tab @samp{.vmfunc} @tab @samp{.hle}
+@item @samp{.rtm} @tab @samp{.adx} @tab @samp{.rdseed} @tab @samp{.prfchw}
+@item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
+@item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme} @tab @samp{.abm}
+@item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
+@item @samp{.padlock}
  @end multitable
  
-Apart from the warning, there is only one other effect on
-@code{@value{AS}} operation;  If you specify a CPU other than
+Apart from the warning, there are only two other effects on
+@code{@value{AS}} operation.  Firstly, if you specify a CPU other than
  @samp{i486}, then shift by one instructions such as @samp{sarl $1, %eax}
  will automatically use a two byte opcode sequence.  The larger three
  byte opcode sequence is used on the 486 (and when no architecture is
  specified) because it executes faster on the 486.  Note that you can
  explicitly request the two byte opcode by writing @samp{sarl %eax}.
+Secondly, if you specify @samp{i8086}, @samp{i186}, or @samp{i286},
+@emph{and} @samp{.code16} or @samp{.code16gcc} then byte offset
+conditional jumps will be promoted when necessary to a two instruction
+sequence consisting of a conditional jump of the opposite sense around
+an unconditional jump to the target.
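+
+As a sketch, consider 16-bit code assembled with @samp{.arch i286}.  A
+conditional jump whose target is out of byte-offset range (the label
+@code{far_away} here is hypothetical) is emitted as if the sequence
+after the comment had been written:
+
+@smallexample
+        jz      far_away
+        # ...is assembled like:
+        jnz     1f
+        jmp     far_away
+1:
+@end smallexample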
+
+Following the CPU architecture (but not a sub-architecture, which are those
+starting with a dot), you may specify @samp{jumps} or @samp{nojumps} to
+control automatic promotion of conditional jumps.  @samp{jumps} is the
+default and enables jump promotion: all external jumps will be of the long
+variety, and file-local jumps will be promoted as necessary
+(@pxref{i386-Jumps}).  @samp{nojumps} leaves external conditional jumps as
+byte offset jumps, and warns about file-local conditional jumps that
+@code{@value{AS}} promotes.
+Unconditional jumps are treated as for @samp{jumps}.
+
+For example:
+
+@smallexample
+ .arch i8086,nojumps
+@end smallexample
  
  @node i386-Notes
  @section Notes