2010-01-06 Quentin Neill <quentin.neill@amd.com>

[deliverable/binutils-gdb.git] / gas / doc / c-i386.texi
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi

index 84139db23fe3ee08eb3ea1770e1741954bf6202b..4a9f6615e4d416977db2d7ef51a3c960d956b602 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -1,4 +1,6 @@
-@c Copyright (C) 1991, 92, 93, 94, 95, 97, 1998 Free Software Foundation, Inc.
+@c Copyright 1991, 1992, 1993, 1994, 1995, 1997, 1998, 1999, 2000,
+@c 2001, 2003, 2004, 2005, 2006, 2007, 2008, 2009
+@c Free Software Foundation, Inc.
  @c This is part of the GAS manual.
  @c For copying conditions, see the file as.texinfo.
  @ifset GENERIC
@@ -12,17 +14,25 @@
  @end ifclear
  
  @cindex i386 support
-@cindex i80306 support
+@cindex i80386 support
+@cindex x86-64 support
+
+The i386 version of @code{@value{AS}} supports the original Intel 386
+architecture in both 16-bit and 32-bit modes, as well as the AMD x86-64
+architecture, which extends the Intel architecture to 64 bits.
+
  @menu
  * i386-Options::                Options
+* i386-Directives::             X86 specific directives
  * i386-Syntax::                 AT&T Syntax versus Intel Syntax
  * i386-Mnemonics::              Instruction Naming
  * i386-Regs::                   Register Naming
  * i386-Prefixes::               Instruction Prefixes
  * i386-Memory::                 Memory References
-* i386-jumps::                  Handling of Jump Instructions
+* i386-Jumps::                  Handling of Jump Instructions
  * i386-Float::                  Floating Point
  * i386-SIMD::                   Intel's MMX and AMD's 3DNow! SIMD Operations
+* i386-LWP::                    AMD's Lightweight Profiling Instructions
  * i386-16bit::                  Writing 16-bit Code
  * i386-Arch::                   Specifying an x86 CPU architecture
  * i386-Bugs::                   AT&T Syntax bugs
@@ -32,10 +42,200 @@
  @node i386-Options
  @section Options
  
-@cindex options for i386 (none)
-@cindex i386 options (none)
-The 80386 has no machine dependent options.
+@cindex options for i386
+@cindex options for x86-64
+@cindex i386 options
+@cindex x86-64 options 
+
+The i386 version of @code{@value{AS}} has a few machine
+dependent options:
+
+@table @code
+@cindex @samp{--32} option, i386
+@cindex @samp{--32} option, x86-64
+@cindex @samp{--64} option, i386
+@cindex @samp{--64} option, x86-64
+@item --32 | --64
+Select the word size, either 32 bits or 64 bits. Selecting 32-bit
+implies Intel i386 architecture, while 64-bit implies AMD x86-64
+architecture.
+
+These options are only available with the ELF object file format, and
+require that the necessary BFD support has been included (on a 32-bit
+platform you have to add @option{--enable-64-bit-bfd} to configure to
+enable 64-bit usage and use x86-64 as target platform).
+
+@item -n
+By default, x86 GAS replaces multiple nop instructions used for
+alignment within code sections with multi-byte nop instructions such
+as leal 0(%esi,1),%esi.  This switch disables the optimization.
+
+@cindex @samp{--divide} option, i386
+@item --divide
+On SVR4-derived platforms, the character @samp{/} is treated as a comment
+character, which means that it cannot be used in expressions.  The
+@samp{--divide} option turns @samp{/} into a normal character.  This does
+not disable @samp{/} at the beginning of a line starting a comment, or
+affect using @samp{#} for starting a comment.
+
+@cindex @samp{-march=} option, i386
+@cindex @samp{-march=} option, x86-64
+@item -march=@var{CPU}[+@var{EXTENSION}@dots{}]
+This option specifies the target processor.  The assembler will
+issue an error message if an attempt is made to assemble an instruction
+which will not execute on the target processor.  The following
+processor names are recognized: 
+@code{i8086},
+@code{i186},
+@code{i286},
+@code{i386},
+@code{i486},
+@code{i586},
+@code{i686},
+@code{pentium},
+@code{pentiumpro},
+@code{pentiumii},
+@code{pentiumiii},
+@code{pentium4},
+@code{prescott},
+@code{nocona},
+@code{core},
+@code{core2},
+@code{corei7},
+@code{l1om},
+@code{k6},
+@code{k6_2},
+@code{athlon},
+@code{opteron},
+@code{k8},
+@code{amdfam10},
+@code{amdfam15},
+@code{generic32} and
+@code{generic64}.
+
+In addition to the basic instruction set, the assembler can be told to 
+accept various extension mnemonics.  For example,
+@code{-march=i686+sse4+vmx} extends @var{i686} with @var{sse4} and
+@var{vmx}.  The following extensions are currently supported:
+@code{8087},
+@code{287},
+@code{387},
+@code{no87},
+@code{mmx},
+@code{nommx},
+@code{sse},
+@code{sse2},
+@code{sse3},
+@code{ssse3},
+@code{sse4.1},
+@code{sse4.2},
+@code{sse4},
+@code{nosse},
+@code{avx},
+@code{noavx},
+@code{vmx},
+@code{smx},
+@code{xsave},
+@code{aes},
+@code{pclmul},
+@code{fma},
+@code{movbe},
+@code{ept},
+@code{clflush},
+@code{lwp},
+@code{fma4},
+@code{xop},
+@code{syscall},
+@code{rdtscp},
+@code{3dnow},
+@code{3dnowa},
+@code{sse4a},
+@code{sse5},
+@code{svme},
+@code{abm} and
+@code{padlock}.
+Note that rather than extending a basic instruction set, the extension
+mnemonics starting with @code{no} revoke the respective functionality.
+
+When the @code{.arch} directive is used with @option{-march}, the
+@code{.arch} directive will take precedence.
+
+@cindex @samp{-mtune=} option, i386
+@cindex @samp{-mtune=} option, x86-64
+@item -mtune=@var{CPU}
+This option specifies a processor to optimize for. When used in
+conjunction with the @option{-march} option, only instructions
+of the processor specified by the @option{-march} option will be
+generated.
+
+Valid @var{CPU} values are identical to the processor list of
+@option{-march=@var{CPU}}.
+
+@cindex @samp{-msse2avx} option, i386
+@cindex @samp{-msse2avx} option, x86-64
+@item -msse2avx
+This option specifies that the assembler should encode SSE instructions
+with VEX prefix.
+
+@cindex @samp{-msse-check=} option, i386
+@cindex @samp{-msse-check=} option, x86-64
+@item -msse-check=@var{none}
+@item -msse-check=@var{warning}
+@item -msse-check=@var{error}
+These options control whether the assembler checks SSE instructions.
+@option{-msse-check=@var{none}} makes the assembler not check SSE
+instructions, which is the default.  @option{-msse-check=@var{warning}}
+makes the assembler issue a warning for any SSE instruction.
+@option{-msse-check=@var{error}} makes the assembler issue an error
+for any SSE instruction.
+
+@cindex @samp{-mmnemonic=} option, i386
+@cindex @samp{-mmnemonic=} option, x86-64
+@item -mmnemonic=@var{att}
+@item -mmnemonic=@var{intel}
+This option specifies the instruction mnemonic used for matching
+instructions.  The @code{.att_mnemonic} and @code{.intel_mnemonic}
+directives will take precedence.
+
+@cindex @samp{-msyntax=} option, i386
+@cindex @samp{-msyntax=} option, x86-64
+@item -msyntax=@var{att}
+@item -msyntax=@var{intel}
+This option specifies the instruction syntax used when processing
+instructions.  The @code{.att_syntax} and @code{.intel_syntax}
+directives will take precedence.
+
+@cindex @samp{-mnaked-reg} option, i386
+@cindex @samp{-mnaked-reg} option, x86-64
+@item -mnaked-reg
+This option specifies that registers don't require a @samp{%} prefix.
+The @code{.att_syntax} and @code{.intel_syntax} directives will take precedence.
+
+@end table
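+
+As a rough sketch of how these options combine (the file names
+@file{foo.s} and @file{foo.o} are hypothetical, and @samp{--32} assumes
+an ELF target as noted above), an invocation might look like:
+
+@smallexample
+@value{AS} --32 -march=i686+sse4+vmx -mtune=core2 -o foo.o foo.s
+@end smallexample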
+
+@node i386-Directives
+@section x86 specific Directives
+
+@cindex machine directives, x86
+@cindex x86 machine directives
+@table @code
+
+@cindex @code{lcomm} directive, COFF
+@item .lcomm @var{symbol} , @var{length}[, @var{alignment}]
+Reserve @var{length} (an absolute expression) bytes for a local common
+denoted by @var{symbol}.  The section and value of @var{symbol} are
+those of the new local common.  The addresses are allocated in the bss
+section, so that at run-time the bytes start off zeroed.  Since
+@var{symbol} is not declared global, it is normally not visible to
+@code{@value{LD}}.  The optional third parameter, @var{alignment},
+specifies the desired alignment of the symbol in the bss section.
+
+This directive is only available for COFF based x86 targets.
  
+@c FIXME: Document other x86 specific directives ?  Eg: .code16gcc,
+@c .largecomm
+
+@end table
  
  @node i386-Syntax
  @section AT&T Syntax versus Intel Syntax
@@ -46,6 +246,12 @@ The 80386 has no machine dependent options.
  @cindex att_syntax pseudo op, i386
  @cindex i386 syntax compatibility
  @cindex syntax compatibility, i386
+@cindex x86-64 intel_syntax pseudo op
+@cindex intel_syntax pseudo op, x86-64
+@cindex x86-64 att_syntax pseudo op
+@cindex att_syntax pseudo op, x86-64
+@cindex x86-64 syntax compatibility
+@cindex syntax compatibility, x86-64
  
  @code{@value{AS}} now supports assembly using Intel assembler syntax.
  @code{.intel_syntax} selects Intel mode, and @code{.att_syntax} switches
@@ -64,6 +270,14 @@ between the two syntaxes are:
  @cindex jump/call operands, i386
  @cindex i386 jump/call operands
  @cindex operand delimiters, i386
+
+@cindex immediate operands, x86-64
+@cindex x86-64 immediate operands
+@cindex register operands, x86-64
+@cindex x86-64 register operands
+@cindex jump/call operands, x86-64
+@cindex x86-64 jump/call operands
+@cindex operand delimiters, x86-64
  @itemize @bullet
  @item
  AT&T immediate operands are preceded by @samp{$}; Intel immediate
@@ -74,28 +288,39 @@ operands are prefixed by @samp{*}; they are undelimited in Intel syntax.
  
  @cindex i386 source, destination operands
  @cindex source, destination operands; i386
+@cindex x86-64 source, destination operands
+@cindex source, destination operands; x86-64
  @item
  AT&T and Intel syntax use the opposite order for source and destination
  operands.  Intel @samp{add eax, 4} is @samp{addl $4, %eax}.  The
  @samp{source, dest} convention is maintained for compatibility with
-previous Unix assemblers.  Note that instructions with more than one
-source operand, such as the @samp{enter} instruction, do @emph{not} have
-reversed order.  @ref{i386-Bugs}.
+previous Unix assemblers.  Note that @samp{bound}, @samp{invlpga}, and
+instructions with 2 immediate operands, such as the @samp{enter}
+instruction, do @emph{not} have reversed order.  @ref{i386-Bugs}.
  
  @cindex mnemonic suffixes, i386
  @cindex sizes operands, i386
  @cindex i386 size suffixes
+@cindex mnemonic suffixes, x86-64
+@cindex sizes operands, x86-64
+@cindex x86-64 size suffixes
  @item
  In AT&T syntax the size of memory operands is determined from the last
  character of the instruction mnemonic.  Mnemonic suffixes of @samp{b},
-@samp{w}, and @samp{l} specify byte (8-bit), word (16-bit), and long
-(32-bit) memory references.  Intel syntax accomplishes this by prefixing
-memory operands (@emph{not} the instruction mnemonics) with @samp{byte
-ptr}, @samp{word ptr}, and @samp{dword ptr}.  Thus, Intel @samp{mov al,
-byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T syntax.
+@samp{w}, @samp{l} and @samp{q} specify byte (8-bit), word (16-bit), long
+(32-bit) and quadruple word (64-bit) memory references.  Intel syntax accomplishes
+this by prefixing memory operands (@emph{not} the instruction mnemonics) with
+@samp{byte ptr}, @samp{word ptr}, @samp{dword ptr} and @samp{qword ptr}.  Thus,
+Intel @samp{mov al, byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T
+syntax.
+
+In 64-bit code, @samp{movabs} can be used to encode the @samp{mov}
+instruction with a 64-bit displacement or immediate operand.
  
  @cindex return instructions, i386
  @cindex i386 jump, call, return
+@cindex return instructions, x86-64
+@cindex x86-64 jump, call, return
  @item
  Immediate form long jumps and calls are
  @samp{lcall/ljmp $@var{section}, $@var{offset}} in AT&T syntax; the
@@ -107,6 +332,8 @@ is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
  
  @cindex sections, i386
  @cindex i386 sections
+@cindex sections, x86-64
+@cindex x86-64 sections
  @item
  The AT&T assembler does not provide support for multiple section
  programs.  Unix style systems expect all programs to be single sections.
@@ -117,17 +344,20 @@ programs.  Unix style systems expect all programs to be single sections.
  
  @cindex i386 instruction naming
  @cindex instruction naming, i386
+@cindex x86-64 instruction naming
+@cindex instruction naming, x86-64
+
  Instruction mnemonics are suffixed with one character modifiers which
-specify the size of operands.  The letters @samp{b}, @samp{w}, and
-@samp{l} specify byte, word, and long operands.  If no suffix is
-specified by an instruction then @code{@value{AS}} tries to fill in the
-missing suffix based on the destination register operand (the last one
-by convention).  Thus, @samp{mov %ax, %bx} is equivalent to @samp{movw
-%ax, %bx}; also, @samp{mov $1, %bx} is equivalent to @samp{movw $1,
-%bx}.  Note that this is incompatible with the AT&T Unix assembler which
-assumes that a missing mnemonic suffix implies long operand size.  (This
-incompatibility does not affect compiler output since compilers always
-explicitly specify the mnemonic suffix.)
+specify the size of operands.  The letters @samp{b}, @samp{w}, @samp{l}
+and @samp{q} specify byte, word, long and quadruple word operands.  If
+no suffix is specified by an instruction then @code{@value{AS}} tries to
+fill in the missing suffix based on the destination register operand
+(the last one by convention).  Thus, @samp{mov %ax, %bx} is equivalent
+to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
+@samp{movw $1, %bx}.  Note that this is incompatible with the AT&T Unix
+assembler which assumes that a missing mnemonic suffix implies long
+operand size.  (This incompatibility does not affect compiler output
+since compilers always explicitly specify the mnemonic suffix.)
  
  Almost all instructions have the same names in AT&T and Intel format.
  There are a few exceptions.  The sign extend and zero extend
@@ -141,10 +371,21 @@ are tacked on to this base name, the @emph{from} suffix before the
  @emph{to} suffix.  Thus, @samp{movsbl %al, %edx} is AT&T syntax for
  ``move sign extend @emph{from} %al @emph{to} %edx.''  Possible suffixes,
  thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
-and @samp{wl} (from word to long).
+@samp{wl} (from word to long), @samp{bq} (from byte to quadruple word),
+@samp{wq} (from word to quadruple word), and @samp{lq} (from long to
+quadruple word).
+
+@cindex encoding options, i386
+@cindex encoding options, x86-64
+
+Different encoding options can be specified via an optional mnemonic
+suffix.  The @samp{.s} suffix swaps the 2 register operands in the
+encoding when moving from one register to another.
  
  @cindex conversion instructions, i386
  @cindex i386 conversion instructions
+@cindex conversion instructions, x86-64
+@cindex x86-64 conversion instructions
  The Intel-syntax conversion instructions
  
  @itemize @bullet
@@ -159,23 +400,51 @@ The Intel-syntax conversion instructions
  
  @item
  @samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax},
+
+@item
+@samp{cdqe} --- sign-extend dword in @samp{%eax} to quad in @samp{%rax}
+(x86-64 only),
+
+@item
+@samp{cqo} --- sign-extend quad in @samp{%rax} to octuple in
+@samp{%rdx:%rax} (x86-64 only),
  @end itemize
  
  @noindent
-are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, and @samp{cltd} in
-AT&T naming.  @code{@value{AS}} accepts either naming for these instructions.
+are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, @samp{cltd}, @samp{cltq}, and
+@samp{cqto} in AT&T naming.  @code{@value{AS}} accepts either naming for these
+instructions.
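+
+For example, since both namings are accepted, the following two lines
+should assemble to the same instruction in 64-bit code (a minimal
+sketch):
+
+@smallexample
+        cltq            # AT&T name
+        cdqe            # Intel name
+@end smallexample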
  
  @cindex jump instructions, i386
  @cindex call instructions, i386
+@cindex jump instructions, x86-64
+@cindex call instructions, x86-64
  Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
  AT&T syntax, but are @samp{call far} and @samp{jump far} in Intel
  convention.
  
+@section AT&T Mnemonic versus Intel Mnemonic
+
+@cindex i386 mnemonic compatibility
+@cindex mnemonic compatibility, i386
+
+@code{@value{AS}} supports assembly using Intel mnemonic.
+@code{.intel_mnemonic} selects Intel mnemonic with Intel syntax, and
+@code{.att_mnemonic} switches back to the usual AT&T mnemonic with AT&T
+syntax for compatibility with the output of @code{@value{GCC}}.
+Several x87 instructions, @samp{fadd}, @samp{fdiv}, @samp{fdivp},
+@samp{fdivr}, @samp{fdivrp}, @samp{fmul}, @samp{fsub}, @samp{fsubp},
+@samp{fsubr} and @samp{fsubrp}, are implemented in the AT&T System V/386
+assembler with different mnemonics from those in the Intel IA32
+specification.  @code{@value{GCC}} generates those instructions with
+AT&T mnemonics.
+
  @node i386-Regs
  @section Register Naming
  
  @cindex i386 registers
  @cindex registers, i386
+@cindex x86-64 registers
+@cindex registers, x86-64
  Register operands are always prefixed with @samp{%}.  The 80386 registers
  consist of
  
@@ -215,6 +484,44 @@ the 2 test registers @samp{%tr6} and @samp{%tr7}.
  the 8 floating point register stack @samp{%st} or equivalently
  @samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
  @samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
+These registers are overloaded by 8 MMX registers @samp{%mm0},
+@samp{%mm1}, @samp{%mm2}, @samp{%mm3}, @samp{%mm4}, @samp{%mm5},
+@samp{%mm6} and @samp{%mm7}.
+
+@item
+the 8 SSE registers @samp{%xmm0}, @samp{%xmm1}, @samp{%xmm2},
+@samp{%xmm3}, @samp{%xmm4}, @samp{%xmm5}, @samp{%xmm6} and @samp{%xmm7}.
+@end itemize
+
+The AMD x86-64 architecture extends the register set by:
+
+@itemize @bullet
+@item
+enhancing the 8 32-bit registers to 64-bit: @samp{%rax} (the
+accumulator), @samp{%rbx}, @samp{%rcx}, @samp{%rdx}, @samp{%rdi},
+@samp{%rsi}, @samp{%rbp} (the frame pointer), @samp{%rsp} (the stack
+pointer)
+
+@item
+the 8 extended registers @samp{%r8}--@samp{%r15}.
+
+@item
+the 8 32-bit low ends of the extended registers: @samp{%r8d}--@samp{%r15d}
+
+@item
+the 8 16-bit low ends of the extended registers: @samp{%r8w}--@samp{%r15w}
+
+@item
+the 8 8-bit low ends of the extended registers: @samp{%r8b}--@samp{%r15b}
+
+@item
+the 4 8-bit registers: @samp{%sil}, @samp{%dil}, @samp{%bpl}, @samp{%spl}.
+
+@item
+the 8 debug registers: @samp{%db8}--@samp{%db15}.
+
+@item
+the 8 SSE registers: @samp{%xmm8}--@samp{%xmm15}.
  @end itemize
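+
+A brief sketch using some of these registers in 64-bit code (the
+instructions are chosen only for illustration):
+
+@smallexample
+        movq    %rax, %r8       # 64-bit move into an extended register
+        movl    %r8d, %ecx      # low 32 bits of %r8
+        movb    %sil, %r9b      # low 8 bits of %rsi into low 8 bits of %r9
+@end smallexample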
  
  @node i386-Prefixes
@@ -281,6 +588,20 @@ complete the current instruction.  This should never be needed for the
  The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
  to string instructions to make them repeat @samp{%ecx} times (@samp{%cx}
  times if the current address size is 16-bits).
+@cindex REX prefixes, i386
+@item
+The @samp{rex} family of prefixes is used by x86-64 to encode
+extensions to the i386 instruction set.  The @samp{rex} prefix has four
+bits --- an operand size override (@code{64}) used to change operand size
+from 32-bit to 64-bit and X, Y and Z extension bits used to extend the
+register set.
+
+You may write the @samp{rex} prefixes directly.  The @samp{rex64xyz}
+instruction emits a @samp{rex} prefix with all the bits set.  By omitting
+the @code{64}, @code{x}, @code{y} or @code{z} you may write other
+prefixes as well.  Normally, there is no need to write the prefixes
+explicitly, since gas will automatically generate them based on the
+instruction operands.
  @end itemize
  
  @node i386-Memory
@@ -288,6 +609,8 @@ times if the current address size is 16-bits).
  
  @cindex i386 memory references
  @cindex memory references, i386
+@cindex x86-64 memory references
+@cindex memory references, x86-64
  An Intel syntax indirect memory reference of the form
  
  @smallexample
@@ -344,22 +667,42 @@ prefixed with @samp{*}.  If no @samp{*} is specified, @code{@value{AS}}
  always chooses PC relative addressing for jump/call labels.
  
  Any instruction that has a memory operand, but no register operand,
-@emph{must} specify its size (byte, word, or long) with an instruction
-mnemonic suffix (@samp{b}, @samp{w}, or @samp{l}, respectively).
+@emph{must} specify its size (byte, word, long, or quadruple) with an
+instruction mnemonic suffix (@samp{b}, @samp{w}, @samp{l} or @samp{q},
+respectively).
+
+The x86-64 architecture adds RIP-relative (instruction pointer relative)
+addressing.  This addressing mode is specified by using @samp{rip} as a
+base register.  Only constant offsets are valid.  For example:
  
-@node i386-jumps
+@table @asis
+@item AT&T: @samp{1234(%rip)}, Intel: @samp{[rip + 1234]}
+Points to the address 1234 bytes past the end of the current
+instruction.
+
+@item AT&T: @samp{symbol(%rip)}, Intel: @samp{[rip + symbol]}
+Points to @code{symbol} in an RIP-relative way; this is shorter than
+the default absolute addressing.
+@end table
+
+Other addressing modes remain unchanged in x86-64 architecture, except
+registers used are 64-bit instead of 32-bit.
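+
+A minimal sketch of RIP-relative addressing in AT&T syntax (the label
+@samp{counter} is hypothetical):
+
+@smallexample
+        movl    counter(%rip), %eax     # load the 32-bit value at counter
+        leaq    counter(%rip), %rdi     # compute the address of counter
+@end smallexample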
+
+@node i386-Jumps
  @section Handling of Jump Instructions
  
  @cindex jump optimization, i386
  @cindex i386 jump optimization
+@cindex jump optimization, x86-64
+@cindex x86-64 jump optimization
  Jump instructions are always optimized to use the smallest possible
  displacements.  This is accomplished by using byte (8-bit) displacement
  jumps whenever the target is sufficiently close.  If a byte displacement
-is insufficient a long (32-bit) displacement is used.  We do not support
+is insufficient a long displacement is used.  We do not support
  word (16-bit) displacement jumps in 32-bit mode (i.e. prefixing the jump
  instruction with the @samp{data16} instruction prefix), since the 80386
  insists upon masking @samp{%eip} to 16 bits after the word displacement
-is added.
+is added.  (See also @ref{i386-Arch}.)
  
  Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
  @samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in byte
@@ -380,6 +723,8 @@ cx_nonzero:
  
  @cindex i386 floating point
  @cindex floating point, i386
+@cindex x86-64 floating point
+@cindex floating point, x86-64
  All 80387 floating point types except packed BCD are supported.
  (BCD support may be added without much difficulty).  These data
  types are 16-, 32-, and 64- bit integers, and single (32-bit),
@@ -392,6 +737,10 @@ data type.  Constructors build these data types into memory.
  @cindex @code{single} directive, i386
  @cindex @code{double} directive, i386
  @cindex @code{tfloat} directive, i386
+@cindex @code{float} directive, x86-64
+@cindex @code{single} directive, x86-64
+@cindex @code{double} directive, x86-64
+@cindex @code{tfloat} directive, x86-64
  @itemize @bullet
  @item
  Floating point constructors are @samp{.float} or @samp{.single},
@@ -405,6 +754,10 @@ top) and @samp{fstpt} (store 80-bit real and pop stack) instructions.
  @cindex @code{long} directive, i386
  @cindex @code{int} directive, i386
  @cindex @code{quad} directive, i386
+@cindex @code{word} directive, x86-64
+@cindex @code{long} directive, x86-64
+@cindex @code{int} directive, x86-64
+@cindex @code{quad} directive, x86-64
  @item
  Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
  @samp{.quad} for the 16-, 32-, and 64-bit integer formats.  The
@@ -428,11 +781,14 @@ then stores the result in the 4 byte location @samp{mem})
  @cindex MMX, i386
  @cindex 3DNow!, i386
  @cindex SIMD, i386
+@cindex MMX, x86-64
+@cindex 3DNow!, x86-64
+@cindex SIMD, x86-64
  
  @code{@value{AS}} supports Intel's MMX instruction set (SIMD
  instructions for integer data), available on Intel's Pentium MMX
  processors and Pentium II processors, AMD's K6 and K6-2 processors,
-Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!
+Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!@:
  instruction set (SIMD instructions for 32-bit floating point data)
  available on AMD's K6-2 processor and possibly others in the future.
  
@@ -448,6 +804,25 @@ as the floating point stack.
  See Intel and AMD documentation, keeping in mind that the operand order in
  instructions is reversed from the Intel syntax.
  
+@node i386-LWP
+@section AMD's Lightweight Profiling Instructions
+
+@cindex LWP, i386
+@cindex LWP, x86-64
+
+@code{@value{AS}} supports AMD's Lightweight Profiling (LWP)
+instruction set, available on AMD's Family 15h (Orochi) processors.
+
+LWP enables applications to collect and manage performance data, and
+react to performance events.  The collection of performance data
+requires no context switches.  LWP runs in the context of a thread and
+so several counters can be used independently across multiple threads.
+LWP can be used in both 64-bit and legacy 32-bit modes.
+
+For detailed information on the LWP instruction set, see the
+@cite{AMD Lightweight Profiling Specification} available at
+@uref{http://developer.amd.com/cpu/LWP,Lightweight Profiling Specification}.
+
  @node i386-16bit
  @section Writing 16-bit Code
  
@@ -457,12 +832,16 @@ instructions is reversed from the Intel syntax.
  @cindex @code{code16gcc} directive, i386
  @cindex @code{code16} directive, i386
  @cindex @code{code32} directive, i386
-While @code{@value{AS}} normally writes only ``pure'' 32-bit i386 code,
+@cindex @code{code64} directive, i386
+@cindex @code{code64} directive, x86-64
+While @code{@value{AS}} normally writes only ``pure'' 32-bit i386 code
+or 64-bit x86-64 code depending on the default configuration,
  it also supports writing code to run in real mode or in 16-bit protected
  mode code segments.  To do this, put a @samp{.code16} or
  @samp{.code16gcc} directive before the assembly language instructions to
-be run in 16-bit mode.  You can switch @code{@value{AS}} back to writing
-normal 32-bit code with the @samp{.code32} directive.
+be run in 16-bit mode.  You can switch @code{@value{AS}} to writing
+32-bit code with the @samp{.code32} directive or 64-bit code with the
+@samp{.code64} directive.
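+
+For instance, a sketch that switches modes within one file (the
+instructions are chosen only for illustration):
+
+@smallexample
+        .code16
+        movw    %ax, %bx        # assembled for a 16-bit code segment
+        .code32
+        movl    %eax, %ebx      # assembled for a 32-bit code segment
+@end smallexample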
  
  @samp{.code16gcc} provides experimental support for generating 16-bit
  code from gcc, and differs from @samp{.code16} in that @samp{call},
@@ -492,7 +871,7 @@ value @samp{4} onto the stack, decrementing @samp{%esp} by 2.
  @end smallexample
  
  The same code in a 16-bit code section would generate the machine
-opcode bytes @samp{6a 04} (ie. without the operand size prefix), which
+opcode bytes @samp{6a 04} (i.e., without the operand size prefix), which
  is correct since the processor default operand size is assumed to be 16
  bits in a 16-bit code section.
  
@@ -522,25 +901,62 @@ register is @samp{%st(i)}.
  
  @cindex arch directive, i386
  @cindex i386 arch directive
+@cindex arch directive, x86-64
+@cindex x86-64 arch directive
  
  @code{@value{AS}} may be told to assemble for a particular CPU
-architecture with the @code{.arch @var{cpu_type}} directive.  This
+(sub-)architecture with the @code{.arch @var{cpu_type}} directive.  This
  directive enables a warning when gas detects an instruction that is not
  supported on the CPU specified.  The choices for @var{cpu_type} are:
  
  @multitable @columnfractions .20 .20 .20 .20
  @item @samp{i8086} @tab @samp{i186} @tab @samp{i286} @tab @samp{i386}
  @item @samp{i486} @tab @samp{i586} @tab @samp{i686} @tab @samp{pentium}
-@item @samp{pentiumpro} @tab @samp{k6} @tab @samp{athlon}
+@item @samp{pentiumpro} @tab @samp{pentiumii} @tab @samp{pentiumiii} @tab @samp{pentium4}
+@item @samp{prescott} @tab @samp{nocona} @tab @samp{core} @tab @samp{core2}
+@item @samp{corei7} @tab @samp{l1om}
+@item @samp{k6} @tab @samp{k6_2} @tab @samp{athlon} @tab @samp{k8}
+@item @samp{amdfam10} @tab @samp{amdfam15}
+@item @samp{generic32} @tab @samp{generic64}
+@item @samp{.mmx} @tab @samp{.sse} @tab @samp{.sse2} @tab @samp{.sse3}
+@item @samp{.ssse3} @tab @samp{.sse4.1} @tab @samp{.sse4.2} @tab @samp{.sse4}
+@item @samp{.avx} @tab @samp{.vmx} @tab @samp{.smx} @tab @samp{.xsave}
+@item @samp{.aes} @tab @samp{.pclmul} @tab @samp{.fma} @tab @samp{.movbe}
+@item @samp{.ept} @tab @samp{.clflush}
+@item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
+@item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme} @tab @samp{.abm}
+@item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop}
+@item @samp{.padlock}
  @end multitable
  
-Apart from the warning, there is only one other effect on
-@code{@value{AS}} operation;  If you specify a CPU other than
+Apart from the warning, there are only two other effects on
+@code{@value{AS}} operation.  Firstly, if you specify a CPU other than
  @samp{i486}, then shift by one instructions such as @samp{sarl $1, %eax}
  will automatically use a two byte opcode sequence.  The larger three
  byte opcode sequence is used on the 486 (and when no architecture is
  specified) because it executes faster on the 486.  Note that you can
  explicitly request the two byte opcode by writing @samp{sarl %eax}.
+Secondly, if you specify @samp{i8086}, @samp{i186}, or @samp{i286},
+@emph{and} @samp{.code16} or @samp{.code16gcc} then byte offset
+conditional jumps will be promoted when necessary to a two instruction
+sequence consisting of a conditional jump of the opposite sense around
+an unconditional jump to the target.
+
+Following the CPU architecture (but not a sub-architecture, that is, one
+starting with a dot), you may specify @samp{jumps} or @samp{nojumps} to
+control automatic promotion of conditional jumps.  @samp{jumps} is the
+default, and enables jump promotion: all external jumps will be of the long
+variety, and file-local jumps will be promoted as necessary
+(@pxref{i386-Jumps}).  @samp{nojumps} leaves external conditional jumps as
+byte offset jumps, and warns about file-local conditional jumps that
+@code{@value{AS}} promotes.
+Unconditional jumps are treated as for @samp{jumps}.
+
+For example:
+
+@smallexample
+ .arch i8086,nojumps
+@end smallexample
  
  @node i386-Notes
  @section Notes
@@ -548,8 +964,10 @@ explicitly request the two byte opcode by writing @samp{sarl %eax}.
  @cindex i386 @code{mul}, @code{imul} instructions
  @cindex @code{mul} instruction, i386
  @cindex @code{imul} instruction, i386
+@cindex @code{mul} instruction, x86-64
+@cindex @code{imul} instruction, x86-64
  There is some trickery concerning the @samp{mul} and @samp{imul}
-instructions that deserves mention.  The 16-, 32-, and 64-bit expanding
+instructions that deserves mention.  The 16-, 32-, 64- and 128-bit expanding
  multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
  for @samp{imul}) can be output only in the one operand form.  Thus,
  @samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;