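GNU Assembler Examples
This article is a translation of "GNU Assembler Examples".
The GNU Assembler, commonly known as gas, is the default assembler of the GNU operating system. It works on many different architectures and supports several assembly language syntaxes. The examples below apply only to Linux running on x86-64.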
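Contents:
- Getting Started
- Working with the C Library
- Calling Conventions for 64-bit C Code
- Mixing C and Assembly Language
- Command Line Arguments
- Floating Point Instructions
- Data Sections
- Recursion
- SIMD Parallelism
- Saturated Arithmetic
- Local Variables and Stack Frames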
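Getting Started
Here is the classic Hello World program. It uses the Linux system calls write and exit and targets 64-bit systems (the instructions differ from those used on 32-bit systems):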
# ----------------------------------------------------------------------------------------
# Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
# gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
# gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------
.global _start
.text
_start:
# write(1, message, 13)
mov $1, %rax # system call 1 is write
mov $1, %rdi # file handle 1 is stdout
mov $message, %rsi # address of string to output
mov $13, %rdx # number of bytes
syscall # invoke operating system to do the write
# exit(0)
mov $60, %rax # system call 60 is exit
xor %rdi, %rdi # we want return code 0
syscall # invoke operating system to exit
message:
.ascii "Hello, World\n"
Assemble, link, and run:
$ gcc -c hello.s && ld hello.o && ./a.out
Hello, World
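If you are using macOS or Windows, the system call numbers and the registers used will probably be different (gcc and as support both platforms too). Also, if your gcc builds position-independent executables by default, linking the examples in this article that load absolute addresses (such as mov $message, %rsi) may fail with a relocation error; adding -no-pie to the gcc command line is the usual workaround.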
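For readers more at home in C, the write system call above can be sketched with the C library's generic syscall() wrapper. This is only an illustration, assuming Linux on x86-64 with glibc; the function name say_hello is made up for this sketch, and the exit call (SYS_exit, number 60) is left out so the function can return to its caller.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issues the same write system call as the assembly
   version. Returns the number of bytes written, or -1 on error. */
long say_hello(void) {
    static const char message[] = "Hello, World\n";
    /* write(1, message, 13): on x86-64 Linux SYS_write is 1 and fd 1 is stdout */
    return syscall(SYS_write, 1, message, sizeof message - 1);
}
```

Calling say_hello() from main prints the greeting and, when the write succeeds in full, returns 13, the byte count that the assembly version loads into %rdx.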
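Working with the C Library
Most of the time you will want to use a C library. Here is Hello World again, this time calling the C library function puts: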
# ----------------------------------------------------------------------------------------
# Writes "Hola, mundo" to the console using a C library. Runs on Linux or any other system
# that does not use underscores for symbols in its C library. To assemble and run:
#
# filename: hola.s
# gcc hola.s && ./a.out
# ----------------------------------------------------------------------------------------
.global main
.text
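main: # This is called by the C library's startup code
sub $8, %rsp # realign %rsp to 16 bytes before making a call
mov $message, %rdi # First integer (or pointer) parameter in %rdi
call puts # puts(message)
add $8, %rsp # undo the alignment adjustment
ret # Return to C library code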
message:
.asciz "Hola, mundo"
Run it:
$ gcc hola.s && ./a.out
Hola, mundo
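Calling Conventions for 64-bit C Code
The 64-bit calling conventions are somewhat detailed and are explained fully in the AMD64 ABI reference. You can also find information on them on Wikipedia. The most important points, again for 64-bit Linux and not Windows, are:
- Parameters are passed in registers where possible, assigned from left to right. The order in which registers are allocated is:
  - For integers and pointers: rdi, rsi, rdx, rcx, r8, r9.
  - For floating-point values (float, double): xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7.
- Additional parameters are pushed on the stack, right to left, and are removed by the caller after the call.
- After the parameters are pushed, the call instruction is made, so when the called function gets control, the return address is at (%rsp), the first memory parameter is at 8(%rsp), and so on.
- The stack pointer %rsp must be aligned to a 16-byte boundary before making a call. The call itself then pushes the 8-byte return address, so when the called function gets control, %rsp is not aligned. You have to make the extra space yourself, by pushing something or by subtracting 8 from %rsp.
- The only registers that the called function is required to preserve (the callee-saved registers) are rbp, rbx, r12, r13, r14, and r15. All other registers are free to be changed by the called function.
- The called function is also supposed to preserve the control bits of MXCSR and the x87 control word, but x87 instructions are rare in 64-bit code, so you probably do not have to worry about this.
- Integers are returned in rax or rdx:rax, and floating-point values are returned in xmm0 or xmm1:xmm0.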
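The alignment rule is the one that trips people up most often, so here is the arithmetic as a small C sketch. The helper name aligned_at_call and its parameters are invented for illustration; it assumes the function was itself entered through a call made with %rsp 16-byte aligned, as the ABI requires.

```c
#include <stdbool.h>

/* Hypothetical helper: given how many 8-byte registers a function has pushed
   and how many extra bytes it has subtracted from %rsp since entry, report
   whether %rsp is 16-byte aligned at its next call instruction. */
bool aligned_at_call(unsigned pushes, unsigned extra_bytes) {
    unsigned used = 8             /* return address pushed by the call that got us here */
                  + 8 * pushes    /* each push moves %rsp down by 8 */
                  + extra_bytes;  /* e.g. sub $8, %rsp */
    return used % 16 == 0;
}
```

Under these assumptions aligned_at_call(0, 0) is false (a function that has touched nothing must adjust %rsp before calling anything), while aligned_at_call(1, 0), aligned_at_call(3, 0), and aligned_at_call(2, 8) are all true.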
This program prints the first few Fibonacci numbers and illustrates how to save and restore registers:
# -----------------------------------------------------------------------------
# A 64-bit Linux application that writes the first 90 Fibonacci numbers. It
# needs to be linked with a C library.
#
# Assemble and Link:
# gcc fib.s
# -----------------------------------------------------------------------------
.global main
.text
main:
push %rbx # we have to save this since we use it
mov $90, %ecx # ecx will countdown to 0
xor %rax, %rax # rax will hold the current number
xor %rbx, %rbx # rbx will hold the next number
inc %rbx # rbx is originally 1
print:
# We need to call printf, but we are using eax, ebx, and ecx. printf
# may destroy eax and ecx so we will save these before the call and
# restore them afterwards.
push %rax # caller-save register
push %rcx # caller-save register
mov $format, %rdi # set 1st parameter (format)
mov %rax, %rsi # set 2nd parameter (current_number)
xor %rax, %rax # because printf is varargs
# Stack is already aligned because we pushed three 8 byte registers
call printf # printf(format, current_number)
pop %rcx # restore caller-save register
pop %rax # restore caller-save register
mov %rax, %rdx # save the current number
mov %rbx, %rax # next number is now current
add %rdx, %rbx # get the new next number
dec %ecx # count down
jnz print # if not done counting, do some more
pop %rbx # restore rbx before returning
ret
format:
.asciz "%20ld\n"
Run it:
$ gcc fib.s && ./a.out
0
1
1
2
3
...
420196140727489673
679891637638612258
1100087778366101931
1779979416004714189
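The register roles in the loop may be easier to follow next to a C rendering of the same computation. The sketch below is not part of the original program: the function name last_fib_printed is hypothetical, and it returns the final value instead of calling printf.

```c
#include <stdint.h>

/* Hypothetical C rendering of the assembly loop: returns the last value the
   loop would print when asked for `count` numbers (count must be at least 1). */
uint64_t last_fib_printed(unsigned count) {
    uint64_t current = 0;   /* plays the role of rax */
    uint64_t next = 1;      /* plays the role of rbx */
    uint64_t printed = 0;
    while (count-- > 0) {   /* ecx counts down to 0 */
        printed = current;          /* the value handed to printf */
        uint64_t saved = current;   /* mov %rax, %rdx */
        current = next;             /* mov %rbx, %rax */
        next += saved;              /* add %rdx, %rbx */
    }
    return printed;
}
```

With count set to 90 the function returns 1779979416004714189, the last line of the output shown above.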
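Mixing C and Assembly Language
This 64-bit example is a very simple function that takes three 64-bit integer parameters and returns the maximum value. It illustrates how integer parameters are received: the caller passes them in registers, so on entry to the function they are in rdi, rsi, and rdx, respectively. The return value is an integer, so it goes in rax.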
# -----------------------------------------------------------------------------
# A 64-bit function that returns the maximum value of its three 64-bit integer
# arguments. The function has signature:
#
# int64_t maxofthree(int64_t x, int64_t y, int64_t z)
#
# Note that the parameters have already been passed in rdi, rsi, and rdx. We
# just have to return the value in rax.
# -----------------------------------------------------------------------------
.globl maxofthree
.text
maxofthree:
mov %rdi, %rax # result (rax) initially holds x
cmp %rsi, %rax # is x less than y?
cmovl %rsi, %rax # if so, set result to y
cmp %rdx, %rax # is max(x,y) less than z?
cmovl %rdx, %rax # if so, set result to z
ret
Here is a C program that calls the assembly language function:
/*
* callmaxofthree.c
*
* A small program that illustrates how to call the maxofthree function we wrote in
* assembly language.
*/
#include <stdio.h>
#include <inttypes.h>
int64_t maxofthree(int64_t, int64_t, int64_t);
int main() {
printf("%ld\n", maxofthree(1, -4, -7));
printf("%ld\n", maxofthree(2, -6, 1));
printf("%ld\n", maxofthree(2, 3, 1));
printf("%ld\n", maxofthree(-2, 4, 3));
printf("%ld\n", maxofthree(2, -6, 5));
printf("%ld\n", maxofthree(2, 4, 6));
return 0;
}
To assemble, link, and run the two files together:
$ gcc -std=c99 callmaxofthree.c maxofthree.s && ./a.out
1
2
3
4
5
6
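Command Line Arguments
You know that in C, main is just a plain old function, and it has a couple of parameters of its own:
int main(int argc, char** argv)
Here is a program that uses this fact to simply echo its command line arguments, one per line: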
# -----------------------------------------------------------------------------
# A 64-bit program that displays its commandline arguments, one per line.
#
# On entry, %rdi will contain argc and %rsi will contain argv.
# -----------------------------------------------------------------------------
.global main
.text
main:
push %rdi # save registers that puts uses
push %rsi
sub $8, %rsp # must align stack before call
mov (%rsi), %rdi # the argument string to display
call puts # print it
add $8, %rsp # restore %rsp to pre-aligned value
pop %rsi # restore registers puts used
pop %rdi
add $8, %rsi # point to next argument
dec %rdi # count down
jnz main # if not done counting keep going
ret
format:
.asciz "%s\n"
Sample runs:
$ gcc echo.s && ./a.out 25782 dog huh $$
./a.out
25782
dog
huh
9971
$ gcc echo.s && ./a.out 25782 dog huh '$$'
./a.out
25782
dog
huh
$$
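Note that as far as the C library is concerned, command line arguments are always strings. If you want to treat them as integers, call atoi. Here is a small program that computes x to the power y. Another feature of this example is that it shows how to restrict values to 32 bits.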
# -----------------------------------------------------------------------------
# A 64-bit command line application to compute x^y.
#
# Syntax: power x y
# x and y are integers
# -----------------------------------------------------------------------------
.global main
.text
main:
push %r12 # save callee-save registers
push %r13
push %r14
# By pushing 3 registers our stack is already aligned for calls
cmp $3, %rdi # must have exactly two arguments
jne error1
mov %rsi, %r12 # argv
# We will use r13d to count down from the exponent to zero, r14d to hold the
# value of the base, and eax to hold the running product.
mov 16(%r12), %rdi # argv[2]
call atoi # y in eax
cmp $0, %eax # disallow negative exponents
jl error2
mov %eax, %r13d # y in r13d
mov 8(%r12), %rdi # argv[1]
call atoi # x in eax
mov %eax, %r14d # x in r14d
mov $1, %eax # start with answer = 1
check:
test %r13d, %r13d # we're counting y downto 0
jz gotit # done
imul %r14d, %eax # multiply in another x
dec %r13d
jmp check
gotit: # print report on success
mov $answer, %rdi
movslq %eax, %rsi
xor %rax, %rax
call printf
jmp done
error1: # print error message
mov $badArgumentCount, %edi
call puts
jmp done
error2: # print error message
mov $negativeExponent, %edi
call puts
done: # restore saved registers
pop %r14
pop %r13
pop %r12
ret
answer:
.asciz "%d\n"
badArgumentCount:
.asciz "Requires exactly two arguments" # puts appends the newline
negativeExponent:
.asciz "The exponent may not be negative" # puts appends the newline
Sample runs:
$ ./power 2 19
524288
$ ./power 3 -8
The exponent may not be negative
$ ./power 1 500
1
Exercise: rewrite this example to use 64-bit integers. You will need to replace atoi with strtol.
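As a starting point for the exercise, here is one way the 64-bit logic could look in C before translating it to assembly. It is a sketch only: parse_and_power is a hypothetical name, input validation is minimal, and overflow of the 64-bit result is not detected.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses base and exponent with strtol and stores
   base^exponent in *result. Returns 0 on success, -1 if the exponent is
   negative. */
int parse_and_power(const char *base_text, const char *exp_text, int64_t *result) {
    long exponent = strtol(exp_text, NULL, 10);
    if (exponent < 0) {
        return -1;                  /* mirrors the error2 branch */
    }
    int64_t base = strtol(base_text, NULL, 10);
    uint64_t product = 1;           /* unsigned so that wraparound is well defined */
    while (exponent-- > 0) {
        product *= (uint64_t)base;  /* multiply in another base */
    }
    *result = (int64_t)product;
    return 0;
}
```

In the assembly version the same structure applies, except that the running product and the base live in full 64-bit registers (for example rax and r14) and the strtol result is taken from rax instead of eax.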
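Floating Point Instructions
Floating-point arguments go in the xmm registers. Here is a simple function that sums the values in an array of doubles: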
# -----------------------------------------------------------------------------
# A 64-bit function that returns the sum of the elements in a floating-point
# array. The function has prototype:
#
# double sum(double array[], unsigned length)
# -----------------------------------------------------------------------------
.global sum
.text
sum:
xorpd %xmm0, %xmm0 # initialize the sum to 0
test %esi, %esi # special case for length = 0 (length is only 32 bits wide)
je done
next:
addsd (%rdi), %xmm0 # add in the current array element
add $8, %rdi # move to next array element
dec %esi # count down
jnz next # if not done counting, continue
done:
ret
A C program that calls it:
/*
* callsum.c
*
* Illustrates how to call the sum function we wrote in assembly language.
*/
#include <stdio.h>
double sum(double[], unsigned);
int main() {
double test[] = {
40.5, 26.7, 21.9, 1.5, -40.5, -23.4
};
printf("%20.7f\n", sum(test, 6));
printf("%20.7f\n", sum(test, 2));
printf("%20.7f\n", sum(test, 0));
printf("%20.7f\n", sum(test, 3));
return 0;
}
Run it:
$ gcc callsum.c sum.s && ./a.out
26.7000000
67.2000000
0.0000000
89.1000000
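Data Sections
On most operating systems the text section is read-only, so writable data needs a section of its own. The data section holds only initialized data, and there is a special .bss section for uninitialized data.
The following program averages its command line arguments, which are expected to be integers, and displays the result as a floating-point number. Note the .text and .data directives, which mark the start of the text (code) section and of the data section respectively: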
# -----------------------------------------------------------------------------
# 64-bit program that treats all its command line arguments as integers and
# displays their average as a floating point number. This program uses a data
# section to store intermediate results, not that it has to, but only to
# illustrate how data sections are used.
# -----------------------------------------------------------------------------
.globl main
.text
main:
dec %rdi # argc-1, since we don't count program name
jz nothingToAverage
mov %rdi, count # save number of real arguments
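accumulate:
push %rdi # save registers across call to atoi
push %rsi
sub $8, %rsp # align stack pointer for the call
mov (%rsi,%rdi,8), %rdi # argv[rdi]
call atoi # now eax has the int value of arg
add $8, %rsp # undo the alignment adjustment
pop %rsi # restore registers after atoi call
pop %rdi
cltq # sign-extend eax into rax, since atoi returns a 32-bit int
add %rax, sum # accumulate sum as we go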
dec %rdi # count down
jnz accumulate # more arguments?
average:
cvtsi2sd sum, %xmm0
cvtsi2sd count, %xmm1
divsd %xmm1, %xmm0 # xmm0 is sum/count
mov $format, %rdi # 1st arg to printf
mov $1, %rax # printf is varargs, there is 1 non-int argument
sub $8, %rsp # align stack pointer
call printf # printf(format, sum/count)
add $8, %rsp # restore stack pointer
ret
nothingToAverage:
sub $8, %rsp # align stack pointer
mov $error, %rdi
xor %rax, %rax
call printf
add $8, %rsp # restore stack pointer
ret
.data
count: .quad 0
sum: .quad 0
format: .asciz "%g\n"
error: .asciz "There are no command line arguments to average\n"
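For comparison, a C compiler makes the same split between sections for you. The sketch below uses made-up names; the section placements mentioned in the comments describe what gcc typically does on Linux and can be checked with objdump or nm, but they are not guaranteed by the C standard.

```c
#include <stdint.h>

int64_t count = 5;             /* initialized: typically placed in .data */
int64_t total;                 /* no initializer: typically placed in .bss, starts at 0 */
const char format[] = "%g\n";  /* read-only: typically placed in .rodata */

/* Hypothetical helper: accumulates into the .bss variable and returns the
   running total divided by count, much as the assembly program does. */
double average_after_adding(int64_t value) {
    total += value;
    return (double)total / (double)count;
}
```

The assembly program could equally well have declared count and sum in a .bss section, since both start out as zero.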
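Recursion
Recursion in assembly may feel strange at first, but perhaps surprisingly, nothing out of the ordinary is required to implement recursive functions. You just have to be careful to save registers in the usual way. Here is an example.
The C version: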
uint64_t factorial(unsigned n) {
return (n <= 1) ? 1 : n * factorial(n-1);
}
The assembly version:
# ----------------------------------------------------------------------------
# A 64-bit recursive implementation of the function
#
# uint64_t factorial(unsigned n)
#
# implemented recursively
# ----------------------------------------------------------------------------
.globl factorial
.text
factorial:
mov %edi, %edi # zero-extend n: an unsigned argument only defines the low 32 bits of rdi
cmp $1, %rdi # n <= 1?
jnbe L1 # if not, go do a recursive call
mov $1, %rax # otherwise return 1
ret
L1:
push %rdi # save n on stack (also aligns %rsp!)
dec %rdi # n-1
call factorial # factorial(n-1), result goes in %rax
pop %rdi # restore n
imul %rdi, %rax # n * factorial(n-1), stored in %rax
ret
A C program that calls this recursive function:
/*
* An application that illustrates calling the factorial function defined elsewhere.
*/
#include <stdio.h>
#include <inttypes.h>
uint64_t factorial(unsigned n);
int main() {
for (unsigned i = 0; i < 20; i++) {
printf("factorial(%2u) = %lu\n", i, factorial(i));
}
}
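SIMD Parallelism
The XMM registers can do arithmetic on floating-point values one operation at a time or several at a time, depending on the instruction.
The instructions have the form:
operation xmmregister_or_memorylocation, xmmregister
For floating-point addition, the instructions are:
- addpd: do 2 double-precision additions
- addps: do 4 single-precision additions
- addsd: do just one double-precision addition, using the low 64 bits of the register
- addss: do just one single-precision addition, using the low 32 bits of the register
Exercise: write a function that processes an array of floats, four at a time.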
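One way to approach the exercise is to prototype it in C with the SSE intrinsics from xmmintrin.h, where _mm_add_ps compiles to addps. The sketch below is an illustration rather than a reference solution: sum_floats is a hypothetical name, and because the additions happen in a different order, rounding may differ slightly from a plain left-to-right sum.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics: _mm_add_ps maps to addps */

/* Hypothetical helper: sums `length` floats, adding four lanes per step. */
float sum_floats(const float *values, size_t length) {
    __m128 lanes = _mm_setzero_ps();   /* four running partial sums */
    size_t i = 0;
    for (; i + 4 <= length; i += 4) {
        /* one addps: four single-precision additions at once */
        lanes = _mm_add_ps(lanes, _mm_loadu_ps(values + i));
    }
    float partial[4];
    _mm_storeu_ps(partial, lanes);
    float total = partial[0] + partial[1] + partial[2] + partial[3];
    for (; i < length; i++) {
        total += values[i];            /* leftover elements, one at a time */
    }
    return total;
}
```

The unaligned load (_mm_loadu_ps) is used so that the caller's array does not need 16-byte alignment; an assembly version would use movups for the same reason.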
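Saturated Arithmetic
Saturated arithmetic is a kind of arithmetic in which the results of operations such as addition and multiplication are clamped to a fixed range between a minimum and a maximum value instead of wrapping around.
The XMM registers can also do arithmetic on integers, including saturated arithmetic. The instructions have the form:
operation xmmregister_or_memorylocation, xmmregister
For integer addition, the instructions are:
- paddb: do 16 byte additions
- paddw: do 8 word additions
- paddd: do 4 dword additions
- paddq: do 2 qword additions
- paddsb: do 16 byte additions with signed saturation (80..7F; the range in parentheses is the saturation range in hexadecimal, here and below)
- paddsw: do 8 word additions with signed saturation (8000..7FFF)
- paddusb: do 16 byte additions with unsigned saturation (00..FF)
- paddusw: do 8 word additions with unsigned saturation (0000..FFFF)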
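The difference between wrapping and saturating addition is easiest to see side by side. The C sketch below uses the SSE2 intrinsics from emmintrin.h that correspond to paddb, paddusb, and paddsb; the three function names are hypothetical, and each one looks only at lane 0 of the 16 byte lanes.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* paddb: plain byte addition, wraps around modulo 256 */
uint8_t add_wrapping(uint8_t a, uint8_t b) {
    uint8_t out[16];
    __m128i sum = _mm_add_epi8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b));
    _mm_storeu_si128((__m128i *)out, sum);
    return out[0];
}

/* paddusb: unsigned saturation, results are clamped to 00..FF */
uint8_t add_unsigned_saturating(uint8_t a, uint8_t b) {
    uint8_t out[16];
    __m128i sum = _mm_adds_epu8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b));
    _mm_storeu_si128((__m128i *)out, sum);
    return out[0];
}

/* paddsb: signed saturation, results are clamped to 80..7F (-128..127) */
int8_t add_signed_saturating(int8_t a, int8_t b) {
    int8_t out[16];
    __m128i sum = _mm_adds_epi8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b));
    _mm_storeu_si128((__m128i *)out, sum);
    return out[0];
}
```

For example, adding 200 and 100 gives 44 with the wrapping version but 255 with unsigned saturation, and adding 100 and 100 as signed bytes saturates at 127.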
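Local Variables and Stack Frames
First, please read Eli Bendersky's article; its overview is more complete than the brief notes here.
When a function is called, the caller first puts the parameters in the correct registers and then issues the call instruction. Additional parameters beyond those that fit in registers are pushed onto the stack before the call instruction.
The call instruction puts the return address on the top of the stack.
Consider, for example, the following function: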
long example(long x, long y) {
long a, b, c;
b = 7;
return x * b + y;
}
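On entry to example, x will be in %rdi, y will be in %rsi, and the return address will be on the top of the stack. Where can we put the local variables? The simplest choice is the stack itself, though if you have enough registers, you can use those instead.
If you are running on a machine that respects the standard ABI, you can leave %rsp where it is and access the "extra parameters" and the local variables directly relative to %rsp, for example: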
            +----------+
    rsp-24  |    a     |
            +----------+
    rsp-16  |    b     |
            +----------+
    rsp-8   |    c     |
            +----------+
    rsp     | retaddr  |
            +----------+
    rsp+8   | caller's |
            | stack    |
            | frame    |
            | ...      |
            +----------+
The function then reduces to:
.text
.globl example
example:
movq $7, -16(%rsp) # b = 7 (b is a 64-bit long stored at rsp-16)
mov %rdi, %rax
imul -16(%rsp), %rax # x * b
add %rsi, %rax
ret
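If our function were to call another function, we would have to adjust %rsp first so that the call does not overwrite the locals.
On Windows you cannot use this scheme, because if an interrupt were to occur, everything beyond the stack pointer (at lower addresses) could be overwritten. This does not happen on most other operating systems because there is a "red zone" of 128 bytes below the stack pointer that is kept safe from such things. Where there is no red zone, you can make room on the stack immediately on entering the function: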
example:
sub $24, %rsp
The stack now looks like this:
            +----------+
    rsp     |    a     |
            +----------+
    rsp+8   |    b     |
            +----------+
    rsp+16  |    c     |
            +----------+
    rsp+24  | retaddr  |
            +----------+
    rsp+32  | caller's |
            | stack    |
            | frame    |
            | ...      |
            +----------+
Here is the final function. Note that you must restore the stack pointer before returning (add $24, %rsp)!
.text
.globl example
example:
sub $24, %rsp
movq $7, 8(%rsp) # b = 7 (b is a 64-bit long stored at rsp+8)
mov %rdi, %rax
imul 8(%rsp), %rax
add %rsi, %rax
add $24, %rsp
ret