The source code is written in plain text. The processor (a 386 or better in this case) cannot execute code written as ASCII text. The assembler (as) translates the text code (somewhat human friendly) into binary code (understood by the processor).
In the simplest case, this one step should be sufficient. However, for more complex programs, the source is typically broken into multiple files for better maintenance. As a result, a single source file may not contain all the necessary components for a program.
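To make the translation from text to binary concrete, here is a minimal sketch of a toy assembler in Python. It is not how as is implemented. The function names (assemble_line, assemble) and the lookup table are hypothetical, and only a handful of instructions are handled. The byte values themselves are the real IA-32 encodings for those instructions.

```python
import struct

# Real IA-32 encodings for a few operand-less instructions. The table is a
# deliberately tiny, hypothetical stand-in for what a real assembler knows.
NO_OPERAND_OPCODES = {
    "nop": b"\x90",
    "ret": b"\xc3",
    "hlt": b"\xf4",
}

def assemble_line(line):
    """Translate one line of (very restricted) AT&T-syntax text into bytes."""
    text = line.split("#", 1)[0].strip()   # drop the comment and whitespace
    if not text:
        return b""
    if text in NO_OPERAND_OPCODES:
        return NO_OPERAND_OPCODES[text]
    # movl $imm32, %eax is encoded as opcode 0xB8 followed by the 32-bit
    # immediate, least significant byte first (little endian).
    if text.startswith("movl $") and text.endswith(", %eax"):
        immediate = int(text[len("movl $"):-len(", %eax")], 0)
        return b"\xb8" + struct.pack("<I", immediate & 0xFFFFFFFF)
    raise ValueError(f"toy assembler does not understand: {text!r}")

def assemble(source):
    """Assemble every line of the source text and join the resulting bytes."""
    return b"".join(assemble_line(line) for line in source.splitlines())

machine_code = assemble("""
    movl $1, %eax   # load the constant 1 into eax
    nop
    ret
""")
print(machine_code.hex())
```

The sketch only shows the core idea: each line of text maps to a short, fixed sequence of bytes. A real assembler also handles labels, directives, and many more instruction forms.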
The following command
as --gstab -o test.o test.s
outputs an object file, test.o. An object file contains instructions in binary code, but it also contains other kinds of information. If the source code has unresolved symbolic references, the object file has a non-empty section that lists all of them. Each entry includes the symbolic name as well as the offsets in the code segment that refer to that name.
Each object file also contains other bookkeeping information. As mentioned before, the --gstab option means that the generated object file includes extra debugging information. This information is represented by a table that maps addresses in the object file to line numbers in the source file.
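The pieces of an object file described above can be sketched as a small Python data structure. This is only a conceptual model, not the actual on-disk format. The class name ObjectFile, its field names, the symbol print_message, and the line numbers are all hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    code: bytes                                      # the binary instructions
    defined: dict = field(default_factory=dict)      # symbol name -> offset in code
    unresolved: dict = field(default_factory=dict)   # symbol name -> offsets that refer to it
    line_table: list = field(default_factory=list)   # sorted (code offset, source line) pairs

    def source_line(self, offset):
        """Return the source line whose instruction covers the given code offset."""
        starts = [start for start, _ in self.line_table]
        index = bisect_right(starts, offset) - 1
        if index < 0:
            raise ValueError("offset precedes the first recorded instruction")
        return self.line_table[index][1]

# Hypothetical contents for test.o: a 5-byte call (opcode 0xE8) whose 4-byte
# operand at offset 1 cannot be filled in yet, followed by a 1-byte ret.
test_o = ObjectFile(
    code=bytes.fromhex("e800000000c3"),
    defined={"_start": 0},
    unresolved={"print_message": [1]},
    line_table=[(0, 7), (5, 8)],
)

print(test_o.unresolved)       # names that still need a definition, and where
print(test_o.source_line(3))   # offset 3 lies inside the call instruction
```

A debugger uses the same kind of lookup in reverse as well: given a source line, it finds the code offset at which to place a breakpoint.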
The linker (ld) is responsible for examining all the supplied object files and library files and trying to resolve all the symbolic references. A library file is roughly a concatenation of many object files; each library file groups related object files together for easier handling. If the linker succeeds in resolving all the symbolic references, it concatenates the binary code from all the object files as well as the needed portions of the library files. Then, the linker "fixes up" the references that were previously unresolved.
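The two jobs of the linker, concatenating code and fixing up references, can be sketched in a few lines of Python. This is a toy model under loud assumptions, not how ld works internally: object files are plain dictionaries with hypothetical keys, libraries are left out, and each fix-up simply stores the symbol's offset in the combined image as a 32-bit little-endian value (a real linker computes different values depending on the kind of reference).

```python
import struct

def link(objects):
    """Concatenate the code of each object and patch every symbolic reference.

    Each object is a dict with:
      "code":       bytes of machine code, with 4-byte placeholders
      "defined":    {symbol name: offset within this object's code}
      "unresolved": {symbol name: [offsets of 4-byte placeholders]}
    """
    image = bytearray()
    bases = []      # where each object's code lands in the combined image
    symbols = {}    # symbol name -> offset in the combined image

    # Pass 1: concatenate the code and record where every symbol ends up.
    for obj in objects:
        base = len(image)
        bases.append(base)
        image += obj["code"]
        for name, offset in obj["defined"].items():
            if name in symbols:
                raise ValueError(f"symbol defined twice: {name}")
            symbols[name] = base + offset

    # Pass 2: fix up each placeholder now that all definitions are known.
    for obj, base in zip(objects, bases):
        for name, offsets in obj["unresolved"].items():
            if name not in symbols:
                raise ValueError(f"undefined reference to {name}")
            for offset in offsets:
                struct.pack_into("<I", image, base + offset, symbols[name])

    return bytes(image), symbols

# Two hypothetical object files: main refers to helper, which lives elsewhere.
main_o = {"code": bytes(8), "defined": {"main": 0}, "unresolved": {"helper": [4]}}
helper_o = {"code": b"\xc3", "defined": {"helper": 0}, "unresolved": {}}

image, symbols = link([main_o, helper_o])
print(symbols)
print(image.hex())
```

Two passes are needed because a reference may appear before the object file that defines the symbol has been seen. If any name is still undefined after the first pass, the sketch reports an error, much as a real linker refuses to produce an executable.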
The linker outputs an executable file that can be invoked by the operating system. Note that the linker can only resolve symbolic references among the object files and libraries; it does not know exactly where in memory the program will be loaded when it is run. Some instructions are sensitive to the actual load location. For this reason, the executable file contains a table that lists the offsets of all locations that need to be fixed up when the program is loaded.
When the operating system invokes a program, the image of binary code in the executable file is loaded into the computer's memory. A special component of the OS, called the loader, is responsible for this. The loader is also responsible for fixing up relocation-sensitive references.
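The loader's fix-up step can be sketched the same way. The following Python model assumes the linker laid out the program as if it started at address 0, and that every entry in the relocation table marks a 32-bit little-endian address. The function name load, the sample image, and the load address are hypothetical; real loaders and executable formats carry more detail.

```python
import struct

def load(image, relocation_offsets, load_address):
    """Return a copy of the image as it would appear in memory at load_address.

    Every offset in relocation_offsets marks a 32-bit little-endian value that
    was computed as if the program started at address 0.
    """
    memory = bytearray(image)
    for offset in relocation_offsets:
        (value,) = struct.unpack_from("<I", memory, offset)
        fixed = (value + load_address) & 0xFFFFFFFF
        struct.pack_into("<I", memory, offset, fixed)
    return bytes(memory)

# Hypothetical executable image:
#   offset 0: a1 06 00 00 00   load eax from the address stored at offset 1
#   offset 5: c3               ret
#   offset 6: 2a 00 00 00      a 4-byte data word (the value 42)
# The address operand at offset 1 is location sensitive, so it is listed in
# the relocation table.
executable = bytes.fromhex("a106000000c32a000000")
relocations = [1]

loaded = load(executable, relocations, 0x08048000)
print(loaded.hex())
```

Only the operand at offset 1 changes. The opcode bytes and the data word are copied as they are, which is why the executable needs to list only the locations that depend on the load address.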
All of this may sound abstract and difficult to understand. Once we start to talk about addressing modes and instructions, some of this confusion will go away.
Copyright © 2009-04-16 by Tak Auyeung