An assembler is the counterpart of a compiler for an assembly language. The assembler (symbolic assembly program) converts programs written in a symbolic assembly language into binary words of machine code. Because such languages are machine-oriented, assemblers are machine-oriented too.
Assemblers process three types of information:
There are two types of assembler: the one-pass (or load-and-go) assembler, and the two-pass assembler.
In the one-pass assembler, the program is assembled in memory and is then executed. Any library routines to which reference is made are automatically incorporated.
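The load-and-go idea can be illustrated with a toy machine. This is a minimal sketch, assuming a hypothetical accumulator machine whose instruction word is opcode * 100 + operand address; the mnemonics, opcode values and the functions `assemble_into_memory` and `execute` are invented for illustration, and library routines are not handled. The program is translated straight into memory and run at once, with no object program in between.

```python
# Minimal load-and-go sketch for a hypothetical accumulator machine.
# Instruction word = opcode * 100 + operand address (an invented encoding).
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def assemble_into_memory(source, memory_size=32):
    """Translate each line directly into a memory word, in a single pass."""
    memory = [0] * memory_size
    for address, line in enumerate(source):
        mnemonic, *operand = line.split()
        memory[address] = OPCODES[mnemonic] * 100 + (int(operand[0]) if operand else 0)
    return memory


def execute(memory):
    """Run the assembled program immediately from address 0."""
    acc, pc = 0, 0
    while True:
        opcode, operand = divmod(memory[pc], 100)
        pc += 1
        if opcode == OPCODES["HALT"]:
            return memory
        if opcode == OPCODES["LOAD"]:
            acc = memory[operand]
        elif opcode == OPCODES["ADD"]:
            acc += memory[operand]
        elif opcode == OPCODES["STORE"]:
            memory[operand] = acc


program = ["LOAD 20", "ADD 21", "STORE 22", "HALT"]
memory = assemble_into_memory(program)
memory[20], memory[21] = 5, 7  # data placed after assembly, before execution
result = execute(memory)
print(result[22])  # prints 12
```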
The two-pass assembler devotes the first pass to building the local symbol table of labels used within the program or routine and the global symbol table of external names (i.e. references to other programs and subprograms, either those assembled with the program or those in the program and routine library). In the second pass, the assembler carries out the translation using these tables. External references are resolved at this stage by searching the index of the program library, then locating and retrieving the referenced programs and appending them to the routine in memory. Because it is easier to write a program by subdividing it into subprograms or subroutines, most assemblers have been written to assemble subroutines or subprograms and then combine them into a single entity.
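The two passes can be sketched for a toy machine. This is a minimal illustration under stated assumptions, not a real assembler: the instruction encoding, the `EXTERN` and `WORD` directives, the `library` dictionary of pre-assembled routines and all function names are hypothetical, and library routines are assumed to need no relocation of their own. Pass 1 only records addresses, which is what lets pass 2 translate `LOAD X` even though `X` is defined further down.

```python
# Minimal two-pass assembler sketch for a hypothetical accumulator machine.
# Instruction word = opcode * 100 + operand address (an invented encoding).
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4}


def parse(line):
    """Split 'LABEL: MNEMONIC OPERAND' into (label, mnemonic, operand)."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split()
    return label, parts[0], parts[1] if len(parts) > 1 else None


def pass_one(source):
    """Pass 1: build the local symbol table (label -> address) and the set of external names."""
    symbols, externals, address = {}, set(), 0
    for line in source:
        label, mnemonic, operand = parse(line)
        if mnemonic == "EXTERN":  # a declaration only; occupies no memory
            externals.add(operand)
            continue
        if label:
            symbols[label] = address
        address += 1
    return symbols, externals


def pass_two(source, symbols, externals, library):
    """Pass 2: translate using the tables, then append referenced library routines."""
    code, fixups = [], []
    for line in source:
        _, mnemonic, operand = parse(line)
        if mnemonic == "EXTERN":
            continue
        if mnemonic == "WORD":  # a data word, stored as-is
            code.append(int(operand))
        elif operand in externals:
            fixups.append((len(code), mnemonic, operand))
            code.append(0)  # placeholder until the routine's address is known
        elif operand is None:
            code.append(OPCODES[mnemonic] * 100)
        else:
            address = symbols[operand] if operand in symbols else int(operand)
            code.append(OPCODES[mnemonic] * 100 + address)
    # Append each referenced library routine after the program, then patch the references.
    starts = {}
    for name in sorted(externals):
        starts[name] = len(code)
        code.extend(library[name])
    for index, mnemonic, name in fixups:
        code[index] = OPCODES[mnemonic] * 100 + starts[name]
    return code


source = [
    "EXTERN SQUARE",
    "START: LOAD X",  # forward reference: X is defined later
    "ADD Y",
    "JUMP SQUARE",
    "X: WORD 9",
    "Y: WORD 4",
]
symbols, externals = pass_one(source)
code = pass_two(source, symbols, externals, library={"SQUARE": [0]})
print(code)  # prints [103, 204, 405, 9, 4, 0]
```

Here the hypothetical `SQUARE` routine is just a single `HALT` word, appended at address 5; the `JUMP SQUARE` word is patched to point there once its address is known.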
The output of the assembler is in binary-symbolic or semi-compiled form and it is converted into absolute machine code by a loader. The assembler also produces the following information:
For example, the IBM macro DUMP causes the CPU to print out, line by line, the contents of every location in memory and then perform all the operations necessary to terminate the program currently in the computer's memory. This is a complicated operation: one assembly language instruction has been converted into many machine code instructions.
The most frequent use of macro instructions is in input/output. Before a record can be read into the memory of the computer, many checks must be performed: to ensure that the correct disk is being used, that all error conditions have been covered, and so on. Records may also have been blocked to save time on I/O, in which case the user needs to be certain that the correct record is extracted from the block. The programming involved in this sort of operation is long and complex. If these instructions are written as a macro, however, they need only be written once, and everyone can then use them.
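Macro expansion can be pictured as textual substitution performed before translation. The sketch below assumes a hypothetical `READREC` macro whose body stands in for the disk, error and deblocking checks described above; the instruction names are placeholders, not IBM mnemonics, and parameters are passed as `key=value` pairs purely for simplicity. One macro line in the source becomes four instructions in the expanded program.

```python
# Hypothetical macro library: each macro name maps to a parameterised body.
MACROS = {
    "READREC": [
        "CHECK_DISK {unit}",
        "READ_BLOCK {unit}",
        "CHECK_ERRORS",
        "EXTRACT_RECORD {record}",
    ],
}


def expand_macros(source, macros=MACROS):
    """Replace each macro call with its body, substituting the given parameters."""
    expanded = []
    for line in source:
        name, *args = line.split()
        if name in macros:
            # Parameters are written as key=value pairs, e.g. READREC unit=2 record=7
            params = dict(arg.split("=", 1) for arg in args)
            expanded.extend(body.format(**params) for body in macros[name])
        else:
            expanded.append(line)  # ordinary instructions pass through unchanged
    return expanded


print(expand_macros(["READREC unit=2 record=7", "HALT"]))
# prints ['CHECK_DISK 2', 'READ_BLOCK 2', 'CHECK_ERRORS', 'EXTRACT_RECORD 7', 'HALT']
```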
Other features include: subroutines, local and global variables, comments and relocatability. The output from an assembler may consist of:
Assembly languages may be cross-assembled: in this case the assembler runs on one machine but converts the assembly language into machine code for a different machine.
A self-assembly language is one whose assembler is itself written in that assembly language and can therefore convert itself into machine code (see the compiler-compiler in any book on high-level language compilation).
Programs in high-level languages are known as source programs. They cannot be executed directly and must first be converted into object programs in machine code. The conversion is carried out by a program called a compiler, which takes the statements of the source program as input data and outputs an object program that is its machine code equivalent. In most cases, the compiler generates up to 25 object program instructions from one source program instruction.
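To see why one source statement turns into several object instructions, the sketch below compiles a single assignment for a hypothetical stack machine. It borrows Python's own `ast` module as the parser (an assumption made to keep the example short; a real compiler performs its own lexical and syntax analysis), and the instruction names `PUSH`, `PUSHC`, `POP`, `ADD`, `SUB`, `MUL` and `DIV` are invented.

```python
import ast

# Hypothetical stack-machine instructions for each arithmetic operator.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(node, code):
    """Emit postfix stack code for an expression tree."""
    if isinstance(node, ast.BinOp):
        compile_expr(node.left, code)
        compile_expr(node.right, code)
        code.append(OPS[type(node.op)])
    elif isinstance(node, ast.Name):
        code.append(f"PUSH {node.id}")
    elif isinstance(node, ast.Constant):
        code.append(f"PUSHC {node.value}")
    else:
        raise SyntaxError(f"unsupported construct: {ast.dump(node)}")


def compile_statement(source):
    """Compile one assignment statement such as 'A = B + C * D'."""
    stmt = ast.parse(source).body[0]
    if not isinstance(stmt, ast.Assign) or len(stmt.targets) != 1:
        raise SyntaxError("only single assignments are supported")
    code = []
    compile_expr(stmt.value, code)
    code.append(f"POP {stmt.targets[0].id}")  # store the result in the target variable
    return code


print(compile_statement("A = B + C * D"))
# prints ['PUSH B', 'PUSH C', 'PUSH D', 'MUL', 'ADD', 'POP A']
```

One source statement yields six object instructions here; longer expressions, subscripts and type conversions push that ratio higher.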
The compiler is both language- and machine-dependent. Among other things, it performs lexical and syntactic analysis of the source code, retrieves subroutines from the system source statement library as appropriate, allocates storage and generates machine code.
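Lexical analysis, the first of these steps, breaks source text into tokens before syntax analysis sees it. A minimal sketch using Python's standard `re` module; the token classes are an assumption for a small arithmetic language, not those of any particular compiler.

```python
import re

# Token classes for a small, hypothetical source language (order matters).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) tokens, rejecting characters the language does not allow."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at position {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("AREA = PI * R2"))
# prints [('IDENT', 'AREA'), ('OP', '='), ('IDENT', 'PI'), ('OP', '*'), ('IDENT', 'R2')]
```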
Some compilers make two passes through a source program. Others, known as optimising compilers, produce very efficient object programs by eliminating redundant code. The compiler also produces listings that contain a copy of the source program, compiler diagnostics (error messages) and a memory map (relocation information about variable and program storage addresses). Some compilers also list the assembly language program equivalent to the machine code produced.
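One simple form of the redundancy elimination performed by optimising compilers is a peephole pass over the generated instructions. The sketch below assumes a hypothetical accumulator-style instruction list and straight-line code with no jump targets; it removes a `LOAD X` that immediately follows `STORE X`, because the accumulator already holds that value. Real optimisers apply many such rules and must also respect jump targets.

```python
def remove_redundant_loads(code):
    """Peephole pass: drop 'LOAD X' when the previous kept instruction is 'STORE X'."""
    optimised = []
    for instruction in code:
        if (
            instruction.startswith("LOAD ")
            and optimised
            and optimised[-1] == "STORE " + instruction[len("LOAD "):]
        ):
            continue  # the accumulator already holds X, so this load is redundant
        optimised.append(instruction)
    return optimised


generated = ["LOAD A", "ADD B", "STORE T", "LOAD T", "MUL C", "STORE R"]
print(remove_redundant_loads(generated))
# prints ['LOAD A', 'ADD B', 'STORE T', 'MUL C', 'STORE R']
```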
Interactive programming languages are used from a terminal device such as a VDU; BASIC is one such language. In this type of operation, and especially in microcomputer systems with small amounts of available memory, source programs are processed by an interpreter. The interpreter executes a source program step by step (line by line, or unit by unit), i.e. it executes the smallest meaningful unit of the programming language at a time. The output from an interpreter is an answer, i.e. the result of an action in the program.
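A line-by-line interpreter can be sketched in a few lines. This assumes a tiny, hypothetical BASIC-like subset (`LET`, `PRINT` and `END`, with expressions limited to sums of integers and variables); it is not any real BASIC dialect. Each statement is analysed only at the moment it is executed, and the output is simply the answers the program produces.

```python
def evaluate(expression, variables):
    """Evaluate a sum such as 'A + 2' (only '+' is supported in this sketch)."""
    total = 0
    for term in expression.split("+"):
        term = term.strip()
        total += int(term) if term.isdigit() else variables[term]
    return total


def interpret(program):
    """Execute numbered statements in line-number order; return the printed results."""
    variables, output = {}, []
    for _, statement in sorted(program.items()):
        keyword, _, rest = statement.partition(" ")  # analysed as it is executed
        if keyword == "LET":
            name, _, expression = rest.partition("=")
            variables[name.strip()] = evaluate(expression, variables)
        elif keyword == "PRINT":
            output.append(evaluate(rest, variables))
        elif keyword == "END":
            break
        else:
            raise SyntaxError(f"unknown statement: {statement}")
    return output


program = {10: "LET A = 2", 20: "LET B = A + 3", 30: "PRINT A + B", 40: "END"}
print(interpret(program))  # prints [7]
```

Because analysis happens at execution time, a statement inside a loop would be re-analysed on every pass, which is the repetition mentioned as a drawback below.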
An interpreter keeps a user program in an abbreviated form. It is most useful when programs are large relative to the available storage, or in timesharing environments where programs should be kept small. One advantage of interpreting is more economical error detection. There are drawbacks, however: certain phases of processing and analysis are repeated, which makes the process slow.
A compiler, by contrast, generates code that takes a lot of storage: compiled code might run 10 or 20 times faster, but interpreted code is more economical on storage. Most interpreters do not work on the source language exactly as supplied by the user. Instead, the source code is checked for syntax errors and pre-translated into a concise, easily manipulated form.
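The pre-translation step can be illustrated by replacing each statement's keyword with a one-byte token, a technique commonly used by microcomputer BASIC interpreters. The token values and function names below are hypothetical; real dialects used their own tables and also tokenised keywords appearing inside statements.

```python
# Hypothetical one-byte token codes for statement keywords.
KEYWORD_TOKENS = {"PRINT": 0x80, "LET": 0x81, "GOTO": 0x82, "END": 0x83}
TOKEN_KEYWORDS = {code: word for word, code in KEYWORD_TOKENS.items()}


def crunch(statement):
    """Check the keyword (a simple syntax check) and replace it with a single byte."""
    keyword, _, rest = statement.partition(" ")
    if keyword not in KEYWORD_TOKENS:
        raise SyntaxError(f"unknown keyword {keyword!r}")
    return bytes([KEYWORD_TOKENS[keyword]]) + rest.encode("ascii")


def expand(crunched):
    """Recover readable text, e.g. for a program listing."""
    keyword = TOKEN_KEYWORDS[crunched[0]]
    rest = crunched[1:].decode("ascii")
    return f"{keyword} {rest}" if rest else keyword


crunched = crunch("PRINT A + B")
print(len(crunched), expand(crunched))  # prints 6 PRINT A + B
```

The stored form is shorter than the source text, and the interpreter can dispatch on a single byte instead of comparing keyword strings each time a line runs.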
One way of combining the merits of an interpreter and a compiler is to write both and fit the two together, so that parts of a user program are compiled and the rest interpreted. This is called a mixed-code approach. In most user programs, 5% of the program would be compiled and the rest interpreted. The resulting program would run faster than a wholly interpreted version, but would be a little bigger.
Disadvantages of this approach are the need to write both a compiler and an interpreter, and the impossibility of maintaining a compatible interface between the two throughout the life of the software.
When a program has been compiled, the object program is in binary. In all but a few cases, however, it is not in a state that can be run. There are cross references that must be resolved and external routines that must be included. The program must be made relocatable, i.e. independent of the address in memory where the beginning of the program is stored. A program known as the linkage editor has as its input the binary output from the compiler. It keeps a record of the beginning of the program and adjusts all addresses to be relative to this base. It checks that the program size does not exceed the available memory space. It keeps track of cross-references. The relocatable program and subroutine library are automatically searched in order to include routines, functions and procedures used by the program.
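The linkage editor's main jobs (concatenating modules, making addresses relative to the program's base, resolving cross-references, pulling in library routines and checking the size) can be sketched as follows. The `Module` record and its fields are a hypothetical stand-in for compiler output, not a real object-file format; each word is assumed to be either data or an address, and the `relocs` list marks which words are addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical compiler output: code words plus relocation and cross-reference data."""
    name: str
    code: list[int]
    exports: dict[str, int] = field(default_factory=dict)  # symbol -> offset within module
    relocs: list[int] = field(default_factory=list)  # words holding module-relative addresses
    externals: list[tuple[int, str]] = field(default_factory=list)  # (word index, symbol needed)


def link(modules, library, memory_limit):
    """Combine modules into one relocatable program addressed relative to its start."""
    code, relocs, symbols, pending = [], [], {}, []
    queue = list(modules)
    included = {m.name for m in modules}
    while queue:
        module = queue.pop(0)
        base = len(code)  # where this module begins, relative to the program start
        code.extend(module.code)
        for i in module.relocs:
            code[base + i] += base  # module-relative -> program-relative
            relocs.append(base + i)
        for name, offset in module.exports.items():
            symbols[name] = base + offset
        pending.extend((base + i, name) for i, name in module.externals)
        if not queue:  # search the library for routines still unresolved
            for name in sorted({n for _, n in pending if n not in symbols}):
                routine = library.get(name)
                if routine is not None and routine.name not in included:
                    included.add(routine.name)
                    queue.append(routine)
    for index, name in pending:  # resolve cross-references
        if name not in symbols:
            raise LookupError(f"unresolved external reference: {name}")
        code[index] = symbols[name]
        relocs.append(index)  # still program-relative, so the loader must adjust it
    if len(code) > memory_limit:
        raise MemoryError(f"program needs {len(code)} words; only {memory_limit} available")
    return code, sorted(relocs)


main = Module("MAIN", code=[2, 0, 99], relocs=[0], externals=[(1, "SQRT")])
sqrt = Module("SQRTLIB", code=[1, 8], exports={"SQRT": 0}, relocs=[0])
print(link([main], {"SQRT": sqrt}, memory_limit=16))
# prints ([2, 3, 99, 4, 8], [0, 1, 3])
```

The hypothetical `SQRTLIB` module is found by searching the library, placed at offset 3, and its internal address is shifted from 1 to 4; the result is still relocatable, since every address is relative to the program's first word.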
The output from the linkage editor is in relocatable binary. To load it into memory as absolute binary, another program, the relocating loader, is used. In some computer systems, the output of the compiler may be linked and loaded in one pass by a linking loader. Frequently used programs reside in the memory image library, where they are placed by the linkage editor.
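The relocating loader's task can be sketched in a few lines: copy the relocatable words into memory at the chosen load address and add that address to every word the relocation list marks as an address. This assumes the relocatable program arrives as a list of words plus a list of offsets needing adjustment (a hypothetical format; real object-file formats carry this information in their own records).

```python
def relocating_load(code, relocs, load_address, memory):
    """Copy a relocatable program into memory, turning relative addresses into absolute ones."""
    if load_address + len(code) > len(memory):
        raise MemoryError("program does not fit at this load address")
    flagged = set(relocs)  # offsets of words that hold program-relative addresses
    for offset, word in enumerate(code):
        memory[load_address + offset] = word + load_address if offset in flagged else word
    return memory


memory = [0] * 16
relocating_load([2, 3, 99, 4, 8], [0, 1, 3], load_address=10, memory=memory)
print(memory[10:15])  # prints [12, 13, 99, 14, 8]
```

Data words such as 99 and 8 are copied unchanged; only the flagged address words are shifted by the load address.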