Source code format
A typical line in an assembly language programme might be as follows:
LOOP: MOV.B r0, #80 ;initialise counter
This line will be assembled into a single instruction (in this case 11 0000 1000 0000 in binary, or 3080 in hex); each line of assembly language corresponds to one machine code instruction.
A source line has four parts: label, mnemonic, operand and comment. Not all are present in every line.
The first part (LOOP in this example) is a label; this is a word, invented by the programmer, which identifies this point in the program. It will be set equal to the address at which this instruction is stored. So, for example, if later in the programme there is a statement JMP LOOP, the assembler programme will replace the label LOOP with its actual value, which is the address at which this instruction is stored. (For the assembler to recognise this as a label, the label must begin at the first character in the line; in some assemblers a colon ":" follows the label.) So if the address at which the instruction is stored is 4F, LOOP takes on the value 4F, and wherever LOOP is used later in the programme, the assembler will give it the value 4F.
The second part is the mnemonic. This corresponds to a particular kind of instruction (opcode sent by the Dispatch Unit). The intention is that the word chosen (by the manufacturers) for the mnemonic is easy to remember, and indicates what the instruction does. In this case, the instruction moves a literal value (one byte) into the register r0, hence MOV.B r0, #80.
The third part of the line is the operand (there may be two); in this case the operands are the register r0 and the value 80 (in hex).
The last part of the line is a comment. This does not affect the actual instruction at all; it is not part of the instruction, and is not assembled; instead it helps the programmer to remember what this part of the program does. The comment is preceded by a semi-colon.
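The four parts can be picked out mechanically. The following is only a sketch in Python, not the way any real assembler is written; the function name split_line is my own invention, and it assumes that a label starts in the first column and is followed by a colon, as in the example line.

```python
def split_line(line):
    """Split one source line into (label, mnemonic, operands, comment)."""
    # Everything after the first semicolon is the comment.
    code, _, comment = line.partition(";")
    label = None
    # Assumption: a label starts in the first column and ends with a colon.
    if code and not code[0].isspace() and ":" in code:
        label, _, code = code.partition(":")
        label = label.strip()
    fields = code.split(None, 1)      # the mnemonic, then whatever follows it
    mnemonic = fields[0] if fields else None
    operands = []
    if len(fields) > 1:
        operands = [op.strip() for op in fields[1].split(",")]
    return label, mnemonic, operands, comment.strip()


parts = split_line("LOOP: MOV.B r0, #80 ;initialise counter")
print(parts)
```

For the example line this gives the label LOOP, the mnemonic MOV.B, the two operands r0 and #80, and the comment text. A line with no label, such as an indented INC r0, returns None for the label.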
When you have written a programme in assembly language, it actually consists of lots of ASCII characters; this would be stored in a file and called "source code". This then forms the input to a program called an "assembler" (MPASM for the PIC), which translates the "source code" into machine code, or object code. When the assembler has done its work, this line of source code will have been translated into a binary pattern, associated with a particular address in the programme memory.
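To see how a label acquires its value, here is a rough two-pass sketch in Python. It is an illustration under stated assumptions, not a description of MPASM: the name resolve_labels is hypothetical, the starting address 4F is simply the one used in the text, and every instruction is assumed to occupy exactly one address.

```python
def resolve_labels(lines, start=0x4F):
    """Return (symbols, program) with label operands replaced by addresses."""
    symbols, program = {}, []
    address = start
    # Pass 1: record the address of every labelled instruction.
    for line in lines:
        code = line.split(";")[0]             # drop the comment
        if ":" in code:
            label, code = code.split(":", 1)
            symbols[label.strip()] = address
        if code.strip():
            program.append((address, code.split()))
            address += 1                      # assumption: one instruction per address
    # Pass 2: replace any word that is a known label by its value in hex.
    resolved = [
        (addr, [format(symbols[w], "X") if w in symbols else w for w in words])
        for addr, words in program
    ]
    return symbols, resolved


source = [
    "LOOP: MOV.B r0, #80 ;initialise counter",
    "      JMP LOOP      ;go round again",
]
symbols, program = resolve_labels(source)
print(symbols)
print(program)
```

With these two lines, LOOP is given the value 4F, and the JMP LOOP stored at the next address (50) becomes JMP 4F.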
Addressing modes
What different ways are there of specifying where data is coming
from (reading), or where it is going to (writing)? We have seen
two of them already:
Immediate data is actually specified in the instruction;
this data is a fixed part of the program.
Direct addressing is where the address of the data (that is, the address of the register) is specified. Thus
MOV.B r0, #0F ;immediate data: value is F
MOV.B r0, 0F ;direct addressing: address is F
In the second example, 0F is not data, but the address from which the data is read.
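The difference between the two modes can be modelled in a few lines of Python. This is a sketch only: the memory list, the value 2A stored in it and the function mov_b are all invented for the illustration, and I assume the destination-first operand order of MOV.B r0, #80, so that the second operand names the source.

```python
memory = [0] * 256        # hypothetical data memory, one byte per address
memory[0x0F] = 0x2A       # some value already stored at address 0F


def mov_b(operand):
    """Return the byte that MOV.B would place in r0 for this source operand."""
    if operand.startswith("#"):
        return int(operand[1:], 16)      # immediate: the operand is the data
    return memory[int(operand, 16)]      # direct: the operand is an address


r0_immediate = mov_b("#0F")   # r0 receives the value 0F itself
r0_direct = mov_b("0F")       # r0 receives whatever is stored at address 0F
print(format(r0_immediate, "02X"), format(r0_direct, "02X"))
```

The first call yields 0F, the literal itself; the second yields 2A, the contents of address 0F.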
Most CPUs support another way of specifying data, called indirect; in this case, you specify not the address of the register which contains the data, but the address of the register which contains the address of the data. So, the instruction
MOV.B r0, (r1)
takes the value of r1 as an address, and uses this address as the source address in memory of the data to be placed in r0.
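Indirect addressing adds one more step, which the following Python sketch makes explicit. The names memory, registers and mov_b_indirect are hypothetical, as is the choice of address 0C and the character stored there.

```python
memory = [0] * 256
memory[0x0C] = ord("H")               # a byte of data stored at address 0C
registers = {"r0": 0, "r1": 0x0C}     # r1 holds the address of the data


def mov_b_indirect(dest, pointer):
    """Model MOV.B dest, (pointer): read memory at the address held in pointer."""
    address = registers[pointer]        # step 1: fetch the address from the register
    registers[dest] = memory[address]   # step 2: fetch the data from that address


mov_b_indirect("r0", "r1")
print(chr(registers["r0"]))
```

After the call r0 holds the character code for H, and r1 is unchanged: it was used only as a pointer.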
Why should one do this? Well, for example, suppose that a set of ASCII characters was stored in successive registers starting from 0C; then the following code might be useful:
MOV r0, #0C ;load base address of string into r0
LOAD: MOV r1,(r0) ;load contents into r1
CALL PRINT ;call a print routine to print the character in r1
INC r0 ;point to next character
JMP LOAD ;load next character
Thus we could use this set of instructions to print out a list of characters. (In practice we need a way of stopping the loop!)
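One common way of stopping such a loop is to end the string with a zero byte and test for it. The Python sketch below follows the loop step by step with that test added; the zero terminator is my assumption, not something the listing above provides, and print_string is a hypothetical name. The characters are collected in a list rather than printed one at a time.

```python
def print_string(memory, base):
    """Follow the loop above, but stop at a zero byte (an assumed terminator)."""
    out = []
    r0 = base                    # MOV r0, #0C : base address of the string
    while True:
        r1 = memory[r0]          # MOV r1,(r0) : load the character
        if r1 == 0:              # the stopping test that the listing lacks
            break
        out.append(chr(r1))      # CALL PRINT  : here we simply collect it
        r0 += 1                  # INC r0      : point to next character
    return "".join(out)


memory = [0] * 256
for i, ch in enumerate("HELLO"):
    memory[0x0C + i] = ord(ch)   # string stored in successive addresses from 0C

text = print_string(memory, 0x0C)
print(text)
```

The loop reads addresses 0C to 10, finds a zero at 11, and stops.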
More complicated processors have a much more complete set of "addressing modes", and usually allow an offset to be added to the index value (r0 above) before it is used as an address. However, the basic set of immediate, direct and indirect methods is very powerful.
Many instructions operate on the eight bits of data stored in one of the registers. We have looked already at the MOV instruction, which does this. For example, MOV r0,r1 takes the contents of file register r1, and copies them to r0.
Consider first those instructions, like the PIC's MOVF, whose input is the data stored in one file register f, and leave until later those instructions which take in the contents of the working register W as well.
In each case, the PIC allows the answer to be written to W (d = 0) or back to f (d = 1), where d is the destination bit in the instruction; if the answer is written to W, then the contents of f remain unchanged.
AND.B r0, #0 ;gives an answer of 0
XOR.B r0,#FF ;inverts each bit
DEC.B r0 ;decreases number by 1
INC.B r0 ;increases number by 1
Note that in each case the operation takes place on just one byte, and that sometimes the status register bits (e.g. Z) are affected.
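The four byte operations above can be tried out in Python. This is a sketch under two assumptions of mine: that results wrap round within eight bits, and that Z is set exactly when the result is zero. Which instructions actually affect Z depends on the processor. The function byte_op is hypothetical.

```python
def byte_op(name, value, operand=0):
    """Return (result, Z) for a one-byte operation; Z is True when result is 0."""
    if name == "AND":
        result = value & operand
    elif name == "XOR":
        result = value ^ operand
    elif name == "DEC":
        result = value - 1
    elif name == "INC":
        result = value + 1
    else:
        raise ValueError(name)
    result &= 0xFF               # keep only the low eight bits
    return result, result == 0


print(byte_op("AND", 0x5A, 0x00))   # AND.B r0, #0
print(byte_op("XOR", 0x5A, 0xFF))   # XOR.B r0,#FF
print(byte_op("DEC", 0x00))         # DEC.B r0, starting from 0
print(byte_op("INC", 0xFF))         # INC.B r0, starting from FF
```

ANDing with 0 always gives 0, XORing 5A with FF gives A5, decrementing 0 wraps round to FF, and incrementing FF wraps round to 0.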
A second group of these instructions accepts a word as input:
ADD.W r0,r1 ; r0 + r1
SUB.W r0,r1 ; r0 - r1
Logical operations: each of these works on each bit of the inputs
AND.W r0,r1 ;each bit is the result of ANDing two bits
OR.W r0,r1 ;each bit is the result of ORing two bits
XOR.W r0,r1 ;each bit is the result of XORing two bits
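The word operations behave in the same way on wider data. In the Python sketch below a word is assumed to be 16 bits, which the text does not state, and the function names and the two test values are invented for the illustration.

```python
MASK = 0xFFFF    # assumption: a word is 16 bits


def add_w(a, b):
    return (a + b) & MASK      # ADD.W: sum, wrapped to one word


def sub_w(a, b):
    return (a - b) & MASK      # SUB.W: difference, wrapped to one word


def and_w(a, b):
    return a & b               # each result bit is the AND of two input bits


def or_w(a, b):
    return a | b               # each result bit is the OR of two input bits


def xor_w(a, b):
    return a ^ b               # each result bit is the XOR of two input bits


r0, r1 = 0x0FF0, 0x00FF
for op in (add_w, sub_w, and_w, or_w, xor_w):
    print(op.__name__, format(op(r0, r1), "04X"))
```

With r0 = 0FF0 and r1 = 00FF the results are 10EF, 0EF1, 00F0, 0FFF and 0F0F respectively.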
Assembly language directives
There are some more basic ideas which need to be covered before you could write a complete programme in assembly language, although you will probably not need this until you actually come to write a real program. One important idea is that of "assembler directives". These are statements which are part of the source code which the assembler uses, but which do not correspond to actual instructions. I shall cover the directives ORG, EQU, END, INCLUDE, PROCESSOR, and RADIX. A complete list is given in the MPASM user's guide.
PROCESSOR
The statement PROCESSOR 16C71 is clearly not part of the programme; the 16C71 does not need to be told what kind of processor it is! The point is that the MPASM assembler programme does need to know: it can deal with a number of different processors in the PIC range, so you must tell it which one you are using. This is a very clear example of an assembler directive, which does not get translated into actual code, but affects the way in which code is translated.
RADIX
For example, RADIX HEX. This tells the assembler that unless you specify otherwise, numbers will be in hexadecimal. Other possibilities are DEC and OCT, for decimal and octal. Note that within a programme you can still use another radix: for example, you could say B'10' which would be interpreted as the binary number 10, which is 2 in decimal; if you just wrote 10, it would interpret it as the hexadecimal number 10, which is 16 in decimal. I shall always assume in the examples I give you that numbers are in hex format, and specify explicitly when they are in binary or decimal. It is worth putting a 0 in front sometimes, so that it is quite clear that 0FF is a number, not a label "FF".
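The effect of the default radix can be imitated in Python. This is a sketch, not MPASM's own number parser: parse_number is a hypothetical name, and only the B'...' form comes from the text above; the O, D and H prefixes are included on the understanding that MPASM treats them in the same way, which you should check against the user's guide.

```python
DEFAULT_RADIX = 16                       # the effect of RADIX HEX
PREFIXES = {"B": 2, "O": 8, "D": 10, "H": 16}


def parse_number(text, radix=DEFAULT_RADIX):
    """Convert a literal such as 10, 0FF or B'10' to an integer."""
    # An explicit prefix such as B'...' overrides the default radix.
    if len(text) > 2 and text[1] == "'" and text.endswith("'"):
        return int(text[2:-1], PREFIXES[text[0].upper()])
    return int(text, radix)


print(parse_number("10"))       # hexadecimal by default
print(parse_number("B'10'"))    # explicitly binary
print(parse_number("0FF"))      # the leading 0 does not change the value
```

The three calls give 16, 2 and 255 in decimal.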
ORG
The directive ORG does not get translated into any code; what
it does is to set the address at which the next instruction
will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at the address 0. ORG 0 does not get translated into any code; but it affects where the code for GOTO 10 is placed.
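The following Python sketch shows the idea of ORG as "set the address, emit nothing". It does not encode instructions itself: the only machine code it knows is the pattern quoted above for GOTO 10, supplied in a table. The names assemble, encodings and image are hypothetical.

```python
def assemble(lines, encodings):
    """Place pre-encoded instructions in memory, honouring ORG."""
    image = {}          # address -> instruction word
    address = 0
    for line in lines:
        words = line.split()
        if words[0] == "ORG":
            address = int(words[1], 16)     # sets the address, produces no code
        else:
            image[address] = encodings[line]
            address += 1                    # next instruction goes one place on
    return image


# The only encoding used is the one quoted in the text for GOTO 10.
encodings = {"GOTO 10": int("10 1000 0001 0000".replace(" ", ""), 2)}

image = assemble(["ORG 0", "GOTO 10"], encodings)
moved = assemble(["ORG 10", "GOTO 10"], encodings)
print(image, moved)
```

In both cases the same word (2810 in hex) is produced; only the address at which it is stored changes, from 0 to 10 hex.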
EQU
Example:
START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example, the assembler would always replace START with 10 (hex). Hence you could write:
START EQU 10
ORG START
GOTO 10
The assembler would translate START as 10, so the code for GOTO 10 would be stored at address 10 (hex). The use of labels often makes a programme easier to understand and to modify.
For example, it is easier to remember the names of the special function registers than their addresses.
Example:
PCL EQU 2 ;programme counter lower byte
STATUS EQU 3 ;status register
PORTA EQU 5 ;Port A
PORTB EQU 6 ;Port B
It is also easier to remember names than addresses of general purpose file registers. Suppose you want to use one of them as a counter; then you could define
COUNT EQU 0C ;counter
and then subsequently, an instruction like
DECF COUNT ;decrease counter
is easier to understand.
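What the assembler does with EQU can be pictured as building a table of names and then substituting from it. The Python sketch below does just that and nothing more; apply_equ is a hypothetical name, comments are discarded, and names must be defined before they are used.

```python
def apply_equ(lines):
    """Collect NAME EQU value lines, then substitute names in the other lines."""
    symbols, output = {}, []
    for line in lines:
        words = line.split(";")[0].split()      # ignore the comment
        if len(words) == 3 and words[1] == "EQU":
            symbols[words[0]] = words[2]        # define the name, emit nothing
        elif words:
            output.append(" ".join(symbols.get(w, w) for w in words))
    return symbols, output


symbols, output = apply_equ([
    "START EQU 10",
    "COUNT EQU 0C ;counter",
    "ORG START",
    "DECF COUNT ;decrease counter",
])
print(output)
```

The two EQU lines vanish from the output, and the remaining lines become ORG 10 and DECF 0C.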
INCLUDE
This is a useful directive, especially when you have a number of standard definitions that you always want to begin your programmes with; there could be a PROCESSOR statement, a RADIX statement, and a series of EQU statements, defining all the standard special function registers. If you wrote them once in an assembly language file, called for example PIC16C71.H, then the statement
INCLUDE PIC16C71.H
would add those standard definitions, without having to write
them out again.
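INCLUDE amounts to pasting one file into another before assembly starts. The Python sketch below imitates this with a dictionary standing in for the files on disk; expand_includes is a hypothetical name and the contents given for PIC16C71.H are only an example.

```python
def expand_includes(lines, files):
    """Replace each INCLUDE line by the lines of the named file."""
    result = []
    for line in lines:
        words = line.split()
        if words and words[0] == "INCLUDE":
            # Recursion lets an included file include others in turn.
            result.extend(expand_includes(files[words[1]], files))
        else:
            result.append(line)
    return result


# A dictionary plays the part of the file system in this sketch.
files = {"PIC16C71.H": ["PROCESSOR 16C71", "RADIX HEX", "STATUS EQU 3"]}
expanded = expand_includes(["INCLUDE PIC16C71.H", "ORG 0"], files)
print(expanded)
```

The single INCLUDE line is replaced by the three lines of the header, followed by ORG 0.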
END
This is not the end of the programme that the PIC runs!
Such programmes are usually endless loops; once the PIC is switched
on, it runs in the loop for ever. Instead, END is an instruction
or directive to the assembler, to say "This is the end of
my programme; you can finish now!"
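In other words, END concerns the reading of the source file, not the running of the programme. A minimal Python sketch, assuming that the assembler simply ignores whatever follows END (lines_until_end is a hypothetical name):

```python
def lines_until_end(lines):
    """Return the source lines that come before the END directive."""
    kept = []
    for line in lines:
        words = line.split(";")[0].split()     # ignore the comment
        if words and words[0] == "END":
            break                              # stop reading the source here
        kept.append(line)
    return kept


kept = lines_until_end(["LOOP: GOTO LOOP", "END", "anything after this"])
print(kept)
```

Only the endless loop LOOP: GOTO LOOP is kept; the programme it describes still runs for ever.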
Author: Gorry Fairhurst (Email: G.Fairhurst@eng.abdn.ac.uk)