Acorn Electron World - Acorn Electron User Guide (English)

Chapter 29. Assembly Language

The computer's 'brain' has its own language, and that language is not BASIC. Every time you run a BASIC program, each line has to be translated before this brain (the computer's central processing unit) can understand it. This translation is carried out by the BASIC interpreter, which resides in the computer's memory. How the interpreter works need not concern you, but it is itself a program written in machine-code, and machine-code is the computer's own language.

There are 55 different instructions in machine code, which is about half the number of BASIC instructions available on the Electron. Each of these instructions acts upon one or more of the registers inside the 6502 microprocessor (6502 is simply the type number of this processor; the number has no other significance). A register is just like a byte of memory. The 6502 contains six registers: five of them are 1 byte long, and the last is 2 bytes long. These registers are not part of the computer's memory map (locations &0000 to &FFFF); they live an entirely separate existence in the heart of the microprocessor. But the machine-code instructions which control these registers are stored in the computer's memory; the operating system's own routines, for example, occupy the area of the memory map labelled 'operating system'. These instructions don't look much like intelligible commands, for they are simply binary numbers: 1010100100001010, for example. It is very difficult to program using such low-level instructions, and even in hexadecimal they hardly look any better: A9 0A. This is the reason for using Assembly Language.

Assembly Language uses a three-letter mnemonic to represent each machine-code instruction. Each mnemonic is a contraction of the action-in-words of that instruction.

Take the instruction given above. One of the registers in the microprocessor is called the accumulator (all the registers will be described in detail in a moment).

A9 means 'load the accumulator'.

The mnemonic for this is LDA, thereby giving you a rough guide to its function, LoaD Accumulator.

The other part of the instruction, 0A, is 10 in decimal. So A9 0A means 'put 10 in the accumulator', and this is written in Assembly Language as:

LDA#10

(The hash (#) tells the assembler that the number 10 itself is to be put into the accumulator, and not the contents of memory location 10. This will be explained in a moment.)
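As an illustration only, the Python sketch below mimics what the assembler does with LDA #10: it looks up the opcode for the mnemonic and addressing mode, then appends the operand byte. The table OPCODES and the function assemble_immediate are hypothetical; the single entry, &A9 for LDA in immediate mode, is the one given above.

```python
# Hypothetical one-entry opcode table: (mnemonic, addressing mode) -> opcode byte.
OPCODES = {("LDA", "immediate"): 0xA9}

def assemble_immediate(mnemonic, value):
    """Return the two machine-code bytes for an immediate-mode instruction."""
    if not 0 <= value <= 255:
        raise ValueError("immediate operand must fit in one byte")
    return [OPCODES[(mnemonic, "immediate")], value]

code = assemble_immediate("LDA", 10)
hex_form = " ".join(f"{b:02X}" for b in code)     # the instruction written in hex
binary_form = "".join(f"{b:08b}" for b in code)   # the same two bytes in binary
print(hex_form)      # A9 0A
print(binary_form)   # 1010100100001010
```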

So each of the 55 machine-code instructions is assigned a three-letter assembly mnemonic, which enables you, the programmer, to understand the function of each without having to look it up on a chart.

The Electron has another program in its memory, called an assembler, and this converts the Assembly Language directly into machine-code. During this assembly process, the computer can help you by giving error messages and a listing of the machine-code in hex. (If you were programming the 6502 direct in machine-code there would be no error messages at all - and just try finding a mistake among a few hundred machine code instructions!)

The assembler loads the machine-code into memory, and it can then be run, either as a CALL or USR from BASIC, or by using *RUN.

Registers In The 6502

The 6502 microprocessor has six registers as follows:

Accumulator

The accumulator is the main working register of the processor. Most of the 55 Assembly Language instructions operate on the accumulator, which gained its name from the way that results of arithmetic operations are 'accumulated'. It is an 8-bit register, meaning that it can store and operate upon eight binary digits (one byte). Each bit is designated a number, from 0 for the least significant (rightmost) to 7 for the most significant (leftmost).

Common operations involving the accumulator are:

Loading it from memory (the locations &0000 to &FFFF).
Storing its contents in memory.
Addition or subtraction.
Logical functions (AND, OR or EOR).
Shifting its contents left or right.

Index Registers X and Y

The two index registers are each 8 bits long, and are used for the following:

To be added to the address used by an instruction. This is called indexing.
As general purpose registers for various counting or short term memory duties.
In addition to the above, both the accumulator and the two index registers are used by the Electron to pass parameters to operating system subroutine calls. This will be explained later.

Program Counter

The program counter (PC) is the only 16-bit register, and it holds the memory address of the next instruction to be executed.

Operations involving the program counter are:

Jump and branch instructions which alter the contents of the PC and thereby divert the flow of the program. (Much like GOTO in BASIC.)

Stack Pointer

The stack pointer is an 8-bit register, with a ninth bit on the most significant end which is always set to 1. It is an address pointer which gives the location in memory of a special kind of data structure used by computers called the stack, and so it can point to addresses between &0100 and &01FF. The stack is explained later, but in essence it is a section of memory which has not only a position, but also an order. Data which is pushed on to the stack in one order can only be pulled off it in the reverse order. This sort of memory is called last in, first out (LIFO). It is used for storing data in which the order is important, e.g. the return addresses of nested subroutines.
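The following Python sketch models this behaviour. On the 6502 the stack grows downwards: a push stores a byte at &0100 plus the stack pointer and then decrements the pointer, and a pull increments the pointer and then reads. The class name Stack6502 and the starting value &FF for the pointer are assumptions made for the example.

```python
class Stack6502:
    """Model of the 6502 stack: page 1 (&0100-&01FF), growing downwards."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # the whole 64K memory map
        self.sp = 0xFF                    # 8-bit stack pointer (assumed start value)

    def push(self, value):
        self.memory[0x0100 + self.sp] = value & 0xFF  # the ninth bit is always 1
        self.sp = (self.sp - 1) & 0xFF                # wraps round within page 1

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]

s = Stack6502()
for byte in (1, 2, 3):
    s.push(byte)
order = [s.pull() for _ in range(3)]
print(order)   # [3, 2, 1]: last in, first out
```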

Flags Register

The flags register is different from all the others in that it operates as seven single-bit registers: N, V, B, D, I, Z, C. Each bit signals a condition in the processor, and certain instructions act according to whether that condition is present (the bit is 1, true) or not present (the bit is 0, false).

Each bit acts as follows:

Bit N is set to 1 when the last operation produced a negative result. A negative result is signified by the most significant bit of a register being 1 (the sign bit). In the case of the accumulator, a result inside it of, for example, 10010100 would set bit N of the flags register to 1. If the last operation did not produce a negative result then bit N is reset to 0.

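Bit V is set to 1 when the last operation overflowed into the sign bit. As stated above, the sign bit is bit 7 in the case of the accumulator. For an addition, bit V is set to 1 when there is a carry from bit 6 into bit 7 but no carry out of bit 7, or a carry out of bit 7 without one from bit 6; in other words, when the signed result lies outside the range -128 to 127. This is important to know when using twos complement arithmetic, for it means that the signed result is wrong and must be corrected.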

Bit B is set to 1 when the BRK command is used (break). (This command has much the same effect on a machine-code program that ESCAPE has on a BASIC program.)

Bit D, when set to 1, causes the processor to operate in BCD mode (binary coded decimal). When reset to 0, the processor works as normal in binary. BCD is beyond the scope of this book, and need not concern you.

Bit I is the interrupt mask. When it is set to 1, no interrupts are accepted. Interrupts are also beyond the scope of this book.

Bit Z is set to 1 when the last operation produced a zero result.

Bit C is the carry flag. It is set to 1 by a carry from the most significant bit of one of the registers, usually the accumulator.

These flags are used by the branch instructions, which direct the flow of the program according to the conditions. For example, BEQ means 'branch if equal to zero': the program will branch if the Z bit is set to 1, and otherwise continues with the next instruction.
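To see how an addition sets N, V, Z and C, here is a Python sketch of ADC in binary mode (bit D assumed to be 0). The function names adc and beq_taken are hypothetical. The V test uses the usual rule: overflow happens when both operands have the same sign bit and the result's sign bit differs from it.

```python
def adc(a, m, carry):
    """Binary-mode ADC: A + M + C. Returns the new A and the N, V, Z, C flags."""
    total = a + m + carry
    result = total & 0xFF
    n = (result >> 7) & 1                 # N: copy of bit 7, the sign bit
    z = int(result == 0)                  # Z: the result was zero
    c = int(total > 0xFF)                 # C: carry out of bit 7
    # V: operands share a sign bit but the result's sign bit is different
    v = int(((a ^ result) & (m ^ result) & 0x80) != 0)
    return result, {"N": n, "V": v, "Z": z, "C": c}

def beq_taken(flags):
    """BEQ branches only when the Z flag is 1."""
    return flags["Z"] == 1

r1, f1 = adc(0x40, 0x40, 0)   # 64 + 64 = 128: too big for a signed byte, V=1
r2, f2 = adc(0xC0, 0xC0, 0)   # -64 + -64 = -128: carries into and out of bit 7, V=0
r3, f3 = adc(0xFF, 0x01, 0)   # result is zero with a carry, so BEQ would branch
print(r1, f1, r2, f2, r3, f3, beq_taken(f3))
```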

Addressing Modes

Take a single instruction - you have seen LDA before. Its function is always to 'load the accumulator', but it may load it in different ways and from different places according to which addressing mode is used.

LDA#10

means 'load the accumulator with 10'. You know that already. However,

LDA 10

means 'load the accumulator with the contents of memory location 10'.

These are examples of two different addressing modes. The first is immediate addressing: the instruction uses the data immediately, without looking for it in memory. The second is zero-page addressing: the instruction uses the contents of the address specified. It is called zero-page because the computer's memory is divided up into 256 pages, each of 256 bytes. Any address whose two most significant hex digits are zero is said to be in the zero-page of memory, which extends from locations &0000 to &00FF.
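The page number of an address is simply its high byte, and its position within the page is its low byte. A short Python sketch of this split, with hypothetical function names:

```python
def page_and_offset(address):
    """Split a 16-bit address into its page number (high byte) and offset (low byte)."""
    return address >> 8, address & 0xFF

def is_zero_page(address):
    """True when the two most significant hex digits of the address are zero."""
    return page_and_offset(address)[0] == 0

for addr in (0x000A, 0x00FF, 0x0100, 0x30A7):
    page, offset = page_and_offset(addr)
    print(f"&{addr:04X}: page &{page:02X}, offset &{offset:02X}, zero-page: {is_zero_page(addr)}")
```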

LDA may also be used with a full 16-bit address:

LDA &30A7

will 'load the accumulator with the contents of memory location &30A7'. This addressing mode is called absolute. It can access any location in the computer's memory. Notice that the assembler treats numbers as decimal, unless they are preceded by &.

Immediate, zero-page and absolute are not the only addressing modes, although they are the simplest to understand.

LDA &1D77,X

is an indexed addressing mode.

The address used by the instruction is &1D77 plus the contents of index register X. So the accumulator is not loaded from &1D77 but from &1D77+X. Note that the contents of index register X are added to the address, and not to its contents.

The index register used can equally well be Y:

LDA &2500,Y

(Note: In machine code there are several variants of the above indexed addressing mode, but the assembler takes care of all of them for you. This means that the assembled machine-code (in hex) will not always be the same for the same indexed mnemonic; for example, LDA &70,X and LDA &1D77,X assemble to different opcodes.)

Another still more complicated addressing mode is indirect addressing:

LDA (&1B,X)

The address given after the assembler mnemonic, in this case &1B, must be a location in the zero-page of memory (or an error will result). The contents of the X register are added to this address to give another location in the zero-page. The contents of this new location (the low byte) and of the location above it (the high byte) together supply the full 16-bit address of the location from which the accumulator is loaded. So, if &1B+X contains &AA and &1B+X+1 contains &BB, the accumulator will be loaded with the contents of memory location &BBAA.

The above operation is called pre-indexed indirect addressing: the indexing is the addition of the X register, and the indirection is the use of the two consecutive locations at the intermediate address as an address pointer to the actual location used. It is called pre-indexed indirect because the indexing is done before the indirection. All pre-indexed indirect instructions must use index register X.

Post-indexed indirect addressing is written in Assembly Language as follows:

LDA (&27),Y

In this addressing mode, the indirection occurs first. The address given after the assembler mnemonic, in this case &27, must again be a location in the zero-page of memory. The contents of this location and the contents of the location above it together give a 16-bit address. To this 16-bit address is added the contents of index register Y, and this final address is the location from which the accumulator is loaded. All post-indexed instructions must use index register Y.
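The three index-based modes differ only in the order in which the indexing and the indirection are done. This Python sketch works out the address each one uses. The memory image and the values stored in it are made up for the example. Pointers are stored low byte first, and the zero-page arithmetic wraps round within page zero, as on the 6502.

```python
memory = bytearray(0x10000)   # hypothetical 64K memory image for the example

def read_pointer(zp_address):
    """Read a 16-bit address from two consecutive zero-page bytes, low byte first."""
    low = memory[zp_address & 0xFF]
    high = memory[(zp_address + 1) & 0xFF]   # the second byte also stays in zero-page
    return low | (high << 8)

def indexed(base, index):                    # LDA base,X or LDA base,Y (16-bit base)
    return (base + index) & 0xFFFF

def pre_indexed_indirect(zp, x):             # LDA (zp,X): index first, then indirect
    return read_pointer((zp + x) & 0xFF)

def post_indexed_indirect(zp, y):            # LDA (zp),Y: indirect first, then index
    return (read_pointer(zp) + y) & 0xFFFF

# With X=4, (&1B,X) uses the pointer held at &1F and &20.
memory[0x1F], memory[0x20] = 0xAA, 0xBB
memory[0x27], memory[0x28] = 0x00, 0x25
print(hex(indexed(0x1D77, 4)))               # 0x1d7b
print(hex(pre_indexed_indirect(0x1B, 4)))    # 0xbbaa
print(hex(post_indexed_indirect(0x27, 3)))   # 0x2503
```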

The above examples show the complete range of addressing modes which can be used with the instruction LDA. However, there are three more important addressing modes which are used with certain other instructions.

All of the branch instructions use a relative addressing mode. BEQ was mentioned in the description of the flags register; it means 'branch if equal to zero'. A branch is an instruction which has an offset:

	ZZZ data
	BEQ Label
	AAA data
	BBB data
.Label	CCC data
	DDD data

In this fragment of program, the triple-letters stand for any assembler instructions. When a program is running, the program counter is advanced after each instruction to point at the next location to be executed. In this example, when BEQ Label is being executed the program counter will already point to the line containing the instruction marked AAA. If checking the Z flag shows that the previous operation did not produce a zero result, execution will continue at the line containing AAA. If the previous operation did give a zero result, the branch offset is added to the program counter so that it points at the line marked Label. This program also illustrates the use of labels in assembler. They can take any name you choose (subject to the same limitations as a BASIC variable name), and where a label is defined it must always start with a full stop.

Branch instructions may branch to labels either forwards or backwards, but not too far. The actual distances are 128 bytes backwards or 127 bytes forwards; but remember that these are measured from the next instruction following the branch, and that each instruction may be either 1, 2 or 3 bytes long. The assembler will soon tell you if you have an address or label out of range.
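The offset stored after a branch opcode is a single twos complement byte, counted from the address of the instruction after the 2-byte branch. The Python sketch below computes it; the function names are hypothetical, and the error text is not the assembler's own message. For example, a BEQ at &0E79 aimed at &0E7D gives an offset of 02, which matches the F0 02 in the assembled listing later in this chapter.

```python
def branch_offset(branch_address, target):
    """Offset byte for a two-byte branch instruction at branch_address."""
    distance = target - (branch_address + 2)   # measured from the next instruction
    if not -128 <= distance <= 127:
        raise ValueError("branch out of range")
    return distance & 0xFF                      # stored as a twos complement byte

def branch_target(branch_address, offset_byte):
    """Where a taken branch goes, given the offset byte stored after the opcode."""
    distance = offset_byte - 256 if offset_byte > 127 else offset_byte
    return branch_address + 2 + distance

print(f"{branch_offset(0x0E79, 0x0E7D):02X}")   # 02
print(hex(branch_target(0x1000, 0x80)))         # 0xf82, 128 bytes back from &1002
```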

The next addressing mode is accumulator addressing, which is used by only four instructions in the 6502 set. These are ASL, LSR, ROL and ROR, and their action is explained in the reference section. In essence, they shift the bits of a memory location or the accumulator to the left or right.

ASL &760

means shift the contents of memory location &760 one bit to the left. In order to apply this instruction to the accumulator, the accumulator's own addressing mode is used:

ASL A

means shift the contents of the accumulator one bit to the left. Look up the four instructions in the reference section for more information.
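As a rough guide before you look them up, here is a Python model of the four instructions acting on one byte. Each hypothetical function returns the new value and the new state of the carry flag: ASL and LSR shift in a 0, while ROL and ROR rotate the old carry into the byte.

```python
def asl(value):
    """ASL: shift left; bit 7 goes into the carry and 0 enters bit 0."""
    return (value << 1) & 0xFF, (value >> 7) & 1

def lsr(value):
    """LSR: shift right; bit 0 goes into the carry and 0 enters bit 7."""
    return value >> 1, value & 1

def rol(value, carry):
    """ROL: rotate left through the carry; the old carry enters bit 0."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1

def ror(value, carry):
    """ROR: rotate right through the carry; the old carry enters bit 7."""
    return (value >> 1) | (carry << 7), value & 1

value, carry = asl(0b10010100)
print(f"{value:08b}", carry)   # 00101000 1
```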

The final addressing mode which you need to know about is the simplest. Certain instructions, such as BRK (break), do not need any data or memory reference at all. These are called implied instructions and they carry out a simple task, usually on one of the registers; for example, CLC means 'clear the carry flag'.

Addressing Mode Examples

Addressing mode	With an address	With a label or variable
Immediate	LDA #68	LDA #number
Zero-page	LDA &9B	LDA address
Absolute	LDA &8E17	LDA address
Indexed	LDA &A06C,Y	LDA Table,X
Pre-indexed indirect	LDA (&72,X)	LDA (pointer,X)
Post-indexed indirect	LDA (&00),Y	LDA (zero),Y
Relative	BEQ Repeat	BNE Loop
Implied	CLC	BRK
Accumulator	LSR A	ROL A

The examples on the right show the assembler mnemonics used not with specific addresses, but with BASIC variables. You will find that this is a good way of writing assembler subroutines which are to be called from within BASIC programs by CALL or USR.

One final point about addressing modes. The JMP (jump) instruction is the only one which allows straight (non-indexed) indirect addressing. JMP is very similar to BASIC's GOTO. It takes a full 16-bit address and places this value in the program counter, so the program jumps to a new execution address. It is usually used with a label, just like a branch, but without the restriction on distance. In absolute mode it looks like this:

JMP Label

If you wish to use it in indirect mode then simply enclose the address in brackets:

JMP(&21A7)

It will then use the contents of the two consecutive locations at &21A7 as an address pointer to the location to which it will jump.

	JMP (&21A7)
&21A7	&76	(low byte of destination)
&21A8	&32	(high byte of destination)
&3276	execution continues here.

Entering Assembly Mnemonics

This section tells you how to write Assembly Language subroutines, and how to call them from BASIC. You may find it worthwhile, now that you know about the 6502 processor's make-up, to read all of the assembly mnemonic definitions. You will then be able to understand much more clearly the capability of the processor, and what the short programs in this section are doing.

Sections of Assembly Language are entered as part of a BASIC program, separated from the BASIC part by the square brackets [ and ]. The general structure of a program containing an assembler routine is:

10 REM BASIC Program
100 [
110 \ Start of assembler mnemonics
200 ]
210 REM BASIC program continues

Notice that remarks in the Assembly Language section are signalled by a backslash \. The assembler then knows to ignore them.

Before the routine can be assembled, the computer must be told where in memory to put it. So the BASIC part must first allocate some memory for this purpose, and there are two ways of doing this.

Before entering the assembler routine, you assign to the resident integer variable P% the address at which the first instruction of the assembled machine code is to be placed. P% is the 'pseudo program counter': the assembler uses it to calculate addresses for branch and jump instructions and, when O% is not being used, as the pointer to where the assembled code is stored.

The two methods for doing this are:

(i) By direct assignment: P%=&2000 for example. The problem with direct assignment is that you have to ensure that the memory location chosen is available for use.

The second method gets round this problem.

(ii) By using the BASIC DIM instruction. This takes the form DIM P% 100. Note the use of a space, with no comma or brackets, to distinguish it from an array dimension. DIM P% 100 allocates 101 bytes of memory for the machine-code, which will be stored along with all the BASIC variables above LOMEM. The number used with the DIM instruction must be large enough to reserve sufficient space for all the code, but not so large as to overlap other items in memory.

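An even better way to use DIM is DIM Q% 100 followed by P%=Q%. P% is advanced as each instruction is assembled, whereas Q% keeps the start address, ready for resetting P% before each assembly and for calling the routine. DIM is a convenient way of reserving space for machine code routines, but note that no check is made to prevent the assembled code from overrunning the space reserved for it.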

Assembly

To get the computer to assemble the routine into machine-code, you simply RUN the program. To complete the assembly, the program has to be RUN twice. The reason for this will become clear in a moment. The assembler pseudo-operator OPT controls the listing and error output generated on assembly. This operator must be placed in the assembler routine, usually at the start, and is followed by a number from 0 to 3 which causes the following outputs:

OPT 0	No errors printed, no listing given.
OPT 1	No errors, but a listing is given.
OPT 2	Errors are printed, but no listing.
OPT 3	Both errors and a listing are given.

The listing given is of the machine-code, in hexadecimal. The errors are printed as messages on the screen.

Here's an Assembly Language routine:

10 DIM Q% 100
20 P%=Q%
30 [OPT 3
40 LDA &70
50 CMP #0
60 BEQ Zero
70 STA &72
80 .Zero RTS
90 ]

When you RUN this program, the computer will print a listing, and then the message:

No such variable at line 60

Routines which have forward references to labels (Zero is referred to on line 60 when the assembler has not yet come across it) will always generate an error. The answer to this is to inhibit errors the first time through by using OPT 0, and then to RUN a second time to generate the complete code. This is called two-pass assembly.

The way to do this is to enclose the routine in a FOR...NEXT loop as follows:

10 DIM Q% 100
20 FOR I=0 TO 3 STEP 3
30 P%=Q%
40 [OPT I
50 LDA &70
60 CMP #0
70 BEQ Zero
80 STA &72
90 .Zero RTS
100 ]
110 NEXT

On the first time through the loop, I=0 and so there will be no listing and no errors reported. This pass allows the computer to find the address of the forward referenced label. The second time through the loop, I=3 and hence a listing of the assembled code is produced, along with any programming errors. Note that the assignment statement P%=Q% is enclosed within the loop so that P% is reset before each pass.

On running the program, you will see a listing of the assembled machine-code alongside the Assembly Language mnemonics:

>RUN
0E75	OPT I
0E75 A5 70	LDA &70
0E77 C9 00	CMP #0
0E79 F0 02	BEQ Zero
0E7B 85 72	STA &72
0E7D 60	.Zero RTS

This means that the mnemonics have been successfully assembled, and the corresponding machine-code has been loaded into addresses &0E75 to &0E7D. &A5 is stored in location &0E75, &70 in location &0E76, &C9 in location &0E77, and so on to &60, which is stored in location &0E7D. This is nine bytes of machine-code in all.
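The two-pass idea can be imitated in a few lines of Python. The sketch below is a hypothetical toy assembler that knows only the five instructions in this routine, with the opcodes shown in the listing above. On the first pass it records the address of each label; on the last pass it fills in the branch offset. With only one pass the forward reference to Zero cannot be resolved, and the sketch raises an error whose text merely imitates the BASIC message. Branch range checking is left out for brevity.

```python
# Opcodes and lengths for just the instructions used in the example routine.
OPCODES = {
    ("LDA", "zp"): 0xA5, ("CMP", "imm"): 0xC9, ("BEQ", "rel"): 0xF0,
    ("STA", "zp"): 0x85, ("RTS", "imp"): 0x60,
}
LENGTHS = {"zp": 2, "imm": 2, "rel": 2, "imp": 1}

# The routine as (label, mnemonic, mode, operand) entries.
ROUTINE = [
    (None,   "LDA", "zp",  0x70),
    (None,   "CMP", "imm", 0),
    (None,   "BEQ", "rel", "Zero"),
    (None,   "STA", "zp",  0x72),
    ("Zero", "RTS", "imp", None),
]

def assemble(origin, program, passes=2):
    labels = {}
    code = []
    for pass_number in range(1, passes + 1):
        pc = origin                  # like P%=Q%: reset at the start of each pass
        code = []
        for label, mnemonic, mode, operand in program:
            if label:
                labels[label] = pc   # record the label's address
            code.append(OPCODES[(mnemonic, mode)])
            if mode == "rel":
                if operand in labels:
                    code.append((labels[operand] - (pc + 2)) & 0xFF)
                elif pass_number == passes:
                    raise NameError("No such variable")  # unresolved forward reference
                else:
                    code.append(0)   # placeholder on an early pass, like OPT 0
            elif mode != "imp":
                code.append(operand)
            pc += LENGTHS[mode]
    return code

print(" ".join(f"{b:02X}" for b in assemble(0x0E75, ROUTINE)))
# A5 70 C9 00 F0 02 85 72 60
```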

This routine has not yet been executed. To do that, a CALL from BASIC is required; type the following and press RETURN:

CALL Q%

Nothing is printed on the screen when you do this, and that's because the program is trivial; it merely loads a byte from memory location &70 into the accumulator, and if it isn't zero it is stored in memory location &72. There are some points to note about the structure of the Assembly Language routine:

When a label is assigned to a line, as at line 90, it must be preceded by a full stop. When the label is referred to by an instruction, as at line 70, there must be no full stop.
Most Assembly Language routines end with RTS (return from subroutine) which transfers control back to the BASIC interpreter.
The above routine uses two locations in the zero page of memory. Only locations &70 to &8F in the zero page may be used by your own programs; all the remainder is taken up by the Operating System's variables, and BASIC's workspace.

Execution by USR

USR is similar to a BASIC FN (function); it gives a single value.

The format is:

R% = USR(Z)

where Z may be a label pointing to the first assembler mnemonic, or the address of the first instruction in machine-code. A label is easier to use since it requires no knowledge of where the machine-code is placed in memory. When R%=USR(Z) is executed, the least significant byte of each of the BASIC integer variables A%, X% and Y% is placed into the accumulator, X register, and Y register respectively. The least significant bit of C% is placed in the carry flag (bit C of the flags register). A%, X%, Y% and C% can therefore be used to initialise the 6502 registers before entry into the assembler routine. Control then passes to the subroutine pointed to by Z. On returning to BASIC (after RTS), the four bytes comprising R% will each contain the contents of one of the 6502 registers, as follows:

R% = PYXA

So, from the most significant byte to the least significant, R% contains the flags (P), the Y register, the X register and the accumulator.

Any of these registers may be extracted from R% by setting up a mask using AND. To get the accumulator, the least significant byte is required:

Acc = R% AND &FF

Similarly for X, Y:

X	= (R% AND &FF00) DIV &100
Y	= (R% AND &FF0000) DIV &10000

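The flags are held in the most significant byte, which is also the sign byte of R%, so they are most easily extracted by storing the whole result in memory: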

10 DIM BLOCK 3
20 !BLOCK = USR(Z)

Then:

BLOCK?0 then holds the accumulator, BLOCK?1 the X register, BLOCK?2 the Y register, and BLOCK?3 the flags.
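The same unpacking can be sketched in Python. The function name unpack_usr and the register values are hypothetical. Writing the 32-bit result out least significant byte first gives the same order as !BLOCK, so BLOCK?0 to BLOCK?3 correspond to A, X, Y and the flags. Because the flags byte here has bit 7 set, the value is negative as a 32-bit integer.

```python
def unpack_usr(r):
    """Split a USR result (a signed 32-bit BASIC integer) into A, X, Y and flags."""
    raw = (r & 0xFFFFFFFF).to_bytes(4, "little")   # least significant byte first, like !BLOCK
    a, x, y, flags = raw                              # BLOCK?0, ?1, ?2, ?3
    return {"A": a, "X": x, "Y": y, "P": flags}

# Hypothetical result: flags &B0, Y=&03, X=&02, A=&17 (negative as a 32-bit integer).
r = 0xB0030217 - (1 << 32)
print(unpack_usr(r))   # {'A': 23, 'X': 2, 'Y': 3, 'P': 176}

# The AND masks used above give the same low three bytes.
print(r & 0xFF, (r & 0xFF00) // 0x100, (r & 0xFF0000) // 0x10000)   # 23 2 3
```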

Here is a program which uses USR. The Assembly Language routine adds the numbers held in X% and A%, and gives the result in the accumulator:

10 DIM Q% 100
20 FOR I=0 TO 3 STEP 3
30 P%=Q%
40 [OPT I
50 .Start STA &80
60 TXA
70 CLC
80 CLD
90 ADC &80
100 RTS
110 ]
120 NEXT
130 INPUT"First number "A%
140 INPUT"Second number "X%
150 Registers%=USR(Start)
160 Sum%=Registers% AND &FF
170 PRINT"Sum of two numbers is ";Sum%

When RUN, you will see the following:

>RUN
0F0A	OPT I
0F0A 85 80	.Start STA &80
0F0C 8A	TXA
0F0D 18	CLC
0F0E D8	CLD
0F0F 65 80	ADC &80
0F11 60	RTS
First number	11
Second number	12
Sum of two numbers is	23

The numbers 11 and 12 are entered by the user, and are stored in the integer variables A% and X%. The USR call tells the computer to start executing the assembly routine from the label Start. Before this happens, the least significant byte of A% is placed in the accumulator, and the least significant byte of X% into the X register. The machine-code corresponding to the assembler mnemonics is now executed in sequence:

STA &80 stores the contents of the accumulator in memory location &80.

TXA transfers the contents of the X register to the accumulator.

CLC clears the carry flag prior to addition. If this is not done then a spurious carry may be added to give an incorrect result.

CLD clears the D flag so that the 6502 is working in binary mode.

ADC &80 adds the contents of the accumulator to the contents of memory location &80, plus the contents of the carry flag; and places the result in the accumulator.

RTS returns control to BASIC.

Back in the BASIC section, Registers% now contains the four 6502 registers' contents. The result is in the accumulator, so the least significant byte of Registers% is placed into Sum%, which is then printed to give the answer. Note that this routine performs only a single-byte addition, so any result given in Sum% will be MOD 256.

Execution by CALL

CALL is similar to a BASIC PROC (procedure).

Here is another addition routine:

10 DIM Q% 100
20 FOR I=0 TO 3 STEP 3
30 P%=Q%
40 [OPT I
50 .Start CLC
60 CLD
70 LDA &80
80 ADC &81
90 STA &82
100 RTS
110 ]
120 NEXT
130 INPUT"First number"number1%
140 INPUT"Second number"number2%
150 ?&80=number1%
160 ?&81=number2%
170 CALL Start
180 Sum%=?&82
190 PRINT"Sum of two numbers is ";Sum%

This program illustrates the use of the indirection operator ?. Indirection operators are very useful when calling assembly routines.

Here is a list to refresh your memory:

?&80 = J%	Will put the least significant byte of J% in location &80.
!&80 = &12345678	Will put &78 in location &80, &56 in location &81, &34 in location &82, and &12 in location &83.
$V% = "FAULT"	Will put the string "FAULT" plus a carriage return (|M, &0D) in locations starting at V%. V% must not be in zero page.
S% = ?&80	Will read the contents of location &80 (1 byte) into S%.
R% = !&87	Will read 4 bytes from locations &87 to &8A into R%; &8A being the most significant, &87 the least significant.
R$ = $&2000	Will read a string starting at &2000 into R$.

The addition program shown above has exactly the same effect as the previous example. In this instance though, the two numbers are stored into memory in the BASIC part of the program, and are added and the result stored in the Assembly Language part.

CALL may also be used with parameters, similar to PROC. This takes the form:

CALL Start,integer%,decimal,string$,?byte

The parameters are separated by commas. Start is a label, but could equally well be a specific address, &2000 for example. The above CALL shows that any kind of variable may be passed as a parameter: integer, real, string, and single-byte. When a CALL is made, the parameters are assigned to a parameter block, which starts at memory location &600. The format of this parameter block is:

Address	Contents
&600	Number of parameters
&601	1st parameter address (low)
&602	1st parameter address (high)
&603	1st parameter type
&604	2nd parameter address (low)
&605	2nd parameter address (high)
&606	2nd parameter type

There may be any number of parameters, and this number is given in the first byte of the parameter block. Following this, each parameter's address and type is given.

The type is designated by a number:

0	A single byte (e.g. ?location)
4	A 4-byte variable (e.g. Z% or !address)
5	A 5-byte variable (e.g. number)
128	A string at a defined address (e.g. $address), which must end with &D (RETURN)
129	A string variable (e.g. name$)
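Reading the block amounts to taking three bytes per parameter after the count: the address (low byte first) and then the type. The Python sketch below decodes a made-up block for two integer parameters. The function name, the type descriptions and the sample addresses &1234 and &1240 are all hypothetical.

```python
PARAMETER_TYPES = {0: "byte", 4: "integer", 5: "real",
                   128: "string at a defined address", 129: "string variable"}

def read_parameter_block(memory, base=0x600):
    """Return (address, type) for each CALL parameter in the block at base."""
    count = memory[base]
    params = []
    for i in range(count):
        entry = base + 1 + 3 * i                               # 3 bytes per parameter
        address = memory[entry] | (memory[entry + 1] << 8)     # low byte first
        params.append((address, PARAMETER_TYPES[memory[entry + 2]]))
    return params

# Hypothetical block: two integer parameters held at &1234 and &1240.
memory = bytearray(0x10000)
memory[0x600:0x607] = bytes([2, 0x34, 0x12, 4, 0x40, 0x12, 4])
for address, kind in read_parameter_block(memory):
    print(f"&{address:04X} {kind}")
```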

Given the way that the parameter block is laid out, it would seem that the best way to access the individual parameters is to use indirect addressing. Unfortunately, apart from JMP, the 6502 only allows zero-page locations to be used as indirect address pointers, so here is a routine which transfers the addresses from the parameter block into free locations in the zero-page:

	LDA &600	\Check the number of parameters.
	BEQ End	\If zero then finish.
	STA &70	\If not then store this number.
	LDX #0	\Clear the X register.
	LDY #0	\and the Y register.
.Loop	LDA &601,Y	\Take low byte of parameter address
	STA &71,X	\and store it in zero-page.
	INX	\Increment X register.
	INY	\and Y register.
	LDA &601,Y	\Take high byte of parameter address
	STA &71,X	\and store it in zero-page.
	INX	\Increment X register
	INY	\and Y register
	INY	\twice.
	DEC &70	\Decrement number of parameters.
	BNE Loop	\If still not zero then repeat.
.End	RTS	\Return to BASIC.

This routine stores the address of each parameter in zero-page memory starting at location &71. 15 parameter addresses may be stored in this way before the total user zero-page memory is filled. This routine is very useful if the number of parameters passed to a particular Assembly Language subroutine is not always the same, for it will only relocate the addresses of those parameters which exist.

Here this routine is incorporated into another addition program:

10 DIM Q% 100
20 FOR I = 0 TO 3 STEP 3
30 P%=Q%
40 [OPT I
50 .Start CLC
60 CLD
70 LDA &600
80 BEQ End
90 STA &70
100 LDX #0
110 LDY #0
120 .Loop1 LDA &601,Y
130 STA &71,X
140 INX
150 INY
160 LDA &601,Y
170 STA &71,X
180 INX
190 INY
200 INY
210 DEC &70
220 BNE Loop1
230 .End LDX #0
240 STX &2000
250 LDY &600
260 BEQ Finish
270 .Loop2 LDA (&71,X)
275 CLC
280 ADC &2000
290 STA &2000
300 INX
310 INX
320 DEY
330 BNE Loop2
340 .Finish RTS
350 ]
360 NEXT
370 INPUT"First number "one%
380 INPUT"Second number "two%
390 INPUT"Third number "three%
400 CALL Start,one%
410 Sum%=?&2000
420 PRINT Sum%
430 CALL Start,one%,two%
440 Sum%=?&2000
450 PRINT Sum%
460 CALL Start,one%,two%,three%
470 Sum%=?&2000
480 PRINT Sum%

The parameter block transfer routine ends at line 220, and the addition routine begins at line 230. Notice that the whole routine is CALLed with varying numbers of parameters, just to prove that it works. The result of adding the parameters is given in location &2000. However, as with the previous programs, the result is MOD 256.

Quadruple Precision Addition

Integer variables are stored in four consecutive bytes of memory. Groups of four bytes can be accessed using !, and can be added together. This is achieved a byte at a time, starting with the least significant, and storing each successive result:

10 DIM Q% 100
20 FOR I=0 TO 3 STEP 3
30 P%=Q%
40 [OPT I
50 .Start CLC          \Clear carry for ADC instruction
60 CLD
70 LDX #0            \Clear X register
80 LDY #4            \Set Y register to 4 as a counter
90 .Loop LDA &70,X     \Put byte from one% in accumulator
100 ADC &74,X           \Add byte from two%
110 STA &78,X           \Store the result
120 INX                \Increment X register
130 DEY                \Decrement Y register
140 BNE Loop            \If not zero then repeat
150 RTS
160 ]
170 NEXT
180 INPUT"First number "one%
190 INPUT"Second number "two%
200 !&70=one%
210 !&74=two%
220 CALL Start
230 sum%=!&78
240 PRINT"Sum of two numbers is ";sum%

This program will work with positive or negative integers.
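The reason it works for negative numbers is that twos complement addition is the same operation as unsigned addition, provided the carry is passed from each byte to the next. This Python sketch follows the loop above. It starts with the carry clear, adds the bytes least significant first, and keeps the carry for the next byte. The helper names are hypothetical.

```python
def add_multibyte(a_bytes, b_bytes):
    """Add two byte sequences, least significant first, passing the carry along."""
    carry = 0                              # CLC before the loop
    result = []
    for a, b in zip(a_bytes, b_bytes):     # least significant byte first
        total = a + b + carry              # ADC
        result.append(total & 0xFF)        # STA
        carry = total >> 8                 # carry into the next byte (0 or 1)
    return bytes(result)

def to_bytes32(n):
    return (n & 0xFFFFFFFF).to_bytes(4, "little")    # like !&70 = n

def from_bytes32(b):
    return int.from_bytes(b, "little", signed=True)  # like sum% = !&78

print(from_bytes32(add_multibyte(to_bytes32(1000), to_bytes32(-1))))   # 999
```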

Multiplication

The 6502 does not have a multiply instruction. Multiplication is achieved by adding and shifting, just like ordinary decimal long-multiplication. As a simple example, take the multiplication of two 4-bit numbers. Such a multiplication can give an 8-bit result:

(i) Test the rightmost bit of the multiplier. If it is zero then add 0000 to the most significant end of the result. If it is 1 then add the number to be multiplied to the most significant end of the result.
(ii) Shift the result one bit position to the right. Repeat (i) for the next bit of the multiplier.

Applying the above to 1101*1001, the rightmost bit of the multiplier (1001) is 1. Therefore 1101 is added to the most significant end of the result:

1101

Shift the result right one bit position:

01101

The next bit of the multiplier is zero, so 0000 is added to the result, and it is again shifted right.

001101

The next bit is again zero:

0001101

The final bit is 1, so 1101 is added to the result, and the final shift is performed:

01110101

Notice that for 4-bit multiplication four shifts are required; 8-bit multiplication will require eight shifts, 16-bit multiplication 16 shifts, and so on.
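The method can be written for any number of bits as a short Python sketch with a hypothetical function name. Any carry out of the top of the result after an addition is shifted back in from the left. This is what the ROR of the result high byte achieves in the 6502 program that follows.

```python
def shift_add_multiply(multiplicand, multiplier, bits):
    """Multiply two unsigned n-bit numbers by adding and shifting right."""
    result = 0                                    # 2n-bit result, initially zero
    for _ in range(bits):
        carry = 0
        if multiplier & 1:                        # test the rightmost bit
            result += multiplicand << bits        # add to the most significant end
            carry = result >> (2 * bits)          # any carry out of the top
            result &= (1 << (2 * bits)) - 1
        # shift right one place; the carry enters from the left
        result = (result >> 1) | (carry << (2 * bits - 1))
        multiplier >>= 1                          # move on to the next bit
    return result

print(format(shift_add_multiply(0b1101, 0b1001, 4), "08b"))   # 01110101
```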

To put the above routine into practice on the 6502, the shift and rotate instructions are used. Here is a program to multiply two 8-bit numbers:

10 DIM Q% 100
20 FOR I=0 TO 3 STEP 3
30 P%=Q%
40 [ OPT I
50 .Start CLD
60 LDA #0
70 STA &72           \Clear 16-bits
80 STA &73           \for the result.
90 LDY #8            \Set Y to 8 as a counter.
100 .Loop LSR &71     \Shift multiplier right one bit.
110 BCC Noadd         \Test this bit. Branch if zero (carry clear).
120 CLC               \Clear carry prior to addition.
130 LDA &70           \Load accumulator with number to be multiplied.
140 ADC &73           \Add most significant byte of result.
150 STA &73           \Store in most significant byte of result.
160 .Noadd ROR &73    \Shift result right, with carry from addition.
170 ROR &72           \Carry enters the result low byte.
180 DEY               \Decrement counter.
190 BNE Loop          \Repeat if not zero.
200 RTS
210 ]
220 NEXT
230 INPUT"First number "one%
240 INPUT"Second number "two%
250 ?&70=one%
260 ?&71=two%
270 CALL Start
280 Product%=?&72+256*?&73
290 PRINT"Product of two numbers is ";Product%

This routine is not the most efficient way of multiplying two bytes together, but it illustrates the method clearly:

Lines 60, 70 and 80 clear the two bytes in memory which will be used for the result of the multiplication. These locations are &72 (result low byte) and &73 (result high byte).

Lines 250 and 260 store the numbers to be multiplied in locations &70 and &71. It doesn't matter which of these is chosen to be the multiplier; the example uses the number in &71.

Line 90 sets the Y register to 8 as a counter. Because this is an 8-bit multiplication, eight shifts are required.

Line 100 shifts the multiplier right one bit position. The rightmost bit falls into the carry where it can be tested.

Line 110 carries out the test. If the C bit is zero, the program branches to Noadd; if it is 1, the number in &70 is added to the result high byte (&73).

Lines 120 to 150 accomplish this addition, by clearing the carry bit, loading the accumulator from &70, adding the result high byte, and then storing back in the result high byte.

Line 160, labelled Noadd, rotates the result high byte right one bit position. The carry from the addition in line 140 (or 0, if the addition was skipped) is entered from the left, and the rightmost bit falls into the carry.

Line 170 rotates the result low byte right one bit position. The leftmost bit from the high byte, now in the carry, enters the low byte from the left.

Lines 180 and 190 decrement the counter and repeat the above process until the counter is zero.

The program will give the result of multiplying two positive integers, each between 0 and 255. You can see how many instructions it takes just to do this, and can imagine the complexity of a BASIC statement when it is interpreted into machine-code.
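To see each instruction's effect on the carry flag and on memory, the listing can be modelled one instruction at a time. This is a Python sketch, not part of the original program: the dictionary `mem` stands in for locations &70 to &73, `carry` stands in for the C flag, and line 110 is read as BCC Noadd (branch if carry clear), which is what the explanation above describes.

```python
def ror(value, carry):
    """ROR: rotate an 8-bit value right through the carry flag.
    Returns (new value, new carry)."""
    return (carry << 7) | (value >> 1), value & 1


def multiply_program(num70, num71):
    """Step-by-step model of the 8-bit multiply listing (lines 50-200).

    `mem` is a hypothetical stand-in for the zero-page bytes &70-&73.
    """
    mem = {0x70: num70 & 0xFF, 0x71: num71 & 0xFF}
    mem[0x72] = mem[0x73] = 0                 # lines 60-80: clear result
    for _ in range(8):                        # line 90 LDY #8; lines 180-190
        carry = mem[0x71] & 1                 # line 100: LSR &71
        mem[0x71] >>= 1
        if carry:                             # line 110: branch not taken
            total = mem[0x70] + mem[0x73]     # lines 120-140: CLC, LDA, ADC
            mem[0x73] = total & 0xFF          # line 150: STA &73
            carry = total >> 8                # carry out of the addition
        mem[0x73], carry = ror(mem[0x73], carry)  # line 160: ROR &73
        mem[0x72], carry = ror(mem[0x72], carry)  # line 170: ROR &72
    return mem[0x72] + 256 * mem[0x73]        # as line 280 does in BASIC


print(multiply_program(13, 9))     # 117
print(multiply_program(255, 255))  # 65025
```

Checking every pair from 0 to 255 against ordinary multiplication is a quick way to confirm that the routine is correct over its whole range.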

A shorter routine to multiply two bytes uses the accumulator as the result high byte, and the multiplier location as the result low byte. As each bit of the multiplier is shifted into the carry to be tested, the leftmost bit of the multiplier location becomes vacant, allowing the result to be shifted in.

.Start	CLD
	LDA #0	\Clear result high byte
	LDY #8	\Set shift counter.
.Loop	ROR &71	\Shift multiplier right one bit.
	BCC Noadd	\Test this bit. Branch if zero.
	CLC	\Clear carry prior to addition.
	ADC &70	\Add number to be multiplied.
.Noadd	ROR A	\Shift result right, with carry from addition
	DEY	\Decrement counter.
	BNE Loop	\Repeat if not zero.
	ROR &71	\Final shift of result
	STA &72	\Store result high byte.
	RTS

Before using this routine, the two bytes to be multiplied are placed in locations &70 and &71. The result appears in &71 (low byte) and &72 (high byte).
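One detail of the shorter routine is easy to overlook: CLD does not clear the carry flag, so the first ROR &71 rotates an unknown bit into the top of &71. The Python sketch below (hypothetical names; `carry` is the flag's state on entry) models the routine and shows that this stray bit is rotated back out by the final ROR &71, so the result does not depend on it.

```python
def ror(value, carry):
    """ROR: rotate an 8-bit value right through the carry flag."""
    return (carry << 7) | (value >> 1), value & 1


def short_multiply(num70, num71, carry=0):
    """Model of the shorter routine; `carry` is the flag's state on entry
    (CLD leaves it unchanged). Returns (&71 low byte, &72 high byte)."""
    a = 0                                  # LDA #0: A holds the result high byte
    m = num71 & 0xFF                       # &71: multiplier, then result low byte
    for _ in range(8):                     # LDY #8 ... DEY / BNE Loop
        m, carry = ror(m, carry)           # ROR &71: next multiplier bit -> carry
        if carry:                          # BCC Noadd not taken
            total = a + (num70 & 0xFF)     # CLC / ADC &70
            a, carry = total & 0xFF, total >> 8
        a, carry = ror(a, carry)           # .Noadd ROR A
    m, carry = ror(m, carry)               # final ROR &71
    return m, a                            # STA &72 stores the high byte


low, high = short_multiply(200, 100)
print(low + 256 * high)                    # 20000
```

Here the low byte is 32 and the high byte is 78, and 78*256 + 32 = 20000.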

To multiply two 4-byte numbers together, the additions and shifts must act on each byte in turn, and the total number of shifts must be 32.

.Start	CLD
	LDX #8	\Clear
	LDA #0	\eight
.Clear	STA &77,X	\bytes
	DEX	\for
	BNE Clear	\result.
	LDY #32	\Set shift counter.
.Loop	LSR &77	\Shift four bytes
	ROR &76	\of multiplier
	ROR &75	\right
	ROR &74	\one bit.
	BCC Noadd	\Test this bit. Branch if zero.
	CLC	\Clear carry prior to addition.
	LDA &70	\Add the
	ADC &7C	\4-byte number
	STA &7C	\to be multiplied
	LDA &71	\to the
	ADC &7D	\high four bytes
	STA &7D	\of the result
	LDA &72	\and store.
	ADC &7E	\"
	STA &7E	\"
	LDA &73	\"
	ADC &7F	\"
	STA &7F	\"
.Noadd	LDX #8	\Shift eight bytes
.Shift	ROR &77,X	\of result right
	DEX	\one bit, from
	BNE Shift	\&7F down to &78.
	DEY	\Decrement counter.
	BNE Loop	\Repeat if not zero.
	RTS

Before using this routine, place the number to be multiplied in !&70 and the multiplier in !&74. The full 64-bit product occupies the eight bytes from &78 to &7F; its four least significant bytes, &78 (least significant) to &7B, are accessed as !&78. Provided the product fits in a BASIC integer, !&78 gives the correct result for both positive and negative integers.
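The 4-byte routine follows the same pattern, with the shifts and additions spread across several bytes. The Python sketch below models it on a hypothetical 256-byte zero page, assuming the counter at Noadd is X (LDX #8), since the Shift loop decrements X with DEX. Negative numbers are stored in two's complement, as BASIC stores integers for `!`, which is why the low four bytes give the correct signed product whenever it fits in a BASIC integer.

```python
def ror(value, carry):
    """ROR: rotate an 8-bit value right through the carry flag."""
    return (carry << 7) | (value >> 1), value & 1


def multiply32(number, multiplier):
    """Model of the 4-byte routine on a hypothetical 256-byte zero page.

    Returns (64-bit product in &78-&7F, signed value of !&78).
    """
    zp = bytearray(256)
    zp[0x70:0x74] = (number & 0xFFFFFFFF).to_bytes(4, "little")      # !&70
    zp[0x74:0x78] = (multiplier & 0xFFFFFFFF).to_bytes(4, "little")  # !&74
    for x in range(8, 0, -1):              # .Clear: STA &77,X for X = 8..1
        zp[0x77 + x] = 0
    for _ in range(32):                    # LDY #32 ... DEY / BNE Loop
        carry = 0                          # LSR &77 shifts in a zero
        for addr in (0x77, 0x76, 0x75, 0x74):
            zp[addr], carry = ror(zp[addr], carry)
        if carry:                          # BCC Noadd not taken
            carry = 0                      # CLC
            for i in range(4):             # LDA &70+i / ADC &7C+i / STA &7C+i
                total = zp[0x70 + i] + zp[0x7C + i] + carry
                zp[0x7C + i], carry = total & 0xFF, total >> 8
        for x in range(8, 0, -1):          # .Noadd LDX #8 / .Shift ROR &77,X
            zp[0x77 + x], carry = ror(zp[0x77 + x], carry)  # &7F down to &78
    full = int.from_bytes(zp[0x78:0x80], "little")
    low = int.from_bytes(zp[0x78:0x7C], "little", signed=True)
    return full, low


print(multiply32(-7, 6)[1])        # -42
print(multiply32(123456, 789)[1])  # 97406784
```

The full 64-bit product is returned as well, because for large numbers the answer overflows into &7C to &7F and !&78 alone no longer holds it.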

Division

Division is accomplished as the reverse of multiplication. Since 8-bit multiplication gave a 16-bit result, a 16-bit numerator divided by an 8-bit denominator will give an 8-bit result. The numerator is stored in two bytes of memory. On each step it is shifted left one bit position, and the numerator high byte is then loaded into the accumulator.

If the shift produced a carry, the denominator is subtracted from the accumulator, the accumulator contents are stored back in the numerator high byte, and a 1 is shifted left into the result.

If the shift did not produce a carry, the denominator is still subtracted from the accumulator as a trial. If this subtraction produces a carry (meaning that no borrow was needed), the accumulator contents are stored in the numerator high byte and a 1 is shifted left into the result. If there is no carry, the numerator high byte is left unchanged and a 0 is shifted left into the result.

This whole process is repeated eight times. The division program is as follows:

10	DIM Q% 100
20	FOR I=0 TO 3 STEP 3
30	P%=Q%
40	[OPT I
50	.Start CLD
60	LDY #8	\Set shift counter.
70	.Loop ASL &72	\Shift numerator
80	ROL &73	\left one bit.
90	LDA &73	\Load numerator high byte into accumulator.
100	BCC Label	\Test carry produced by shift.
110	SBC &71	\Subtract denominator and
120	STA &73	\store in numerator high byte.
130	SEC	\Set carry prior to shifting into result
140	JMP Shift	\Go to Shift.
150	.Label SEC	\Set carry prior to subtraction.
160	SBC &71	\Subtract denominator
170	BCC Shift	\and test carry.
180	STA &73	\Store in numerator high byte.
190	.Shift ROL &70	\Shift either 0 or 1 into result.
200	DEY	\Decrement counter.
210	BNE Loop	\Repeat if not zero.
220	RTS
230	]
240	NEXT
250	INPUT "Numerator"numerator%
260	INPUT "Denominator"denominator%
270	P%=&71
280	[OPT 3
290	EQUB denominator%	\Store denominator at location &71.
300	EQUW numerator%	\Store numerator at locations &72 and &73.
310	RTS
320	]
330	CALL Start
340	PRINT"Quotient is ";?&70
350	PRINT"Remainder is ";?&73

In this routine, the denominator is stored at location &71, and the numerator in the two bytes &72 (low byte) and &73 (high byte). The result appears in &70, and any remainder is left in &73. Remember, this is a 16-bit by 8-bit division, so the denominator may not be greater than 255 and the result must be 255 or less. For a valid result, the numerator must therefore be less than 256 times the denominator; in other words, the numerator high byte must be smaller than the denominator.
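The division steps can be traced in the same way. In this Python sketch (a hypothetical function, an illustration rather than part of the program) `carry` models the 6502 C flag, which after SBC is set when no borrow was needed. It assumes the numerator is less than 256 times the denominator, so that the quotient fits in one byte.

```python
def divide_program(numerator, denominator):
    """Model of the 16-bit by 8-bit division listing (lines 50-220).

    Returns (quotient from &70, remainder from &73). Only valid when
    numerator < 256 * denominator.
    """
    d = denominator & 0xFF                  # &71
    lo = numerator & 0xFF                   # &72
    hi = (numerator >> 8) & 0xFF            # &73
    quotient = 0                            # &70: old bits are shifted out anyway
    for _ in range(8):                      # line 60 LDY #8; lines 200-210
        carry, lo = lo >> 7, (lo << 1) & 0xFF            # line 70: ASL &72
        carry, hi = hi >> 7, ((hi << 1) | carry) & 0xFF  # line 80: ROL &73
        a = hi                                           # line 90: LDA &73
        if carry:                           # line 100: BCC Label not taken
            hi = (a - d) & 0xFF             # lines 110-120: SBC &71, STA &73
            carry = 1                       # line 130: SEC
        else:
            diff = a - d                    # lines 150-160: SEC, SBC &71
            carry = 1 if diff >= 0 else 0   # carry set means "no borrow"
            if carry:                       # line 170: BCC Shift not taken
                hi = diff                   # line 180: STA &73
        quotient = ((quotient << 1) | carry) & 0xFF      # line 190: ROL &70
    return quotient, hi


print(divide_program(1000, 7))     # (142, 6)
print(divide_program(1234, 0)[0])  # 255
```

The first call gives 142 remainder 6, since 7*142 + 6 = 1000. With a denominator of 0 the quotient comes out as 255, because every trial subtraction succeeds.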

The short routine from lines 280 to 320 is used to store the data in memory, and contains some instructions which you have not yet seen or used. EQUB and EQUW are in the same class of instruction as OPT, in that they are used in the Assembly Language part of the program but are not assembly instructions. They are used simply to store data at the location(s) at which they appear when assembled into machine-code. You will see this clearly when you RUN the above program. After you have typed in the numerator and denominator you will see a listing of the machine-code from &0071 to &0074.

There are in fact four EQU instructions:

EQUB stores a byte of data.
EQUW stores a word of data (2 bytes).
EQUD stores a double-word of data (4 bytes).
EQUS stores the ASCII representation of a string.

EQUS is illustrated in the next section on error handling in Assembly Language.

Notice in the program example above how putting P% equal to &71 enables the denominator to be stored in &71 using EQUB, and the numerator to be stored in &72 and &73 using EQUW. EQUD may be used to store the contents of a full BASIC integer variable. (You may use EQUB instead of ?, and EQUD instead of !.)
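EQUW and EQUD store their values low byte first, the same order the 6502 itself uses for 16-bit addresses; that is why the numerator's low byte lands in &72 and its high byte in &73. The Python sketch below imitates this with the standard `struct` module. `assemble_data` is a hypothetical helper, and the handling of out-of-range values (keeping only the low bytes) is an assumption.

```python
import struct

# struct formats for the data directives: "<" means low byte first
FORMATS = {"EQUB": "<B", "EQUW": "<H", "EQUD": "<I"}


def assemble_data(p, directives):
    """Hypothetical model of EQUB/EQUW/EQUD placing data at P%.

    Assumes a value too large for its size is cut to its low bytes, and
    negative values are stored in two's complement. Returns the bytes
    written (address -> value) and the updated P%.
    """
    memory = {}
    for name, value in directives:
        fmt = FORMATS[name]
        size = struct.calcsize(fmt)
        data = struct.pack(fmt, value & ((1 << (8 * size)) - 1))
        for offset, byte in enumerate(data):
            memory[p + offset] = byte
        p += size
    return memory, p


memory, p = assemble_data(0x71, [("EQUB", 7), ("EQUW", 1000)])
print({hex(addr): hex(byte) for addr, byte in memory.items()})
print(hex(p))   # 0x74
```

This stores 7 at &71, &E8 at &72 and &03 at &73 (1000 is &03E8), and P% advances to &74, where the listing then assembles its RTS.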

Error Trapping in Assembler

The assembler will tell you of any mistakes which you make in typing in programs (syntax errors), and some errors associated with BASIC variables during assembly, but there is no such thing as a run-time error in machine-code: you just have to fathom it out line by line. However, it is possible for you to trap errors generated while a machine-code program is running by using the BRK instruction. As an example, take the division program described in the previous section. Everyone knows that it is not possible to divide by zero, but the program does not know this. If you try to do so it unwittingly gives the answer 255.

It is simple to test the denominator before the division is started, and then to branch to an error routine. The whole program is not repeated here, but the following lines may be added:

53 LDA &71
56 BEQ Error

222 .Error BRK
224 EQUB 18
226 EQUS "Division by zero"
228 BRK

If you now run the program with a zero denominator, it will stop and print the message:

Division by zero at line 330

You can also type:

PRINT ERR

followed by RETURN, upon which it will give the correct error number, 18.

Any error message must take the following form:

BRK
EQUB errornumber (ERR)
EQUS "message"
BRK
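The error form is simply a sequence of bytes: the BRK opcode (&00), the error number, the message text in ASCII, and a final zero byte (the opcode of BRK) marking the end of the message. A Python sketch of that layout, using the hypothetical helper name `error_block`:

```python
def error_block(error_number, message):
    """Bytes produced by BRK / EQUB error_number / EQUS "message" / BRK."""
    if not 0 <= error_number <= 255:
        raise ValueError("EQUB stores a single byte")
    text = message.encode("ascii")
    if 0 in text:
        raise ValueError("a zero byte would end the message early")
    return bytes([0x00, error_number]) + text + bytes([0x00])


block = error_block(18, "Division by zero")
print(block.hex(" "))  # 00 12 44 69 76 69 73 69 6f 6e 20 62 79 20 7a 65 72 6f 00
```

For the division example this gives 19 bytes, where &12 is 18 in hexadecimal.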

Operating System Calls from Assembler

All the Operating System calls available from BASIC, and many more, are available from a machine-code program. These routines are always accessed using a JSR to an address in the Operating System, and usually involve passing one or more parameters: in the accumulator (for one parameter), in the accumulator and the X and Y registers (for two or three), or in a parameter block in memory (for more than three).

Here is a table showing all the Operating System calls available.

Routine   Routine   Vector   Vector   Summary of function
name      address   name     address
                    UPTV     &222     User print routine
                    EVNTV    &220     Event interrupt
                    FSCV     &21E     File system control entry
OSFIND    &FFCE     FINDV    &21C     Open or close a file
OSBPUT    &FFD4     BPUTV    &218     Save a single byte to file from A
OSBGET    &FFD7     BGETV    &216     Load a single byte to A from file
OSARGS    &FFDA     ARGSV    &214     Load or save data about a file
OSFILE    &FFDD     FILEV    &212     Load or save a complete file
OSRDCH    &FFE0     RDCHV    &210     Read character (from keyboard) to A
OSASCI    &FFE3     -        -        Write a character (to screen) from A, plus LF if (A)=&0D
OSNEWL    &FFE7     -        -        Write LF, CR (&0A, &0D) to screen
OSWRCH    &FFEE     WRCHV    &20E     Write character (to screen) from A
OSWORD    &FFF1     WORDV    &20C     Perform miscellaneous OS operation using control block to pass parameters
OSBYTE    &FFF4     BYTEV    &20A     Perform miscellaneous OS operation using registers to pass parameters
OSCLI     &FFF7     CLIV     &208     Interpret the command line given

When you use one of these routines, you must use a JSR to the corresponding address shown in the second column. For example, OSWRCH is called from assembler by typing:

JSR &FFEE

The routine stored at &FFEE uses the OSWRCH vector address, shown in the fourth column, as an indirect pointer to the actual location of the OSWRCH routine.

The reason for this is twofold:

The actual address of the OSWRCH routine may be altered by the manufacturer without affecting the Operating System subroutine call in any way. JSR &FFEE will always give an OSWRCH call even though the address held in locations &20E and &20F may not be the same on every machine.
The user can alter the address held in the vector location (for OSWRCH, locations &20E and &20F, in page 2 of memory) and so trap any call of that particular Operating System routine, redirecting such a call to the user's own routine anywhere in memory.
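The indirection can be modelled in a few lines. In this Python sketch, `memory` stands for the 64K address space, a dictionary maps addresses to Python functions standing in for machine code, and the addresses &E000 (for the Operating System's own routine) and &0C00 (for a user routine) are hypothetical, chosen only for illustration. The user routine remembers the old vector contents and passes each character on to that routine, which is one way to trap a call without replacing the original routine entirely.

```python
WRCHV = 0x020E                  # OSWRCH vector: low byte &20E, high byte &20F

memory = bytearray(0x10000)     # the 64K memory map
routines = {}                   # address -> Python stand-in for the code there
screen = []                     # characters "written to the screen"


def set_vector(vector, address):
    memory[vector] = address & 0xFF         # low byte first
    memory[vector + 1] = address >> 8       # then high byte


def read_vector(vector):
    return memory[vector] | (memory[vector + 1] << 8)


def jsr_ffee(char):
    """JSR &FFEE: jump indirectly through whatever address WRCHV holds."""
    return routines[read_vector(WRCHV)](char)


OS_WRCH = 0xE000                # hypothetical address of the OS routine
routines[OS_WRCH] = lambda c: screen.append(chr(c))
set_vector(WRCHV, OS_WRCH)
for c in b"ok":
    jsr_ffee(c)

USER_WRCH = 0x0C00              # hypothetical address of a user routine
previous = read_vector(WRCHV)   # remember where the vector pointed


def user_wrch(c):
    """Trap every OSWRCH call: capitalise letters, then pass them on."""
    if ord("a") <= c <= ord("z"):
        c -= 32
    return routines[previous](c)


routines[USER_WRCH] = user_wrch
set_vector(WRCHV, USER_WRCH)
for c in b"ok":
    jsr_ffee(c)

print("".join(screen))          # okOK
```

The first two characters go straight to the original routine and the last two pass through the user's routine, even though all four were written with the same JSR &FFEE.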