Compression Techniques

By Richard Dimond

Originally published in EUG #18

These techniques can save both memory and disk space and are useful for getting as much as possible on a disk, especially with the limitations of DFS. As you need to manipulate them yourself, they cannot be called directly from the Utilities Menu but are all stored in a separate disk directory, C.

Screen Compression

Screen compression was explained in Electron User (Aug 1989) and also in Acorn User (December 1992). The Electron User program is in BASIC and works in Mode 2 only. As it reads from and writes to the disk directly, it is rather slow. The Acorn User program is a m/code routine which writes the compressed data to an address below the screen area. This will work in all Modes, but in Modes 0 to 2 there is little memory for the compressed file, especially if PAGE is at &1D00 with the Plus 3. The routine can then only be used for screens which will compress enough to fit in this area. For this reason, I have modified the routine and added coding so that the file can be written to and read from the disk.

These programs rely on the fact that a lot of screen data consists of strings of the same character, and these can be reproduced using a loop in the program. The file uses a flag which indicates a branch to the loop and is followed by the number of characters and the character code, so a string of up to 255 identical characters can be reduced to three bytes. Thus for some screens you can get quite a saving of memory, but the more detailed a screen is, the less compression you get.
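The flag, count and character scheme described above can be sketched in Python. This is only an illustration and not the m/code routine itself: the function names are made up, the flag value is assumed to be a byte that does not occur anywhere in the screen data, and the rule of only packing runs longer than three bytes is an assumption (a shorter run gains nothing from a three-byte record).

```python
def rle_compress(data: bytes, flag: int) -> bytes:
    """Pack runs of one repeated byte as three bytes: flag, count, value.

    Assumes `flag` does not occur anywhere in `data`.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Measure the run, stopping at 255 so the count fits in one byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if run > 3:
            out += bytes((flag, run, value))   # three-byte record
        else:
            out += bytes([value]) * run        # too short to be worth packing
        i += run
    return bytes(out)


def rle_expand(packed: bytes, flag: int) -> bytes:
    """Reverse rle_compress: a flag byte starts a (count, value) record."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == flag:
            count, value = packed[i + 1], packed[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

A run of 255 zero bytes packs into the three bytes flag, 255, 0, while a detailed screen with few long runs is copied through almost unchanged.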

There are two sets of files. Those with suffix 1 load the compressed file into memory; this is the preferred method, as the screen is expanded more quickly and without the interruptions of reading the disk.

The other set, with suffix 2, reads and writes the disk and needs to be used for those screens which do not compress sufficiently. Both the m/code files first look for an unused number to use for the flag, and if none is found the screen cannot be compressed. It is very rare that all 256 numbers are used!
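The search for a flag can be pictured with a short Python sketch. The function name is hypothetical, and taking the lowest unused value is an assumption; the m/code may choose differently.

```python
def find_unused_byte(data: bytes):
    """Return a byte value (0-255) that never occurs in data, or None."""
    used = set(data)
    for candidate in range(256):
        if candidate not in used:
            return candidate   # lowest unused value (an assumption)
    return None                # all 256 values occur: cannot compress
```

If the function returns None, every one of the 256 values appears in the screen and no flag is available.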

The UNCOMP programs expand the compressed data and the COMPT programs compress a screen file.

The CPCODE programs are the m/code files for the compression and expansion routines. The COMP files are the assembler programs for these files.

Set 1

COMPT1, CPCODE1 and UNCOMP1 are for those screens which will compress sufficiently for the file to be loaded below the screen area.

The compression program COMPT1 first asks for the Mode number and the file to load. It will then compress the file; some files may overflow into the screen area, and then the screen reverts to Mode 6 before giving the length of the file and the percentage of compression. It then asks for a filename to save it under.

UNCOMP1 is similarly used. HIMEM is set to PAGE + &180 to give the maximum amount of memory for LOADing the compressed file even if PAGE is at &1D00. Should the file be too long, the routine will usually still work but the lower part of the expanded screen will be spoilt as the expanded screen has overwritten the data in the file. This can be overcome by using UNCOMP2.
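To give an idea of how much room this leaves, here is a rough Python calculation. It assumes the compressed file is loaded from HIMEM up to the start of screen memory, which is a reading of the description above rather than something checked against the program. The function name is hypothetical; the screen start addresses are the standard ones for each Mode.

```python
# Start of screen memory in each Mode.
SCREEN_START = {0: 0x3000, 1: 0x3000, 2: 0x3000, 3: 0x4000,
                4: 0x5800, 5: 0x5800, 6: 0x6000}


def room_below_screen(page: int, mode: int) -> int:
    """Bytes between HIMEM (PAGE + &180) and the start of the screen."""
    himem = page + 0x180
    return SCREEN_START[mode] - himem


# With PAGE at &1D00 in Mode 2 the room is &1180 bytes.
print(hex(room_below_screen(0x1D00, 2)))
```

Under these assumptions a Mode 2 screen of &5000 bytes has to compress to under a quarter of its size to fit when PAGE is at &1D00.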

Set 2

COMPT2, CPCODE2 and UNCOMP2 write and read the disk and must be used when the compressed screen cannot be loaded below the screen area.

COMPT2 is used in the same way but asks for the filename to save before compressing the screen. The file is then SAVEd to disk without giving the information on length and percentage.

Text Compression

Text compression was explained in Electron User (Aug 1987) in relation to saving memory in the adventure program series "Demon Data". The programs given could only be used for text of up to 255 characters. I have made some changes to these programs so that a whole View file can be compressed. I have also rewritten the decode program in m/code so that it prints out much more quickly.

The program ENCODE compresses the text file and is a BASIC program and so is rather slow. DECODE expands a compressed file and uses the m/code file DecM/C. DecMsrc is the assembler file for this code.

The explanation in the EU article is rather detailed, but to give a brief idea of how it works: it uses the high-nibble values greater than 7, which ordinary text characters never have, for the seven most common letters and space, which are held in the string " etaonri". These characters are looked up using INSTR and so each takes a nibble rather than a byte and, when the nibbles are paired up as bytes, a saving of up to 20-25% in memory can be made.
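The idea can be tried out with a small Python sketch. It is not the ENCODE or DECODE program, and the exact layout used in the EU article may well differ. Here the text is assumed to become a stream of nibbles: values 8 to 15 stand for a character in " etaonri", any other character is written as the two nibbles of its 7-bit code, and a spare nibble of 0 pads an odd-length stream. Python's str.find stands in for INSTR, and the function names are made up.

```python
COMMON = " etaonri"   # space plus the seven most common letters


def encode(text: str) -> bytes:
    """Pack text so that each character in COMMON takes only one nibble."""
    nibbles = []
    for ch in text:
        pos = COMMON.find(ch)            # plays the part of INSTR
        if pos >= 0:
            nibbles.append(8 + pos)      # 8..15 means a common character
        else:
            code = ord(ch)
            if code > 127:
                raise ValueError("only 7-bit characters are supported")
            nibbles.append(code >> 4)    # high nibble, always 0..7
            nibbles.append(code & 0x0F)  # low nibble
    if len(nibbles) % 2:
        nibbles.append(0)                # pad to a whole number of bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def decode(packed: bytes) -> str:
    """Reverse encode()."""
    nibbles = []
    for byte in packed:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    out = []
    i = 0
    while i < len(nibbles):
        n = nibbles[i]
        if n >= 8:
            out.append(COMMON[n - 8])
            i += 1
        elif i + 1 < len(nibbles):
            out.append(chr((n << 4) | nibbles[i + 1]))
            i += 2
        else:
            break                        # lone final nibble is padding
    return "".join(out)
```

In this sketch "the rain in spain" (17 characters) packs into 10 bytes, as 14 of its characters are in the common set. Ordinary text has a smaller share of common characters, so the saving is less.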

Richard Dimond, EUG #18