Program Compactor
Edward H. Carlson
Okemos MI
There are two evils that sneak up on you as your programs attain moderate length. The programs begin to take up too much space in memory, and they become increasingly obtuse. These evils combine in positive feedback. Increased internal documentation by REMark statements is a partial antidote to program complexity but this, of course, compounds the problem of fitting the whole glob into memory.
The answer is to have two copies of each program: a "working copy" occupying minimum space, and a fully documented "archives copy" that may, in fact, be too long to be run in your machine. (It must be short enough to fit in your memory as source code, sans variable tables, of course.)
But consider, you say, how much finger tapping, eyeball twitching, and obsessive concentration it takes to go through a program and remove all the REMarks, especially those buried inside lines of active code, and most especially those "invisible" REM statements like this one:
120 GOTO 232 : NO "REM" NEEDED HERE
(In such a statement, the BASIC interpreter jumps to another line before passing the colon and so does not detect the syntax error caused by the omitted REM.)
Looky here, I say, repetitive decisions are just what the logic machine was invented to perform! One needs to write a "program compacting" program, and that is just what I have done, showing my results in Listing 1. The program was written for use with my Ohio Scientific C2-4P, but should work with little change in other Microsoft BASIC machines, such as the PET. All that needs changing is the starting address of the source code, $0300 for OSI BASIC, and the numerical values of the tokens, which differ in the OSI and PET versions of Microsoft BASIC.
This compactor is a moderately complex program in itself. It is put at high line numbers so as to be out of the way of any program you are writing. When you have a version of your own program that needs compacting, first save it to tape, then read in the Compactor (from a tape that does not have the Test Program in front). Do a "RUN 62000". The compacted program will then be POKEd into memory, ready to SAVE to tape or to RUN. The Compactor will still be in memory, but now invisible to you and inaccessible to BASIC.
Listing 1 starts off with a very short Test Program that has most of the features that would give trouble in a poorly contrived Compactor. Then follows the compactor itself. After initializing addresses, etc., a loop over I is started. Each time through, one line is compacted. Line 62036 contains the exit from the compacting process. This occurs when the line number to be processed is above 9999. You may wish to change this, but all my programs use only line numbers below 9000. Next, leading colons and spaces are removed. I haven't used such things in my own code, but it is legal and so I include that case in the Compactor Program.
Following these preliminaries, the program enters a loop over K, at line 62050, which walks through a single line. Spaces are removed, and the line is terminated if a REM, STOP, RETURN, or GOTO is encountered. The compacted line is stored in an array called L(I). This is an artifact left over from the program construction period. Before allowing my infant program to actually POKE into tender source code memory, I had it make a string and print it. Upon reaching voting age, the string became the L array.
During all this, it is necessary to keep a sharp eye out for quotation marks, as you do not want to alter any of the text inside them. Line 62080 detects opening quotes and jumps to a routine to march along looking for the closing quotes so that control can be returned to the main loop. If a colon or a null is found before the closing quotation marks, the statement or line has terminated and analysis of the next is begun.
Every Microsoft BASIC line ends in a null. Detection of a null character sends control to the top of the "I" loop. The next command sends control to the subroutine at 62600 where the compacted line is POKEd into memory. Some tricky address changing is needed here. The first 2 bytes of a BASIC line are a pointer to the start of the next BASIC line. This chain of pointers must remain intact during interpretation of any part of the Compactor that would do a line number search. Such a search would start at the first line of the program to be compacted, even though the code being interpreted is all above line number 62000. So lines 62601 and 2 pick up the starting address of the code that is next to be compacted in POKEs it into the first two bytes of the newly compacted line.
Program Compactor
1 A = 1 : REM *** TEST PROGRAM *** 2 REM "RUN 62000" TO COMPACT THE TEST PROGRAM 3 : ::C = 3 : D = 4 : REM AAAAA 4 END : DON'T SEE THIS AFTER COMPACTION 5 RETURN : NOR THIS 6 GOTO 11111 : NOR THIS 7 A$ = "SEE THIS" : REM NOT THIS 999 STOP 62000 REM 62001 REM *** COMPACTOR *** 62002 REM 62003 REM Edward H. Carlson 62004 REM 3872 Raleigh Drive 62005 REM Okemos MI 48864 62006 REM (517) 349-1219 62007 REM 62010 PRINT : PRINT : PRINT "COMPACTING" : PRINT : PRINT 62015 DIM L(80):A = 3 * 256 : AP = A + 1 : AD = A - 3 62020 FOR I = 1 TO 9999 : A = A + 4 : REM NEW LINE 62025 IF L<>0 THEN GOSUB 62600 62035 L = PEEK(A - 1) + PEEK(A) * 256 : AN = 0 62036 IF L>9999 THEN POKE AP, 0 : POKE AP + 1, 0 : END 62039 REM REMOVE LEADING COLONS AND SPACES 62040 A = A + 1 : B = PEEK(A) : IF (B = 32) OR (B = 58) THEN 62040 62050 A = A - 1 : FOR K = l TO 255 : A = A + 1 : B = PEEK(A) 62060 IF B = 0 THEN NEXT I 62065 IF B = 142 THEN GOTO 62100 62068 IF (B = 128) OR (B = 143) OR (B = 141) THEN L(AN) = B : AN = AN + 1 : GOTO 62100 62070 IF B = 58 THEN GOTO 62400 62072 REM STORE CHAR. FOR COMPACT LINE 62073 IF B<>32 THEN L(AN) = B : AN = AN + 1 62075 IF B = 136 THEN GOTO 62200 62080 IF B = 34 THEN GOTO 62300 62090 NEXT K : STOP 62100 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM LOOKING FOR LINE END 62110 IF B = 0 THEN NEXT I 62120 NEXT K 62200 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM FOUND "GOTO" 62210 IF B = 0 THEN NEXT I 62215 IF B = 32 THEN A = A + 1 : B = PEEK(A) : GOTO 62210 62220 IF B = 58 THEN GOTO 62100 62225 L(AN) = B : AN = AN + 1 : NEXT K 62300 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM FOUND " CHAR. 62320 IF B = 34 THEN L(AN) = B : AN = AN + 1 : GOTO 62090 62325 IF B = 0 THEN NEXT I 62327 IF B = 58 THEN 62400 62330 L(AN) = B : AN = AN + 1 : NEXT K 62400 A = A + 1 : B = PEEK(A) : IF (B = 32)OR(B = 58) THEN 62400:REM FOUND : 62410 IF B = 0 THEN NEXT I 62420 IF B = 142 THEN GOTO 6210062430 L(AN) = 58 : L(AN + 1) = B : AN = AN + 2 : GOTO 62120 62600 PRINT L; : REM POKE MEMORY WITH COMPACTED LINE 62601 AH = INT((A - 3)/256) : AL = (A - 3) - 256 * AH 62602 POKE AP, AL : POKE AP + 1, AH : PRINT TAB(8) AL;AH; 62603 REM AN IS LENGTH OF COMPACTED LINE 62604 IF AN = 0 THEN PRINT : RETURN 62605 AH = INT(AP/256) : AL = AP - 256 * AH 62607 PRINT TAB(16) AL;AH; 62608 POKE AD,AL : POKE AD + 1,AH : AD = AP : AP = AP + 2 62610 AH = INT(L/256) : AL = L - 256 * AH 62611 POKE AP,AL : AP = AP + 1 : POKE AP,AH : AP = AP + 1 62616 FOR I = 0 TO AN - 1 : POKE AP,L(I) : PRINT CHR$(L(I)); : AP = AP + 1 : NEXT I 62620 POKE AP,0 : AP = AP + 1 : PRINT : RETURN 62700 REM 62705 REM VARIABLES 62710 REM 62715 REM L LINE NUMBER 62720 REM A ADDRESS BEING SCANNED 62725 REM AP ADDRESS BEING POKED 62730 REM AD ADDRESS OF PREVIOUS LINE 62732 REM AN INDEX IN L(I) OF CHAR. TO BE POKED 62735 REM B CHARACTER BEING SCANNED 62739 REM 62740 REM TOKENS AND CHARACTERS 62741 REM 62745 REM 32 SPACE 62750 REM 34 " CHARACTER 62752 REM 58 : CHARACTER 62755 REM 128 END TOKEN 62760 REM 136 GOTO 62765 REM 141 RETURN 62770 REM 142 REM 62775 REM 143 STOP OK
Now if the compacted line is zero length, one returns to the "I" loop and starts compacting the next line. If not, one has to update the pointer in the previous compacted line and this is done in lines 62605 and 7. This restores continuity to the chain of pointers. Then the line number is POKEd in, and then the line of compacted text and finally the null at the end.
While developing this program, I used a somewhat longer "Test Program" than I am giving here. The final test was made on two of my old war horses. The first is a Knight's Tour and the other I informally call "Godzilla Eats Tanks". Both programs are rather long, involved, and use PEEKed and POKEd graphics extensively. "Godzilla" uses PEEKed keyboard input with ANDed and ORed data. ("Godzilla" is invisible and lays down an invisible trail that is sniffed out by a "stench seeking" guided missile.) Both programs ran successfully after compacting to about 75% of their original length. However, I did need to repair the Knight's Tour because I had some GOTO's that went to free standing REM's. These REM's were removed by the Compactor and had to be put back in by hand. I have since learned my lesson. Never GOTO or GOSUB to a REM.
This program is very useful but not yet the ultimate in compaction technique. At least one of the large software houses sells a compactor that is combined with a "branch locator" so that statements which are not the target lines of a GOSUB or GOTO are candidates to be put on the same line, separated by colons. It would certainly be faster and easier to buy such an efficient compactor in preference to tapping this one in through the keyboard, but if your interest is in playing around with logical tasks, you may prefer to modify this program to have multistatement-per-line capability.