Garbage Collection On Commodore Computers
Part 1
Jim Butterfield, Associate Editor
There's a sneaky event lying in wait for you within most Commodore machines. It's called garbage collection, and it will show up, seemingly unpredictably, in any of several ways. Your program may seem to run slowly or erratically in "spurts." The program may have frequent pauses, each of which lasts several seconds. Worst of all, the program may pause for much longer periods of time—a minute, ten minutes, or even longer—and will seem to have "crashed." The user might be tempted to turn the machine off, thinking that it has failed.
The garbage collection phenomenon isn't limited to Commodore machines, of course. Much of what is said here may be applied to other computers. The specific remedies that will be given for VIC, 64, PET, and CBM can be adapted to suit the different logic of other machines. Conversely, not all Commodore machines have garbage collection problems; for example, machines identifying themselves as 4.0 won't have these delays.
An Example
Try this on your computer:
100 DIM A$(800) 110 FOR J = 1 TO 800 120 A$ (J) = CHR$ (65) 130 NEXT J 140 PRINT "X" 150 PRINT FRE(0) 160 PRINT "Y"
It will take a few moments to perform the loop in lines 110 to 130. You would expect this. But unless you know about garbage collection, you won't expect much of a delay in the last three lines; after all, they are just PRINT statements.
Try it. If there's a delay between printing X and Y, that's a garbage collection pause.
To illustrate the odd nature of garbage collection, try this: Change line 120 to read A$(J) = "A"—this is the same thing, of course, since CHR$(65) is the letter A. But this time the delay disappears when you run the program.
Why It Happens
When a program assigns a value to a string variable, it may do so in either of two ways. If the string exists completely within the program, it will be used "where it lies"; there's no need to make a copy. For example, a program statement such as 500 X$ = "HELLO" will use the string HELLO right out of the program where it lies. Similarly, the statements: 800 DATA COFFEE and 900 READ R$ will cause the string COFFEE to be used from within the DATA statement; it won't be moved to any other place in memory. There doesn't seem to be a name for this kind of string: I'll use the term static string to refer to a string used directly from its place within a program.
On the other hand, some strings can't be used this way. If I create a string with an INPUT statement or by using a string manipulation command such as STR$() or CHR$(), the computer must find a place to put this newly formed string. This kind of string must be packed away into a string storage area. I'll use the term dynamic string to refer to strings of this type.
Now, let's suppose that a running program creates a dynamic string with the statement INPUT A$. The user types in the string—say, EBENEEZER—which will be packed into the string storage area. Later, the program loops and asks for more input with INPUT A$, and the user now types in MARY. MARY, too, gets packed into the string storage area; but even though Ebeneezer is no longer needed (he's been replaced by Mary), the old string is not erased. Instead it lies dead in memory—as garbage.
Let's talk for a moment about the string storage area. It's located near the top of available BASIC memory: above the program, above the variables, and above the arrays. Dynamic strings are placed at the top of this area. As more and more strings are created, they work their way downward. Often, many discarded strings will be left behind—Ebeneezer and his friends—yet no attempt is made to reclaim the wasted space.
This type of thing continues until the dynamic strings bump into the top of BASIC, variables, and arrays. At that time, the waste space must be cleaned up; hence, "garbage collection."
Bad Timing
Garbage collection can take up a lot of time; more about this in a moment. Worse, it's hard to predict when it will strike. It's difficult to code in a JUST A MOMENT message when you don't know when that moment will arrive.
You can force a garbage collection by using the FRE(0) function. In order to measure free memory space, the BASIC interpreter must repack the strings. But doing this may not buy you much. You'll find that doing a garbage collection saves you no time on the next one. If the illustrative program above is still in your computer, restore the original line 120 and RUN. When the program is complete—pause and all—type GOTO 140. You'll find that the second collection takes just as long as before, even though we know there's no garbage to be collected.
You may estimate garbage collection timing by using this crude rule of thumb:
G.C. Time = (Number of dimensioned strings) times (Number of dynamic strings) divided by ten; Answer is in milliseconds.
Caution: This is a very crude formula. The actual time varies from machine to machine and is also dependent on average string length. If we work out this formula in terms of the example, we'll get 800 times 800 divided by 10, giving 64,000 milliseconds or slightly over a minute. Don't worry if your machine gave you a noticeably different time. It's the principle that counts here; and anything over a few seconds is too long. We must learn how to reduce this time drastically.
Causes Of Garbage Collection
All we need to do is learn not to leave waste strings lying around; no waste space, no need for garbage collection. That's easy for me to say, but it will take another article to go into the details of how to do it.
The following rules hint at the details that I'll give in the second part of this article:
- Don't move strings around. It's tempting to move strings when your program is doing a sorting job. Don't do it; instead of moving strings, move an "index" array.
- If you transfer strings into and out of computer memory in "blocks," set the unused strings to "null"; for example, A$(X) = " ". When your strings are at a minimum—just before reading in the next block—force a quick collection with FRE(0).
- Identify the garbage-making areas of your program. The most common is a GET or GET# loop which builds longer strings through concatenation. By fiddling with pointers immediately before and after such operations, you can perform a "local" garbage cleanup with great savings of time.
- Some arrays may be changed to numeric instead of string—for example, "April 6, 1984" may be stored as numeric 19840406. Reducing strings reduces garbage collection time.
- If all else fails: When garbage collection seems imminent, write all strings to disk and clear them from memory; force a quick collection; read all the strings back in.
Details on all this next time.
Copyright © 1983 Jim Butterfield