Data compression: packing it in. (Introdos) (Column)
by Tony Robert
It's axiomatic: Data expands to fill all available storage space on your disk.
When you run out of room, you can either delete files, purchase additional storage, or find some way of making more data fit into less space. For more and more computer users, the last option, data compression, is the best way to go. Let's see how compression works and look at the ways it can be achieved.
Compression software uses a variety of algorithms to compact files. These programs usually start by looking for repeated characters in a file. For example, many people routinely press the space bar five times every time they indent a paragraph. The compression software identifies these repeated strings, and instead of storing five spaces in the disk file, it stores a short code that means "five spaces."
The compressed file, therefore, is a series of special codes that describe the original file. When file decompression is requested, the codes are expanded, and the file is returned to its original form and size.
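The repeated-character trick described above is commonly called run-length encoding. Here's a minimal sketch of the idea, assuming each run is stored as a (count, character) pair. The function names are made up for illustration, and this is not the exact method PKZIP, LHArc, or any other product uses.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[int, str]]:
    """Replace each run of identical characters with a (count, character) pair."""
    return [(len(list(run)), char) for char, run in groupby(text)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand each (count, character) pair back into the original run."""
    return "".join(char * count for count, char in pairs)

paragraph = "     Indented paragraph"
codes = rle_encode(paragraph)
print(codes[0])  # (5, ' ')  -> "five spaces" stored as one code
assert rle_decode(codes) == paragraph  # decompression restores the original
```

Storing a pair for every character actually makes text without repeats larger, which is why real compression programs combine this idea with other techniques.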
Graphics, word-processing, database, and spreadsheet files usually compress well because they contain a great deal of repetitive data. Program files, however, normally don't compress as much.
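You can see the difference repetition makes with a quick experiment. This sketch uses Python's zlib module (the Deflate method) as a stand-in for any compression program; the sample data is invented, and the random bytes represent a worst case with no repetition at all rather than a real program file.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

# Highly repetitive data, like a spreadsheet full of similar rows.
repetitive = b"Q1,Sales,1000,0,0,0\n" * 500
# Random bytes: a worst-case stand-in for data with little repetition.
random_data = os.urandom(len(repetitive))

print(f"repetitive: {ratio(repetitive):.2f}")
print(f"random:     {ratio(random_data):.2f}")
```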
For years, telecommunicators have been big fans of file compression. Smaller, compressed files transfer much faster than uncompressed ones, which means lower connect-time charges and, in turn, lower long-distance bills. But even if you're not a telecommunicator, you may want to begin compressing some of your files to free up disk space and to simplify file management.
Single files or groups of files can be compressed with utilities such as PKZIP and LHArc. PKZIP has become a widely recognized standard. LHArc, another well-known compression program, is freely distributed. Be aware, however, that these programs use different compression algorithms, so a file compressed with PKZIP can't be decompressed with LHArc.
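Here's a small illustration of why one program can't read another's files. The sketch uses two methods from Python's standard library, zlib (Deflate) and lzma, purely as stand-ins for two different archivers; neither is LHArc's actual method, and the helper name zlib_can_read is hypothetical.

```python
import lzma
import zlib

def zlib_can_read(data: bytes) -> bool:
    """Return True if zlib can decompress data, False if the format is foreign."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False

original = b"Minutes of the March meeting. " * 100

# Two different methods, standing in for two incompatible archivers.
deflated = zlib.compress(original)  # Deflate in a zlib wrapper
xz_data = lzma.compress(original)   # LZMA in an .xz container

print(zlib_can_read(deflated))  # True
print(zlib_can_read(xz_data))   # False
```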
In addition to saving space, compression utilities can take several files and combine them into one file called an archive. For example, you can gather up all the files you used to prepare last year's tax returns (spreadsheet files, word-processing files, tax-preparation software files, and so on) and bundle them into an archive called TAXES92. Copy this archive file to a floppy disk and store it with your income tax materials. When you're ready to work on your 1993 tax return, you'll have all of your 1992 documents right at your fingertips as a handy reference.
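For the curious, here's what building such an archive looks like in code. This sketch uses Python's zipfile module, which writes the ZIP format that PKZIP popularized; it isn't PKZIP itself, and the file names and contents are hypothetical stand-ins for your tax files.

```python
import tempfile
import zipfile
from pathlib import Path

def build_archive(archive_path: Path, files: list[Path]) -> None:
    """Bundle several files into one compressed ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store only the file name, not the full directory path.
            zf.write(path, arcname=path.name)

# Hypothetical stand-ins for last year's spreadsheet and word-processing files.
workdir = Path(tempfile.mkdtemp())
tax_files = []
for name, text in [("BUDGET.WK1", "income,expenses\n" * 200),
                   ("LETTER.DOC", "Dear Sir or Madam,\n" * 200)]:
    path = workdir / name
    path.write_text(text)
    tax_files.append(path)

archive = workdir / "TAXES92.ZIP"
build_archive(archive, tax_files)

with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())  # ['BUDGET.WK1', 'LETTER.DOC']
```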
Or, for another example, look at your correspondence subdirectory. Does it include dozens or hundreds of memos that you keep on hand because you may want to refer to them sometime? Why not take all of your letters from 1993 and compress them into one archive called LTRS93? In addition to freeing up hard disk space, archiving your letters reduces the clutter in your correspondence subdirectory. If you ever need one of the letters in the archive, you can give a command to decompress only the one you need.
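Pulling a single letter back out of an archive works like this. Again, Python's zipfile module stands in for a ZIP utility, and the archive, letter names, and the extract_one helper are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive = workdir / "LTRS93.ZIP"

# Build a small archive of hypothetical letters so the example is self-contained.
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for month in ("JAN", "FEB", "MAR"):
        zf.writestr(f"{month}93.LTR", f"Letter written in {month} 1993.\n")

def extract_one(archive_path: Path, member: str, dest: Path) -> Path:
    """Decompress one named file from the archive, leaving the others packed."""
    with zipfile.ZipFile(archive_path) as zf:
        return Path(zf.extract(member, path=dest))

restored = extract_one(archive, "FEB93.LTR", workdir / "restored")
print(restored.read_text())
```

Only FEB93.LTR lands in the restored directory; the January and March letters stay compressed inside the archive.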
In the past few months, another type of compression, whole disk compression, has received considerable attention, thanks to the inclusion of DoubleSpace as an integral part of DOS 6. Under this system, everything stored on disk is compressed as it's being saved and decompressed as it's being read. And it all happens without any intervention from the user.
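The "compress on save, decompress on read" idea can be sketched as a toy storage class. This is a hypothetical illustration built on Python's zlib module, not how DoubleSpace or Stacker work internally; the point is that the caller saves and reads ordinary data and never deals with compression directly.

```python
import zlib

class CompressedDisk:
    """Toy 'disk' that compresses on every save and decompresses on every read."""

    def __init__(self) -> None:
        self._stored: dict[str, bytes] = {}  # file name -> compressed bytes

    def save(self, name: str, data: bytes) -> None:
        # Compression happens here, out of the caller's sight.
        self._stored[name] = zlib.compress(data)

    def read(self, name: str) -> bytes:
        # Decompression happens here, so the caller gets the original bytes.
        return zlib.decompress(self._stored[name])

    def space_used(self) -> int:
        return sum(len(blob) for blob in self._stored.values())

disk = CompressedDisk()
report = b"Quarterly report: sales up, costs down.\n" * 250
disk.save("REPORT.TXT", report)
assert disk.read("REPORT.TXT") == report  # the user gets back exactly what was saved
print(len(report), "bytes saved,", disk.space_used(), "bytes used on disk")
```

Real disk compressors do this inside the operating system, below the level applications see, and the space they save depends on how compressible the stored data is.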
Disk compression may slow system performance a tad, but the payoff is that you can store nearly twice as much data on any given disk. On a fast computer, the slowdown is barely perceptible. DoubleSpace and similar utilities, such as Stacker from Stac Electronics, can provide a low-cost way to expand your system without your having to open the box and install new hardware.
However, the inclusion of DoubleSpace with DOS 6 has fueled a continuing debate about the safety of disk compression. While the majority of users have installed DoubleSpace successfully, a few have reported problems and have experienced data loss. Most of these problems appear to be installation issues, and Microsoft's answer, a DOS 6.2 maintenance release, may be available by the time you read this.
It's clear, though, from Stacker's track record and from the experience of those who've installed DoubleSpace successfully, that whole disk compression is a viable alternative to installing a new hard drive. Still, the standard computing caveat bears repeating: always keep backup copies of your data.
If you use whole disk compression, note that you won't double the benefit by trying to combine the effects of Stacker or DoubleSpace with PKZIP or LHArc. Once a file has been compressed, little repetition is left in it, so the whole disk compression program won't be able to shrink it much further.
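You can check this yourself by compressing the same data twice. In this sketch Python's zlib module stands in for both the archiver and the disk compressor, and the sample "letters" are randomly assembled words generated only for the demonstration.

```python
import random
import zlib

# Hypothetical sample text: 5,000 words drawn from a small vocabulary.
words = ("the of order customer thank you for your shipment invoice "
         "please note that we will send a new catalog soon").split()
rng = random.Random(1993)
letters = " ".join(rng.choice(words) for _ in range(5000)).encode()

once = zlib.compress(letters)  # what an archiver leaves on the disk
twice = zlib.compress(once)    # what a second compressor can make of it

print(len(letters), len(once), len(twice))
```

The first pass shrinks the text considerably; the second pass gains little or nothing and may even add a few bytes of overhead.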