Sorting Sorts: A Programming Notebook
Belinda Hulon
Rick Hulon, Manager, The Corner Computer Store
An important aspect of many business applications involving microcomputers is the selection and efficient use of an appropriate sorting routine. While sorting routines are readily available in the literature and/or easily programmable, some thought should be given to the type of sort used. This two-part article will deal with six of the better known sorting algorithms: Selection Sort, Bubble Sort, Shell Sort, Quick Sort, Merge Sort, and Heap Sort. These sorts range from very simple to quite complex, from extremely slow to exceedingly fast. This first article will concern itself with the simpler, slow to intermediate sorts.
In this article we evaluate Selection Sort, Bubble Sort and Shell Sort. Selection Sort is a very simple, straight-forward routine (see Figure 1). Its methodology is to iteratively pass through the list of items to be sorted. On the initial pass, the first item is compared with each successive item, exchanging it with any element that is "less than" the first item. The "new" first element is then compared to each item after the point of exchange. This process continues until one entire pass is completed. The procedure is then repeated for the second item in the list, then the third, etc., until the last item is reached. Thus, Selection Sort essentially "selects out" the smallest item on the first pass, the next smallest on the second, and so on. This sort, then, always goes through a set number of passes regardless of the state of the list. An already sorted list would still go through the entire routine as though it were not sorted.
Bubble Sort accomplishes its task not by comparing one item to all the others as in Selection Sort, but rather by comparing adjacent elements in the list, switching whenever necessary. In addition, it sets a flag to indicate when no exchanges have been made in a given pass, thus signalling the end of the sort. Bubble Sort therefore takes only as many passes as it needs. An already sorted list would use one pass to determine that no exchanges were made. A listing of Bubble Sort can be found in Figure 2.
The third sort to be considered, Shell Sort, is some what more complicated. Shell Sort is essentially an extension of Bubble Sort. Initially a "gap" size is determined to be the largest integer less than or equal to half of the list size (e.g., if the list contained 11 items, the initial gap size would be 5). This gap size supplies the essential difference between Shell Sort and Bubble Sort, for instead of only comparing adjacent items, Shell Sort compares items separated by the gap size, exchanging them when necessary. Once it determines that no exchanges were made on the last pass with a particular gap size, the size of the gap is cut in half and the process continues. As one can easily see, this results in a Bubble Sort when the gap size becomes 1, but since the list is already partially sorted, it does not require as much time for larger lists as a regular Bubble Sort would. Shell Sort, like Selection Sort, does not provide for an "easy out." However, it does not go through the routine a set number of times. A pre-sorted list will require only enough passes to obtain a gap size of 1, since there will never be any exchanges. (See Figure 3).
Sorting Sorts: A Programming Notebook 10 FOR I = 1 TO N - 1 20 FOR J = I + 1 TO N 30 IF V(I) < = V(J) THEN 70 40 S = V(I) 50 V(I) = V(J) 60 V(J) = S 70 NEXT J 80 NEXT I 90 END Figure 1 10 F = 1 20 IF F = 0 THEN 120 30 F = 0 40 FOR I = 1 TO N - 1 50 IF V(I) <= V(I + 1) THEN 100 60 S = V(I) 70 V(I) = V(I + 1) 80 V(I + 1) = S 90 F = 1 100 NEXT I 110 GOTO 20 120 END Figure 2 10 GP = N 20 IF GP < = 1 THEN 160 30 GP = INT(GP/2) 40 MI = N - GP 50 F = 0 60 FOR I = 1 TO MI 70 GI = GP + I 80 IF V(I)<= V(GI) THEN 130 90 S = V(I) 100 V(I) = V(GI) 110 V(GI) = S 120 F = 1 130 NEXT I 140 IF F = 1 THEN 50 150 GOTO 20 160 END Figure 3 WHERE: V = Array containing data to be sorted S = Holding variable used for exchanging items N = Number of items to be sorted F = Flag indicating occurrence of an exchange GP = Gap size MI = Number of times the loop must be iterated on one pass; depends on gap size GI = Index for comparison; depends on gap size |
Several factors are involved in determining the efficiency of a sorting algorithm. The time involved, or the speed of the sort, is usually one of our major concerns, especially with micros. Two important factors contributing to the speed and efficiency of a sort are the number of comparisons made and the number of actual exchanges involved in sorting a list. In this case we counted any comparison made (including the checking of flags), not just data comparisons. Although we are not directly concerned with CPU time in terms of actual cost, it seems obvious that the fewer comparisons and exchanges made in the same (or less) amount of time, the more efficient the sort will be. These three factors then, time, number of comparisons and number of exchanges comprise our criterion for comparison. The actual method used was to generate 30 different lists of random numbers, having each algorithm sort each list. The 30 values for each of the criterion for the different lists sizes were averaged to produce the values given in Table 1. This procedure was followed for lists of size 10, 25, 50 and 100. All data was obtained from a Commodore CBM with 32K of internal RAM.
TABLE 1 | |||
BUBBLE SORT | |||
List Size | Number of Comparisons | Number of Exchanges | Time |
10 | 75 | 21 | 1.8s |
25 | 508 | 153 | 12.4s |
50 | 2098 | 592 | 50.7s |
100 | 8811 | 2450 | 3.6m |
SELECTION SORT | |||
45 | 21 | 1.1s | |
300 | 144 | 7.3s | |
1225 | 505 | 28.0s | |
4950 | 1815 | 1.8m | |
SHELL SORT | |||
72 | 11 | 1.5s | |
339 | 63 | 7.4s | |
967 | 155 | 20.9s | |
2669 | 399 | 57.2s | |
WHERE: | |||
s = seconds | |||
m = minutes |
It should be noted that upon beginning this article there were some basic expectations. Having already run a similar project on a large computer, we expected similar results from the CBM. The initial project showed, true to the numerous textbooks available, that while Selection Sort and Bubble Sort were good for small lists (even superior to more sophisticated sorts), Shell Sort would be better for larger lists. Also, Selection Sort should be faster than Bubble Sort, due to the nature of the algorithms (we omit the mathematical determination of this situation). In this experiment we did duplicate our first results fairly well, as can be seen from Table 1. However, the amount of time involved seems flabbergasting. Of course, we could not have expected micro to compare in speed to a mainframe but the differences were disturbing. For example, the time involved in the sorting of a list of 500 items by one of these sorts ran into hours, somewhat troublesome for business applications. Having mulled over this for awhile we came to a tentative conclusion which seems to explain this occurrence. Our original sorting routines were written in PL/1, a batch language for use on an IBM 360/370. In this situation the source code were through a compiler which translated it into machine code for execution. On our CBM, however, the routines were written in interpreted BASIC. One important difference between an interpreter and a compiler that with a compiler the source is "compiled" only one. The machine code is produced and the higher level language is no longer a concern. With an interpreter, on the other hand, each line of code is interpreted EVERY time it is encountered. This, then should account for much of the excessive time observed. As a test, we wrote Selection Sort (chosen for its simplicity) in machine code. This eliminated the interpretation stage. This gives us a better idea of just how much time is actually involved in sorting the lists. The elimination of an interpreter changed the time involved drastically. While the BASIC routine required 1.1 seconds to sort list of only 10 items, the machine code version took so little time as to record a duration of 0 seconds! Since the built-in timer of the CBM records time in "jiffies" or 1/60 of a second, it actually took less than 1/60 of a second to sort the list. The results are even more impressive for a list of 100 items. While BASIC required 109.2 seconds (just under 2 minutes) the machine code version required only .2 seconds. In other words, the BASIC algorithm took 546 times longer than did the machine code routine. Much of this extra time, then, seems to be a result of the BASIC interpreter. Thus, what might seem to be a very efficient sort could actually prove to be worse than a less efficient sort, depending on the amount of code involved and the number of items to be sorted. In the design of business software, as much attention should be paid to the language (and therefore type of compiler or interpreter) as to the type of sort involved. If you are willing to work with machine code then more efficient sorts should be considered. We will include machine code listings in the next article along with the evaluations of Quick Sort, Merge Sort, and Heap Sort.
TABLE 2 | ||
List Size | Time for BASIC routine | Time for mach. code routine. |
10 | 1.1 sec | 0.00 sec |
25 | 7.3 sec | 0.02 sec |
50 | 28.0 sec | 0.05 sec |
100 | 1.8 min | 0.17 sec |