ABSTRACT
This experiment is designed to test and compare AVL trees and Hash Tables.
According to theory Hash Tables should perform better then AVL trees but there are other factors, such as that AVL trees resolve string comparisons in a few letters while a Hash function must use the whole string, that must be considered. A test to count the number of element comparisons involved in the insertion of various strings into an AVL tree was performed. A test of the insertion time of words in alphabetical order was performed. A variety of tests timing the insertion of random strings of various lengths were performed. A test timing the lookup of words that were inserted in alphabetical order was performed. A test timing the lookup of random strings of various lengths was performed. The results are that even though AVL trees can sometimes resolve string comparisons without having to compare all the letters in a particular string, Hash Tables still perform better in the situations that were tested in this experiment.
DATA COLLECTION
All data recorded in this paper is the result of taking the middle values of three identical trails. The time values for the insert tests were obtained using the UNIX/linux built in time function ran at the prompt. The time values for the find tests were obtained by using the time.h library routines in the following manner.
time_t start, finish;
start = clock();
//some code
finish = clock();
total = (finish - start)/ (double)clocks_per_sec;
This method, when used to test an entire program, gave results to within two tenths of a second of the UNIX/linux built-in time function.
All tests on both the AVL tree and the Hash Table were run with keys of type Xstring. (Xstring class copyright 1998-1999 by David H. Ackley ) used with permission.
The element counts for AVL inserts were obtained in the following manner. I added a function "element_count" into the Xstring class that had a static variable "count" which was initialized to zero. The function incremented count each time it was called, and returned count. I added calls to element_count inside the Xstring string compare function, which is called in the overloaded Xstring string comparison operators. I added a call to element_count at each character comparison in the string compare function. In main I used element_count to return the value of the static variable count for my data collection, taking care to decrement it by one to account for this last call. This added code was removed from the Xstring class for all other tests. This method gave exact results when compared with pen and paper tracings of string insertions.
All tests were performed on a 100MHz machine running Linux 6.0 with a Pentium processor and 32 MB RAM.
METHODS
Randomly generated strings were obtained by using the random function in stdlib.h to generate random numbers, using the mod function to insure they were in the right range, casting them to characters and += them to an Xstring.
Simulated worst case strings have the same letter at the beginning but random letters at the end. These were obtained by using a small for loop to += the same character into an Xstring the desired number of times, then adding randomly generated characters at the end.
In the tests of the find functions for both the AVL and the Hash table a check was performed to insure that all finds were successful. The find function of the Hash table returns a bool, so this bool was checked. The find function of the AVL returns a pointer so this was checked to not be null.
The Hash table used separate chaining and the PJW hashing function.
The hash table resized when the number of nodes was greater then three times the table size. The Hash table started with an initial table size of 3203.
TESTS/RESULTS
The first test was to find out how many element comparisons were being done by the AVL tree during inserts, and how this related to the running time. The input used in trail (a) was a file of 23779 words in alphabetical order. This file is named words and was obtained from a linux dictionary program ( /usr/share/dict/words). Trial (a) resulted in 1535132 element comparisons and a time of 2.64 seconds. This is 64.5 average element comparisons per word. Trial (a) leans toward worst case, as the words being inserted have many similar letters at the front. The following trials were with randomly generated strings, starting with strings of length five and increasing to length fifteen. Again, 23779 words were inserted to keep the length of the words the only changing variable. The element comparisons did not keep pace with the added letters but averaged around 973,000, actually coming down at the same time letters were added. See charts (1) and (2). This is because the depth of an AVL tree is only slightly more then log n (in practice). The time increased steadily as the strings got longer. The final trials in this test were to simulate instances of worst case element comparison. Strings in which all the letters were the same except the last three letters, which were randomly generated, were made and inserted into the AVL tree. The last three were left random to give a pool of 17,576 (26*26*26) words before repeats. This generated a huge number of element comparisons and also slowed down the program. This is because this test forces the AVL
tree to check almost every letter in a string at each string comparison. For example with a string of length five where every letter has to be checked at each string comparison (the last letter is the only different one) and four string comparisons are done, generating twenty element comparisons. Even worse, if the code checks for less-than first and then greater-than and at each comparison the string is greater-than the string its being compared to then the above example will generate forty element comparisons. Although, an AVL tree can be slowed down by forcing it to do a large number of element comparisons. In average or random cases element comparisons are not where the AVL tree is spending most of its time.
The second group of tests started by inserting randomly generated strings into the AVL tree and the Hash table. The number of insertions was 23,779. The string lengths started at five and went up to fifteen. See chart (5). The Hash table averaged .24 of a second faster then the AVL. The Hash performed .31 seconds better then the AVL tree with string lengths of five, but only .12 seconds better with string lengths of fifteen. This is to be expected, as algorithm analysis states that Hash tables will perform better then AVL trees. Worst case string simulation gave similar results but with the Hash table times being considerably better than the AVL tree and the AVL tree performance degrading quickly as the string length increased. See chart (6). With the number of words inserted increased to 100,000 the Hash table performed 1.92 seconds faster with random strings and 18.31 seconds faster with simulated worst case strings. AVL time for the insertion of all the entries in the "word" file was 2.37 seconds, the Hash table time was 1.85 seconds.
Lastly, the find function of both the AVL tree and the Hash table were run through tests. Find times for all the entries in the "word" file varied by .95 seconds between the AVL and the Hash table the Hash table performing .95 seconds faster than the AVL tree. See chart (9). A find on randomly generated strings of varying length was done on the AVL and the Hash table, with the Hash table exhibiting a better average performance of one second. See chart (10). In tests of finds of worst case strings the Hash table performed 1.41 seconds faster then the AVL tree, with string lengths of five. The performance of the AVL degraded rapidly as string lengths were increased. See chart (11). Finally a find test was ran with string length of fifteen and the number of words searched for equal to 100,000. For worst case strings the Hash was 19.31 seconds faster. For randomly generated strings the Hash table was 5.87 seconds faster. See chart (12).
CONCLUSION
This experiment showed that algorithm theory is correct in that Hash tables do perform better than AVL trees. The hashing of the string function does not offset the element comparisons done in the AVL tree, although the AVL tree can be slowed down by forcing it to perform a large number of element comparisons. In average or random cases the element comparisons are not the factor that slows down the AVL. See charts (1) and (2). The results are that even though AVL trees can sometimes resolve string comparisons without having to compare all the letters in a particular string, Hash Tables still perform better in the situations that were tested in this experiment.