CS 340-Fall 2007: Programming Assignment 0

Due September 7, 11:59 PM.

(This document was last modified on September 2 at 8:53 PM.)


This assignment is intended as a warm-up, to get everyone thinking about programming again. You are to implement a lookup program using a trie, a Patricia tree, or some other similar radix-based searching technique. The program should read in a text file consisting of commands, one line per command. Each command consists of an operation, followed by a space, followed by a word with no spaces. There are three possible operations: a, which means to add the word to the lookup table; d, which means to delete the word from the lookup table; and l, which means to look up the word. If a word is already in the dictionary, your program should ignore attempts to add it again. If a word is not in the dictionary, your program should ignore attempts to delete it.
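To make the three operations concrete, here is a minimal sketch of a trie in C++. It is only an illustration, not the required design: the class name, the method names, the use of std::map for child links, and the lazy deletion (clearing a flag without freeing nodes) are all assumptions made for brevity, and it assumes a compiler that supports C++11. A competitive submission would likely want a more compact node layout.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical interface; the assignment does not prescribe one.
class Trie {
    struct Node {
        bool is_word = false;                            // true if a stored word ends here
        std::map<char, std::unique_ptr<Node>> children;  // one link per next character
    };

public:
    // Add a word. Adding a word that is already present changes nothing.
    void add(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->is_word = true;
    }

    // Delete a word. Deleting an absent word changes nothing.
    // Lazy deletion (an assumption): the flag is cleared, nodes are kept.
    void remove(const std::string& word) {
        // Safe cast: this object is non-const inside a non-const member.
        Node* node = const_cast<Node*>(find(word));
        if (node) node->is_word = false;
    }

    // Look up a word.
    bool contains(const std::string& word) const {
        const Node* node = find(word);
        return node != nullptr && node->is_word;
    }

private:
    // Walk the path spelled by word; return nullptr if the path breaks off.
    const Node* find(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    Node root_;
};
```

Note that a prefix of a stored word is not itself found unless it was added: after adding only "cart", a lookup of "car" reaches a node whose flag is false.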

If the command is l, your program should print out one line (unless suppressed as indicated below), consisting of the line number of the lookup command in the commands file, a colon, and the string “found.” or “not found.” (note the period). You must follow this format exactly, because the correctness of your program will be checked by a script.

You may use C, C++, or Java.

An example solution that meets the functional requirements is provided here. Instructions for compiling are at the top of the source. This solution does not use a trie, however. Your submission must use a trie, Patricia tree, or some other similar radix-based searching technique, and should be at least as fast as the example solution, at least for some test cases. It is possible that in other test cases, your code will be slower.

Sample input may be generated with this program. Instructions for compiling and running are at the top of the source.

Submission and Evaluation

Your code must run on bingsuns. You should e-mail a compressed (gzip) tar file containing your source and a Makefile to cs340-internal@cs.binghamton.edu. After running make, there should be an executable (or script, if you are using Java) named lookup in the top-level directory. I should be able to run it by typing:

./lookup [y|n] input_file

The first argument is either y or n. If the first argument is y, then your program should print normally as stated above. If the first argument is n, then your program should suppress the printout. For the timing tests, printing will be suppressed.

The second argument is the name of the input file.
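One possible way to handle the two arguments is sketched below. The helper name parse_args and the choice to reject anything other than exactly y or n are assumptions; the assignment does not say how malformed command lines should be treated.

```cpp
#include <string>

// Hypothetical helper: validate the arguments of `./lookup [y|n] input_file`.
// Returns false if the argument count or the y/n flag is wrong.
bool parse_args(int argc, const char* const argv[],
                bool& print, std::string& input_file) {
    if (argc != 3) return false;          // program name plus two arguments
    const std::string flag = argv[1];
    if (flag != "y" && flag != "n") return false;
    print = (flag == "y");                // y: print normally, n: suppress
    input_file = argv[2];
    return true;
}
```

In main, this could be called as parse_args(argc, argv, print, input_file), with the program exiting with a nonzero status when it returns false.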

A very simple makefile that will work is given below.

.PHONY: lookup
lookup:
	g++ -O -o lookup lookup.cpp file1.cpp file2.cpp ...

Note the compile line must be indented with a real tab, and not just spaces.

Your grade will be based on correctness (60%) and level of performance (40%). Correctness will include whether or not your output is correct, and also whether or not you are able to scale to 100,000 words in your dictionary. I will time your program from a “warm” file cache. That is, I will run it multiple times on the same input file, but throw out the first run. This eliminates the time spent on disk I/O to read the input file. This assignment will be worth approximately 6% of your grade (subject to some adjustment).

Assignments that are late will receive a 10% penalty per day up to five days.

You must accept the specified input format exactly, produce the specified output format exactly, and follow the submission instructions exactly. Submissions that do not conform to these requirements will not receive credit.

The submitter of the fastest program will win a $30 gift certificate to Circuit City. The runner-up will win a $20 gift certificate.

To test, I plan to use the output of gen_input, which consists of real words. You can look at the source of that to see how it is generated. I plan to make it approximately 30% add operations, and 5% delete operations. I will run two tests. The first will use a large vocabulary, as present in the output of gen_input. For the second test, the input will draw from the same set of strings as gen_input, but I will ensure that the total number of characters of all words in the lookup table at any one moment is less than 10K, which may allow your data structures to fit in L2 cache. In both cases, I plan to make the input file big enough so that your data structures will have reached “steady-state”. I reserve the right to change some of these conditions, however, if needed to get meaningful test results.

The tests will be run on bingsuns. Use prtdiag to get details about the hardware configuration of bingsuns.