CMIS 240 Week 13 I hope the exam wasn't too painful. Let's learn about binary search trees (BST). This is a data structure that is good for a changing set of data that needs to be searched. On average, inserting, deleting, and searching are O(log N) operations. So it's as efficient a search as binary search, without the O(N) insert and delete times of a sorted array. The drawback is that it's more complicated than linear structures like arrays or linked lists. A BST is a linked structure. Each node has two pointer members, one pointing to the left "child", the other pointing to the "right" child. They're called left and right because that's their position when drawn. The two children nodes are distinct. A node can have one or the other or both or no child nodes. Trees are a mix of familial and botanical terminology. A node with no children is a "leaf" node. The node at the top (trees are drawn upside down) is called the "root" node. A node has exactly one "parent". The root has no parent. In fig 8.1 the node with A is the root. The pointer from the box tree is the external pointer to the data structure; it's not the parent of the root. You can talk about siblings, descendants, ancestors etc. The links can be called branches. There is only one path from the root to a node (and only one path between any pair of nodes [a path can't use a link more than once]). Also, in a tree of N nodes, there are exactly N-1 links. A BST is a special kind of binary tree. A binary tree is a special kind of tree. A node in a (regular) tree can have any number of children, and there is no distinction among them. Trees are used for heirarchical classification: library call numbers, biological classification, organizational charts, filesystem are everyday examples of trees. A binary tree's nodes have 0, 1, or 2 children called the left and right children. Binary tree is about shape. It's also recursive because each and every node can be viewed as the root of a (sub)tree. "Height" is the distance of the furthest node from the root, in number of links. The trees in Fig 8.2 are of heights 3, 4, and 9, respectively. We'll see that height is the critical factor in the efficiency of a BST. The binary trees in Figs 8.1 and 8.2 are not BSTs. A BST is a binary tree with an additional property that the value (more precisely, the key value) of a node is such that all values in the left subtree are smaller than it, and all values in the right subtree are larger than it. Fig 8.3 is a BST. The BST property applies at EVERY node. Look now at Fig 8.6, 8.7, 8.10, 8.16, and the Exercises for other examples of BSTs. I've posted a simplified version of the book's BST TreeType class. I chopped off the function for copying a BST (along with the copy ctor and overloaded = operator which use it); you can skip that section of the book. Same with the functions to do the various traversals of the BST using an auxiliary queue. And the iterator functions. They aren't needed for a first pass thru BSTs. In Chapter8/ TreeType.h TreeType.cpp treedr.cpp driver program thingee.h thingee struct sample_run redirect input to tree of ints program We're left with the core functionality of inserting, deleting, and searching. We can also traverse the tree to either count the number of nodes or to display the information it contains in sorted order. Insert, delete, and search are implemented with recursive functions. They can also be done iteratively (simpler I think to understand); you'll notice that the recursive functions are all tail recursive, thus easily convertible into iterative functions. The public member functions call recursive functions with the pointer to the root node as argument to do the actual searching, inserting, deleting, counting, printing, respectively. The public functions can't be recursive because they would have to be called with the root member, which would violate encapsulation if root wasn't private. The count_nodes function is not tail recursive, so it's better as a recursive function. It's somewhat similar to the path counting program you wrote. The print_tree function does an inorder traversal of the BST, which will visit the nodes in sorted order. It's also not tail recursive, an iterative version requires a stack. Understand why inorder traversal of a BST visits the nodes in order: all the nodes in the left subtree, which are all smaller than the current node, are visited before this node, which is then visited, and then all the nodes in the right subtree, which are all larger than this node, are visited. The dtor uses the Destroy function which visits the nodes in postorder, meaning the node is visited after its children (and their descendants) have been visited (and deleted). Postorder is the only traversal suitable for destroying the tree. A node's memory can't be returned to the heap before its pointer members have been accessed. Practice the three traversal methods with some BSTs. Searching a BST is simple: go down the only possible path that the search value could be on comparing the nodes with the search value. You'll either come to a node with the sought-for value or to a Null pointer, falling off the tree in an unsuccessful search. It's similar to binary search, at each step compare the value to see if you've got a match, otherwise go left or right as appropriate. It's not called a Binary Search tree for nothing! Inserting and deleting both start with a search. Insert is going down the only correct path that the new value can belong on. It's sort of searching but not checking for equality, just doing the < to see if it should go left or right. Where it falls off the tree, the new node is attached as a leaf at that point. Insert doesn't "insert" a new node between other nodes; it might be better named Attach. The Insert code allows more than one node with the same value to be in the tree. Note that the shape of a BST of a given set of values depends on the order in which the values are inserted. Fig 8.10 illustrates that the same set of values yields different BSTs depending on the order the values are inserted (assuming no deletions). Practice making BSTs with different insertion orders. The average height of all the BSTs of N nodes is O(log N), i.e. on average, a BST is short and bushy, not tall and stringy. Searching, inserting, and deleting all might have to go down to the furthest node from the root, doing O(1) work at each node along the way, for a total of O(height) work. So when height is O(log N), the work is O(log N). A tall and stringy tree will have height of O(N), in which case the functions have a worst case of O(N). So BSTs have the potential to be as inefficient as sorted arrays for the basic functions of searching, inserting, and deleting. but this worst case is unlikely to occur, and there are techniques to ensure that it never happens. Delete function searches for the node that is to be deleted. I modified the book's version so that it doesn't matter if the node isn't in the tree, it does nothing in that case. Delete is the most complex of the functions. Basically, the to-be-deleted node is one of three cases, depending on the number of children it has (0, 1,or 2). Case 0 children, i.e. it's a leaf, is the easiest: just set its parent's pointer to Null. Case 1 child, set it's parent's pointer to point to its single child. Case 2 children, the most complicated, replace the node with its "predecessor", the largest (rightmost) node in the left subtree, and delete the predecessor node. Many good Exercises: 1-20, 37-40 are good practice. 21-36 are code writing. -------------------------------------------------------------------------- The treedr.cpp uses conditional compilation. From the command line you can invoke the GNU compiler with a -D option that defines a symbol: g++ -Dthingeetree treedr.cpp then in the code, the symbol thingeetree is defined, so the #ifdef thingeetree is true and the code that follows is compiled, yielding a tree of thingee structs. If the compiler is not invoked with the option, then the #ifdef is false and the #else part is compiled, yielding a tree of ints. You can have "else if" and an optional else statements; the syntax is #elif whateverSymbol In an IDE, there should be somewhere you can specify options to the compiler: In rhide/DJGPP, under Options--Compiler Options you can enter -Dthingeetree as the Parameter value. In Dev-C++, under Options--Compiler Options, check the "Add the following commands when calling compiler" and enter -Dthingeetree In Borland 5: From the main screen, select PROJECT from the top bar. then at the bottom select OPTIONS. then click the DIRECTORIES/CONDITIONALS tab. Near the bottom you'll see a block marked "Conditional defines:", it probably has "_DEBUG" already inside it. Enter the name that you want defined in that block ie "thingeetree". In VC++: (maybe there's an easier way, anyone, please?) First, open your workspace for your program, let's say Treedr. Next select PROJECT from the menu bar and choose SETTINGS. The 'project settings' window should be displayed. 'Treedr' should be listed on the left side of the window. If it is collaped (a '+' to the left of it) - expand it (by clicking the plus) - treedr.cpp should be seen in the list. - select it. To the right, you should see tabs "General" and "C/C++" -select the C/C++ tab. From the 'Category' drop down list, select 'preprocessor' - the last item in the list. Preprocessor definitions should now be displayed just below the catagory selection. Add ",thingeetree" (without quotes) after the last definition. Before: WIN32,_DEBUG,_CONSOLE,_MBCS After: WIN32,_DEBUG,_CONSOLE,_MBCS,thingeetree Click ok, and recompile treedr.cpp (Repeat 100 times: "a GUI IDE is so much easier to use") -------------------------------------------------------------------------- Homework Due: 16 Dec Modify the treedr.cpp so that it conditionally creates a tree of phone structs which is defined as follows: struct phone { string name; int extension; }; You'll need a phone.h file similar to the thingee.h file for defining the phone struct and overloading the I/O and relational operators to work with it. The overloaded relational operators can use C strings (i.e. cstring) or C++ strings (i.e. string). No changes are needed to the TreeType files. No change is needed to the do loop of main. Submit the modified treedr.cpp and your phone.h files.