CMIS 240
Week 13
I hope the exam wasn't too painful.
Let's learn about binary search trees (BST). This is a data structure
that is good for a changing set of data that needs to be searched. On
average, inserting, deleting, and searching are O(log N) operations.
So it's as efficient a search as binary search, without the O(N)
insert and delete times of a sorted array. The drawback is that it's
more complicated than linear structures like arrays or linked lists.
A BST is a linked structure. Each node has two pointer members, one
pointing to the left "child", the other pointing to the "right" child.
They're called left and right because that's their position when
drawn. The two children nodes are distinct. A node can have one or
the other or both or no child nodes. Trees are a mix of familial and
botanical terminology. A node with no children is a "leaf" node. The
node at the top (trees are drawn upside down) is called the "root"
node. A node has exactly one "parent". The root has no parent. In
fig 8.1 the node with A is the root. The pointer from the box tree is
the external pointer to the data structure; it's not the parent of the
root. You can talk about siblings, descendants, ancestors etc. The
links can be called branches. There is only one path from the root to
a node (and only one path between any pair of nodes [a path can't use
a link more than once]). Also, in a tree of N nodes, there are
exactly N-1 links.
A BST is a special kind of binary tree. A binary tree is a special
kind of tree. A node in a (regular) tree can have any number of
children, and there is no distinction among them. Trees are used for
heirarchical classification: library call numbers, biological
classification, organizational charts, filesystem are everyday
examples of trees. A binary tree's nodes have 0, 1, or 2 children
called the left and right children. Binary tree is about shape. It's
also recursive because each and every node can be viewed as the root
of a (sub)tree. "Height" is the distance of the furthest node from
the root, in number of links. The trees in Fig 8.2 are of heights 3,
4, and 9, respectively. We'll see that height is the critical factor
in the efficiency of a BST. The binary trees in Figs 8.1 and 8.2 are
not BSTs. A BST is a binary tree with an additional property that the
value (more precisely, the key value) of a node is such that all
values in the left subtree are smaller than it, and all values in the
right subtree are larger than it. Fig 8.3 is a BST. The BST property
applies at EVERY node. Look now at Fig 8.6, 8.7, 8.10, 8.16, and the
Exercises for other examples of BSTs.
I've posted a simplified version of the book's BST TreeType class. I
chopped off the function for copying a BST (along with the copy ctor
and overloaded = operator which use it); you can skip that section of
the book. Same with the functions to do the various traversals of the
BST using an auxiliary queue. And the iterator functions. They
aren't needed for a first pass thru BSTs.
In Chapter8/
TreeType.h
TreeType.cpp
treedr.cpp driver program
thingee.h thingee struct
sample_run redirect input to tree of ints program
We're left with the core functionality of inserting, deleting, and
searching. We can also traverse the tree to either count the number
of nodes or to display the information it contains in sorted order.
Insert, delete, and search are implemented with recursive functions.
They can also be done iteratively (simpler I think to understand);
you'll notice that the recursive functions are all tail recursive,
thus easily convertible into iterative functions. The public member
functions call recursive functions with the pointer to the root node
as argument to do the actual searching, inserting, deleting, counting,
printing, respectively. The public functions can't be recursive
because they would have to be called with the root member, which would
violate encapsulation if root wasn't private.
The count_nodes function is not tail recursive, so it's better as a
recursive function. It's somewhat similar to the path counting
program you wrote.
The print_tree function does an inorder traversal of the BST, which
will visit the nodes in sorted order. It's also not tail recursive,
an iterative version requires a stack. Understand why inorder
traversal of a BST visits the nodes in order: all the nodes in the
left subtree, which are all smaller than the current node, are visited
before this node, which is then visited, and then all the nodes in the
right subtree, which are all larger than this node, are visited.
The dtor uses the Destroy function which visits the nodes in
postorder, meaning the node is visited after its children (and their
descendants) have been visited (and deleted). Postorder is the only
traversal suitable for destroying the tree. A node's memory can't be
returned to the heap before its pointer members have been accessed.
Practice the three traversal methods with some BSTs.
Searching a BST is simple: go down the only possible path that the
search value could be on comparing the nodes with the search value.
You'll either come to a node with the sought-for value or to a Null
pointer, falling off the tree in an unsuccessful search. It's similar
to binary search, at each step compare the value to see if you've got
a match, otherwise go left or right as appropriate. It's not called a
Binary Search tree for nothing!
Inserting and deleting both start with a search. Insert is going down
the only correct path that the new value can belong on. It's sort of
searching but not checking for equality, just doing the < to see if it
should go left or right. Where it falls off the tree, the new node is
attached as a leaf at that point. Insert doesn't "insert" a new node
between other nodes; it might be better named Attach. The Insert code
allows more than one node with the same value to be in the tree. Note
that the shape of a BST of a given set of values depends on the order
in which the values are inserted. Fig 8.10 illustrates that the same
set of values yields different BSTs depending on the order the values
are inserted (assuming no deletions). Practice making BSTs with
different insertion orders.
The average height of all the BSTs of N nodes is O(log N), i.e. on
average, a BST is short and bushy, not tall and stringy. Searching,
inserting, and deleting all might have to go down to the furthest node
from the root, doing O(1) work at each node along the way, for a total
of O(height) work. So when height is O(log N), the work is O(log N).
A tall and stringy tree will have height of O(N), in which case the
functions have a worst case of O(N). So BSTs have the potential to be
as inefficient as sorted arrays for the basic functions of searching,
inserting, and deleting. but this worst case is unlikely to occur,
and there are techniques to ensure that it never happens.
Delete function searches for the node that is to be deleted. I
modified the book's version so that it doesn't matter if the node
isn't in the tree, it does nothing in that case. Delete is the most
complex of the functions. Basically, the to-be-deleted node is one of
three cases, depending on the number of children it has (0, 1,or 2).
Case 0 children, i.e. it's a leaf, is the easiest: just set its
parent's pointer to Null. Case 1 child, set it's parent's pointer to
point to its single child. Case 2 children, the most complicated,
replace the node with its "predecessor", the largest (rightmost) node
in the left subtree, and delete the predecessor node.
Many good Exercises: 1-20, 37-40 are good practice. 21-36 are code
writing.
--------------------------------------------------------------------------
The treedr.cpp uses conditional compilation. From the command line
you can invoke the GNU compiler with a -D option that defines a symbol:
g++ -Dthingeetree treedr.cpp
then in the code, the symbol thingeetree is defined, so the
#ifdef thingeetree
is true and the code that follows is compiled, yielding a tree of
thingee structs. If the compiler is not invoked with the option, then
the #ifdef is false and the #else part is compiled, yielding a tree of
ints.
You can have "else if" and an optional else statements; the syntax is
#elif whateverSymbol
In an IDE, there should be somewhere you can specify options to
the compiler:
In rhide/DJGPP, under Options--Compiler Options you can enter
-Dthingeetree as the Parameter value.
In Dev-C++, under Options--Compiler Options, check the "Add the
following commands when calling compiler" and enter
-Dthingeetree
In Borland 5: From the main screen,
select PROJECT from the top bar.
then at the bottom select OPTIONS.
then click the DIRECTORIES/CONDITIONALS tab.
Near the bottom you'll see a block marked "Conditional defines:", it
probably has "_DEBUG" already inside it. Enter the name that you want
defined in that block ie "thingeetree".
In VC++: (maybe there's an easier way, anyone, please?)
First, open your workspace for your program, let's say Treedr.
Next select PROJECT from the menu bar and choose SETTINGS.
The 'project settings' window should be displayed.
'Treedr' should be listed on the left side of the window. If it is
collaped (a '+' to the left of it) - expand it (by clicking the plus)
- treedr.cpp should be seen in the list. - select it.
To the right, you should see tabs "General" and "C/C++" -select the C/C++ tab.
From the 'Category' drop down list, select 'preprocessor' - the last
item in the list.
Preprocessor definitions should now be displayed just below the
catagory selection.
Add ",thingeetree" (without quotes) after the last definition.
Before: WIN32,_DEBUG,_CONSOLE,_MBCS
After: WIN32,_DEBUG,_CONSOLE,_MBCS,thingeetree
Click ok, and recompile treedr.cpp
(Repeat 100 times: "a GUI IDE is so much easier to use")
--------------------------------------------------------------------------
Homework
Due: 16 Dec
Modify the treedr.cpp so that it conditionally creates a tree of
phone structs which is defined as follows:
struct phone {
string name;
int extension;
};
You'll need a phone.h file similar to the thingee.h file for defining
the phone struct and overloading the I/O and relational operators to
work with it. The overloaded relational operators can use C strings
(i.e. cstring) or C++ strings (i.e. string). No changes are needed
to the TreeType files. No change is needed to the do loop of main.
Submit the modified treedr.cpp and your phone.h files.