CMIS 241 DE
Study chapter 3, about unsorted and sorted lists classes and binary
search. You can go lightly over the implementation details of the
book's code.
A list is an abstract data type that is a sequence of items. Sequence
implies that each list item has an item before it and after it in the
list (except for the first and last elements). The number of items in
the list is the list's length or size. A partially filled array
(i.e. filled from the beginning) is a kind of list, but a list is not
an array. A list can be implemented with an array (it could also be
implemented with a linked data structure [chapter 5]).
Java has some built-in list classes. They are part of the Collections
Framework. ArrayList is one of them. An ArrayList is a list of any
kind of and any mix of objects (primitive type values have to be
"wrapped" in its corresponding wrapper class, e.g. an int has to be
turned in to a Integer). (Usually the items in the list are the same
type, but they don't have to be.)
A collection is a group of objects...C++ uses the word container...
Incredibly easy to use. And efficient and very reliable. No one will
create their own list class.
The ArrayList grows and shrinks automatically as needed.
//How to use an ArrayList:
//1. declare and instantiate an ArrayList object:
// ArrayList myArrayList = new ArrayList();
//2. use the methods on it:
// myArrayList.add("a string object");
// myArrayList.add(anyObject);
// if (myArrayList.contains(someObject))...
The Collections class has some static methods to perform various
algorithms on collections such as ArrayLists.
Collections.sort (myArrayList)
Collections.reverse(myArrayList)
Collections.shuffle(myArrayList) randomly permute (mix up) the items
Collections.min (myArrayList)
Collections.max (myArrayList)
A static method (also known as a class method) is a method that is
not part of any particular instantiated object. It typically provides
a utility for general use. It is typically invoked via the class name
instead of an object reference. Math.sqrt(),
JOptionPane.showInputDialog(), System.out.println() are static
methods
Run the ArrayListTest.java to understand ArrayList.
A list that's not in any order can not be efficiently searched.
Basically you have to look at all the items until either you find what
you're looking for or you get to the end of the list. This is called
sequential search. With small lists the time it takes is irrelevant,
but many applications have huge lists that must be searched a lot.
Say you've got one million items, and they need to be searched one
thousand times every second. The searching is going to take up a lot
of CPU time, whatever the mega/giga hertz. However, if the list is in
sorted order, and it's stored in an array (i.e. the list is
implemented with an array), you can take advantage of the power of
array indexing to search very efficiently. Because arrays are laid
out sequentially in memory, element access via an index is just a
simple memory address calculation. Any element of an array can be
accessed in the same time as any other. You're not limited to
sequential access, you can access the elements in any order or
pattern. The binary search repeatedly ("recursively" as we'll see in
chapter 7) accesses the middle element of (sub)arrays that are half
the size of the (sub)array just previously looked at. This allows
searching in essentially no time at all. Pages 174-180 show the algorithm.
The contains() method of the ArrayList class does sequential search:
myArrayList.contains(someObject) returns boolean
The indexOf() method of the ArrayList class also does sequential search:
myArrayList.indexOf(someObject) returns index of where found...
The Collections class has a static method for doing binary search on a
sorted ArrayList (if it's not sorted the answer is undefined, meaning
junk):
Collections.binarySearch(myArrayList,someObject)
returns index of where found, or negative number if not found
Run the SearchArrayListTest.java program to get a feel for the
efficiency difference between sequential and binary search. Binary
search is vastly less time-consuming, how much so is discussed
below. The drawback of binary search is that the list must be in
sorted order...
Analyzing the efficiency of algorithms is an important topic in
computer science. The discussion in the book is a good introduction
to it. Once an algorithm is correct, we can worry about its
efficiency. Timings are too dependent on CPU, operating system,
programming language, compiler and are not mathematical enough so they
are not used. Basically we are just counting the number of times some
fundamental operation of the algorithm executes. This operation will
be the unit of work. For searches and sorts, the operation is a
comparison of two values. For numeric algorithms, typically a
multiplication and/or division is the "countable" operation. We want
an algebraic function (one of those f(x) things) that relates the
input size (or problem size, or amount of data) to the number of
operations executed (x is the input size). The big-O notation allows
us to disregard multiplicative factors and lesser terms in the actual
analyzed function and just talk about the dominant term. Most
practical algorithms are in one of the "bins" (functions) of figure
3.13 and table 3.3. Note that the last row with N=256 is still a
small value, most real-world programs deal with much larger amounts of
data, hence the critical need to find efficient algorithms (i.e. ones
with smaller rate of growth function), if indeed there can be any
(some problems require at least a minimum amount of work in order to
be solved).
The general discussion is then made concrete by applying it to the
member functions of the sorted and unsorted list classes. Note in
particular the difference between linear (sequential) search and
binary search. Sequential search takes time proportional to the
number of items in the list, it is an O(N) "order n" algorithm.
Binary search takes time proportional to the logarithm of the number
of items in the list, it is an O(lg N) "order log n" algorithm. lg
means log to the base 2 instead of the more common log to the base
10.
N O(N) O(lg N)
-------- ---- ------
100 100 7
1000 1000 10
1000000 1000000 20
billion billion 30
trillion trillion 40
Moral: an O(lg N) algorithm takes almost no time to run.
By using a cleverer algorithm (and a sorted list on which to operate),
searching can be done vastly more quickly.