Sequential and binary search.
A list that's not in any order can not be efficiently searched.
Basically you have to look at all the items until either you find what
you're looking for or you get to the end of the list. This is called
sequential search. With small lists the time it takes is irrelevant,
but many applications have huge lists that must be searched a lot.
Say you've got one million items, and they need to be searched one
thousand times every second. The searching is going to take up a lot
of CPU time, whatever the mega/giga hertz. However, if the list is in
sorted order, and it's stored in an array (i.e. the list is
implemented with an array, like ArrayList is), you can take advantage
of the power of array indexing to search very efficiently. Because
arrays are laid out sequentially in memory, element access via an
index is just a simple memory address calculation. Any element of an
array can be accessed in the same time as any other. You're not
limited to sequential access, you can access the elements in any order
or pattern. The binary search repeatedly (or recursively) accesses
the middle element of (sub)arrays that are half the size of the
(sub)array just previously looked at. This allows searching in
essentially no time at all.
The contains() method of the ArrayList class does sequential search:
myArrayList.contains(someObject) returns boolean.
The indexOf() method of the ArrayList class also does sequential search:
myArrayList.indexOf(someObject) returns index of where found...
The Collections class has a static method for doing binary search on a
sorted ArrayList (if it's not sorted the answer is undefined, meaning junk):
Collections.binarySearch(myArrayList,someObject)
returns index of where found, or negative number if not found
Run the SearchArrayListTest.java program to get a feel for the
efficiency difference between sequential and binary search. Binary
search is vastly less time-consuming, how much so is discussed
below. The drawback of binary search is that the list must be in
sorted order...
Analyzing the efficiency of algorithms is an important topic in
computer science. The discussion in the book in sections 24.1-24.4 is a good
introduction to it. Once an algorithm is correct, we can worry about
its efficiency. Timings are too dependent on CPU, operating system,
programming language, compiler and are not mathematical enough so they
are not used. Basically we are just counting the number of times some
fundamental operation of the algorithm executes. This operation will
be the unit of work. For searches and sorts, the operation is a
comparison of two values. For numeric algorithms, typically a
multiplication and/or division is the "countable" operation. We want
an algebraic function (one of those f(x) things) that relates the
input size (or problem size, or amount of data) to the number of
operations executed (x is the input size). The big-O notation allows
us to disregard multiplicative factors and lesser terms in the actual
analyzed function and just talk about the dominant term. Most
practical algorithms are in one of the "bins" (functions) of table 24.1
Most real-world programs deal with much larger amounts of
data, hence the critical need to find efficient algorithms (i.e. ones
with smaller rate of growth function), if indeed there can be any
(some problems require at least a minimum amount of work in order to be solved).
Sequential search takes time proportional to the number of
items in the list, it is an O(N) "order n" algorithm. Binary search
takes time proportional to the logarithm of the number of items in the
list, it is an O(lg N) "order log n" algorithm. lg means log to the
base 2 instead of the more common log to the base 10.
N O(N) O(lg N)
-------- ---- ------
100 100 7
1000 1000 10
1000000 1000000 20
billion billion 30
trillion trillion 40
Example: with one million items, sequential search uses "on the order
of" one million operations whereas binary search only requres about 20 operations.
Moral: an O(lg N) algorithm takes almost no time to run.
By using a cleverer algorithm (and a sorted list on which to operate),
searching can be done vastly more quickly.