CMIS 320
Name:________________
Use the 1,000,000 record 33MB "j1000000.txt" data file.
The j1000000.txt file is available at http://192.168.x.yyy on the "UMUC WiFi" wireless.
Its fields: Type Name
integer<2G, int "bignum"
3 pseudo-Japanese di-syllables, char(6) "bigword"
integer<100, int(2) "smallnum"
A-F, char(1) "grade" or "letter"
double<=1.0, real(5,4) "realnum"
2 pseudo-Japanese di-syllables char(4) "smallword"
The data file has no nulls, but we can dispense with NOT NULL.
We'll skip a PRIMARY KEY too.
Paste your table's description:
Paste the Time to Load the data file: There should be no errors or warnings.
Answer these queries, notice the time taken for each. Use one query each.
1. Do this query and note the time it takes to process. How many rows are there?
SELECT Count(*) FROM japan;
2. Become familiar with the basic statistics of the data:
Do a query for each numeric column:
what is the sum, average, min, max, and number of different values?
Do a query for each string column:
what is the min, max, and number of different values?
3. Display the records whose 6-letter word starts with zu, small
integer is between 50 and 60, grade is F, real number is less than .3
and 4-letter word starts with tu (i.e. all these ANDed):
Moral of this example: one million is a lot. If the display is 4
records per inch, then one million records is about 4 miles long.
If 60 records are printed per page, then one million records is about 33
reams of paper (500 pages per ream), which laid end to end would be almost
3 miles long (about Foster to Kadena on Route 58),
stacked would be five or six feet high.
A better way to think about one million is to look at it.
Using a clock or stopwatch, record how much real time it takes to display
the one million rows:__________ (This will take a long time, so continue in another CMD.)
The result status is the time the server uses to execute the query.
Dividing 1,000,000 by this time in seconds yields a rows per second speed:______
4. What is the average number of rows that a bigword value is in
(i.e. the average number of times each bigword occurs;
this does not use the AVG function;
it is the number of rows divided by the number of different bigwords):
What is the average number of times a smallword value is used in the table
5. Display the record with the largest bignum value: use a value from a previous query
6. The first field (bignum) is an integer between 0 and 2G.
There are 1 million rows of these integers in the table.
One million is only 1/2000 of 2G (i.e. 0.05%), a very small percentage.
But are the one million values unique?
How many different bignums are there: (from a previous query)
a. How many of the bignum values occur more than once:
Hint: count(*)-count(distinct bignum)
b. Anwer this by pasting the status result of a Group By bignum Having count(*)>1
The time is interesting.
7. How many of the records has its 4-letter word as the prefix of its 6-letter word:
8. What is the grade distribution (i.e. how many of each grade):
All this data is randomly generated by a program.
What is the percentage of the total number of rows that each grade is
(use the fact that there are 1000000 rows directly in the query's expressions).
Use the Round function to get 8 digits of precision in the percentage distribution.
(the grade data is "uniformly distributed", each grade is equally likely).
9. The real number column's values are not uniformly distributed between 0 and 1.
If they were the average would be .5 instead of the value you got above.
a. How many different values are in this column:
b. What is the frequency distribution of these real number values?
c. Using a WHERE clause, restrict this (b.'s) counting to the real number values greater than
or equal to .2:
d. Using a HAVING clause, restrict this (b.'s) counting to the real number values greater than
or equal to .2:
Timewise which is more efficient, using the Where or using the Having?
17. How many combinations of smallnum and grade are there? Resultset is one row.
18. Of those combinations, which occurs the most (and how many times is that)?
19. Of those combinations, which occurs the least (and how many times is that)?
select * from japan where bignum=574181552;
Note the time used:
Add an Index:
alter table japan Add INDEX (bignum);
Time takes to do this:
select * from japan where bignum=574181552;
Note the time used:
What does Describe table indicate in Key column that bignum is:
Do 6b again and show the new time it takes:
Delete all the rows of the table (with the Index). Paste the status result.
Load the data file. With the Index in place, how does the load time compare
to loading the data without the index?
Drop the Index.
alter table japan Drop INDEX bignum;
Time?
Delete all the rows of the table (without the Index). Paste the status result.