CMIS 320
Name:________________________

wordsMysql.txt contains a tab-delimited DOS-file line formatted
list of vocabulary words. Each line has a word, its definition and
its year of appearance in a certain list.
No word is more than 17 characters, no definition longer than 150.
No field is suitable as a priary key, and don't bother with making one.
Create an appropriate table and load the data into it.
There is no missing word but a few missing definitions (actualy, are blank, not null).


1.The number of rows, the number of distinct words, number of distinct definitions,
 number of distinct word and definition combinations,
 and the average number of times a word appears.

2.The average and standard deviation (stddev function) of word length, 
 also the min and max lengths

3.The (distinct) 3-letter words, in alphabetical order.

4.The distinct words of 15 or more characters in descending length order.

5.The frequency distribution of word lengths.

6.The frequency distribution of first letter of word.

7.The number of words, the number of distinct words, the number of duplicate words,
for each year.

7.a.The average  and standard deviation of word length, also the min and max lengths
for each year.

b.Repeat the query adding WITH ROLLUP which adds a summary line. 
Paste the NULL line:

8.The duplicate words of 2006.

9.The words that occur six or more times.

10.The median word (sort in alphabetical order and use the fact that there are 4871
words to display the one in the middle)

11.Seeing that there are a few fewer words than definitions,
it must be the case that some definitions are used for more than one word.
Run these queries and paste the resultsets.

 Select definition 
  From (Select word,definition From word Group By word,definition) As wd 
  Group By definition Having count(*)>1;

 Select definition,Group_Concat(word)
  From (Select word,definition From word Group By word,definition) As wd 
  Group By definition Having count(*)>1;

12.How many rows in the Cartesian product of the table with itself:___________

13.Join the table to itself where the word is a proper prefix of another word
i.e. matches if word1=leftmost length-of-word1 chars of word2.
and exclude the word matching itself, i.e. word1!=word2
Aliases are required for both tables. The word field must be qualified by tablename.
Use distinct to eliminate the duplicates.
Sort by word1. display the prefix word and its matching "expansion".



14.Create a table with one field, a varchar(16).
Load the commonestwords.txt file into the table.
This file contains the commonest words of English.

15.Which common words are duplicated.

16.Which words are in both tables.

17.How many distinct examples are there of a common word being a 
prefix of the other common words?

18.How many of the common words are prefixes of the vocabulary words?

19.Average length of the common words.

20.The definitions of the vocabulary words consist of words separated by single space,
so use the technique of #10 of the Bible table homework to find the
total number of words in the definitions, 
and the average number of words of the definitions.


******************** SKIP to end *************************************************
SKIP: The number of definitions that are used more than once:
Select count(*) 
 From (Select count(*) From word Group By definition Having count(*)>1) As dc;

SKIP: the sum of the number of times the multiply-used definitions are used.
Select sum(dupDefs) 
 From (Select count(*) As dupDefs From word Group By definition Having count(*)>1) As dc;

SKIP: of the rows of words with no definition, are there any other rows with the 
same word but with a definition
 Select * From word Where word=Any(Select word From word Where definition="");
slow...  but aren't any :(