CMIS 320 Name:_______________________ The counties2010.txt file contains data for the 3143 "counties" in the USA. This data is from the 2010 census. Its fields are comma-delimited and the lines are CR LF (i.e. DOS format) terminated. The first line of the file contains field names. It can be skipped in the LOAD DATA command with a IGNORE 1 LINES clause (yes, lineS) after the "TERMINATED" clauses. Fips code, name, state, population, income per capita, median household income, and area in square miles are the fields. The first field in the file, "fips", is the official code for the county; it is a four or five digit string; it is unique for each county and so can be the Primary Key for the table. The longest name is 22 characters. State is always the two character postal abbreviation. Population is a positive Integer (hint: Unsigned). The two income fields are also positive integers. Area is a real number; Float can be the datatype. There is no missing or unknown data and we don't want to allow any Nulls in the table. Washington DC is included. Puerto Rico, Guam, Virgin Islands, Johnson Atoll, Okinawa, and Afghanistan are not included. Louisana calls them parishes but they're here. Alaska calls them boroughes but they're here. Virginia has many towns/cities that they annoyingly classify as counties; their name is appended with " city". Paste your table's Description here: Paste the result of the LOAD DATA command: Answer these queries. Paste the SQL and the resultset. 1 -number of rows. There should be 3143. Always verify you have the data you're suposed to have. 2 -make one query that will find any Nulls in any (i.e. all) fields. In other words, verify that there are no Nulls in the table. This is a typical kind of data validation query. Verify that your loaded data is corect! 3 -query for duplicate primary keys. Hint: Group By the primary key field HAVING count(*)>1 Another typical data validation query. 4 -number of states. 5 -number of different county names. 6 -min, max, sum, avg, and std of population. 7 -min, max, sum, avg, and std of area. 7b -the three-letter names, in alphabetical order, not duplicated. 8 -name, state and density (population per square mile) of the 10 most densely populated counties, in decreasing order. Round the density to nearest integer and call it 'density'. 9 -population densities of the five least densely populated states. To determine a state's density, divide the sum of its population by the sum of its area. 10 -the state with the most counties, and that number of counties. 11 -income and percapita income of each state. A county's income is its population times its perCapitaIncome. A state's income is the sum of its counties' incomes. A state's percapita income is its income divided by the sum of its population. 12 -number of counties whose names contain 'king'. 13 -the counties whose name starts with 'King'. 14 -the counties whose name starts with 'King ' (notice the space) or is 'King'. 15 -the ten most common county names, in order of occurence 16 -the counties that have 13 or more letters in their name and no other characters like spaces or dashes. In other words, the long-named single-word counties. 17 -the data of your home county. 18 -summary of the country's number of counties, population, area, density, income, income per capita. Use nice aliases for calculated columns. Round to integer. 19 -summary by each state's number of counties, population, area, density, income, income per capita. Use nice aliases for calculated columns. Round to integer. SKIP: Correlation between pop and incomePerCapita: select (count(*)*sum(pop*perCap) - sum(pop)*sum(perCap)) / (sqrt(count(*)*sum(pow(pop,2))-pow(sum(pop),2)) * sqrt(count(*)*sum(pow(perCap,2))-pow(sum(perCap),2))) as r from counties2010; +--------------------+ | r | +--------------------+ | 0.2738087844906263 | +--------------------+ a bit correlated Correlation between incomePerCapita and householdIncome: select (count(*)*sum(house*perCap) - sum(house)*sum(perCap)) / (sqrt(count(*)*sum(pow(house,2))-pow(sum(house),2)) * sqrt(count(*)*sum(pow(perCap,2))-pow(sum(perCap),2))) as r from counties2010; +--------------------+ | r | +--------------------+ | 0.8730494476810672 | +--------------------+ very high Correlation between incomePerCapita and area: select (count(*)*sum(area*perCap) - sum(area)*sum(perCap)) / (sqrt(count(*)*sum(pow(area,2))-pow(sum(area),2)) * sqrt(count(*)*sum(pow(perCap,2))-pow(sum(perCap),2))) as r from counties2010; +----------------------+ | r | +----------------------+ | 0.016259140723668453 | +----------------------+ none