## Thursday, June 11, 2020

### Data Set Analysis Statistics Project - 1100 Words

#### PART 1

##### Question 1

The unit of analysis in this dataset is the sampled US county.

##### Question 2

- County: ‘County’ is an independent nominal variable. It identifies and classifies each unit in the study population.
- Year: Year is a quantitative independent variable. It is a continuous interval variable.
- Dem. Pct: This is a quantitative ratio variable.
- Pres: ‘Pres’ is a categorical dichotomous variable. It is an independent variable in the dataset.
- Turnout: This is a quantitative ratio variable.
- Arrests: Arrests is a quantitative, independent ratio variable.
- Urban: This is an independent, categorical nominal variable.

##### Question 3

While reported arrests may be a valid measure of crime, they are not a reliable one. The number of arrests can track the trend in crime, but arrest counts in different counties depend on the efficiency of the security personnel in each county. County A may therefore report more arrests than county B even though county B has more criminal activity. The number of reported arrests may thus fail to give a clear comparison between two or more counties, which makes it unreliable.
The number of reported arrests is valid because reported arrests correspond to a portion of the crimes committed, so an increase or decrease in crime in a given county leads to a proportionate increase or decrease in the number of reported arrests.

##### Question 4

The unit of analysis for this dataset is the sampled US registered voter.

##### Question 5

- Participant: ‘Participant’ is an independent, categorical nominal variable.
- Party: ‘Party’ is also an independent, categorical nominal variable.
- Crime: ‘Crime’ is a categorical ordinal variable.
- Obama: This is a categorical ordinal variable.
- Clinton: ‘Clinton’ is a categorical ordinal variable.
- Income: This is a quantitative ratio variable.
- Gay: ‘Gay’ is a categorical dichotomous variable.
- Frack: ‘Frack’ is a categorical dichotomous variable.

##### Question 6

The pollster needs to obtain a fair representation of the whole population. This can be done by drawing the sample of 600 individuals from across the population, which ensures that all heterogeneous characteristics of the population are represented in the sample. To avoid bias that may affect reliability, the pollster needs to select respondents at random from within the homogeneous subgroups of the population.

The data should be collected over roughly the same time period to reduce the chance of significant changes in variable values and attributes.

The sample of 600 participants must be a fair representation of the population.
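The random selection from homogeneous subgroups described above is usually called stratified random sampling. The following is a minimal sketch, not the pollster's actual procedure: the `stratified_sample` helper, the `region` stratum, and the 10,000-voter population are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n, seed=0):
    """Draw about n units, allocating draws to each stratum in proportion to its size.

    Hypothetical helper for illustration. Rounding means the total can differ
    from n by a unit or two when stratum shares are not round numbers.
    """
    rng = random.Random(seed)

    # Group the population into homogeneous subgroups (strata).
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Proportional allocation: a stratum's share of the sample matches
        # its share of the population.
        k = round(n * len(members) / len(population))
        # Simple random sampling without replacement inside the stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: 10,000 registered voters, 60% urban and 40% rural.
population = [
    {"id": i, "region": "urban" if i < 6000 else "rural"}
    for i in range(10_000)
]

sample = stratified_sample(population, lambda voter: voter["region"], n=600)
urban_in_sample = sum(voter["region"] == "urban" for voter in sample)
```

With these made-up proportions the sample keeps the 60/40 split of the population, which is the sense in which it is a fair representation.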
The sample size must be large enough for the characteristics of the sample to generalize to the whole population.

##### Question 7

Graph 1: Mean = 0, Median = 0, Mode = 0

##### Question 8

Graph 2: Mean ≈ 30, Median ≈ 25, Mode = 20

##### Question 9

Graph 3: Mean ≈ 3, Median ≈ 4, Mode = 5

##### Question 10

Graph 4: Mean ≈ 0, Median ≈ 0, Mode = -1 and 1

##### Question 11

Rank from strongest negative correlation to strongest positive correlation:

1. Graph 8
2. Graph 6
3. Graph 5
4. Graph 7

##### Question 12

The correlation coefficient (Pearson's r) for graph 6 is zero. The graph is composed of two curves: one has a positive correlation coefficient and the other a negative one. The slopes of the two curves are similar in magnitude, so their correlation coefficients have roughly equal absolute values and cancel out, giving an overall coefficient of zero.

##### Question 13

The average score of Penn students (+1) differs substantively from the average score of Drexel students (-0.5). From the information given, it is not possible to determine whether the difference is statistically significant. Determining statistical significance requires more information about the variance or standard deviation of the students' scores around the mean, as well as the sizes of the samples and/or populations drawn from Penn and Drexel students. The distribution of the scores determines whether the difference between the mean scores is statistically significant.

##### Question 14

The measures chosen by Professor X are neither reliable nor valid. As the graph illustrates, there is no relationship between ‘repress’ and ‘protests’. The setup is invalid because the independent variable is not fully independent: the dependent variable has a causal effect on the independent variable. This gives rise to multi-correlation of the dependent and independent variables.
This leads to unreliable and invalid results because of the large errors associated with the multi-correlation of the variables.

In addition, the professor has drawn the wrong inference from the bivariate relationship graph, which in turn leads to an erroneous analysis of the data. The graph does not show a strong positive relationship between the variables. It shows zero relationship between them, so the professor should not have regressed variables that have no linear relationship.

To test the hypotheses postulated by the other professors and the student, Professor X needed to perform an independent analysis for each hypothesis in order to arrive at reliable conclusions about the hypothesis under test. This would have be...
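Whether two variables have any linear relationship, the point at issue above, can be checked by computing Pearson's r before running a regression. The sketch below is illustrative only: the data are made up rather than read from the assignment's graphs, and `pearson_r` is a hypothetical helper written out by hand. It also reproduces the situation described in Question 12, where a rising branch and a falling branch each correlate strongly with x but the pooled coefficient is zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: two branches over the same x values.
xs = list(range(-10, 11))
rising = [x for x in xs]     # y = x, a perfectly increasing line
falling = [-x for x in xs]   # y = -x, a perfectly decreasing line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)

# Pool both branches into one scatter plot, as in an X-shaped graph.
r_combined = pearson_r(xs + xs, rising + falling)

print(r_rising, r_falling, r_combined)
```

In the pooled data the positive and negative cross-products cancel, so a regression line fitted to it would be flat even though each branch on its own is perfectly linear.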