A. P. Statistics
6/20/00
1. Exploring Data: Observing patterns and departures from patterns
Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures from patterns. Emphasis should be placed on interpreting information from graphical and numerical displays and summaries.
A. Interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram)
- 1. Center and spread
- 2. Clusters and gaps
- 3. Outliers and other unusual features
B. Summarizing distributions of univariate data
- 1. Measuring center: median, mean
- 2. Measuring spread: range, interquartile range, standard deviation
- 3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
- 4. Using boxplots
- 5. The effect of changing units on summary measures
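As an illustration of item 5 above, the following is a minimal sketch of how a linear change of units affects summary measures. The temperature values are made up for illustration, and the helper name `summarize` is hypothetical. Measures of center are both rescaled and shifted, while measures of spread are only rescaled.

```python
import statistics

def summarize(values):
    """Return the mean, median, range, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical temperatures in degrees Celsius (invented for this sketch).
celsius = [10.0, 12.0, 15.0, 15.0, 18.0, 20.0]

# Linear change of units: F = 1.8 * C + 32.
fahrenheit = [1.8 * c + 32 for c in celsius]

c_summary = summarize(celsius)
f_summary = summarize(fahrenheit)

for name in c_summary:
    print(name, round(c_summary[name], 3), round(f_summary[name], 3))
```

The mean and median in Fahrenheit equal 1.8 times the Celsius value plus 32, whereas the range and standard deviation are multiplied by 1.8 only, since adding a constant does not change spread.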
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
- 1. Comparing center and spread: within group, between group variation
- 2. Comparing clusters and gaps
- 3. Comparing outliers and other unusual features
- 4. Comparing shapes
D. Exploring bivariate data
- 1. Analyzing patterns in scatterplots
- 2. Correlation and linearity
- 3. Least squares regression line
- 4. Residual plots, outliers, and influential points
- 5. Transformations to achieve linearity: logarithmic and power transformations
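Items 2 through 4 above can be sketched in a few lines. This is only an illustrative sketch: the data points are invented, and the function name `least_squares` is hypothetical. It computes the correlation r, the least squares slope and intercept, and the residuals.

```python
import statistics
from math import sqrt

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares line of y on x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical bivariate data (invented for this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = least_squares(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3), "r:", round(r, 4))
```

Two facts worth checking with such a sketch: the residuals from a least squares line sum to zero, and the slope equals r times the ratio of the standard deviation of y to that of x.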
E. Exploring categorical data: frequency tables
- 1. Marginal and joint frequencies and association
- 2. Conditional relative frequencies and association
2. Planning a Study
Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. This plan includes clarifying the question and deciding upon a method of data collection and analysis.
A. Overview of methods of data collection
- 1. Census
- 2. Sample survey
- 3. Experiment
- 4. Observational study
B. Planning and conducting surveys
- 1. Simple random sampling
- 2. Characteristics of a well-designed and conducted survey
- 3. Sampling error: the variation inherent in a survey
- 4. Sources of bias in surveys
- 5. Stratifying to reduce variation
C. Planning and conducting experiments
- 1. Experiments versus observational studies versus surveys
- 2. Confounding, control groups, placebo effects, blinding
- 3. Treatments, experimental units, and randomization
- 4. Completely randomized design for two treatments
- 5. Randomized paired comparison design
- 6. Replication, blocking, and generalizability of results
3. Anticipating Patterns: Producing models using probability and simulation
Probability is the tool used for anticipating what the distribution of data should look like under a given model.
A. Probability as relative frequency
- 1. "Law of large numbers" concept
- 2. Addition rule, multiplication rule, conditional probabilities and independence
- 3. Discrete random variables and their probability distributions
- 4. Simulation of probability distributions, including binomial and geometric
- 5. Mean (expected value) and standard deviation of a random variable, including binomial
B. Combining independent random variables
- 1. Notion of independence versus dependence
- 2. Mean and standard deviation for sums and differences of independent random variables
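The rules in item 2 above can be checked exactly for small discrete distributions. The sketch below is illustrative only: the die and coin distributions are chosen as an example, and the helper names `mean`, `sd`, and `combine` are hypothetical. It builds the distribution of X + Y and X - Y for independent X and Y by multiplying probabilities.

```python
from itertools import product
from math import sqrt, isclose

def mean(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def sd(pmf):
    """Standard deviation of a discrete random variable."""
    m = mean(pmf)
    return sqrt(sum((x - m) ** 2 * p for x, p in pmf.items()))

def combine(pmf_x, pmf_y, op):
    """Distribution of op(X, Y), assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = op(x, y)
        # Independence lets us multiply the two probabilities.
        out[value] = out.get(value, 0.0) + px * py
    return out

die = {face: 1 / 6 for face in range(1, 7)}   # fair six-sided die
coin = {0: 0.5, 1: 0.5}                        # fair coin coded 0/1

total = combine(die, coin, lambda x, y: x + y)
diff = combine(die, coin, lambda x, y: x - y)

print("mean of sum:", mean(total), "sd of sum:", sd(total))
print("mean of difference:", mean(diff), "sd of difference:", sd(diff))
```

Means add for the sum and subtract for the difference, while variances add in both cases, so the sum and the difference have the same standard deviation. The variance rule relies on independence.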
C. The normal distribution
- 1. Properties of the normal distribution
- 2. Using tables of the normal distribution
- 3. The normal distribution as a model for measurements
D. Simulating sampling distributions
- 1. Sampling distribution of a sample proportion
- 2. Sampling distribution of a sample mean
- 3. Central Limit Theorem
- 4. Sampling distribution of a difference between two independent sample proportions
- 5. Sampling distribution of a difference between two independent sample means
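Items 2 and 3 above lend themselves to a short simulation. The following is a rough sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a skewed choice made only for illustration), and the function name `simulate_sample_means` and the seed are hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(sample_size, replications, seed=0):
    """Draw repeated samples from an exponential(1) population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(replications):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means

n = 30
sample_means = simulate_sample_means(sample_size=n, replications=2000)

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

print("mean of sample means:", round(center, 3))
print("sd of sample means:", round(spread, 3))
print("sigma / sqrt(n):", round(1 / sqrt(n), 3))
```

The simulated sample means should center near the population mean of 1 with a standard deviation near 1/sqrt(n), and a histogram of them would look roughly normal even though the population is strongly skewed, which is the idea behind the Central Limit Theorem.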
4. Statistical Inference: Confirming models
Statistical inference guides the selection of appropriate models.
A. Confidence intervals
- 1. The meaning of a confidence interval
- 2. Large sample confidence interval for a proportion
- 3. Large sample confidence interval for a mean
- 4. Large sample confidence interval for a difference between two proportions
- 5. Large sample confidence interval for a difference between two means (unpaired and paired)
- 6. Chi-square test for goodness of fit, homogeneity of proportions and independence
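As a sketch of item 2 above, a large sample confidence interval for a proportion can be computed as the sample proportion plus or minus a critical value times the estimated standard error. The counts below are invented, and the function name `proportion_ci` is hypothetical. The critical value is taken from Python's standard normal distribution rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large sample (normal approximation) interval for a population proportion."""
    p_hat = successes / n
    # Critical value z* leaving (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey result: 120 "yes" responses out of 400.
low, high = proportion_ci(successes=120, n=400)
print(round(low, 4), round(high, 4))
```

This approximation is usually considered reasonable only when the counts of successes and failures are both fairly large. The interval is centered at the sample proportion, and a higher confidence level gives a wider interval.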
B. Special case of normally distributed data
- 1. t-distribution
- 2. Single sample t procedures
- 3. Two sample (independent and matched pairs) t procedures
- 4. Inference for slope of least squares line
