Statistics Flashcards
Edexcel GCSE Statistics (1ST0)
Terms in this set (125)
Sampling
The process of selecting a subset of individuals or items from a population to represent the whole population.
Population
The entire group of individuals or items that data could be collected from.
Sample
A subset of the population chosen for data collection.
Random sampling
A sampling method where every member of the population has an equal chance of being selected.
Systematic sampling
A sampling method where members are selected at regular intervals from an ordered list.
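As a minimal sketch of selecting at regular intervals, the Python below picks every k-th item from an ordered list. The function name `systematic_sample`, the `register` list and the fixed start index are illustrative assumptions; in practice the starting point would normally be chosen at random within the first interval.

```python
def systematic_sample(items, sample_size, start=0):
    """Pick items at a regular interval from an ordered list."""
    if sample_size <= 0 or sample_size > len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    interval = len(items) // sample_size  # the regular interval between picks
    if not 0 <= start < interval:
        raise ValueError("start must lie within the first interval")
    return [items[start + i * interval] for i in range(sample_size)]


# Hypothetical ordered list of 100 student numbers (1 to 100).
register = list(range(1, 101))
chosen = systematic_sample(register, 10, start=3)
print(chosen)  # [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```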
Stratified sampling
A sampling method where the population is divided into groups (strata) and a proportional sample is taken from each group.
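The proportional part of stratified sampling can be sketched as below: each stratum contributes (stratum size ÷ population size) × sample size. The year-group figures and the name `stratified_sizes` are made up for illustration; when the results are not whole numbers they would need rounding so that the total still equals the sample size.

```python
def stratified_sizes(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total
            for name, size in strata_sizes.items()}


# Hypothetical school with 400 students; a sample of 40 is wanted.
year_groups = {"Year 9": 120, "Year 10": 180, "Year 11": 100}
sizes = stratified_sizes(year_groups, 40)
print(sizes)  # {'Year 9': 12.0, 'Year 10': 18.0, 'Year 11': 10.0}
```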
Quota sampling
A non-random sampling method where the researcher fills set quotas of people with certain characteristics (e.g., age group or gender) so that the sample reflects the population.
Bias in sampling
Occurs when the sample does not accurately represent the population, leading to misleading results.
Sample size
The number of individuals or items included in the sample.
Qualitative data
Data that is descriptive and non-numerical, such as colours or opinions.
Quantitative data
Data that is numerical and can be measured, such as height or temperature.
Discrete data
Quantitative data that can only take specific values, such as the number of people.
Continuous data
Quantitative data that can take any value within a range, such as weight or time.
Primary data
Data collected directly by the researcher for a specific purpose.
Secondary data
Data that has been collected by someone else and is used by the researcher.
Bar chart
A graph that uses bars to show frequencies or values for different categories.
Pie chart
A circular chart divided into sectors, showing proportions of a whole.
Histogram
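A graph showing the distribution of continuous data, with bars representing class intervals; the area of each bar is proportional to its frequency, so unequal class widths use frequency density for the bar height.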
Scatter diagram
A graph showing the relationship between two variables using plotted points.
Line graph
A graph that uses lines to show trends or changes over time.
Stem-and-leaf diagram
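A diagram that splits each value into a stem (the leading digit or digits) and a leaf (the final digit), keeping every data value while showing the shape of the distribution.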
Frequency polygon
A graph made by plotting each class's frequency against its class midpoint and joining the points with straight lines.
Cumulative frequency graph
A graph showing the running total of frequencies, used to find medians and quartiles.
Mean
Mean = Sum of values ÷ Number of values.
Median position
Median position = (n + 1) ÷ 2, where n is the number of values.
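The position formula can be tried out with a short sketch. The helper name `median` is an assumption for illustration; when (n + 1) ÷ 2 is not a whole number, the sketch averages the two middle values on either side of that position.

```python
def median(values):
    """Find the median using the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position in the ordered list
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower                  # odd n: a single middle value
    upper = ordered[int(position)]    # even n: position falls between two values
    return (lower + upper) / 2


print(median([7, 3, 9, 1, 5]))  # 5   (position 3 of 1, 3, 5, 7, 9)
print(median([4, 8, 1, 10]))    # 6.0 (position 2.5, halfway between 4 and 8)
```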
Range
Range = Largest value - Smallest value.
Interquartile range (IQR)
IQR = Upper quartile - Lower quartile.
Frequency density
Frequency density = Frequency ÷ Class width (used in histograms).
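A small sketch of the frequency density calculation for classes of unequal width follows. The class boundaries and frequencies are invented for illustration, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
densities = [frequency_density(f, lo, hi) for lo, hi, f in classes]
print(densities)  # [0.5, 2.0, 0.4]
```

The middle class has the tallest bar even though its width is the smallest, because its frequency is packed into a narrow interval.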
Estimated mean
Estimated mean = (Σfx) ÷ Σf, where f is frequency and x is midpoint.
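The Σfx ÷ Σf calculation can be sketched as below, assuming each class is given as (lower bound, upper bound, frequency) and the midpoint is taken as halfway between the bounds. The data and the name `estimated_mean` are illustrative only.

```python
def estimated_mean(classes):
    """Estimated mean of grouped data: sum of f * midpoint, divided by sum of f."""
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


# Hypothetical grouped data: midpoints are 5, 15 and 25.
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10)]
print(estimated_mean(classes))  # 18.0  (360 / 20)
```

The result is only an estimate because the individual values inside each class are unknown and are treated as if they all sat at the midpoint.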
Standard deviation
Standard deviation = √(Σ(x - mean)² ÷ n), where x is each value and n is the number of values.
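A minimal sketch of this formula is shown below, dividing by n as the card states. The data set is invented so that the arithmetic works out neatly; Python's built-in `statistics.pstdev` computes the same divide-by-n quantity and could be used as a cross-check.

```python
import math


def standard_deviation(values):
    """Square root of the mean squared deviation from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
print(standard_deviation(data))   # 2.0
```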
Outlier
A data value that is significantly different from the rest of the data set.
Identifying outliers (IQR method)
Outliers are values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
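The boundary rule can be sketched as follows, taking the quartiles as given rather than computing them (quartile conventions vary). The function names and the data are assumptions for illustration.

```python
def outlier_fences(q1, q3):
    """Lower and upper boundaries: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values lying strictly outside the two boundaries."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


# Hypothetical data with Q1 = 10 and Q3 = 20, so IQR = 10.
print(outlier_fences(10, 20))                            # (-5.0, 35.0)
print(find_outliers([3, 12, 15, 18, 40, -6], 10, 20))    # [40, -6]
```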
Impact of outliers
Outliers can affect measures of central tendency and spread, such as the mean and range.
Positive correlation
A relationship where as one variable increases, the other also increases.
Negative correlation
A relationship where as one variable increases, the other decreases.
No correlation
No apparent relationship between the two variables.
Line of best fit
A straight line drawn on a scatter diagram to show the general trend of the data.
Interpolation
Estimating values within the range of the data using the line of best fit.
Extrapolation
Estimating values outside the range of the data using the line of best fit.
Spearman's Rank Correlation Coefficient (SRCC)
A measure of the strength and direction of a relationship between two ranked variables.
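The cards do not state a formula for SRCC. A commonly used one, valid when there are no tied ranks, is 1 − 6Σd² ÷ (n(n² − 1)), where d is the difference between the two ranks of each item. The sketch below assumes the data have already been ranked; the function name and the judges' rankings are hypothetical.

```python
def spearman_rank(ranks_x, ranks_y):
    """Spearman's rank correlation for two lists of ranks with no ties."""
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five contestants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rank(judge_a, judge_b))          # close to 0.8: strong agreement
print(spearman_rank(judge_a, [5, 4, 3, 2, 1]))  # -1.0: complete disagreement
```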
Pearson's Product-Moment Correlation Coefficient (PMCC)
A measure of the strength and direction of a linear relationship between two variables.
SRCC range
Values range from -1 to +1, where -1 indicates perfect negative correlation and +1 indicates perfect positive correlation.
PMCC range
Values range from -1 to +1, where -1 indicates perfect negative linear correlation and +1 indicates perfect positive linear correlation.
Difference between SRCC and PMCC
SRCC is used for ranked data, while PMCC is used for numerical data with a linear relationship.
Categorical data
Data that can be grouped into categories, such as eye colour or car brands.
Ordinal data
Data that can be ordered or ranked, such as exam grades or survey responses.
Random sampling - Positive
Unbiased as every member of the population has an equal chance of being selected.
Random sampling - Negative
Can be time-consuming and difficult to achieve if the population is large.
Systematic sampling - Positive
Quick and easy to implement, especially with an ordered list.
Systematic sampling - Negative
Can introduce bias if the list has a hidden pattern.
Stratified sampling - Positive
Ensures all groups in the population are represented proportionally.
Stratified sampling - Negative
Time-consuming and requires detailed information about the population.
Quota sampling - Positive
Quick and easy to carry out, and ensures representation of specific groups.
Quota sampling - Negative
Can be biased as it relies on the researcher’s judgement to select participants.
Weighted mean
A mean where different values are given different weights based on their importance or frequency.
Weighted mean formula
Weighted mean = (Σwx) ÷ Σw, where w is the weight and x is the value.
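A short sketch of Σwx ÷ Σw, using an invented example in which a coursework mark counts for 40% and an exam mark for 60%. The name `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum of (weight * value) / sum of weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [60, 80]      # coursework mark, exam mark
weights = [40, 60]    # percentage weighting of each component
print(weighted_mean(marks, weights))  # 72.0  ((2400 + 4800) / 100)
```

The unweighted mean of the two marks would be 70; the weighted mean is pulled towards the exam mark because it carries more weight.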
Purpose of weighted mean
Used when some values contribute more to the mean than others, such as in grouped data.
Comparative pie chart
Two or more pie charts used to compare data sets, with areas proportional to the total frequencies.
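Because area depends on the square of the radius, making areas proportional to total frequencies means the radii are in the ratio of the square roots of the totals. The sketch below works out the second radius under that reasoning; the function name and the figures are illustrative assumptions.

```python
import math


def comparative_radius(radius_1, total_1, total_2):
    """Radius of the second pie so that areas are proportional to the totals."""
    return radius_1 * math.sqrt(total_2 / total_1)


# Hypothetical: first chart has radius 3 cm and represents 100 people,
# the second represents 400 people.
print(comparative_radius(3, 100, 400))  # 6.0
```

Four times the total frequency gives twice the radius, not four times.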
Angle calculation for pie chart
Angle = (Frequency ÷ Total frequency) × 360°.
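The angle formula can be applied to a whole frequency table at once, as in this sketch. The travel-to-school data and the name `pie_angles` are invented for illustration.

```python
def pie_angles(frequencies):
    """Sector angle for each category: (frequency / total frequency) * 360."""
    total = sum(frequencies.values())
    # Multiplying before dividing gives the same result as the formula.
    return {category: f * 360 / total for category, f in frequencies.items()}


travel = {"Walk": 10, "Bus": 15, "Car": 5}
angles = pie_angles(travel)
print(angles)  # {'Walk': 120.0, 'Bus': 180.0, 'Car': 60.0}
```

A quick check is that the angles add up to 360°.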
Purpose of comparative pie charts
Used to compare proportions and totals between different data sets.
Advantage of pie charts
Visually shows proportions and is easy to interpret.
Disadvantage of pie charts
Does not show exact values and can be hard to compare small differences.
Probability
The likelihood of an event occurring, expressed as a number between 0 and 1.
Probability formula
Probability = Number of favourable outcomes ÷ Total number of possible outcomes, when all outcomes are equally likely.
Mutually exclusive events
Events that cannot happen at the same time, e.g., rolling a 3 and rolling a 4 on a single roll of a die.
Complementary events
Two events where exactly one must happen (they are mutually exclusive and together cover every outcome), so their probabilities sum to 1, e.g., heads and tails in a coin toss.
Independent events
Events where the outcome of one does not affect the outcome of the other.
Dependent events
Events where the outcome of one affects the outcome of the other.
Addition rule for probability
P(A or B) = P(A) + P(B) for mutually exclusive events.
Multiplication rule for probability
P(A and B) = P(A) × P(B) for independent events.
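Both rules can be checked with exact fractions, as in the sketch below. The helper names are hypothetical, and each function is only valid under the condition named in its comment.

```python
from fractions import Fraction


def p_either(p_a, p_b):
    """P(A or B) for mutually exclusive events only."""
    return p_a + p_b


def p_both(p_a, p_b):
    """P(A and B) for independent events only."""
    return p_a * p_b


# Rolling a 3 or a 4 on one roll of a fair die (mutually exclusive).
p_three_or_four = p_either(Fraction(1, 6), Fraction(1, 6))
# Getting a head on a fair coin and a 6 on a fair die (independent).
p_head_and_six = p_both(Fraction(1, 2), Fraction(1, 6))

print(p_three_or_four)  # 1/3
print(p_head_and_six)   # 1/12
```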
Measures of central tendency
Mean, median, and mode are used to compare the average values of data sets.
Measures of spread
Range, interquartile range (IQR), and standard deviation are used to compare the variability of data sets.
Box plots
Used to compare medians, ranges, and IQRs visually between data sets.
Cumulative frequency graphs
Used to compare distributions and find medians and quartiles of data sets.
Histograms
Used to compare the frequency distribution of continuous data between data sets.
Scatter diagrams
Used to compare relationships between two variables in different data sets.
Key considerations when comparing
Consider sample size, measures of central tendency, and spread to make fair comparisons.
Index number
A measure that shows how a value has changed compared to a base value, often expressed as a percentage.
Index number formula
Index number = (Value ÷ Base value) × 100.
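A one-line sketch of the index number formula, with invented prices in pence; `index_number` is a hypothetical helper name.

```python
def index_number(value, base_value):
    """Index number = (value / base value) * 100."""
    return value / base_value * 100


# Hypothetical: an item cost 120p in the base year and 150p now.
print(index_number(150, 120))  # 125.0, i.e. a 25% increase on the base year
print(index_number(120, 120))  # 100.0, the base year always has index 100
```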
Base year
The year or time period used as a reference point for index numbers.
Purpose of index numbers
Used to compare changes in data over time, such as prices or production levels.
Advantages of index numbers
Simplifies comparisons over time and highlights trends.
Disadvantages of index numbers
Does not show absolute values and can be affected by changes in the base year.
Probability distribution
A table or formula showing all possible outcomes of an experiment and their probabilities.
Discrete probability distribution
A probability distribution where outcomes are distinct and countable, e.g., rolling a die.
Continuous probability distribution
A probability distribution where outcomes can take any value within a range, e.g., heights.
Uniform distribution
A distribution where all outcomes have equal probabilities.
Binomial distribution
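A discrete probability distribution for the number of successes in a fixed number of independent trials, each with two outcomes, e.g., success or failure, and the same probability of success.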
Key property of probability distributions
The sum of all probabilities in a distribution equals 1.
Time series
A sequence of data points measured at successive time intervals.
Trend
The general direction in which data points move over time, e.g., increasing or decreasing.
Seasonal variation
Regular patterns in data that repeat over fixed time periods, e.g., sales that peak every December.
Moving average
A method to smooth out fluctuations in time series data to identify trends.
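As an illustration of the smoothing, the sketch below computes a simple moving average: the mean of each run of consecutive values of a chosen length. The quarterly sales figures are invented, and a 4-point average is assumed because the data are quarterly.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical quarterly sales over two years.
sales = [10, 14, 18, 12, 14, 18, 22, 16]
print(moving_average(sales, 4))  # [13.5, 14.5, 15.5, 16.5, 17.5]
```

The raw figures rise and fall within each year, but the moving averages climb steadily, which shows the underlying upward trend.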
Purpose of time series analysis
Used to identify trends, seasonal variations, and make predictions.
Time series graph
A line graph used to display data points over time, showing trends and patterns.
Multiplication rule for independent events
P(A and B) = P(A) × P(B).
Example of independent events
Flipping a coin and rolling a die are independent because one does not affect the other.
Key property of independent events
The probability of one event occurring is the same regardless of whether the other event occurs.
Conditional probability
The probability of an event occurring given that another event has already occurred.
Conditional probability formula
P(A | B) = P(A and B) ÷ P(B), where P(A | B) is the probability of A given B.
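A worked sketch of the formula with exact fractions follows. The card example is chosen for illustration: A is "the card is a king" and B is "the card is a picture card (jack, queen or king)" in a standard 52-card pack.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


p_king_and_picture = Fraction(4, 52)   # every king is a picture card
p_picture = Fraction(12, 52)           # 3 picture cards in each of 4 suits
print(conditional(p_king_and_picture, p_picture))  # 1/3
```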
Key property of conditional probability
The probability of one event depends on the occurrence of another event.
Example of conditional probability
The probability of drawing a red card given that the card drawn is a heart, which is 1 because every heart is red.
Normal distribution
A continuous probability distribution that is symmetric and bell-shaped.
Key property of normal distribution
Most values cluster around the mean, with fewer values at the extremes.
Key property of binomial distribution
Defined by the number of trials (n) and the probability of success (p).
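The cards do not give the binomial probability formula. The standard one is P(X = r) = nCr × p^r × (1 − p)^(n − r), where nCr counts the ways of choosing which r of the n trials are successes. The sketch below assumes that formula; the function name is hypothetical.

```python
import math


def binomial_probability(n, p, r):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


# Exactly 2 heads in 3 flips of a fair coin: 3 * 0.25 * 0.5.
print(binomial_probability(3, 0.5, 2))  # 0.375

# The probabilities for r = 0, 1, 2, 3 add up to 1.
total = sum(binomial_probability(3, 0.5, r) for r in range(4))
```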
Difference: normal vs binomial
Normal is continuous and symmetric; binomial is discrete and based on trials.
Example of normal distribution
Heights of people or exam scores in a large population.
Example of binomial distribution
Flipping a coin multiple times and counting the number of heads.