Statistics Flashcards

Edexcel GCSE Statistics (1ST0)

Created by Millionaire

125 Cards

Terms in this set (125)

Sampling

The process of selecting a subset of individuals or items from a population to represent the whole population.

Population

The entire group of individuals or items that data could be collected from.

Sample

A subset of the population chosen for data collection.

Random sampling

A sampling method where every member of the population has an equal chance of being selected.

Systematic sampling

A sampling method where members are selected at regular intervals from an ordered list.
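
A minimal sketch of systematic sampling, assuming the population is an ordered Python list. The function name `systematic_sample`, its parameters, and the 60-student register are illustrative assumptions, not part of the specification.

```python
import random

def systematic_sample(ordered_items, sample_size, start=None):
    """Pick every k-th item from an ordered list, beginning at a random start."""
    k = len(ordered_items) // sample_size      # sampling interval
    if start is None:
        start = random.randrange(k)            # random start within the first interval
    return ordered_items[start::k][:sample_size]

register = list(range(1, 61))                  # hypothetical list of 60 students, numbered 1 to 60
sample = systematic_sample(register, 10, start=3)
print(sample)                                  # [4, 10, 16, 22, 28, 34, 40, 46, 52, 58]
```

Passing `start` fixes the starting point so the result can be checked by hand; leaving it out gives a random start.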

Stratified sampling

A sampling method where the population is divided into groups (strata) and a proportional sample is taken from each group.
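
A short sketch of how a proportional sample might be shared between strata: each stratum gets (stratum size ÷ population size) × sample size. The function name `stratum_sizes` and the school figures are made up for illustration.

```python
def stratum_sizes(strata, sample_size):
    """Share out a sample in proportion to the size of each stratum."""
    population = sum(strata.values())
    return {name: round(size / population * sample_size)
            for name, size in strata.items()}

year_groups = {"Year 9": 200, "Year 10": 150, "Year 11": 150}   # hypothetical school
allocation = stratum_sizes(year_groups, 50)
print(allocation)   # {'Year 9': 20, 'Year 10': 15, 'Year 11': 15}
```

With less convenient numbers, the rounded values may not add up to the intended sample size and need adjusting by hand.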

Quota sampling

A non-random sampling method where the interviewer fills set quotas for groups in the population (e.g., by age or gender) and chooses which individuals to include.

Bias in sampling

Occurs when the sample does not accurately represent the population, leading to misleading results.

Sample size

The number of individuals or items included in the sample.

Qualitative data

Data that is descriptive and non-numerical, such as colours or opinions.

Quantitative data

Data that is numerical and can be measured, such as height or temperature.

Discrete data

Quantitative data that can only take specific values, such as the number of people.

Continuous data

Quantitative data that can take any value within a range, such as weight or time.

Primary data

Data collected directly by the researcher for a specific purpose.

Secondary data

Data that has been collected by someone else and is used by the researcher.

Bar chart

A graph that uses bars to show frequencies or values for different categories.

Pie chart

A circular chart divided into sectors, showing proportions of a whole.

Histogram

A graph for continuous grouped data where each bar covers a class interval and the area of the bar is proportional to the frequency.

Scatter diagram

A graph showing the relationship between two variables using plotted points.

Line graph

A graph that uses lines to show trends or changes over time.

Stem-and-leaf diagram

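A diagram that splits each value into a stem (the leading digit or digits) and a leaf (the final digit), showing the distribution while keeping the original values.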

Frequency polygon

A graph that plots frequency against the midpoint of each class interval and joins the points with straight lines.

Cumulative frequency graph

A graph showing the running total of frequencies, used to find medians and quartiles.

Mean

Mean = Sum of values ÷ Number of values.

Median position

Median position = (n + 1) ÷ 2, where n is the number of values and the values are listed in order.

Range

Range = Largest value - Smallest value.
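
The mean, median position, and range formulas above can be checked on a small data set. This is only a sketch; the values and variable names are made up for illustration.

```python
values = [3, 7, 8, 12, 15]                     # hypothetical data set
ordered = sorted(values)
n = len(ordered)

mean = sum(ordered) / n                        # sum of values ÷ number of values
median_position = (n + 1) / 2                  # 1-based position in the ordered list

# A whole-number position picks one value; a .5 position averages its two neighbours.
lower = ordered[int(median_position) - 1]
upper = ordered[int(median_position + 0.5) - 1]
median = (lower + upper) / 2

data_range = ordered[-1] - ordered[0]          # largest value - smallest value

print(mean, median_position, median, data_range)   # 9.0 3.0 8.0 12
```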

Interquartile range (IQR)

IQR = Upper quartile - Lower quartile.

Frequency density

Frequency density = Frequency ÷ Class width (used in histograms).
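
A quick sketch of the frequency density calculation for classes of unequal width. The class boundaries and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

# Frequency density = frequency ÷ class width
densities = [freq / (upper - lower) for lower, upper, freq in classes]
print(densities)   # [0.5, 1.2, 0.4]
```

The widest class (20 to 40) has the lowest density even though its frequency is not the smallest, which is why bar height alone can mislead when widths differ.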

Estimated mean

Estimated mean = (Σfx) ÷ Σf, where f is frequency and x is midpoint.
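
A minimal sketch of the estimated mean for a grouped frequency table, taking x as the midpoint of each class. The table itself is hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

# Σfx: frequency × class midpoint, summed over all classes
sum_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
# Σf: total frequency
sum_f = sum(freq for _, _, freq in classes)

estimated_mean = sum_fx / sum_f
print(sum_fx, sum_f, estimated_mean)   # 445.0 25 17.8
```

The result is an estimate because the midpoints stand in for the actual values, which are not known once data is grouped.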

Standard deviation

Standard deviation = √(Σ(x - mean)² ÷ n), where x is each value and n is the number of values.
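
The formula above, worked step by step on a small made-up data set. This is a sketch; Python's built-in `statistics.pstdev` gives the same result for the same divide-by-n formula.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical data set
n = len(values)
mean = sum(values) / n                         # 40 ÷ 8 = 5.0

# Σ(x - mean)² ÷ n, then take the square root
variance = sum((x - mean) ** 2 for x in values) / n
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)      # 5.0 4.0 2.0
```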

Outlier

A data value that is significantly different from the rest of the data set.

Identifying outliers (IQR method)

Outliers are values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
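
A short sketch of the IQR method. The quartiles are passed in as assumed values, because methods for locating quartiles vary; the function name `find_outliers` and the data are illustrative.

```python
def find_outliers(values, q1, q3):
    """Return the values lying outside Q1 - 1.5 × IQR and Q3 + 1.5 × IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [12, 15, 16, 18, 19, 21, 22, 45]        # hypothetical data set
outliers = find_outliers(data, q1=15.5, q3=21.5)   # assumed quartiles; fences are 6.5 and 30.5
print(outliers)                                # [45]
```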

Impact of outliers

Outliers can distort the mean and the range; the median and IQR are much less affected.

Positive correlation

A relationship where as one variable increases, the other also increases.

Negative correlation

A relationship where as one variable increases, the other decreases.

No correlation

No apparent relationship between the two variables.

Line of best fit

A straight line drawn on a scatter diagram to show the general trend of the data.

Interpolation

Estimating values within the range of the data using the line of best fit.

Extrapolation

Estimating values outside the range of the data using the line of best fit.

Spearman's Rank Correlation Coefficient (SRCC)

A measure of the strength and direction of a relationship between two ranked variables.
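
The card gives no formula, so this sketch assumes the standard one for data with no tied ranks: r_s = 1 - 6Σd² ÷ (n(n² - 1)), where d is the difference between each pair of ranks. The function name and the judges' rankings are made up for illustration.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's rank correlation coefficient, assuming no tied ranks."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

judge_1 = [1, 2, 3, 4, 5]                      # hypothetical rankings of five entries
judge_2 = [2, 1, 4, 3, 5]
r_s = spearman_rank(judge_1, judge_2)          # Σd² = 4, so r_s = 1 - 24 ÷ 120
print(round(r_s, 3))                           # 0.8
```

Identical rankings give +1 and completely reversed rankings give -1.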

Pearson's Product-Moment Correlation Coefficient (PMCC)

A measure of the strength and direction of a linear relationship between two variables.

SRCC range

Values range from -1 to +1, where -1 indicates perfect negative correlation and +1 indicates perfect positive correlation.

PMCC range

Values range from -1 to +1, where -1 indicates perfect negative linear correlation and +1 indicates perfect positive linear correlation.

Difference between SRCC and PMCC

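SRCC uses the ranks of the data and measures how consistently the two variables rise or fall together, even if the relationship is not linear; PMCC uses the actual values and measures only linear correlation.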

Categorical data

Data that can be grouped into categories, such as eye colour or car brands.

Ordinal data

Categorical data with a natural order or ranking, such as exam grades or satisfaction ratings.

Random sampling - Positive

Unbiased as every member of the population has an equal chance of being selected.

Random sampling - Negative

Can be time-consuming and difficult to achieve if the population is large.

Systematic sampling - Positive

Quick and easy to implement, especially with an ordered list.

Systematic sampling - Negative

Can introduce bias if the list has a hidden pattern.

Stratified sampling - Positive

Ensures all groups in the population are represented proportionally.

Stratified sampling - Negative

Time-consuming and requires detailed information about the population.

Quota sampling - Positive

Quick and easy to carry out, and ensures representation of specific groups.

Quota sampling - Negative

Can be biased as it relies on the researcher’s judgement to select participants.

Weighted mean

A mean where different values are given different weights based on their importance or frequency.

Weighted mean formula

Weighted mean = (Σwx) ÷ Σw, where w is the weight and x is the value.
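
A small sketch of the weighted mean formula. The marks and weights are hypothetical: an exam that counts three times as much as coursework.

```python
marks = [60, 80]        # hypothetical marks: coursework, exam
weights = [1, 3]        # the exam counts three times as much

# (Σwx) ÷ Σw
weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)
print(weighted_mean)    # 75.0
```

The ordinary mean of the two marks would be 70; the heavier weight pulls the result towards the exam mark.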

Purpose of weighted mean

Used when some values contribute more to the mean than others, such as in grouped data.

Comparative pie chart

Two or more pie charts used to compare data sets, with areas proportional to the total frequencies.

Angle calculation for pie chart

Angle = (Frequency ÷ Total frequency) × 360°.
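
A quick sketch of the angle calculation for every sector of a pie chart. The travel survey figures are made up for illustration.

```python
frequencies = {"Car": 18, "Bus": 12, "Walk": 6}    # hypothetical survey results
total = sum(frequencies.values())                  # 36

# (frequency ÷ total frequency) × 360, written as frequency × 360 ÷ total
angles = {label: freq * 360 / total for label, freq in frequencies.items()}
print(angles)   # {'Car': 180.0, 'Bus': 120.0, 'Walk': 60.0}
```

The angles should always add up to 360°, which is a useful check.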

Purpose of comparative pie charts

Used to compare proportions and totals between different data sets.

Advantage of pie charts

Visually shows proportions and is easy to interpret.

Disadvantage of pie charts

Does not show exact values and can be hard to compare small differences.

Probability

The likelihood of an event occurring, expressed as a number between 0 and 1.

Probability formula

Probability = Number of favourable outcomes ÷ Total number of possible outcomes, when all outcomes are equally likely.

Mutually exclusive events

Events that cannot happen at the same time, e.g., rolling a 3 or a 4 on a die.

Complementary events

Two events where exactly one must happen (an event and "not" that event), so their probabilities add up to 1, e.g., heads and tails in a coin toss.

Independent events

Events where the outcome of one does not affect the outcome of the other.

Dependent events

Events where the outcome of one affects the outcome of the other.

Addition rule for probability

P(A or B) = P(A) + P(B) for mutually exclusive events.

Multiplication rule for probability

P(A and B) = P(A) × P(B) for independent events.
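
The addition and multiplication rules, sketched with exact fractions for a fair die and a fair coin. The variable names are illustrative.

```python
from fractions import Fraction

# Addition rule: rolling a 3 or a 4 on one die (mutually exclusive)
p_three = Fraction(1, 6)
p_four = Fraction(1, 6)
p_three_or_four = p_three + p_four
print(p_three_or_four)      # 1/3

# Multiplication rule: a head on a coin and a 6 on a die (independent)
p_head = Fraction(1, 2)
p_six = Fraction(1, 6)
p_head_and_six = p_head * p_six
print(p_head_and_six)       # 1/12
```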

Measures of central tendency

Mean, median, and mode are used to compare the average values of data sets.

Measures of spread

Range, interquartile range (IQR), and standard deviation are used to compare the variability of data sets.

Box plots

Used to compare medians, ranges, and IQRs visually between data sets.

Cumulative frequency graphs

Used to compare distributions and find medians and quartiles of data sets.

Histograms

Used to compare the frequency distribution of continuous data between data sets.

Scatter diagrams

Used to compare relationships between two variables in different data sets.

Key considerations when comparing

Consider sample size, measures of central tendency, and spread to make fair comparisons.

Index number

A measure that shows how a value has changed compared to a base value, often expressed as a percentage.

Index number formula

Index number = (Value ÷ Base value) × 100.
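
A minimal sketch of the index number formula. The function name `index_number` and the prices (in pence) are hypothetical.

```python
def index_number(value, base_value):
    """(Value ÷ Base value) × 100."""
    return value / base_value * 100

# Hypothetical price: 120p in the base year, 150p now
price_index = index_number(value=150, base_value=120)
print(price_index)   # 125.0
```

An index of 125 means the price is 25% higher than in the base year; the base year itself always has an index of 100.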

Base year

The year or time period used as a reference point for index numbers.

Purpose of index numbers

Used to compare changes in data over time, such as prices or production levels.

Advantages of index numbers

Simplifies comparisons over time and highlights trends.

Disadvantages of index numbers

Does not show absolute values, and the figures depend on which base year is chosen.

Probability distribution

A table or formula showing all possible outcomes of an event and their probabilities.

Discrete probability distribution

A probability distribution where outcomes are distinct and countable, e.g., rolling a die.

Continuous probability distribution

A probability distribution where outcomes can take any value within a range, e.g., heights.

Uniform distribution

A distribution where all outcomes have equal probabilities.

Binomial distribution

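A discrete probability distribution for the number of successes in a fixed number of independent trials, where each trial has two outcomes (e.g., success or failure) and the same probability of success.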

Key property of probability distributions

The sum of all probabilities in a distribution equals 1.

Time series

A sequence of data points measured at successive time intervals.

Trend

The general direction in which data points move over time, e.g., increasing or decreasing.

Seasonal variation

Regular patterns in data that repeat over a fixed time period, e.g., sales that rise every December.

Moving average

The mean of a fixed number of consecutive values, recalculated as the window moves along the time series, used to smooth out fluctuations and identify the trend.
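
A short sketch of a 4-point moving average, as might be used for quarterly data. The function name `moving_averages` and the sales figures are made up for illustration.

```python
def moving_averages(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

quarterly_sales = [20, 28, 36, 24, 24, 32, 40, 28]     # hypothetical, two years of quarters
four_point = moving_averages(quarterly_sales, 4)
print(four_point)    # [27.0, 28.0, 29.0, 30.0, 31.0]
```

The raw figures go up and down each quarter, but the moving averages rise steadily, which shows an upward trend.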

Purpose of time series analysis

Used to identify trends, seasonal variations, and make predictions.

Time series graph

A line graph used to display data points over time, showing trends and patterns.

Multiplication rule for independent events

P(A and B) = P(A) × P(B).

Example of independent events

Flipping a coin and rolling a die are independent because one does not affect the other.

Key property of independent events

The probability of one event occurring is the same regardless of whether the other event occurs.

Conditional probability

The probability of an event occurring given that another event has already occurred.

Conditional probability formula

P(A | B) = P(A and B) ÷ P(B), where P(A | B) is the probability of A given B.
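
A sketch that applies the formula to a standard 52-card deck by counting cards, taking A as "the card is a heart" and B as "the card is red". The helper `probability` and the event functions are illustrative names.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))             # 52 (rank, suit) pairs

def probability(event):
    """Fraction of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# P(A | B) = P(A and B) ÷ P(B)
p_heart_and_red = probability(lambda card: is_heart(card) and is_red(card))
p_heart_given_red = p_heart_and_red / probability(is_red)
print(p_heart_given_red)                       # 1/2
```

Without the condition, P(heart) is 1/4; knowing the card is red doubles it to 1/2.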

Key property of conditional probability

The probability of one event depends on the occurrence of another event.

Example of conditional probability

The probability of drawing a red card given that the card drawn is a heart (this equals 1, because every heart is red).

Normal distribution

A continuous probability distribution that is symmetric and bell-shaped.

Key property of normal distribution

Most values cluster around the mean, with fewer values at the extremes.

Key property of binomial distribution

Defined by the number of trials (n) and the probability of success (p).
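
The card does not state a formula, so this sketch assumes the standard one: P(X = k) = C(n, k) × p^k × (1 - p)^(n - k), where C(n, k) counts the ways of choosing which k trials succeed. The function name and the coin example are illustrative.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 2 heads in 4 flips of a fair coin
p_two_heads = binomial_probability(k=2, n=4, p=0.5)
print(p_two_heads)   # 0.375

# The probabilities for k = 0, 1, ..., n add up to 1
total = sum(binomial_probability(k, 4, 0.5) for k in range(5))
print(total)         # 1.0
```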

Difference: normal vs binomial

Normal is continuous and symmetric; binomial is discrete and based on trials.

Example of normal distribution

Heights of people or exam scores in a large population.

Example of binomial distribution

Flipping a coin multiple times and counting the number of heads.
