“You should look at this section if you have only a limited idea of sampling methods and their various purposes.”
If the purpose of an assessment programme is to provide every student with an individual grade or score, then all students have to participate in the assessment. This is called a census and is used in a range of types of assessments such as at the classroom and school level, for formal board examinations and for university entrance examinations. In this case the score of each student only relates to that individual student.
In large-scale assessment, however, it may not be necessary for every student to participate. If the goal is to measure the performance of an education system, for example a district or state or country, then assessing every student would be very expensive and time-consuming. In this case, a sample of students is selected to participate in the assessment.
This means that a group of students is selected to participate in an assessment programme to represent all students with similar characteristics. For example, a percentage of students in a state might be selected to participate in an assessment. Their performance can then be used to estimate the performance of all students in that state.
Sampling saves a lot of time and money, but it is not easy to do well. When mistakes are made in sampling, the performance of those students selected to participate in an assessment might give an inaccurate picture of all students in that education system. For example, if a sample only includes students in the best schools in a state, then the performance of those students can only represent students in similar schools, not all students in the state.
To do sampling well, three important elements need to be considered.
First, it is important to think about the purpose of the assessment programme. For example, if the mathematics skills of grade 8 students are to be measured, is the goal to compare the mathematics performance of grade 8 students between districts or between schools? Or is it important to compare the mathematics performance of grade 8 boys and girls, or of grade 8 students in rural areas and in towns?
Second, it is important to know how much information is available about the whole population of students in that education system. For example, are there accurate and up-to-date records of how many grade 8 boys and girls are attending each school? It is usually OK to use records from the previous academic year as a reliable estimate, but if records are a few years old then this will limit the sampling strategies that can be used.
Third, it is important to decide what margin of error is acceptable. People often ask ‘how big’ a sample should be to provide a good estimate of student performance. The answer to this is ‘it depends’. It depends on the purpose of the assessment programme and the quality of information available about target students. And it depends on the margin of error that is considered to be acceptable. All sampling includes error (the only way to avoid error completely is to undertake a census!).
A margin of error might be +/-3% at a 95% confidence level. This means that if the assessment were repeated 100 times with different samples, the result would be expected to fall within 3% of the true population score in about 95 of the 100 surveys. There is no ‘right’ answer about what margin of error or confidence level should be used. This has to be decided by the stakeholders in charge of the survey, and usually includes a consideration of cost versus accuracy.
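The idea of repeating a survey many times can be made concrete with a small simulation. The Python sketch below is an illustration only and rests on assumptions that are not part of the text above: the result of interest is the share of students who reach a standard, the true share is 60%, each survey is a simple random sample of 1,068 students from a very large population, and every name in the code is hypothetical.

```python
import random

def share_of_surveys_within_margin(true_rate=0.60, sample_size=1068,
                                   margin=0.03, surveys=1000, seed=1):
    """Simulate repeated surveys and return the share whose estimate
    lands within `margin` of the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # Each sampled student reaches the standard with probability true_rate.
        passed = sum(rng.random() < true_rate for _ in range(sample_size))
        estimate = passed / sample_size
        if abs(estimate - true_rate) <= margin:
            hits += 1
    return hits / surveys

share = share_of_surveys_within_margin()
print(f"Share of simulated surveys within +/-3 points: {share:.3f}")
```

Under these assumptions the printed share should be close to 0.95. Real assessment samples are usually clustered in schools, so they generally need more students to reach the same margin of error.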
To find out more about how to draw a sample, go to #Intermediate.
“You should look at this section if you already know what sampling is but would like to know more details.”
It is common in large-scale assessment to assess a sample of students and to use their performance to estimate the performance of all students with the same characteristics. This is done to save time and money. For example, in PISA 2018, 6,300 students aged 15 were selected to represent all 15-year-old students in each education system that participated in the online version of the test.
There are a lot of technical terms used in sampling. Here is an explanation of what some of them mean and why they are important:
Target population–this is a description of the characteristics of the students that the assessment focuses on, and it often includes a number of components. In the example of PISA, this is all students who, at the beginning of testing, are aged between 15 years 3 months and 16 years 2 months AND attend educational institutions AND are in grade 7 or above AND are in an education system participating in PISA. The selected sample only represents students in the target population, not other students.
Exclusions–these are students in the target population that cannot be included in the assessment for practical reasons, for example if they are severely disabled or attend such remote schools that the cost of including them would be too high. Sometimes individual students are excluded, but sometimes whole groups, for example all students at a school for disabled students.
Sampling methodology–this is the method used to choose the sample. There are many different ways to do this, and the method chosen must relate to the purpose of the assessment programme. Many people think that simple random sampling (SRS) is the most common approach, but in fact sampling in assessment programmes often includes stratification. This is because SRS might mean that, for example, hardly any students in a particular state or hardly any boys are selected. Depending on the purpose, this may be regarded as unacceptable.
Sample size–The size of the sample will depend on a number of different elements, such as the purpose of the assessment programme, the level of reporting (e.g. at school level or district level) and the margin of error that is considered acceptable. This might be something like +/-3% at a 95% confidence level. Decisions about the margin of error need to take into account cost as well as accuracy. For example, increasing to a 99% confidence level will result in the need for a much bigger sample (and higher cost).
Response rates–The higher the response rates, the better a sample is at estimating the population. For large surveys such as PISA, minimum response rates of around 85% are common. This means that if a lower response rate is achieved, the data may be excluded from comparison. Non-response bias arises when response rates are higher among some students than others, for example when a bigger proportion of girls than boys participate. There should be strategies in place to avoid this, as otherwise the reliability and precision of the assessment results will be lessened.
Weighting–Sampling weights are used to correct for imperfections associated with the sample that might lead to bias. The purposes of weighting include compensating for disproportional probabilities of selection of subgroups of students, adjusting for non-response, and adjusting the weighted sample distribution for key variables of interest (e.g. age, gender). Weights are used in the estimation of population characteristics of interest and also in the estimation of the sampling errors of the survey estimates generated.
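The difference between simple random sampling and stratified sampling can be sketched in a few lines of Python. This is a minimal illustration, not a production sampling procedure. The population (9,500 students in state A and 500 in state B), the sample sizes and all function names are hypothetical.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng):
    """Draw n units with equal probability, ignoring which state they are in."""
    return rng.sample(population, n)

def stratified_sample(population, n_per_state, rng):
    """Draw a fixed number of units independently within each state."""
    by_state = {}
    for state, student_id in population:
        by_state.setdefault(state, []).append((state, student_id))
    sample = []
    for state, units in by_state.items():
        sample.extend(rng.sample(units, n_per_state[state]))
    return sample

rng = random.Random(7)
# Hypothetical population: (state, student number) pairs.
population = [("A", i) for i in range(9500)] + [("B", i) for i in range(500)]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, {"A": 50, "B": 50}, rng)

print("SRS counts by state:       ", dict(Counter(state for state, _ in srs)))
print("Stratified counts by state:", dict(Counter(state for state, _ in strat)))
```

With SRS, the number of students drawn from the small state varies from run to run and can be very low. The stratified design fixes it in advance. Because state B is deliberately over-represented in this sketch, sampling weights would be needed before combining the two states into one estimate.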
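The cost of a higher confidence level can be sketched with the standard sample size formula for a proportion, n = z² p(1 − p) / e². The Python example below is a rough guide only. It assumes simple random sampling, a very large population and the most cautious case of p = 0.5; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence, p=0.5):
    """Smallest simple random sample giving the requested margin of error
    for a proportion p at the requested confidence level."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

for confidence in (0.95, 0.99):
    n = required_sample_size(margin=0.03, confidence=confidence)
    print(f"{confidence:.0%} confidence, +/-3%: about {n} students")
```

Under these assumptions, moving from 95% to 99% confidence raises the required sample from 1,068 to 1,844 students, an increase of roughly 70%.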
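A small worked example shows why weights matter when subgroups are sampled at different rates. In this Python sketch, each sampled student's weight is the number of students in the population that he or she represents. All figures (an urban and a rural stratum, their sizes and mean scores) and all names are hypothetical.

```python
# Hypothetical strata: population counts, sample counts and sample mean scores.
strata = {
    "urban": {"population": 8000, "sampled": 200, "mean_score": 520.0},
    "rural": {"population": 2000, "sampled": 200, "mean_score": 480.0},
}

def design_weight(population, sampled):
    """Number of students in the population that one sampled student represents."""
    return population / sampled

def unweighted_mean(strata):
    """Mean score if every sampled student counts equally."""
    total = sum(s["sampled"] for s in strata.values())
    return sum(s["sampled"] * s["mean_score"] for s in strata.values()) / total

def weighted_mean(strata):
    """Mean score with each sampled student counted by his or her design weight."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        w = design_weight(s["population"], s["sampled"])
        numerator += w * s["sampled"] * s["mean_score"]
        denominator += w * s["sampled"]
    return numerator / denominator

print("Unweighted mean:", unweighted_mean(strata))  # 500.0
print("Weighted mean:  ", weighted_mean(strata))    # 512.0
```

Rural students make up half of this sample but only a fifth of the population, so the unweighted mean leans too far towards the rural score. The weighted mean restores the population proportions.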
To find out more about scientific sampling strategies, go to #Advanced.
“You should look at this section if you are already familiar with how to undertake sampling and would like more technical details.”
High quality, representative samples improve the reliability and precision of assessment results. Good sampling design involves a number of steps, and some of the critical ones are identified here. Many of these involve judgement calls, and the purpose of the assessment programme should be the key factor in making these decisions.
Defining the target population–This should be very precise, for example ‘Grade 5 students who are taught in the official language of state X’. The population definition should take account of the information available. If, for example, there is no up-to-date information on what language students are taught in, then this would not be a useful definition.
Coverage and Exclusions–It may not be possible to have complete coverage of a target population, for example if there is a conflict or for practical reasons. If students, schools or districts are excluded then this needs to be defined and included in reporting. For example, schools with fewer than ten students might be excluded if the cost of including them outweighs the benefits of doing so. Generally, at least 95% of the target population should be included.
Sampling methodology–Probability sampling methods, such as simple and complex random sampling, are those in which each element of the target population has a known, non-zero probability of selection and where there is no systematic bias. Cluster sampling, stratification and sample weighting are components of complex random sampling and are common in assessment programmes where there may be a desire to over- or under-represent certain sub-populations. Stratification can be explicit or implicit. Common explicit strata are state or region and common implicit strata are gender or school type.
Multi-stage sampling–This is commonly used in large-scale assessment programmes. For example, in a two-stage sampling design schools are selected first, and then students or classes are selected within schools. This is cost-effective and allows for multi-level analyses of data, but a larger sample size is required than in a simple random sample.
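A two-stage design can be sketched as follows. This minimal Python example selects schools with equal probability and then students within each selected school, and records each student's overall selection probability. Equal-probability selection at the first stage is a simplifying assumption made here; the school list, the sizes and the function name are all hypothetical.

```python
import random

def two_stage_sample(schools, n_schools, n_students, rng):
    """schools maps a school ID to its list of student IDs.

    Returns (school_id, student_id, selection_probability) tuples.
    """
    # Stage 1: simple random sample of schools.
    chosen_schools = rng.sample(sorted(schools), n_schools)
    p_school = n_schools / len(schools)
    sample = []
    for school_id in chosen_schools:
        students = schools[school_id]
        # Stage 2: take n_students, or everyone if the school is smaller.
        take = min(n_students, len(students))
        p_student = take / len(students)
        for student_id in rng.sample(students, take):
            sample.append((school_id, student_id, p_school * p_student))
    return sample

rng = random.Random(3)
# Hypothetical frame: 50 schools with between 20 and 118 students each.
schools = {f"S{k:02d}": [f"S{k:02d}-{i}" for i in range(20 + 2 * k)]
           for k in range(50)}
sample = two_stage_sample(schools, n_schools=10, n_students=25, rng=rng)
print(len(sample), "students sampled from",
      len({school for school, _, _ in sample}), "schools")
```

The inverse of each recorded selection probability is the student's base sampling weight. Students in large schools have a lower chance of selection here, which is one reason the first stage is often drawn with probability proportional to school size instead.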
Sample size–There are many factors to consider in calculating the sample size but ideally a 95% confidence interval for all relevant estimates is obtained. In a simple random sample, this degree of statistical precision translates to a minimum of 400 students per assessment item for student-level estimates. Complex sampling and analysis designs require compensation for design effects such as clustering to achieve the same precision.
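The compensation for clustering can be approximated with a design effect. The Python sketch below uses the widely quoted approximation deff = 1 + (m − 1) × rho, where m is the number of students assessed per school and rho is the intra-class correlation. It assumes equal-sized clusters and no other design features. The values m = 21 and rho = 0.25 are hypothetical and chosen only for illustration.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(srs_size, cluster_size, icc):
    """Students and schools needed to match the precision of a simple
    random sample of srs_size students."""
    deff = design_effect(cluster_size, icc)
    # Rounding first guards against floating-point noise before taking the ceiling.
    students = math.ceil(round(srs_size * deff, 6))
    n_schools = math.ceil(students / cluster_size)
    return students, n_schools

students, n_schools = clustered_sample_size(srs_size=400, cluster_size=21, icc=0.25)
print(f"Design effect: {design_effect(21, 0.25)}")
print(f"Students needed: {students} in about {n_schools} schools")
```

With these illustrative values the design effect is 6, so matching a simple random sample of 400 students would take about 2,400 students in 115 schools. A lower intra-class correlation or fewer students per school reduces the penalty.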
Sampling frame–This should provide as complete coverage of the target population as possible, but its construction depends on the availability of information. It should include a unique school identifier, entries for all explicit and implicit stratification variables and a suitable measure of size, such as student enrolment in target classes. It is important to identify substitute schools and/or students in case the sampled ones cannot, or refuse to, participate.
Minimising bias–Response rates should be as high as possible to avoid non-response bias. This bias results from differences between students who participated and those who did not, and it can undermine the generalisability of results and lead to misleading inferences.
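One way a frame with these fields might be used is systematic selection with probability proportional to size, after sorting the frame by the implicit stratification variable. The Python sketch below is an assumption-laden illustration rather than the procedure of any particular programme. It covers a single explicit stratum, and the field names, school sizes and function name are hypothetical.

```python
import random

def pps_systematic_sample(frame, n_schools, rng):
    """Select schools with probability proportional to size from a sorted frame."""
    total_size = sum(school["size"] for school in frame)
    interval = total_size / n_schools
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n_schools)]

    selected = []
    cumulative = 0
    next_target = 0
    for school in frame:
        cumulative += school["size"]
        # A school is selected when a target falls inside its slice
        # of the cumulative measure of size.
        while next_target < n_schools and targets[next_target] < cumulative:
            selected.append(school["school_id"])
            next_target += 1
    return selected

# Hypothetical frame for one explicit stratum (region "North").
frame = [
    {"school_id": f"S{i:02d}",
     "region": "North",
     "school_type": "private" if i % 3 == 0 else "public",
     "size": 30 + 5 * (i % 7)}
    for i in range(20)
]
# Implicit stratification: sort by school type, then by size, before selecting.
frame.sort(key=lambda school: (school["school_type"], school["size"]))

selected = pps_systematic_sample(frame, n_schools=5, rng=random.Random(11))
print(selected)
```

Sorting before systematic selection spreads the sample across school types without fixing a quota for each. A school larger than the sampling interval could be hit more than once; in practice such schools are usually handled separately.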
Sampling weights–These are used to account for stratification or disproportional probabilities of selection of subgroups; to adjust for non-response and non-coverage of the population; to adjust the weighted sample distribution for key variables of interest and to make the sample conform to a known population distribution. Weights are used to estimate population characteristics of interest and in the estimation of the sampling errors of the estimates generated.
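A simple form of non-response adjustment scales up the weights of responding students so that they also stand in for sampled students in the same group who did not take part. The Python sketch below assumes that non-respondents resemble respondents within each adjustment cell, which is a strong assumption. The cells, counts, base weights and function name are hypothetical.

```python
def nonresponse_adjusted_weights(cells):
    """Return the adjusted weight for respondents in each adjustment cell."""
    adjusted = {}
    for name, cell in cells.items():
        sampled_weight = cell["base_weight"] * cell["sampled"]
        responding_weight = cell["base_weight"] * cell["responded"]
        # Inflate respondents' weights by the inverse of the weighted response rate.
        adjustment = sampled_weight / responding_weight
        adjusted[name] = cell["base_weight"] * adjustment
    return adjusted

# Hypothetical cells: 100 girls and 100 boys sampled, each with base weight 20.
cells = {
    "girls": {"base_weight": 20.0, "sampled": 100, "responded": 90},
    "boys": {"base_weight": 20.0, "sampled": 100, "responded": 75},
}
weights = nonresponse_adjusted_weights(cells)
for name, weight in weights.items():
    represented = weight * cells[name]["responded"]
    print(f"{name}: adjusted weight {weight:.2f}, represents {represented:.0f} students")
```

After adjustment, the responding girls and the responding boys each represent 2,000 students again, so the lower response rate among boys no longer shifts the balance of the weighted sample.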