Norm-referenced vs criterion-referenced tests
All the tests in the Sherwood Test Series have been developed as psychometric instruments, and appropriate data has been collected and analysed to establish good levels of reliability and validity. Reports are based on the number of questions answered correctly combined with the time taken to reach the correct answer; together these provide a Power score. In addition, a pure Speed measure, based on the time taken to answer, is also used. Combining Power (thinking power) and Speed (thinking efficiency) provides valuable information about ability not offered by standard paper-and-pencil tests. All learners receive the same set of items (questions), with performance compared to a relevant population or norm comparison group. Put simply, it is more meaningful to compare performance with others of similar ability, e.g. to compare a learner against a "Key Skill Level 2" norm group if the learner is about to embark on a Level 2 course. This makes our tests reliable and fair.
For example, the Sherwood Ability Screen has ten norm comparison groups, based on the test results of over 50,000 learners. Groups range from Basic/Functional Skills through to Key Skill Level 4 and beyond. All norm groups are based on thousands of individual item responses collected from colleges and training organisations nationwide; in the case of the SAS, that’s some 1,950,000 individual responses. A learner’s performance can easily be compared to any of these groups. Performance is shown visually as a point on a sliding scale to make comparison between learners easier. We have also introduced the option of letting our reports engine take the strain by writing a text-based interpretative report for you. Many organisations currently use our tests as a basis for directing support and for claiming the appropriate funding.
Think about when you have received test results, e.g. school exams. How did you feel being told you scored 17 out of 30, or 65%? Usually one of the first questions asked is, “How did everyone else do?”, “Is that a good result in comparison with others in my group?”
A test score with no reference to other people in a group is somewhat meaningless. For example, a result of 25 out of 40 could be very good if the test was very difficult; if the test was very easy, however, it is not. To interpret test results meaningfully, we compare an individual’s results against those obtained by an appropriate reference group. This is a key feature of norm-referenced tests over traditional criterion-referenced tests.
Tests are delivered, scored and interpreted by our own specialised software. The software produces a range of customised reports, including detailed individual reports and reports across courses. The latter reports are particularly useful if a learner is enrolled on more than one course or support programme and needs to appear on various group reports.
What does this all mean for me and what’s the benefit over using traditional tests for initial assessment?
One question which is often asked, and rightly so, is: “How do I know what level the learner is at?”. The best way to answer this is to make use of concrete examples with results taken from genuine students.
If a learner was to take a traditional Level 2 Key Skills test and scored 50% what would this mean? Would you as a professional take this to mean they were actually working at level 2 across the board? Would the decision simply be a matter of judgement by whoever was interpreting the results? With traditional tests this is a common problem.
Norm based assessment
All Sherwood series tests are norm-referenced rather than criterion-referenced. Home-grown assessment tests (as many in this market are) are criterion-referenced, but cut points on these tests tend to be arbitrary, with no evidence to support where the cut points for levels are set, i.e. which percentage scores get an A, a B, a C and so on. Data is often not collected on their use, and where it is, it is not analysed by specialists in test development, so such tests cannot claim to be technically sound.
Norm-based tests allow you to compare performance with standardised norm groups which are continually being updated. All Sherwood series tests are true psychometric instruments, developed by Chartered Occupational Psychologists in conjunction with specialists in the education and training sectors, and as such have established technical properties such as known levels of reliability and validity. Most “tests” on the market do not have this pedigree.
It would be very useful if the same learner could take the Level 1 and Level 3 versions of a traditional Key Skills test as well as Level 2. Then, in theory, you would be able to examine their score at a level above and below where you thought their ability level may lie. You could decide that if they scored 75% on the Level 1 Test, they were more likely to be working at Level 1. However this would be a matter of personal judgement and you’d have to allow some margin of error either side of your decision.
Ideally then you’d want the learner to take exams at all levels to see which one produced their best performance. As you can see this is fraught with problems:
- The learner has to take lots of exams
- You’d have to come up with arbitrary criteria, cut-points or percentage-correct figures for making decisions (criterion-referenced testing)
- How would you know that all the exams measured the same concepts, e.g. do Level 1 exams measure the same abilities as Level 3 exams in the same subject but at different levels of difficulty?
- How do you know how difficult an exam or individual questions actually are?
- What about evidence of reliability and validity?
Norm-based assessments allow you to reinterpret test scores after a test has been taken. You can compare any learner’s performance against a larger standard national norm group who have taken the same test. These groups are stratified according to ability, allowing you to compare performance against high- or low-ability groups. Conceptually, learners at different levels of ability have taken the same test, and you are placing each new learner somewhere on the curve of a comparison group’s overall performance. A sample curve is shown below to illustrate the kind of normal distributions we use for each comparison group. Each of the commonly used scales which map onto such curves is also shown below.
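The idea of placing a learner on a comparison group’s curve can be sketched in a few lines. This is a minimal illustration only: the norm-group mean and standard deviation below are invented figures, not Sherwood’s real norm tables, and it assumes the group’s scores are approximately normally distributed.

```python
import math

def percentile_from_norm(score, group_mean, group_sd):
    """Percentile of a score against a norm group, assuming the group's
    scores are approximately normally distributed."""
    z = (score - group_mean) / group_sd                  # standard (z) score
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF as a percentage

# Hypothetical norm group: mean 50, standard deviation 10
print(round(percentile_from_norm(60, 50, 10)))  # one SD above the mean -> 84
```

A score one standard deviation above the group mean lands at roughly the 84th percentile, which is exactly the "point on the curve" idea described above.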
Remember, we assess whether the learner has answered correctly and also take into account the time taken to answer each individual question. The addition of the speed measure means you can be more confident in making the right decision about a learner’s ability level. Traditional tests do not give you this flexibility.
What’s the practical effect on reports I generate?
The best way to illustrate how all Sherwood series norm-referenced tests work is through practical examples of real reports with genuine students.
For example, a student, Joanne Andrew, is registered on a Level 2 course. How do I know whether Joanne is capable of coping with the literacy and numeracy demands of a Level 2 course? It would be fairly meaningless to assess her ability using the final exam questions before she has studied the course, so what can I do? I need to know whether she may need support.
In order to generate some meaningful information about Joanne’s ability I could run her through the Sherwood Ability Screen for example, to assess general Literacy and Numeracy skills. I could then compare her performance with standard national norm groups at various levels. By doing this, I could determine whether her Literacy and Numeracy skills are typical of a Level 2 learner. If they are it would suggest that she is armed with the Literacy and Numeracy skills necessary to complete any Level 2 course. Obviously motivation and interest in the course is important too, as well as providing appropriate learning opportunities. Our ICT test, the Sherwood Technology Aptitude Test, measures performance in the same way but assesses the fundamental building blocks for success on ICT courses or courses with an ICT component.
Let’s look in more detail at the performance of Joanne Andrew. I could begin by comparing her score against a Basic Skills norm group. In essence I’m comparing her score against a much larger group of Basic Skills learners who have previously taken the test. She will be placed somewhere on the curve for the group and a percentile score produced.
Percentiles are very easy to understand. However, they are not percent-correct figures, so be careful not to confuse the two. Instead they tell you what percentage of the comparison group a given learner has performed better than. Look at the Power score for Writing: Joanne has performed better than 73% of Basic Skills students who have taken the same test. This is a relative comparison. If we were using a traditional test, what would 4/6 or 67% correct actually tell you in relative terms?
Power is based on the number correct and the time taken to answer each question. Here Joanne answered 4/6 correctly in the Writing section. However the time taken to get those four questions correct is taken into account to give the 73rd percentile. Below that is the Speed score. This tells you how fast the learner answered all the questions. There is no weighting for whether they answered correctly or not. So, on a speed curve for Reading Joanne is faster than 81% of Basic Skills students.
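The percentile definition used above follows directly from this idea: the percentage of the comparison group the learner has outperformed. A minimal sketch, using a small invented set of norm-group scores rather than Sherwood’s real data (which combines correctness and timing in its Power calculation):

```python
def percentile_rank(learner_score, norm_group_scores):
    """Percentage of the norm group the learner performed better than."""
    better_than = sum(1 for s in norm_group_scores if learner_score > s)
    return 100 * better_than / len(norm_group_scores)

# Hypothetical scores from a small Basic Skills norm group
group = [32, 41, 45, 48, 50, 53, 55, 58, 61, 70]
print(percentile_rank(56, group))  # beats 7 of the 10 scores -> 70.0
```

In practice the norm groups contain thousands of responses, so the percentile is a much finer-grained measure than this ten-person example suggests.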
Looking at the graph you can see that the mid circles are slightly darker. This marks the average performance of the Basic Skills national norm group, and corresponds to the mean on a normal distribution, where an average Basic Skills learner would fall. If a learner scores somewhere around the midpoint, you may conclude they have a level of skill typical of a Basic Skills learner and that they are working at that level. In this case the student is clearly more able, so we may choose to compare against a more able comparison group.
Below we can see the effect of comparing against increasingly able comparison groups. Remember, the learner’s raw scores remain the same [4/6] and the time taken to answer the questions remains constant at one minute and fifty seconds [1:50]. We are simply comparing their performance against more able groups. As the ability of the comparison group increases, ability relative to that group typically decreases. This is reflected graphically and in the percentile scores.
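This effect can also be shown numerically. Holding the learner’s score fixed and swapping in comparison groups with progressively higher means, the percentile falls as the group becomes more able. The group names, means and standard deviations below are hypothetical, chosen only to illustrate the pattern:

```python
import math

def percentile(score, mean, sd):
    """Percentile of a score against a normally distributed norm group."""
    return 100 * 0.5 * (1 + math.erf((score - mean) / (sd * math.sqrt(2))))

raw = 62  # the learner's score does not change
# Hypothetical (mean, SD) pairs for increasingly able comparison groups
groups = [("Basic Skills", 48, 10), ("Level 1", 55, 10),
          ("Level 2", 62, 10), ("Level 3", 69, 10)]
for name, mean, sd in groups:
    print(f"{name}: {percentile(raw, mean, sd):.0f}th percentile")
```

The same raw performance sits well above average for the least able group, at exactly the 50th percentile for the group whose mean it matches, and below average for the most able group.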
From these reports we might conclude that Joanne’s ability level is somewhere just above the average for Level 1. In terms of Writing ability she should be able to cope easily with a Level 1 course.
A range of other individual and group reports is available. If you’re not confident about making decisions about a learner’s level this way, a “Highest Ability Level” report is available: the reports engine automatically does the interpretation for you and gives you an exact level!
All Sherwood series tests work in the same way, whether they assess Literacy and Numeracy or the skills underlying ICT ability. All are predictive of course success as they assess the underlying skills required to succeed.