
Daniel Stucke

Assessing without levels - Milestones

7 min read


At our school we took the decision last Summer to embrace the opportunities available to move away from National Curriculum levels. Our approach isn’t revolutionary, but I think it’s worth sharing.


I personally felt that there were numerous issues with the old NC levels. They were not as well understood by teachers / pupils / parents as everybody thought. A false sense of accuracy had developed as levels morphed into sub-levels; did anyone really know the difference between a 5c and a 5b? At a whole-school level far too many schools, ourselves included, were chasing sub-levels around in circles looking for ‘rapid and sustained progress’. Levels also lost so much detail: everyone would hang their hat on that one level, but a 5b could hide a myriad of important information. A student might have real strength in Shape and Data in Maths but be struggling with their Number and Algebra skills. We were also in the process of re-writing schemes for learning, so it made sense to tackle the two jobs together.

Desired outcomes

Initial work was done between myself and our school improvement partner. We looked at the core outcomes we wanted from our assessment systems. Primarily we wanted to refocus assessment in the classroom on the learning: all assessment should help teachers and students understand which key concepts they had grasped and which they had not. Secondly we wanted a system that could report as efficiently and simply as possible to governors, leadership, teachers, students and parents which students were making expected progress in which subjects. On reflection, that’s what all school-wide level analysis looked at; and when it involved chasing 2-3 sub-levels per year it was a nightmare.

Expected Progress

The National Curriculum and the GCSE programmes of study set in stone expected learning at the end of KS2, KS3 and KS4. School performance measures set ‘expected’ progress from KS2-4. Whilst measures are changing at both KS2 and KS4, we felt confident that learners would still join us banded into High/Middle/Low 5+/4/3- attainment bands, and that expected progress will still be around the equivalent of three levels of progress.

With this in mind our model takes each subject, splits it into three bands, and maps out the expected learning in each year to ensure that middle-band learners join us and progress to at least a grade C+ equivalence, high-band attainers to B+, and so on.

We asked each academic subject to map out the learning for each of their three bands: pasting the GCSE PoS outcomes into Y11 and any NC outcomes into Y8 (we have moved to a two-year KS3), then shuffling these around to form their high-level scheme of work.


Anyone who has read the NC or GCSE PoS’s will know that the language used therein is not great for use with pupils and parents. We asked our teams to cut these down to the key ‘milestones’ that were crucial for progress in the subject, and to re-write them in language that pupils and parents would understand, without dumbing them down too much; I do think we should avoid hiding away from subject-specific terminology. Staff were encouraged to use Bloom’s and SOLO taxonomy language and structures as a guide in this process. This gave us a roadmap for each subject detailing exactly what we expected a student to learn each year, and a framework of expected knowledge and skills to assess learning against.

As an example the Computing roadmap is below:

Computing Roadmap

Reporting on progress

Each assessment window (we have four a year) we ask staff to report on each student’s progress. They simply report on whether students are making Significantly Above / Above / On / Below / Significantly Below expected progress. This judgment is made by taking a range of formative and summative assessment information and judging whether a student is on track to learn what is expected of them according to the milestone pathways.

Needless to say this makes reporting at class/subject/year-group level incredibly easy: a simple tally and percentage of each grade allows us to monitor progress at this level.
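As a minimal sketch of that tally (the grade labels match our SA/A/O/B/SB judgements, but the data shape is illustrative, not our actual MIS export):

```python
from collections import Counter

def progress_summary(grades):
    """Tally SA/A/O/B/SB progress judgements and return a percentage for each."""
    counts = Counter(grades)
    total = len(grades)
    return {g: round(100 * counts[g] / total, 1) for g in ("SA", "A", "O", "B", "SB")}

# An illustrative class of ten students
class_grades = ["O", "O", "A", "O", "B", "O", "SA", "O", "O", "B"]
print(progress_summary(class_grades))
# → {'SA': 10.0, 'A': 10.0, 'O': 60.0, 'B': 20.0, 'SB': 0.0}
```

The same function applied to a subject's or year group's combined list gives the higher-level view with no extra work.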

Staff also record the next two key milestones that a student needs to master in order to make maximum progress. This might be a key skill that they should have mastered by now but are struggling with, or it might be an important topic that will be covered in coming weeks. Parents get a quarterly report detailing the SA/A/O/B/SB progress measures along with a pair of key milestones for each subject.

Reflections two terms in

I’m pleased with how things have gone so far. Staff worked incredibly hard over the Summer to write the frameworks needed for this to work. I do believe we have a system that has fulfilled our original aims: more assessment is focussed on specific areas of progress, the entire data collection and analysis system is far simpler, freeing up our middle and senior leaders, and reporting to governors et al. is simplified.

I worry we may have set our expectations too low. ‘Expected progress’ was set at the old equivalent of three levels of progress from KS2-4. We took the measure of ‘expected’ as per performance tables and mapped it to individuals, but sometimes individual expectations need to differ from those we ‘expect’ of classes or year groups. The more work I’ve done in successful schools during my NPQH, the more I’ve realised that setting an ‘expectation’ higher than that can lead to higher expectations from staff, parents and pupils of what progress is possible, which in turn leads to better progress. This was always planned as a flexible model that we could tweak as expectations and measures at KS2 and KS4 change and as the new PoS’s come into force. We will review formally at the end of the year, and if we have to slide milestones around to raise expectations then so be it.

We have been through a brief spell of ‘re-calibration’. Analysing the data showed that more students than should have been were making ‘expected progress’ and fewer than should have been were making ‘above expected progress’. On discussion with middle leaders and teachers it was clear that staff had set the bar a little too low when making judgements of ‘expected progress’, and conversely too high for ‘above expected’. Staff were also forgetting where students had started their journey: those who joined us as level 4 learners and had worked hard for several years were being judged as making ‘expected progress’ because that’s what staff had come to expect from them, when in fact, in terms of KS2-to-now measures, they were working well ‘above expected’. Much as it pained me a little to talk levels again, the diagram below helped staff re-calibrate our progress statements in their minds.

Milestones mapped to levels and grades

In part two I’ll explain the implementation strategy we used to lead this change.

Daniel Stucke

Diagnostic questioning #Computing style

2 min read

Back in September I was keen to get the Computing teaching community to work together to write a bank of high quality multiple choice questions. In fact, rather poorly, it was the last thing I wrote on this blog! Head back to see why I think that good quality MCQs can be an invaluable part of assessment for learning. Some educators got in touch and sent me some questions, but a lack of common format and no central repository curtailed my enthusiasm for the idea.

Step in Diagnostic Questions. This fantastic site is the brainchild of @mrbartonmaths.

There are lots of sites out there that support MCQs; here are a few reasons why I think Diagnostic Questions stands out:

  • It’s free.
  • It’s really easy to add questions, including as images. So if people already have quizzes as HTML files, Moodle quizzes, PowerPoint or whatever, they can be transferred in pretty easily with a simple screenshot upload.
  • Students have to explain their reasoning for every answer, and then they get to compare their explanations to the best ones from students around the world. Fantastic for them, and also fantastic for teachers to delve into understanding, or lack thereof.

It would be fantastic if some of the Computing teaching community could chip in with questions of their own. Whilst I’ve put most up so far credit is due to the brilliant @mrocallaghanedu who wrote and shared them originally on his blog.

I plan to use small quizzes each lesson in the run-up to exams to ensure that students and I know what they know, and more importantly where the gaps are. They can revise to the gaps, and I can teach to them. Of course this will be supplemented with lengthier question types akin to those found in the exam. I also hope to get some of my more able students to write MCQs of their own. It’s no easy task and really tests your subject knowledge; a perfect task for those pushing A/A* grades in the GCSE this Summer.

The Diagnostic Questions site:

Daniel Stucke

Crowd-Sourcing Multiple Choice GCSE Computing Questions

6 min read

A plea to teachers of Computing GCSE, please join an effort to write a bank of high quality multiple choice questions to support the teaching of this course.

Why multiple choice questions?

There is an increasing body of research and writing showing that skilfully written multiple choice questions are an effective means of developing retention. They are easy to administer and easy to mark, particularly in an IT-rich environment. My experience (and, looking at OCR data, most other schools’ experience) is that students perform poorly on the written examination part of the GCSE. Their retention and recall of the knowledge needed to succeed in the exam is poor.

Well written multiple choice questions could be used:
- as lesson starters / plenaries etc to revisit learning covered earlier in the course, helping to develop retention
- as hinge questions in the middle of a lesson as part of the AfL process
- as small assessments of units of work
- without the wrong answers, as flashcards for learners’ revision

Further reading

My thoughts on this have been shaped by some excellent writing and research online…
- Joe Kirby - How to design multiple-choice questions
- Daisy Christodoulou - Research on multiple choice questions
- Robert Bjork - Multiple-Choice Tests Exonerated, at Least of Some Charges: Fostering Test-Induced Learning and Avoiding Test-Induced Forgetting
- Dylan Wiliam - When is assessment learning orientated
- Dylan Wiliam & Caroline Wylie - Diagnostic Questions: Is there value in just one?
- David Didau - How effective learning hinges on good questioning


Great! I have set up a Google Form for you to submit your questions and answers. I’ve split the Computing GCSE up into a few high-level topic areas to categorise each question. Once you’ve shared some questions I will happily give you access to the spreadsheet behind the form and what will hopefully grow into a large bank of high-quality questions. If you have lots that you’ve already prepared and want to send me an email with them in a different format, that would be wonderful. I’ll do my best to add them in without you having to copy each one into the form.

7 Principles for Designing Multiple Choice Options

Quality questions that help develop retention need carefully crafted options for the answers. Please follow these guidelines when constructing your questions and answers.

With his permission, I’ve shamelessly stolen this list from Joe Kirby.

  1. The proximity of options increases the rigour of the question. For instance, the question is, what year was the battle of Hastings? Options 1065, 1066, 1067, 1068 or 1069 are more rigorous than options 1066, 1166, 1266, 1366 or 1466. Of course, the question itself also determines the rigour: ‘80 is what percentage of 200?’ is much easier than ‘79 is what percentage of 316?’
  2. The number of incorrect options increases rigour. Three options gives pupils a 33% chance of guessing the correct answer; five options reduces the chances of guessing to 20%; always create five rather than three or four options for multiple choice questions. A ‘don’t know’ option prevents pupils from blindly guessing, allowing them to flag up questions they’re unsure about rather than getting lucky with a correct guess. With this in mind the form will accept questions of the form:
    • 1 correct answer from a total of 4
    • 1 correct answer from a total of 5
    • 2 correct answers from a total of 5
    • 2 correct answers from a total of 6
      The further down that list the less chance someone can guess the answer correctly.
  3. Incorrect options should be plausible but unambiguously wrong. If options are too implausible, this reduces rigour as pupils can too quickly dismiss them. For instance, in the question ‘What do Charles Dickens and Oliver Twist have in common?’, an implausible option would be that they were both bank robbers. However, if answers are too ambiguously similar, this creates problems. For instance, in the question ‘What happens in the plot of Oliver Twist?’, these options are too ambiguous: a) A young boy runs away to London b) An orphan falls in with a street gang of street urchins c) A poor orphan is adopted by a wealthy gentleman d) A criminal murders a young woman and is pursued by a mob e) A gang of pickpockets abduct a young boy
  4. Incorrect options should be frequent misconceptions where possible. For example, if you know pupils often confuse how autobiographical ‘Oliver Twist’ is, create options as common confusions. These distractors flag up what pupils are thinking if they select an incorrect option: a) Both were born in a workhouse b) Both were separated from their parents and family c) Both were put in prison for debt d) Both had families who were put in prison for debt e) Both were orphans
  5. Multiple correct options make a question more rigorous. Not stating how many correct options there are makes pupils think harder. For example: Which characteristics of “Elegy Written in a Country Churchyard” can be seen as Romantic? a) It celebrates the supernatural. b) It is written in iambic pentameter. c) It emphasises emotion over reason. d) It deals with the lives of common people. e) It aspires to nature and the sublime.
  6. The occasional negative question encourages kids to read the questions more carefully. Once they get a question like ‘Which of these is NOT a cause of World War 1?’ wrong, and realise why, they’ll work out they need to read questions again to double-check what it is they’re asking.
  7. Stretch questions can be created with comparisons or connections between topics. What was common to both the USA and Germany during the Great Depression?
    a)     Jewish immigration increased
    b)     Membership of Ku Klux Klan increased
    c)     Public works projects were implemented
    d)     Government social programs were reduced
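The guessing odds behind principle 2 and the four accepted question formats are easy to verify: picking k answers uniformly at random from n options succeeds with probability 1/C(n, k). A short script to check:

```python
from math import comb

def guess_chance(total_options, correct_answers):
    """Probability of guessing every correct answer by choosing uniformly at random."""
    return 1 / comb(total_options, correct_answers)

for n, k in [(4, 1), (5, 1), (5, 2), (6, 2)]:
    print(f"{k} correct from {n}: {guess_chance(n, k):.1%}")
# 1 correct from 4: 25.0%
# 1 correct from 5: 20.0%
# 2 correct from 5: 10.0%
# 2 correct from 6: 6.7%
```

As the list in principle 2 says, each step down the list makes a lucky guess less likely, with the 2-from-6 format under 7%.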

Still here? Then let us begin!

The form is available here. Don’t rush; fashion some great questions as you go through the year. Together we can make a powerful learning resource. And hopefully many hands will make light work of the task!

If you have a big list of questions to submit to the cause then please email them to me at dstucke [at] and I will endeavour to add them to the master list. If you would like access to the bank of questions then leave me a comment here, email me on the address above or send me a tweet. I’ll collate them into a shareable format. Probably just a csv or txt file to start with that can then be used to import into your response system of choice be that Moodle quizzes, Socrative or ExitTicket questions, whatever you choose.

Daniel Stucke

Dan's Digest by danielstucke

1 min read

My blogging is not what it once was, although some new posts are in gestation. But I thought I’d try something different, and slightly more old-fashioned, for sharing links and thoughts that I come across. In comes my brand new, very old-fashioned email newsletter. Please sign up, Newsletter 1 out soon, and you can always unsubscribe if there’s nothing of interest in it :)


Daniel Stucke

Confidence in predicting attainment

3 min read

How accurately can we set targets and predict pupils’ exam attainment?

At this time of year the pressure is on teachers and leaders in school to know exactly how their young people will perform in the GCSE exams that they are currently sitting.

There is an expectation from leaders, governors and of course Ofsted to accurately track, monitor and predict pupil progress and hence exam performance.

Our school, like every other in the country, sets targets for individual pupils at the start of the year and then asks teachers to measure each pupil’s progress against this target grade, and to predict their final exam grade. These are collated, and a range of performance measures for classes, subjects and the whole school are calculated.

But how accurate are these and how confident can we be in these predictions and targets? Our school has 155 young people in year 11. And some subjects have as few as 15 learners. One or two students having a bad day in the exam, or realising too late that they picked a subject that they really don’t enjoy, can have a large impact on the results. How much should we take this into account when setting targets and holding teachers and middle leaders to account?

This train of thought has led me to look at the use of confidence intervals. I’ve used the binomial proportion confidence interval to calculate the 95% confidence interval on the proportion of students predicted to achieve a C+ in each subject. This takes into account the size of the class.

I’ve ended up with results such as: English C+ = 79% +/- 6%. So we’re 95% sure they will end up with results between 73% and 85%. Computing, with a smaller number of students, comes in at 70% +/- 16%.
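For the curious, here is a sketch of that calculation using the normal-approximation (Wald) form of the binomial proportion interval. The English cohort size of 155 is from the post; the Computing class size of 30 is my guess for illustration, since the post doesn’t state it:

```python
from math import sqrt

def wald_interval(p, n, z=1.96):
    """95% binomial proportion CI, normal approximation: p +/- z*sqrt(p(1-p)/n)."""
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# English: 79% predicted C+ across a 155-strong year group
print(f"English: 79% +/- {1.96 * sqrt(0.79 * 0.21 / 155):.0%}")   # ~ +/- 6%

# Computing: 70% predicted C+ in a hypothetical class of 30
print(f"Computing: 70% +/- {1.96 * sqrt(0.70 * 0.30 / 30):.0%}")  # ~ +/- 16%
```

One caveat worth noting: the Wald approximation is known to behave poorly for small n or proportions near 0 or 1, which is exactly the small-class regime in question; the Wilson score interval is a common alternative in that setting.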

Is this an appropriate statistical measure to use?

The confidence intervals are quite large due to the small number of students involved. I’m not sure this is appropriate, though. This measure is normally used when sampling a small part of a larger population. Is that what we are doing here, in that our Computing class is a small sample of the national group sitting that examination? Or are we actually sampling the whole population, where that population is simply all the students studying the subject at our school?

If this is not the right measure of confidence to use in this case, then what is?

Lots of questions that I’m hoping some of the data / statistics community can help answer. The stakes are high in schools now and the targets that departments and schools are set have high levels of accountability attached to them. It’s only fair that targets and predictions come with a suitable statistical window attached. How should that be calculated?