Adventures with Machine Learning – Part 1

Last year, I wrote this post, which shared how we had developed analytics tools and data dashboards across Northampton Primary Academy Trust (NPAT). This was part of ongoing work in our trust to get better at using the data we already have to describe, predict and ultimately intervene in the way we operate.


Since this post, we’ve had a lot of interest in how we’ve been developing analytics tools in the trust, so this is the first in a series of three blogs in which I’ll interview the brains behind the analytics, self-confessed data geek, Matt Woodruff.

Matt Woodruff: The data man…

Matt is the founder and ‘Chief Data Scientist’ at Coscole Ltd. (now a part of Groupcall Ltd.) and I’ve been working with him for the last 3 years on this project within our trust.

One of the things I’ve learned from Matt is the power of predictive analytics. I believe that too much time is spent looking at pupil data as a rear-view mirror in schools. This is often driven by a need to ‘know your data’ for accountability purposes rather than to help us think about what data can tell us about the future. If we think about the publication lag of documents like ASP (previously RAISEonline), it’s crazy to suggest leaders should wait until November to find out what happened in the past to a group of children who have already left the school. I think there’s more we could do to analyse the information we have about future cohorts of Year 6 to help adapt and tailor their provision whilst we still have time.

An example of how Matt’s brain works is our exchange over the title for this blog. I went for, ‘Matt and Tom’s Excellent (Analytics) Adventure’ whilst Matt suggested the catchy ‘A ‘Small Data’ Predictive Experiment using Machine Learning – Can MAT pupil level data generate reliable predictions for outcomes or identify pupils ‘at risk’?’.

Matt & Tom’s Analytics Adventure…

In this first interview, we go back about 3 years to a point in time where we were fumbling around in the dark for answers to a life without levels.

Me: Matt – when we first sat down with you, we had an ideas session where we outlined our vision of trying to bring together many pieces of pupil data in one place. What were your initial thoughts when you looked at the sea of post-it notes which represented the many different pieces of data we wanted to bring together?

Matt: This is taking me back some time! It was at this stage that Coscole was finishing its direct commission with another MAT, where we had spent three years building an approach to personalised learning and to aggregating and visualising data.  During this time I’d worked with multiple stakeholders, from head office staff and directors of education, to data managers and school leaders, as well as on some pilot projects with teachers, students and parents.  What was refreshing about beginning to work with NPAT is that you had, even back then, a good understanding of where there was challenge, and therefore where the opportunity was to improve: you were not setting out to boil the ocean, to have analytics be all things to all people.

I think, like the mantra that ‘Exams are necessary, but not sufficient’, you imparted a view to me that said ‘Teacher Assessment is necessary but not sufficient’ in relation to understanding the whole profile of the pupil.  You had already engaged with cognitive ability testing and understanding pupil attitudes to self and school, and were bringing in external assessment as standardised scores to provide a more rounded profile.

Of course, it is natural at that stage to be inclined to pull your hair out – a growing MAT, albeit with a single MIS provider, with a growing need to make more effective use of data and to put it, in an easily digestible manner, at the fingertips of those who need it most – when your data comes from different providers, in different forms, at different times.

The ‘sea of post-it notes’ therefore represented the fact that as a Trust you had already embarked on the journey of understanding that there was a challenge in making effective use of data, and saw the opportunity in the potential impact on improving outcomes if you could do it right.  More than this – you’d moved down the road in determining exactly what you thought it was important to capture, but also reassessed some things you may have been doing by rote up until that point and decided they were not as important as you once thought they might be.

Me: There was a lot of planning and work that went on behind the scenes before we ever got close to inputting data. Aligning the MIS databases across the trust was a big job and we spent a lot of time cleaning up our data as a trust, which was an important but time-consuming stage. Is this a normal part of the process with all the schools you work with?

Matt: Getting to effective analytics is a journey.  Everyone knows the adage ‘Garbage in, garbage out’, and it becomes particularly true when you compound the issue by aggregating ‘garbage’ across schools.  Once again, though, this is not about boiling the ocean.  Do you have to align everything across an MIS?  Absolutely not.  Should you seek alignment over time, in the things that really matter? Absolutely yes.  You can do this in a way that does not undermine the context of different schools – that’s vital.  It is also not about a doctrine of ‘top down’.  In my mind it’s about identifying good practice around data, and making sure all schools understand the importance of that.  Good practices with data lead to much less wasted time further down the chain – and not only wasted time, but the impact of not really being sure about your data provenance.  The typical reaction to seeing numbers, metrics and percentages is that we believe them.  In too few cases are the underpinning assumptions challenged – “how was this data derived?”, “what moderation is in place across schools to ensure that an apple in one is an apple in another?”.

Yes, technically, the MIS databases were aligned insofar as NPAT standardised on naming conventions for Aspects.  Yes, we put in place data extraction technology, and we warehouse that data and layer education modelling on top (the calculations that do your %GLD, Combined Y1/2 Phonics etc.).  That’s business as usual really.  The fun starts again with the people and process elements. As soon as you visualise data in a more effective way – and don’t forget we’re not inventing new data here, we are just taking data you already have available – you instantly see gaps.  You instantly notice things that aren’t right.  And when I say ‘you’ I mean from the CEO down.  That can be a scary place for some because we lift up all the rocks.  I think that’s great, because this is absolutely not about blame for a legacy of data whose quality can be improved; it’s about finally having access to the tools to quickly spot variations and to scaffold the people and the processes to ensure data is reliable.

There is no one I have worked with who has not done something different in their school after joining and visualising data, and that’s a great thing.
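
For anyone curious what ‘layering education modelling on top’ of warehoused data can look like, here’s a minimal sketch in Python/pandas. It isn’t the Coscole implementation – the school names, scores and the 100+ threshold are made up for illustration – but it shows the idea of deriving a headline measure from pupil-level records that have already been cleaned and aligned.

```python
import pandas as pd

# Hypothetical warehouse extract: one row per pupil, already cleaned and
# aligned across schools (same naming conventions, same score basis).
pupils = pd.DataFrame({
    "school":  ["School A", "School A", "School B", "School B", "School C"],
    "reading": [104, 97, 112, 99, 101],   # standardised scores
    "maths":   [101, 96, 108, 103, 94],
})

# 'Education modelling' layered on top: e.g. the percentage of pupils
# reaching a 100+ standardised score in both Reading and Maths.
pupils["combined"] = (pupils["reading"] >= 100) & (pupils["maths"] >= 100)

headline = (
    pupils.groupby("school")["combined"]
          .mean()          # proportion of pupils meeting the threshold
          .mul(100)
          .round(1)
)
print(headline)
```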

Me: We had several U-turns and changes throughout the process as we switched our position on the types of assessment data and teacher assessment descriptors. How were you able to manage these changing demands from a technical perspective?

Matt: You’d started with fairly granular objectives in teacher assessment, if I remember rightly: levels and sub-levels, or steps and stages, or milestones and smaller objectives. The change did not present too much of a technical challenge around how we got the data, but I think we found it a particularly challenging time to understand the way in which you wanted to visualise it – how leaders and staff would need to see it in the most digestible form.  We use both a flexible visualisation approach with the Trust through Microsoft Power BI, and our own Apps built for Office 365.  Power BI is naturally easier and quicker for us to adapt than the code in our Apps, but by the same token our Apps can provide a more effective interface at times for staff.

The biggest issue, though, is both a technical and a data one: we lose consistency, and history.  For me, a major incentive for a mature approach to data and analytics is having access to this history so we can build trend analysis and forecasting.  Every time we decide to do something different, that becomes more difficult.  In this case with NPAT those decisions and changes were actually dealt with fairly early on and we’ve collectively been consistent since.

Me: After a year or so of being able to analyse pupil information, we then started the conversation around how we could use technology to start to predict future attainment. You introduced me to the concept of Machine Learning. Can you explain (to a non-data specialist) how Machine Learning works?

Matt: There is a lot of hype around Artificial Intelligence (AI) and Machine Learning (ML) right now.  Three years ago everything was Big Data, in much the same way.  In many respects there is absolutely nothing new about ML: it’s been an active research area since the 1950s, and arguably in different forms well before that.  Today, ML is a subset of the domain of AI and deals with the ability of computers to learn from data.  It is technology that is now prevalent in just about every other aspect of our lives – from blocking spam in emails, to recommending products on Amazon and films on Netflix, and of course most recently in developing self-driving cars.

ML is itself subdivided into:

  • ways in which we train the computer on examples where we already know the answer we are looking for (supervised learning),
  • ways in which we ask the computer to find structure in unstructured data and classify it for itself (unsupervised learning), or
  • ways in which we set up a computer to learn through its own exploration – most famously used by Google’s AlphaGo to beat the world Go champion (reinforcement learning).

These developments have brought ML more recently into the mainstream.  Tools to build ML models are widely available, whether open source, from Microsoft, or in combination.  In fact, Microsoft have recently announced that they are integrating ML/AI approaches with Power BI, which is really exciting.  EdTech companies like CenturyTech integrate ML into their adaptive learning routines.

In these ways, the technology already exists to make what we do in education faster, better, and deliver more impact.  We spend 95% of our time looking backwards – what has just happened – whether that is last year or the last ‘data drop’.  This is ‘descriptive analytics’  – we are simply describing the things that have happened.  Other industries have already moved into ‘predictive analytics’ – using data to predict what is likely to happen in the future.  Where we can get to beyond that is with ‘prescriptive analytics’ – if we know what is likely to happen in the future what should we be doing now either to mitigate risk or extend opportunity?  The potential for a learning system that provides effective and efficient decision support for the human in an education context is vast.

This isn’t about ML being able to take people’s jobs, or that in five years we’ll have robot teachers.  This is about us leveraging what computers do far better than us, so that we can focus our human intelligence on the things that machines will never be able to do (there is some ongoing debate about when the singularity might occur, but I think that’s beyond this blog…).

Me: There came a point where I asked you if you thought you would be able to predict our SATs results last year and you went off and did something clever on your laptop and came back with some results. What did you do?

Matt: Yes, we got to that point where we knew that we had good, consistent data, and conceivably enough to do something meaningful from a predictive point of view.  We tend to like lots of data for reliability, and when you boil it down a Year 6 cohort is not a lot of kids, even over 7 schools.  However, we were at the point, 18 months in, where we had one year of historical data on the same basis as the current Y6 cohort.  The same ‘schema’, if you like, of the things that we thought might matter in predicting outcomes.

Truth be told, this was new to me.  We’d been active in development at Coscole around the AI stack being released by Microsoft, and at that time we were also engaged in another predictive proof of concept with Microsoft and one of the largest MATs, around Progress 8.  Looping back to my mantra of not boiling the ocean, I thought why not try something – it’ll either work or not work.  If it works, that’d be pretty cool.  If it doesn’t, we’d be interested in why not.

The easy bit was actually the ‘data wrangling’.  This is normally the part of any data scientist’s life that consumes 60–80% of their time – finding data, cleaning data, and putting it in a form that can actually be consumed by something and then do something useful…  The joy for me is that we’d already done most of that: we have all your data warehoused, clean, ready to go.
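
To give a flavour of what that ‘data wrangling’ stage involves, here’s a hedged sketch in pandas. The file and column names are invented, but the steps – normalising keys, fixing types, removing duplicates and flagging gaps before anything clever happens – are typical.

```python
import pandas as pd

# Hypothetical exports: pupil context from the MIS, scores from standardised tests.
mis = pd.read_csv("mis_pupils.csv")       # UPN, school, fsm6, sen, ...
tests = pd.read_csv("test_scores.csv")    # UPN, subject, standardised_score

# Normalise the join key and coerce scores to numbers (blanks become NaN).
for df in (mis, tests):
    df["UPN"] = df["UPN"].str.strip().str.upper()
tests["standardised_score"] = pd.to_numeric(tests["standardised_score"], errors="coerce")

# One row per pupil, one column per subject, duplicate entries dropped.
scores = (tests.drop_duplicates(["UPN", "subject"])
               .pivot(index="UPN", columns="subject", values="standardised_score")
               .reset_index())

# Join to pupil context and flag incomplete records rather than silently losing them.
dataset = mis.merge(scores, on="UPN", how="left")
incomplete = dataset[dataset[["Reading", "Maths"]].isna().any(axis=1)]
print(f"{len(incomplete)} pupils with missing scores need checking")
```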

I set out, then, to run a very simple experiment.  This type of work is not new – lots of providers do it with their own data, and I’m sure in more advanced ways: FFT, Rising Stars, CenturyTech etc. – but for me it was a validation of others’ results and I was personally interested in the correlations with the MAT data.  The question I was interested in was: “Can we use school-owned, in-year data, from different data sources, for reliable prediction?”  If so, the follow-up would be more important: “How should this impact current data collection practices to save time for staff, and to highlight interventions early?”  I also had an academic interest in a methods comparison study in the context of my PhD.

The experiment was a straightforward linear regression model trained on your prior-year Year 6 MAT data, including a selection of pupil characteristics and principally their standardised scores in Reading and Maths.  I completed this both in Python, as an open source approach, and in Microsoft AzureML.  I then ran the model against your (then) current Year 6 cohort to predict their Reading and Maths test outcomes.
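
For the technically minded, here’s a minimal sketch of that kind of experiment using scikit-learn in Python. It isn’t Matt’s actual code or feature set – the file names, columns and pupil characteristics below are placeholders – but it shows the shape of it: fit a linear regression on last year’s cohort, then apply it to the current one.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical training data: last year's Year 6 cohort with in-year
# standardised scores, some pupil characteristics, and the KS2 outcome.
prior = pd.read_csv("y6_last_year.csv")
features = ["pira_score", "puma_score", "fsm6", "sen_support"]

model = LinearRegression().fit(prior[features], prior["ks2_reading_scaled"])

# How far off is the model on the data it has seen? (With only one prior
# cohort there is little spare data for a proper held-out test set.)
fit_error = mean_absolute_error(prior["ks2_reading_scaled"], model.predict(prior[features]))
print(f"Mean absolute error on the prior cohort: {fit_error:.1f} scaled-score points")

# Apply the trained model to the current Year 6 cohort.
current = pd.read_csv("y6_current.csv")
current["predicted_reading"] = model.predict(current[features])
current[["UPN", "predicted_reading"]].to_csv("reading_predictions.csv", index=False)
```

The same shape of model would then be fitted separately for Maths.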

The results were interesting – in one or other of the ways I’d described earlier. Or both – pretty cool.

In Part 2, I’ll be asking Matt to explain how accurate his predictions were when we opened the envelopes* on results day in July 2018 and what the implications of this are for adopting predictive analytics for outcomes or identifying those at risk in the future.

*We didn’t really open any envelopes on results day. It was a downloadable CSV, at the much more civil time of 8am this year rather than waiting till midnight.

Assess Like a Consultant Doctor: Chapter 8 preview from Wholesome Leadership

As part of this series of short posts to introduce my book, Wholesome Leadership, today I’m sharing a preview of Chapter 8:  ‘Assessing Like a Consultant Doctor’.

Wholesome Leadership is now on sale for pre-order and will be published around the 22nd of May 2018. You can read some of the early reviews or find out how to order here –  www.wholesomeleadershipbook.com 

This chapter sits in the second section of the book which is focused on the ‘head’ of leadership, part of the H4 Leadership Model which captures the heart, head, hands and health of school leaders. It follows on from previous posts I’ve written, including Chapter 6 – ‘Strategic School Improvement & Research’ and Chapter 7 – ‘Healthy Accountability’.

H4 Leadership Model: ‘The Heart, Head, Hands & Health of School Leadership…’

Assessment has become a primary culprit in the ongoing challenges of teacher wellbeing, recruitment and retention. Workload in this area has spiralled out of control – particularly in areas such as marking, data inputting and evidence gathering. Alongside this, a lack of clarity around national assessment since the removal of levels and increased accountability pressure on schools to achieve in performance tables have combined to create the perfect storm.

Within the chapter, I share some of the challenges that are faced by schools in this area, suggest 10 steps to ‘sorting out summative assessment’ and talk about how we can reduce workload through revising approaches to marking and feedback. The analogy of a ‘consultant doctor’ is used to suggest how we can use assessment in a more manageable and meaningful way within schools.

Assessing like a consultant doctor

One of the perks of having a child with a disability and a complicated medical history is that you get to see the expertise of consultant doctors up close.  I have immense respect for everyone in the medical profession, but some of the specialists who have worked with Freddie have been class acts. One of the things that strikes me about these doctors is how they look beyond the obvious and avoid drawing quick conclusions. Rather than making decisions based on limited information, the most skilled and experienced doctors will examine a range of information about a patient as part of their assessment, including blood tests, scans, examinations in clinic, patient history and referrals from other medical professionals. Similarly, the most effective teachers and leaders understand the limitations of any particular test or assessment and can use their experience and expertise to interpret them wisely. And just as careful and intelligent consideration of patient information can lead to an accurate diagnosis and the prescription of helpful treatment, meaningful assessment can lead to greater understanding of gaps in learning and effective tailored teaching and intervention.

Within the chapter, I interview Daisy Christodoulou (Director of No More Marking and author of Making Good Progress? and Seven Myths About Education), who kindly gives up her time to offer her expertise on the challenges that remain for schools in adapting to a life without assessment levels.

Here is a summary of the chapter in one page…

Wholesome Leadership is now on sale for pre-order and will be published around the 22nd of May 2018. You can read some of the early reviews or find out how to order here –  www.wholesomeleadershipbook.com !

Data Dashboards across the MAT…

I’ve had quite a bit of interest in the last few weeks in the assessment and analytics development that we’ve been working on as a trust, so I thought I’d share some of our thinking, along with some insights into how we’ve developed consistent summative assessment processes across our trust of 8 primary schools. I was also supposed to give a presentation at the BETT Show this week sharing the analytics tools we’ve developed, but couldn’t make it, so instead I’ll share some thoughts here.

Trust-Wide Assessment

One of my responsibilities across Northampton Primary Academy Trust is to develop our approaches to assessment in a world without levels. A big part of NPAT’s development is to constantly look at where we standardise practices and where we leave approaches down to individual schools and teachers. One area that made a lot of sense for us to standardise was our summative assessment processes and over the last three years we’ve been working on the what, why, when and how of assessment.

Standardised tests

Like many others, we’ve come around to the view that standardised tests across schools are a really important part of our internal assessment system.  There are different standardised tests out there and we use PIRA and PUMA for Reading and Maths respectively. Although ‘testing’ can get a bad press, we see a number of real benefits including the following:

  • They are more reliable than a teacher assessment grade in comparing attainment.
  • They take much less curriculum time than other forms of ‘Teacher Assessment’ or tests – the ones we use take 45 minutes each.
  • The workload associated with standardised tests is much less than with other lengthy processes we’ve experienced involving evidence gathering or maintaining tracking systems with large numbers of objectives.

On the subject of standardised testing, James Pembroke’s post is well worth a read here.

If there is a question around tests such as PIRA and PUMA, it’s around validity and how relevant the information you get from them is in relation to, say, the new end of KS2 tests. A specific example here is that there is almost no arithmetic in the PUMA tests in comparison to the new requirements at the end of KS2. But where the outputs are useful is as a predictor of the outcome children are likely to achieve at the end of Year 6, and thankfully some early correlation work now exists, such as this from Tyrone Samuel from Ark Schools, which we can build on.  Having a sense of how our children are performing in relation to the rest of the country is a really useful thing.

Stop Chasing Shadows

Getting good standardised attainment data from across different classes and schools is really helpful when identifying what strengths and weaknesses currently exist in the school – particularly in comparison to others. By being able to see data such as comparative average scores, we can flag up potential strengths and weaknesses more accurately across KS2 and, crucially, before children get to Year 6.

Having an earlier radar on standards can help us to focus on the live issues in the school rather than being duped into a game of chasing shadows, responding to what RAISE/ASP or FFT says about children who left months before. It also gives us the opportunity to intervene earlier in KS2 when necessary, which I hope can mean there is less clamour in Year 6 as cohorts progress through.

How does it work?

Very simply, we have identified 3 standardised assessment points across the year (AP1, AP2 and AP3). These are in December, March and June. Children in Years 3-5 complete PIRA and PUMA tests at these points. Children in Year 6 complete the 2016 SATs paper at AP1, the 2017 SATs paper at AP2 and then the real thing in May. This happens consistently in all our schools at these times.

Collecting and Cleansing Data

Once tests are marked, teachers input the results into our MIS (SIMS) and this data is then checked centrally to ensure that it is complete and in the right format. There is a lot of data ‘cleansing’ to do at this point in the process, where data needs reformatting, double-checking and testing. This is a really important stage and has required us to invest in staffing to manage the process, as well as solving technical challenges so that the data manager has remote access to each school’s MIS.
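
To illustrate the kind of checks that happen at this stage (this is a sketch, not our actual process – the column names and score range below are placeholders), a few lines of Python can flag the obvious problems:

```python
import pandas as pd

results = pd.read_csv("ap1_results_export.csv")   # hypothetical export of test results
issues = []

# Every pupil should have a score recorded.
missing = results["standardised_score"].isna()
if missing.any():
    issues.append(f"{missing.sum()} pupils with no score recorded")

# Scores should sit inside the published standardised range.
valid = results["standardised_score"].between(69, 141)
if (~valid & ~missing).any():
    issues.append(f"{(~valid & ~missing).sum()} scores outside the expected range")

# No pupil should appear twice for the same subject at the same assessment point.
dupes = results.duplicated(["UPN", "subject", "assessment_point"], keep=False)
if dupes.any():
    issues.append(f"{dupes.sum()} duplicate rows")

print("\n".join(issues) if issues else "All checks passed - ready for the warehouse")
```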

Once the data in SIMS is complete, it is then sucked up using a ‘data agent’ and all the information held in the different schools is stored centrally in a data warehouse. This part is really clever – way beyond my skill set – and we’ve worked with Matt from Coscole Ltd., who does this work across our trust.

Once the data is in the warehouse, it can then be used for different purposes. This is part of our mantra to ‘collect once, use many times’.

Trust-Wide Analysis

Power BI (again customised and hosted through Coscole Ltd.) then provides the ‘front end’ – the bit that school staff engage with. It’s part of our Office 365 dashboard, which all staff already have access to, so it doesn’t require any additional login.

The following three screens are dashboard extracts from our system which allow us to compare attainment from standardised tests across schools in the trust. There are a range of filters you can tinker with to view the same analytics by school, contextual group and so on.

Please note that the images here are from a version of our data in which all names of schools and individuals have been changed and results randomised so that no-one and no school can be identified.

This is a summary dashboard showing Reading and Maths headline data across all schools (light blue is Reading and black is Maths). You can view this by different year groups or all together. In this screen, we are viewing Year 3 data.

This dashboard shows the current Year 6 reading data at December for a previous year’s SATs test across all trust schools. The data here indicates that 65% have achieved 100+ and 59% of FSM6 children have achieved 100+. There is also analysis by gender, SEND and PP on the right-hand side.

This dashboard shows Maths average data for all Year 3 classes across the trust. It displays the same breakdown as the Reading dashboard.

This final screen is a scatterplot matching prior attainment (1 = Low, 2 = Middle, 3 = High, 0 = No Data) against current test scores. This is a much more visual way of comparing these two fields than looking down a spreadsheet. The same comparisons can be made with targets.

In this dashboard, the vertical axis separates the children by their prior attainment and the horizontal axis plots their average standardised score according to PIRA/PUMA (or Y6 test).
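
If you wanted to reproduce this kind of view outside Power BI, a few lines of Python would do it. Here’s a rough sketch with made-up data – the real dashboard adds the filters and drill-downs:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up pupil-level data: prior attainment band and current average standardised score.
pupils = pd.DataFrame({
    "prior": [1, 2, 3, 2, 1, 3, 0, 2, 3, 1],   # 0 = no data, 1 = low, 2 = middle, 3 = high
    "score": [92, 101, 113, 98, 95, 108, 100, 104, 116, 88],
})

plt.scatter(pupils["score"], pupils["prior"], alpha=0.6)
plt.yticks([0, 1, 2, 3], ["No data", "Low", "Middle", "High"])
plt.xlabel("Average standardised score (PIRA/PUMA or Y6 test)")
plt.ylabel("Prior attainment")
plt.title("Prior attainment vs current standardised score")
plt.show()
```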

There’s lots more I could write about assessment (who knows, I might have a chapter in an upcoming book?) but that’s all for now and hopefully enough to give a taster.

We’re hoping to host a visit to the trust later in the Spring term where anyone interested can find out more about how the data analysis works.

I’d be interested in any comments, suggestions for improvements and to know what other trusts or groups of schools are doing in this area around data.

TR


KS2 ASSESSMENT ‘CLARIFICATION’ WEBINAR – February 2016

Some questions and clarification from today’s KS2 assessment webinar.  These are my notes and I did my best to keep up with the questions and answers, but they might not be 100% accurate, so the usual disclaimer applies about taking them in that spirit – and please refer to the DfE for further clarification on anything controversial!

TESTS

  • Confirmation that the KS2 Teacher Assessment submission date has been ‘relaxed’ to the 30th of June (as before).
  • 1900 schools will be tested in Science this year – schools will be contacted by the end of April about this.
  • There will be a number of schools contacted to trial the online times tables test this Summer ahead of all schools having to carry this out in 2017.
  • There will be some ‘revised guidance’ published shortly around the exemplification materials.  A clear statement was made that there doesn’t need to be ‘huge amounts of evidence’ collected.
  • Read the guidance information carefully around access arrangements as there have been changes to the application process.
  • Children working below the standard of the tests should not sit them.  Read the Rochford Review for information on children working below the ‘working towards standards’.
  • Children with additional needs may not use apparatus such as Numicon, number squares etc. within the tests, even if this is part of their normal classroom practice.  Only the apparatus listed in the test may be used.
  • Clarification was given on the use of ‘standard methods’ within the Maths test.  If children get the answer correct, they achieve full marks; if they get the answer incorrect, they will only get the 1 ‘working mark’ if they demonstrate one of the ‘standard methods’ in the revised National Curriculum.
  • Clarification again on the 65% floor targets and progress measures, which have been explained in great detail here by James Pembroke.
  • GPS is not part of the ‘combined’ floor target but will still be published as an individual subject.  Combined is Reading (test), Writing (TA) and Maths (test).
  • The process of calculating scaled scores was explained in response to questions about why these can’t be released earlier – they will be available on the 5th of July along with the results.  The trial data was based only on children who hadn’t studied the new National Curriculum for the full 2 years, and only this year’s Year 6 cohort will have done this.  Therefore, this year, it will have to be based on this ‘live’ sample of children.
  • The school progress measure will be calculated well after the tests (Autumn?).
  • Tests and Teacher Assessment will be reported in different languages this year.  Tests will be reported as ‘scaled scores’ whilst Teacher Assessment will be reported against the definitions in the interim assessment frameworks.
  • Unlike KS1, there is a statutory requirement to report both test and Teacher Assessment outcomes to parents.
  • There will be no further sample tests published prior to May.

WRITING TEACHER ASSESSMENT

  • More moderation guidance will be issued shortly in response to the new dates for submission of Teacher Assessment.
  • It was suggested that a range of writing opportunities can all contribute towards the evidence base but nothing ‘too heavily scaffolded’.
  • ‘Independent’ work within an evidence base was discussed and there will be some more guidance issued shortly around how independent, ‘independent’ writing has to be.  It was suggested that work with some peer feedback, self review etc. can be considered as independent.
  • As previously, there is no special dispensation for children with dyslexia with regard to the teacher assessment of writing.  If any child doesn’t meet the spelling statement, they cannot meet the ‘secure fit’ for writing, however capable they may be in other aspects of writing.
  • For children with physical difficulties, the handwriting element is exempt for the expected standard but not for the ‘greater depth’ standard.
  • In response to questions around the standard of writing demanded in the exemplification materials, it was suggested that ‘Morgan’ from the exemplification materials is considered to be more like the 4B example that was announced.  ‘Leigh’ is considered to be more of a borderline case between ‘expected’ and ‘greater depth’ – he is considered to have some, but not all, of the aspects of the ‘greater depth’ descriptors.
  • For children who are working at a ‘greater depth’, there is no requirement for any additional evidence base to be collated.  It is expected that the greater depth statements can be evidenced within the existing body of work.
  • There was an announcement of a ‘high score’, which will be determined after the tests – similar to the previous Level 6? This will also be published.
  • The definition of ‘coasting schools’ was re-explained. There’s a definition here which I think is right.

MODERATION

LAs will inform schools that are to receive a moderation visit on or after the 20th of May either the afternoon before or on the morning of the visit.

  • Moderators will choose the specific children that will be moderated either before or at their moderation visit.
  • The ‘supportive’ process of moderation will now take place before data is submitted (as previously), so that moderation should inform the final data that is submitted on the 30th of June.  Schools will be expected to have data on judgements available for moderators before the 30th of June, should they receive a moderation visit.
  • Moderators will not be involved in moderating judgements of children working at pre-key stage standards (old P-Scales). These should be moderated locally by either clusters or between schools.
  • The evidence base for each child may vary – there is no pre-requisite for there to be the same pieces of work for every child.
  • STA will be sampling 48 local authorities (around a third) to QA the moderation process across England this year.

TR