Teacher Training

Effect size d = 0.11 (Hattie's rank = 124). This is similar to 'Teacher Subject Matter Knowledge' (d = 0.09) but conflicts with 'Teacher Professional Development' (d = 0.62).

Teacher Training is another controversial finding as the effect size appears to contradict the Professional Development effect size and PISA analysis. Much of the same research was used in Teacher Training and Teacher Subject Knowledge, which accounts for the similar effect size.

All the studies Hattie uses, apart from the early childhood study (which should not be included anyway), are NOT about teacher training but about certification after teacher training. This is yet another example of Hattie misrepresenting studies.

Hattie used the following meta-analyses:

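| Authors (Teacher Training) | Year | No. studies | Students | Mean (d) | CLE | Variable |
| --- | --- | --- | --- | --- | --- | --- |
| Wu, Becker & Kennedy | 2002 | 24 | | 0.08 | 6% | NBC vs Non-NBC |
| Wu, Becker & Kennedy | 2002 | 24 | | 0.14 | 10% | Trad v Emergency |
| Hacke | 2010 | 21 | 1989761 | 0.09 | | NBC vs Non-NBC |
| Kelley & Camilli | 2007 | 32 | | 0.15 | | Early childhood teacher Ed |
| Sparks | 2004 | 5 | | 0.12 | 8% | Trad v Emergency |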
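The five effect sizes listed above can be combined in several ways. As a minimal sketch, assuming a simple unweighted mean (an assumption; nothing here says how the headline figure was actually derived), the arithmetic looks like this. The dictionary keys and variable names are illustrative only.

```python
from statistics import mean

# Effect sizes (d) exactly as listed in the table above.
effect_sizes = {
    "Wu, Becker & Kennedy (2002), NBC vs Non-NBC": 0.08,
    "Wu, Becker & Kennedy (2002), Trad v Emergency": 0.14,
    "Hacke (2010), NBC vs Non-NBC": 0.09,
    "Kelley & Camilli (2007), Early childhood teacher Ed": 0.15,
    "Sparks (2004), Trad v Emergency": 0.12,
}

# Unweighted mean: every meta-analysis counts equally,
# regardless of how many studies or students it covers.
unweighted_mean = mean(effect_sizes.values())
print(f"Unweighted mean d = {unweighted_mean:.3f}")
```

On these five values the unweighted mean is 0.116, close to the headline d = 0.11. Note that an unweighted mean gives a synthesis of 5 studies the same weight as one of 32.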

It was very difficult to track these papers down, as Hattie does not reference these studies correctly: he gives neither titles nor publication details.

Kelley & Camilli (2007) is a study on early childhood teachers, so it should not be included in this influence. Sparks (2004) is a PhD dissertation. Note that many researchers do not accept dissertations or conference presentations, as they are not subject to peer review.

Sparks (2004) compares certified teachers with alternatively certified or non-certified teachers. She does not compare teacher training. Sparks reports a range of correlational effect sizes but advises against averaging them: "The disparate definitions of certification do not permit effect size estimates to be combined" (p89). Yet Hattie goes ahead and averages them to get 0.12.

Sparks reports a major problem with how student achievement is represented: the use of school or state-level means in place of individual student data (p101). For example, in one of the 5 studies used, "student achievement was represented by a single state-level mean and paired with the proportion of certified teachers in the state" (p102).
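To see why group-level means are a problem, the following sketch uses purely synthetic data (it is NOT Sparks's data, and every number in it is an assumption chosen for illustration). It generates individual scores with a weak relationship between two variables, then compares the correlation computed on individuals with the correlation computed on state-level means.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_states=50, n_students=200, seed=0):
    """Return (individual-level r, state-mean-level r) for synthetic data."""
    rng = random.Random(seed)
    xs, ys, state_x, state_y = [], [], [], []
    for _ in range(n_states):
        state_level = rng.gauss(0, 1)  # hypothetical state-wide factor
        gx, gy = [], []
        for _ in range(n_students):
            x = state_level + rng.gauss(0, 1)
            y = 0.2 * x + rng.gauss(0, 2)  # weak link, lots of individual noise
            gx.append(x)
            gy.append(y)
        xs.extend(gx)
        ys.extend(gy)
        # Averaging within a state removes most of the individual noise.
        state_x.append(sum(gx) / n_students)
        state_y.append(sum(gy) / n_students)
    return pearson(xs, ys), pearson(state_x, state_y)


r_individual, r_state = simulate()
print(f"individual-level r = {r_individual:.2f}")
print(f"state-mean r       = {r_state:.2f}")
```

In this setup the state-mean correlation comes out much larger than the individual-level one, because averaging strips out the individual variation. An effect size derived from aggregated means therefore cannot be read as an effect on individual students.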

Wu, Becker & Kennedy (2002) is a conference presentation mostly comparing National Board Certified Teachers (NBC) with non-certified teachers (Non-NBC). 

NBC is a teaching certificate for teachers with a teaching degree plus 3 years' experience.

Certification consists of four components:
-written assessment of content knowledge,
-reflection on student work samples,
-video and analysis of teaching practice,
-documented impact and accomplishments as a teaching professional.

The certification is expensive and takes a lot of time to prepare, so many experienced teachers do not go through the process. Also, school districts in poorer areas do not require certification as there is major turnover and shortage of teachers. Harris and Sass (2009) report these are MAJOR confounding variables (p7).

Hacke (2010) is a PhD dissertation that also compares NBC with Non-NBC teachers. The cost of certification is between $18,000 and $31,000 per teacher, plus about 400 hours of work (p113). One of Hacke's aims was to determine whether the cost and time of certification are worth the effort. The low effect size indicates NO.

So NONE of these papers are about Teacher Training, yet Hattie has included them in this category.

An example of Hattie's arbitrary interpretations:


To illustrate the inconsistency: Hattie uses his own research on 65 teachers, comparing NBC with Non-NBC teachers, and reports it in the last chapter of VL. However, he uses that research for a very different purpose: to demonstrate the difference between expert and experienced teachers. Hattie makes the arbitrary judgement that NBC teachers are 'Experienced Experts' while Non-NBC teachers are merely 'Experienced'. He does not use student achievement but rather arbitrary criteria, as displayed in the graph below.

Podgursky (2001), in his critique, describes these criteria as "nebulous standards". Podgursky is also rather suspicious of Hattie's rationale for not using student achievement: "It is not too much of an exaggeration to state that such measures have been cited as a cause of all of the nation’s considerable problems in educating our youth. . . . It is in their uses as measures of individual teacher effectiveness and quality that such measures are particularly inappropriate" (p2).

Hattie concludes that expert teachers (NBC) outperform Non-NBC teachers on almost every criterion (p260). However, Professor Gore, who delivered the Dean's Lecture at Hattie's school at Melbourne University, disagrees with him.





Harris and Sass (2009) report that the National Board for Professional Teaching Standards (NBPTS), which administers the NBC, generates around $600 million in fees each year (p4). Harris and Sass's much larger study, 'covering the universe of teachers and students in Florida for a four-year span' (p1), contradicts Hattie's conclusion: 'we find relatively little support for NBC as a signal of teacher effectiveness' (p25).

It is interesting that much of Hattie's consulting work with schools involves measuring teachers on the arbitrary categories listed on the graph above; a significant omission is Teacher Subject Knowledge.

Yet, using the same type of research, e.g. Hacke (2010), comparing NBC with Non-NBC teachers, he uses the low effect size to conclude that Teacher Education is a DISASTER. See Hattie's slides from his 2008 Nuthall lecture.



Another example of inconsistent results when using different outcomes:


Hacke (2010) pinpoints the issue central to all of Hattie's work: "identifying effective teachers hinges on how it is defined and measured" (p32).

This is a good example of the inconsistency: on the one hand, NBC research (using arbitrary outcomes) is used to demonstrate a SPECTACULAR difference between expert and experienced teachers; on the other, when student achievement is measured, the same kind of research is used to demonstrate that teacher training is a disaster.

Hacke (2010) goes further, illustrating the inconsistency that arises from using different types of test: criterion-referenced tests are intended to measure how well a person has learned a specific body of knowledge and skills, whereas norm-referenced tests are developed to compare students with each other and are designed to produce variance in scores. She cites a unique study by Harris and Sass (2007), who examined the influence of teacher certification (NBC) using two different types of assessment data from the state of Florida, which administers both norm-referenced and criterion-referenced tests. Harris and Sass found that the effect of NBC was negative for both reading and mathematics on the norm-referenced test, YET positive on the criterion-referenced assessments (p109).

A more detailed look at the studies:


Hacke (2010) is a dissertation comparing National Board Certified (NBC) teachers with Non-NBC teachers (p8).

Hacke's dissertation is of high quality and is very thorough. Her major inclusion criteria are that studies must be conducted in the USA on students in years 3-12, and that "Student achievement is defined as end-of-year or end-of-instruction test score gains on standardised tests in reading and mathematics" (p20).

She also agrees with Wu et al. in identifying the major confounds: "The on-going debate over what an effective teacher is and does make measuring teacher effectiveness elusive, as there is no generally accepted method for doing so" (p28).



Wu, Becker & Kennedy (2002) is a paper presented at the annual meeting of the American Educational Research Association. I have not been able to get a copy of the full presentation, but I contacted Professor Kennedy and she sent me a summary. In the introduction they state, 

"Our synthesis will focus on studies conducted in the United States since 1960 ... We decided to limit ourselves to the U.S. K-12 context on the premise that the factors involved in the training and hiring of teachers at other educational levels and in other countries may differ functionally and culturally from those at play in the U.S. K-12 system. We also are examining studies of teachers in the workplace -- that is, we will not include studies of preservice teachers because we presume that they are still learning to teach and the relationships we might observe between qualifications and teaching outcomes for this population might not be reliable or stable. Also, studies examining whether in-service programs make better teachers are omitted."

They detail the "inclusion/exclusion" criteria used to select studies for their synthesis. It is interesting that many of Hattie's studies would fail on many of these criteria, e.g., BIAS. They state,

"When reviewers do not describe how studies are selected, the reader is left to wonder whether personal predilections or biases led the reviewer to select studies favouring his or her viewpoint. A major goal of data collection for systematic reviews is to have thorough and replicable search and selection procedures." 

They compare Alternate routes to teacher certification with traditional routes. Alternate routes often attract older recruits who are qualified in another profession (career changers) and involve an internship or teacher training "on the job". Emergency routes are often created by school districts in response to teacher shortages (the obvious confounding variable here is these school districts are often in the poor areas with low achieving schools!). 

They identify another major confounding variable: the terms 'qualifications' and 'quality' are used differently by researchers and are measured differently. Some consider a college education to be a qualification and teacher assessments to be a measure of quality, while others use teacher test scores as indications of qualifications and student achievement as quality. Once again, this is the old meta-analysis problem of comparing apples with oranges. Their summary of Teacher Qualifications and Quality:


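| Teacher Qualifications | Number of Dissertations | Number of Other Sources | Total |
| --- | --- | --- | --- |
| Educational Background* | 96 | 90 | 186 |
| Certification | 36 | 55 | 91 |
| Subject Matter Knowledge | 6 | 13 | 19 |
| Verbal Ability | 0 | 7 | 7 |
| Other Test Score | 21 | 32 | 53 |
| Teaching Experience | 44 | 62 | 106 |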



| Quality of Teaching | Number of Dissertations | Number of Other Sources | Total |
| --- | --- | --- | --- |
| Student Achievement | 110 | 157 | 267 |
| Observed Classroom Practice | 17 | 34 | 51 |
| Teacher/School Effectiveness | 13 | 1 | 14 |
| Performance Assessments | 12 | 0 | 12 |

Note the category 'Subject Matter Knowledge', which includes many of the studies that Hattie uses for the separate influence 'Teacher Subject Matter Knowledge'. This partially accounts for the similar effect sizes. (Using the same data across different influences is poor scholarship, however, as it leads to bias.)

Also, Professor Becker has published some details of their results which show wide variation (p12):



Becker warns, "The literature tells us very little about the exact nature of these programs, or about the comparison (traditional) programs" (p14).

They conclude with a caution: "Because we are still obtaining sources, the data we present here is tentative;"


Kelley & Camilli (2007) compare pre-school teachers with and without a BA, teaching children from 3-5 years of age. They conclude:

"The analysis indicated that effects on quality outcomes from teachers with a bachelor’s degree (the treatment group) were significantly different from those teachers with less education (the comparison group). In standard deviation units, the average effect was .16 standard deviations ... There are, however, two caveats. First, the effect size is relatively small, though significant ... Second, the research underlying this effect size is correlational in nature. Thus, it is possible that any number of factors, aside from having a bachelor’s degree, cause this effect" (p1).

However, pre-school studies are precisely the studies that Wu, Becker & Kennedy (2002) above removed from their analysis: "we will not include studies of preservice teachers because we presume that they are still learning to teach and the relationships we might observe between qualifications and teaching outcomes for this population might not be reliable or stable."

So this is another example of the problem with meta-analysis in comparing 'apples with oranges'.

However, Kelley & Camilli (2007) is an excellent summary of the protocols used in meta-analyses and highlights many of the issues with this methodology:

"Correlations were transformed into comparative effect sizes when sufficient information was given (e.g., point-biserials were transformed to ES). When studies failed to report such information, the Pearson correlation coefficient r was used as the primary effect size measure" (p18).
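The quoted passage does not give the conversion formula the authors used. As a hedged illustration only, one commonly cited conversion from a correlation r to a standardised mean difference d, which assumes two groups of equal size, is d = 2r / sqrt(1 - r^2). The function name `r_to_d` is hypothetical.

```python
from math import sqrt


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d, assuming equal group sizes.

    This is a textbook conversion, not necessarily the one used in the
    study quoted above.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / sqrt(1.0 - r * r)


for r in (0.05, 0.10, 0.30):
    print(f"r = {r:.2f}  ->  d = {r_to_d(r):.2f}")
```

For small values, d is roughly twice r (r = 0.10 corresponds to d of about 0.20), so a synthesis that reports raw r for some studies and d for others is mixing two different scales.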

They advocate a strengthening of the peer review process as it is the traditional safeguard for ensuring complete and accurate reporting (p35).


Sparks (2004), 'The Looming Danger of a Two-Tiered Professional Development System', is not a meta-analysis but rather a commentary on professional development, so it should NOT be included in this category. It is only 3 pages long and there is no mention of an effect size anywhere, so I'm not sure how Hattie gets d = 0.12.

Sparks comments on two tiers of professional development - NOT Teacher Training or qualifications. "The first tier is an emerging system that advocates the development of professional community and the exercise of professional judgment ... Conversely, the second tier of professional development is built on mandates, scripted teaching, and careful monitoring for compliance" (p304).

He states, "I have several concerns about this second tier of professional development. Far too many tier-two efforts begin and end with top down, highly prescriptive approaches, leaving the culture of schools untouched and teachers and students ill prepared to function much beyond the most rudimentary levels of performance. I am also concerned that demeaning and mind-numbing staff development will create a persistent aversion to professional learning and leave teachers feeling resigned to their fate and dependent on experts as the primary source for their development. And most important, because such forms of professional development are typically directed at those who teach our most vulnerable students, I believe that this approach will have long-term, deleterious consequences for poor and minority students" (p305).

Sparks also interviewed education guru Andy Hargreaves in 2004 on issues relevant to our discussion, in 'Broader purpose calls for higher understanding.' Hargreaves states, 'I come from England, where the professional culture was for many years based on a craft view of teaching in which teachers know best and researchers know little. Research was disparaged as irrelevant and esoteric with no relevance to the classroom. In moving to America, I found the opposite problem in which there's a tendency not only to respect but to revere research and researchers, to give them too much of their due, and not to challenge them enough from the wisdom of practice. Both of these extremes are undesirable.

The challenge is to bring the wisdom of practice into critical dialogue with the wisdom of research' (p49).

PISA Analysis:


The highly respected Grattan Institute analysed the high-performing international education systems and concluded that one of the reforms responsible for improving student achievement across the four high-performing education systems in East Asia was 'providing high-quality initial teacher education' (p12). The high-performing Finnish system introduced this reform in the late 1990s; see the interview with Pasi Sahlberg.


One example of a study on teacher training comes from English schools.