Monday, November 28, 2016

Our Louisiana School Principal Evaluation System is Seriously Flawed

Note to readers: Recently I became concerned about Superintendent White’s proposal that the new calculation of a school’s performance score would have a 25% component based on the annual improvement of student test performance. I especially question such a system applied to all schools, particularly since A rated schools often have little room to improve. This concern only added to my fears that our new Louisiana school principal evaluation system may also be overly reliant on perpetual student test score increases.

With the above concerns in mind I asked Herb Bassett, a Louisiana educator whom I regard as one of the best analysts of statistics-based rating systems, to study the new principal evaluation system now in operation in Louisiana and provide my readers with his insights as to the appropriateness of this model for principal evaluation.

Mr. Bassett’s conclusions described below are very worrisome, and lead me to believe that principals and district superintendents have been saddled with a poorly designed and extremely unfair system for principal evaluations. Please review Mr. Bassett’s analysis and also my commentary following the analysis.

Bassett’s Analysis of Guidelines for Principal Evaluations

The following is an analysis of SPS growth targets recommended by the Louisiana Department of Education as part of the latest principal evaluation system. The analysis explains how LDOE:

1.     imposed what amounts to a stack ranking system designed to fail both 25% of A school principals and 25% of D school principals on at least one component of their evaluations 
  Note from editor: Stack ranking is a type of employee evaluation system that ranks employees against one another on the results of their evaluations. It is common in stack ranking to designate a fixed percentage of employees each year as unsatisfactory, a fixed percentage as satisfactory, and a fixed percentage as high performers. This procedure amounts to a quota system for each level in the evaluation system. Even though the ranking affects only part of the principal’s evaluation, it can make a huge difference in the final evaluation.
2.     overrode its own Achievement Level Descriptions for the majority of its target recommendations, with the overrides exerting a downward influence on principal ratings, and did not clearly explain the overrides in its Goal-Setting Toolkits, 
3.     used an incorrect method to establish its target recommendations. This resulted in unrealistically high targets for A school principals while allowing relatively lax targets for D and F school principals.

This year, the LDOE convinced the Accountability Commission and BESE to tie principals' evaluations directly to SPS growth. Bulletin 130 now states:

§305. Measures of Growth in Student Learning Targets
D. Principals and Administrators. A minimum of two student learning targets shall be identified for each administrator.
1. For principals, the LDE shall provide recommended targets to use in assessing the quality and attainment of both student learning targets, which will be based upon a review of “similar” schools. The LDE will annually publish the methodology for defining “similar” schools.
2. For principals, at least one learning target shall be based on overall school performance improvement in the current school year, as measured by the school performance score.

LDOE was left to decide how it would set the targets. Its 2016 overall SPS Improvement target recommendations would lead to the following (based on the recently released 2016 SPSs):

·      Over two-thirds of principals of A-rated high schools would get the lowest rating while only one principal of an F-rated high school would do so.

·      No A-rated combination school principal would make the highest rating while exactly half of the principals of F-rated high schools would.

·      More than one-third of all principals would make the lowest rating.

These outcomes defy common sense. LDOE's recommended targets were unrealistic for A rated schools and comparatively lax for D and F schools. Why would we require an A school to improve its SPS more than twice as much as a D school to rate full attainment? These inverted expectations came from a questionable ranking system and from using incorrect methods to calculate those rankings.

Principals were encouraged to base a second goal on an individual component of the SPS. LDOE recommended similarly flawed targets for those as well.

I certainly hope that many principals and their supervisors chose to override the LDOE recommendations when they set their goals. This linked spreadsheet shows how the flawed targets would negatively impact principal evaluations sorted by each school configuration and letter grade.

1) LDOE essentially applied stack-ranking to each letter-grade category of schools to achieve the same 25% "insufficient" quotas from A-school principals and D-school principals.

LDOE published targets in its Principal Goal Setting Toolkits for K-8 Schools, Combination Schools, and High Schools. "Similar schools" were defined by school type - Elementary, Combination, and High Schools - and further subdivided by school letter grade.

LDOE's achievement level descriptions indicate that targets were set by the prior year SPS growth of the schools at the 25th, 50th, and 75th percentile within each "similar school" category. My analysis finds that LDOE used a system that required more growth from A schools than D or F schools to reach "full attainment".

This amounts to a stack ranking system with arbitrary quotas of 25% insufficient, 25% partial, 25% full, and 25% exceeds attainment within each school letter grade category.
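The quota mechanics can be sketched in a few lines of Python. This is an illustration of the percentile method described above, not LDOE's actual code, and the growth figures are hypothetical:

```python
from statistics import quantiles

def set_targets(prior_year_growth):
    # Targets from the 25th, 50th, and 75th percentiles of the
    # prior year's SPS growth within one "similar school" group.
    partial, full, exceeds = quantiles(prior_year_growth, n=4)
    return partial, full, exceeds

def rate(growth, targets):
    partial, full, exceeds = targets
    if growth >= exceeds:
        return "exceeds"
    if growth >= full:
        return "full"
    if growth >= partial:
        return "partial"
    return "insufficient"

# Hypothetical prior-year growth for eight schools in one group:
group = [-3.0, -1.0, 0.5, 1.2, 2.0, 3.5, 5.0, 8.0]
targets = set_targets(group)
ratings = [rate(g, targets) for g in group]
# By construction, the group splits into four equal bands -- the
# quotas are the same whether these are A schools or F schools.
```

If this year's growth resembles last year's, roughly a quarter of each group lands in each band no matter how well the group performs in absolute terms.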

By requiring principals to set an Overall SPS Improvement target, LDOE's system effectively made A school principals compete against A school principals, B school principals compete against B school principals, and so on. In setting its recommended targets, LDOE ignored the thorny question: "Are A school principals as good/bad as D and F school principals on the whole?" If not, why use the same quota for each group?

The data presented in the figure below - compiled from the Goal Setting Toolkits - show that A schools significantly outperform D and F schools at moving struggling students past their VAM expectations. Why, then, should we accept a system designed to rate principals of A schools "insufficient" as often as principals of D and F schools? Such a formula for rating principals seems contrary to the entire theory of rating schools and their staffs using SPS and the letter grading system. The logic also runs completely contrary to some of the assumptions made by the U.S. Dept. of Education in recent years, which resulted in the firing of some principals of low-performing schools as a form of restructuring designed to produce “school turnaround”.

2) LDOE overrode its own Achievement Level Descriptions in a manner that would produce higher-than-quota percentages of "insufficient" and "partial" attainment.

LDOE's Achievement Level Descriptions indicate that the "partial attainment" target was set by the growth of the school at the 25th percentile in the prior year. I found, however, that if that school had negative growth, LDOE set the minimum target for "partial attainment" to 0.1 even though that value corresponded to a higher percentile. Presumably LDOE interpreted "school performance improvement" to exclude negative growth. The override applied to the vast majority of the targets. LDOE marked such data with "^" but gave no explanation that I could find in the Toolkits.

Because of this override, the principal of any school showing negative growth would automatically rate "insufficient" and overall, that would result in more than 25% "insufficient" ratings.

Additionally, when that 25th percentile override applied, LDOE also raised the "full attainment" target to a value higher than the growth of the 50th percentile school even if that school showed positive growth.

Thus, most of the recommended targets for partial attainment and full attainment were actually set from higher percentile ranks than what the Achievement Level Descriptions stated.
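The effect of that 0.1 floor can be illustrated with a toy example. This is assumed behavior inferred from the Toolkits' "^" markings, not LDOE's code, and the numbers are hypothetical:

```python
from statistics import quantiles

def apply_floor(partial_target):
    # The assumed override: when the 25th-percentile school had
    # negative growth, floor the partial-attainment target at 0.1.
    return max(partial_target, 0.1)

# Hypothetical group in which half the schools had negative growth:
group = [-4.0, -2.5, -1.0, -0.3, 0.4, 1.5, 3.0, 6.0]
q25, q50, q75 = quantiles(group, n=4)   # q25 is negative here
partial = apply_floor(q25)              # floored to 0.1, a higher percentile

insufficient = sum(g < partial for g in group)
# All four negative-growth schools now rate "insufficient" -- 50% of
# the group, double the nominal 25% quota.
```

Whenever more than a quarter of a group showed negative growth the year before, the floor guarantees an "insufficient" rate above the 25% quota if that distribution repeats.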

There is an important issue that accompanies setting targets based on the previous year's growth. A year in which there is exceptionally strong overall growth dooms the next year to high "insufficient" rates because the new targets are based on an unsustainable rate of growth.

To its credit, LDOE did provide two years of Overall SPS Improvement data and three years of individual component data for reference. However, the targets it provided were clearly labeled "2015-2016 Recommended Targets: based on 2013-2014 and 2014-2015 results."

3) LDOE's recommended targets were based on flawed methodology.

I reconstructed LDOE's high school targets and found that LDOE sorted the prior-year growth by schools' ending letter grades rather than their starting letter grades. Yet principals were asked to set goals based on their schools' starting letter grades. For an accurate "similar school" comparison, we must compare the schools starting this year with an A to the schools that started last year with an A.

In calculating its recommended targets for the A schools, LDOE included schools that started with B's and C's the previous year but grew to an A, while simultaneously it excluded any school that started with an A but dropped to a B. By systematically adding in schools with excellent growth and excluding some schools with negative growth, the rankings were skewed and led to unrealistically high expectations being set for schools that were starting from an A.

In setting the F school targets, LDOE removed schools that rose from an F to a D or C. It computed the target based only on the schools that remained an F. This resulted in LDOE setting much lower recommended targets for F and D schools than for A schools.

The targets for the B, C, and D schools were skewed, but to a lesser extent because some of the movement between school letter grades was offsetting in these middle letter grade categories.
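A toy example makes the direction of the bias concrete. The schools and growth figures below are made up, and the median stands in for the "full attainment" target:

```python
from statistics import median

# (starting grade, ending grade, prior-year SPS growth) -- hypothetical
schools = [
    ("A", "A",  1.0), ("A", "A",  2.0), ("A", "B", -4.0),
    ("B", "A",  8.0), ("B", "B",  0.5), ("C", "A", 12.0),
]

# LDOE's sorting: group by ENDING grade. This pulls in the big
# climbers (B->A, C->A) and drops the A school that fell to a B.
by_ending = [g for start, end, g in schools if end == "A"]

# Correct sorting for a "similar school" comparison: group by
# STARTING grade.
by_starting = [g for start, end, g in schools if start == "A"]

print(median(by_ending))    # 5.0 -- inflated target for A schools
print(median(by_starting))  # 1.0 -- what A schools actually achieved
```

The same selection effect runs in reverse for F schools: excluding the schools that climbed out of F removes the strongest growers and lowers that group's target.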

The chart below provides LDOE's recommended targets for high schools, my reconstruction of LDOE's targets, and the targets that would have resulted from sorting by starting letter grade rather than by ending letter grade. Note that the targets based on sorting by the starting letter grade fit the expected pattern of generally requiring more improvement from the lower letter grade schools and less improvement from the higher letter grade schools.

While I have not reconstructed every one of LDOE's target recommendations, it is clear that all of the recommended targets, both for Overall SPS Improvement and individual components and for all years' data were computed using the wrong sorting. Principals have been asked to base their targets on faulty data.

Consider the impact on Ruston High School. In 2016 it grew 7.0 points, from 100.3 (A) to 107.3 (A). The flawed recommended target would rate that growth only "partial", whereas under the version using the correctly sorted data, that seven-point growth would rate "exceeds".

I question the logic of trying to force the same quotas of each letter-grade school principals into the four performance categories, and I question the efficacy of attempting to force high rates of low evaluations. If we discourage and run off too many principals, where will we find their replacements?

I especially question the wisdom of using a system that most of the time assigns lower improvement targets to D and F schools than A and B schools. Such recommendations create pressure to widen the achievement gap rather than narrow it.

I urge principals and superintendents to recommend a better evaluation system for the future and consider what measures should be taken to rectify LDOE's mistakes in recommending targets for this year.

Finally, I urge BESE and LDOE to allow principals who feel that the recommended targets have caused them to receive unjust ratings to adjust their targets retroactively in consultation with their supervisors and superintendents.

Herb Bassett, Grayson, LA

My Observations and Commentary

I believe Mr. Bassett’s findings reveal an extremely ill-conceived and careless method for a critical part of our school principal evaluation system. If applied as designed by the Louisiana Department of Education, I believe it will result in unfair evaluations and a lowering of the morale of our dedicated school principals. Not only do I find the new guidelines statistically flawed, but I also question the motive behind such a negatively skewed rating system.

I believe this new principal evaluation system and also the 25% school improvement component for rating and grading schools to be far too obsessed with a perpetual raising of student test scores. Teachers are already complaining that the incessant drive by our Louisiana Department of Education to simply raise student test scores each year is interfering with a healthy teaching and learning environment in our schools. Tests are important, but they should not embody the whole of the education experience for our children. This incessant pressure to raise test scores, no matter what, is what caused the cheating scandals we have seen in Atlanta, Washington, D.C., and El Paso.

Educators in Louisiana have already been through a very unfortunate and unfair experience when the state attempted to mandate that each year 10% of teachers of tested subjects would automatically be rated “ineffective” based on the flawed value added model. Now the state is trying to impose a failure rate for principals based on student test scores, with a higher resulting failure rate for principals of our highest rated schools. This is a negative evaluation quota system on steroids! Such a system is insanity and should be scrapped immediately!

Finally, my overarching concern about this whole matter is that I have come to believe that this latest scheme attempting to mandate an extremely harsh evaluation system for principals is really designed to pressure principals to fire more teachers based on student test scores. I do not believe the state should be attempting to negatively evaluate and fire more principals and teachers in A rated schools, but I also do not believe it would be proper to arbitrarily fire more personnel in D and F rated schools! That’s because the evidence is overwhelming that student test scores are much more heavily influenced by socio-economic factors than by the school and its personnel. Until our education reformers understand this fact we will be forever doomed to scapegoating our professional educators for factors over which they have little control.

Mike Deshotels