Note to readers: Recently
I became concerned about Superintendent White’s proposal that the new calculation of a school’s performance score would have a 25% component based on
the annual improvement of student test performance. I especially question such
a system applied to all schools, particularly since A-rated schools
often have little room to improve. This concern only added to my fears that our
new Louisiana school principal evaluation system may also be overly reliant on
perpetual student test score increases.
With the above
concerns in mind I asked Herb Bassett, a Louisiana educator whom I regard as
one of the best analysts of statistics-based rating systems, to study the new
principal evaluation system now in operation in Louisiana and provide my
readers with his insights as to the appropriateness of this model for principal
evaluation.
Mr. Bassett’s
conclusions described below are very worrisome, and lead me to believe that principals
and district superintendents have been saddled with a poorly designed and
extremely unfair system for principal evaluations. Please review Mr. Bassett’s
analysis and also my commentary following the analysis.
Bassett’s Analysis of Guidelines for Principal
Evaluations
The following is an analysis
of SPS growth targets recommended by the Louisiana Department of Education as
part of the latest principal evaluation system. The
analysis explains how LDOE:
1. imposed what amounts to a stack ranking system designed to fail both 25% of A school principals and 25% of D school principals on at least one component of their evaluations
Note from editor: Stack ranking is a type of employee evaluation system that ranks employees on the results of the employee evaluation. It is common in stack ranking to designate a certain percentage of employees each year as unsatisfactory and a certain percentage as satisfactory as well as a certain percentage as high performers. This procedure amounts to a quota system for each level in the evaluation system. Even though the ranking affects only part of the principal’s evaluation, it can make a huge difference in the final evaluation.
2. overrode its own Achievement Level Descriptions for the majority of its target recommendations; these overrides pushed principal ratings downward, and LDOE did not clearly explain them in its Goal-Setting Toolkits,
3. used an incorrect method to establish its target recommendations. This resulted in unrealistically high targets for A school principals while allowing relatively lax targets for D and F school principals.
This year, the LDOE convinced the Accountability Commission and BESE to tie principals'
evaluations directly to SPS growth. Bulletin 130 now states:
§305. Measures of Growth in Student
Learning Targets
D. Principals and Administrators. A
minimum of two student learning targets shall be identified for each
administrator.
1. For principals, the LDE shall provide
recommended targets to use in assessing the quality and attainment of both student
learning targets, which will be based upon a review of “similar” schools. The
LDE will annually publish the methodology for defining “similar” schools.
2. For principals, at least one learning
target shall be based on overall school performance improvement in the current
school year, as measured by the school performance score.
LDOE was left to decide how it would set the targets. Its 2016 overall SPS Improvement target recommendations would lead to the following (based on the recently released 2016 SPSs):
· Over two-thirds of principals of A-rated high schools would get the lowest rating, while only one principal of an F-rated high school would do so.
· No A-rated combination school principal would make the highest rating, while exactly half of the principals of F-rated high schools would.
· More than one-third of all principals would make the lowest rating.
Why would we require an A school to improve its SPS more than twice as much as a D school to rate full attainment?
Principals were encouraged to
base a second goal on an individual component of the SPS. LDOE recommended
similarly flawed targets for those as well.
I certainly hope
that many principals and their supervisors chose to override the LDOE recommendations
when they set their goals. This linked spreadsheet shows how the flawed targets would negatively impact principal evaluations, sorted by school configuration and letter grade.
1) LDOE essentially applied stack ranking to each letter-grade category of schools, imposing the same 25% "insufficient" quota on A-school principals and D-school principals.
LDOE published targets in its
Principal Goal Setting Toolkits for K-8 Schools, Combination Schools, and High Schools. "Similar schools"
were defined by school type - Elementary, Combination,
and High
Schools - and further subdivided by school letter grade.
LDOE's achievement level
descriptions indicate that targets were set by the prior year SPS growth of the
schools at the 25th, 50th, and 75th percentile within each "similar
school" category. My analysis finds that LDOE used a system that required more growth from A
schools than D or F schools to reach "full attainment".
This amounts to a stack
ranking system with arbitrary quotas of 25% insufficient, 25% partial, 25%
full, and 25% exceeds attainment within each school letter grade category.
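For readers who want to see the mechanics, the following is a minimal sketch in Python of how percentile-based cut points of this kind work. The function names and data layout are mine, not LDOE's; the sketch simply derives the three cuts from prior-year growth and rates a school's current growth against them.

```python
import numpy as np

def percentile_targets(prior_growth):
    """Derive attainment cut points from the prior-year SPS growth of
    "similar" schools, as the Achievement Level Descriptions describe:
    the 25th, 50th, and 75th percentile schools mark the partial, full,
    and exceeds thresholds."""
    g = np.asarray(prior_growth, dtype=float)
    return {
        "partial": np.percentile(g, 25),
        "full": np.percentile(g, 50),
        "exceeds": np.percentile(g, 75),
    }

def rate(growth, targets):
    """Rate one school's current-year SPS growth against the cut points."""
    if growth >= targets["exceeds"]:
        return "exceeds"
    if growth >= targets["full"]:
        return "full"
    if growth >= targets["partial"]:
        return "partial"
    return "insufficient"
```

If a group's growth this year is distributed roughly as it was last year, about a quarter of that group lands in each band by construction, no matter how well the group performs in absolute terms. That is the quota effect.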
By requiring principals
to set an Overall SPS Improvement target, LDOE's system effectively made A
school principals compete against A school principals, B school principals
compete against B school principals, and so on. In setting its recommended targets, LDOE ignored a thorny question: "Are A school principals as good or as bad as D and F school principals on the whole?" If not, why use the same quota for each group?
The data
presented in the figure below - compiled from the Goal Setting Toolkits - show
that A schools significantly outperform D and F schools
at moving struggling students past their VAM expectations. Why, then, should
we accept a system designed to rate principals of A schools
"insufficient" as often as principals of D and F
schools? Such a formula for rating principals seems to be contrary to the
entire theory of rating schools and their staffs using SPS and the letter
grading system. The logic also runs completely contrary to some of the
assumptions made by the U.S. Dept. of Education in recent years, which resulted in the firing of some principals of low-performing schools as a form of
restructuring designed to produce “school turnaround”.
2) LDOE overrode its own
Achievement Level Descriptions in a manner that would produce higher-than-quota
percentages of "insufficient" and "partial" attainment.
LDOE's Achievement Level
Descriptions indicate that the "partial attainment" target was set by
the growth of the school at the 25th percentile in the prior year. I found,
however, that if that school had negative growth, LDOE set the minimum target
for "partial attainment" to 0.1 even though that value corresponded
to a higher percentile. Presumably LDOE interpreted "school performance
improvement" to exclude negative growth. The override applied to the vast
majority of the targets. LDOE marked such data with "^" but gave no
explanation that I could find in the Toolkits.
Because of this override, the
principal of any school showing negative growth would automatically rate
"insufficient" and overall, that would result in more than 25%
"insufficient" ratings.
Additionally, when that 25th
percentile override applied, LDOE also raised the "full attainment"
target to a value higher than the growth of the 50th percentile school even if
that school showed positive growth.
Thus, most of the recommended
targets for partial attainment and full attainment were actually set from
higher percentile ranks than what the Achievement Level Descriptions stated.
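Continuing the sketch above, the override pattern I found looks like the following. The 0.1 floor matches the Toolkit data; how far LDOE raised the full-attainment cut is not published, so the nudge parameter below is a hypothetical stand-in that marks only the direction of the adjustment.

```python
def apply_overrides(targets, floor=0.1, nudge=0.1):
    """Apply the overrides inferred from the Toolkit data: when the
    25th-percentile school had negative growth, the partial cut is
    floored at 0.1 (negative growth does not count as "improvement"),
    and the full cut is pushed above the 50th-percentile growth."""
    t = dict(targets)
    if t["partial"] < floor:
        t["partial"] = floor
        # The full-attainment cut also rose above the 50th-percentile
        # value; "nudge" is a placeholder for the unpublished amount.
        t["full"] = max(t["full"], floor) + nudge
    return t
```

Under this rule, any school with negative growth rates "insufficient" automatically, so whenever the 25th-percentile school declined, more than a quarter of its group falls into the bottom band.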
There is an important issue
that accompanies setting targets based on the previous year's growth. A year in
which there is exceptionally strong overall growth dooms the next year to high
"insufficient" rates because the new targets are based on an
unsustainable rate of growth.
To its credit, LDOE did
provide two years of Overall SPS Improvement data and three years of individual
component data for reference. However, it provided targets clearly labeled "2015-2016 Recommended Targets: based on 2013-2014 and 2014-2015 results."
3) LDOE's recommended
targets were based on flawed methodology.
I reconstructed LDOE's high
school targets and found that LDOE sorted the prior year growth by schools' ending letter grades rather than their starting letter grades. However, principals were asked to set goals based on their schools' starting
letter grades. For an accurate "similar school" comparison we must
compare the schools starting this year with an A to schools that
last year started with an A.
In calculating its
recommended targets for the A schools, LDOE included schools that
started with B's and C's the previous
year but grew to an A, while simultaneously it excluded any school that
started with an A but dropped to a B. By systematically adding in schools with excellent growth and excluding some schools with negative growth, LDOE skewed the rankings and set unrealistically high expectations for schools starting from an A.
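The effect of the sorting error is easiest to see with a toy dataset. The three records below are hypothetical; they only illustrate how grouping by ending grade pulls a high-growth climber into the A pool while dropping the school that slipped out of it.

```python
from collections import defaultdict

def growth_by(schools, key):
    """Group prior-year SPS growth by the chosen letter-grade field."""
    groups = defaultdict(list)
    for s in schools:
        groups[s[key]].append(s["growth"])
    return groups

schools = [
    {"start": "B", "end": "A", "growth": 9.0},   # climbed into A
    {"start": "A", "end": "A", "growth": 2.0},   # stayed an A
    {"start": "A", "end": "B", "growth": -3.0},  # slipped out of A
]

print(growth_by(schools, "end")["A"])    # [9.0, 2.0]  - LDOE's apparent pool
print(growth_by(schools, "start")["A"])  # [2.0, -3.0] - the correct pool
```

Targets drawn from the first pool are inflated by growth that schools starting from an A, by definition, did not produce; targets drawn from the second pool reflect what those schools actually did.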
In setting the F
school targets, LDOE removed schools that rose from an F to a D
or C. It computed the target based only on the schools that remained an F.
This resulted in LDOE setting much lower recommended targets for F and D
schools than for A schools.
The targets for
the B, C, and D schools were skewed, but to a lesser
extent because some of the movement between school letter grades was offsetting
in these middle letter grade categories.
The chart below provides
LDOE's recommended targets for high schools, my reconstruction of LDOE's
targets, and the targets that would have resulted from sorting by starting
letter grade rather than by ending letter grade. Note that the targets based on
sorting by the starting letter grade fit the expected pattern of generally
requiring more improvement from the lower letter grade schools and less
improvement from the higher letter grade schools.
While I have not reconstructed every one of
LDOE's target recommendations, it is clear that all of the recommended targets,
both for Overall SPS Improvement and for individual components, and for all years' data, were computed using the wrong sorting. Principals have been asked to base their targets on faulty data.
Consider the impact on Ruston High School. In 2016 it grew 7.0 points, from a 100.3 (A) to a 107.3 (A). The flawed recommended target would rate that growth only "partial," whereas under the version using the correctly sorted data, that seven-point growth would rate "exceeds."
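Reusing the rate helper from the first sketch, the Ruston comparison works out as follows. The cut points here are hypothetical values chosen only to illustrate the claim; the actual numbers appear in the chart above.

```python
growth = 7.0  # Ruston High's 2016 SPS gain: 100.3 (A) to 107.3 (A)

flawed    = {"partial": 5.0, "full": 8.0, "exceeds": 11.0}  # hypothetical cuts
corrected = {"partial": 1.0, "full": 3.5, "exceeds": 6.0}   # hypothetical cuts

print(rate(growth, flawed))     # partial
print(rate(growth, corrected))  # exceeds
```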
I question the logic of
trying to force the same quota of principals from each letter-grade category into the four performance categories, and I question the efficacy of attempting to force
high rates of low evaluations. If we discourage and run off too many
principals, where will we find their replacements?
I especially question the
wisdom of using a system that most of the time assigns lower improvement
targets to D and F schools than to A and B schools.
Such recommendations create pressure to widen the achievement gap rather than
narrow it.
I urge principals and superintendents to recommend a better
evaluation system for the future and consider what measures should be taken to
rectify LDOE's mistakes in recommending targets for this year.
Finally,
I urge BESE and LDOE to allow principals who feel that the recommended targets have caused them to receive unjust ratings to adjust their targets retroactively in consultation with their supervisors and superintendents.
Herb Bassett, Grayson, LA
My Observations and Commentary
I believe Mr.
Bassett’s findings reveal an extremely ill-conceived and careless method for a
critical part of our school principal evaluation system. If applied as designed
by the Louisiana Department of Education, I believe it will result in unfair
evaluations and a lowering of the morale of our dedicated school principals.
Not only do I find the new guidelines statistically flawed, but I also question
the motive behind such a negatively skewed rating system.
I believe this new
principal evaluation system and also the 25% school improvement component for
rating and grading schools to be far too obsessed with a perpetual raising of
student test scores. Teachers are already complaining that the incessant drive
by our Louisiana Department of Education to simply raise student test scores
each year is interfering with a healthy teaching and learning environment in
our schools. Tests are important, but they should not embody the whole of the education experience for our children. This incessant pressure to raise test scores, no matter what, is what caused the cheating scandals we have seen in Atlanta, Washington, D.C., and El Paso.
Educators in
Louisiana have already been through a very unfortunate and unfair experience
when the state attempted to mandate that each year 10% of teachers of tested
subjects would automatically be rated “ineffective” based on the flawed value-added model. Now the state is trying to impose a failure rate for principals
based on student test scores, with a higher resulting failure rate
for principals of our highest rated schools. This is a negative evaluation
quota system on steroids! Such a system is insanity and should be scrapped
immediately!
Finally, my overarching concern about this whole matter is that I have come to believe
that this latest scheme attempting to mandate an extremely harsh evaluation
system for principals is really designed to pressure principals to fire more
teachers based on student test scores. I do not believe the state should be
attempting to negatively evaluate and fire more principals and teachers in A
rated schools, but I also do not believe it would be proper to arbitrarily fire more
personnel in D and F rated schools! That’s because the evidence
is overwhelming that student test scores are much more heavily influenced by
socio-economic factors than by the school and its personnel. Until our
education reformers understand this fact we will be forever doomed to
scapegoating our professional educators for factors over which they have little
control.
Mike Deshotels