ESEA Reauthorization: Options for Improving NCLB's Measures of Progress
Testimony for the House Education and Labor Committee
Linda Darling-Hammond
Charles E. Ducommun Professor,
Stanford University School of Education
April 3, 2007
I thank Chairman Miller and the members of the Committee for the opportunity to offer testimony on the re-authorization of ESEA, in particular the ways in which we measure and encourage school progress and improvement. My perspective on these issues is informed by my research, my work with states and national organizations on standards development, and my work with local schools. I have studied the implementation of No Child Left Behind, as well as testing and accountability systems within the United States and abroad. I have also served as past Chair of the New York State Council on Curriculum and Assessment and of the Chief State School Officers’ INTASC Standards Development Committee. I work closely with a number of school districts and local schools on education improvement efforts, including several urban high schools that I have helped to launch. Thus, I have encountered the issues of school improvement from both a system-wide and local school vantage point.
I am hopeful that this re-authorization can build on the strengths and opportunities offered by No Child Left Behind, while addressing needs that have emerged during the first years of the law’s implementation. Among the strengths of the law are its focus on improving the academic achievement of all students, which triggers attention to school performance and to the needs of students who have been underserved, and its insistence that all students are entitled to qualified teachers, which has stimulated recruitment efforts in states where many disadvantaged students previously lacked this key resource for learning.
The law has succeeded in getting states, districts, and local schools to pay attention to achievement. The next important step is to ensure that the range of things schools and states pay attention to actually helps them improve both the quality of education they offer to every student and the quality of the overall schooling enterprise. In order to accomplish this, I would ask you to actively encourage states to:
- Develop accountability systems that use multiple measures of learning and other important aspects of school performance in evaluating school progress;
- Differentiate school improvement strategies for schools based on a comprehensive analysis of their instructional quality and conditions for learning.
Why Use Multiple Measures?
There are at least three reasons to gauge student and school progress based on multiple measures of learning and school performance:
- To direct schools’ attention and effort to the range of measures that are associated with high-quality education and improvement;
- To avoid dysfunctional consequences that can encourage schools, districts, or states to emphasize one important outcome at the expense of another; for example, focusing on a narrow set of skills at the expense of others that are equally critical, or boosting test scores by excluding students from school; and
- To capture an adequate and accurate picture of student learning and attainment that both measures and promotes the kinds of outcomes we need from schools.
Directing Attention to Measures Associated with School Quality
One of the central concepts of NCLB’s approach is that schools and systems will organize their efforts around the measures for which they are held accountable. Because any single measure offers a partial and potentially misleading picture, the concept of multiple measures is routinely used by policymakers to make critical decisions about such matters as employment and economic forecasting (for example, the Dow Jones Index or the GNP) and admission to college, where grades, essays, activities, and accomplishments are considered along with test scores.
Successful businesses use a “dashboard” set of indicators to evaluate their health and progress, aware that no single indicator is sufficient to understand or guide their operations. This approach is designed to focus attention on those aspects of the business that describe elements of the business’s current health and future prospects, and to provide information that employees can act on in areas that make a difference for improvement. So, for example, a balanced scorecard is likely to include among its financial indicators not only a statement of profits, but also cash flow, dividends, costs and accounts receivable, assets, inventory, and so on. Business leaders understand that efforts to maximize profits alone could lead to behaviors that undermine the long-term health of the enterprise.
Similarly, a single measure approach in education creates some unintended negative consequences and fails to focus schools on doing those things that can improve their long-term health and the education of their students. Although No Child Left Behind calls for multiple measures of student performance, the implementation of the law has not promoted the use of such measures for evaluating school progress. As I describe in the next section, the focus on single, often narrow, test scores in many states has created unintended negative consequences for the nature of teaching and learning, for access to education for the most vulnerable students, and for the appropriate identification of schools that are in need of improvement.
A multiple measures approach that incorporates the right “dashboard” of indicators would support a shift toward “holding states and localities accountable for making the systemic changes that improve student achievement” as has been urged by the Forum on Education and Accountability. This group of 116 education and civil rights organizations – which include the National Urban League, NAACP, League of United Latin American Citizens, Aspira, Children’s Defense Fund, National Alliance of Black School Educators, and Council for Exceptional Children, as well as the National School Boards Association, National Education Association, and American Association of School Administrators – has offered a set of proposals for NCLB that would focus schools, districts, and states on developing better teaching, a stronger curriculum, and supports for school improvement.
Avoiding Dysfunctional Consequences
Another reason to use a multiple measures approach is to avoid the negative consequences that occur when one measure is used to drive organizational behavior.
The current accountability provisions of the Act, which are focused almost exclusively on school average scores on annual tests, actually create large incentives for schools to keep students out and to hold back or push out students who are not doing well. A number of studies have found that systems that reward or sanction schools based on average student scores create incentives for pushing low-scorers into special education so that their scores won't count in school reports, retaining students in grade so that their grade-level scores will look better, excluding low-scoring students from admissions, and encouraging such students to leave schools or drop out.
Studies in New York, Texas, and Massachusetts, among others, have shown how schools have raised their test scores while “losing” large numbers of low-scoring students. For example, a recent study in a large Texas city found that student dropouts and push-outs accounted for most of the gains in high school test scores, especially for minority students. The introduction of a high-stakes test linked to school ratings in the 10th grade led to sharp increases in 9th grade retention, dropout, and disappearance. Of the large share of students held back in the 9th grade, most of them African American and Latino, only 12% ever took the 10th grade test that drove school rewards. Schools that retained more students at grade 9 and lost more through dropouts and disappearances boosted their accountability ratings the most. Overall, fewer than half of all students who started 9th grade graduated within 5 years, even as test scores soared.
Paradoxically, NCLB’s requirement for disaggregating data and tracking progress for each subgroup of students increases the incentives for eliminating those at the bottom of each subgroup, especially where schools have little capacity to improve the quality of services such students receive. Table 1 shows how this can happen. At “King Middle School,” average scores increased from the 70th to the 72nd percentile between the 2002 and 2003 school years, and the proportion of students in attendance who met the proficiency standard (a score of 65) increased from 66% to 80% -- the kind of performance that a test-based accountability system would reward. Looking at subgroup performance, the proportion of Latino students meeting the standard increased from 33% to 50%, a steep increase.
However, not a single student at King improved his or her score between 2002 and 2003. In fact, the scores of every single student in the school went down over the course of the year. How could these steep improvements in the school’s average scores and proficiency rates have occurred? A close look at Table 1 shows that the major change between the two years was that the lowest-scoring student, Raul, disappeared. As has occurred in many states with high-stakes testing programs, students who do poorly on the tests – special needs students, new English language learners, those with poor attendance, health, or family problems – are increasingly likely to be excluded by being counseled out, transferred, or expelled, or by dropping out.
Table 1. King Middle School: Rewards or Sanctions?
The Relationship between Test Score Trends and Student Populations

2002: Average score = 70; percent meeting standard = 66%
2003: Average score = 72; percent meeting standard = 80%
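The arithmetic behind Table 1 can be reproduced in a short sketch. The student names and individual scores below are illustrative stand-ins (only the school-level summary figures come from the testimony): every student who stays loses points, yet the average and the proficiency rate both rise once the lowest scorer leaves.

```python
# Illustrative roster (hypothetical names and scores; only the summary
# figures -- mean 70 -> 72, proficiency 66% -> 80% -- match the testimony).
scores_2002 = {"Ana": 93, "Ben": 85, "Cara": 77, "Dev": 68, "Eli": 52, "Raul": 45}

# The next year, every remaining student scores 3 points LOWER,
# and the lowest scorer, "Raul", has left the school.
scores_2003 = {name: s - 3 for name, s in scores_2002.items() if name != "Raul"}

def summarize(scores, cutoff=65):
    """Return (average score, percent of students at or above the cutoff)."""
    vals = list(scores.values())
    mean = sum(vals) / len(vals)
    pct = 100 * sum(v >= cutoff for v in vals) / len(vals)
    return mean, pct

print(summarize(scores_2002))  # mean 70.0, about 66.7% proficient
print(summarize(scores_2003))  # mean 72.0, 80.0% proficient
```

The same mechanics explain why subgroup averages can also rise when the lowest scorers within a subgroup are the ones who disappear.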
This kind of result is not limited to education. When one state decided to rank cardiac surgeons based on their mortality rates, a follow-up investigation found that surgeons’ ratings went up as they stopped taking on high-risk patients. Those patients were referred out of state if they were wealthy, or simply went unserved if they were poor.
The three national professional organizations of measurement experts have called attention to such problems in their joint Standards for Educational and Psychological Testing, which note that:
Beyond any intended policy goals, it is important to consider potential unintended effects that may result from large-scale testing programs. Concerns have been raised, for instance, about narrowing the curriculum to focus only on the objectives tested, restricting the range of instructional approaches to correspond to the testing format, increasing the number of dropouts among students who do not pass the test, and encouraging other instructional or administrative practices that may raise test scores without affecting the quality of education. It is important for those who mandate tests to consider and monitor their consequences and to identify and minimize the potential of negative consequences. 
Professional testing standards emphasize that no test is sufficiently reliable and valid to be the sole source of important decisions about student placements, promotions, or graduation, but that such decisions should be made on the basis of several different kinds of evidence about student learning and performance in the classroom. For example, Standard 13.7 states:
In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision.

The Standards describe several kinds of information that should be considered in making judgments about what a student knows and can do, including alternative assessments that provide other information about performance and evidence from samples of school work and other aspects of the school record, such as grades and classroom observations. These are particularly important for students for whom traditional assessments are not generally valid, such as English language learners and special education students. Similarly, when evaluating schools, it is important to include measures of student progress through school, coursework and grades, and graduation as part of the record of school accomplishments.
Evaluating Learning Well
Indicators beyond a single test score are important not only for reasons of validity and fairness in making decisions, but also to assess important skills that most standardized tests do not measure. Current accountability reforms are based on the idea that standards can serve as a catalyst for states to be explicit about learning goals, and the act of measuring progress toward meeting these standards is an important force toward developing high levels of achievement for all students. However, an on-demand test taken in a limited period of time on a single day cannot measure all that is important for students to know and be able to do. A credible accountability system must rest on assessments that are balanced and comprehensive with respect to state standards. Multiple-choice and short-answer tests that are currently used to measure standards in many states do not adequately measure the complex thinking, communication, and problem solving skills that are represented in national and state content standards.
Research on high-stakes accountability systems shows that “what is tested is what is taught,” and standards that are not represented on the high-stakes assessment tend to be given short shrift in the curriculum. Students are less likely to engage in extended research, writing, complex problem-solving, and experimentation when the accountability system emphasizes short-answer responses to formulaic problems. These higher-order thinking skills are the very skills often cited as essential to maintaining America’s competitive edge and necessary for succeeding on the job, in college, and in life. As described by Achieve, a national organization of governors, business leaders, and education leaders, the problem with traditional on-demand tests is that they cannot measure many of the skills that matter most for success in the worlds of work and higher education:
States … will need to move beyond large-scale assessments because, as critical as they are, they cannot measure everything that matters in a young person’s education. The ability to make effective oral arguments and conduct significant research projects are considered essential skills by both employers and postsecondary educators, but these skills are very difficult to assess on a paper-and-pencil test.

One of the reasons that U.S. students fall further and further behind their international counterparts as they go through school is differences in curriculum and assessment systems. International studies have found that the U.S. curriculum focuses on superficial coverage of too many topics, without the kinds of in-depth study, research, and writing needed to secure deep understanding. To focus on understanding, the assessment systems used in most high-achieving countries emphasize essay questions, research projects, scientific experiments, and oral exhibitions and performances that encourage students to master complex skills as they apply them in practice, rather than multiple-choice tests.
As indicators of the growing distance between what our education system emphasizes and what leading countries are accomplishing educationally, the U.S. currently ranks 28th of 40 countries in math achievement – right above Latvia – and 19th of 40 in reading achievement on the international PISA tests that measure higher-order thinking skills. And while the top-scoring nations – including previously low-achieving countries like Finland and South Korea – now graduate more than 95% of their students from high school, the U.S. graduates about 75%, a figure that has been stagnant for a quarter century and, according to a recent ETS study, is now declining. The U.S. has also dropped from 1st in the world in higher education participation to 13th, as other countries invest more resources in their children’s futures.
Most high-achieving nations’ examination systems include multiple samples of student learning at the local level as well as the state or national level. Students’ scores are a composite of their performance on examinations they take in different content areas – featuring primarily open-ended items that require written responses and problem solutions – plus their work on a set of classroom tasks scored by their teachers according to a common set of standards. These tasks require them to apply knowledge to a range of tasks that represent what they need to be able to do in different fields: find and analyze information, solve multi-step real-world problems in mathematics, develop computer models, demonstrate practical applications of science methods, design and conduct investigations and evaluate their results, and present and defend their ideas in a variety of ways. Teaching to these assessments prepares students for the real expectations of college and of highly skilled work.
These assessments are not used to rank or punish schools, or to deny promotion or diplomas to students. In fact, several countries have explicit proscriptions against such practices. They are used to evaluate curriculum and guide investments in professional learning -- in short, to help schools improve. By asking students to show what they know through real-world applications of knowledge, these nations’ assessment systems encourage serious intellectual activities on a regular basis. The systems not only measure important learning, they help teachers learn how to design curriculum and instruction to accomplish this learning.
It is worth noting that a number of states in the U.S. have developed similar systems that combine evidence from state and local standards-based assessments to ensure that multiple indicators of learning are used to make decisions about individual students and, sometimes, schools. These include Connecticut, Kentucky, Maine, Nebraska, New Hampshire, Oregon, Rhode Island, Pennsylvania, Vermont, and Wyoming, among others. However, many of these elements of state systems are not currently allowed to be used to gauge school progress under NCLB.
Encouraging these kinds of practices could help improve learning and guide schools toward more productive instruction. Studies have found that performance assessments that are administered and scored locally help teachers better understand students' strengths, needs, and approaches to learning, as well as how to meet state standards. Teachers who have been involved in developing and scoring performance assessments with other colleagues have reported that the experience was extremely valuable in informing their practice. They report changes in both the curriculum and their instruction as a result of thinking through with colleagues what good student performance looks like and how to better support student learning on specific kinds of tasks.
These goals are not well served by external testing programs that send secret, secured tests into the school and whisk them out again for machine scoring that produces numerical quotients many months later. Local performance assessments provide teachers with much more useful classroom information as they engage teachers in evaluating how and what students know and can do in authentic situations. These kinds of assessment strategies create the possibility that teachers will not only teach more challenging performance skills but that they will also be able to use the resulting information about student learning to modify their teaching to meet the needs of individual students. Schools and districts can use these kinds of assessments to develop shared expectations and create an engine for school improvement around student work.
Research on the strong gains in achievement shown in Connecticut, Kentucky, and Vermont in the 1990s attributed these gains in substantial part to these states’ performance-based assessment systems, which include such local components, and related investments in teaching quality. Other studies in states like California, Maine, Maryland, and Washington, found that teachers assigned more ambitious writing and mathematical problem solving, and student performance improved, when assessments included extended writing and mathematics portfolios and performance tasks. Encouraging these kinds of measures of student performance is critical to getting the kind of learning we need in schools.
Not incidentally, more authentic measures of learning that go beyond on-demand standardized tests to look directly at performance are especially needed to gain accurate measures of achievement for English language learners and special needs students for whom traditional tests are least likely to provide valid measures of understanding.
What Indicators Might be Used to Gauge School Progress?
This analysis suggests that school progress should be evaluated on multiple measures of student learning – including local and state performance assessments that provide evidence about what students can actually do with their knowledge – and on indicators of other student outcomes, including such factors as student progress and continuation through school, graduation, and success in rigorous courses. The importance of these indicators is to encourage schools to keep students in school and provide them with high-quality learning opportunities – elements that will improve educational opportunities and attainment, not just average test scores.
To these two categories of indicators, I would add indicators of learning conditions that point attention to both learning opportunities available to students (e.g. rigorous courses, well-qualified teachers) and to how well the school operates. In the business world, these kinds of measures are called leading indicators, which represent those things that employees can control and improve upon. These typically include evidence of customer satisfaction, such as survey data, complaints and repeat orders; as well as of employee satisfaction and productivity, such as employee turnover, project delays, evidence of quality and efficiency in getting work done; reports of work conditions and supports, and evidence of product quality.
Educational versions of these kinds of indicators are available in many state accountability systems. For example, State Superintendent Peter McWalters noted in his testimony to this committee that Rhode Island uses several means to measure school learning conditions. Among them is an annual survey to all students, teachers, and parents that provides data on “Learning Support Indicators” measuring school climate, instructional practices, and parental involvement. In addition, Rhode Island, like many other states, conducts visits to review every school in the state every five years, not unlike the Inspectorate system that is used in many other countries. These kinds of reviews can examine teaching practices, the availability and equitable allocation of school resources, and the quality of the curriculum, as it is enacted.
Ideally, evaluation of school progress would be based on a combination of these three kinds of measures and would emphasize gains and improvement over time, both for the individual students in the school and for the school as a whole. Along with data about student characteristics, an indicator system could include:
- Measures of student learning: both state tests and local assessments, including performance measures that assess higher-order thinking skills and understanding, including student work samples, projects, exhibitions, or portfolios.
- Measures of additional student outcomes: data about attendance, student grade-to-grade progress (promotion / retention rates) and continuation through school (ongoing enrollment), graduation, and course success (e.g. students enrolled in, passing, and completing rigorous courses of study).
- Measures of learning conditions: data about school capacity, such as teacher and other staff quality, availability of learning materials, school climate (gauged by students’, parents’, and teachers’ responses to surveys), instructional practices, teacher development, and parental engagement.
These elements should be considered in the context of student data, including information about student mobility, health, and welfare (poverty, homelessness, foster care, health care), as well as language background, race / ethnicity, and special learning needs.
How Might Indicators be Used to Determine School Progress and Improvement Strategies?
The rationale for these multiple indicators is to build a more powerful engine for educational improvement by understanding what is really going on with students and focusing on the elements of the system that need to change if learning is to improve. High-performing systems need a regular flow of useful information to evaluate and modify what they are doing to produce stronger results. State and local officials need a range of data to understand what is happening in schools and what they should do to improve outcomes. Many problems in local schools are constructed or constrained by district and state decisions that need to be highlighted along with school-level concerns. Similarly, at the school level, teachers and leaders need information about how they are doing and how their students are doing, based in part on high-quality local assessments that provide rich, timely insights about student performance.
Some states and districts have successfully put some of these indicators in place. The federal government could play a leadership role by not only encouraging multiple measures for assessing school progress and conditions for learning but by providing supports for states to build comprehensive databases to track these indicators over time, and to support valid, comprehensive information systems at all levels.
If we think comprehensively about the approach to evaluation that would encourage fundamental improvements in schools, several goals emerge. First, determinations of school progress should reflect an analysis of schools’ performance and progress along several key dimensions. Student learning should be evaluated using multiple measures that provide comprehensive and valid information for all subpopulations. Targets should be based on sensible goals for student learning, examining growth from where students start, setting growth targets in relation to that starting point, and pegging “proficiency” at a level that represents a challenging but realistic standard, perhaps at the median of current state proficiency standards. Targets should also ensure appropriate assessment for special education students and English language learners and credit for the gains these students make over time. And analysis of learning conditions including the availability of materials, facilities, curriculum opportunities, teaching, and leadership should accompany assessments of student learning.
A number of states already have developed comprehensive indicator systems that can be sources of such data, and the federal government should encourage states to propose different means for how to aggregate and combine these data. In addition, many states’ existing assessment systems already provide different ways to score and combine state reference tests with local testing systems, locally administered performance tasks (which are often scored using state standards), and portfolios.
For evaluating annual progress, one likely approach would be to use an index of indicators, which can include a weighted combination of data about state and local tests and assessments as well as other student outcome indicators like attendance, graduation, promotion rates, participation and pass rates for academic courses. Assessment data from multiple sources and evidence of student progression through / graduation from school would be required components. Key conditions of learning, such as teacher qualifications, might also be required. Other specific indicators might be left to states, along with the decision of how much weight to give each component, perhaps within certain parameters (for example, that at least 50 percent of a weighted index would reflect the results of assessment data).
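A weighted index of the sort just described can be sketched in a few lines. The component names and weights below are hypothetical choices for illustration; the only constraint carried over from the discussion is that assessment data carry at least half the weight.

```python
# Hypothetical sketch of a weighted school-progress index. Component
# names and weights are illustrative, not prescribed by the testimony.
WEIGHTS = {
    "assessments": 0.50,            # state + local assessment results
    "graduation": 0.20,
    "attendance": 0.10,
    "course_pass_rate": 0.10,
    "teacher_qualifications": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
assert WEIGHTS["assessments"] >= 0.50  # at least half from assessment data

def progress_index(measures: dict) -> float:
    """Combine component scores (each scaled 0-100) into one index."""
    return sum(WEIGHTS[k] * measures[k] for k in WEIGHTS)

school = {"assessments": 68, "graduation": 81, "attendance": 93,
          "course_pass_rate": 74, "teacher_qualifications": 88}
print(round(progress_index(school), 1))  # 75.7
```

A state could vary the weights within federally set parameters while the required components (assessment data, progression and graduation) remain fixed.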
Within this index, disaggregated data by race/ethnicity and income could be monitored on the index score, or on components of the overall index, so that the system pays ongoing attention to progress for groups of students. Wherever possible, these measures should look at the progress of a constant cohort of students from year to year, so that actual gains are observed rather than changes in averages due to changes in the composition of the student population. Furthermore, gains for English language learners and special education students should be evaluated on a growth model that ensures appropriate testing based on professional standards and measures individual student growth in relation to student starting points.
Non-academic measures such as improved learning climate (as measured by standard surveys, for example, to allow trend analysis over time), instructional capacity (indicators regarding the quality of curriculum, teaching, and leadership), resources, and other contributors to learning could be included in a separate index on Learning Conditions, on which progress is also evaluated annually as part of school, district, and state assessment.
Once school progress indicators are available, a judgment must be made about whether a school has made adequate progress on the index or set of indicators. Rather than identifying a school as requiring intervention when a single target is missed (for example, if 94% of economically disadvantaged students take the mathematics test one year instead of 95%), progress could be gauged by whether the overall index score increases adequately, with the proviso that the progress of key subgroups continues to be examined, with lack of progress a flag for intervention.
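The decision rule just described can be expressed as a simple sketch: a school is flagged for review when the overall index fails to grow adequately, or when a key subgroup's progress stalls, rather than when a single target is narrowly missed. The thresholds below are hypothetical.

```python
# Illustrative decision rule: flag a school for review based on overall
# index growth and subgroup growth, not a single missed target.
# Both thresholds are hypothetical values, not from the testimony.
MIN_OVERALL_GAIN = 1.0    # index points of growth expected per year
MIN_SUBGROUP_GAIN = 0.5   # minimum acceptable subgroup growth

def needs_review(overall, subgroups):
    """overall: (last_year, this_year) index scores;
    subgroups: name -> (last_year, this_year) index scores."""
    if overall[1] - overall[0] < MIN_OVERALL_GAIN:
        return True
    # Lagging progress for any key subgroup is also a flag for intervention.
    return any(this - last < MIN_SUBGROUP_GAIN
               for last, this in subgroups.values())

print(needs_review((70.0, 72.5),
                   {"ELL": (61.0, 62.0), "low_income": (64.0, 64.2)}))  # True
```

A flag under this rule would trigger the expert school review described below, not an automatic sanction.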
The indicators schools and districts have assembled would also be used to determine what kind of action is needed if a school does not make sufficient progress in a year. To use resources wisely, the law should establish a graduated system of classification for schools and districts based on their rate of progress, ranging from state review to corrective actions to eventual reconstitution if such efforts fail over a period of time. States should identify schools and districts as requiring intervention based both on information about the overall extent of progress from the prior year(s) and on information about specific measures in the system of indicators -- for example, how many progress indicators have lagged for how long. This additional scrutiny would involve a school review by an expert team – much like the inspectorate systems in other countries – that conducts an inspection of the school or LEA and analyzes a range of data, including evidence of individual and collective student growth or progress on multiple measures; analysis of student needs, mobility, and population changes; and evaluation of school practices and conditions. Based on the findings of this review, a determination would be made about the nature of the problem and the type of school improvement plan needed. The law should include the explicit expectation that state and district investments in ensuring adequate conditions for learning must be part of this plan.
The overarching goal of the ESEA should be to improve the quality of education students receive, especially those traditionally least well served by the current system. To accomplish this, the measures used to gauge school progress and guide improvement must attend to the range of school outcomes and conditions that are needed to ensure that all students are educated to higher levels.
 See, e.g., L. Darling-Hammond, No Child Left Behind and high school reform, Harvard Educational Review, etc.
 Linda Darling-Hammond, Elle Rustique-Forrester, & Raymond Pecheone (2005). Multiple measures approaches to high school graduation: A review of state student assessment policies. Stanford, CA: Stanford University, School Redesign Network.
 (Allington & McGill-Franzen, 1992; Figlio & Getzler, 2002)
 (Jacob, 2002; Haney, 2000)
 (Darling-Hammond, 1991; Smith, 1986); Heilig
 (Haney, 2000; Orfield & Ashkinaze, 1991; Smith, 1986). (Jacob, 2001; Lilliard & DeCicca, 2001; National Board on Educational Testing and Public Policy, 2000; Orfield & Ashkinaze, 1991; Roderick et al., 1999; Wheelock, 2003) B.A. Jacob (2001). Getting tough? The impact of high school graduation exams. Educational Evaluation and Policy Analysis, 23(2): 99-122. M. Roderick, A.S. Bryk, B.A. Jacob, J.Q. Easton, & E. Allensworth (1999). Ending social promotion: Results from the first two years. Chicago: Consortium on Chicago School Research.
 Advocates for Children, Heilig, Wheelock
 (Heilig, 2005).
 American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, Standards for Educational and Psychological Testing, Washington, DC: American Educational Research Association, 1999, p. 142.
 AERA, APA, & NCME, Standards for Educational and Psychological Testing, p. 146.
 See for example, W. Haney (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8(41). Retrieved April 10, 2007 from http://epaa.asu.edu/epaa/v8n41/; J.L. Herman & S. Golan (1993). Effects of standardized testing on teaching and schools. Educational Measurement: Issues and Practice, 12(4): 20-25, 41-42; B.D. Jones & R.J. Egley (2004). Voices from the frontlines: Teachers’ perceptions of high-stakes testing. Education Policy Analysis Archives, 12(39). Retrieved August 10, 2004 from http://epaa.asu.edu/epaa/v12n39/; M.G. Jones, B.D. Jones, B. Hardin, L. Chapman, & T. Yarbrough (1999). The impact of high-stakes testing on teachers and students in North Carolina. Phi Delta Kappan, 81(3): 199-203; S.P. Klein, L.S. Hamilton, D.F. McCaffrey, & B.M. Stecher (2000). What do test scores in Texas tell us? Santa Monica, CA: The RAND Corporation; D. Koretz & S.I. Barron (1998). The validity of gains on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND, MR-1014-EDU; D. Koretz, R.L. Linn, S.B. Dunbar, & L.A. Shepard (1991, April). The effects of high-stakes testing: Preliminary evidence about generalization across tests, in R.L. Linn (chair), The effects of high-stakes testing. Symposium presented at the annual meeting of the American Educational Research Association and the National Council on Measurement in Education, Chicago; R.L. Linn (2000). Assessments and accountability. Educational Researcher, 29(2): 4-16; R.L. Linn, M.E. Graue, & N.M. Sanders (1990). Comparing state and district test results to national norms: The validity of claims that "everyone is above average." Educational Measurement: Issues and Practice, 9: 5-14; W.J. Popham (1999). Why standardized test scores don’t measure educational quality. Educational Leadership, 56(6): 8-15; M.L. Smith (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20(5): 8-11.
Achieve, Do graduation tests measure up? A closer look at state high school exit exams. Executive summary. Washington, DC: Achieve, Inc.
 L. Darling-Hammond & J. Ancess (1994). Authentic assessment and school development. NY: National Center for Restructuring Education, Schools, and Teaching, Teachers College, Columbia University; B. Falk & S. Ort (1998, September). Sitting down to score: Teacher learning through assessment. Phi Delta Kappan, 80(1): 59-64; G.L. Goldberg & B.S. Roswell (2000). From perception to practice: The impact of teachers’ scoring experience on performance-based instruction and classroom practice. Educational Assessment, 6: 257-290; R. Murnane & F. Levy (1996). Teaching the new basic skills. NY: The Free Press.
 J.B. Baron (1999). Exploring high and improving reading achievement in Connecticut. Washington, DC: National Education Goals Panel; Murnane & Levy (1996); B.M. Stecher, S. Barron, T. Kaganoff, & J. Goodwin (1998). The effects of standards-based assessment on classroom practices: Results of the 1996-97 RAND survey of Kentucky teachers of mathematics and writing. CSE Technical Report. Los Angeles: UCLA National Center for Research on Evaluation, Standards, and Student Testing; S. Wilson, L. Darling-Hammond, & B. Berry (2001). A case of successful teaching policy: Connecticut’s long-term efforts to improve teaching and learning. Seattle: Center for the Study of Teaching and Policy, University of Washington.
 C. Chapman (1991, June). What have we learned from writing assessment that can be applied to performance assessment? Presentation at ECS/CDE Alternative Assessment Conference, Breckenridge, CO; J.L. Herman, D.C. Klein, T.M. Heath, & S.T. Wakai (1995). A first look: Are claims for alternative assessment holding up? CSE Technical Report. Los Angeles: UCLA National Center for Research on Evaluation, Standards, and Student Testing; D. Koretz, K. Mitchell, S.I. Barron, & S. Keith (1996). Final report: Perceived effects of the Maryland School Performance Assessment Program. CSE Technical Report. Los Angeles: UCLA National Center for Research on Evaluation, Standards, and Student Testing; W.A. Firestone, D. Mayrowetz, & J. Fairman (1998, Summer). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20: 95-113; S. Lane, C.A. Stone, C.S. Parke, M.A. Hansen, & T.L. Cerrillo (2000, April). Consequential evidence for MSPAP from the teacher, principal and student perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA; B. Stecher, S. Barron, T. Chun, & K. Ross (2000). The effects of the Washington state education reform on schools and classrooms. CSE Technical Report. Los Angeles: UCLA National Center for Research on Evaluation, Standards, and Student Testing.
 Darling-Hammond, Rustique-Forrester, and Pecheone, Multiple Measures.
 Marshall Smith paper.
 At least 27 states consider student academic records, coursework, portfolios of student work, and performance assessments – such as research papers, scientific experiments, essays, and senior projects – in making the graduation decision. Darling-Hammond, Rustique-Forrester, and Pecheone, Multiple Measures.