Testing the test

Scores are up, but doubts remain about the exam that determines who can teach in public schools

The headlines shocked the public and rocked the education establishment: It was the first time the state ever tested would-be teachers to weed out who did–and did not–belong in front of a public school classroom, and almost 60 percent failed. House Speaker Thomas Finneran threw rhetorical fuel on the fire by deriding the test-flunkers as “idiots.” But the dismay over a pipeline of future teachers that seemed clogged with candidates of dubious ability ran deeper than the legislative leader’s inflammatory remark.

Massachusetts officials vowed to close down schools of education where more than 20 percent of prospective teachers failed. And the federal government weighed in with a warning of its own: No more federal funds for teacher-preparation programs with high failure rates.

The fury died down as passing rates improved and public attention moved to other issues. But three years later, testing experts are raising tough new questions about whether this exam and the Amherst-based company that created it are fit to determine who can teach the state’s schoolchildren.

Among the reasons for concern:
• No independent audit of the tests has ever been performed, although such audits are strongly recommended by national testing standards to demonstrate the reliability and validity of standardized exams.
• Independent testing experts were not given the technical data needed to perform an audit for a national panel examining teacher licensure exams on behalf of the US Department of Education.
• The limited data that have been released provide grounds for critics to claim that the tests may not be reliable or valid.
• Three other states that use the same vendor’s teacher tests have faced legal claims that the tests were faulty. Plaintiffs received cash settlements totaling more than $3 million in the two cases that have reached conclusion.

The bottom line, say some testing experts, is that so far it has been impossible to determine whether the Massachusetts Educator Certification Tests are respectable measures of anyone’s skills. Until such conclusions can be drawn, they say, the tests should not be used as a gatekeeper to the profession.

“Deciding whether someone gets a [teaching] license without knowing if the tests are any good is an outrage,” says Barbara Plake, a University of Nebraska testing expert who served on the federal education department’s “blue ribbon” panel on teacher licensure. Plake also advises the Massachusetts Department of Education on its student assessment program, the MCAS–a test that remains controversial for use as a high-school graduation requirement, but which has been subjected to, and survived, the kind of technical scrutiny the teacher test has so far avoided.

The state Department of Education claims to have no qualms about its teacher test or the firm that created it, National Evaluation Systems. And the department sees no cause to subject the test to a formal review. “We have no reason to do it, because we have no doubts” about the test, says deputy commissioner Alan Safran. “It’s a question of whether you trust NES. We do. We trust that they aren’t hiding anything.”

Putting teachers to the test

The idea of testing prospective teachers floated around Beacon Hill for years before it became law in 1993, as part of the state’s multi-pronged plan for improving public schools. The goal of the certification test was to weed out weak candidates by testing their communication and literacy skills and their knowledge of the subject they wanted to teach. “Students can meet high standards only if teachers are well-qualified to teach them,” said Frank Haydu III, then interim commissioner of education, before the first tests were administered.

When the Department of Education went looking for a vendor, just two companies submitted bids: National Evaluation Systems, of Amherst, and Educational Testing Service, of Princeton, NJ. This was no surprise. These two companies dominate the market for teacher certification tests. ETS, the testing behemoth best known for its SAT college-admissions test, produces Praxis, an exam used by 32 of the 42 states that test teacher candidates. NES produces teacher tests customized to client specifications; currently, 10 states use NES made-to-order tests.

In December of 1997, then-Education Commissioner Robert Antonucci selected NES as the Commonwealth’s teacher-testing vendor, and the state has stood by the company ever since. The state Board of Education wanted the literacy and subject-matter tests to be based on Massachusetts teacher standards. The DOE developed these standards in the mid-1990s to match the state’s new curriculum frameworks for grades K-12. A custom test–the MCAS–was developed to test pupil knowledge of the curriculum frameworks, and it followed that a custom teacher test would be the best way to test mastery of the teacher standards. In addition, some board members, most notably then-chairman John Silber, wanted the test to include certain material not found in “off-the-shelf” exams. These included dictation, which tests spelling and punctuation skills as applied to a passage read out loud, and a writing section that requires candidates to name and define the parts of speech. Only a customized test could fit the bill.

The test has been given 13 times–to about 68,000 teacher candidates–since the spring of 1998, when 59 percent failed the exam. The scores have improved a bit over time. Some 47 percent of first-time test takers failed the second sitting, that July. In this year’s administrations, in January and April, just under 40 percent of first-time test-takers failed. In part, this improvement comes from candidates knowing more about what’s on the test; the state made almost no information available on the test format or content in advance of the first administration. Schools of education have also beefed up test-prep offerings.

And to boost their institutional performance on the teacher tests, the education schools have become more selective in admissions. Most state education programs now require students to pass the reading and writing sections of the teacher test before they can enroll; they must also pass their subject matter tests before they are allowed to do their student teaching–a graduation requirement. Thus the test has served to keep teacher candidates not only out of the classroom, but also out of teacher training programs themselves.

“We feel we are getting stronger candidates,” says Diane Lapkin, dean of the School of Human Services, which includes education, at Salem State College.

Test questions

The Massachusetts Educator Certification Tests have done the job of winnowing the prospective teacher field. But neither time, the test maker, nor the state Department of Education has laid to rest questions about the test itself.

Upon its first administration in the spring of 1998, Massachusetts’s made-to-order test was instantly controversial, and not just because six out of 10 teacher candidates failed. Critics derided Silber’s cherished exercise in dictation when it was revealed that the passage aspiring teachers were supposed to transcribe was from The Federalist Papers, a document of historical significance but written in 18th century language. (Dictation remains a required element of the test, but the passage has since been replaced by contemporary material.)

Then there were critical studies of the Massachusetts teacher test by testing experts. These include critiques by Walt Haney of Boston College’s Center for the Study of Testing, Evaluation, and Educational Policy and by Clarke Fowler, an education professor at Salem State. Both are known critics of high-stakes testing, including MCAS; Haney gave expert testimony against the Texas graduation test, known as TAAS, claiming that it resulted in higher dropout rates among Mexican-American students.

In 1998, Haney, Fowler, and other colleagues looked at the Massachusetts test sections on communication and literacy skills. They compared scores on the first and second administrations for 219 individuals and concluded that the test sections failed key standards of reliability and validity–testing lingo meaning they were neither consistent nor meaningful. In a test that is consistent, people who score poorly on one administration should also score poorly on a second administration, not markedly better. A meaningful test correlates with related measures. For instance, a PhD candidate in English would be expected to do well on a communication and literacy test.
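One common way to put a number on the consistency described above is to correlate each candidate's score on the first sitting with the same candidate's score on the second. The sketch below is only an illustration of that idea, not a reconstruction of the Haney and Fowler analysis: the function name and all of the scores are invented, and the article does not say which statistic the researchers actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six candidates, invented for illustration only.
first_sitting = [55, 62, 70, 48, 81, 66]
# A consistent test: each candidate lands close to the earlier score.
consistent_retake = [57, 60, 73, 50, 79, 68]
# An inconsistent test: the rank order of candidates is scrambled.
erratic_retake = [78, 49, 61, 80, 58, 52]

r_consistent = pearson_r(first_sitting, consistent_retake)
r_erratic = pearson_r(first_sitting, erratic_retake)

print(f"consistent retake: r = {r_consistent:.2f}")
print(f"erratic retake:    r = {r_erratic:.2f}")
```

A correlation near 1 means candidates kept roughly the same standing from one sitting to the next; a value near zero or below it means the second sitting reshuffled them, which is the pattern a reliability critique would point to.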

Haney and Fowler also looked at how individual scores compared with candidates’ scores on other graduate-level tests, such as the Praxis and the Graduate Record Exam, based on results given to them by 12 test-takers, and interviews with those who were willing. While this sample is small for statistical purposes, the anecdotal evidence raised worrisome issues. Those who failed the first administration of the teacher test, for instance, included people with advanced degrees.

While some of the test’s critics have made reputations for themselves as outspoken critics of standardized testing, not all can be written off as anti-testing crusaders. Take Larry Ludlow, a statistician and test expert who, like Haney, works at Boston College. Ludlow is a self-professed fan of MCAS who serves on the technical advisory committee for the student test. In 1998, several private education colleges hired Ludlow to help them make sense of their students’ scores on the teacher test. In a recent article for the Education Policy Analysis Archives, an academic journal, Ludlow inspected the technical tables created by NES for the state and concluded that many items on the first four tests were problematic. He found that more than 20 percent of the questions did not meet a minimum statistical criterion typically used to indicate that an item is functioning as intended, and thus appeared flawed. Without review of the test itself, it is impossible to know whether the questions had more than one correct answer or were scored incorrectly. But Ludlow concluded that the number of suspect questions could well have affected a candidate’s score. In other words, prospective teachers could have failed the test even though they gave enough correct answers to pass.
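The article does not name the statistical criterion Ludlow applied. One measure widely used for this kind of item screening is discrimination: the correlation between getting a given question right and the candidate's score on the rest of the test. The sketch below assumes that measure, with a hypothetical cutoff of 0.2 and an invented answer sheet, purely to show how a question can be flagged as not functioning as intended.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def flag_weak_items(responses, min_discrimination=0.2):
    """Return indexes of items whose discrimination falls below the cutoff.

    responses: one row per candidate, one 0/1 entry per item (1 = correct).
    min_discrimination: hypothetical cutoff, chosen here for illustration.
    """
    n_items = len(responses[0])
    flagged = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        # Score on the rest of the test, leaving out the item under review.
        rest_scores = [sum(row) - row[i] for row in responses]
        if pearson_r(item_scores, rest_scores) < min_discrimination:
            flagged.append(i)
    return flagged


# Invented answer sheet: six candidates, four questions.
# On the last question, only the weakest candidates answer "correctly",
# which is what a miskeyed or ambiguous item tends to look like.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

weak_items = flag_weak_items(responses)
print("flagged item indexes:", weak_items)
```

A flag of this kind says only that a question behaves oddly. As the article notes, deciding whether it had more than one correct answer or was scored incorrectly would require seeing the test itself.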

The most recent blow came in April, when a national “blue-ribbon” panel released a report on the quality of teacher certification tests in the 42 states that require them. The Committee on Assessment and Teacher Quality, a board of academic experts and teachers appointed by the US Department of Education, was blunt in stating its concerns about the limitations of such tests: What teachers do is so complex that no single measurement can guarantee that a teacher is qualified. But the panel went on to judge the quality of the tests against national standards in place since the 1950s: the Standards for Educational and Psychological Tests, formulated by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education.

The panel’s conclusion: Despite some flaws–questions that were dated, were not grade-level specific, or were not relevant to the purpose of the test–ETS’s Praxis meets most national testing standards. The customized NES tests, on the other hand, including those used in Massachusetts, do not.

No open-book test

The judgment that NES’s tests were not up to snuff had more to do with what the panel could not find out about the exams than what it found that was wrong. One of the professional standards of the testing field is that exams accurately measure what they say they do–and that test authors make available evidence of such validity. Without that information, “states can’t know if the tests are working,” says panel member Barbara Plake. But neither National Evaluation Systems, nor the Massachusetts Department of Education, gave the panel the technical data it needed to evaluate the customized test’s quality. All the panel got was a test registration booklet, according to the panel’s study director, Karen Mitchell.

For the state’s part, deputy commissioner Safran says he does not know why the national panel didn’t get the necessary data, which Mitchell says was requested from the education department several times. In 1999, NES provided the state with a multi-volume technical report on the first four administrations of the test, including all of the data requested by the national panel, according to Safran. A second technical report is now in the works, he says.

“If they had asked us, they would have gotten the technical report,” Safran says. “If they ask us today, we would give it to them still.” (Indeed, after CommonWealth’s inquiries, Safran did ship the information to the panel, two months after release of its report.)

But critics say it is no surprise that the assessment committee ran into difficulty getting information to evaluate the NES tests. They say the company is notoriously closed about such information, rarely providing states, let alone the public, the kind of data that would allow their tests to be independently judged. Fowler, of Salem State, charges that the “technical report” provided by NES does not include sufficient data to judge the quality of most of the tests, even on its own criteria; does not look at fluctuation in scores when candidates take the test again; and does nothing to show any correlation between teacher-test scores and those on nationally normed tests such as the SAT or GRE.

Indeed, the company, which is privately held, is tight-lipped about its business overall. Founded in 1972 by two Stanford graduates, William Gorth and Paul Pinsky, and overseen by a board of five directors, all with advanced degrees, NES initially developed tests for elementary and secondary students, as well as survey research. In 1975, NES started creating customized teacher tests, which have since become its main product line. NES also has offices in Texas and California and employs about 100 people, mostly in its Amherst headquarters.

NES officials rarely, if ever, grant interviews or answer questions from the press. Inquiries are fielded by the company’s public relations firm. Dominic Slowey, a former newspaper reporter and Weld administration press operative, is the testing company’s designated spokesman.

According to Slowey, NES cannot release the kind of technical information that outside reviewers want because the company does not own the tests or the data it produces about them. That information belongs to the states, he says. Furthermore, if specific items were released, the test would have to be redesigned more frequently, he says, and the price would be prohibitive. The Massachusetts test costs about $2 million, Slowey says–a sum covered by fees charged to test takers of $70 to $100 per sitting. The state would likely have to bear the extra cost, he says, rather than raising fees any higher.

NES also does its best to stifle the rumor mill among test takers. As a condition of taking the test, teacher candidates have to sign a contract that they will not discuss the exam’s content. The contract also allows the state to invalidate test scores of candidates who challenge the test, according to Diana Pullin, a professor of education law at Boston College’s School of Education. “Courts have invalidated [similar contracts] against other testing companies,” Pullin says.

NES spokesman Slowey says that the contracts prevent cheating. “A lot of test questions get re-used from one test administration to another,” he says. “We’re trying to prevent people from passing on information to other test takers.”

Failing the legal test

NES teacher tests have not fared well in court, either. Since the 1980s, at least three states that hired NES have been sued by prospective teachers for test errors, resulting in costly and embarrassing settlements.

In the earliest case, prospective teachers sued the state of Alabama in 1981 claiming that errors in the tests caused as many as 355 qualified candidates to fail. The allegations included faulty test questions, scoring errors, and bias against African Americans. The state ended up paying $500,000 in settlements and issued teaching certificates to the plaintiffs. Alabama has not tested its teachers since.

Similar issues surfaced in the teacher tests developed in the late 1970s and early 1980s by NES for the state of Georgia, which in 1988 paid out about $2.5 million in professional development grants to test takers. Many veteran teachers were reinstated to the teaching ranks at the pay they would have received if they had passed the teacher tests, and others who were denied their initial teaching certificates were given three free attempts to retake the tests, according to Dr. Rona Flippo, a professor of education at Fitchburg State College who worked with the Georgia Teacher Certification Testing Program in the early 1980s. Georgia now uses the ETS Praxis as its teacher test.

Currently, in New York, teachers required to take an NES test for recertification are contesting the exam, in part because of questions about its quality. In May, plaintiffs went before a federal judge with complaints that NES had deleted information from documents produced for the case. A trial date is set for next spring.

Alan Safran, the deputy education commissioner, is not troubled by NES’s history of legal problems. “There’s no suit here,” he says. “Nobody’s alleged that here. There’s no claim on which relief could be granted.”

Ludlow, of Boston College, thinks there may be another reason for the lack of litigation here. In other states, cases against NES tests were built after statistical analysis of the results. In Massachusetts, Ludlow contends, such analysis has been hampered by the old-fashioned way scores are released to each college that has students taking the test–on paper.

“I get MCAS results [for the entire state] on a CD, and anyone can run the data,” Ludlow notes. Slowey says NES issues test results in the format requested by the state. Electronic data, he notes, is more expensive to produce.

Since the state began testing would-be teachers three years ago, the Department of Education has promised to assemble an independent audit committee to verify the quality of the tests. But none has materialized.

That, says Safran, is because there’s no need for an audit. But he adds that the department did recently establish a three-person “technical advisory committee” to serve a similar function. That group, he says, should put to rest any remaining doubts about the test.

All members of the technical advisory committee are out-of-state professionals, including Robert Gabrys, chief of the education office at NASA’s Goddard Space Flight Center, and Stephen Klein, senior research scientist at RAND. The third member, William A. Mehrens, a professor in the college of education at Michigan State University, testified for Alabama in support of the NES test when that state was sued in the 1980s.

To those who question the quality of the teacher test, the Department of Education’s response falls short of what’s needed. Among them are Democratic state representatives David Flynn of Bridgewater and James Fagan of Taunton. For the past two years, they have sponsored budget amendments to fund an independent audit of the test, but each year then-Gov. Paul Cellucci vetoed the provision and the legislative session ended without the House or Senate voting on an override. Flynn and Fagan have also co-sponsored legislation with Sen. Guy Glodis, a Democrat from Worcester, to mandate an audit this year.

“Rep. Flynn and I are both graduates of Bridgewater State,” home to one of the state’s largest teacher-preparation programs, says Fagan. “There are concerns that the test may not accurately reflect test-takers’ skills. We feel we need to treat our future teachers fairly.”

Andreae Downs is a freelance education writer who lives in Newton.