Part of the NCA Commission on Accreditation and School Improvement Journal of School Improvement, Volume 3, Issue 1, Spring 2002
Assessment and Accommodations of English Language Learners¹: Issues, Concerns, and Recommendations

Jamal Abedi


About the Author: Jamal Abedi is the Director of Technical Projects at the UCLA National Center for Research on Evaluation, Standards and Student Testing (CRESST) and a faculty member at the UCLA Graduate School of Education. Dr. Abedi's research interests include psychometrics and test and scale development. His recent work includes validity studies for the National Assessment of Educational Progress (NAEP) focusing on the impact of language background on students' performance. He can be reached at jabedi@cse.ucla.edu.

 
 

The population of English Language Learners (ELLs)¹ in the United States is growing. Between 1990 and 1997 the number of US residents not born in this country increased by 30%, from 19.8 million to 25.8 million, the largest total in the nation's history (Hakuta & Beatty, 2000, p. 1). One out of seven children in the US speaks a language other than English at home (Garcia, 1998). In California, Florida, New Mexico, and Texas, ELLs exceed 10% of the student population in public schools.

¹The term "English language learner" (ELL) refers to students who are not native speakers of English and are not as proficient in English as native speakers are. A subgroup of these students with a lower level of English proficiency is referred to as "limited English proficient" (LEP). The term LEP is used primarily by government-funded programs to classify students, as well as by the National Assessment of Educational Progress (NAEP) for determining inclusion criteria. In this article we use "English language learner (ELL)" to refer to students who are not native English speakers and who have not been reclassified as fluent in English.

As the ELL population has grown, so has the need to include these students in large-scale testing programs. Recent legislation, from the reauthorization of the Elementary and Secondary Education Act (ESEA), through the enactment of the Improving America's Schools Act (IASA) of 1994, has mandated the inclusion of English language learners in large-scale academic assessments administered in English.

However, language factors may influence the validity of assessments for ELLs. Literature on the assessment of students with limited English proficiency has found a substantial link between students' language background and their performance in content-based areas. For example, studies by CRESST researchers have clearly demonstrated that language factors have a significant impact on students' performance in math and science (Abedi & Lord, 2001; Abedi, Lord, Hofstetter, & Baker, 2000; Abedi, Lord, & Hofstetter, 1998; Abedi, Lord, & Plummer, 1997). The following is a summary of findings from studies on the impact of language on the assessment of ELLs.

  • When NAEP test items were grouped into long and short items, Abedi, Lord, and Plummer (1997) found that ELL students performed significantly lower on the longer test items regardless of the level of content difficulty of the items. The lower performance of ELL students on longer test items is mainly due to the higher level of language demand in the long items. Abedi et al. (1997) also found that ELL students had higher proportions of omitted/not-reached items and had more difficulty with the items that were judged to be linguistically complex.
  • When math test items were modified to reduce the level of linguistic complexity, over 80% of middle-school students who were interviewed preferred the linguistically modified test items over the original English version of the items (Abedi et al., 1997). These students, who were mainly ELL, preferred the linguistically modified (simplified) items since, as they indicated, these questions were easy to understand.
  • ELL students who received the modified English version of the math test items (approximately 700 students) performed significantly better than those receiving the original items (Abedi et al., 1997).
  • The results of another CRESST study (Abedi et al., 1998) indicated that Spanish-speaking students who received the Spanish translation of the NAEP math test (main assessment, 1996) performed significantly lower than the Spanish-speaking students who received the English version of the test. We speculate that this is due to the impact of language of instruction on assessment (Abedi et al., 1998). The results of this study also indicated that among the three groups, ELL students who received the linguistically modified version of the tests (NAEP math items) performed the best, followed by the students receiving the original English version.
  • Among the four accommodation strategies that were used (extra time, glossary, linguistically modified items, and glossary plus extra time), the linguistically modified items were the only accommodation that reduced the performance-gap between ELL and non-ELL students (Abedi et al., 2000; Abedi et al., 1998).

Results of the studies that were summarized above along with studies conducted by other researchers clearly indicate that the performance-gap between ELL and non-ELL students is mainly due to language factors (see for example, Adams, 1990; Cocking & Chipman, 1988; Cummins, 1984; De Corte, Verschaffel, & DeWin, 1985; Hudson, 1983; Noonan, 1990; Ramirez, Yuen, Ramey, & Billings, 1991; Riley, Greeno, & Heller, 1983). To present a more valid picture of ELL students' content-knowledge, innovative ways of assessing student performance are encouraged, including modifications to existing instruments (August & Hakuta, 1997).

Provision of accommodations has helped to increase the rate of inclusion for ELL students (Mazzeo, 1997). Based on the promising results from using accommodations in the 1996 National Assessment of Educational Progress main assessment, accommodations were provided in the 1997 assessment in art and in the 1998 assessment in reading, writing, and civics.

There are, however, some major concerns regarding the use of accommodations for ELL students. Among the most important is the concern about the effectiveness and validity of accommodation strategies. How effective are accommodations in reducing the performance gap between ELL and non-ELL students? How valid are the accommodated assessments? Do accommodations provide an unfair advantage to the recipients, i.e., do they alter the construct under measurement?

Effectiveness

Research has clearly demonstrated that the effectiveness of accommodations in reducing the performance gap between ELL and non-ELL students varies considerably across types of accommodation and with the characteristics of the recipients.

To be effective, an accommodation should reduce the performance gap between ELL and non-ELL students by helping ELL students to improve their performance. In other words, it should level the playing field. In a study on the impact of accommodation on eighth grade students in math, Abedi et al. (2000) applied four different types of accommodations (linguistically modified English version of the test, standard NAEP items with glossary only, extra time only, and glossary plus extra time). Students were also tested using standard NAEP items with no accommodation. Students were randomly assigned to one of the four accommodation conditions or to a control group with no accommodation. Table 1 presents mean math scores under the four different accommodations and under the standard NAEP condition. As the data in Table 1 show, some of the accommodations were effective and increased students' performance in both the ELL and English-proficient groups. Glossary plus extra time was the most effective form of accommodation in this study. Out of 35 possible points, this accommodation increased the performance of ELL students by 1.62 score points, or 13%; it also increased the performance of English-proficient students by 2.81 points, or 16%.

Table 1

Mean, Number of Students Tested under Different Accommodations and t-test Comparing Means

                                   ELL Status
Type of Accommodation     LEP (ELL)        FEP/IFE (non-ELL)    t (p)
Original English          12.07 (n=144)    17.56 (n=130)        4.27 (0.00)
Modified English          12.63 (n=124)    15.94 (n=117)        3.18 (0.00)
Glossary only             11.84 (n=146)    17.78 (n=121)        5.22 (0.00)
Extra Time only           12.93 (n=30)     18.88 (n=25)         5.11 (0.00)
Glossary + Extra Time     13.69 (n=29)     20.37 (n=30)         6.11 (0.00)
t (p)                     1.98 (0.06)      2.91 (0.00)

Note: A t-test examines the difference between two mean scores to determine if the difference is statistically significant. The denominator for the t-ratios was the pooled within subject variance to control for impact on alpha (Type I error rate) due to multiple comparisons.
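The arithmetic behind Table 1 can be checked with a short script. The following is a minimal sketch, not part of the original analyses: it takes only the group means reported in Table 1 and computes, for each condition, the gap between non-ELL and ELL means and each group's gain over the Original English condition. The names `TABLE_1`, `BASELINE`, and `summarize` are illustrative assumptions, and the output is purely descriptive; it involves no significance testing, since Table 1 does not report standard deviations.

```python
# Mean math scores from Table 1 (35 points possible): (ELL mean, non-ELL mean).
TABLE_1 = {
    "Original English": (12.07, 17.56),
    "Modified English": (12.63, 15.94),
    "Glossary only": (11.84, 17.78),
    "Extra Time only": (12.93, 18.88),
    "Glossary + Extra Time": (13.69, 20.37),
}

# Hypothetical label for the no-accommodation condition used as the reference.
BASELINE = "Original English"


def summarize(table, baseline=BASELINE):
    """Return, per condition, the group gap and each group's gain over baseline."""
    base_ell, base_non = table[baseline]
    rows = {}
    for condition, (ell, non_ell) in table.items():
        rows[condition] = {
            # Gap between non-ELL and ELL means under this condition.
            "gap": round(non_ell - ell, 2),
            # Point and percent gains relative to the Original English means.
            "ell_gain": round(ell - base_ell, 2),
            "ell_gain_pct": round(100 * (ell - base_ell) / base_ell, 1),
            "non_ell_gain": round(non_ell - base_non, 2),
            "non_ell_gain_pct": round(100 * (non_ell - base_non) / base_non, 1),
        }
    return rows


summary = summarize(TABLE_1)
for condition, row in summary.items():
    print(
        f"{condition:<22} gap={row['gap']:5.2f}  "
        f"ELL gain={row['ell_gain']:+.2f} ({row['ell_gain_pct']:+.1f}%)  "
        f"non-ELL gain={row['non_ell_gain']:+.2f} ({row['non_ell_gain_pct']:+.1f}%)"
    )
```

On these figures, the gains for Glossary + Extra Time come out at 1.62 points for ELL students and 2.81 points for non-ELL students, matching the 13% and 16% quoted in the text. Modified English is the only condition whose gap (3.31 points) is smaller than the gap under Original English (5.49 points).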

The results of another study (Abedi & Lord, 2001) of 1,031 eighth grade students in southern California showed small but significant differences between scores on original and linguistically modified items for students in low- and average-level math classes. Among the linguistic features that appeared to contribute to the differences were low-frequency vocabulary and passive voice verb constructions (see Abedi et al., 1997, for discussion of the nature of and rationale for the modifications). Another study (Abedi et al., 1998) of 1,394 eighth graders in schools with high enrollments of Spanish speakers showed that modification of the language of the items contributed to improved performance on 49% of the items; the students generally scored higher on shorter problem statements. Rivera and Stansfield (2001) modified the complex language of science test items and compared student performance on regular and simplified fourth and sixth grade items. Although the study, with its small sample size, did not show significant differences in scores for English language learners, it did demonstrate that linguistic simplification did not affect the scores of the English-proficient students, indicating that linguistic simplification is not a threat to score comparability.

Extra time is a popular and easily implemented accommodation strategy and may lead to higher scores for English learners (Hafner, 2001; Kopriva, 2000). The results of several studies indicated that extra time helps ELL students. For example, extra time helped eighth grade English learners on NAEP math tests, but it also aided students already proficient in English, thus limiting its potential as an assessment accommodation (Abedi et al., 1998; Abedi et al., 2000; Hafner, 2001).

Commercially published English dictionaries are also used as a form of accommodation for ELL students. In one study, English dictionaries were provided to urban middle school students in Minnesota during a reading test (Thurlow, 2001). The results indicated that some students benefited from this accommodation. This accommodation strategy has an advantage in that students are probably already familiar with using dictionaries. However, commercially available dictionaries differ widely; only some have entries in "plain language" that English learners can understand (Kopriva, 2000). Abedi, Courtney, Leon, Mirocha, and Goldberg (2001) used a published dictionary as a form of accommodation for students in grades four and eight in several locations nationwide. They found that the published dictionary was not effective and was administratively difficult to provide. A major concern in the use of published dictionaries is the possibility that they may provide information that the test is measuring. Thus, a dictionary as a form of accommodation may jeopardize the validity of the assessment.

Another study of 422 students in eighth grade science classes (Abedi, Lord, Kim, & Miyoshi, 2000) compared performance on NAEP science items that were randomly distributed among students in three test formats: one booklet in original format (no accommodation); one booklet with an English glossary and Spanish translations in the margins; and one booklet with a customized English dictionary at the end of the test booklet. The customized dictionary included only words that appeared in the test items. English learners scored highest on the customized dictionary accommodation (their mean scores for the three formats were 8.36, 8.51, and 10.18, respectively, on a 20-item test). Interestingly, although the accommodations helped the English learners score higher, for the English-proficient students there was no significant difference between their scores in the three test formats. This suggests that the accommodation strategies did not affect the construct.
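To make the size of these differences concrete, the following is a minimal sketch that computes the point and percent gains of each accommodated format over the original format, using only the three English-learner means quoted above. The names `ELL_MEANS` and `gains_over_baseline` are illustrative assumptions, and the calculation is descriptive only; it says nothing about statistical significance.

```python
# Mean scores of English learners on the 20-item science test, as reported above.
ELL_MEANS = {
    "original (no accommodation)": 8.36,
    "glossary + Spanish translations": 8.51,
    "customized dictionary": 10.18,
}


def gains_over_baseline(means, baseline):
    """Return {format: (point gain, percent gain)} relative to the baseline format."""
    base = means[baseline]
    return {
        fmt: (round(mean - base, 2), round(100 * (mean - base) / base, 1))
        for fmt, mean in means.items()
        if fmt != baseline
    }


gains = gains_over_baseline(ELL_MEANS, "original (no accommodation)")
for fmt, (points, percent) in gains.items():
    print(f"{fmt}: +{points:.2f} points ({percent:.1f}% over the original format)")
```

On these means, the glossary with Spanish translations adds about 0.15 points over the original format, while the customized dictionary adds about 1.82 points, roughly a 22% increase.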

Validity

What do research findings suggest about the validity of scores from accommodated assessments? Reducing the performance gap between ELL and English-proficient students is desirable, but only if the accommodations do not give an unfair advantage to the recipients (Shepard, Taylor, & Betebenner, 1998). The results of CRESST studies showed that some accommodation strategies helped both ELL and English-proficient students. For example, Abedi et al. (2000) showed that both ELL and English-proficient students benefited substantially from having a glossary plus extra time. However, students who were already proficient in English showed a greater improvement (16%) than the ELL students (13%).

Table 1 presents t-ratios comparing the performance of ELLs and non-ELLs under different forms of accommodation. The results of the analyses in this table suggest that some accommodations helped non-ELL students significantly and did not have much impact on the performance of ELL students. This is a cause for concern because it may invalidate the accommodated results. Unfortunately, many states use accommodations without evidence of their validity. If an accommodation improves the performance of all students, then the accommodated results may not be combined with the results of non-accommodated assessments.

There are also validity concerns over other forms of accommodation. One of the most commonly used accommodation strategies is translation of assessment tools into students' native language. Research has indicated that native-language assessments are useful only when students can demonstrate their content knowledge more effectively in their native language. Otherwise, translated items may confuse students who have learned content and concepts in English. Abedi et al. (1998) found that eighth grade Hispanic ELL math students taught in English or sheltered English scored higher on NAEP math items in English than their peers who received the same math items administered in Spanish. In contrast, students who received their math instruction in Spanish performed higher on the Spanish-language math items than on those in either modified or standard English. This is a threat to the validity of assessment, since lack of understanding of content-related terms may add a source of construct-irrelevant variance to the assessment (Messick, 1994).

Other findings from CRESST research include the following.

  • Students designated ELL by their schools score significantly lower than non-ELL students on standardized achievement test science and math questions (Abedi & Leon, 1999). However, the performance-gap decreases or even disappears on math items that have relatively low language demands, such as math computation (Abedi, Leon, & Mirocha, 2001).
  • ELL students who are better readers, as measured by separate reading tests, perform better on questions with high language demands (Abedi & Leon, 1999; Abedi et al., 2001).
  • The only accommodation that narrowed the gap between ELL and non-ELL students was reducing the unnecessary language complexity of those test questions with excessive language demands (Abedi et al., 2000).
  • In addition to language proficiency, other background factors influence ELL performance. These factors include length of time in the United States, overall grades, and student mobility (Abedi et al., 2000; Abedi et al., 1998).

Recommendations

The results of CRESST studies, along with the findings of other studies nationwide, lead us to make the following recommendations to policy makers and educators involved in the assessment of ELL students.

  • To present a more valid assessment for ELLs, a common definition of ELL students is needed for the assessment. Without a common definition, comparisons between states, as required in the current reauthorization of the Elementary and Secondary Education Act, will be impossible.
  • New and innovative assessment techniques should be developed and empirically tested in order to provide approaches that are effective and valid for all our students, including English learners.
  • The costs of accommodations should be tracked. Cost-benefit analyses are needed to compare the relative advantages of accommodation alternatives.
  • Translating test items from English to other languages may not be a successful accommodation when the students are taught in English. The findings of our studies suggest that the language of the assessment should match the student's language of instruction.
  • Studies suggest that students' background variables including language background are strong indicators of achievement. These background variables can also help in providing appropriate accommodations to ELL students. We recommend that large-scale assessments collect background information such as length of time living in the United States, type and amount of language spoken in the home, proficiency level in English and students' native language, and number of years taught in both languages.
  • Modifying test questions to reduce unnecessary language complexity should be a priority in the development and improvement of all large-scale assessment programs. Reducing language complexity helps to narrow the performance gap between native English and ELL students.
  • Customized dictionaries are a viable alternative to providing traditional dictionaries as accommodations (see Abedi et al., 2001; Abedi et al., 2000). A traditional dictionary may provide ELL students an unfair advantage on certain types of tests.
  • Intended and unintended accommodation effects must be monitored and evaluated closely. Ideally, accommodations will have no effect on native English students while reducing the language barrier for ELL students (Shepard et al., 1998; Rivera & Stansfield, 1998). With states increasingly moving to reward or sanction schools based on test results, evaluating accommodation effects takes on added school-wide importance.

References

Abedi, J., Courtney, M., Mirocha, J., Leon, S., & Goldberg, J. (2001). Language accommodation for large-scale assessment in science. Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Leon, S. (1999). Impact of students' language background on content-based performance: Analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Leon, S., & Mirocha, J. (2001). Impact of students' language background on standardized achievement test results: Analyses of extant data. Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14(3).

Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students' NAEP math performance. Los Angeles: UCLA Center for the Study of Evaluation/ National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000). Impact of accommodation strategies on English language learners' test performance. Educational Measurement: Issues and Practice, 19(3), 16-26.

Abedi, J., Lord, C., Kim, C., & Miyoshi, J. (2000). The effects of accommodations on the assessment of LEP students in NAEP. Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., & Plummer, J. R. (1997). Final report of language background as a variable in NAEP mathematics performance (CSE Technical Report No. 429). Los Angeles: Center for the Study of Evaluation.

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge: MIT Press.

August, D., & Hakuta, K. (Eds.). (1997). Improving schooling for language-minority children: A research agenda. Washington, DC: National Academy Press.

Cocking, R. R., & Chipman, S. (1988). Conceptual issues related to mathematics achievement of language minority children. In R. R. Cocking & J. P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 17-46). Hillsdale, NJ: Erlbaum.

Cummins, J. (1984). Bilingualism and special education. San Diego: College Hill Press.

De Corte, E., Verschaffel, L., & DeWin, L. (1985). Influence of rewording verbal problems on children's problem representations and solutions. Journal of Educational Psychology, 77(4), 460-470.

Garcia, E. (1998). Excellence and equity for language minority students: Critical issues and promising practices. Chevy Chase, MD: The Mid-Atlantic Equity Center.

Hafner, A. L. (2001, April). Evaluating the impact of test accommodations on test scores of LEP students & non-LEP students. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Hakuta, K., & Beatty, A. (Eds.). (2000). Testing English-language learners in US schools. Washington, DC: National Academy Press.

Hudson, T. (1983). Correspondences and numerical differences between disjoint sets. Child Development, 54, 84-90.

Kopriva, R. (2000). Ensuring accuracy in testing for English language learners. Washington, DC: Council of Chief State School Officers.

Mazzeo, J. (1997, March). Toward a more inclusive NAEP. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23, 13-23.

Noonan, J. (1990). Readability problems presented by mathematics text. Early Child Development and Care, 54, 57-81.

Ramirez, J., Yuen, S., Ramey, D., & Billings, D. (1991). Final report: Longitudinal study of structured English immersion strategy, early-exit and late-exit bilingual education programs for language minority children (Vols. 1, 11) (No. 300-87-0156). San Mateo, CA: Aguirre International.

Riley, M. S., Greeno, J. G., & Heller, J. I. (1983). Development of children's problem-solving ability in arithmetic. In H. P. Ginsburg (Ed.), The development of mathematical thinking (pp. 153-196). New York: Academic Press.

Rivera, C., & Stansfield, C. W. (1998). Leveling the playing field for English language learners: Increasing participation in state and local assessments through accommodations. Retrieved from: http://ceee.gwu.edu/standards_assessments/researchLEP_accommodcase.htm

Rivera, C., & Stansfield, C. W. (2001, April). The effects of linguistic simplification of science test items on performance of limited English proficient and monolingual English-speaking students. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Shepard, L., Taylor, G., & Betebenner, D. (1998). Inclusion of limited-English-proficient students in Rhode Island's Grade 4 mathematics performance assessment. Colorado, CRESST/University of Colorado at Boulder.

Thurlow, M. L. (2001, April). The effects of a simplified-English dictionary accommodation for LEP students who are not literate in their first language. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
