IT 544 Research Articles & Reactions: Article #3 Performance of a Generic Approach in Automated Essay Scoring

Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a Generic Approach in Automated Essay Scoring. The Journal of Technology, Learning, and Assessment, 10(3), 1-15. Retrieved from http://escholarship.bc.edu/cgi/viewcontent.cgi?article=1255&context=jtla

Summary:

The purpose of this paper was to compare automated scores of essays on the GRE and the TOEFL (Test of English as a Foreign Language) with human-assigned scores on the same essays. The automated scores came from e-rater, a program that measures several features of writing and weights them to produce a single score. E-rater is a relatively sophisticated program that can pick up on conventions of written language, and it factors grammar, mechanics, style, usage, organization, development, vocabulary, and word length into its final score. The study used an enormous sample (205,566 essays for the TOEFL) and computed correlations between human scores (H1) and e-rater scores (G), and also between those human scores and a second set of human scores (H2) on the same essays. Surprisingly, the correlation between H1 and G was in many instances higher than the correlation between H1 and H2, which may reflect inadvertent subjectivity among the human scorers. The study suggests that automated essay scoring may be approaching a level of sophistication and accuracy at which it can be used institutionally, on tests such as the GRE, SAT, and TOEFL, rather than primarily as a regression tool for predicting test scores and checking the accuracy of human scorers.
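To make the comparison in the summary a little more concrete, here is a minimal Python sketch built on loud assumptions: the feature names, weights, and scores below are invented for illustration and are not e-rater's actual features, weights, or any data from the study. It shows two ideas from the summary: combining feature values into one score with a single fixed set of weights (a simple linear model), and using a Pearson correlation to check whether machine scores (G) track a first human rating (H1) as closely as a second human rating (H2) does.

```python
from math import sqrt

# HYPOTHETICAL weights: one fixed set applied to every essay. These are
# NOT e-rater's real features or weights; they are invented for illustration.
WEIGHTS = {"grammar": 0.2, "mechanics": 0.2, "organization": 0.3, "vocabulary": 0.3}


def weighted_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Combine feature values into a single score as a weighted sum."""
    return sum(weights[name] * features[name] for name in weights)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented feature values for five essays (illustration only).
    essays = [
        {"grammar": 4, "mechanics": 5, "organization": 3, "vocabulary": 4},
        {"grammar": 2, "mechanics": 3, "organization": 2, "vocabulary": 3},
        {"grammar": 5, "mechanics": 5, "organization": 4, "vocabulary": 5},
        {"grammar": 3, "mechanics": 3, "organization": 3, "vocabulary": 2},
        {"grammar": 1, "mechanics": 2, "organization": 2, "vocabulary": 1},
    ]
    g = [weighted_score(e, WEIGHTS) for e in essays]  # machine scores (G)
    h1 = [4, 3, 5, 3, 2]  # invented first human rating (H1)
    h2 = [3, 3, 5, 2, 2]  # invented second human rating (H2)
    print(f"r(H1, G)  = {pearson(h1, g):.2f}")
    print(f"r(H1, H2) = {pearson(h1, h2):.2f}")
```

With five made-up essays the printed numbers mean nothing on their own; the sketch only shows how the H1-G versus H1-H2 comparison is set up, which the study did over a far larger sample.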

Reaction:

This is interesting. I never thought that automated essay scoring could really exist, given how much difficulty computer programs generally have in evaluating or understanding content. E-rater has some of this built into it: it looks for specific vocabulary depending on the writing prompt, and it assigns value based on the particular words used, the length of paragraphs, and so on. So I wonder what the developers use as a model essay for each prompt, because the program must have some kind of analog to refer to. If this is the case, the scoring remains subjective in terms of content, structure, and organization, because it begins in a subjective place: it assigns worth according to values that may or may not materialize in the writing of the average essayist. Only from this baseline of subjectivity can any objective assessment of writing take place. I can understand employing automated programs such as e-rater out of economic considerations and time constraints, especially for national and international tests, but it seems as if the art of writing, even in something as trivial as a test essay, has been cheapened, and the human connection between emotion, experience, and meaning is being lost in conventions and vocabulary that only dry academia finds worthwhile. I could be wrong, but I certainly hope that essay scoring is not moving in this direction.

There is also a philosophical problem I find with the concept of automated essay scoring, and it deals primarily with the nature and function of writing as a form of communication. If there is ever a time when the writer in an educational institution has no audience but a lifeless program that bestows judgment, isn’t the entire purpose of writing nullified? If there is no one to connect with the writing and understand the views and ideas it expresses, there is virtually no point in communicating in written form. This seems analogous to creating paintings or printing photographs for the blind, who in this hypothetical situation bestow worth upon the visual arts based on touch, taste, and smell. While I am certain that many masterworks incorporate similar materials and, thus, must taste, feel, and smell very much alike, I am equally certain that they are not the same when judged by the sense of sight for which they are intended. Maybe this subject just brings out the shameless cynic in me, but the idea of automated essay scoring is strange, and I can’t help but think of how futile the writing process will feel when we know for sure that nobody cares what we have to say as long as we punctuate and spell correctly.

Monday, October 18, 2010
