Excerpts from

by Michael Huemer



Informal student evaluations of faculty were started in the 1960s by enterprising college students.(1) Since then, their use has spread so that they are now administered in almost all American colleges and universities and are probably the main source of information used for evaluating faculty teaching performance.(2)



There is an enormous literature on the subject of student evaluations of faculty (SEF).(3) The following is a summary of some developments in that literature that should be of special interest to faculty, with particular emphasis on criticisms of SEF that have emerged recently.


1. Reliability and Validity of SEF


A test is said to be "reliable" if it tends to give the same result when repeated; this indicates that it must be measuring something. A test is said to be "valid" if it is measuring what it is intended to measure.

Most researchers agree
(1) that SEF are highly reliable, in that students tend to agree with each other in their ratings of an instructor, and
(2) that they are at least moderately valid, in that student ratings of course quality correlate positively with other measures of teaching effectiveness.


2. Usefulness of SEF


Instructors who received results of a midsemester evaluation tended to have higher ratings on end-of-semester evaluations than those who did not, suggesting that SEF cause changes in teaching behaviors which result in higher ratings.


3. Grading Leniency Bias


The most common criticism of SEF seems to be that SEF are biased, in that students tend to give higher ratings when they expect higher grades in the course. This correlation is well-established,



.....Despite some dissenting voices,(9) the influence of grades on student evaluations seems to be an open secret in colleges and universities.


In one survey, 70% of students admitted that their rating of an instructor was influenced by the grade they expected to get.(10)
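The 70% figure can be checked against the raw counts given in footnote 10 (360 of 518 students). The short Python sketch below does only that arithmetic; the variable names are illustrative and not from the source.

```python
# Counts reported in footnote 10 (Gilbaugh's survey at San Jose State University).
admitted = 360   # students who said expected grades influenced their rating
surveyed = 518   # total students surveyed

share = admitted / surveyed
print(f"{share:.1%}")  # prints 69.5%
```

The exact share is just under 70%, so the rounded figure in the text is consistent with the footnote.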



Similar proportions of professors believe that grading leniency and course difficulty bias student ratings.(11)



4. Dumbing Down Courses



A related complaint many have is that SEF encourage professors to dumb down courses in an effort to keep students happy at all costs. In one survey, 38% of professors admitted to making their courses easier in response to SEF.(12)



Peter Sacks provides a more detailed, though anecdotal, picture. Sacks reports having almost lost his job due to low teaching evaluations from his students. He was able to dramatically raise his teaching evaluations and gain tenure, he says, by becoming utterly undemanding and uncritical of his students, giving out easy grades, and teaching to the lowest common denominator. Sacks claims that this behavior is not unusual but is rather the norm at his college, where students are king and entertainment is all that matters.


5. Educational Seduction, or the Dr. Fox Effect


In a well-known study, a professional actor was hired to deliver a non-substantive and contradictory lecture, but in an enthusiastic and authoritative style. The audience, consisting of professional educators, had been told they would be listening to Dr. Myron Fox, an expert on the application of mathematics to human behavior. They were then asked to rate the lecture. Dr. Fox received highly positive ratings, and no one saw through the hoax.(14) Later studies have obtained similar results,(15) showing that audience ratings of a lecture are more strongly influenced by superficial stylistic matters than by content.



Adding support to this conclusion was another study, in which students were asked to rate instructors on a number of personality traits (e.g., "confident," "dominant," "optimistic," etc.), on the basis of 30-second video clips, without audio, of the instructors lecturing. These ratings were found to be very good predictors of end-of-semester evaluations given by the instructors' actual students.


A composite of the personality trait ratings correlated .76 with end-of-term course evaluations; ratings of instructors' "optimism" showed an impressive .84 correlation with end-of-term course evaluations. Thus, in order to predict with fair accuracy the ratings an instructor would get, it was not necessary to know anything of what the instructor said in class, the material the course covered, the readings, the assignments, the tests, etc.(16)
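One conventional way to read the phrase "fair accuracy" is to square each correlation, which gives the share of variance in end-of-term evaluations that is linearly associated with the silent-clip ratings. The Python sketch below applies that standard calculation to the two correlations reported above; the squaring step and the names used are an illustration added here, not part of the original study's presentation.

```python
# Correlations reported in the text (from Ambady and Rosenthal, as summarized above).
reported = {"composite of trait ratings": 0.76, "optimism": 0.84}


def variance_explained(r: float) -> float:
    """Return r squared: the share of variance linearly accounted for."""
    return r * r


# Round to two decimals for readability.
shares = {name: round(variance_explained(r), 2) for name, r in reported.items()}

for name, share in shares.items():
    print(f"{name}: r^2 = {share:.2f}")
```

On this reading, the composite trait ratings account for roughly 58% of the variance in end-of-term evaluations, and the "optimism" ratings for roughly 71%.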



Williams and Ceci conducted a related experiment. Professor Ceci, a veteran teacher of the Developmental Psychology course at Cornell, gave the course consecutively in both fall and spring semesters one year. In between the two semesters, he visited a media consultant for lessons on improving presentation style. Specifically, Professor Ceci was trained to modulate his tone of voice more and to use more hand gestures while speaking. He then proceeded, in the spring semester, to give almost the identical course (verified by checking recordings of his lectures from the fall), with the sole significant difference being the addition of hand gestures and variations in tone of voice (grading policy, textbook, office hours, tests, and even the basic demographic profile of the class remained the same). The result: student ratings for the spring semester were far higher, usually by more than one standard deviation, on all aspects of the course and the instructor. Even the textbook was rated higher by almost a full point on a scale from 1 to 5. Students in the spring semester believed they had learned far more (this rating increased from 2.93 to 4.05), even though, according to Ceci, they had not in fact learned any more, as measured by their test scores.



Again, the conclusion seems to be that student ratings are heavily influenced by cosmetic factors that have no effect on student learning.


6. Academic Freedom


Some argue that SEF are a threat to academic freedom.(17) Not only do SEF influence instructors' grading policies, teaching style, and course difficulty, but they may also restrict what a professor says in class. Professors may feel inhibited from discussing controversial ideas or challenging students' beliefs, for fear that some students will express their disagreement through the course evaluation form. More than one author has described SEF as "opinion polls," with the suggestion that SEF require professors to think like politicians, seeking to avoid giving offense and putting style before substance.(18)



Alan Dershowitz reports that some of his students have "used the power of their evaluations in an attempt to exact their political revenge for my politically incorrect teaching." One student, who complained to Dershowitz about his (Dershowitz') teaching about rape from a civil liberties perspective, informed Dershowitz that he should expect to be "savaged" on the student evaluations at the end of the term. Several students subsequently complained on their teaching evaluations about the content of his lectures on the subject of rape, saying that they were offensive, that he should not be allowed to teach at Harvard, and so on.



Alan Dershowitz, of course, need have little fear of losing his job. The same is not true of less prominent, junior faculty at institutions across the country.(19) I have personally received evaluation forms complaining that the professor "teaches his own views," and I have as a result been influenced to remove controversial material from my classes.


9. The Philosophy of Consumerism


A fourth reason why SEF are widely used may be the belief that the university is a business and that the responsibility of any business is to satisfy the customer.



Whether they measure teaching effectiveness or not, SEF are probably a highly accurate measure of student satisfaction (and the customer is always right, isn't he?).



Abrami, Philip C., Les Leventhal, and Raymond P. Perry. "Educational Seduction," Review of Educational Research 52 (1982): 446-64.

Ambady, Nalini and Robert Rosenthal. "Half a Minute: Predicting Teacher Evaluations from Thin Slices of Nonverbal Behavior and Physical Attractiveness," Journal of Personality and Social Psychology 64 (1993): 431-41.

Cahn, Steven M. Saints and Scamps: Ethics in Academia (Totowa, NJ: Rowman & Littlefield, 1986).

Cave, Martin, Stephen Hanney, Mary Henkel, and Maurice Kogan. The Use of Performance Indicators in Higher Education: The Challenge of the Quality Movement, 3rd ed. (London: Jessica Kingsley Publishers, 1997).

Centra, John A. Reflective Faculty Evaluation (San Francisco: Jossey-Bass Publishers, 1993).

d'Apollonia, Sylvia and Philip C. Abrami. "Navigating Student Ratings of Instruction," American Psychologist 52 (1997): 1198-1208.

Dershowitz, Alan. Contrary to Popular Opinion (New York: Pharos Books, 1992).

Gilbaugh, John W. "Renner Substantiated," Phi Delta Kappan 63 (Feb. 1982): 428.

Goldman, Louis. "The Betrayal of the Gatekeepers: Grade Inflation," Journal of General Education 37 (1985): 97-121.

Greenwald, Anthony G. and Gerald M. Gillmore. "Grading Leniency Is a Removable Contaminant of Student Ratings," American Psychologist 52 (1997): 1209-17.

Haskell, Robert E. "Academic Freedom, Tenure, and Student Evaluation of Faculty: Galloping Polls in the 21st Century," Education Policy Analysis Archives 5 (1997). Available online at <>.

Marsh, Herbert W. "Student Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research," International Journal of Educational Research 11 (1987): 253-388.

Marsh, Herbert W. and Lawrence A. Roche. "Making Students' Evaluations of Teaching Effectiveness Effective," American Psychologist 52 (1997): 1187-97.

Naftulin, Donald H., John E. Ware, and Frank A. Donnelly. "The Doctor Fox Lecture: A Paradigm of Educational Seduction," Journal of Medical Education 48 (1973): 630-5.

Rice, Lee. "Student Evaluation of Teaching: Problems and Prospects," Teaching Philosophy 11 (1988): 329-44.

Ryan, James J., James A. Anderson, and Allen B. Birchler. "Student Evaluations: The Faculty Responds," Research in Higher Education 12 (December, 1980): 317-33.

Sacks, Peter. Generation X Goes to College (LaSalle, IL: Open Court, 1996).

Schueler, G. F. "The Evaluation of Teaching in Philosophy," Teaching Philosophy 11 (1988): 345-8.

Selvin, Paul. "The Raging Bull of Berkeley," Science 251 (1991): 368-71.

Williams, Wendy M. and Stephen J. Ceci. "'How'm I Doing?' Problems with Student Ratings of Instructors and Courses," Change: The Magazine of Higher Learning 29 (Sept./Oct. 1997): 12-23.

Wilson, Robin. "New Research Casts Doubt on Value of Student Evaluations of Professors," Chronicle of Higher Education (Jan. 16, 1998): A12.



1. Cahn, 37.

2. Cave, et al., 147; Haskell; d'Apollonia and Abrami, 1198; Wilson.

3. According to Wilson, nearly 2000 studies of SEF have been completed.

4. For a summary of the data on reliability and validity, see Centra, 58-65.

5. Marsh and Roche, 1190.

6. See Rice, 335-6; Wilson; Greenwald and Gillmore, 1214.

7. See Goldman; Sacks.

8. See Greenwald and Gillmore. The authors discuss five alternative interpretations of the grades-ratings correlation, arguing that only the leniency-bias hypothesis explains all the patterns in the data.

9. d'Apollonia and Abrami, 1204-5.

10. See Gilbaugh, who reports that 360 of 518 students surveyed at San Jose State University gave the response indicated. This result may be taken with a grain of salt, as Gilbaugh reports it in a letter to the editor and does not give details as to survey methods. However, the results are more likely an underestimate than an overestimate, both because students may be reluctant to admit to what most would regard as unfair behavior on their part and because some students may be unaware of their bias.

11. See Marsh.

12. Ryan et al.

13. Sacks, 85.

14. Naftulin, et al.

15. See Abrami, Leventhal and Perry. However, the authors caution that these results provide little information about the validity of student ratings, in part because it is not known how much either content or stylistic factors vary among actual college professors. If, for instance, actual professors varied very little in presentation styles, then the Dr. Fox effect would not be relevant in most cases.

16. See Ambady and Rosenthal.

17. See Haskell.

18. Williams and Ceci, 12, 23; Schueler.

19. Dershowitz, 117-19.

20. The incident is discussed in Selvin.

21. Schueler, 345.

22. Cahn, 36-41.







