Education Policy Analysis Archives

Volume 8, Number 50, November 2, 2000. ISSN 1068-2341

A peer-reviewed scholarly electronic journal
Editor: Gene V Glass, College of Education
Arizona State University
Copyright 2000, the Education Policy Analysis Archives.
Permission is hereby granted to copy any article provided that
EPAA is credited and copies are not sold.


Excerpts from
Student Evaluation of Teaching:
A Methodological Critique of Conventional Practices
Robert Sproule
Bishop's University (Canada)


I. Introduction


... the present work should be seen as an attempt to further reinforce two views: that SET data are not methodologically sound, and that they ought not to be treated as admissible evidence in any legal or quasi-legal hearing related to the "summative" function.

(From "Student Evaluation of Teaching" by Robert Sproule)


And the third motivation stems from the notion of academic honesty, or from the virtue of acknowledging ignorance when the situation permits no more or no less – a notion and a virtue the academic community claims as its own. This motivation is captured succinctly by Thomas Malthus (1836) in a statement made over a century and a half ago. He wrote:

To know what can be done, and how to do it, is beyond a doubt, the most important species of information. The next to it is, to know what cannot be done, and why we cannot do it. The first enables us to attain a positive good, to increase our powers, and augment our happiness: the second saves us from the evil of fruitless attempts, and the loss and misery occasioned by perpetual failure. (p. 14)


III. Fallacies Of A Conceptual Sort
Inherent In The SET Process


In this section, I outline two fallacies of a conceptual sort inherent in the SET process. These are: (a) that students are a, or alternatively the only, source of reliable information on teaching effectiveness, and (b) that there exists a unique and immutable metric termed "teaching effectiveness."


III.1. Students As A, Or The Only, Source Of
Reliable Information on Teaching Effectiveness


The Public-Good Argument


The advocates of the SET process would argue: The university is a business, and the student its customer. And since the customer is always right, customer opinion must drive the business plan. Mainstream economists would argue that this is a false analogy. Their reason is that these same advocates are assuming that the provision of tertiary education is a "private good." This (economists would argue) is not so: It is a "public good." (Note 7) As such, students are not solely qualified to evaluate course content and the pedagogical style of a faculty member.



The Student-Instructor Relationship Is Not One of Customer-Purveyor, And Hence Not A Relationship Between Equals: As Stone (1995) noted,

"Higher education makes a very great mistake if it permits its primary mission to become one of serving student 'customers.' ... Treating students as customers means shaping services to their taste. It also implies that students are entitled to use or waste the services as they see fit."



As Michael Platt (1993) noted: "The questions typical of student evaluations teach the student to value mediocrity in teaching and even perhaps to resent good teachers.... Above all, such questions also conceive the relation of student and teacher as a contract between equals instead of a covenant between unequals. Thus, they incline the student, when he learns little, to blame the teacher rather than himself. No one can learn for another person; all learning is one's own...." (p. 31)

While the student-instructor relationship is not one of customer-purveyor, and hence not a relationship between equals, the SET process itself offers the illusion that it is. As Platt (1993) noted:


"Merely by allowing the forms, the teacher loses half or more of the authority to teach." (p. 32)



Students Are Not Sufficiently Well-Informed To Pronounce On The Success Or Failure of the Academic Mission: ... Therefore, students are not in a position to speak for all vested interests (including their own long-term interests).



 For example, Michael Platt (1993) noted:

"Pascal says: while a lame man knows he limps, a lame mind does not know it limps, indeed says it is we who limp. Yet these forms invite the limpers to judge the runners.... Naturally, this does not encourage the former to become the latter." (p. 32)



In the same vein, Adams (1997) noted, "Teaching, as with art, remains largely a matter of individual judgment. Concerning teaching quality, whose judgment counts? In the case of student judgments, the critical question, of course, is whether students are equipped to judge teaching quality. Are students in their first or second semester of college competent to grade their instructors, especially when college teaching is so different from high school? Are students who are doing poorly in their courses able to objectively judge their instructors? And are students, who are almost universally considered as lacking in critical thinking skills, often by the administrators who rely on student evaluations of faculty, able to critically evaluate their instructors? There is substantial evidence that they are not." (p. 31)



The Anonymity of The Respondent: As noted above, the SET process provides that the identity of the respondent to the SET questionnaire would or could never be disclosed publicly. This fact contains a latent message to students. That is, in the SET process, there are no personal consequences for a negligent, false, or even malicious representation. There is no "student responsibility" in student evaluations.



It is as if the student were being assured: "We trust you. We do not ask for evidence, or reasons, or authority. We do not ask about your experience or your character. We do not ask your name. We just trust you. ..." [Platt (1993, p. 34)]


III.2. Opinion Misrepresented As Fact Or Knowledge


A major conceptual problem with the SET process is that opinion is misrepresented as fact or knowledge, not to mention the unintended harm that this causes to all parties. This misrepresentation ... raises problems in the statistical analysis of the SET data, in that any operational measure of "teaching effectiveness" will not be, by definition, a unique and immutable metric.



Two premises of the conventional SET process are: (i) that there exists a unique and immutable metric, "teaching effectiveness," and (ii) that the operational measure of this metric can be gleaned from data captured by the SMIQ, or by a latent-variable analysis (most commonly, factor analysis) of a number of related questions. ... In my view, neither premise is credible. ... The data captured by the conventional SET process in general, and the SMIQ in particular, can at best measure "instructor popularity" or "student satisfaction."


IV. Fallacies Of A Statistical Sort
Inherent In The SET Process


In this section, I outline potential fallacies of a statistical sort inherent in the SET process. There are two: (a) under all circumstances, the SMIQ provides a cardinal measure of "teaching effectiveness" of an instructor, and (b) in the absence of statistical controls, the SMIQ provides an ordinal measure of "teaching effectiveness" of an instructor. (Notes 15,16)


IV.1. Ascribing A Cardinal Measure
of Teaching Effectiveness To An Instructor
Based on The SMIQ


Return to the example of the three professors, A, B, and C, who teach classes X, Y, and Z, respectively. Recall that A in X scored 4.5, B in Y scored 3.0, C in Z scored 2.5, and the reference group scored 3.5. A premise of the SET process is that these averages are cardinal measures of "teaching effectiveness." ... In my view, one would not be justified in believing any such claim, simply because of the argument outlined in the previous section; that is, a unique and immutable metric, "teaching effectiveness," does not exist.
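The cardinal-measure premise can be illustrated with a small numerical sketch (the ratings below are hypothetical, chosen only for illustration). Averages of Likert responses are not invariant under order-preserving relabelings of the response scale, so a ranking built from raw means is not even a well-defined ordinal comparison:

```python
# Hypothetical Likert-scale ratings (1-5) for two instructors.
from statistics import mean

ratings_A = [5, 5, 1, 1, 1]   # a polarizing instructor
ratings_B = [3, 3, 3, 3, 3]   # a uniformly "average" instructor

# On the raw labels, B outranks A.
assert mean(ratings_A) < mean(ratings_B)      # 2.6 < 3.0

# Relabel the same five ordered categories, preserving their order
# (1 < 2 < 3 < 4 < 5 becomes 1 < 2 < 3 < 4 < 10).  No student's
# ordinal answer changes, yet the ranking of the means flips.
relabel = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
assert mean(relabel[r] for r in ratings_A) > mean(relabel[r] for r in ratings_B)  # 4.6 > 3.0
```

Because the numeric labels on the response categories are arbitrary, any conclusion that depends on their arithmetic mean depends on an assumption the data cannot support.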


IV.2. The Rank Ordering Of Instructors
By Teaching Effectiveness Based on The SMIQ


An alternative premise of the conventional SET process is that the averages of the data captured by the SMIQ serve as a basis for an ordinal measure of "teaching effectiveness." ... In my view, this belief could be seen as justifiable: (a) if the SMIQ captures an unequivocal reading of "teaching effectiveness" (see above), and (b) if the subsequent analysis controls for the many variables which confound the data captured by the SMIQ. (Note 18) What are these confounding variables that require control? To answer this question, two studies are worthy of mention.

One, in a review of the literature, Cashin (1990) reports that (in the aggregate) students do not provide SET ratings of teaching performance uniformly across academic disciplines. (Note 19)  


Two, in their review of the literature, Mason et al. (1995, p. 404) note that there are three clusters of variables that affect student perceptions of the teaching effectiveness of faculty members. These clusters are: (a) student characteristics, (b) instructor characteristics, and (c) course characteristics. (Note 20) They also note that only one of these clusters ought to be included in any reading of "teaching effectiveness": the cluster of "instructor characteristics." Commenting on prior research, Mason et al. (1995, p. 404) noted: "A ... virtually universal problem with previous research is that the overall rating is viewed as an effective representation of comparative professor value despite the fact that it typically includes assessments in areas that are beyond the professor's control. The professor is responsible to some extent for course content and characteristics specific to his/her teaching style, but is unable to control for student attitude, reason for being in the course, class size, or any of the rest of those factors categorized as student or course characteristics above. Consequently, faculty members should be evaluated on a comparative basis only in those areas they can affect, or more to the point, only by a methodology that corrects for those influences beyond the faculty member's control."
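The prescription that instructors be compared only after correcting for influences beyond their control amounts to a covariate-adjustment exercise. The sketch below uses simulated data; the variable names, the coefficient values, and the linear model itself are my own illustrative assumptions, not Mason et al.'s actual procedure:

```python
# Sketch: raw SET averages vs. covariate-adjusted instructor comparisons,
# on simulated data where ratings depend on both instructor quality and
# "course characteristics" (class size, required vs. elective status).
import numpy as np

rng = np.random.default_rng(0)
n = 300

instructor = rng.integers(0, 3, n)                    # instructors A=0, B=1, C=2
class_size = rng.integers(15, 120, n).astype(float)
required = rng.integers(0, 2, n).astype(float)        # 1 = required course

true_effect = np.array([0.5, 0.0, -0.5])              # unknown in practice
rating = (3.5 + true_effect[instructor]
          - 0.01 * class_size - 0.4 * required
          + rng.normal(0.0, 0.5, n))

# Raw averages confound instructor quality with class size and course type.
raw_means = np.array([rating[instructor == k].mean() for k in range(3)])

# Adjusted comparison: regress ratings on instructor dummies plus the
# controls, and compare the dummy coefficients instead of raw means.
X = np.column_stack([np.ones(n),
                     (instructor == 1).astype(float),
                     (instructor == 2).astype(float),
                     class_size, required])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)

print("raw means (A, B, C):  ", np.round(raw_means, 2))
print("adjusted B-A and C-A: ", np.round(beta[1:3], 2))   # near -0.5 and -1.0
```

The adjusted contrasts recover the simulated quality gaps, while the raw means also absorb whatever class sizes and course types each instructor happened to draw. This does not rescue the SMIQ from the conceptual objections above; it only shows what "correcting for" such influences would mean statistically.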


V. Why Has The Conventional SET Process
Not Been Discarded?


Given that the likelihood of deriving meaningful and valid inferences from raw SET data is nil, the question remains: Why is the conventional SET process (with its conceptual and statistical shortcomings) employed even to this day, and by those who highly revere the power of critical thinking?



To my mind, there are three answers to this question. The first answer concerns political expediency; that is, while fatally flawed, raw SET data can be used as a tautological device to justify almost any personnel decision.


As a professor of economics at Indiana University and the Editor of The Journal of Economic Education noted:

"End of term student evaluations of teaching may be widely used simply because they are inexpensive to administer, especially when done by a student in class, with paid staff involved only in the processing of the results ... Less-than-scrupulous administrators and faculty committees may also use them ... because they can be dismissed or finessed as needed to achieve desired personnel ends while still mollifying students and giving them a sense of involvement in personnel matters." [Becker (2000, p. 114)]


The second is offered by Donald Katzner (1991). He asserted that in their quest to describe, analyze, understand, know, and make decisions, western societies have accepted (for well over five hundred years) the "myth of synonymity between objective science and measurement" (p. 24). (Note 22)

The third reason is offered by Imre Lakatos (1978) in his explanation of why prevailing scientific research programmes (SRPs) are rarely replaced or overthrown.

     Thus, in my view, the conventional SET process is the artifact of an SRP. Judging from the substance of its protective belt, and from the disciplinary affiliations of its proponents or advocates, this is an SRP defined and protected by a cadre of psychologists and educational administrators. (Notes 23,24)




References

Adams, J.V. (1997), Student evaluations: The ratings game, Inquiry 1 (2), 10-16.

Aigner, D., and F. Thum (1986), On student evaluation of teaching ability, Journal of Economic Education, Fall, 243-265.

Altschuler, G. (1999), Let me edutain you, The New York Times, Education Life Supplement, April 4.

Becker, W. (2000), Teaching economics in the 21st century, Journal of Economic Perspectives 14 (1), 109-120.

Becker, W., and J. Power (2000), Student performance, attrition, and class size, given missing student data, Economics of Education Review, forthcoming.

Belman, D., and J.S. Heywood (1991), Sheepskin effects in the returns to education: An examination on women and minorities, Review of Economics and Statistics 73 (4), 720-24.

Belman, D., and J.S. Heywood (1997), Sheepskin effects by cohort: Implications of job matching in a signaling model, Oxford Economic Papers 49 (4), 623-37.

Blunt, A. (1991), The effects of anonymity and manipulated grades on student ratings of instructors, Community College Review 18, Summer, 48-53.

Canadian Association of University Teachers (1986), What is fair? A guide for peer review committees: Tenure, renewal, promotion, Information Paper, November.

Canadian Association of University Teachers (1998), Policy on the use of anonymous student questionnaires in the evaluation of teaching, CAUT Information Service Policy Paper 4-43.

Cashin, W. (1990), Students do rate different academic fields differently, in M. Theall and J. Franklin, eds., Student Ratings of Instruction: Issues for Improving Practice, New Directions for Teaching and Learning, No. 43 (San Francisco, CA: Jossey-Bass).

Cornell University (1997), Cornell study finds student ratings soar on all measures when professor uses more enthusiasm: Study raises concerns about the validity of student evaluations, Science News, September 19th.

Crumbley, D.L. (1995), Dysfunctional effects of summative student evaluations of teaching: Games professors play, Accounting Perspectives 1 (1), Spring, 67-77.

Damron, J.C. (1995), The three faces of teaching evaluation, unpublished manuscript, Douglas College, New Westminster, British Columbia.

d'Apollonia, S., and P. Abrami (1997), Navigating student ratings of instruction, American Psychologist 52 (11), 1198-1208.

Feyerabend, P. (1975), Against Method (London: Verso).

Fox, D. (1983), Personal theories of teaching, Studies in Higher Education 8 (2), 151-64.

Frankel, C. (1968), Education and the Barricades (New York: W.W. Norton).

Gillmore, G. (1984), Student ratings as a factor in faculty employment decisions and periodic review, Journal of College and University Law 10, 557-576.

Gramlich, E., and G. Greenlee (1993), Measuring teaching performance, Journal of Economic Education, Winter, 3-13.

Grant, H. (1998), Academic contests: Merit pay in Canadian universities, Relations Industrielles / Industrial Relations 53 (4), 647-664.

Greenwald, A., and G. Gilmore (1997), Grading leniency is a removable contaminant of student ratings, American Psychologist 52 (11), 1209-17.

Hand, D.J. (1996), Statistics and the theory of measurement, Journal of the Royal Statistical Society – Series A 159 (3), 445-473.

Haskell, R.E. (1997a), Academic freedom, tenure, and student evaluations of faculty: Galloping polls in the 21st century, Education Policy Analysis Archives 5 (6), February 12.

Haskell, R.E. (1997b), Academic freedom, promotion, reappointment, tenure, and the administrative use of student evaluation of faculty (SEF): (Part II) Views from court, Education Policy Analysis Archives 5 (6), August 25.

Haskell, R.E. (1997c), Academic freedom, promotion, reappointment, tenure, and the administrative use of student evaluation of faculty (SEF): (Part III) Analysis and implications of views from the court in relation to accuracy and psychometric validity, Education Policy Analysis Archives 5 (6), August 25.

Haskell, R.E. (1997d), Academic freedom, promotion, reappointment, tenure, and the administrative use of student evaluation of faculty (SEF): (Part IV) Analysis and implications of views from the court in relation to academic freedom, standards, and quality of instruction, Education Policy Analysis Archives 5 (6), November 25.

Heywood, J.S. (1994), How widespread are sheepskin returns to education in the U.S.?, Economics of Education Review 13 (3), 227-34.

Hungerford, T., and G. Solon (1987), Sheepskin effects in the returns to education, Review of Economics and Statistics 69 (1), 175-77.

Jaeger, D., and M. Page (1996), Degrees matter: New evidence on sheepskin effects in the returns to education, Review of Economics and Statistics 78 (4), 733-40.

Johnson, R., and D. Wichern (1988), Applied Multivariate Statistical Analysis, Second Edition (Englewood Cliffs: Prentice-Hall).

Katzner, D. (1991), Our mad rush to measure: How did we get there?, Methodus 3 (2), 18-26.

Lakatos, I. (1978), The Methodology of Scientific Research Programmes (Cambridge: Cambridge University Press).

Linden, M. (1977), A factor analytic study of Olympic decathlon data, Research Quarterly 48 (3), 562-568.

Malthus, T. (1836), Principles of Political Economy, 2nd Edition.

Marsh, H. (1987), Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research, International Journal of Educational Research 11, 253-388.

Marsh, H., and L. Roche (1997), Making students' evaluations of teaching effectiveness effective: The central issues of validity, bias, and utility, American Psychologist 52 (11), 1187-97.

Mason, P., J. Steagall, and M. Fabritius (1995), Student evaluations of faculty: A new procedure for using aggregate measures of performance, Economics of Education Review 12 (4), 403-416.

McKeachie, W. (1997), Student ratings: The validity of use, American Psychologist 52 (11), 1218-1225.

Molho, I. (1997), The Economics of Information: Lying and Cheating in Markets and Organizations (Oxford: Blackwell).

Nelson, J., and K. Lynch (1984), Grade inflation, real income, simultaneity, and teaching evaluations, Journal of Economic Education, Winter, 21-37.

Pearce, D.W., ed. (1992), The MIT Dictionary of Modern Economics, 4th Edition (Cambridge, MA: MIT Press).

Pindyck, R. and D. Rubinfeld (1991), Econometric Models & Economic Forecasts (New York: McGraw-Hill).

Platt, M. (1993), What student evaluations teach, Perspectives In Political Science 22 (1), 29-40.

Rifkin, T. (1995), The status and scope of faculty evaluation, ERIC Digest.

Rodin, M., and B. Rodin (1972), Student evaluations of teaching, Science 177, September, 1164-1166.

Rundell, W. (1996), On the use of numerically scored student evaluations of faculty, unpublished working paper, Department of Mathematics, Texas A&M University.

Ruskai, M.B. (1996), Evaluating student evaluations, Notices of The American Mathematical Society 44 (3), March 1997, 308.

Scriven, M. (1967), The methodology of evaluation, in R. Tyler, R. Gagne, and M. Scriven, eds., Perspectives in Curriculum Evaluation (Skokie, IL: Rand McNally).

Siegel, S. (1956), Nonparametric Statistics For The Behavioral Sciences (New York: McGraw-Hill).

Smith, R. (1999), Unit roots and all that: The impact of time-series methods on macroeconomics, Journal of Economic Methodology 6 (2), 239-258.

Spence, M. (1974), Market Signaling (Cambridge, MA: Harvard University Press).

Sproule, R. (2000), The underdetermination of instructor performance by data from the student evaluation of teaching, Economics of Education Review (in press).

Stevens, S.S. (1946), On the theory of scales of measurement, Science 103, 677-680.

Stone, J.E. (1995), Inflated grades, inflated enrollment, and inflated budgets: An analysis and call for review at the state level, Education Policy Analysis Archives 3 (11).

Walstad, A. (1999), Science as a market process, unpublished paper, Department of Physics, University of Pittsburgh—Johnstown.

Weissberg, R. (1993), Standardized teaching evaluations, Perspectives In Political Science 22 (1), 5-7.

Zucker, S. (1996), Teaching at the university level, Notices of The American Mathematical Society 43 (8), August, 863-865.

