
How do Princetonians feel about their courses, really? Machine learning analysis offers an answer.

Photo: The Computer Science building on a snowy day. (Candace Do / The Daily Princetonian)

As classes recommence for the spring 2024 semester, The Daily Princetonian data section took a look ahead, examining a common end-of-semester ritual for Princeton students: course evaluations. At the end of each semester, the University encourages students to submit numerical course evaluations, rating their courses on a scale from one to five in various categories.

Survey questions cover various aspects of the course experience, including “What advice would you give to another student taking this course?” The evaluations are available to future students considering enrolling in a given course, and existing resources allow students to easily compare numerical course ratings.


However, the usefulness of numerical evaluations has recently been called into question because of their inherent subjectivity. For example, the gap in quality between a score of “1” and a score of “2” may differ from student to student, which makes comparing average “Course Quality” ratings particularly problematic. The ‘Prince’ set out to test a metric that avoids the shortcomings of numerical evaluations: sentiment analysis via natural language processing (NLP).

The introductory Computer Science (COS) sequence scored an average of 4.02 out of 5, the highest average numerical evaluation of any sequence analyzed. On its face, that figure suggests highly rated courses. However, of all written evaluations for introductory COS courses, only about 66 percent were classified as positive.

RoBERTa, the type of machine learning model we used, is pre-trained on thousands of books and English Wikipedia articles. The exact model we used is a version of RoBERTa that was further trained on 58 million tweets. The differences between Princeton course evaluations and the data RoBERTa was trained on (books, articles, and tweets) may have affected our findings: tweets contain a significant amount of linguistic noise, and the model is unfamiliar with Princeton-specific terminology (e.g., PDF, PSet, precept).
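The percentages reported throughout this piece come from labeling each written evaluation as positive, neutral, or negative and then counting labels. Below is a minimal sketch of that tallying step. It assumes a hypothetical `classify` function standing in for the RoBERTa model; the function names, the keyword rule, and the sample comments are illustrative only and are not the ‘Prince’ pipeline or data.

```python
from collections import Counter
from typing import Callable, Iterable

LABELS = ("positive", "neutral", "negative")


def sentiment_shares(
    comments: Iterable[str], classify: Callable[[str], str]
) -> dict[str, float]:
    """Return the percentage of comments assigned to each sentiment label."""
    counts = Counter(classify(c) for c in comments)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100 * counts[label] / total for label in LABELS}


def toy_classify(comment: str) -> str:
    """Hypothetical stand-in for a trained model: a crude keyword rule."""
    text = comment.lower()
    if "loved" in text or "great" in text:
        return "positive"
    if "avoid" in text or "terrible" in text:
        return "negative"
    return "neutral"


# Made-up comments for illustration; not real course evaluations.
sample = [
    "Loved the programming assignments.",
    "Great lectures, start the PSets early.",
    "Avoid unless it is required.",
    "Precept meets twice a week.",
]
shares = sentiment_shares(sample, toy_classify)
print(shares)
```

In practice, `toy_classify` would be replaced by a call to a trained sentiment model; the tallying logic stays the same whichever classifier supplies the labels.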

Introductory and core courses

Introductory courses, which are often required for students pursuing a particular degree track or concentration, are among the largest courses offered at the University. We evaluated introductory course sequences across various popular departments, such as Computer Science (COS) and Economics (ECO), the two departments that awarded the most degrees in the 2022–2023 academic year. For some departments, like COS, the sentiment of written reviews was consistently positive; for others, like ECO, findings varied.


The percentage of students leaving a negative written evaluation of COS 126, COS 226, and COS 217 has risen above ten percent only twice since fall 2014: once in fall 2018, and again in fall 2022. Positivity in written feedback increased by roughly 11 percent during the pandemic before returning to pre-pandemic levels. In spring 2023, 65 percent of students left a positive review, 26 percent were neutral, and only nine percent were negative. In comparison, the average numerical evaluations from fall 2014 to spring 2023 were 3.96 for COS 126, 4.29 for COS 226, and 3.82 for COS 217, for an overall average of 4.02 out of 5. The numerical ratings thus paint a potentially more positive picture than the written comments do.
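The 4.02 figure matches a simple unweighted mean of the three course averages quoted above. The check below assumes each course counts equally; the article does not say whether the averages were weighted by enrollment or by the number of semesters offered.

```python
# Average numerical ratings quoted in the article (fall 2014 to spring 2023).
course_averages = {"COS 126": 3.96, "COS 226": 4.29, "COS 217": 3.82}

# Assumption: each course counts equally (unweighted mean).
overall = sum(course_averages.values()) / len(course_averages)
overall_rounded = round(overall, 2)
print(overall_rounded)  # 4.02
```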

COS 226: Algorithms and Data Structures consistently accounts for a large proportion of the positive reviews. 

“I appreciated the fact that our programming assignments were relevant to real world applications,” wrote Rayan Elahmadi ’26 in an email to the ‘Prince.’ Elahmadi cited assignments such as Autocomplete, where students implement a text auto-completion algorithm à la Google search, as particularly applicable.


Before the end of junior year, all economics students at Princeton are required to complete ECO 300/310: Microeconomics, ECO 301/311: Macroeconomics, and ECO 302/312: Econometrics. Our findings for these courses varied — from fall 2015 to spring 2016, the proportion of students leaving a positive review for those six classes decreased from 54 percent to 29 percent.

During the COVID-19 pandemic, reviews became more positive before gradually returning to pre-pandemic levels. In spring 2023, 44 percent of all submissions were positive, 40 percent were neutral, and 16 percent were negative. By comparison, the core ECO classes scored an average numerical rating of 3.48 out of 5 over our period of interest.

The overall trend is slightly positive for introductory BSE math and physics courses, with the proportion of positive reviews increasing from 33 percent in fall 2014 to 45 percent in spring 2023. In spring 2020, there was a noticeable jump in negative reviews. From fall 2014 to spring 2023, the average numerical rating for these courses was 3.23 out of 5.

The EGR sequence, which began in 2017, was introduced as an alternative to the traditional math and physics courses for first-year engineering students. Responses were very positive in the first several semesters — in spring 2018 and fall 2019, the proportion of positive reviews approached 80 percent of all written evaluations submitted, compared to roughly 45 percent for traditional BSE introductory courses.

There were noticeable oscillations in the data between fall and spring semesters; in some instances, there was ten percent more negative feedback in the spring than in the previous fall. Only two EGR sequence courses are offered in the spring: EGR 153: Electricity, Magnetism, and Photonics, and EGR 154: Linear Systems. The other three, EGR 151: Mechanics, Energy, and Waves, EGR 152: The Mathematics of Shape and Motion, and EGR 156: Multivariable Calculus, tend to receive more positive feedback.

The spring 2023 semester marked an all-time high in negativity for the EGR sequence. The average numerical rating for the EGR sequence was 3.84 out of 5.

Myles Anderson is an assistant Data editor for the ‘Prince.’

Additional consulting provided by emerita head Web Design and Development editor Anika Maskara.
