Step 81: Address Teaching Evaluations 2

I am about to start my fourth semester of teaching at Oxy, and I’ve started thinking more about how to collect meaningful longitudinal feedback. I took time over the break to automate some of the analysis of teaching evaluations, and in particular collating scores from the same questions over multiple years. One trend that caught my eye is the student response to “The instructor stimulated intellectual enthusiasm for the material presented”:


These drops are minimal on the absolute seven-point Likert scale, but even if they are not significant, they revive an old fear. One of my concerns before starting grad school was that I would have to suppress my intellectual curiosity. I’m not sure I ever liked the idea of studying a single topic, at least not at the expense of pursuing other ideas. It was liberating to be done with grad school, and I did pick back up some old interests. My new worry, however, is that I will again be bored by teaching the same material semester after semester.

This is not an idle concern. Oxy is my first long-term full-time teaching job, but it would not be my first extended teaching experience with the same material. As an undergrad, I was a peer facilitator for Northwestern’s Gateway Science Workshop, and I taught the same faculty-created engineering worksheets for three years in a row. I didn’t need student evaluations to tell that I connected less and less with my students through the years. I stopped bothering with ice-breakers; I stopped asking about their non-academic life; I started following the worksheets more closely without wasting time to draw in additional concepts. It is the same narrative I took away from the plot above: following the same template semester after semester, growing comfortable with the material, but ultimately disengaging from the students and unable to inspire them to pursue the discipline.

Teaching at Oxy is very different from peer tutoring – for one, I have complete control over the material, which makes it easier to include new lectures and keep things interesting. Nonetheless, I am starting to feel that same slide towards apathy. To be clear, I don’t actually think I am losing enthusiasm for the material. Rather, what I think I am losing is the spontaneity and authenticity of presenting material for the first time. I could feel myself being less engaged the second or third time I reuse my slides. I suspect what’s happening is that I design my presentations with a lot of additional cues to keep in mind. The first time through, the class is only days (or hours) after my prep, so all the supplemental content is still in my head. When I revisit the lecture a semester later though, it’s no longer available, so I end up strictly following the content on the outline, to the detriment of the class.

The obvious answer is to start including speaking notes for my lectures, but that’s a lot of work and I honestly don’t prepare for class that way. I once heard a story, from someone who shadowed a skilled teacher, about how that teacher had apparently rehearsed their lecture down to pausing to put down their cup. At the other extreme is discarding all previous material and starting over, but then I worry about losing the accumulated improvements I’ve made over the years. The temptation of finding the middle ground is that it’s too easy to just take the material from the previous semester and use it wholesale.

One thing I might try this semester is to derive the goals of each class from scratch, before looking at old material. This would at least identify missing content and drive improvement to my classes. Separately, I’m resolving to rediscover my interests, if not in the lesson plans, then in introducing new students to the thought-provoking concepts in cognitive science and computer science.


Step 80: Respond to Teaching Evaluations 1

Note: I’m addressing this to students, and this post is… condescending and patronizing. You have been warned.

As a whole, academics are self-centered. I don’t mean that they are egotistical – although some certainly are – but that they have an internal locus of control as well as a high self-efficacy. Academics tend to believe that they are competent and capable of doing their job. This is not to say that they are not open and sensitive to critique, but that academics tend to be critical of the criticisms themselves, and groundless criticisms mostly flow off our backs. After all, we went through a PhD program, and a lot of that was being told that our work was insufficient.

(That was a paragraph of sweeping generalizations; I apologize.)

Which is to say that if you despise a faculty member and you want to tank their teaching evaluations, YouTube comment tactics are not going to work. First, giving someone across-the-board zeroes is easily detected. In statistics this is called an outlier, and outliers are often excluded when summarizing data. Similarly, comments such as “Justin is a terrible person” do not mean much to me. It’s kind of like being called “stupid” by a young kid – the default response is “yeah, okay, I have better things to worry about”. For the comment about me being a terrible person, it’s not even that I disagree with it – me writing this blog post is terrible and passive aggressive of me.

So, students, here’s a tip. The way to make your negative evaluation count is to point out where the instructor is incompetent then (and this is key) back it up with evidence. Stop with the personal attacks (“Justin is a terrible person.”) and talk about what they did not do (“Justin is a terrible teacher.”). Personally, comments that I am condescending cause me less stress than arguments that my classes were not thought out. An evaluation that says “Justin’s classes are disorganized” is good, one that says “he jumped from one topic to the next” is better. Show that you know what the instructor was trying to do and that they failed. Talk about how the instructor negatively affected your ability to meet the goals of the course (maybe “Every class presented a random collection of facts, and there was no attempt to give the big picture.”) or better yet, that the instructor reflects negatively on the department/field (eg. “Although I was really interested in the class at first, I have decided that I will not major in this department if I have to continue taking classes with Justin.”). Finally, if you want to be just plain mean, compare them to other professors.

This is not guaranteed to work, especially as professors gain experience and have seen the gamut of comments. But you would have achieved your goal of rattling the instructor. Why am I telling you this? Because the most effective criticisms are also the ones that help faculty figure out what to change. You are telling us what doesn’t work, and where we might do better. Speaking for myself, the more biting your criticism – as long as I see it as valid – the more I’m motivated to improve and change it. So if you’re disgruntled, by all means, negatively evaluate us – but do it well.

Most of the quotes so far were made up, but I do want to give a real teaching evaluation comment that hit me hard. This was a mid-semester comment from two years ago:

Justin, honestly, has been terrible so far. His method of teaching is simply not conducive to learning. For example, the class features i-clicker questions, which from my experience have helped me test whether I’m understanding the material. However, Justin usually gives out an increasingly difficult series of questions regarding a topic and then proceeds to teaching the topic, generally making what could be considered mocking remarks when people get it wrong and effectively negating the purpose of i-clickers by testing us on material that we don’t cover until after the questions. Furthermore, when giving out answers for i-clickers, he generally makes remarks like “I think it’s this one” or “pretty sure, it’s C,” as if he is unaware of the correct answers for a class he’s teaching (i.e. unprepared for class). Finally, a TA led lecture when Justin was unable to attend, and it was by far the clearest, most helpful lecture I’ve experienced in the course. And from interacting with nearby students after the TA’s lecture, my sentiment seems to be shared.

Overall, I’m taking this class as a senior for general interest, so Justin’s inadequacy as a lecturer is frustrating but not inhibiting. However, for the freshmen/sophomore in the class who are considering an EECS major, I feel that the EECS department has done those students a massive disservice by allowing Justin to teach. I can’t imagine how uninspired I would be if I came across an unprepared, rude, unhelpful lecturer like him when doing the pre-reqs for my current major, and I sincerely hope he doesn’t deter some of the smart, engaging students around me who are considering an EECS major. Besides the problems/suggestions highlighted above, my final suggestion would be to allow another professor (or honestly, even the aforementioned TA) to teach the remaining lectures. Otherwise, the EECS department can go on knowing that they wasted two hours and forty minutes of interested, devoted students’ time per week because of Justin’s poor performance as an instructor.

After I first read this comment, I could not focus on my work for a week. I seriously questioned my ability and my desire to continue teaching. Part of it was because it was the first wholly negative teaching evaluation I had received. I still wince when I reread those two paragraphs, but I’m not sure I would break down quite as badly if I got the same evaluation now – I’ve just come to accept that I can’t please everyone.

(PS. Although the comment was provided anonymously, I have reason to believe that the same student ended the semester with a positive evaluation of me. The corresponding paragraph is one of my all-time favorite comments on my teaching.)


Step 79: Explore CS1 Grading

At the end of every semester, and especially the last two when I’ve taught CS1, I always have the same thoughts about grade inflation and the meaning of grades. This is a particular problem for CS1 because somewhere between a third and a half of the students get A-‘s or A’s – a much larger proportion than in other introductory courses. One possible interpretation is that I grade too easily, but other interpretations are possible. Two in particular come to mind:

  • The course structure allows students to get objectively higher grades. I’ve written about the lack of exams and the frequent, low-stakes grading structure. I also allow unlimited autograder submissions, which means that students can tweak their code until they pass all the test cases. The autograder also provides immediate feedback, which leads to…
  • Students spend more time on this class. The data from the end-of-semester teaching evaluations support this: over both semesters (59 respondents total), the median and average time spent on this course outside of the classroom are 6 and 8.4 hours per week, respectively. Keep in mind that this course already has 6 hours of lectures and labs per week, and that a course is supposed to take about 10 hours per week in total.


(This plot omits one student who reported spending over 40 hours per week on this course. I really hope they were exaggerating.)

The real answer to the high grades is likely some combination of all three explanations. What I can’t decide is what this means in terms of the grading structure of the course. I am less concerned about grade inflation than I am about the distribution of grades. I wrote in the previous post that my “quizzes” are tri- or quad-modal. It turns out that my final grades are not as bad, but are still bi-modal, with peaks around 85% and 95%.


(Grades lower than B- have been omitted.)

As with grade inflation, there is the question of what this means, and there is the meta-question of whether it is problematic. The face-value explanation would be that there are two groups of students – one that gets computer science, and one that doesn’t. I’m undecided whether this describes the “true” distribution of computer science competency, but philosophically as a teacher I should not design courses with this assumption. If instead I take for granted that student skill levels are unimodal, then what the grade distribution would suggest is that I am not sufficiently sensitive to students in some middle section of that curve.

One thing I do know is that this is not a problem I can fix by changing the grading structure but keeping the same assignments. I know this because I have iterated through the space of assignment weights. Within the constraints of low-stake assignments, no set of weights would transform the existing grades of my students into a unimodal distribution peaking around B or B+. What this means to me is that if I am indeed failing to identify the B+ students, the place to start would be to look at the actual content of the assignments.
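For readers curious what "iterating through the space of assignment weights" might look like, here is a minimal sketch. It is not the script used for this course: the function names, the 5-point grid step, the total of 100 points, the per-assignment cap standing in for "low-stakes", and the 10-bin peak count are all assumptions made for illustration. It also checks only for a single peak, not where that peak falls.

```python
from itertools import product


def histogram(grades, bins=10, lo=0.0, hi=100.0):
    """Count how many grades fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for g in grades:
        # Clamp so that a grade equal to `hi` lands in the last bin.
        idx = min(int((g - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def count_modes(counts):
    """Count local peaks in a histogram, treating a flat-topped peak as one."""
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    padded = [0] + collapsed + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def weight_grid(names, step=5, total=100):
    """Yield category weightings (multiples of `step`) that sum to `total`."""
    options = range(step, total + 1, step)
    for combo in product(options, repeat=len(names) - 1):
        last = total - sum(combo)
        if last >= step:
            yield dict(zip(names, combo + (last,)))


def unimodal_weightings(students, n_assignments, max_each=10, step=5):
    """Return every weighting on the grid that gives a single-peaked histogram.

    students: list of dicts mapping category -> average score in [0, 1]
    n_assignments: dict mapping category -> number of assignments in it
    max_each: hypothetical "low-stakes" cap on the weight of one assignment
    """
    found = []
    for weights in weight_grid(list(n_assignments), step):
        if any(weights[c] / n_assignments[c] > max_each for c in weights):
            continue  # some single assignment would be too high-stakes
        grades = [sum(weights[c] * s[c] for c in weights) for s in students]
        if count_modes(histogram(grades)) == 1:
            found.append(weights)
    return found
```

An empty result from `unimodal_weightings` on real per-category averages would correspond to the finding described above: no reweighting on the grid removes the second peak.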

I don’t have a takeaway from this. I dislike the bimodal distribution of grades, but it’s unclear whether I am justified in my dislike, and even if so, what I can do to change it. Assigning grades, as well as deciding on the grading structure of a course, requires thinking through not just what students should learn and whether their grade reflects that, but also how we trade off student achievement, time spent, and the value of negating institutional grade inflation. As a final thought, it has occurred to me that perhaps grades are not the venue to demonstrate these nuances. Perhaps grades should be seen only as the carrot-and-stick, with more emphasis put on detailed feedback provided through other channels.


Step 78: Assign Essays in CS1

As the semester wraps up, I found myself in the strange position of grading three sets of essays for my introductory computer science class.

  1. The first essay is what I called the “How to Computer” essay. Specifically, the prompt asked students to explain what it means “for everything to be ones and zeros, from numbers to videos to programs themselves”, and “what is actually happening when you run the code that you write”. This assignment served as the culmination of two weeks of lectures, starting with memory, then parsing, computer systems, and finally assembly. It’s my first time teaching this material, and I have more thoughts on this which I’ll share in a later post.
  2. The second essay is for the students’ final project, which is broadly structured around data journalism. Their assignment was to find a public dataset, answer a question about it with programming, then write up a report. Specifically, they must answer:
    • What dataset you used and what the dataset describes
    • What question you asked and why it’s an interesting question
    • What your predicted answer was, before you wrote code
    • The tricky part(s) to answering your question (if appropriate)
    • What your code suggests the answer is
    • Reasons why your answer may be incorrect
    • Additional questions that you might ask in the future
  3. The third essay, for extra credit, is an edited paper on diversity. Specifically, I asked students to write about ways to attract diverse students to computer science. In retrospect, I should have been a lot more specific, by asking for actionable strategies for recruiting women and minorities to computer science at Oxy.

I will be honest and say that all three assignments were unplanned – that is, I was not deliberately seeking to incorporate writing into my course. The extra credit essay had the longest history – it has been in the syllabus since the beginning, and was partially inspired by the experiences of my students. The others, however, were mostly spur-of-the-moment decisions as I struggled to find sufficiently interesting assignments. It’s worth noting that the three assignments are different in both prompt and response. The computer organization paper provides minimal direction and asks students to regurgitate lecture material within a larger framework; the data journalism paper is much more focused in context, but much broader in content; and the diversity paper requires research and editing. I was surprised that the last paper had the highest quality, even by the first draft – my hypothesis is that it’s the closest in style to the other papers that the students have written (not to mention the self-selection for extra credit).

I think these first attempts were in the right direction, and I like the idea of making students write in a computer science course. That said, I’m not happy with how I support student writing. Both required papers could use an editing process, and I wish I had stayed at a higher level when I edited the extra credit paper. The main constraint is scheduling – it’s hard to find a week in the semester for this revision process, and harder still to teach anything substantial about writing. I will have to remember this when I teach this course again next year.


Step 77: Practice Practicum

The semester is wrapping up, and the first course to “conclude” is the Practicum course. There was no final presentation; instead, the last community partner meeting was on Saturday, where the students demo’ed their work… and that was that. There are some small changes we still have to do, and I will be back with some students next week to set up a production system, but the course is over.

Since this is the first time I’ve taught the course, I thought I would share some lessons learned. As a reminder, we were building a system that allowed volunteers with no tech experience (as in, may need help creating an email account) to take pictures, then have them OCR’ed and made searchable online.

  • Setting up expectations is (as expected) key. I knew this going in, and still failed to convey that we were not building a mobile app for the pictures. This mismatch was discovered three months into the four-month project. Lucky for us, this expectation mismatch did not require starting over, and I think the conversation we had in fixing this led to a better program at the end. Given that I was already watching for this problem and it still happened, I’m not sure what I should do next time to avoid it.
  • Students need help with organizing larger code bases. This lesson may be specific to me, because I was working with students who have only finished CS1. The insidiousness of this lesson is that the code works – it is just repetitive, with multiple functions that do similar things, and the entire codebase is difficult to extend with new functionality. I ended up re-writing a significant portion of the code about halfway through the semester, but in the future I will require code reviews and refactors. Similarly…
  • Force students to write documentation. By documentation here I don’t mean comments or APIs, but a narrative of the design process. I would like records of what ideas they have considered and why they ultimately decided on their current solution, but getting them to write this report was like pulling teeth. I assigned weekly reflections, which worked at the beginning of the semester when they were still understanding the problem, but were less useful as the coding took precedence. Even then, the reflections do not address the technical decisions. Both the previous point and this one lead to…
  • Breaking down grades is futile. I started the semester with a grade breakdown, with some percentage of student grades for peer evaluations, community partner evaluations, report, code, etc., but it’s unlikely I will grade based on anything but instinct. I talked to a colleague about their community-based learning class, one that is similarly product-focused, and they told me that they start with a baseline of A’s and subtract from there. I’m not opposed to this strategy, but also find it unsatisfactory. At the same time, it’s hard to devise any objective measure of goodness, even if the grade is broken down, so it seems like it comes down to gut judgments regardless.
  • Be prepared to provide ongoing support. This is tricky; students have no formal ties with the project after the semester, and even if they are willing to support the code, I feel bad putting them on the hook. As a trial run of the course, this current project is simple enough that I can support it, but I’m not sure how it would scale up when projects get more complicated or when there are more projects to be supported.

I deliberately imposed as little structure as possible on the students this semester, and it has helped me see where things break down. I was able to pick my students, and having their trust helped prevent the class from failing as I figured things out. If I teach this course again next year, it will be with more students and more projects. The library we worked with this semester agreed to work with us again (which is honestly very validating), but I will need to find new community partners as well. In the meantime, let’s see if I can put together a more coherent and structured course.


Step 76: Discuss CS1 Grading Structure

When I designed the new CS1 last year, I made the explicit decision not to have midterms or finals, and instead have two in-lab “quizzes”. These 2-hour quizzes are done on the computer, with open notes and open internet (except communication), and account for a small percentage of the students’ grades (3% each). Teaching the same course my second time this semester, I mostly kept the same format, but increased the weight to 6% each. I write in my syllabus:

This course has no exams. This is deliberate; I believe that at the beginner level, computer science is more about the ability to produce working code (with a computer) than it is about answering questions about code (with pen and paper). For those who enjoy exams, rest assured that later computer science courses are more traditional.

I hesitate to draw conclusions from my first semester’s results, given that I was still calibrating what is possible. I believe, however, that this semester’s scores are more representative. The first quiz was before fall break and covers variables, functions, branches, lists, and loops, with three questions that each focused on one of the last three topics. The questions provided students with some example function calls and return values, but notably, their submission to the autograder does not give any feedback – they must write additional test cases themselves.

The results are decidedly mixed. A histogram with 10 bins results in a quad-modal distribution, with the two most extreme modes heavier than the middle two. The scores from the second quiz, on object-oriented programming (and in Java), had a similar distribution, although the two higher-grade modes were heavier. In both cases, the lowest mode contains exclusively students who scored 0%, and in both cases, there were students who scored 100%. There is some correlation between the two scores (correlation coefficient = 0.62), but some students did well on the first quiz and tanked on the second, while others failed the first quiz but did decently on the second.
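As a rough illustration of the two summaries used here, the following sketch bins scores into a 10-bin histogram and computes a Pearson correlation coefficient between two sets of quiz scores. It assumes scores are percentages from 0 to 100 and that the coefficient reported above is Pearson's r, which the post does not state; the function names are made up for this example, and no real student data is included.

```python
from math import sqrt


def histogram(scores, bins=10, lo=0.0, hi=100.0):
    """Count how many scores fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        # Clamp so that a perfect score lands in the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation between paired scores (same student, two quizzes)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

With the first bin holding the 0% scores and the last bin holding the 100% scores, the modes can be read directly off the list that `histogram` returns.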

On a normal exam, this analysis would suggest that the exam is poorly written, and I’m not sure that this is not the case. The saving grace, however, is that a normal exam is worth 15-30+% of a student’s grade, which is more than the 12% of these two quizzes combined. In fact, the entire course is designed around frequent but low-stakes assignments. The vast majority (92%) of a student’s grade comes from

  • 8 “weekly” labs at 3% each, for 24% total
  • 6 individual homeworks at 4% each, for 24% total
  • 4 (optionally) group projects at 8% each, for 32% total
  • 2 quizzes at 6% each, for 12% total
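The arithmetic in this breakdown can be checked with a short sketch. The counts and per-item weights come from the list above; the function names and the example student are hypothetical, and the remaining 8% of the grade is not described in this post, so it is left out.

```python
# Grade components as listed above: (number of assignments, percent each).
COMPONENTS = {
    "labs": (8, 3),
    "homeworks": (6, 4),
    "projects": (4, 8),
    "quizzes": (2, 6),
}


def category_totals(components):
    """Percent of the final grade contributed by each category."""
    return {name: count * each for name, (count, each) in components.items()}


def weighted_points(scores, components):
    """Points earned out of the listed total.

    scores: dict mapping category -> list of per-assignment fractions in [0, 1]
    """
    total = 0.0
    for name, (count, each) in components.items():
        if len(scores[name]) != count:
            raise ValueError(f"expected {count} scores for {name}")
        total += each * sum(scores[name])
    return total
```

For example, a hypothetical student who aces everything except the two quizzes, scoring 0 on both, would still earn 80 of these 92 points, which is the sense in which each quiz is low-stakes.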

I would like to think this is justified. First, frequent, low-stakes assessments allow students to evaluate their own progress. This is well-established in the literature (here are two articles with references). Second, this allows students to stumble on a few assignments but still receive a good grade. This is necessary to accommodate the 12 homeworks, projects, and quizzes, spaced roughly once a week. Student evaluations suggest that students often spend 8+ hours a week on this course, and that’s outside of the 6 contact hours (3 hours of lecture and 3 hours of lab). Even taking late days into account, this is a significant amount of work, but is offset by the smaller weight of each assignment. Finally, I allow students to do the projects and many of the labs in groups. I half-follow Richard Felder’s ideas for cooperative learning in this, by assigning lab partners but not project partners, but definitely paying attention to minority status. As listed above, project groups are optional, which allows more advanced students to challenge themselves if they so choose.

In this grading structure, quizzes present the only timed, individual assignment. (I just realized that the projects are collaborative and untimed, the labs are collaborative and timed, the homeworks are individual and untimed, and the quizzes are individual and timed. I wish I could say I did this on purpose, and I’m not sure which is worth bragging about more.) In a way, quizzes serve as a reality check on the student’s abilities. Students can do well on the projects and homeworks by taking advantage of other students and multiple autograder submissions, but neither of those is available on a quiz. In terms of pure coding ability, writing real code with internet access under time pressure is one of the more realistic assessments.

That said, I am not convinced that coding quizzes are the right approach. While thinking about this issue, I came across Frank McCown’s blog post, where he summarized a thread on the SIGCSE mailing list on the topic. I am not convinced by the coding interview argument (being in an interactive environment dramatically changes the evaluation), nor by the cheating argument (the distributions above are not ones where cheating has occurred). The argument I am most swayed by is that pseudocode allows students to ignore syntax and focus on the problem solving. I don’t have a good idea of how much syntax contributed to the low grades some students received (although I should survey my students to find out), but it is a stumbling block that should not exist in an ideal class.

The SIGCSE thread makes it clear that there is no consensus on this issue, even if a majority of instructors prefer written exams. I am inclined to go deeper into the rabbit hole and ask whether exams/quizzes have a place in introductory computer science courses at all, but that will have to wait for another post.


Step 75: Share Governance of a Department

Oxy is in a time of transition – our Dean of Students stepped down not long after I started, and our Dean of the College recently started at his new position at Kalamazoo College. I was asked to be on the Dean of Students search committee, which just concluded last week. The process was time-consuming, with reading the applications, full days of interviews, and the on-campus interviews from the candidates. But it was also educational: I met people who I wouldn’t otherwise cross paths with, and the discussions on institutional priorities were some of the most meaningful, substantial conversations I have had since coming to Oxy.

One topic that stuck with me was a discussion of how Oxy students lack opportunities for leadership. What the speaker meant was that there were no obvious ways for student achievements to be recognized, nor for such students to take on additional responsibilities to guide future students down the same path. The most hierarchical organizations – the Occidental Weekly and the Orientation Team – have a visible impact on campus and do provide a path for advancement, but engage only a small portion of the community. The suggestion was that, as administrators and faculty, we should create channels for student merit to be nurtured, ultimately creating a culture where students recognize and encourage each other’s leadership potential.

The conversation was timely. Since GHC, I have thought hard about how to start a computer science peer mentorship program, especially one that is student-run and sustainable. As a leadership opportunity, I envision mentors coming to a consensus on goals to achieve for the year, as well as interviewing and approving potential new mentors. The main reason I haven’t put this into action is that I am not clear on what the aim of such mentorship would be: whether it should be network building, or include career path exploration, or extend into tutoring. I am also unsure of the role that existing student clubs should play or, in fact, how to best leverage my position as faculty to foster computer science at Oxy.

I had a second, much more radical, thought after the discussion: what would it mean for students to take partial responsibility for an academic program, to share governance of a department? This idea has been bubbling in my head for a while, from when students invited themselves to the faculty retreat, to earlier in the Dean of Students search process when I wondered why faculty meeting minutes are not available to students. The candid conversations about how the Dean is important to different constituents were meaningful not only to me, but also to the three students on the committee. It’s hard for me to say what exactly they experienced, if it’s simply having a voice at the table, or if it’s seeing faculty and administrators struggle with differing viewpoints. Whatever it was, I think they came out empowered with a deeper insight into how decisions are made at Oxy, and hopefully also more invested in the success of the new Dean.

Compared to Student Affairs, an academic department is one step removed from being directly responsible for and answerable to students, but that does not mean students should not be involved in departmental decisions. Students may not have the expertise to design a curriculum, but they could have a say in future course offerings, or who to bring in as speakers, or the structure of senior comps. In turn, these student representatives are expected to faithfully report the concerns and requests of their classmates and to bring them to the attention of the faculty, and conversely to relay the reasoning of the faculty back to other students.

Going back to student leadership, this role of student-faculty liaison is one that has clear privileges and responsibilities, and provides an obvious goal for students who want to excel and give back. Whoever steps into the role – and I am imagining two students, both to make scheduling easier and so they can support each other – must be respected by the other students, which requires leadership ability. And finally, the whole structure would strengthen student buy-in, while providing additional learning opportunities for students.

As far as I know, this model of departmental governance does not exist. Oxy, as with other colleges, assembles ad-hoc student committees for faculty searches. Students do sit on various administrative committees, but I do not know if they have a defined role; at least, I have not heard of student reports from committee meetings. The most relevant example I know of is for graduate students, but despite the stated responsibilities of the Graduate/Faculty Representative, that person has mostly relayed information out of faculty meetings and rarely provided input. Regardless, I do not know of any undergraduate departments where students participate in departmental affairs.

Secretly, I’m hoping that such a program will take off and help mend some of the mistrust between students and administration. The idea of departmental shared governance seems sufficiently obvious that I may be missing a critical flaw, but I’m not currently seeing one. And if starting a department is not the time to experiment, when is?
