Step 76: Discuss CS1 Grading Structure

When I designed the new CS1 last year, I made the explicit decision not to have midterms or finals, and to have two in-lab “quizzes” instead. These 2-hour quizzes are done on the computer, with open notes and open internet (except communication), and account for a small percentage of the students’ grades (3% each). Teaching the course for the second time this semester, I kept mostly the same format but increased the weight to 6% each. I write in my syllabus:

This course has no exams. This is deliberate; I believe that at the beginner level, computer science is more about the ability to produce working code (with a computer) than it is about answering questions about code (with pen and paper). For those who enjoy exams, rest assured that later computer science courses are more traditional.

I hesitate to draw conclusions from my first semester’s results, given that I was still calibrating what is possible. I believe, however, that this semester’s scores are more representative. The first quiz was before fall break and covered variables, functions, branches, lists, and loops, with three questions, each focused on one of the last three topics. The questions provided students with some example function calls and return values, but notably, submitting to the autograder gave no feedback; students had to write additional test cases themselves.

The results are decidedly mixed. A histogram with 10 bins shows a quad-modal distribution, with the two most extreme modes heavier than the middle two. The scores from the second quiz, on object-oriented programming (and in Java), had a similar distribution, although the two higher-grade modes were heavier. In both cases, the lowest mode contains exclusively students who scored 0%, and in both cases, there were students who scored 100%. There is some correlation between the two scores (correlation coefficient = 0.62), but some students did well on the first quiz and tanked on the second, while others failed the first quiz and did decently on the second.
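For readers who want to run the same kind of analysis on their own class, here is a minimal Python sketch of a 10-bin histogram and a Pearson correlation coefficient. The score lists are hypothetical placeholders, not my students' data, and the function names are my own choices for illustration.

```python
from math import sqrt


def histogram(scores, bins=10, lo=0.0, hi=100.0):
    """Count scores in equal-width bins; a score equal to `hi` goes in the last bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# HYPOTHETICAL scores (percent), one entry per student, same order in both lists.
quiz1 = [0, 0, 35, 40, 65, 70, 100, 100]
quiz2 = [0, 60, 30, 45, 70, 20, 100, 95]

print(histogram(quiz1))
print(histogram(quiz2))
print(round(pearson(quiz1, quiz2), 2))
```

Modes show up as runs of larger counts separated by empty or nearly empty bins, so a bin count of 10 is coarse enough to make them visible in a class-sized sample.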

On a normal exam, this analysis would suggest that the exam is poorly written, and I’m not sure that this is not the case. The saving grace, however, is that a normal exam is worth 15-30+% of a student’s grade, which is more than the 12% of these two quizzes combined. In fact, the entire course is designed around frequent but low-stakes assignments. The vast majority (92%) of a student’s grade comes from:

  • 8 “weekly” labs at 3% each, for 24% total
  • 6 individual homeworks at 4% each, for 24% total
  • 4 (optionally) group projects at 8% each, for 32% total
  • 2 quizzes at 6% each, for 12% total
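The arithmetic behind that breakdown can be checked with a short Python sketch. The counts and weights are the ones listed above; the dictionary layout and names are just one way to write it down.

```python
# Assessment weights from the list above: name -> (number of items, percent each).
WEIGHTS = {
    "labs": (8, 3),
    "homeworks": (6, 4),
    "projects": (4, 8),
    "quizzes": (2, 6),
}


def category_totals(weights):
    """Return the total percentage contributed by each category."""
    return {name: count * each for name, (count, each) in weights.items()}


totals = category_totals(WEIGHTS)
listed_total = sum(totals.values())
largest_single_item = max(each for _, each in WEIGHTS.values())

print(totals)               # per-category percentages
print(listed_total)         # percentage of the grade covered by the list
print(largest_single_item)  # weight of the heaviest single assignment
```

The last number is the point of the design: no single item on the list is worth more than 8% of the grade.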

I would like to think this is justified. First, frequent, low-stakes assessments allow students to evaluate their own progress. This is well established in the literature (here are two articles with references). Second, this allows students to stumble on a few assignments but still receive a good grade. This is necessary to accommodate the 12 homeworks, projects, and quizzes, spaced roughly once a week. Student evaluations suggest that students often spend 8+ hours a week on this course, and that’s outside of the 6 contact hours (3 hours of lecture and 3 hours of lab). Even taking late days into account, this is a significant amount of work, but it is offset by the smaller weight of each assignment. Finally, I allow students to do the projects and many of the labs in groups. I half-follow Richard Felder’s ideas for cooperative learning in this, assigning lab partners but not project partners, while definitely paying attention to minority status. As listed above, project groups are optional, which allows more advanced students to challenge themselves if they so choose.

In this grading structure, quizzes are the only timed, individual assignments. (I just realized that the projects are collaborative and untimed, the labs are collaborative and timed, the homeworks are individual and untimed, and the quizzes are individual and timed. I wish I could say I did this on purpose, and I’m not sure which is worth bragging about more.) In a way, quizzes serve as a reality check on a student’s abilities. Students can do well on the projects and homeworks by taking advantage of other students and multiple autograder submissions, but neither of those is available on a quiz. In terms of pure coding ability, writing real code with internet access under time pressure is one of the more realistic assessments.

That said, I am not convinced that coding quizzes are the right approach. While thinking about this issue, I came across Frank McCown’s blog post, where he summarized a thread on the SIGCSE mailing list on the topic. I am not convinced by the coding interview argument (being in an interactive environment dramatically changes the evaluation), nor by the cheating argument (the distributions above are not ones where cheating has occurred). The argument I am most swayed by is that pseudocode allows students to ignore syntax and focus on the problem solving. I don’t have a good idea of how much syntax contributed to students’ low grades (although I should survey my students to find out), but it is a stumbling block that should not exist in an ideal class.

The SIGCSE thread makes it clear that there is no consensus on this issue, even if a majority of instructors prefer written exams. I am inclined to go deeper into the rabbit hole and ask whether exams/quizzes have a place in introductory computer science courses at all, but that will have to wait for another post.
