Step 33: Ponder the New Curriculum

I’ve been meaning to talk about the changes to Oxy’s computer science curriculum, and at the end of the semester things seem to have moved so quickly and yet not at all. On one hand, I submitted a curriculum proposal to the relevant faculty body close to two months ago; on the other hand, I have yet to hear back (I heard they decided to focus on other issues as a result of the student protests on campus; I don’t blame them). Still, even though what I’m writing about may not be what is actually implemented, I want to reflect a little on what I proposed and the thought that went into it.

The first question that I should answer is why I’m doing this in the first place. Oxy already has a computer science minor, so it seems I should be focusing on creating a major instead. In some respects, the answer is obvious: the current curriculum starts off with a two-unit course (so what students can learn and take away is limited), and the curriculum overall skimps on applications in favor of technical details. These, I think, are sufficient justification to rethink the program. The problem is not that students are not minoring in computer science – there are over thirty juniors and seniors declared as computer science minors – but that it’s a particular kind of student who does: physics and math majors, white or Asian, mostly male. You might say that the current students are there in spite of the curriculum, not because of it, and that is what I want to change most of all.

What’s funny is that I don’t remember ever being explicitly asked to create a new curriculum. Looking back through my emails and journal entries, I was somehow invited to curricular planning meetings where a redesign was assumed, and it’s almost as though I just took on the responsibility by chance. I say this because I can’t find an obvious source of inspiration for what I ended up producing. I did look at the curricula of other schools, even though I already had a good idea of what they were. But mostly, I approached the problem from the faculty’s perspective – what computational skills are missing from our students’ repertoire, and how we can create courses to build those skills up. Previously I listed five user stories that I thought the curriculum should accommodate, but I was just as much influenced by faculty who lamented the inability of their students to do computational research.

The result is the following structure for a minor:

  1. One of two intro to computer science courses, focused on either media or data
  2. Data structures and algorithms
  3. Discrete math
  4. Three electives

Everything from data structures on up is standard, so I want to talk more about the two intro courses. The idea is that the two courses would cover the same standard content in a first course – variables, loops, basic data structures – but in the context of either digital/new media or data generation/visualization. The former was probably inspired by some of the work on media computation, as well as the projects that were used in Michigan’s intro course. As for the latter, it is partly driven by the kinds of computational research that faculty do, but also by students’ lack of awareness of the role that computers play. These two are of course not mutually exclusive – the current plan for the media course (which I’m teaching this coming semester) will likely involve a web interface for course schedule data (an idea I stole from Michigan), while I would like the data course to end with a piece of data-driven journalism where users can explore the data themselves. In both cases, the hope is that even from just one course, students can take what they learned and use it in whatever their major is, whether it’s visual arts or economics.

I first drafted the proposal in the latter half of September, which means it has been at least three months since I last gave serious thought to this curriculum design. Looking back now and writing this blog post, I’m not sure how I feel about this design anymore. Many of the reasons I listed above feel post-hoc and unconvincing, and I worry that I won’t be able to develop the courses to the satisfaction of both my students and myself. I can convince myself that it’s not a bad idea, since it’s an improvement over what Oxy has now; I just can’t yet convince myself that it’s a good idea. I don’t know of other curricula that have a similar topic-based divide (although Harvey Mudd has a special biology-focused intro course), and I worry that there’s some reason for its failure that I have yet to discover.

I guess I’ll find out when the semester starts. I have a rough schedule for the course planned out, but will still have to do a lot of work as assignments need to be released. With luck, teaching actual computer science will be more familiar territory than doing it in a cognitive science context.


Step 32: Respond to Teaching Evaluations

Grades were due last Monday, and last Friday the student teaching evaluations were released. Contrary to all the previous times I had teaching evaluations, I actually held off from immediately reading them. Instead, I took the time to write out my own three points of improvement for each class. The goal was not to predict what students would write, but to set goals for improvement without being biased by students.

Instead of talking about all of them, let me just mention one point of improvement that applies to both of my courses: that I need to make the big picture more explicit. This was surprising to realize, since the courses cover very different topics and are aimed at very different audiences. But both courses are organized as surveys of multiple topics, and I just didn’t spend enough time explicitly pulling everything together. For Topics in AI, this was somewhat compensated for by the chats, which allowed me to prod students in that direction; for Intro to Cog Sci, although I write homework and exam questions that pull together multiple lectures’ content, I suspect students still don’t get the big picture.

I didn’t think students would pick up on this, but they did. Aside from that, the remainder of this post will talk about each class individually, starting with my better one. Instead of sharing all the numbers – which I don’t feel comfortable doing yet – I will talk about general trends I noticed, and (where applicable) ideas for fixing them in the future.

Note: the evaluations are separated into sections about the student, the instructor, and the course. I will only talk about the last two.

Topics in Artificial Intelligence

For the course, the lowest rating is for whether students improved on speaking clearly – an understandable issue, since students never had to present anything (except to me). The next two, however, were surprising: that students don’t think they improved in their ability to work independently or to write clearly. The independence issue I can see, since I require all projects to be done in pairs. Getting students to work individually was not a goal of mine, and a good question at this point is whether it should be. Part of my consideration is that I’m not sure of the value of working individually in a course like this one, since the focus is not on developing any deep technical ability. I think the evaluation is justified, but it’s not something that I intend to change.

The low rating for improvement in writing I’m more confused by. Most of the projects require a written report as the deliverable, although I admit that no effort was spent on my part trying to improve their writing. One possibility here is to direct students to the writing center, but the projects here are also not the type of writing they can help with. The conflict I see here is between improving writing – which often requires a more open-ended prompt – versus guiding students in asking and considering the right questions. The latter was something I had trouble with, which naturally led to the restriction on what students could write about. I do want to help my students develop their writing ability, but I don’t want to sacrifice the topical (artificial intelligence) questions that they should also be thinking about. It may be possible to provide sample answers as demonstrations of what I expect, as opposed to explicitly listing what questions they have to answer – but I suspect that the questions are abstract enough that it’s hard for beginners in the field to grasp them.

One last thing about the course evaluations: I do want to say that I don’t believe that students improved their ability to read critically, despite the relatively high ratings for that question. Although I assigned readings and made students ask questions about them, I never explicitly talked about the readings in detail, never mind any kind of deep analysis. Unlike the writing, this I could have done more about; maybe that’s something for me to work on in the future.

Onto the instructor questions then. The lowest rating I got in that set of questions was for whether the instructions and criteria for assignments were clear. I agree with this evaluation, even taking into account the improvements made after learning to be specific about the type of explanations I wanted. One example of this (this is me speculating, not from student comments) is in the last assignment about NLP, when I asked students to give their information extraction program a grade, but then took points off when they didn’t say how they were counting false positives and false negatives. As I said above, this may be an issue of course structure more than assignment instruction.

Other than that issue, the overall evaluations for this class are fairly high – which I think is expected for a class of fifteen. There are some smaller issues with the format of the lecture – I opted not to use slides and instead used the blackboard where necessary, but in retrospect having something that students can refer back to may have been more beneficial, in addition to providing a (potentially interactive) visual aid. Another good suggestion from students is including smaller assignments between the “projects”, which could provide practice for the mathematical/algorithmic underpinnings of the topics. I think both of these are good ideas worth incorporating in the next iteration of the course.

Introduction to Cognitive Science

I know I mostly focused on the areas of improvement for Topics in AI, and it therefore sounds strange to say that the evaluations for this course are less positive.

As before, let’s start with the course-level comments. Keeping in mind that this course is graded almost entirely on homeworks and exams, it’s perhaps not surprising that students didn’t feel their speaking or writing skills improved. Again, a question here is whether this course should aim to train students in these skills as well, and whether it would take away from the cognitive science content. In this case, I think it’s possible to do so, although it would require a larger change in the structure of the course. Although – I wonder what STEM class has the highest scores for these categories. The speaking I can see as coming from some research-focused course, where the final assessment is a presentation of some kind (with practices during the semester for scaffolding). The writing part I am truly baffled by. How do humanities courses do it? They have multiple papers on predefined topics that apply some theory, which is then workshopped over multiple weeks. I can’t decide whether a similar system would work for STEM courses – nor why it would or wouldn’t work.

The more intrusive part of the evaluation for this course is about me as the instructor. The category with the lowest score was for my ability to clearly explain concepts, and a look at the comments makes it clear that it was specifically the computational (e.g., A*, perceptrons) and mathematical (e.g., Bayes’ theorem) concepts that students found opaque. Let me see if I can break them down into categories:

  1. The lack of connection between the computer science parts and the cognitive science parts. This one I agree with, especially for teaching A*, which I completely screwed up on, and I explained last time how students didn’t understand how artificial neural networks fit into cognitive science. The solution here, I think, is twofold. First, there needs to be a more general lecture (or portions of a lecture) devoted to why computers are used in cognitive science in the first place. This is a theme that has been implied the entire semester, but which (given the comments) needs to be emphasized. Second, every computational and mathematical topic needs to be turned on its head – first present a problem in cognitive science, then suggest computers/math as the way to tackle it. This would embed the concepts within the broader content of the course.
  2. The general difficulty of computer science/math. I’m… not sure what to do about this. I know there are improvements to my teaching, some of which I’ve mentioned previously on this blog; students also suggested having more worked examples/problems. Both of these are good ideas which I will use for next semester. What I’m still unclear on is how much computer science to cover. I will actually single out a comment here, from a student who felt that “There was too much actual computer science in the sense I was doing actual computer science computations. I feel as though the class could do with out it, or reduced. Keep the aspects of computer science, but take away the computations.”

    Here’s what’s been bothering me for the last couple of days: what does it mean to do computer science without the computation? I mean, students can read about computer models and what they showed about (say) neural networks. Is that computer science or cognitive science? When a student reads Descartes and dualism, it’s clearly both philosophy (of mind) and some of the underpinnings of cognitive science. The same cannot be said of cognitive models, not without also teaching what it means to follow an algorithm. The same can be said about math – we can talk about how researchers use math in their work, but even at the introductory level, I don’t think it’s unreasonable for students to be doing calculations.

    What I’m trying to do is understand where students are coming from, and what would help them understand the content better. To put it more simply: why are people afraid of math/find it more difficult, and what can I do as an instructor to overcome that? I’ve never asked myself that question before, but that seems like a topic the evaluations are pointing me towards. It’s also interesting to me that this didn’t come up when I taught introductory computer science at Michigan, despite the first and second projects being somewhat math heavy (no more advanced than floors and ceilings; but then, neither is Bayes’ theorem). Glimpsing the answer to this question – in addition to improving how I explain things – seem like appropriate first steps.

There are some other points that students made, but they are relatively insignificant compared to the two above. It seems like I have some thinking to do before I teach those topics again.


Step 31: Consider a (Perfect?) Holistic Assessment

My first semester is done! Last week was finals, and grades were due yesterday. I want to do a more holistic review of my semester, but the student evaluations are not released until Friday, so I’ll wait for next week.

Instead, I want to talk about evaluating students. I re-read a particular Math with Bad Drawings comic – the Church of the Right Answer – and realized there’s something in there that relates to both of my classes.

Let me start with Intro to Cog Sci, which is what I had planned (weeks in advance) to write about. I had written about my experience teaching A* before, and how I was going to apply what I learned to teaching the perceptron algorithm. As I laid out in that post, I made sure that the surface structure of the algorithm was kept constant, that I emphasized the unintuitive parts of the algorithm (the activation threshold tripped up some students), and so on. I even provided them with a web app where they can play around with the algorithm and check their work, in the same vein as what I did for my Topics in AI students. I made students do several trials of perceptron learning in the homework, which went well in general (some students thought that all connection weights would change by the same amount every time).
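For reference, the learning rule those homework trials exercised can be sketched in a few lines of Python. This is a generic textbook perceptron, not the course’s actual web app; the AND example and all names are illustrative:

```python
# Minimal perceptron learning sketch (illustrative, not the course's web app).
# Weights update only when the prediction is wrong, and each weight moves in
# proportion to its own input -- so connection weights generally do NOT all
# change by the same amount every time.

def predict(weights, threshold, inputs):
    """Fire (1) if the weighted sum of inputs meets the activation threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def train(examples, num_inputs, threshold=0.5, rate=0.1, epochs=20):
    """Run the perceptron learning rule over labeled (inputs, target) pairs."""
    weights = [0.0] * num_inputs
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, threshold, inputs)
            # Inactive inputs (x = 0) leave their weights unchanged.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights

# Learning logical AND:
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train(and_examples, num_inputs=2)
```

The sticking point from the homework is visible in the update line: a weight changes in proportion to its own input, so two connections with different inputs get different updates.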

Switching it up for the exam, I instead asked them how a perceptron model might explain tip-of-the-tongue states (as explained on SciShow) – and was promptly surprised that many students didn’t get the connection. I later realized that while students could do the math, they didn’t understand how the math relates to cognitive science; they know what to do with the numbers, but don’t understand what they mean.

Which is where the Math with Bad Drawings post comes in. The author talks about how students can get the right answers (in this case, the mathematical operations) without really understanding what is going on (in this case, why we’re studying perceptrons in a cognitive science class). The most interesting quote, however, is his suggestion that students can always do this, no matter what assessment method we use. “As for our tests—no matter how well-intentioned, no matter how clever and fair, there will always be a back-road to the right answer. There will be something to memorize—a procedure, a set of buzzwords, whatever—that will function as a fake ID, a convincing charade of understanding.”

Which is what made me think of my Topics in AI course. 80% of the grade for the course is from the projects, with another 10% coming from participation and peer evaluation. The last 10%, however, is what I called a “grad student chat” – essentially, they come in and have a 30-45 minute conversation with me about what they learned. I stole this from a Northwestern professor of mine, who made students do the same in his compilers course. He called them “code walks”, where they actually talked about the structure and design of their compilers; you can peek at how he does them in his current courses. (This is actually more elaborate than when I did them; notably, other students are now doing part of the evaluation, and are themselves evaluated on how they evaluated.) Most of my students didn’t code, so our conversation was at a level up, but the idea is roughly the same. I even have them give themselves a grade for the conversation, before I reveal what grade I will actually give them.

There are several downsides to this method of evaluation – notably, it’s somewhat subjective and extraordinarily time consuming – but I find that it works well in general. The resulting grades roughly correlate with my perception of the students, and I get to see them think through answers to unfamiliar questions (and nudge them with hints if I have to). I like to think that it’s less stressful than an exam as well, and also provides students more flexibility in terms of scheduling.

But, to go back to the church of the right answer – can students cheat through a code walk or this kind of chat/interview/whatever? I suppose due to time constraints it’s possible that they are not asked about a topic on which they are weak, but the same could be said for any assessment, and it’s not really cheating. It’s hard for me to imagine what it would mean to cheat through a conversation, given that the questions are really hard to prepare for, and I can always push students for a deeper explanation if I think they’re just faking it.

Which raises the question of why it’s so hard to get away with just a surface understanding. You could argue that a PhD defense uses this format for the same reason. When I first contemplated this post, my thoughts immediately jumped to the Turing Test, which also uses conversation as the medium, but for an entirely different reason (because it is a sufficient demonstration of intelligence – at least, that’s how the argument would go). Although, now I think about it, maybe the underlying reason actually is the same: there’s something about the potential breadth and depth of a conversation that makes it amenable as a measure of both topical understanding as well as intelligence. I still can’t put my finger on what it is about conversations that gives them this property though.


Step 30: Build Metacognitive Skills

I harp a lot about how computer science is a great approach to thinking about representations. This semester, however, I’ve started thinking about how computer science is also fertile ground for getting students to reflect on their own work. I seem to recall a study about how metacognition training doesn’t really work (or rather, that it falls back to pre-training levels very quickly). In this case, though, I don’t see much opportunity cost in including it in the class.

One thing I’ve been doing in my assignments, especially as the semester went on, was to include questions that are about how students think they did and justifying that answer. In the Bayesian network homework, for example, one of the questions was about creating evidence and explaining why the resulting posteriors make sense. A follow-up question then asked them to find posteriors that don’t make sense, then consider whether it’s their network model that’s incorrect, or if it’s their intuition that is incorrect. The answers were okay – nothing great – but at least I enjoyed the experience.
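The flavor of that exercise can be illustrated with a two-variable example (the assignment’s actual network isn’t reproduced here; the numbers below are made up): compute a posterior with Bayes’ theorem, then ask whether it matches intuition.

```python
# Illustrative two-variable posterior check. Given a prior P(H) and the
# likelihoods P(E|H) and P(E|~H), Bayes' theorem gives the posterior P(H|E).

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) found by total probability."""
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / evidence

# A rare condition (1% prior) with a fairly accurate test (90% hit rate,
# 5% false alarm rate):
p = posterior(0.01, 0.90, 0.05)
# p comes out to roughly 0.15. If a student's intuition says "the test is 90%
# accurate, so the posterior should be near 0.9", then either the model or the
# intuition needs revisiting -- which is exactly the point of the question.
```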

The NLP assignment, which was due just a few days ago, did something similar. It was one of the more “standard” computer science assignments, creating an information extraction program. Among the questions they have to answer, however, is one that asks them to give their own program a score. In a normal computer science project, of course, there would be some set of hand-curated answers, and students’ work would be compared against it. In this case, because the dataset is new (and because I’m lazy), I didn’t create such a solution.

It was interesting to see the responses. I didn’t specify what the score meant – only that it should be between 0 and 100. Many students gave their program fairly high scores, but justified it only vaguely, by saying that it does “well” on a “majority” of courses. The point, of course, is for them to think about how they would actually measure this, and what constitutes success. For example, how do they incorporate the false positives and false negatives into their score, if at all? Does the score even represent the percentage of courses they got correct, and if so, which courses from which departments? Very few students even considered these questions in their answers.
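For comparison, one standard way to fold false positives and false negatives into a single score is precision and recall, combined into an F1 score. The course listings below are made up for illustration; this isn’t what any particular student submitted:

```python
# Precision/recall/F1 over set-valued output: one concrete way to account for
# false positives and false negatives in a single score.

def f1_score(extracted, gold):
    true_positives = len(extracted & gold)
    false_positives = len(extracted - gold)   # extracted, but wrong
    false_negatives = len(gold - extracted)   # in the gold set, but missed
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

extracted = {"COMP 131", "COMP 229", "MATH 210", "ART 101"}   # program output
gold = {"COMP 131", "COMP 229", "MATH 210", "PHYS 110"}       # hand-checked
score = f1_score(extracted, gold)
# precision = 3/4, recall = 3/4, so F1 = 0.75
```

A score like this at least forces the question of what “well” on a “majority” of courses actually means.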

The questions from both assignments, especially the NLP one, highlight something in computer science that I’ve only otherwise found in design: the general lack of specified goals, and therefore the freedom to select your own metric. Building models in the sciences has some of this, but you’re still matching your model to data. In design, on the other hand, the explicit metric for evaluation is often unclear. Consider user interface design: you could optimize for least clicks to goal, least time to goal, least time on site, or any of a hundred other metrics.

Because of this, students need to quickly ask themselves what exactly they are trying to accomplish, and whether what they’re doing is the best way towards that goal. I’m not sure if I can incorporate this into my intro course next semester, but I do plan on requiring students to do more self-evaluation in the future.


Step 29: Design an Autograder

The best answer to the question, “What is the most effective method of teaching?” is that it depends on the goal, the student, the content, and the teacher. But the next best answer is, “students teaching other students”. – Wilbert J. McKeachie

One thing I’ve been working on – in the copious free time that I have – is an autograder. I’m mildly surprised that Oxy does not already have one, given our existing computer science courses. I’m not writing the autograder for fun; there are a number of open source/free autograders out there, and I also have access to the autograder that Michigan uses. No, the reason I wanted/needed to write my own is that none of the existing autograders do what I want.

My idea is to build an autograder that allows students to use the previous submissions of other students. As per the McKeachie quotation above, I think students can learn a lot from using other students’ work. I did a little bit of that this semester, when I workshopped the papers from a botched assignment, but the potential for an actual programming class is much higher. I am currently planning two things:

  • Students first write test cases, then write programs that must pass each other’s test cases. One of the problems we faced at Michigan was getting students to write test cases (for details, see [1]). Instead, the idea here is students have an earlier deadline to submit test cases, then when they submit their code, it’s run against the entire test case collection (which they will have access to). They get points not only for writing correct code, but also for submitting test cases that other students fail. I’m still a little fuzzy on the exact point system, but I am definitely deducting points for incorrect tests.
  • Students first write a library of functions, then write programs that use someone else’s library. This is something I’ve never heard of being done anywhere, but it seems like a good way for students to learn to write good code (and to experience bad code). Aside from the obvious lesson about coding style and commenting, there is also a deeper lesson about having good representations. I suspect students will be completely surprised that other people represented things entirely differently, and furthermore, that the API can completely change how easy or hard it is to do something. I am also unsure of the exact grading scheme here, but again the ease-of-use of the library should be part of the grade.

These are the two main ones I can think of, although other crowd-sourcing techniques can be used as well (e.g., having students come up with small coding questions, then solving each other’s as homework).
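The first idea can be sketched concretely. This is a toy model of the scoring only: tiny pure-Python functions stand in for compiled student submissions, and since the real point system (including the deduction for incorrect tests) is still undecided, the scheme below is purely illustrative:

```python
# Toy sketch of pooled test-case scoring. The real Demograder would compile
# and run student code; here "programs" are plain Python functions and a test
# is an (inputs, expected_output) pair. All names and points are hypothetical.

def run(program, test):
    """Return True if the program produces the expected output for the test."""
    inputs, expected = test
    return program(*inputs) == expected

def score(students, tests_by_author, programs):
    """One point per pooled test passed, plus one point per rival program
    that fails one of your own test cases."""
    pooled = [t for tests in tests_by_author.values() for t in tests]
    scores = {}
    for student in students:
        passed = sum(run(programs[student], t) for t in pooled)
        caught = sum(
            1
            for t in tests_by_author[student]
            for other in students
            if other != student and not run(programs[other], t)
        )
        scores[student] = passed + caught
    return scores

# Toy "absolute value" assignment with one buggy submission:
students = ["ada", "bob"]
programs = {"ada": lambda x: abs(x), "bob": lambda x: x}  # bob forgot negatives
tests_by_author = {"ada": [((-3,), 3)], "bob": [((4,), 4)]}
```

Here ada earns two points for passing both pooled tests plus a bonus for the test that catches bob’s bug, while bob passes only one test and catches nothing.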

On the autograder side, the main accommodation is that the instructor needs to be able to specify which projects depend on which previous ones, and to specify which students can see which other students’ files. There may or may not also be settings to allow instructor test cases, which are hidden from the students – this would then require them to come up with good enough test cases to help each other get past my tests. (Side note: the dependencies are not just a tree, but a directed acyclic graph – I can imagine having students write both the test cases and the libraries, then using those for the final product. I can also imagine a library that’s general enough to be used in multiple projects, again with an eye towards the issue of data representation.)
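As a sketch of that dependency bookkeeping (the project names here are hypothetical), Python’s standard-library graphlib can represent the DAG and produce a valid release order:

```python
# Project dependencies as a DAG: each project maps to the projects it
# depends on. A topological sort gives an order in which projects can be
# assigned, with every dependency coming first.

from graphlib import TopologicalSorter

dependencies = {
    "tests": [],                       # test cases have the earliest deadline
    "library": [],                     # shared library, reusable later
    "program": ["tests", "library"],   # final product uses both
    "visualizer": ["library"],         # the same library feeds a second project
}

order = list(TopologicalSorter(dependencies).static_order())
# Any valid order puts "tests" and "library" before "program" and "visualizer";
# TopologicalSorter also raises CycleError if the dependencies aren't acyclic.
```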

I’m calling this project the Demograder – demo-, from Ancient Greek demos, meaning of or pertaining to people, a la democracy. I have barely started the coding – using a Django middle layer with some scripts for the actual compilation and running. If anyone has any cool ideas for how this could be used, I would love to hear them.

[1] The system at Michigan had the staff write buggy code, then gave students points if their test suite would produce different output between the buggy code and the correct code. There are several issues with this. The first is that students don’t get to see what the buggy code is, and finding bugs is really specific to the program in which said bugs exist and to the representation used. So students were sort of shooting in the dark as to what would be a complete enough test case (and I was devious enough to have a test case that almost no one found). The second issue is that this kind of “testing” only works if the correct code already exists – great for regression testing, not so great for developing new software (which was the case for the students).

This led to the third issue: students saw the test cases and their code as completely separate parts of the assignment. Since they had no output to regress upon, they would have to manually look through their test cases to see if all the outputs were what they expect. This is tedious and error-prone, and I don’t know of a single student who went through this process. The result is that students barely tested their code at all – the opposite of the intended effect of making them write test cases.
