(Aside: I’ve been writing in 12-week blocks and then taking a week off, which means there won’t be a blog post next week. The pattern works out nicely because there are roughly 52 weeks in a year, and 52 / 4 = 13 = 12 + 1.)

I’m going to spend one more post thinking about teaching computer science to cognitive science students. I know this topic is super niche and probably irrelevant for most people, but this is the first time I’ve seriously thought about how I teach computer science to students who may not care about it (i.e., outside a computer science course).

- One issue I ran into is how to label abstract things like states. Computer science students understand that if I call something “State 1”, the “1” is just an arbitrary symbol; I could equivalently have called it “State A” or “State alpha”. This is second nature to me, but I now know that some students may treat that “1” as the *number* 1, then try to use it in calculations. “State A” doesn’t help either when I’m using algebraic notation. I’m not sure I need to change how I teach so much as be aware of the possibility of a use/mention confusion.
- In teaching algorithms to students, it’s useful to pick a particular presentation format. This is related to the misunderstanding above. Computer scientists know that how we manipulate data is unrelated to how we present that data; a graph search is a graph search whether I use a table to calculate the cost or draw out the graph with the cost written in the nodes. But other students may not understand this, and switching representations may cause confusion. One argument is that I should show multiple representations while teaching, so students can learn the difference between algorithm and presentation; the other is that I should pick one presentation format and stick with it. For now I’m going with the second approach, which is a little less work (aside from making sure the single format is understandable), but in the future I might try multiple formats instead.
- I don’t know how to generalize this, but make sure students know the difference between “the vertical difference” and “subtract the first y-value from the second y-value”. The former is never negative, while the latter could be. I’m still not sure how to teach this properly, since I defined my terms precisely and gave examples, so this may simply be another “watch out for it” error.
- On a related note, students often confuse the *function* of something with the exact *algorithm* that accomplishes that function. The specific example I ran into is the heuristic in A*. I explained how A* works in a grid world, where the heuristic is the Manhattan distance to the goal, then explained how A* works in a sliding puzzle, where the heuristic is the sum of the Manhattan distances from each tile to its final location. This is not a distinction that occurred to me before this semester, but the Manhattan distance is just a procedure, like addition, and is not related to heuristics in any real way (other than that, in these two cases, calculating the heuristic requires using the Manhattan distance). I had also shown a third example with word ladders, where the Manhattan distance doesn’t apply, but somehow the equivalence of “heuristic” with “Manhattan distance” still stuck. Maybe next time I should show the same domain with two heuristics, just to see how students react.
- Speaking of A*, one unintuitive aspect of the algorithm is that the node to be expanded jumps between different paths. The interesting thing is that students understand this for sliding puzzles, but *not* for mazes, where students would say that A* just expands along the solution. You can argue that this means students don’t actually understand the algorithm, but I would instead attribute this error to not being able to ignore “privileged” information. That is, to students, the solution is “obvious”, so they think that the algorithm will do the same thing, without realizing that the point of the algorithm is to discover the solution in the first place. This would explain why students get it in the sliding puzzle (where the solution *isn’t* obvious) but not in the maze.
- Finally, students have trouble understanding the difference between the most efficient solution to a puzzle and the most efficient way of finding that solution. My example is again A* and why it should be preferred over uninformed greedy search: both algorithms will give you the same, best solution, but A* would give it to you *faster*. To be fair, I should admit that I screwed up here. The actual question that I gave the students was “Why is the heuristic useful?”, and only when I started grading did I realize that the answer I wanted (algorithmic efficiency, not solution efficiency) wasn’t something the students would understand. So now I know not to ask that question.
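
The last three points can be made concrete with a small Python sketch. This is not the material I used in class; the grid, the function names, and the `zero` heuristic are all assumptions made up for illustration. The idea is that the search takes the heuristic as a parameter, so “Manhattan distance” is visibly just one procedure that a heuristic might call, and the same search run with and without it returns the same path cost while expanding a different number of cells.

```python
import heapq

def manhattan(a, b):
    """A general procedure: grid distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def zero(cell, goal):
    """A heuristic that knows nothing; with it, the search below is uninformed."""
    return 0

def tile_distance_sum(state, goal, width=3):
    """A sliding-puzzle heuristic that *uses* manhattan() once per tile.
    state and goal are tuples in row-major order, with 0 as the blank."""
    total = 0
    for tile in state:
        if tile != 0:
            here = divmod(state.index(tile), width)
            there = divmod(goal.index(tile), width)
            total += manhattan(here, there)
    return total

def search(grid, start, goal, heuristic):
    """Best-first search on a 4-connected grid with unit step costs.
    Returns (cost of the path found, number of cells expanded)."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(heuristic(start, goal), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    closed = set()
    expanded = 0
    while frontier:
        f, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale entry for a cell that was already expanded
        closed.add(cell)
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    heapq.heappush(frontier, (new_g + heuristic(nxt, goal), new_g, nxt))
    return None, expanded

# A made-up maze: a wall in column 4 forces a detour through row 0 or row 4.
GRID = [
    ".........",
    "....#....",
    "....#....",
    "....#....",
    ".........",
]
START, GOAL = (2, 2), (2, 6)

ucs_cost, ucs_expanded = search(GRID, START, GOAL, zero)
astar_cost, astar_expanded = search(GRID, START, GOAL, manhattan)
print("uninformed:", ucs_cost, "steps,", ucs_expanded, "cells expanded")
print("A*:        ", astar_cost, "steps,", astar_expanded, "cells expanded")
```

Both runs find a path of 8 steps, but the A* run expands fewer cells, which is the “algorithmic efficiency, not solution efficiency” answer I was hoping for. Printing `cell` inside the loop would also show the expanded cell jumping between the route over the wall and the route under it, even though the maze solution looks “obvious” to a human.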

Since realizing these mistakes, I’ve had the opportunity to teach a different algorithm (back propagation on a perceptron), and at the very least I picked a presentation format and stuck with it. I do think it helped, but that algorithm is also simpler than A*. I will be teaching this course again next semester, and I’ll report back on whether I can overcome the other issues.