Step 24: Review Learned Pedagogical Content Knowledge

(Aside: I’ve been writing in 12-week blocks and then taking a week off, meaning there won’t be a blog post next week. This pattern is nice because there are roughly 52 weeks in a year, and 52 / 4 = 13 = 12 + 1.)

I’m going to spend one more post thinking about teaching computer science to cognitive science students. I know this topic is super niche and probably irrelevant for most people, but this is the first time I’ve seriously thought about how I teach computer science to students who may not care (i.e. outside a computer science course).

  • One issue I ran into is how to label abstract things like states. Computer science students understand that if I call something “State 1”, the “1” is just an arbitrary symbol – I could have equivalently called it “State A” or “State alpha”. This is second nature to me, but I now know that some students may treat that “1” as the number 1, then try to use it in calculations. “State A” doesn’t help either when I’m using algebraic notation. I’m not sure that I need to change how I teach so much as be aware of the possibility of a use/mention confusion.
  • In teaching algorithms to students, it’s useful to pick a particular presentation format. This is related to the misunderstanding above. Computer scientists know that how we manipulate data is unrelated to how we present that data; a graph search is a graph search whether I use a table to calculate the cost or whether I draw out the graph with the cost written in the nodes. But other students may not understand this, and switching representations may cause confusion. One argument would be that I should show multiple representations while teaching, so students can learn the difference between algorithm and presentation; the other argument would be to pick one presentation format and stick with it. For now, I’m going with the second approach, which is a little bit less work (aside from making sure the single format is understandable), but in future I might try multiple formats instead.
  • I don’t know how to generalize this, but make sure students know the difference between “the vertical difference” and “subtract the first y-value from the second y-value”. The former is never negative, while the latter could be. I’m still not sure how to teach this properly, since I defined my terms precisely and gave examples, so this may simply be another “watch out for it” error.
  • On a related note, students often confuse the function of something with the exact algorithm that accomplishes that function. The specific example I ran into is for the heuristic in A*. I explained how A* works in a grid world, where the heuristic is the Manhattan distance to the goal, then explained how A* works in a sliding puzzle, where the heuristic is the sum of the Manhattan distances of each tile to its final location. This is not a distinction that occurred to me before this semester, but the Manhattan distance is just a procedure – like addition – and is not related to the heuristic in any real way (other than that, in these two cases, calculating the heuristic requires using the Manhattan distance). I had also shown a third example with word ladders, where the Manhattan distance doesn’t apply, but somehow the equivalence of “heuristic” with “Manhattan distance” still sticks. Maybe next time I should show the same domain with two heuristics, just to see how students would react.
  • Speaking of A*, one unintuitive aspect of the algorithm is that the node to be expanded jumps between different paths. The interesting thing is that students understand this for sliding puzzles, but not for mazes, where students would say that A* just expands along the solution. You can argue that this means students don’t actually understand the algorithm, but I would instead attribute this error to not being able to ignore “privileged” information. That is, to students, the solution is “obvious”, so they think that the algorithm will do the same thing – without realizing that the point of the algorithm is to discover the solution in the first place. This would explain why students get it in the sliding puzzle (where the solution isn’t obvious) but not in the maze.
  • Finally, students have trouble understanding the difference between the most efficient solution to a puzzle and the most efficient way of finding that solution. My example is again A* and why it should be preferred over uninformed greedy search – both algorithms will give you the same, best solution, but A* would give it to you faster. To be fair, I should admit that I screwed up here. The actual question that I gave the students was “Why is the heuristic useful?”, and only when I started grading did I realize that the answer I wanted (algorithmic efficiency, not solution efficiency) wasn’t something the students would understand. So now I know not to ask that question.
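
The last point, the same solution found with less work, can be sketched in a few lines. This is only an illustration under stated assumptions: the grid, the wall, and the names `search`, `manhattan`, and `zero` are all made up for this example, and uniform-cost search (a heuristic that always returns 0) stands in for the uninformed baseline.

```python
import heapq

def manhattan(a, b):
    """Grid distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def zero(a, b):
    """No information about the goal: the search becomes uniform-cost search."""
    return 0

def search(start, goal, walls, size, heuristic):
    """Expand cells in order of g + heuristic; return (path cost, cells expanded)."""
    frontier = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    expanded = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in expanded:
            continue  # stale queue entry for a cell already expanded
        expanded.add(cell)
        if cell == goal:
            return g, len(expanded)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not in_bounds or nxt in walls:
                continue
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + heuristic(nxt, goal), g + 1, nxt))
    return None, len(expanded)

# Hypothetical 10x10 grid with a short wall between start and goal.
WALLS = {(4, 4), (4, 5), (4, 6)}
START, GOAL, SIZE = (2, 5), (7, 5), 10

astar_cost, astar_expanded = search(START, GOAL, WALLS, SIZE, manhattan)
ucs_cost, ucs_expanded = search(START, GOAL, WALLS, SIZE, zero)
print("A*:", astar_cost, "steps,", astar_expanded, "cells expanded")
print("uniform-cost:", ucs_cost, "steps,", ucs_expanded, "cells expanded")
```

Both calls return the same path cost, since both searches are optimal on this grid; the difference shows up only in the second number, the count of cells each search had to expand before reaching the goal.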

I’ve had the opportunity since realizing these mistakes to teach a different algorithm (back propagation on a perceptron), and at the very least I picked a presentation format and stuck with it. I do think it helped, but that algorithm is also simpler than A*. I will be teaching this course again next semester, and I’ll report back to see if I can overcome the other issues.

Step 23: Respond to Peter Gray’s Free to Learn

Free to Learn on Goodreads

This post will hopefully be the start of a continuing series of posts giving my reactions to books I read, spread over both of my blogs depending on whether they are about education. I am calling these “book responses” instead of “book reviews” because, unlike most book reviews, I will not be talking about whether the book was well-written – that I am writing a response at all should answer that question. Instead, I will be writing about the connections that I made while reading the book, and the thoughts that those connections led to. So let’s get started!

The main premise of Peter Gray’s Free to Learn is that play is underrated in the current education system. While he does not give a concise definition of play, Gray’s position might be adequately summarized as the belief that children, if left to their own devices, will educate themselves. This education process is called “play”, and since it achieves the same benefits as the traditional compulsory schooling system, while being more enjoyable besides, it is therefore the preferable alternative. The book uses two main case studies: that of children in hunter-gatherer societies, and that of students at Sudbury schools. The benefits, however, may be boiled down to three “natural drives” that children have, and which are suppressed in school:

  1. Sociability: children tend to play with others, including children of different ages
  2. Curiosity: children want to learn about the world, and develop individual interests while doing so
  3. Playfulness: children enjoy learning, and often have fun doing it

One of the first questions I asked myself while reading this book is whether the general thesis is still true of higher education, as Gray is mostly addressing the K-12 system. For one, college is certainly not compulsory, at least not in the legal sense. One could argue, however, that the bachelor’s degree holds a cultural importance that pressures young adults to get one regardless of whether they want to. From this perspective, what matters is not whether people are required to go to college, but whether they feel compelled to despite how they might feel. Regardless, Gray does not give a definition of the existing schooling system either. So instead of asking whether young adults should play instead of attending college, I’m going to ask a different question: how can college classes beneficially incorporate the natural drives listed above?

Below is my attempt at answering that question, going down the list from the most developed/easiest to include, to the least developed/hardest to include.

First, sociability. According to Gray, children learn best when in mixed-age groups, for several reasons. The younger children get to observe and imitate others; the older children get to exercise their knowledge in teaching; and both groups learn mutual respect and conflict resolution. In general, the benefits of groupwork in education are well known. Reading about how older students mentor younger ones, I was specifically reminded of a workshop I attended on cooperative learning, given by Richard Felder. Felder is much more specific about how groupwork should be structured – such as how members must depend on each other to succeed, but at the same time be individually accountable for mastering all the material – but one idea that stuck out is the importance of forming the right teams, and particularly the use of students with a range of abilities. This presumably taps into how we learn by observation, and how we enjoy helping others achieve mastery as well.

One difference I found interesting between Gray’s playgroups and Felder’s teams is that Felder recommends that the instructor assign the students to teams, while Gray stresses that students benefit from choosing who they play with. One explanation for this may be the setting. Cooperative learning was presented as a strategy for the classroom, where students receive grades at the end, and therefore equality in ability between groups is necessary. Play, on the other hand, has no final evaluation, and so students may benefit from the comfort of deciding whether they want partners who struggle as much as they do, or partners who may provide them with help. I suspect, however, that the root distinction lies not in that, but in the degree to which we trust students to do what is good for them. Without meaning insult, one might say that Felder is pessimistic and believes that students need supervision in team selection, and that Gray is the opposite. Trusting student decisions is a recurring theme in the book, and one that I had a lot of trouble accepting.

Regardless of whether Felder’s exact recipe for cooperative learning is followed, groupwork has become widely used in education, and perhaps its success could be attributed to the psychology that Gray described.

The second drive that Gray presents is curiosity, by which he means children’s desire to understand the world around them. When asked, most teachers would of course say that they want their students to be curious; if pressed, however, they would likely also admit that students should not go off track in class, which seems counter to the fostering of curiosity. Perhaps a better description of the drive that Gray has in mind is that students should choose what and how they learn. This may include the choice of who they work with, but let’s restrict the topic to what and how they learn for now.

Again, in some ways, American colleges have already incorporated choice into the system: consider how students can often choose from multiple electives and, at a higher level, select the majors (and minors) that interest them the most. This contrasts with the British university system, where although students do have the choice of a single major (often with no ability to go back on that choice), minors are nonexistent and electives are few and far between. Furthermore, depending on the course, students of different ages and abilities could end up working together, thus achieving some of the mixed-age/ability groups that were described above.

The way Gray describes student choice, however, extends beyond the choice of courses. For one, neither hunter-gatherers nor Sudbury had formal classes; rather, students learned what they wanted in a free-form setting. Gray emphasized how the idea of grades in K-12 is a result of the industrial mindset, that children are products moving in an assembly line, and that this is driven by the need for efficiency (i.e. it is easier to teach children in batches of roughly the same age). It occurred to me, however, that college courses exist for the same reason; if we had unlimited resources, having a predetermined syllabus for 15 weeks would not make sense, since we could instead adapt to each student and what we think they need to know. A course syllabus, in other words, is the result of averaging across the needs of the thirty students in one course, necessary only because we cannot teach thirty courses each with one student.

One possibility for increasing the amount of student choice in a course is therefore to poll students for what they want to learn, instead of fixing the topics of a course beforehand. The Sudbury schools call it self-governance, a way of increasing student buy-in; I prefer to call it meaningful action, after one of the categories of human information need in the reasonable person model (as proposed by Stephen and Rachel Kaplan, with the former of whom I took two classes). The idea is that people like to know they matter, that their opinions and actions make a difference. By providing an avenue where the desires of students can be taken into account, students are more likely to accept and respect the resulting plan. In fact, aspects of this were included in Felder’s recipe as well; students are encouraged to create a team contract amongst themselves that specifies their expectations of each other, as well as how conflicts should be resolved without the last resort of talking to the instructor.

Giving students choice may be beneficial, but in my experience it is rarely done in the college setting, at least not in the extreme “decide what we should do in this course” sense. One can speculate about a number of reasons: that it requires more preparation that must be done during the semester, that students’ suggestions may not be feasible to implement, or that students may not know what topics to suggest in the first place. For me, this last concern hints at the underlying tension I feel in allowing student choice, and it is a question that persisted throughout my reading of Gray’s book: how do we ensure that students choose to study the necessary core skills and knowledge? And here the issue of trust returns. Gray’s argument, I suspect, would be that if the skills we want students to learn are in fact as necessary as we say they are, then students will choose them because they lie on the path to what they want to accomplish – with the implication, of course, being that if they are not truly necessary then we should not force students to learn them. One wonders whether this is consistent with, or whether it goes against, the philosophy of a liberal arts education.

For the most part, I am sympathetic to Gray’s argument, or at least to the belief that the instructor should be able to justify what is taught beyond “you will need it later”. I do think, however, that students need to be guided to understand the consequences of their choices, and I’m still learning how to provide this guidance without influencing their decision.

The third and final drive is playfulness, about which Gray cites research that suggests it helps problem solving and creativity. I don’t doubt that this is true, but what was more interesting to me is that play seems so hard to define. “Play is serious, yet not serious” – consider a pickup basketball game, where the stakes are low, but players still try their hardest to win – until one team is clearly weaker, in which case players may switch sides to balance things out. Gray goes as far as to say that playing – or rather, playfulness – is entirely about motivation and mental attitude, and is not about the specific behaviors at all (although it’s hard for me to imagine a group of competitive basketball players consenting to trade a teammate for balancing reasons, not without themselves only playing at the time). The one characteristic of play that I do agree with is that it seems to be more about the means than the end – playing basketball is different from playing at basketball, which is why NBA teams don’t switch members in the middle of a tournament game.

The strongest connection I made to playfulness is not to education literature, but to James Carse’s Finite and Infinite Games, which I have written about elsewhere. Finite games, in Carse’s book, are games which have predefined rules and a specific goal, where players often rank themselves based on how well they do after the game ends. Infinite games, on the other hand, have neither goals nor rules – or rather, both are flexible and are adjusted for the purpose of continuing play. Of course, these are not so much descriptions of specific games as they are descriptions of how players play specific games, just like the comparison of an NBA match versus a 3-on-3 pickup at the gym.

The idea of infinite games has come to be central to my philosophy, and since it’s hard to do it justice here, I will instead return to the topic of education. Taking for granted the assumption that playful students are a good thing, the question then becomes how an educator can induce playfulness in the classroom. This question is almost paradoxical: most descriptions of playfulness suggest that you cannot be forced to play and that, in the words of Carse, whoever must play, cannot play. I sometimes tell my students to take a coded-up example of something, then go and play with it – but the results are never quite what I want. It has never been clear to me whether it’s because the students are not truly playing (because I had told them to play), or if they lack the background knowledge to play at the level I want (which, again, sounds paradoxical). It’s like how Robert Pirsig described teaching rhetoric in Zen and the Art of Motorcycle Maintenance (emphasis mine):

He became convinced that all the writers the students were supposed to mimic wrote without rules, putting down whatever sounded right, then going back to see if it still sounded right and changing it if it didn’t… But how’re you to teach something that isn’t premeditated? It was a seemingly impossible requirement. He just took the text and commented on it in an unpremeditated way and hoped the students would get something from that. It wasn’t satisfactory.

I suspect the answer is that you can’t teach students to play, not directly. It’s a culture thing, a feature of the environment. It’s why in teaching programming we often ask students to “just try it and see what happens”. Sure, they learn by doing, and that’s often the excuse teachers use to justify telling students to try things. But there is also the willingness to explore, to temporarily give up whatever problem they’re immediately trying to solve and learn something new, the playfulness – with the hope that, eventually, the student will just try things and see what happens and play with the technology. It’s one way of approaching it, but “it wasn’t satisfactory”, and at times teaching playfulness seems to require setting up the entire environment in which the student grew up.

I’m making less and less sense, so let me just say that I agree with Gray on playfulness being a desirable virtue. There do seem to be teachers who can maintain a playfulness in their classroom. I don’t know whether or not the students become playful after interacting with those teachers – nor do I have a good metric to answer that question; but having a playful classroom is better than not having one, and that seems to be one step forward.

So those are the three “natural drives” Gray started out with. To the extent that I think these drives exist, and that satisfying them makes for happier and more productive students, I think they have to be taken into account when considering the classroom. Some of these, like sociability and to a lesser extent curiosity/meaningful action, have crept into the standard repertoire of teachers; others, like playfulness, remain ill-defined, never mind standardized. What Gray’s book does well, I think, is linking these existing fragments of thought into a coherent narrative about play, and it serves as a great mid-semester reminder of some of the things I want to achieve in teaching.

Step 22: Learn to Teach AI to Cognitive Science Students

I think I’ve figured out what the problem is.

I have written multiple posts already on the things I am learning about teaching cognitive science: how people use readings differently, how we team-teach to provide different perspectives, and so on. The most recent development is that I spent a lecture on the A* algorithm, which we teach to give students a perspective on how computers solve problems. (Aside: the course used to teach depth- and breadth-first search, but I figured that A* is intuitive enough that students can get the idea. A* was also the topic of my teaching demonstration, so that was mildly amusing.) Then we assigned homework, which made students follow several steps of the algorithm by hand. Keep in mind that this is the first real graded work that we have received from students.

And students all failed.

No, that’s not true, but the distribution for that question is definitely more uniform and longer-tailed than the distribution for the other questions. Some students performed best-first search instead (without knowing it), while others explained their answers with “intuition” without providing the calculations. I had anticipated that the scores would be lower, but not to this extent. This is especially so since I thought the lecture went well, and students seemed to follow along or even preempted me on the introduction of the heuristic. Maybe one student out of the fifty had questions during office hours, so I thought it wasn’t a big deal.

While this was going on, I also got back an assignment I gave to my Topics in AI students. To accommodate both students who know how to code and those who do not, I gave them two choices. The first is to implement a lambda-Q-learner (i.e. with eligibility traces), then test its performance on different domain and agent parameters. For the other students, their assignment was to describe a real task in reinforcement learning terms, by specifying what the environment consists of, how they would represent the state, what actions are available, and what the rewards would be, and justifying all of the above. The intention was for students to recognize the cases that cause problems with reinforcement learning (e.g. large state spaces, exploration/exploitation) and use them to explain their choice.

As is the theme for this post, the assignment didn’t go as planned. Their representations of state are often valid – discretizing GPS locations into a larger grid, for example – but what they didn’t do is explain why they did this, or what they are trading off when they make that decision.
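
To make the trade-off concrete, here is a minimal sketch of discretizing GPS coordinates into grid cells. The function names, the `cells_per_degree` parameter, and the sample coordinates are all hypothetical and not taken from any student’s assignment.

```python
import math

def grid_cell(lat, lon, cells_per_degree):
    """Map a GPS position to the (row, column) of the grid cell containing it."""
    return (math.floor(lat * cells_per_degree), math.floor(lon * cells_per_degree))

def states_per_square_degree(cells_per_degree):
    """Number of distinct location states per 1x1 degree area."""
    return cells_per_degree ** 2

# Two hypothetical nearby positions.
a = (42.2808, -83.7430)
b = (42.2852, -83.7410)

for cells_per_degree in (100, 1000):
    same_state = grid_cell(*a, cells_per_degree) == grid_cell(*b, cells_per_degree)
    print(cells_per_degree, states_per_square_degree(cells_per_degree), same_state)
```

With the coarser grid there are a hundred times fewer states for the learner to visit, but the two positions collapse into one state and the agent can no longer tell them apart; the finer grid separates them at the price of a much larger state space. That is the kind of justification the assignment was asking for.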

Thinking about both of these failures, what I realized is that I was missing some crucial pieces of pedagogical content knowledge. Or rather, there’s something that the idea of pedagogical content knowledge doesn’t capture: the background knowledge and training of the students. This is not surprising in any way – one of the first things I tell new teachers is to consider what students already know – but I feel this is a different kind of knowledge that I have failed to account for. It’s not the students’ lack of understanding of A* that matters, but their unfamiliarity with the methods of computer science itself.

For example, one explanation for students performing best-first search instead of A* is that they don’t understand the technical definitions of “cost” and “heuristic”, since this is not a precision necessary for normal essay writing. Similarly, some students were confused about the Manhattan distance as a calculation, versus its use as a heuristic. It had never occurred to me before that these are actually different, and the Manhattan distance is in fact used differently in different problems – say, in a grid world where the Manhattan distance is used directly, versus in a sliding puzzle where the Manhattan distance is calculated for each tile, with the heuristic being the sum of these distances. Again, this is not a mistake I would have anticipated before looking at student work.
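
The distinction between the calculation and the heuristic can be sketched in code. This is an illustration only: the function names and the sample board are invented here, and the misplaced-tiles count is included as one commonly used alternative heuristic, not something from the lecture.

```python
def manhattan(a, b):
    """A plain calculation: horizontal plus vertical distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def grid_heuristic(cell, goal):
    """Grid world: the heuristic is the Manhattan distance itself."""
    return manhattan(cell, goal)

def puzzle_heuristic(board, goal_board, width=3):
    """Sliding puzzle: the heuristic sums one Manhattan distance per tile."""
    total = 0
    for index, tile in enumerate(board):
        if tile == 0:
            continue  # 0 marks the blank, which is not counted
        goal_index = goal_board.index(tile)
        total += manhattan(divmod(index, width), divmod(goal_index, width))
    return total

def misplaced_tiles(board, goal_board):
    """A different sliding-puzzle heuristic that never calls manhattan()."""
    return sum(1 for tile, target in zip(board, goal_board)
               if tile != 0 and tile != target)

# Hypothetical 3x3 boards, read row by row.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
BOARD = (1, 2, 3, 5, 7, 6, 4, 0, 8)

print(grid_heuristic((0, 0), (3, 2)))  # 5
print(puzzle_heuristic(BOARD, GOAL))   # 5
print(misplaced_tiles(BOARD, GOAL))    # 4
```

Here `manhattan` is only a helper. The two heuristics that use it do so in different ways, and `misplaced_tiles` is a heuristic for the same puzzle that does not use it at all.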

My students in Topics in AI have a similar problem, if only on a more abstract level. I take more fault here – I should have gone through an example first, and also have been more specific on what questions they should be answering. In this case, though, it was the mental habits that the students have yet to build, of asking the questions that computer scientists tend to ask. Of course, this is exactly what computer science is about – representing the relevant information so that algorithms can take advantage of it. However, I have yet to figure out how to explain this to students, never mind teaching them to think about the representations. I wonder how long it takes students to internalize asking these kinds of questions.

Now that I have some experience teaching cognitive science students, I can start adapting my existing pedagogical content knowledge. Although I’m not happy to have made the mistakes I have, I also think it’s fascinating that I still have so much to learn about teaching computer science.
