Step 45: Include Diversity in Class Content (part 2)

One aspect of computer science I really like, but that I don’t get to talk about often, is that it makes clear the map-territory distinction. What we represent in a computer program (the map) is necessarily a simplification of the real thing we are trying to represent (the territory). This means we must pick our representations carefully, so that everything we want to capture can be expressed with that representation. The opposite problem – of having too expressive a representation – is only a problem if efficiency is necessary.

I think the general idea is something that even beginner students realize relatively quickly. In my previous post about diversity, I asked students to verify that a chapter and verse number from the Quran is valid, which means that their program (the map) must represent the chapter and the verse (the territory). This example is somewhat trivial, and even my current students will dismiss it as such. What I don’t think they realize, however, is that the principle runs deeper than that, and their choice of representation may have consequences that they did not foresee.

I decided to try and get this message across in the lab last week. We had just covered classes and exception handling, so the first part of the lab asked them to write a program to survey visitors on campus, with functions that verify the input (name, age, gender, phone number, etc.). I helpfully provided short vignettes like this one:

  • Rudolph Feierabend has lived in Eagle Rock/north-east LA since 1961, when his parents immigrated to the US, bringing then-seventeen Rolf in tow. He has been retired for a couple years now, and takes his husky lab Winston on walks through campus every so often. Although he has an email account, Rolf says he doesn’t check it anymore and that you should just call him at home at (323) 827-6316.

That was about the level of instruction I gave to students, leaving it up to them as to what they should check for. The object-oriented programming was straightforward, while students had to think through how they would verify the various fields. Some students made sure that names do not have punctuation, some students checked that a phone number had 10 or 11 digits (depending on whether the user typed “+1”), some students asked for the city and state of origin separately. Fairly standard programming exercise, took 1.5-2 hours, with just enough creative freedom to not be boring.

Once students showed me their code works (with me playing devil’s advocate on what they may want to rule out), the second part of the lab springs the trap. I gave students two new vignettes:

  • Chun Ying Tsang (20) is an exchange student from Birmingham in England. He is here for the summer program, and only just moved into his dorm yesterday. Although Chun Ying has an email address, he only has his UK cell phone, +44 075 9921 9264.
  • Ash Reid-Chapman (33) has been invited to speak at an event co-hosted by CODE and Project SAFE. Drawing on personal experience growing up in Portage, MI, Ash will be talking about the discrimination faced by the transgender community, and how allies can support those who are transitioning. Due to previous harassment, Ash refused to give out a phone number or an email, instead directing you to the National Center for Transgender Equality website.

The twist, of course, is that people are more complicated and diverse than students thought. My original vignettes deliberately depicted a homogeneous sample of visitors: single-word first and last names, easily identifiable as male or female, all with US phone numbers and places of origin. I had to be a little evasive with how I answer student questions – asking if they themselves would be happy with their checks if they needed the data – but everyone made at least one assumption that didn’t hold. The verification for gender caught the fewest students (although one or two pairs did enforce a binary gender), while the verification for phone number caught most groups (some groups allowed anything with seven or more digits). The hyphenated last name caught some groups that only allowed letters (I should have had a name with an apostrophe as well), and I don’t think a single group allowed the user to not provide an answer.
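To make the trap concrete, here is a minimal sketch of the kind of checks described above. It is not any student's actual code: the function names are hypothetical, and the relaxed rules (7 to 15 digits, hyphens and apostrophes allowed, blank answers accepted) are assumptions chosen for illustration rather than a complete answer.

```python
import re


def naive_phone_ok(phone):
    """US-only assumption: 10 digits, or 11 digits starting with 1."""
    digits = re.sub(r"\D", "", phone)
    return len(digits) == 10 or (len(digits) == 11 and digits.startswith("1"))


def naive_name_ok(name):
    """Assumes names contain only letters and spaces."""
    return name.strip() != "" and all(ch.isalpha() or ch == " " for ch in name)


def relaxed_phone_ok(phone):
    """Allow no answer at all, or any number with 7 to 15 digits (an assumption)."""
    if phone is None or phone.strip() == "":
        return True
    digits = re.sub(r"\D", "", phone)
    return 7 <= len(digits) <= 15


def relaxed_name_ok(name):
    """Also allow hyphens and apostrophes inside names."""
    return name.strip() != "" and all(ch.isalpha() or ch in " -'" for ch in name)


# The first vignette passes the naive checks...
print(naive_phone_ok("(323) 827-6316"))        # True
print(naive_name_ok("Rudolph Feierabend"))     # True

# ...but the later vignettes do not.
print(naive_phone_ok("+44 075 9921 9264"))     # False
print(naive_name_ok("Ash Reid-Chapman"))       # False

# The relaxed checks accept them, including a declined phone number.
print(relaxed_phone_ok("+44 075 9921 9264"))   # True
print(relaxed_name_ok("Ash Reid-Chapman"))     # True
print(relaxed_phone_ok(None))                  # True
```

Even the relaxed versions encode assumptions of their own, which is the point of the exercise.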

The lab actually goes one step further, by asking students if they can think of other kinds of information that may be distorted. Many students picked up on the issues of race (ie. what if you are bi-/multi-racial?); other students found more interesting examples, such as how Facebook limits the types of “reactions” you can have, or how filters for swearing could accidentally ban real people/location names (aka. the Scunthorpe problem). The last question in the lab asks generally what students learned about computer science. From reading the students’ answers, I’m not convinced that even half of them realize how deep this map/territory disconnect could go, but I do think that at least some students understood the difficulty – or even the impossibility – of accurately representing the range of diversity that exists, or of not imposing any assumptions on the data you might get. And this is not even touching on the tradeoff between data validity and data representativeness.
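The swear-filter example is easy to reproduce. The sketch below uses a hypothetical one-word blocklist and two hypothetical function names; it only illustrates how substring matching flags innocent names, and the whole-word variant is one possible mitigation with blind spots of its own.

```python
import re

# Hypothetical blocklist, kept deliberately mild for illustration.
BLOCKED_WORDS = ["hell"]


def naive_filter_flags(text):
    """Flag the text if any blocked word appears anywhere inside it."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def whole_word_filter_flags(text):
    """Flag the text only if a blocked word appears as a complete word."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)


print(naive_filter_flags("Michelle"))          # True: a real name is rejected
print(whole_word_filter_flags("Michelle"))     # False
print(whole_word_filter_flags("go to hell!"))  # True
```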

I tweeted the night before that this may be my best lab yet, and I stand by that assessment now. This is an exercise that I’ve wanted to do since the beginning of the semester, and I’m really happy that it turned out well. To me, this is exactly the right mix of technical knowhow and broader impact that a computer science course (introductory or not) should have, especially in a liberal arts setting.

The instructions and code for this lab can be found on GitHub.


Step 44: Build the CS Habit of Testing

One of the things I promised my students I would do – and which I finally did after coming back from spring break – is limit how many times they can submit to the autograder. At the beginning of the semester, students had unlimited submissions. The idea was that, since they were just learning programming, it was more important that they get practice with the concepts of variables and branching and looping. At the time, it was not unheard of for a student to submit to the autograder ten times in as many minutes. They would wait until they saw the first failed test case – the results come back live – then tweak their code (intelligently or not) and submit again. Not everyone did this, but many students did take seven or so submissions before passing all the tests.

About a month into the semester, I started making the students write test cases. This was mostly in lab, with instructions that they must write and submit the test cases first before they are allowed to work on the actual program. The test cases then become part of their grade, based on how many other students’ bad code the tests catch. I don’t have a good impression of the number of autograder submissions (and I’m too lazy to look now), but I do remember students actually running their own test cases to see if their code works. I didn’t tell the students how many test cases they should write, only that they should write enough so that, if there is a bug in their code, they should be confident that a test case will fail. It took half a lab before students experientially understood what I meant, and some students needed more prompting to use their own test cases than others, but in general students adopted the idea of running test cases first before submitting.
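The grading idea can be sketched roughly as follows. This is not the actual autograder: the function under test, the two buggy variants, and the scoring function are all hypothetical, and a test suite is simply a list of (input, expected output) pairs.

```python
def reference_max(numbers):
    """Correct implementation: largest value in a non-empty list."""
    largest = numbers[0]
    for n in numbers[1:]:
        if n > largest:
            largest = n
    return largest


def buggy_assumes_positive(numbers):
    """Bug: starts from 0, so it is wrong when every value is negative."""
    largest = 0
    for n in numbers:
        if n > largest:
            largest = n
    return largest


def buggy_skips_last(numbers):
    """Bug: never looks at the final element."""
    largest = numbers[0]
    for n in numbers[:-1]:
        if n > largest:
            largest = n
    return largest


def caught_by(tests, implementation):
    """Return True if at least one test case fails for this implementation."""
    for argument, expected in tests:
        if implementation(argument) != expected:
            return True
    return False


def score_tests(tests, buggy_implementations):
    """Count how many buggy implementations the test suite catches."""
    return sum(1 for impl in buggy_implementations if caught_by(tests, impl))


buggy = [buggy_assumes_positive, buggy_skips_last]
weak_tests = [([1, 3, 2], 3)]
strong_tests = [([1, 3, 2], 3), ([-5, -2, -9], -2), ([1, 2, 9], 9)]

print(score_tests(weak_tests, buggy))    # 0: both bugs slip through
print(score_tests(strong_tests, buggy))  # 2: both bugs are caught
```

A single "typical" test passes every version here, which is why the instruction was to write enough tests that a bug would make at least one of them fail.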

Then, for the homework due this week, I told them that they only had five submissions. Period. The assignment wasn’t a trivial one – it required a good dose of problem solving/sequencing, and also reasoning about loops. Like I said above, I have threatened to do so for a while now, and although students were initially apprehensive of when it would finally happen, I don’t think I heard a single complaint when I announced it. This submission limit was even more restrictive than Michigan’s, which was two submissions per day. But, as of this writing an hour away from the deadline, more than half the class has passed all the tests, and none of them used more than five submissions, with many doing it in their first or second attempt.

One time is not a habit, but for now I will say that I am really happy with how this turned out. The weird thing is that I’m actually not sure which components led to this result. I don’t feel like I have done anything special, and yet clearly something is different, because most Michigan students never took the initiative to test each function they wrote. I was talking to another faculty member today, who had previously taught at UCLA, and he said that the students here were more driven than the undergraduates at UCLA, and I am inclined to agree with him. I feel as though the students here are more willing to take that extra step, and not just do what you literally tell them then stop. Maybe it has to do with building trust as well – much easier in a class of 40 than in a class of 140 – so students are more willing to try what you say and see if it works out.

Again, I’m not sure what I did, or if I did anything at all, but I’m elated that it is working.


Step 43: Advise Students Unofficially

I feel competent in most parts of my new job as a professor – I have taught before, I have done research before, even if they were in a different context. One role in which I feel inadequate, however, is being an advisor to students. First-year faculty at Oxy do not officially have advisees, so we can learn the curricular requirements first. As one of two computer science professors, however, I inevitably get questions from students about whether and how they should continue with computer science. This number has recently increased as I meet students who have only just discovered computer science through my new intro course, while the students I’ve built a relationship with are also coming to ask about careers and grad school.

For grad school – or at least PhD programs – I feel fairly comfortable with my advice. I had the opportunity to mentor an undergraduate for research during my last year of grad school, and they asked me whether I thought they should get a PhD. I didn’t know how to advise them at the time, and although they ultimately settled on Master’s programs instead, my insufficient mentorship stuck in my head. Getting a PhD is a grueling process, and financially it is almost never the right decision. I asked for suggestions from some friends who have taught for a couple years. The best rule of thumb I’ve heard is that the student must be able to succinctly summarize why they need a PhD for what they want to do, at the level of “I want to be a professor”, and not just “I want to be on the cutting edge of human knowledge”. This rule will exclude some potentially brilliant researchers, but it seems to be a good first-order approximation. I haven’t had enough students for this rule to apply.

The more general category of career advice is actually my favorite. I enjoy hearing what students want to do with their lives, the long-term plans they might have and why they are important. Most students who come to me for career advice are interested in technology in some form or another, but part of the fun is seeing the diversity of interests. I have talked to a student who is dead set on joining or creating a startup, while a different student is interested in social justice and data journalism. While I don’t have any inside scoop on either of these fields, I have a rough idea of the skillsets that these careers require, and can point students in the right direction or help them decide if it’s what they really want to do.

Strangely, I am the most anxious about advising students on how to continue with computer science. On one hand, as a faculty member trying to build a new department, I want more computer science minors to show demand. On the other hand, I think my target student population for introductory computer science courses is exactly students who would not become minors or majors. I would rather see arts, history, and biology majors doing computational work in their fields than see them switch to computer science. I often end up telling students to forget about minor requirements and just take whatever classes interest them (a strategy I have mentioned before).

The flux of the computer science curriculum does not help. We barely have enough faculty to teach all the existing courses, and there are more courses I personally want to teach that we cannot offer. As a result, I have suggested that students do an independent study with me as their next step, since the course they actually want to take has not been created yet. This is beneficial for the student, who would get the education they want, while also giving me experience on how to run the full course in the future. This policy probably makes some faculty unhappy, and definitely skews various statistics for institutional research, but I can’t think of a better way to compensate for what is essentially our inability to meet students’ needs.

Starting in the fall, I will have my own group of freshmen advisees, and will likely be advising new cognitive science majors. I will get a small amount of training before then, but I don’t expect it to solve my computer science problems.


Step 42: Include Diversity in Class Content

One result of the student protests last semester is that faculty are having more deliberate discussions about diversity in the classroom. I am meeting semi-regularly with a group of faculty this semester – a “Faculty Learning Community”, or FLC – to talk about diversity in science classes. This has been a long-standing issue, of course, with the physical sciences facing a particularly homogeneous student population, and computer science being one of the worst disciplines among those. The goal of the FLC is to talk not only about recruitment, but also about other ways we can retain and support disciplinary minorities.

I have never thought about all the different ways diversity might be worked into the classroom. One topic we discussed in the last meeting is including stories and papers from researchers who are not cis-gendered white men. This may be harder for introductory computer science courses that follow a single textbook, but I can see how talking about (for example) Lynn Conway could counter the standard stereotypes. Although this is valuable, the FLC faculty are also interested in including diversity in the content of the course. This is a lot harder, since it is difficult to find room for such “digressions” in a science course – how would racial and social diversity fit into an organic chemistry lecture?

I am lucky in that cognitive science is more amenable to such discussion. Our introductory course does talk about issues of researcher and subject bias (ie. how most psychology experiments are done on undergraduates at highly-ranked colleges) and also of stereotype threat. My own lectures mention how language could change our recollection of an event, which would lead to unconscious bias due to priming effects.

But I want to talk more about computer science. As a computer scientist, I am an odd duck at the FLC meetings. Introductory computer science courses focus on developing students’ computational thinking, which puts little to no constraint on the specific examples that demonstrate the principles and abstractions. With the student protests on my mind, I semi-deliberately looked for non-mainstream examples this semester. Surprisingly, I found two without even trying particularly hard:

  • In a homework about if-then statements, I originally wrote a question asking students to write a function to verify Biblical verses. The function would get a chapter and a verse number, then return true only if that verse exists (in the book of Genesis, say). Halfway through writing the question, I realized that there is no reason to use the Bible as the text. I could just as well use the Quran – except for the fact that I myself didn’t know whether the Quran even had chapters and verses! A quick scan of Wikipedia told me that they do in fact exist as surahs and ayat (singular: ayah), and I made sure to check my vocabulary with a student familiar with the Quran. The final homework question asked my students to look up the number of ayat on Wikipedia:

    Complete the following function, which takes a surah and ayah number (roughly the equivalent of chapter and verse numbers) from the Quran, and returns True if it’s a valid surah and ayah number, and False if it is not. Although the Quran has 114 surahs, your program only needs to work for the first ten (ie. you may assume the surah argument will be an integer between 1 and 10, inclusive). You may further assume that the ayah argument will be an integer. A list of surahs and the number of ayat each contains can be found on Wikipedia (it’s the number outside the parentheses).

  • The other example was also for if-then statements, this time from lecture. I was looking for a way to demonstrate how a function may not have a return value if students are not careful with their branches. (In C++, this is the “control reaches end of non-void function” warning; in Python, this introduces the special value None.) I eventually settled on first presenting a seemingly-stereotyped function:
    def background_color(gender):
        if gender == 'male':
            return 'blue'
        if gender == 'female':
            return 'red'

    I then asked students what the following call would return:


    I didn’t think of it at the time, but I could have asked students what would happen if the argument was a Boolean variable is_male instead. I would then guide them to realize that non-binary gender identities would then be considered “female”, and that the opposite would be true if the variable was is_female.
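Both examples in the list above can be sketched together. The names `AYAT_PER_SURAH`, `is_valid_ayah`, and `background_color_from_flag` are hypothetical, the ayah counts are the commonly cited ones and should be checked against the Wikipedia list, and the lookup table stands in for the chain of if statements the homework actually calls for. The call with `'non-binary'` is an illustrative stand-in, not necessarily the call shown in lecture.

```python
# Ayah counts for the first ten surahs (assumed values; verify before reuse).
AYAT_PER_SURAH = {1: 7, 2: 286, 3: 200, 4: 176, 5: 120,
                  6: 165, 7: 206, 8: 75, 9: 129, 10: 109}


def is_valid_ayah(surah, ayah):
    """Return True if the ayah number exists in the given surah (1 to 10)."""
    return 1 <= ayah <= AYAT_PER_SURAH[surah]


def background_color(gender):
    """The lecture function: no branch handles any other value."""
    if gender == 'male':
        return 'blue'
    if gender == 'female':
        return 'red'
    # Falling off the end of the function returns None implicitly.


def background_color_from_flag(is_male):
    """Boolean representation: everything that is not male becomes 'red'."""
    if is_male:
        return 'blue'
    return 'red'


print(is_valid_ayah(1, 7))                # True
print(is_valid_ayah(1, 8))                # False
print(background_color('non-binary'))     # None
print(background_color_from_flag(False))  # red, whatever the person's identity
```

The Boolean version never returns None, but only because the representation has already discarded the information that would have exposed the gap.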

In fact, I had originally envisioned the second example to be part of a larger theme in the course, on how the representation of data constrains/influences the algorithm. In the first lecture of the semester, I had talked about how Facebook used to only allow five relationship statuses: “single”, “in a relationship”, “married”, “divorced”, and “it’s complicated”. (I am not sure if this is true, but it’s a good story.) The point is that not all relationships fit neatly into these five categories, and a better representation would be to allow users to type in whatever they want (which I believe is Facebook’s current solution). A related lesson would be how the technology we create has unintended side-effects that may disadvantage particular populations. I didn’t have time to integrate such a theme this semester, but now that I have more of the mechanics of the course down, I will try to incorporate it next time.

My point is that computer science classes have many opportunities to address and demonstrate diversity. Although recruitment of disciplinary minorities must occur first, I hope even these small examples will reduce the isolation that non-traditional students feel, and perhaps also open up additional conversation between faculty and students.


Step 41: (Fail to) Find Loop Resources

Spring break is next week, so I’m predictably buried in work. I am still working with my students to master loops, and I’m learning a lot about how students think about loops. In education this is called pedagogical content knowledge (PCK) – not just pedagogical knowledge (ie. how to teach), not just content knowledge (ie. how to program), but knowledge of how to teach programming. Here’s a list of PCK I’ve gained in the last three weeks:

  • Looping over indices of a list, as opposed to looping over the elements of a list, confuses students. I think this is one advantage of C++ or Java over Python: there is exactly one way of looping through a list. At least, this was the case before the foreach syntax was introduced to either language; I don’t know if that syntax is used in introductory courses.

    It makes sense in hindsight to stick with one kind of loop, but I have a hard time saying which kind it should be. I have read that students find operations on sets easier to understand, but this may get confusing if students have to get things out of a list. I also know a student who prefers looping over indices, but that’s closer to anecdote than data (as though this whole blog is anything but).

  • Related to this, dictionaries should not be introduced until students are comfortable with lists and loops. The relationship between the key and the value, and the intricacies of each (for example, that keys are unique but values are not) only confuse students. In fact, I’ve found that students are more confused by dictionaries than they are by nested loops.
  • While loops don’t seem to be as confusing, although there are not many uses for while loops in Python. The only time a while loop has been necessary is getting verified input from the user, and I provided that for the students. The only other time it has come up was with a Collatz sequence example, but outside of such numerical examples where you wait for a value to converge, most while loops can be written as for loops instead. This means that while loops can be introduced later without affecting what students can do.
  • Students often want to do too much at once. For example, when counting how many times different things occur, they often want to get this number in one step, instead of incrementing the number every time. I suspect this is related to the first point in this list, namely, that Python offers too much choice. Python lists have a .count method that students want to use, even if that function doesn’t do what they want; C++ arrays have no such convenience method. As a result, students learning C++ know immediately that they have to loop, while students learning Python waste time figuring out whether the function is the right one to use.

    The tradeoff here, which I haven’t found the equilibrium to, is how to allow students to efficiently write cool programs while also becoming competent with loops. This semester I created a worksheet that asks students to write their own versions of the library functions, which gives students practice with loops while conveying that they’ll survive without the library functions. I’m not convinced that this is the best approach.

  • Another reason for doing too much at once is that accumulating values incrementally is unintuitive. Creating a variable outside a loop that keeps changing does not come naturally to students. This applies to the counting example, but also to building up other values (lists, strings, etc.). I’m not entirely sure how students tackle this problem, but it seems to be something about the reassignment of the variable in the loop.
  • A different conceptual block is how to convert repeating code into a loop, and in particular, how to update values in preparation for the next iteration. For example, students could write an unrolled program that calculates the Fibonacci sequence, but they would use a new variable for every term, and it takes prodding for them to see how you shuffle values to turn it into a loop. I’m inclined to say that it’s something about seeing the terms absolutely (ie. the 6th Fibonacci number is the sum of the 5th and the 4th Fibonacci numbers) as opposed to relatively (ie. the n-th Fibonacci number is the sum of the n-1-th and the n-2-th Fibonacci numbers). I don’t quite see how this will help me teach in the future, but now I wonder how high school math teachers introduce sequences.
  • The final difficulty I will mention here is not strictly about loops: students have trouble seeing whether they should be ruling things out or counting things in. In loops, one example is writing a function that checks if a list only contains odd numbers; the idea that they can quit (and return false) when they hit an even number does not occur to them. The opposite scenario (whether a list contains at least one odd number) is equally difficult.

    Outside of loops, this manifests in most Boolean functions. For checking if a year, month, and day is after a particular date, ruling things out and counting things in are equally hard. For checking if a coordinate is within an area, the shape of that area determines which one is easier – but being able to estimate the difficulty is a skill that students haven’t built up yet. This causes them to struggle with the problem, even if they know how to write the code once the correct approach is pointed out to them.
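The first few points in the list above are easier to see side by side. This is only a sketch with hypothetical function names: the same sum written with both looping styles, a hand-written count of the kind the worksheet asks for, and the Collatz example as a case where a while loop is the natural fit.

```python
def total_by_element(numbers):
    """Loop over the elements of the list directly."""
    total = 0
    for n in numbers:
        total += n
    return total


def total_by_index(numbers):
    """Loop over the indices and look each element up."""
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total


def count_occurrences(items, target):
    """Hand-written counting: increment every time the target is seen."""
    count = 0
    for item in items:
        if item == target:
            count += 1
    return count


def collatz_steps(n):
    """Count the steps until the Collatz sequence starting at n reaches 1."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


print(total_by_element([4, 5, 6]))                # 15
print(total_by_index([4, 5, 6]))                  # 15
print(count_occurrences(["a", "b", "a"], "a"))    # 2
print(collatz_steps(6))                           # 8
```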

The last four points in this list are patterns in how loops work – accumulate values incrementally, keep previous values in variables, and quit early if possible. But even after a full hour of Googling, I could not find a single resource that talks about these patterns. This took me by surprise, since we have design patterns for more abstract programming concepts. I use an interactive online textbook for the class, and that doesn’t have a section on these patterns – or for that matter, how to think with loops. Neither do several other textbooks I checked.
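For concreteness, here is one way the three patterns named above might look in Python. The function names are hypothetical and the examples are deliberately small; this is a sketch of the patterns, not a claim about how any textbook presents them.

```python
def squares(numbers):
    """Accumulate incrementally: build up a list one element at a time."""
    result = []
    for n in numbers:
        result.append(n * n)
    return result


def fibonacci(n):
    """Keep previous values in variables and shuffle them each iteration.

    Returns the n-th Fibonacci number, counting from fibonacci(0) == 0.
    """
    previous = 0
    current = 1
    for _ in range(n):
        following = previous + current
        previous = current
        current = following
    return previous


def all_odd(numbers):
    """Quit early by ruling out: the first even number settles the answer."""
    for n in numbers:
        if n % 2 == 0:
            return False
    return True


def any_odd(numbers):
    """Quit early by counting in: the first odd number settles the answer."""
    for n in numbers:
        if n % 2 != 0:
            return True
    return False


print(squares([1, 2, 3]))   # [1, 4, 9]
print(fibonacci(6))         # 8
print(all_odd([1, 3, 4]))   # False
print(any_odd([2, 4, 5]))   # True
```

Note that `all_odd` and `any_odd` differ only in which answer is returned inside the loop and which is returned after it, which is exactly the ruling-out versus counting-in decision.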

Why not? It’s generally acknowledged that the hardest part of programming is not the syntax, but the abstract problem solving necessary to translate thought to code. (I tell my students that if they understand the question, and can get the answer by hand, they know how to solve it; the hard part is slowing down and figuring out what’s going on in their head.) So it seems like textbooks should directly address the thinking necessary – but I haven’t found anything that even touches on how to think about loops. Even worked examples, at least the ones I looked at, only show the solution then explain why it works, but not how the author came up with the solution in the first place.

Is there a resource I’m missing, or is this a need that has yet to be filled?
