Archive for the 'Data' Category

It’s a Poor Workman Who Blames Yogi Berra: Artificial Intelligence and Jeopardy!

Wednesday, February 23rd, 2011

Last week, an IBM computer named Watson beat Ken Jennings and Brad Rutter, the two greatest Jeopardy! players of all time, in a nationally televised event. The Man vs. Machine construct is a powerful one (I’ve even used it myself), as these contests have always captured progressive imaginations. Are humans powerful enough to build a rock so heavy, not even we can lift it?

Watson was named for Thomas J. Watson, IBM’s first president. But he could just as easily have been named after John B. Watson, the American psychologist who is considered to be the father of behaviorism. Behaviorism is a view of psychology that disregards the inner workings of the mind and focuses only on stimuli and responses. This input leads to that output. Watson was heavily influenced by the salivating dog experiments of Ivan Pavlov, and was himself influential in the operant conditioning experiments of B.F. Skinner. Though there are few strict behaviorists today, the movement was quite dominant in the early 20th century.

The behaviorists would have loved the idea of a computer playing Jeopardy! as well as a human. They would have considered it a validation of their theory that the mind could be viewed as merely generating a series of predictable outputs when given a specific set of inputs. Playing Jeopardy! is qualitatively different from playing chess. The rules of chess are discrete and unambiguous, and the possibilities are ultimately finite. As Noam Chomsky argues, language possibilities are infinite. Chess may one day be solved, but Jeopardy! never will be. So Watson’s victory here is a significant milestone.

Much has been made of whether or not the contest was “fair.” Well, of course it wasn’t fair. How could that word possibly have any meaning in this context? There are things computers naturally do much better than humans, and vice versa. The question instead should have been in which direction the unfairness would be decisive. Some complained that the computer’s superior buzzer speed gave it the advantage, but buzzer speed is the whole point.

Watson has to do three things before buzzing in: 1) understand what question the clue is asking, 2) retrieve that information from its database, and 3) develop a sufficient confidence level for its top answer. In order to achieve a win, IBM had to build a machine that could do those things fast enough to beat the humans to the buzzer. Quick reflexes are an important part of the game to be sure, but if that were the whole story, computers would have dominated quiz shows decades ago.
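The buzz decision itself can be sketched in a few lines. This is purely a toy illustration of the three-step logic described above; the function name, the sample confidence numbers (borrowed from figures shown on the broadcast), and the 50% threshold are my own assumptions, not IBM's actual design:

```python
# Toy sketch of the buzz decision: steps 1 and 2 (parsing the clue and
# retrieving answers) are assumed to have already produced a scored
# candidate list; step 3 picks the top answer and decides whether to buzz.
BUZZ_THRESHOLD = 0.50  # assumed cutoff; the real threshold is unknown

def decide_to_buzz(scored_candidates, threshold=BUZZ_THRESHOLD):
    """scored_candidates: list of (answer, confidence) pairs.
    Returns the top answer and whether confidence clears the bar."""
    answer, confidence = max(scored_candidates, key=lambda pair: pair[1])
    return answer, confidence >= threshold

# With a confident top answer, the machine rings in...
print(decide_to_buzz([("Tools", 0.84), ("Yogi Berra", 0.10)]))
# → ('Tools', True)

# ...but sits out when no candidate clears the bar.
print(decide_to_buzz([("Toronto", 0.30), ("Chicago", 0.14)]))
# → ('Toronto', False)
```

Of course, in Final Jeopardy! there is no sitting out, which is exactly when a low-confidence answer gets forced onto the screen.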

To my way of thinking, it’s actually the comprehensive database of information that gives Watson the real edge. We may think of Ken and Brad as walking encyclopedias, but that status was hard earned. Think of the hours upon hours they must have spent studying classical composers, vice-presidential nicknames, and foods that start with the letter Q. Even a prepared human might temporarily forget the Best Picture Oscar winner for 1959 when the moment comes, but Watson never will. (It was Ben-Hur.)

In fact, given what I could see, Watson’s biggest challenge seemed to be understanding what the clue was asking. To avoid the complications introduced by Searle’s Chinese Room thought experiment, we’ll adopt a behaviorist, pragmatic definition of “understanding” and take it to mean that Watson is able to give the correct response to a clue, or at least a reasonable guess. (After all, you can understand a question and still get it wrong.) Watching the show on television, we are able to see Watson’s top three responses, and his confidence level for each. This gives us remarkable insight into the machine’s process, allowing us a deeper level of analysis.

A lot of my own work lately has been in training school-based data inquiry teams to examine testing data to learn where students need extra help, and that work involves examining individual testing items. So naturally, when I see three responses to a prompt, I want to figure out what they mean. In this case, Watson was generating the choices rather than simply choosing among them, but that actually makes them more helpful in sifting through his method.

One problem I see a lot in schools is that students are often unable to correctly identify what kind of answer the question is asking for. Inasmuch as Watson has what we would call a student learning problem, this is it. When a human is asked to come up with three responses to a clue, all of the responses would presumably be of the correct answer type. See if you can come up with three possible responses to this clue:

Category: Hedgehog-Podge
Clue: Hedgehogs are covered with quills or spines, which are hollow hairs made stiff by this protein

Watson correctly answered Keratin with a confidence rating of 99%, but his other two answers were Porcupine (36%) and Fur (8%). I would have expected all three candidate answers to be proteins, especially since the words “this protein” ended the clue. In many cases, the three potential responses seemed to reflect three possible questions being asked rather than three possible answers to a correct question, for example:

Category: One Buck or Less
Clue: In 2002, Eminem signed this rapper to a 7-figure deal, obviously worth a lot more than his name implies

Ken was first to the buzzer on this one and Alex confirmed the correct response, both men pronouncing 50 Cent as “Fiddy Cent” to the delight of humans everywhere. Watson’s top three responses were 50 Cent (39%), Marshall Mathers (20%), and Dr. Dre (14%). This time, the words “this rapper” prompted Watson to consider three rappers, but not three potential rappers that could have been signed by Eminem in 2002. It was Dr. Dre who signed Eminem, and Marshall Mathers is Eminem’s real name. So again, Watson wasn’t considering three possible answers to a question; he was considering three possible questions. And alas, we will never know if Watson would have said “Fiddy.”
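The answer-type check that humans apply almost automatically could be sketched like this. To be clear, the type lookup table is toy data I've invented for illustration; a real system would need a large ontology, and nothing here reflects Watson's actual internals:

```python
# Toy illustration of filtering candidate answers by the type the clue
# demands ("this protein", "this rapper"). The type table is invented
# for this example.
ANSWER_TYPES = {
    "Keratin": "protein", "Porcupine": "animal", "Fur": "hair",
    "50 Cent": "rapper", "Marshall Mathers": "rapper", "Dr. Dre": "rapper",
}

def filter_by_type(candidates, expected_type):
    """Drop any candidate whose known type doesn't match the clue's."""
    return [(answer, conf) for answer, conf in candidates
            if ANSWER_TYPES.get(answer) == expected_type]

hedgehog = [("Keratin", 0.99), ("Porcupine", 0.36), ("Fur", 0.08)]
print(filter_by_type(hedgehog, "protein"))   # → [('Keratin', 0.99)]

rapper = [("50 Cent", 0.39), ("Marshall Mathers", 0.20), ("Dr. Dre", 0.14)]
print(filter_by_type(rapper, "rapper"))      # all three survive
```

Note that a type check alone wouldn't have settled the rapper clue: all three candidates really are rappers, just answers to three different questions, which is exactly the deeper problem.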

It seemed as though the more confident Watson was in his first guess, the more likely the second and third guesses would be way off base:

Category: Familiar Sayings
Clue: It’s a poor workman who blames these

Watson’s first answer Tools (84%) was correct, but his other answer candidates were Yogi Berra (10%) and Explorer (3%). However Watson is processing these clues, it isn’t the way humans do it. The confidence levels seemed to be a pretty good predictor of whether or not a response was correct, which is why we can forgive Watson his occasional lapses into the bizarre. Yeah, he put down Toronto when the category was US Cities, but it was a Final Jeopardy, where answers are forced, and his multiple question marks were an indicator that his confidence was low. Similarly cornered in a Daily Double, he prefaced his answer with “I’ll take a guess.” That time, he got it right. I’m just looking into how the program works, not making excuses for Watson. After all, it’s a poor workman who blames Yogi Berra.

But the fact that Watson interpreted so many clues accurately was impressive, especially since Jeopardy! clues sometimes contain so much wordplay that even the sharpest of humans need an extra moment to unpack what’s being asked, and understanding language is our thing. Watson can’t hear the other players, which means he can’t eliminate their incorrect responses when he buzzes in second. It also means that he doesn’t learn the correct answer unless he gives it, which makes it difficult for him to catch on to category themes. He managed it pretty well, though. After stumbling blindly through the category “Also on Your Computer Keys,” Watson finally caught on for the last clue:

Category: Also on Your Computer Keys
Clue: Proverbially, it’s “where the heart is”

Watson’s answers were Home is where the heart is (20%), Delete Key (11%), and Elvis Presley quickly changed to Encryption (8%). The fact that Watson was considering “Delete Key” as an option means that he was starting to understand that all of the correct responses in the category were also names of keys on the keyboard.

Watson also is not emotionally affected by game play. After giving the embarrassingly wrong answer “Dorothy Parker” when the Daily Double clue was clearly asking for the title of a book, Watson just jumped right back in like nothing had happened. A human would likely have been thrown by that. And while Alex and the audience may have laughed at Watson’s precise wagers, that was a cultural expectation on their part. There’s no reason a wager needs to be rounded off to the nearest hundred, other than the limitations of human mental calculation under pressure. This wasn’t a Turing test. Watson was trying to beat the humans, not emulate them. And he did.

So where does that leave us? Computers that can understand natural language requests and retrieve information accurately could make for a very interesting decade to come. As speech recognition improves, we might start to see computers who can hold up their end of a conversation. Watson wasn’t hooked up to the Internet, but developing technologies could be. The day may come when I have a bluetooth headset hooked up to my smart phone and I can just ask it questions like the computer on Star Trek. As programs get smarter about interpreting language, it may be easier to make connections across ideas, creating a new kind of Web. One day, we may even say “Thank you, Autocorrect.”

It’s important to keep in mind, though, that these will be human achievements. Humans are amazing. Humans can organize into complex societies. Humans can form research teams and develop awesome technologies. Humans can program computers to understand natural language clues and access a comprehensive database of knowledge. Who won here? Humanity did.

Ken Jennings can do things beyond any computer’s ability. He can tie his shoes, ride a bicycle, develop a witty blog post comparing Proust translations, appreciate a sunset, write a trivia book, raise two children, and so on. At the end of the tournament, he walked behind Watson and waved his arms around to make it look like they were Watson’s arms. That still takes a human.

UPDATE: I’m told (by no less of an authority than Millionaire winner Ed Toutant) that Watson was given the correct answer at the end of every clue, after it was out of play. I had been going crazy wondering where “Delete Key” came from, and now it makes a lot more sense. Thanks, Ed!

Accountability

Tuesday, February 1st, 2011

I was talking to my graduate students about the literacy standards last night, and predictably got pulled off on a tangent about accountability. I found myself making a point that I’ve alluded to before, but it’s worth making explicit now.

Robert Benchley famously said “There are two kinds of people in the world: those who divide the world into two kinds of people, and those who don’t.” I will put myself in the former category when I say that, generally, there are two kinds of people who talk about standards and accountability.

The first believes that anything worth doing is worth doing well. In order to make sure we’re doing the best job we can, it’s important to measure our results, so we can identify areas for potential improvement and apply strategies for intervention where they will do the most good.

The second believes that taxpayer-funded education is one of the evils of socialism and must be eradicated. In order to make the necessary changes, evidence must be gathered that the public education system is a failure, so that arguments to turn education over to the free market will be more persuasive.

And my point was that, when you hear someone talking about standards and accountability, it’s important to know which of these two groups that person is in.

Item of the Week

Monday, January 17th, 2011

In this somewhat new blog feature, I will offer up a question from the statewide examinations that New York City students take each year. The purpose of this will not be for you to try to provide the correct answer, but rather to join me in examining the question. What does it tell us about student understanding? What do each of the wrong answers mean? What is this question testing? What is it really testing? What would students need to know and be able to do to answer this question correctly?

I gave a workshop for data teams on Friday. Three of the groups were examining last year’s 4th grade ELA scores, which I knew meant that we’d be talking about Abigail. In my visits to schools, I’ve found that students who took this exam had a lot of trouble on questions relating to this poem:

Students had trouble on a number of the questions, but we will just look at one: Item 21 on the 2010 New York State Grade 4 ELA Exam:



The intended performance indicator is “Make predictions, draw conclusions, and make inferences about events and characters,” but we can be the judge of that.

What is this question testing? Does it fit the performance indicator? Which of the wrong answers would you predict students would choose the most often? Why? What would students need to know and be able to do to answer this question correctly?

Item of the Week

Monday, January 10th, 2011

I thought it might be fun to try something new with the “Question of the Week” feature here on the blog. Instead of asking my readers a question, I will offer up a question from the statewide examinations that New York City students take each year.

The purpose of this will not be for you to try to provide the correct answer, but rather to join me in examining the question. What does it tell us about student understanding? What do each of the wrong answers mean? What is this question testing? What is it really testing? What would students need to know and be able to do to answer this question correctly?

Sound like fun?

To differentiate this feature from the Question of the Week, I’ll call this the Item of the Week, which is what we call questions in the parlance of standardized testing.

Today’s item comes from the 2010 New York State Grade 4 Mathematics Exam. The strand is Measurement and the performance indicator is “4.M04 Select tools and units appropriate to the mass of the object being measured (grams and kilograms).”

I like the layering of this question. First of all, the student needs to know which units measure mass and which don’t. If they answer A or D, they don’t. But to choose between B and C, students need to have some idea of how much a gram really is.

Sometimes these questions will have distractor answers that use numbers from the problem to try to trick students into choosing them. But there are no numbers in this problem. And all of the answers use the same number.

The trick here is in the first sentence. The fact that Mr. Patel moved his chair across the room is not relevant. But if you don’t know what “mass” means, that first sentence might trick you into thinking you are looking for a distance, in which case you might choose D. This assumes, of course, that you have no idea how long a kilometer is.

All in all, it seems like a pretty fair question that tests what it purports to test. In practice, it turned out to be one of the harder items for New York City students taking this exam.

As always, I invite further discussion.

Question of the Week

Monday, January 3rd, 2011

Last month, I was giving a workshop for principals on Instructional Rounds, a method of structuring conversations about best practices based on classroom observations conducted in teams, when an interesting question arose. I asked them if teaching was an art or a science.

In this context, it was more than just a philosophical question. If teaching is an art, like music or painting, then each teacher should be allowed as much freedom and creativity as possible in developing a personal teaching style. If, on the other hand, teaching is a science, like medicine or physics, then we must determine best practices through research and establish standards and methodologies for the profession that all are expected to follow.

Carol Ann Tomlinson calls teaching a science-informed art, an answer the group liked, but I’d like to take a closer look at the question. The way we view the profession affects everything from how we train teachers to how we evaluate their performance. So is it an art, or is it a science?

Perhaps the distinction between the two isn’t as clear-cut as we think. Teaching may be a “science-informed art,” but what art hasn’t been influenced by the sciences? Each artistic discipline codifies what works and what doesn’t, and even the most promising young talents must study for many years to perfect their craft. There are certainly examples of highly successful art forms and artists that are defined largely by breaking the rules, like jazz or Picasso, but even they are influenced by science. Would Picasso’s “Blue Period” have been possible if Heinrich Diesbach hadn’t developed an affordable blue paint? And you can’t just play anything you like in improvisational jazz; you really have to know what you’re doing. In other words, it doesn’t mean a thing if it hasn’t got that swing.

Science, on the other hand, has a lot more intuition and creativity than it generally gets credit for. It comforts us to think of medicine as a hard science, but a lot of times doctors just have to go with their best instincts. I may have seen too many episodes of House, but let me ask you this: If you had to go in for surgery, would you prefer a young surgeon who recently graduated from a top medical school with a high GPA, or would you prefer a doctor with 25 years of experience doing this kind of surgery with a high success rate? And the most creative, mind-blowing stuff we’ve seen lately is coming out of the field of theoretical physics. Einstein famously said that imagination was more important than knowledge, and we have more knowledge because of his imagination.

So in deciding if teaching is an art or a science, we have to look at art and science for what they really are: two ends of a continuum, rather than binary opposites. But where on the continuum does teaching belong? The term “Instructional Rounds” borrows its name from the medical profession. But others refer to a similar activity as a “Gallery Walk” which takes its title from the arts.

There is, of course, a third option that falls outside of this continuum. In this option, teaching is neither an art nor a science, as each word implies a skilled and knowledgeable practitioner. It is simply a trade, one that can be standardized and learned. In this view, teaching is not a profession at all. I reject this idea, but it becomes part of the conversation nevertheless. And so, I bring back the Question of the Week by asking you this:

Is teaching an art or a science?

Film: Waiting for “Superman”

Tuesday, October 12th, 2010

Davis Guggenheim’s new documentary about the need for reform in the American school system is one of the most important films of the year and everyone should go see it. Although I have a number of significant problems with the movie (which – rest assured – will be inventoried below), I think there are a lot of dark truths that Guggenheim brings to light, and even if we don’t all agree on what the solutions are, we can agree on what’s at stake in getting it right.

Waiting for “Superman” follows the journey of five students, and their individual quests to improve their educational opportunities. I’d say the movie gets about 75% of it right: the system is failing these students, and millions like them. But while it might make a good movie narrative to divide the issue into good guys (charter schools) and bad guys (teachers unions), the real issues surrounding education in this country are much more complicated than Guggenheim suggests.

I came out of the movie disappointed about many of the factual inaccuracies and glaring omissions that Guggenheim uses to make his case, but I found that these were well addressed by this piece in the Washington Post. Even better is this excellent article in The Nation, which digs much deeper into the issues surrounding the debate. I strongly recommend these two articles, as they cover a lot of ground that I consequently won’t need to cover.

I do believe that Guggenheim is sincere in his desire to reform education, and that’s important to say, because many participants in this discussion are not. Their goal is to end taxpayer-funded education entirely, and they tend to support measures that move the nation closer to this ultimate goal. The problem with this is that the free market will do an excellent job of educating some of our students, while a great number of children in this country will be starkly left behind. So I’m on my guard when I hear arguments about how charter schools have solved all of the problems faced by public education. But despite some of the darker connections behind Waiting for “Superman”, I do believe that the filmmaker is earnest and I can counter his points secure in the belief that we share the common goal of educating all of our students.

Not only does Guggenheim omit important details, but he often doesn’t even draw the correct conclusions from the evidence actually presented in the movie. What was most striking to me was how powerfully the film showed how the lack of economic opportunities for parents in these inner-city communities directly impacts the education of their children. That alone was worth the price of the surprisingly expensive ticket. But then, we’re told that “many experts” (who?) now believe that failing schools are responsible for failing communities, not the other way around.

Each of the five children depicted has a parent or guardian who is hell-bent on making sure the child has the best education possible. They enter their children into a lottery for the local high-performing charter schools. Presumably, all of the children in the lottery have similarly committed parents. That makes for a pretty good head start for the charter school. Public schools tend to have a more varied range of parent commitment. Also, did you notice how few students are accepted each year? What does that do for class size? And I have to mention, even though it’s well covered in the articles linked above, the large amounts of private funding that the high-performing charter schools depicted in the movie enjoy.

So yes, the charter schools in the film are doing very well, and that’s great news for the students who attend them. But if, as it is admitted in the movie, only one in five charter schools are showing results, that’s a dismal record indeed. And despite the emotionally manipulative scenes where each student’s “fate” was decided by random lottery, I felt myself more concerned for the students who were never in the lottery.

So perhaps the real lesson we can learn from the successful charter schools is that, if the school has a clear and progressive vision, then increased funding can actually make a difference in student achievement. And if we take a closer look at what Geoffrey Canada is really doing for the students in the Harlem Children’s Zone, we might realize that student achievement isn’t only impacted within the school building. He may have even created a microcosm of the society we would have if we could make the connection between our nation’s social fabric and the way our children are educated.

But “firing all the bad teachers” is a much more digestible solution.

And yes, there are bad teachers, and I agree that it should be easier to get rid of them. But in truth, this represents a very small part of the problem, and blaming teachers unions for the decline in educational quality is seriously misguided. Teachers unions have been and should be a partner in education reform, but they also have the task of protecting the rights of their members. Teachers have the same rights to collective bargaining as any other labor force in the country. To frame the issue as children vs. adults is a dangerous distraction, especially when our goal should be to attract the very best people to the profession, and retain them once they’re in. The movie makes the point that great schools start with great teachers. I agree! So let’s make teaching the most desirable profession in America. You can read more about teacher recruitment and retention issues in this Washington Post article. Because once we’ve fired all the bad teachers, who will we get to replace them?

By the way, nobody is actually waiting for Superman to come and save our children. It’s a classic rhetorical trick to frame the sides of the debate as the people who agree with the solutions provided and the people who would rather do nothing. But smart and passionate people are already implementing solutions within public education that resonate with the solutions presented by Guggenheim. Here in New York City, we’ve increased educational accountability enormously, and with the cooperation of the teachers union. Nationally, we’re moving towards Common Core Standards for student achievement. We’re not there yet, not by a long shot, but nobody in the system is complacent about that.

Still, despite all the movie gets wrong, it should be praised for shining a spotlight on issues that have been festering in the darkness. This movie has the potential to spark a national conversation about the problems in American education, and how we can best address them. If it does that, despite the film’s flaws, its ultimate effect will be a net positive. If it does that, it will be my very favorite of all of the Superman films.

UPDATE: An anagram review.

Shakespeare Teacher: The Book!

Wednesday, September 1st, 2010

I am proud to announce that I have recently published a chapter in this book on teaching literature through technology. You can ignore the description; it seems to have been inadvertently switched with that of this book. Neither page describes my chapter, but you can read the abstract on the publisher’s page, or I could just tell you what it’s about.

Unlike this blog, the book chapter is actually about teaching Shakespeare! No riddles. No anagrams. No politics. (Well, maybe a little bit of politics.)

Here is the basic idea. I begin by citing experts who are skeptical of the ability of elementary school students to do Shakespeare. Specifically, I discuss the Dramatic Age Stages chart created by Richard Courtney.

Courtney describes “The Role Stage” as lasting from ages twelve to eighteen, at which point students are capable of a number of new skills that I would consider essential for understanding Shakespeare in a meaningful way. These skills include the ability to think abstractly, to understand causality, to interpret symbols, to articulate moral decisions, and to understand how a character relates to the rest of the play. So based on this chart, I would have to conclude that a student younger than twelve would not be ready to appreciate Shakespeare in these ways.

But Courtney bases his chart on the framework of developmental phases of Swiss psychologist Jean Piaget. These phases describe what a lone child can demonstrate under testing conditions. A more accurate and nuanced way of looking at development is provided in the work of Soviet psychologist Lev Vygotsky, who described a “Zone of Proximal Development” (ZPD), which is a range between what a child can demonstrate in isolation, and what the same child can do under more social conditions.

So I wondered if fifth-grade students (aged 10) would have some of the skills associated with “The Role Stage” somewhere in their ZPD. If so, a collaborative class project should provide enough scaffolding to develop those skills and allow ten-year-old students to understand and appreciate Shakespeare on that level.

So I developed and implemented a unit to teach Macbeth to a fifth-grade class in the South Bronx, using process-based dramatic activities, a stage production of the play performed for their school, and a web-based study guide to apply what they had learned. The idea was to use collaborative projects to get the kids to work together to make collective sense of the play. I then examined their written work for evidence that they had displayed the skills associated with “The Role Stage” in Courtney’s chart, and I was able to find a great deal of it.

I also created a three-dimensional rubric to assess the students’ work over the course of the unit. I call it a three-dimensional rubric because I use the same eight categories in all three rubrics, but they develop over time to reflect the increased sophistication that I expected the students to demonstrate. I then compared the students’ performance-based rubric scores to their reading test scores to demonstrate that standardized testing paints only a very limited picture of what a student can achieve. (I did say that it had a little bit of politics.)

Anyway, that’s what my chapter was about. I just saved you $180! And I’m hoping to return to a regular blogging schedule soon, so more content is hopefully on the way.

Word of the Week: Community

Wednesday, March 18th, 2009

The word of the week is community.

It’s a word I’ve been thinking about a lot lately, as I’ve been doing a lot of leaning on my own community over the past few weeks. I’ve also been thinking about how new technologies and changes in society affect our idea of community.

Today is Wednesday. Since last Wednesday, I…

  • attended a Bris for my cousin’s son.
  • ended my 30-day mourning period for my mother.
  • participated in a live reading of The Comedy of Errors with a group I found online.
  • reconnected via e-mail with a close childhood friend I lost touch with 15 years ago.
  • participated in a learning community seminar about 21st century schools with my work colleagues.
  • was called for an aliyah at the Bar Mitzvah of another cousin’s son.
  • visited my sister in the hospital and held my 10-hour-old niece.
  • conducted a day-long data workshop that helped a school identify a pervasive student learning problem.
  • began teaching The Merchant of Venice to an 8th-grade class who will be creating a video project based on the play.
  • joined Facebook.
  • was invited to present at a conference at the Folger on teaching Shakespeare in the elementary school.
  • participated in a webinar, cosponsored by the Folger and PBS, that brought together 176 Shakespeare teachers from across the country.

Traditional community structures such as family, school, religion, and professional networks are supplemented and even augmented (though never replaced) by technology and an increased focus on interconnectivity and collaboration. What I learned this week, though, is that there’s no substitute for being there in person.

Welcome to the world, Elena. You have big shoes to fill.

Using Data

Tuesday, December 16th, 2008

Yesterday, I gave a workshop for teachers on using data to improve student achievement. This is something that is going to become an increasing part of my work, so I may be blogging about it from time to time. The idea is to cull information about students from a variety of sources, systematically analyze that information in order to identify areas of improvement, and then create an action plan for targeting those areas.

In some cases, the results of careful data analysis can be surprising. So often we jump to conclusions about why students aren’t achieving, or we depend on underlying assumptions that may be based on our own pre-conceived notions. Consider for a moment this piece of student work:

Laugh if you must, but it’s easy to get the wrong idea from only a cursory examination. Further investigation revealed that the child’s mother works at Home Depot, and is here depicted selling snow shovels. And if you only relied on your initial observations and didn’t investigate further, you could be led astray.

Hopefully, the systematic use of data will allow us to avoid such snap judgements and take a more scientific approach to improving student achievement.