## Student-Generated Questions: When they don’t work

Many data-oriented lessons in math and science ask students to come up with the question they want to investigate. The idea behind this is great. It ought to promote buy-in: students will want to use data to explore something they’re interested in—so we get them to come up with a question they want an answer for.

I have no trouble with that.

But I recently saw a lesson which was, essentially,

• Look at this table of data
• Come up with a question about the data
• Make a prediction
• Make a scatter plot
• Put a line on it

The teacher materials even had a section for what to do if students have trouble coming up with a question, with suggestions such as asking, “How do you think the variables might be related?” Fine prompt. But how does it actually elicit a question about a table of data the student has only just seen?

Since this is in a tech-rich environment, it might be better to approach (numerical) data like this:

• Plot two variables against each other
• Look at the patterns that emerge
• See if they make sense in context
• Look for surprises
• Repeat until you find something cool
• Analyze the relationship by making lines and all that
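In code, that loop might look like this minimal sketch in Python with pandas and matplotlib. The column names and the stand-in rows are invented; the real table would come from a CSV of the 2009 season:

```python
# A sketch of the "plot pairs, look, repeat" loop.
# Columns (G, AB, H, HR) and rows are hypothetical stand-ins.
import itertools

import matplotlib
matplotlib.use("Agg")          # draw off-screen; save files instead of popping windows
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in rows (not real 2009 stats), just to make the loop runnable.
batting = pd.DataFrame({
    "G":  [150, 88, 120, 45],
    "AB": [580, 0, 400, 130],
    "H":  [180, 0, 110, 30],
    "HR": [30, 0, 12, 2],
})

# Every pair of variables gets its own scatter plot.
pairs = list(itertools.combinations(batting.columns, 2))
for x, y in pairs:
    ax = batting.plot.scatter(x=x, y=y, s=10)
    ax.set_title(f"{y} vs. {x}")
    plt.savefig(f"{y}_vs_{x}.png")   # open these, look for surprises
    plt.close("all")
```

The point is the loop, not the code: every plot is a chance to notice something, and the question (if any) comes after the noticing.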

At this point, you could retrospectively come up with a question, but chances are good that the question—since it really wasn’t a question you were interested in to begin with—will be lame.

## An example

I have batting records for MLB hitters from 2009. The table has games, at-bats, hits, doubles, triples, homers, all that stuff. What questions do you have about relationships among the variables?

Neither do I. I have some interest (and let’s assume that you do too), and specific questions (e.g., where is Bengie Molina?), but no relationship questions. Instead, I just want to look and see what I find.

So I make a bunch of scatter plots. I make up stories about why they’re shaped the way they are, without having any questions in my head. Here’s one of the plots:

2009 MLB batting stats: at-bats versus games, 1249 players.

I’ve even put a relevant line on it. (What does the slope mean? That’s an interesting question—but not the kind of question the lesson is talking about…)
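If you want to chase that slope question, a least-squares fit gives it units of at-bats per game. A sketch with made-up (games, at-bats) points, not the real 2009 numbers:

```python
import numpy as np

# Made-up (games, at-bats) points for illustration -- not the real data.
games   = np.array([ 20,  60, 100, 140, 160])
at_bats = np.array([ 60, 220, 380, 540, 620])

# Degree-1 polyfit is an ordinary least-squares line.
slope, intercept = np.polyfit(games, at_bats, 1)
# slope is in at-bats per game: roughly how many official trips to
# the plate a player adds for each game he appears in.
print(round(slope, 2))   # → 4.0 for these invented points
```

A slope near four would say that a regular adds about four official at-bats for every game he plays in, which is the kind of meaning you only go looking for after the plot makes you curious.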

Instead, the school lesson wants a kid to write, “How is the number of at-bats for an MLB player associated with the number of games they play in?”

Do you feel the lameness? This question arises only because we have to come up with a question for a school assignment. We’re interested in the relationship, sure, once we see it, but it wasn’t a question we had before we saw the data.

### digression

This is why I like the idea of having kids mess with data and then make a claim they can buttress with data. I describe that here.

As to questions, looking at the scatter plot, the big question for me—which I would never have come up with beforehand, only after messing with the data—is, “who are those players at the bottom with a bunch of games and very few at-bats?”

Because you don’t have the data in front of you, I’ll at least identify the guy at the far right of that plume of points: “Perpetual” Pedro Feliciano, Mets, 88 games, no at-bats.
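That “who are those guys?” question takes one line of filtering to answer. A sketch assuming Name, G, and AB columns; only the Feliciano row comes from the post, and the other rows and the cutoffs are invented:

```python
import pandas as pd

# Stand-in rows; the real table has 1249 players. Only the
# Feliciano numbers (88 games, 0 at-bats) come from the post.
batting = pd.DataFrame({
    "Name": ["Everyday Slugger", "Pedro Feliciano", "Setup Man"],
    "G":    [160, 88, 79],
    "AB":   [568, 0, 1],
})

# The plume at the bottom of the plot: lots of games, almost no at-bats.
oddballs = batting[(batting["G"] > 40) & (batting["AB"] < 5)]
print(oddballs["Name"].tolist())
```

The rows it picks out are the many-games, few-at-bats players, in 2009 mostly National League relief pitchers, who appear in many games but almost never bat.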

and we’re back…

So there are at least two kinds of legitimate questions:

• Sometimes you have a genuine question even before you get data. That should propel you to get data and answer it.
• In messing around with data, you get curious about the data you see and wonder about the shape or specific features. That creates questions: Who are those guys? What does the slope mean?

These are very different kinds of questions. I think we can expect students to have the second, even insist on it. But it will be more unusual for them to have the first. And it will help if we don’t force them to ask a kind of question they don’t really have.

Using “claims” is one approach that doesn’t seem to have that problem; I wonder what others we could find.

## Questioning Research Questions

On a related subject, grant proposals and dissertations often seem to require “research questions,” which are like the first kind. I think many of them are asked, well, not retrospectively but justificationally, like this:

We want support to create cool educational solution S. We have some evidence that S works already. We need more evidence to get support, though, so we look around for a measurable effect M (ideally with a rich literature) that S might cause. But to get approved, we need a question. So we ask, “Does S result in M?”

This is not what we actually care about. And this effect M is probably not what we think matters most. But it’s measurable, and we can dig up plenty of citations.

Makes me wonder if there’s an alternative system.

Math-science ed freelancer and sometime high-school stats teacher

### 4 Responses to Student-Generated Questions: When they don’t work

1. You’re looking at the difference between “hypothesis-driven” and “discovery” research. Both are useful, but many teachers have been drilled that only “hypothesis-driven” is “science”. As a bioinformatician, far more of my research is “discovery” research, which generates hypotheses, rather than testing pre-existing hypotheses. The statistics needed are somewhat different, as corrections for multiple hypotheses are the key to finding anything, and all generated hypotheses must be tested on data independent of the data used to generate the hypothesis.
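The comment's two safeguards (a multiple-hypothesis correction, then confirmation on held-out data) can be sketched in a few lines. Here is a simple Bonferroni version, with every number invented for illustration:

```python
# Bonferroni correction: having generated m hypotheses by exploring,
# demand p < alpha/m on a *held-out* dataset -- never on the data
# that suggested the hypotheses. All numbers below are illustrative.
alpha = 0.05
m = 20                               # hypotheses generated while exploring
threshold = alpha / m                # stricter cutoff: 0.0025

p_on_holdout = [0.001, 0.03, 0.20]   # p-values from independent data
confirmed = [p for p in p_on_holdout if p < threshold]
print(threshold, confirmed)
```

The hypothesis at p = 0.03 would have passed an uncorrected 0.05 cutoff; the correction is what keeps discovery research from "finding" one fluke in every twenty plots.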

2. harrysmith4444 says:

Not being a statistician or even a scientist but having been a kid and liking to mess with stuff and digging graphs and relationships I appreciate the less formal approach. Achieving interest, engagement is the key. It also seems to me that whenever you choose two variables to plot you are in effect asking some question.

3. shaunteaches says:

I love the phrasing of this post:

“Since this is in a tech-rich environment, it might be better to approach (numerical) data like this:

Plot two variables against each other
Look at the patterns that emerge
See if they make sense in context
Look for surprises
Repeat until you find something cool
Analyze the relationship by making lines and all that”

I am going to use these prompts in a lesson I am designing for my students (we are in New York City).

I also think I might have a lesson that satisfies the following criteria around questioning:

“So there are at least two kinds of legitimate questions:

Sometimes you have a genuine question even before you get data. That should propel you to get data and answer it.

In messing around with data, you get curious about the data you see and wonder about the shape or specific features. That creates questions: Who are those guys? What does the slope mean?”

The lesson isn’t ready yet, but I can send it your way when it’s ready. However, I wanted to send you the data and see if it might help you create some great investigations (I think this data is beautifully suited for a lesson):

http://www.citibikenyc.com/system-data

I love that you can download the raw data as it is updated.

• Tim Erickson says:

Definitely interesting data– I have to see how they imagine our getting the raw data, and of course it’s deliciously local to you in NYC.

Good luck with the lessons! Let us know how they go!