What Makes Good Exercises?

Ben Pace has a new post up on LessWrong asking about good exercises for rationality / general LW-adjacent stuff. I think this is a good thing to put up a bounty for, and it got me thinking about what makes a good exercise. Exercises are good because they help you further develop the material; they give you opportunities to put the relevant skill to use.

There are different levels of what an exercise can be trying to assess:

Identifying the correct idea from a group of different ones
Summarizing the correct idea
Transferring the idea to someone else
Actually demonstrating whatever skill it is (if it's something you can do)
Actually using the skill to deduce something else (if it's a model thing)

I think there's a good set of stuff to dive into here about the distinction between optimizing for pedagogy versus effectiveness. In the starkest case, you want to teach people using less potent versions of something, at least at first. Think not just training wheels on a bike, but successively more advanced models for physics or arithmetic. There's a gradual shift from the teachable version to the effective one.

More than that, I wonder if the two angles are largely orthogonal.

Anyway, back to the original idea at hand. When you give people exercises, there's a sense of broad vs narrow that seems important, but I'm still teasing it out. In one sense, you can think of tests that use multiple-choice vs open-ended questions. But it's not like multiple-choice questions have to suck. You could give people very plausible-sounding answers which require them to do a lot of work to determine which one is correct. Similarly, open-ended questions could allow for bullshitting.

What matters isn't exactly the format, but what sort of work it induces.

At the very least, it's about pushing for more generative content. But beyond that, it gets into pedagogy questions:

How can you give questions which increase in difficulty? What does difficulty correspond to? If something is "hard to figure out", what is that quality referring to?
If you give open-ended questions, how can you assess the answers you get?
How much of this is covered already by the teaching literature?