December 28, 2008

36-350, Data-Mining: Self-Evaluation and Lessons Learned

Attention conservation notice: > 600 words on how I'd teach this semester's course differently next time.

So, grades are done, and, a decent interval after submitting the grades, I got the (anonymized) student evaluations. (Five of the eighteen students bothered to fill them out.) This seems like a good time to take a look at how things went.

Overall, I'm pleased with the semester. Their grades were quite good, and actual performance on the final exam was even better than I'd hoped — several students who'd done poorly on the homework pulled off really good exams, and nobody did much worse on the exam than on the homework. Most importantly, judging by what people wrote for the final, lots of them actually understood what I was trying to say. (Of course, I didn't give them a version of the final exam at the start of the class, so maybe they all knew it already.) I'm also reasonably satisfied with the choice of materials, and definitely think that replacing the weekly lab sessions with an extra lecture was the right thing to do.

Of course it wasn't all good. While linear algebra is not a pre-req for the class, I was still surprised at how unfamiliar many of the students were with it. The difficulty is that it is very, very hard to say anything about high-dimensional data without linear algebra. Some of them, of course, had no problem; perhaps I need a pre-test at the start, with catch-up reading for those without the background. (Making linear algebra an official pre-req doesn't seem like an option.)

The big issue, both from my point of view and according to three of the five students who bothered to write evaluations, was the programming assignments. These were much harder for them, especially for the bottom half of the class, than I had anticipated. In fact they kept being harder than I anticipated, so I really need to dial down the initial programming expectations, and include more programming instruction. (See previous post.) I am not sure what to cut to make room for this; the best approach might be to integrate demos and code walk-throughs with some lectures. Teaching them data-mining without getting their hands dirty, however, seems like a travesty.

Student participation also needs work. Out of eighteen students, there were, to first order, three who spoke up in lecture. (To second order, maybe six.) This was not a problem with them; rather, I should have done more to encourage the others to talk. Likewise, only three students came to office hours.

Some more specific things to work on, in no particular order:

Update, 16 March 2009: A nice sequence might be: PCA (subtracting off successive principal components), to the coordinate-descent/back-fitting approach to linear regression, to the coordinate descent lasso, to additive models, to SpAM. But this will need a lot of linear algebra, and the middle steps are impractical.
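
To make the middle of that sequence concrete, here is a minimal sketch (not course material) of the coordinate-descent/back-fitting idea for linear regression, with the lasso as a one-line change. It assumes the columns of X and the response y are already centered, so there is no intercept, and that no column is all zeros. The function names, the penalty parameter `lam`, and the objective scaling (half the residual sum of squares plus `lam` times the sum of absolute coefficients) are choices made for this illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting it to zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def coordinate_descent(X, y, lam=0.0, n_sweeps=500, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time.

    With lam = 0 this is back-fitting for ordinary least squares;
    with lam > 0 it is the coordinate-descent lasso.
    Assumes centered data (no intercept) and no all-zero columns.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()      # residual for beta = 0
    col_ss = (X ** 2).sum(axis=0)       # sum of squares of each column
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial residual: put column j's current contribution back.
            partial = resid + X[:, j] * beta[j]
            # One-variable regression of the partial residual on column j,
            # shrunk by the soft threshold when lam > 0.
            new_bj = soft_threshold(X[:, j] @ partial, lam) / col_ss[j]
            resid = partial - X[:, j] * new_bj
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:            # a full sweep changed nothing
            break
    return beta
```

Each inner step is a regression on a single variable, which is why the sequence needs so little machinery per step; swapping that single-variable regression for a one-dimensional smoother is what turns the same loop into back-fitting for additive models.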

Corrupting the Young; Enigmas of Chance; Self-Centered

Posted at December 28, 2008 10:55

Three-Toed Sloth