### Change of Representation (Introduction to Statistical Computing)

(My notes for this lecture are too fragmentary to post. What follows is the sketch.)

The "raw data" is often not in the format most useful for the model one
wants to work with. Lots of statistical computing work is about moving the
information from one format to another --- about changing representations.
Lossless transformations vs. lossy; why we often want lossy transformations.
Re-organizing data to group it properly. (Example: going from multi-dimensional
arrays to 2D data-frames and vice versa.) Aggregation as a change of
representation. (Example: Going from dates of adoption for each doctor to
cumulative proportion of adopters.)
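
To make the two examples above concrete, here is a minimal sketch in Python (not the language or the code used in the lecture). The function names, the nested-list array layout, and the month-indexed adoption data are all assumptions made for illustration. The first function is a lossless re-organization of a 3D array into "long" rows; the second is a lossy aggregation, since the curve it returns no longer records which doctor adopted when.

```python
from bisect import bisect_right


def array_to_long(array3d, dim_names=("i", "j", "k")):
    """Flatten a nested-list 3D array into 'long' rows, one dict per cell.

    Lossless: the array can be rebuilt from the index columns and the value.
    The dimension names are hypothetical labels.
    """
    rows = []
    for i, plane in enumerate(array3d):
        for j, line in enumerate(plane):
            for k, value in enumerate(line):
                rows.append({dim_names[0]: i, dim_names[1]: j,
                             dim_names[2]: k, "value": value})
    return rows


def cumulative_adoption(adoption_months, n_doctors, horizon):
    """Cumulative proportion of adopters at each month 1..horizon.

    adoption_months lists the adoption month of each doctor who adopted;
    doctors who never adopted count only in n_doctors.
    Lossy: individual identities and exact dates are discarded.
    """
    ordered = sorted(adoption_months)
    # bisect_right counts how many adoption months are <= m
    return [bisect_right(ordered, m) / n_doctors
            for m in range(1, horizon + 1)]


if __name__ == "__main__":
    # Made-up data: five doctors, four of whom adopt within five months.
    print(cumulative_adoption([1, 3, 3, 5], n_doctors=5, horizon=5))
    print(array_to_long([[[1, 2], [3, 4]]]))
```

The aggregated curve is often what a diffusion model actually wants as input, which is one reason a lossy transformation can be preferable to keeping every raw record.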

Text processing via change of representation: the bag-of-words ("vector
space") representation. Cosine and Jaccard similarities. Term
frequency-inverse document frequency. Document clustering and classification.
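
The bag-of-words ideas above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the lecture's own code: the tokenizer (lowercase runs of letters and apostrophes), the function names, and the particular TF-IDF variant (raw term count times log(N / document frequency)) are choices made here, and other weightings are common.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts; word order is deliberately discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count (or weight) vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the shared vocabulary over size of the combined vocabulary."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def tf_idf(bags):
    """Reweight each bag: term count * log(N / number of docs with the term)."""
    n_docs = len(bags)
    # Iterating over a Counter yields each distinct word once per document.
    doc_freq = Counter(word for bag in bags for word in bag)
    return [{word: count * math.log(n_docs / doc_freq[word])
             for word, count in bag.items()}
            for bag in bags]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
    bags = [bag_of_words(d) for d in docs]
    print(cosine_similarity(bags[0], bags[1]))
    print(jaccard_similarity(bags[0], bags[1]))
    print(tf_idf(bags)[0])
```

Under this weighting a word that occurs in every document gets weight zero, so it stops contributing to cosine similarity between the reweighted vectors.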

Readings: Spector, chapters 8 and 9.

On text: Lectures 1, 2 and 4 (+ slides) from data mining (vintage 2009).

Posted at November 18, 2013 10:30 | permanent link