February 07, 2011

Writing R Functions (Advanced Data Analysis from an Elementary Point of View)

As in some of my previous classes, there is a wide range of programming skill among the students in 402. The following notes are mostly intended to help those at the lower end of the scale catch up, but may be of some interest to others. (They presume familiarity with using R from the command line.) The last section largely incorporates Minimal Advice to Undergraduates on Programming.

Statisticians must be able to do basic programming; someone who only knows how to run canned routines is not a data analyst but a technician who tends a machine they do not understand. Programming in R is best organized around functions. Parts of a function and a function declaration. Writing functions to encapsulate repeated procedures. First example: calculating quantiles of Pareto distributions, by hand and by a function; checking the function. Extending the function. Writing functions which call other user-defined functions. Sanity-checking arguments, e.g., with stopifnot. More layering of functions: writing a Pareto random number generator. Our first bug. The debugging process; traceback as a useful utility. Checking the Pareto generator. Automating the checking process. Passing arguments from function to function with the ... pseudo-argument. More debugging. Contexts and "scope". Revising functions to work with each other. Avoiding iteration in R for speed and clarity. Returning lists and other complex data structures; writing a function to estimate a Gaussian. General programming advice: take a real programming class; comment your code; RTFM; start from the beginning and break it down; break your code into many short, meaningful functions; avoid writing the same thing twice; use meaningful names; check whether your code works; complain rather than giving up; avoid iteration.
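To give a flavor of the layering described above, here is a minimal sketch, not the code from the notes themselves. It assumes one particular parameterization of the Pareto distribution (density proportional to x^(-exponent) for x at or above threshold), and the function names qpareto, rpareto and check_pareto are chosen here purely for illustration. It shows a quantile function with sanity-checked arguments, a random number generator built on top of it by the inverse-transform method, and a checking function that passes its extra arguments along with the ... pseudo-argument.

```r
# Quantile function of a Pareto distribution.
# ASSUMPTION: density proportional to x^(-exponent) for x >= threshold,
# so that P(X <= x) = 1 - (x/threshold)^(-(exponent - 1)).
qpareto <- function(p, exponent, threshold) {
  # Complain loudly about nonsensical arguments instead of returning garbage
  stopifnot(p >= 0, p <= 1, exponent > 1, threshold > 0)
  # Vectorized over p: no explicit loop is needed
  threshold * (1 - p)^(-1 / (exponent - 1))
}

# Random number generator, layered on the quantile function:
# feed uniform random numbers through the quantile function
# (the inverse-transform method).
rpareto <- function(n, exponent, threshold) {
  qpareto(runif(n), exponent, threshold)
}

# Automated check: compare theoretical and empirical quantiles.
# Whatever is supplied in ... (here, exponent and threshold, by name)
# is passed unchanged to both rpareto and qpareto.
check_pareto <- function(n, p = c(0.25, 0.5, 0.75), ...) {
  x <- rpareto(n, ...)
  rbind(theoretical = qpareto(p, ...),
        empirical = quantile(x, p, names = FALSE))
}

# With exponent 2 and threshold 1 the median works out by hand to 2
qpareto(0.5, exponent = 2, threshold = 1)

# The two rows should be close for large n
check_pareto(1e5, exponent = 2.5, threshold = 3)
```

Because rpareto and check_pareto call qpareto rather than repeating its formula, a fix or change to the parameterization only has to be made in one place.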


Advanced Data Analysis from an Elementary Point of View

Posted at February 07, 2011 22:40

Three-Toed Sloth