### Re-Writing Your Code (Advanced Data Analysis from an Elementary Point of View)

An extended example of re-writing code to make it more powerful,
flexible, and clear, based on in-class discussion.

Calculating a standard error for the median of a particular Gaussian sample
by repeated simulation, "manually" at the R console. Writing a function to
automate this task, with everything hard-coded. Adjusting the function to let
the number of simulation runs be an argument. Writing a parallel function to
do the same job for an exponential distribution. Since this is almost entirely
the same, why have two functions? Putting in a logical switch between
hard-coded options. Better approach: abstract out the simulation into a
separate function, and make the simulator an argument to the
standard-error-in-median function. Example of applying the latter function to
a much more complicated simulator. Advantages of the modular approach:
flexibility, clarity, ease of adjustment. Example: removing a `for`
loop in favor of `replicate` in the find-the-standard-error function,
without having to change any of the simulators. Writing parallel functions to
find the interquartile range of the median, or the standard error of the mean.
Repeating the process of abstraction: the common element is taking a simulator,
estimating some property of the simulation, and summarizing the simulated
distribution. All three tasks are logically distinct and should be performed
by separate functions. Reduction of bootstrapping to a two-line function
taking other functions as arguments.
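
As a rough illustration of where this process ends up, here is a minimal sketch of the modular design. It is not the code from the handout. The function names (`sim.gaussian`, `sim.exponential`, `se.in.median`, `bootstrap`), the sample size of 100, the distribution parameters, and the default of 1000 runs are all assumptions made for the example.

```r
# Simulators: each takes no arguments and returns one simulated sample.
# Sample size and parameters are illustrative assumptions.
sim.gaussian <- function() { rnorm(100, mean = 0, sd = 1) }
sim.exponential <- function() { rexp(100, rate = 1) }

# First abstraction: the simulator is an argument, so one function serves
# any distribution; replicate() stands in for an explicit for loop.
se.in.median <- function(simulator, B = 1000) {
  medians <- replicate(B, median(simulator()))
  sd(medians)
}

# Second abstraction: simulating, estimating, and summarizing are three
# separate jobs, each handed in as a function.
bootstrap <- function(simulator, estimator, summarizer, B = 1000) {
  estimates <- replicate(B, estimator(simulator()))
  summarizer(estimates)
}

set.seed(42)  # only to make the example reproducible
se.median.gaussian <- bootstrap(sim.gaussian, median, sd)
iqr.median.exponential <- bootstrap(sim.exponential, median, IQR)
se.mean.gaussian <- bootstrap(sim.gaussian, mean, sd)
```

Under these assumptions, swapping in a more complicated simulator, a different estimator, or a different summary means passing a different function, with no change to the body of `bootstrap`.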

PDF handout, incorporating R examples

Posted at February 16, 2011 01:48