Machine Learning in R

Most of my work requires implementing learning and inference algorithms from scratch, but it’s great to have a collection of methods you can turn to when you need to build a solution to a problem fast. Here are some R packages implementing state-of-the-art machine learning/predictive models.

  1. earth: multivariate adaptive regression splines
  2. mgcv: generalized additive models
  3. rpart: classification and regression trees
  4. randomForest: random forests
  5. gbm: gradient-boosted regression models

 

Posted in Uncategorized | Leave a comment

HOWTO: Write your first research proposal

This weekend I’m writing a short proposal for a research project that will hopefully be funded by an NSF Research Experience for Undergraduates (REU) grant. I wanted to take a few minutes to compile a list of the bits of advice that I’ve come across in the process.

First, I think the advice my advisor gave me is the most important. It’s short but sweet: the idea you propose should be crisp and easy to understand. It should be clear, with minimal technical detail, why the problem is important and whether it’s reasonable to tackle in the timeframe of the research program.

Now, on to advice compiled from sources on the internet, starting with the Human Frontier Science Program (HFSP):

  1. Do you have a clear, concise, and testable hypothesis? (Similar to my advisor’s advice)
  2. Are your objectives and aims coming into focus?
  3. What questions are to be addressed?
  4. Can you define and design specific experiments that will test your hypothesis?

I think item 3 is critical for a short proposal like mine. The most important part of conveying an idea for a research project is clearly stating and motivating the questions that the work will answer.

Simon Peyton Jones and Alan Bundy also have a great compilation of advice geared towards computer science research. First and foremost, they emphasize the importance of writing an introduction that has wide appeal. The first page or so should not cater only to experts in your field, as they will make up a very small percentage of the people who will read it. From a technical standpoint, it’s important that the problem be well-formulated, that the solution require original research (not a mechanical application of existing techniques), and that the solution be important.

More to come as the process continues.


HOWTO: Get High-Quality Plots in IPython Notebooks

Type the following chunk into a cell at the beginning of the notebook. I pulled this out of the notebook here.

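# Render figures inline and at 2x ("retina") resolution for high-DPI displays.
%matplotlib inline
%config InlineBackend.figure_format = "retina"

from matplotlib import rcParams

# Resolution (dots per inch) used when saving figures, and the default font size in points.
rcParams["savefig.dpi"] = 100
rcParams["font.size"] = 20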

Causal Inference and Graphical Representations

I’m taking a course on causal inference in the Department of Sociology at JHU. The book we’re using (Morgan and Winship) focuses on the approach to causal inference that lies at the intersection of the potential outcome model and the causal graph work that’s been pushed forward by Judea Pearl in the past 20 years or so.

In the first chapter, Morgan and Winship lay out the three main approaches they go over in the book: conditioning arguments (removing “backdoor” effects), instrumental variables, and mechanism models. I don’t fully understand the math behind these strategies, but when reading the high-level descriptions I couldn’t help but be reminded of the three different types of reasoning expressed in graphical models: causal (not used in the same way as in causal inference), evidential, and co-causal or “explaining away.” I’m looking forward to fleshing these analogies out once we’ve made it further through the book.
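
To make the three reasoning patterns concrete, here is a minimal Python sketch on a toy graphical model with two independent binary causes A and B and a shared effect E (the graph A -> E <- B). The variable names and every probability below are made up purely for illustration; nothing here comes from Morgan and Winship.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_A = 0.1  # prior probability that cause A is present
P_B = 0.1  # prior probability that cause B is present
# P(E = 1 | A, B): the effect is likely if either cause is present.
P_E_GIVEN = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}


def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) for the graph A -> E <- B."""
    p_a = P_A if a else 1 - P_A
    p_b = P_B if b else 1 - P_B
    p_e = P_E_GIVEN[(a, b)] if e else 1 - P_E_GIVEN[(a, b)]
    return p_a * p_b * p_e


def prob(query, evidence):
    """P(query | evidence) by brute-force enumeration over all eight worlds."""
    num = den = 0.0
    for a, b, e in product((0, 1), repeat=3):
        world = {"A": a, "B": b, "E": e}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, b, e)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Causal reasoning: from a cause to its effect.
causal = prob({"E": 1}, {"A": 1})
# Evidential reasoning: from an observed effect back to a cause.
evidential = prob({"A": 1}, {"E": 1})
# Explaining away: also observing the other cause B lowers belief in A.
explained = prob({"A": 1}, {"E": 1, "B": 1})

print(f"P(A=1)               = {P_A:.3f}")
print(f"P(E=1 | A=1)         = {causal:.3f}")
print(f"P(A=1 | E=1)         = {evidential:.3f}")
print(f"P(A=1 | E=1, B=1)    = {explained:.3f}")
```

With these numbers, observing the effect raises the probability of A well above its prior, and then learning that B is present pulls it most of the way back down. A and B are independent a priori but become dependent once their common effect is observed.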
