R ‘cheats’

After spending countless hours in front of my computer trying to figure out how things work in R, I thought I could make learning R a bit easier by compiling this list of helpful commands. All of these things can be found elsewhere, and this list is by no means complete. I will try to keep it updated.

Last update 2012/10/05 new ggplot2 syntax

Getting started

What you need: Download and install R:  http://www.r-project.org/

The functionality of R can be extended by installing additional packages. I mainly use these packages:
ggplot2 to make nice-looking graphs;
grid, also for plotting purposes;
gdata to import data;
lme4 for all kinds of mixed models;
multcomp for post hoc tests of linear models.

To install a package write this:

 install.packages("libraryname", dependencies = TRUE)

How to load these packages

library(libraryname) #for example: library(lme4)

To always load them at startup, put this in your .Rprofile file:

     old <- getOption("defaultPackages")
     options(defaultPackages = c(old, "ggplot2"))

Preparing your data table
To reduce problems with your data in R, replace every empty field with NA in Excel or OpenOffice and save the file as a txt or csv file.

Getting data inside

Choose your working directory
Menu: File/Change Dir
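
The same can be done from the console, which also works where there is no menu. A minimal sketch with base R functions; tempdir() is only a stand-in for your own data folder:

```r
# remember where we are, so we can go back later
old_dir = getwd()

# change the working directory (replace tempdir() with your own folder)
setwd(tempdir())

# list the files R can see in the working directory
list.files()

# go back to the previous directory
setwd(old_dir)
```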

Import data

dataname = read.csv("filename.csv")
dataname = read.table("filename.txt", header=TRUE) 

Check if your data is correctly imported (numbers are still numbers, ...)


more details about the data (Quantiles, Mean..)
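
The commands for these two checks are not shown here. A minimal sketch of how they could look with str(), head() and summary() from base R; the small table is made up and stands in for your imported data:

```r
# small example table standing in for your imported data
dataname = data.frame(length = c(1.2, 3.4, NA, 2.2),
                      sex = c("f", "m", "m", "f"))

str(dataname)      # type of each column (num, chr, Factor, ...)
head(dataname)     # the first rows
summary(dataname)  # min, quartiles, median, mean and number of NAs
```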


Data generation and manipulation

variable1 = c(1,2,3,4,5) 
variable2 = c(5,4,3,2,1) 
dataname = data.frame(variable1,variable2) 

data.frame combines the two vectors into one table (cbind(variable1, variable2) would give a matrix instead)

 variable3 = seq(1,10, by = 2) 

this makes a sequence from 1 to 10 in steps of 2 (1, 3, 5, 7, 9).

How to change from one format to the other


to change a factor to a number, change it first to a character, then to a number.
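
A short sketch of why the detour through a character is needed, using a made-up factor. Converting a factor directly returns its internal level codes, not the values you see:

```r
# a factor whose labels look like numbers
f = factor(c("10", "20", "20", "5"))

as.numeric(f)                # internal level codes, not the values
as.numeric(as.character(f))  # the values: 10 20 20 5

# other common conversions
as.factor(c(1, 2, 2))        # number -> factor
as.character(12.5)           # number -> text
```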

how to refer to columns/variables of your data
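
A minimal sketch of the usual ways to pick a column, reusing the small example table from above:

```r
dataname = data.frame(variable1 = c(1, 2, 3, 4, 5),
                      variable2 = c(5, 4, 3, 2, 1))

dataname$variable1        # by name with $
dataname[["variable1"]]   # by name given as text
dataname[, 1]             # by position: all rows, first column
dataname[, c("variable1", "variable2")]  # several columns at once
```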


How to change the column names of your dataframe.

 colnames(dataname)[2]  = 'copepod number' 

How to change one value in the dataframe

 dataname[2,1] = 11 

this changes the second element of the first column ([row,column])

how to reshape a data.frame

 reshape(tt, idvar=c("animal.id","station", "bottle", "sex"),
 timevar="part", direction="wide")


r4 = melt(r3, id="t") #melt() comes from the reshape / reshape2 package and turns a wide table into a long one

you can avoid writing always the "dataset$" by attaching the data using:


remember to detach a dataset after you have finished, as you can get into trouble having two datasets with the same variable names.
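
A minimal sketch with a made-up dataset, assuming attach() and detach() are the commands meant here. with() is shown as an alternative that needs no detaching:

```r
dataset = data.frame(variable1 = c(1, 2, 3), variable2 = c(4, 5, 6))

attach(dataset)
total = variable1 + variable2   # no "dataset$" needed while attached
detach(dataset)

# with() does the same for a single expression and needs no detach
total2 = with(dataset, variable1 + variable2)
```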


How to make a subset of your dataset

 datasetsubset = dataset[dataset$variable1 == 0 & dataset$variable2 > 2,] 

important! don't forget the comma at the end: the empty space after it means "all columns". Without the comma R tries to select columns instead of rows.

How to calculate a new variable out of other variables

 variable3 = variable1 + variable2 

* multiply       / divide     + add   - subtract    ^ power

How to drop all NAs from the data
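
The command is not shown here. A minimal sketch, assuming na.omit() from base R is what is meant, with a made-up table:

```r
dataset = data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

clean = na.omit(dataset)      # keeps only rows without any NA
complete.cases(dataset)       # TRUE for rows without NA
dataset[!is.na(dataset$x), ]  # drop rows with NA in one column only
```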


get all possible ("unique") values from a variable
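
A minimal sketch with unique() from base R and a made-up variable:

```r
variable = c("a", "b", "a", "c", "b")

unique(variable)          # "a" "b" "c"
length(unique(variable))  # how many different values: 3
table(variable)           # how often each value occurs
```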


how to aggregate data (mean, sum,...)

 newdata = aggregate(data$x , by =list(Var1=data$Var1, Var2 = data$Var2), sum) 
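
A small worked example of the aggregate call with made-up data. The second call uses the formula interface, which gives the same sums with shorter code:

```r
data = data.frame(Var1 = c("a", "a", "b", "b"),
                  Var2 = c("x", "x", "x", "y"),
                  x    = c(1, 2, 3, 4))

# sum of x for every combination of Var1 and Var2
newdata = aggregate(data$x, by = list(Var1 = data$Var1, Var2 = data$Var2), sum)

# the same with the formula interface
newdata2 = aggregate(x ~ Var1 + Var2, data = data, FUN = sum)
```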

Simple plotting


or add ( … type="l") in case you want a lineplot
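
The plotting command itself is not shown here. A minimal sketch, assuming base R's plot() is meant, with made-up data:

```r
x = 1:10
y = x^2

plot(x, y)              # scatter plot
plot(x, y, type = "l")  # line plot
plot(x, y, type = "b",  # both points and lines, with axis labels
     xlab = "x label", ylab = "y label")
```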


to open a new plot window

windows() #in windows 
quartz() #on a mac 

to save a plot either use right-click (however only Windows metafile and bitmap here) or write

 savePlot("Figurename.pdf", type="pdf") 
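
An alternative that does not depend on the window type is to open a file device first. A minimal sketch with base R's pdf(); the file name and size are examples, and tempdir() is used only to keep the sketch tidy:

```r
# example file name; replace tempdir() with your own folder
figure_file = file.path(tempdir(), "Figurename.pdf")

pdf(figure_file, width = 6, height = 4)  # open a pdf file as the plot device
plot(1:10, (1:10)^2)                     # everything plotted now goes to the file
dev.off()                                # close the file, otherwise it is not finished
```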

Mean and standard deviation

 mean(dataset$columnname, na.rm=TRUE) 
 sd(dataset$columnname, na.rm=TRUE) 

na.rm will skip the NA values, otherwise you will get an NA as output.




 ks.test(dataset$columnname1, dataset$columnname2) #Kolmogorov-Smirnov test: do the two samples come from the same distribution?


 t.test(y ~ x, data = dataset) 


 cor.test(~ x + y, data = dataset) 

the default is the Pearson product-moment correlation; you can also choose Kendall by adding method = "kendall".

Test for homogeneity of variances
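
The command is not shown here. A minimal sketch with two tests from base R's stats package, bartlett.test() and fligner.test(); which one was originally meant is an assumption, and the data are made up:

```r
# example data: one numeric response y and one grouping factor x
dataset = data.frame(y = c(4.1, 5.0, 4.6, 5.3, 6.9, 7.4,
                           6.1, 8.0, 5.5, 5.9, 5.2, 6.4),
                     x = factor(rep(c("a", "b", "c"), each = 4)))

bt = bartlett.test(y ~ x, data = dataset)  # assumes normal data within groups
ft = fligner.test(y ~ x, data = dataset)   # rank-based, more robust to non-normal data

bt
ft
```

A small p-value means the variances differ between the groups.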


Simple linear model

 modelname = lm(y ~ x, data = dataset) 

After you have made your model you can use "summary()" and "anova()" to get the results.
Complete model details
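
A minimal sketch with made-up data, assuming summary() is the command meant here:

```r
dataset = data.frame(x = c(1, 2, 3, 4, 5, 6),
                     y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

modelname = lm(y ~ x, data = dataset)

summary(modelname)  # coefficients, standard errors, p-values, R-squared
anova(modelname)    # analysis of variance table
coef(modelname)     # only the estimated intercept and slope
```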



 anova(modelname, type = "marginal") 
 #"marginal" tests each factor independent of interactions; the type argument is understood by models fitted with lme() from the nlme package 

Some of the models have inherent plotting functions which generate plots to test if the model is valid.


plotting the residuals of the model
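
The plotting commands are not shown here. A minimal sketch for a linear model with made-up data, assuming plot() on the model object is what is meant:

```r
dataset = data.frame(x = c(1, 2, 3, 4, 5, 6),
                     y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))
modelname = lm(y ~ x, data = dataset)

par(mfrow = c(2, 2))  # show the four diagnostic plots on one page
plot(modelname)       # residuals vs fitted, QQ plot, scale-location, leverage
par(mfrow = c(1, 1))

# residuals against fitted values by hand
plot(fitted(modelname), resid(modelname))
abline(h = 0, lty = 2)
```

If the model is valid, the residuals should scatter around zero without any clear pattern.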


Mixed effect models

Assuming normally distributed data

1. using the nlme package

 modelname = lme(y~x1*x2, data=dataset, random = ~ 1|r1) 

if you want to nest one factor into another write  ~1|r1/r2 , this would mean r2 is nested within r1. If you have repeated measures use them as if they were random factors.

2. using the lme4 package

this is more flexible as it allows for different data distribution families, but it does not give you p-values for the gaussian case. Note that the formulation of the random effects is slightly different from the lme model.

 modelname = lmer(y~x1*x2 + (1|r1), data=dataset, na.action=na.omit) 
 #lmer() assumes a gaussian family; in current lme4 versions other families are fitted with glmer(..., family=...) 

with a binomial distribution; here the data should be in a "two-column" format (successes, failures).

 modelname = glmer(cbind(y1,y2)~x1+x2+(1|r1)+(1|r2), data=dataset, family=binomial) 

Other families to use

 binomial(link = "logit") 
 gaussian(link = "identity") 
 Gamma(link = "inverse") 
 inverse.gaussian(link = "1/mu^2") 
 poisson(link = "log") 
 quasi(link = "identity", variance = "constant") 
 quasibinomial(link = "logit") 
 quasipoisson(link = "log") 

Posthoc tests on models

load package multcomp

 m1 = glht(modelname, linfct=mcp(TheFactorYouAreInterestedIn="Tukey")) 

Fitting an equation to data

model=nls(variable1~variable2^unknownParameter, data=dataset, 
start = list(unknownParameter = educatedguess)) 

example: model=nls(time~raddd^n, data=Bob, start = list(n=1))
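
A small worked example of the same fit. The data are made up (y = x^2 plus a little noise), so the estimated exponent should come out close to 2:

```r
# made-up data following variable1 = variable2^2 plus a little noise
set.seed(1)
dataset = data.frame(variable2 = 1:10)
dataset$variable1 = dataset$variable2^2 + rnorm(10, sd = 0.5)

model = nls(variable1 ~ variable2^unknownParameter, data = dataset,
            start = list(unknownParameter = 1))

coef(model)     # estimated exponent, should be close to 2
summary(model)  # standard error and p-value of the estimate
```

If nls does not converge, a start value closer to the expected result usually helps.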


"Beautiful" plotting with ggplot2

ggplot2 has a different approach compared to the "normal" R graphics, but I think you can control more and it also has nicer default settings.

This gives you an easy scatter plot

 qplot(x-variable,y-variable, data=dataset) 

scatter plot with a smooth function or a regression line

 qplot(x-variable,y-variable, data=dataset)+geom_smooth() 
 qplot(x-variable,y-variable, data=dataset)+geom_smooth(method="lm") 

For a lineplot use this:

 qplot(x-variable,y-variable, data=dataset, geom=c("line")) 

a combination of line and point

 qplot(x-variable,y-variable, data=dataset, geom=c("line", "point")) 


 qplot(x-variable,y-variable, data=dataset, geom=c("bar")) 

if you want to change the color or the shape of points,lines...

 qplot(x-variable,y-variable, data=dataset, geom=c("line"), 
shape = variablename, colour=variablename) 

if the colour result looks weird you should try colour = factor(variablename)

to change the labels

 qplot(....) + scale_x_continuous("xlabelname") 

scale_y_continuous("ylabelname") does the same for the y axis. In case your data on one axis is discrete use scale_x_discrete() instead.

how to get rid of the grey background in case the journal you want to submit to does not accept it.

 qplot(x-variable,y-variable, data=dataset, geom=c("bar")) + theme_bw() 

The second approach to ggplot2

using ggplot; this might be more tedious in the beginning, but it will pay off because you have more control over each element in the graph.

Define what data you have

 p = ggplot(data=dataset, aes(x=xvariablename, y=yvariablename)) 

normal plot

 p + geom_point() 


 p + geom_line() 


 p + geom_bar(stat="identity") #stat="identity" uses the y values as bar heights 

If you want to visualize different groups, you can just use

 p + geom_bar(stat="identity", aes(fill=groupname)) 

if you want the bars to be next to each other use

 p + geom_bar(stat="identity", position="dodge", aes(fill=groupname)) 

to get rid of the legend

 p + theme(legend.position="none") 

to change the title of the legend box

 p + guides(colour = guide_legend(title = "title here")) 

to rotate axis labels

 p + theme(axis.text.x  = element_text(angle=45, hjust=1.0)) 

to add a general title to the graph

 p + labs(title = "New plot title")

how to limit the plotting area

 p + coord_cartesian(xlim = c(-5000, 5000)) 

make an area graph (filled line graph)

 ggplot(data, aes(x, y)) + geom_area(aes(fill = grouping.variable),
 position = "identity") +
 scale_fill_manual(values = alpha(c("green", "blue"), 0.4)) 

error bars

 mean_se = function (x, ...) {
   x = na.omit(x) 
   se = function(x) sqrt(var(x)/length(x)) 
   data.frame(y=mean(x), ymin=mean(x)-se(x), ymax=mean(x)+se(x)) 
 }
 p + stat_summary(fun.data = "mean_se", colour = "red") 

for more details visit the ggplot homepage
