**ONE MORE R “CHEAT” SHEET**

After spending countless hours in front of my computer trying to figure out how things work in R, I thought I might ease the learning process for others by compiling this list of helpful commands. All of these things can be found elsewhere, and this list is by no means complete. I will try to keep it updated.

*Last update 2012/10/05: new ggplot2 syntax*

**Getting started**

What you need: Download and install R: http://www.r-project.org/

The functionality of R can be extended by installing additional packages. I mainly use these packages:

*ggplot2* to make nice looking graphs;

*grid* also for plotting purposes;

*gdata* to import data;

*lme4* for all kinds of mixed models;

*multcomp* for posthoc tests of linear models

To install a package write this:

install.packages("libraryname", dependencies = TRUE)

How to load these packages

library(libraryname) #for example: library(lme4)

To always load them at startup, put this in your .Rprofile file:

local({ old <- getOption("defaultPackages"); options(defaultPackages = c(old, "ggplot2")) })

**Preparing your data table**

To reduce problems with your data in R, replace any empty field with NA in Excel or OpenOffice and save your file as a txt or csv file.

**Getting data inside**

Choose your working directory

Menu: File/Change Dir

Import data

dataname = read.csv("filename.csv")

dataname = read.table("filename.txt", header=TRUE)

Check if your data is correctly imported (numbers are still numbers, ...)

str(dataname)

More details about the data (quantiles, mean, ...)

summary(dataname)
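To try the import steps without your own file, here is a minimal sketch that writes a small csv file to a temporary folder and reads it back. The file name and the column names (station, depth, count) are made up for illustration.

```r
# write a tiny example file; one field is NA and one is left empty
filename = file.path(tempdir(), "example.csv")
writeLines(c("station,depth,count", "A,10,5", "B,20,NA", "C,,7"), filename)

# import it the same way as above
dataname = read.csv(filename)

str(dataname)      # depth and count should show up as numbers, not characters
summary(dataname)  # summary also reports how many NA's each column has
```

Both the written NA and the empty field in a number column end up as NA after the import.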

**Data generation and manipulation**

variable1 = c(1,2,3,4,5)

variable2 = c(5,4,3,2,1)

dataname = data.frame(variable1,variable2)

data.frame combines the two vectors into one table

variable3 = seq(1,10, by = 2)

this makes a sequence from 1 to 10 in steps of 2 (1, 3, 5, 7, 9).

How to change from one format to the other

as.numeric(data$columnname)

as.character(data$columnname)

as.factor(data$columnname)

to change a factor to a number, change it first to a character, then to a number. Calling as.numeric() directly on a factor returns the internal level codes, not the original values.
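A small sketch of why the detour through a character is needed when the column is a factor. The values are invented for illustration.

```r
# numbers that were imported as a factor
f = factor(c("10", "20", "20", "5"))

codes = as.numeric(f)                 # internal level codes, not the values
values = as.numeric(as.character(f))  # the numbers you actually wanted

codes
values
```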

how to refer to columns/variables of your data

dataset$columnname

How to change the column names of your dataframe.

colnames(dataname)[2] = 'copepod number'

How to change one value in the dataframe

dataname[2,1] = 11

this changes the second element of the first column ([row,column])

how to reshape a data.frame

reshape(tt, idvar=c("animal.id","station", "bottle", "sex"), timevar="part", direction="wide")

or

library(reshape)

r4 = melt(r3, id="t")
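To see what the "wide" direction does, here is a minimal sketch with base R only. The data frame and its columns (animal.id, part, weight) are invented for illustration and are simpler than the call above.

```r
# long format: one row per animal and body part
tt = data.frame(animal.id = c(1, 1, 2, 2),
                part      = c("head", "tail", "head", "tail"),
                weight    = c(0.5, 0.2, 0.6, 0.3))

# wide format: one row per animal, one weight column per part
wide = reshape(tt, idvar = "animal.id", timevar = "part", direction = "wide")

wide
names(wide)
```

The new column names are built from the measured variable and the values of timevar, here weight.head and weight.tail.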

you can avoid always writing the "dataset$" by attaching the data using:

attach(dataset)

remember to detach a dataset after you have finished, as you can get in trouble having two datasets with the same variable names.

detach(dataset)

How to make a subset of your dataset

datasetsubset = dataset[dataset$variable1 == 0 & dataset$variable2 > 2,]

important! don't forget the comma at the end: it means "these rows, all columns". Without it R tries to select columns instead of rows.
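A minimal sketch of the subsetting above on a made-up data frame, so you can see which rows survive. The values and the third column are invented for illustration.

```r
dataset = data.frame(variable1 = c(0, 0, 1, 0),
                     variable2 = c(1, 3, 5, 4),
                     variable3 = c("a", "b", "c", "d"))

# keep rows where variable1 is 0 AND variable2 is larger than 2; keep all columns
datasetsubset = dataset[dataset$variable1 == 0 & dataset$variable2 > 2, ]

datasetsubset
```

Only the second and the fourth row fulfil both conditions, and all three columns are kept.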

How to calculate a new variable out of other variables

variable3 = variable1 + variable2

Operators: * multiply, / divide, + add, - subtract, ^ power

How to drop all NAs from a vector

x1=x[!is.na(x)]
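The line above works on a single vector. For a whole data frame, na.omit() drops every row that contains at least one NA. A minimal sketch with invented values:

```r
# a vector with one missing value
x = c(3, NA, 7)
x1 = x[!is.na(x)]

# a data frame with missing values in different rows
df = data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
dfclean = na.omit(df)   # only rows without any NA are kept

x1
dfclean
```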

get all possible ("unique") values from a variable

unique(data$variable)

how to aggregate data (mean, sum,...)

newdata = aggregate(data$x , by =list(Var1=data$Var1, Var2 = data$Var2), sum)
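Here is the same aggregate call on a small made-up data frame. The group labels and values are invented for illustration.

```r
data = data.frame(Var1 = c("a", "a", "b", "b"),
                  Var2 = c("x", "x", "x", "y"),
                  x    = c(1, 2, 3, 4))

# sum of x for every combination of Var1 and Var2 that occurs in the data
newdata = aggregate(data$x, by = list(Var1 = data$Var1, Var2 = data$Var2), sum)

newdata
```

The result has one row per existing combination, and the summed column is called x.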

**Simple plotting**

plot(dataset$columnname~dataset$columnname)

or add ( … type="l") in case you want a lineplot

boxplot(dataset$columnname~dataset$columnname)

to open a new plot window

windows() #in windows

quartz() #on a mac

to save a plot either use right-click (however only windows metafile and bitmap here) or write

savePlot("Figurename.pdf", type="pdf")
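If savePlot is not available for your graphics window, a sketch of another way is to open a pdf file as the plotting device, draw the plot, and close the device again. The file name and the plotted numbers are just placeholders.

```r
# the file is written to a temporary folder here; use any path you like
outfile = file.path(tempdir(), "Figurename.pdf")

pdf(outfile, width = 6, height = 4)   # open the pdf device (size in inches)
plot(1:10, (1:10)^2, type = "l")      # everything plotted now goes into the file
dev.off()                             # close the device, this writes the file
```

Nothing shows up on screen while the pdf device is open; the plot only appears in the file.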


**Statistics**

Mean and standard deviation

mean(dataset$columnname, na.rm=TRUE)

sd(dataset$columnname, na.rm=TRUE)

na.rm will skip the NA values, otherwise you will get an NA as output.

Shapiro-Wilk

shapiro.test(dataset$columnname)

Kolmogorov-Smirnov

ks.test(dataset$columnname, dataset$columnname)

t-test

t.test(y ~ x, data = dataset)

Correlation

cor.test(~ y + x, data = dataset)

The default is the Pearson product-moment correlation; you can also choose Kendall by adding method = "kendall". Note that the data argument only works with the formula notation used here.

Test for homogeneity of variances

fligner.test(y~x)
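To try these tests without your own data, here is a sketch using data sets that come with R (sleep, cars and InsectSprays). Which data set goes with which test is my choice for illustration only.

```r
# Shapiro-Wilk test on one column
sw = shapiro.test(cars$dist)

# t-test: does the extra sleep differ between the two drug groups?
tt = t.test(extra ~ group, data = sleep)

# Pearson correlation between speed and stopping distance
ct = cor.test(~ speed + dist, data = cars)

# Fligner-Killeen test: are the variances equal across the sprays?
ft = fligner.test(count ~ spray, data = InsectSprays)

tt          # printing the object shows the full result
ct$p.value  # single parts can be picked out with $
```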

Simple linear model

modelname = lm(y ~ x, data = dataset)

After you have made your model you can use "summary()" and "anova()" to get the results.

Complete model details

summary(modelname)

ANOVA Table

anova(modelname) #for lme models from the nlme package you can add type = "marginal", which tests each factor independent of interactions

Some of the models have inherent plotting functions which generate plots to test if the model is valid.

plot(modelname)

Plotting the residuals of the model

plot(resid(modelname))
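The whole linear model workflow in one go, as a sketch on the cars data set that comes with R (stopping distance against speed). The data set is my choice for illustration.

```r
# fit the model
modelname = lm(dist ~ speed, data = cars)

summary(modelname)   # coefficients, R squared, p-values
anova(modelname)     # ANOVA table

coef(modelname)      # just the intercept and the slope

# residuals, one per observation; a pattern here hints at a badly fitting model
res = resid(modelname)
plot(res)
```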

**Mixed effect models**

Assuming normally distributed data

**1. using the nlme package**

library(nlme)

modelname = lme(y~x1*x2, data=dataset, random = ~ 1|r1, na.action=na.omit)

if you want to nest one factor into another write ~1|r1/r2, this would mean r2 is nested within r1. If you have repeated measures use them as if they were random factors.
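A sketch of such a model on the Orthodont data set that ships with nlme (repeated distance measurements per child). Using this data set and this particular formula is my assumption for illustration, not a recommendation for your data.

```r
library(nlme)

# distance depends on age and sex; Subject is the random factor
# (each child was measured several times)
modelname = lme(distance ~ age * Sex, data = Orthodont,
                random = ~ 1 | Subject, na.action = na.omit)

summary(modelname)
anova(modelname, type = "marginal")   # the type argument exists for lme models
```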

**2. using the lme4 package**

this is more flexible as it allows for different data distribution families, but it does not give you p-values for the gaussian case. Notice that the formulation of the random effects is slightly different from the lme model. In lme4 1.0 and later, lmer() fits the gaussian case without a family argument, and glmer() is used for all other families.

modelname = lmer(y~x1*x2 + (1|r1), data=dataset, na.action=na.omit)

with a binomial distribution; here the data should be in a "two-column" format (successes, failures).

modelname = glmer(cbind(y1,y2)~x1+x2+(1|r1)+(1|r2), data=dataset, family=binomial(link=logit))

Other families to use

binomial(link = "logit")

gaussian(link = "identity")

Gamma(link = "inverse")

inverse.gaussian(link = "1/mu^2")

poisson(link = "log")

quasi(link = "identity", variance = "constant")

quasibinomial(link = "logit")

quasipoisson(link = "log")

**Posthoc tests on models**

load package multcomp

library(multcomp)

m1 = glht(modelname, linfct=mcp(TheFactorYouAreInterestedIn="Tukey"))

summary(m1)

**Fitting an equation to data**

model=nls(variable1~variable2^unknownParameter, data=dataset, start = list(unknownParameter = educatedguess))

example: model=nls(time~raddd^n, data=Bob, start = list(n=1))

model
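To run the example above without the original data, here is a sketch with simulated values. The data frame Bob is made up here with a true exponent of 1.5 plus some noise; nls needs a bit of noise, as it can fail on perfectly fitting artificial data.

```r
set.seed(1)   # makes the random noise reproducible

# simulated data: time = raddd^1.5 plus random noise
Bob = data.frame(raddd = 1:20)
Bob$time = Bob$raddd^1.5 + rnorm(20, sd = 0.5)

# fit the exponent n, starting from the guess n = 1
model = nls(time ~ raddd^n, data = Bob, start = list(n = 1))

model         # shows the estimated parameter
coef(model)   # just the estimate, it should be close to 1.5
```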


**"Beautiful" plotting with ggplot2**

ggplot2 has a different approach compared to the "normal" R graphics, but I think you can control more and it also has nicer default settings.

This gives you an easy scatter plot

qplot(xvariable, yvariable, data=dataset)

scatter plot with a smooth function or a regression line

qplot(xvariable, yvariable, data=dataset)+geom_smooth()

qplot(xvariable, yvariable, data=dataset)+geom_smooth(method="lm")

For a lineplot use this:

qplot(xvariable, yvariable, data=dataset, geom=c("line"))

a combination of line and point

qplot(xvariable, yvariable, data=dataset, geom=c("line", "point"))

barchart

qplot(xvariable, yvariable, data=dataset, geom=c("bar"))

if you want to change the color or the shape of points, lines...

qplot(xvariable, yvariable, data=dataset, geom=c("line"), shape = variablename, colour=variablename)

if the colour result looks weird you should try colour = factor(variablename)

to change the labels

qplot(....) + scale_x_continuous("xlabelname") + scale_y_continuous("ylabelname")

in case your data on one axis is discrete use scale_x_discrete() instead.

how to get rid of the grey background in case the journal you want to submit to does not approve it.

qplot(xvariable, yvariable, data=dataset, geom=c("bar")) + theme_bw()

**The second approach to ggplot2**

using ggplot; this might be more tedious in the beginning, but it will pay off because you have more control over each element in the graph.

Define what data you have

p = ggplot(data=dataset, aes(x=xvariablename, y=yvariablename))

normal plot

p + geom_point()

lineplot

p + geom_line()

barplot

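p + geom_bar(stat="identity")

stat="identity" uses the y values as bar heights; without it geom_bar counts the rows for each x value.

If you want to visualize different groups, you can just use

p + geom_bar(stat="identity", aes(fill=groupname))

if you want the bars to be next to each other use

p + geom_bar(stat="identity", position="dodge", aes(fill=groupname))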

to get rid of the legend

p + theme(legend.position="none")

to change the title of the legend box

p + guides(colour = guide_legend(title = "title here"))

to rotate axis labels

p + theme(axis.text.x = element_text(angle=45, hjust=1.0))

to add a general title to the graph

p + labs(title = "New plot title")

how to limit the plotting area

p + coord_cartesian(xlim = c(-5000, 5000))
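Several of the pieces above put together in one plot, as a sketch on the mtcars data set that comes with R. The data set, the labels and the limits are my choices for illustration.

```r
library(ggplot2)

# define the data: car weight on x, miles per gallon on y
p = ggplot(data = mtcars, aes(x = wt, y = mpg))

# add points coloured by number of cylinders, then adjust legend, title and axes
p2 = p + geom_point(aes(colour = factor(cyl))) +
  guides(colour = guide_legend(title = "cylinders")) +
  labs(title = "Fuel use and weight") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1.0)) +
  coord_cartesian(xlim = c(1, 6))

print(p2)
```

Note that theme_bw() comes before theme(...) here, because a complete theme added later would overwrite the rotated labels.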

make an area graph (filled line graph)

ggplot(data, aes(x, y)) + geom_area(aes(fill = grouping.variable), position = "identity") + scale_fill_manual(values = alpha(c("green", "blue"), 0.4))

**error bars**

mean_se = function (x, ...) { x = na.omit(x); se = function(x) sqrt(var(x)/length(x)); data.frame(y=mean(x), ymin=mean(x)-se(x), ymax=mean(x)+se(x)) }

p + stat_summary(fun.data = "mean_se", colour = "red")

**for more details visit the ggplot homepage**

http://docs.ggplot2.org/
