The following code is required for proper rendering of the document. Please do not modify it. When working in the Rmd version of this file, do not attempt to run this chunk.
knitr::opts_chunk$set(error = TRUE)
Last updated on 2023-Feb-03.
Original text: Chad M. Eliason, PhD
Revisions: Nick M. A. Crouch, PhD; Lucas J. Legendre, PhD; and Carlos A. Rodriguez-Saltos, PhD
Exercises: Lucas J. Legendre, PhD
Rmarkdown implementation: Carlos A. Rodriguez-Saltos, PhD
Principal course instructor: Julia A. Clarke, PhD
These modules are part of the course “Curiosity to Question: Research Design, Data Analysis and Visualization”, taught by Dr. Julia A. Clarke and Dr. Adam Papendieck at UT Austin.
For questions or comments, please send an email to Dr. Clarke (julia_clarke@jsg.utexas.edu).
Eliason, C. M., Proffitt, J. V., Crouch, N. M. A., Legendre, L. J., Rodriguez-Saltos, C. A., Papendieck, A., & Clarke, J. A. (2020). The Clarke Lab’s Introductory Course to R. Retrieved from https://juliaclarke.squarespace.com
When using an RMarkdown file, the working directory will be the folder containing the Rmd file. The data will be stored in a separate folder, the “data” folder. It is good practice to keep your unmodified data in a folder of their own. You will store scripts, documents, and results in other folders.
For today’s class, download all the datasets available on Canvas and place them in the data folder.
We will import data.txt, which contains a rectangular matrix written in plain text (ASCII) format. Data files such as these can be exported from Excel or a database program.
The easiest way to define your working directory in RStudio (so that you don’t need to redefine it later on) is to go to:
Session > Set Working Directory > Choose Directory…
and choose the folder that contains both your script and your data.
dat<-read.table("data.txt")
## Warning in file(file, "rt"): cannot open file 'data.txt': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
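The errors above appear because data.txt was not in the working directory when this document was rendered. As a minimal, self-contained sketch of what a successful import looks like, the following code writes a small made-up table to a temporary file and reads it back. The file contents and the names tmp and toy are hypothetical and serve only as an illustration; they say nothing about the contents of the course's data.txt.

```r
# Write a small, made-up table to a temporary plain-text file
tmp <- tempfile(fileext = ".txt")
writeLines(c("id height mass",
             "a 1.2 10",
             "b 1.5 12",
             "c 1.1 9"), tmp)

# header = TRUE tells read.table that the first line holds the column names
toy <- read.table(tmp, header = TRUE)

# Confirm that the import worked as intended
dim(toy)    # number of rows and columns
str(toy)    # type of each column
```

If dim() and str() report what you expect, the file was read correctly.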
If you have a very large dataset, you can use head() to display just the first six lines rather than the whole data frame.
head(dat)
## Error in head(dat): object 'dat' not found
Sometimes, data files have columns that are separated by commas. Files written in this format usually end in .csv. To open these files, we use read.csv.
flowers <- read.csv(file = "iris.csv")
## Warning in file(file, "rt"): cannot open file 'iris.csv': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
head(flowers)
## Error in head(flowers): object 'flowers' not found
names(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found
# Summary of dataset
summary(flowers)
## Error in summary(flowers): object 'flowers' not found
# Structure of dataset and of each column
str(flowers)
## Error in str(flowers): object 'flowers' not found
# Class of a column
class(flowers$Sepal.Width)
## Error in eval(expr, envir, enclos): object 'flowers' not found
If you check the iris.csv file, you will see that the first row is the header containing the names of the variables. It is advisable to always include a header in your data files. Optionally, the first column in the file may contain the row names (labels).
read.delim allows you to read delimited files, such as tab-delimited ones. Check the help file of read.table to find out more.
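As a brief sketch of how read.delim relates to read.table, the following example writes a made-up tab-delimited file to a temporary location and reads it both ways. The file contents and the object names (tmp_tab, birds, birds2) are hypothetical.

```r
# Write a made-up tab-delimited file to a temporary location
tmp_tab <- tempfile(fileext = ".txt")
writeLines(c("species\tmass_g",
             "wren\t10.5",
             "robin\t77"), tmp_tab)

# read.delim assumes tab separators and a header row by default
birds <- read.delim(tmp_tab)

# The same import written with read.table and explicit arguments
birds2 <- read.table(tmp_tab, header = TRUE, sep = "\t")

# Compare the two results
all.equal(birds, birds2)
```

For a simple file like this one, the two calls should give the same data frame; read.delim just saves you from typing the header and sep arguments.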
You can also import Excel data into R. For that, you need to install and load the gdata package. The function for reading Excel files is read.xls.
By using functions from other R packages you can import a huge variety of data files. However, the most common files you will probably deal with are .txt and .csv.
This tutorial has more info on importing data files in R: https://www.datacamp.com/community/tutorials/r-tutorial-read-excel-into-r
You can launch a dialog box to ask the user to pick a file. You launch the box using file.choose.
flowers <- read.csv(file = file.choose())
## Error in file.choose(): file choice cancelled
This method, however, is not recommended because it cannot be automated, and therefore, it may present difficulties when other researchers (or yourself in the future) want to replicate your results.
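A more reproducible alternative is to write the path in the script itself. The sketch below assumes that the file sits in a folder called data inside the working directory, following the layout described at the start of this module; adjust the path if your project is organized differently. The name csv_path is hypothetical.

```r
# Build the path from its parts; "data" and "iris.csv" are assumptions
# about how your project folder is organized
csv_path <- file.path("data", "iris.csv")

# Read the file if it exists; otherwise report where R was looking
if (file.exists(csv_path)) {
  flowers <- read.csv(csv_path)
} else {
  message("File not found: ", csv_path,
          " (working directory: ", getwd(), ")")
}
```

Because the path is written in the code, anyone with the same folder layout can rerun the script without clicking through a dialog box.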
Use the functions head, names, class, and summary to figure out what is wrong with the format of the flowers dataset (hint: there are at least 3 things that are wrong).
Here is a corrected version of the file.
flowers <- read.csv(file = "iris_good.csv")
## Warning in file(file, "rt"): cannot open file 'iris_good.csv': No such file or
## directory
## Error in file(file, "rt"): cannot open the connection
head(flowers)
## Error in head(flowers): object 'flowers' not found
summary(flowers)
## Error in summary(flowers): object 'flowers' not found
class(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found
For the rest of the module, we will work with this corrected version of the dataset.
We will install a package with many custom color palettes to choose from.
install.packages("RColorBrewer")
## Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
library(RColorBrewer)
Here is a sample of the color palettes available.
display.brewer.all()
For this document we will also need ggplot2, so install it if you haven’t already:
install.packages("ggplot2")
## Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
library(ggplot2)
The following are functions commonly used in R. If you are unsure what any of them does, search for its help file. For example: ?length
length()
rev()
sum(), cumsum(), prod(), cumprod()
mean(), sd(), var(), median()
min(), max(), range(), summary()
exp(), log(), sin(), cos(), tan() ## radians, not degrees
round(), ceiling(), floor(), signif()
sort(), order(), rank()
which(), which.max()
any(), all()
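To see how a few of these functions differ, here is a short illustration on a made-up vector (the vector x is hypothetical and chosen so that the results are easy to check by hand).

```r
x <- c(3, 1, 2)

sort(x)       # values in increasing order: 1 2 3
order(x)      # positions that would sort x: 2 3 1
rank(x)       # rank of each value: 3 1 2
which.max(x)  # position of the largest value: 1
cumsum(x)     # running total: 3 4 6
any(x > 2)    # TRUE, because at least one value exceeds 2
all(x > 2)    # FALSE, because not every value exceeds 2
```

Note the difference between sort(), which returns the values themselves, and order(), which returns positions that you can use to reorder another vector or the rows of a data frame.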
We can apply these functions to a given variable in a dataset.
mean(flowers$Sepal.Length)
## Error in mean(flowers$Sepal.Length): object 'flowers' not found
sd(flowers$Sepal.Length)
## Error in is.data.frame(x): object 'flowers' not found
range(flowers$Sepal.Length)
## Error in eval(expr, envir, enclos): object 'flowers' not found
Many of these functions return NA when the data contain missing values (NAs), unless you specify how the missing values should be handled.
age <- c(32, 25, NA, 52)
# The following line does not produce meaningful output
mean(age)
## [1] NA
# The following does
mean(age, na.rm = TRUE)
## [1] 36.33333
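The same na.rm argument is available in several of the other summary functions listed above. The short sketch below reuses the age vector and also shows one way to count the missing values before deciding how to handle them.

```r
age <- c(32, 25, NA, 52)

is.na(age)                # TRUE marks the missing entry
sum(is.na(age))           # number of missing values: 1
sd(age, na.rm = TRUE)     # standard deviation of the three known ages
range(age, na.rm = TRUE)  # 25 52
```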
You can use apply to execute a function over each column in a dataset. In the following example, we will do so with the mean function. Note, however, that this function requires numeric data; therefore, we need to exclude some columns from the dataset when we send it to apply.
# Species column contains characters. Note the use of the minus (-) sign to
# exclude this column when using subset.
apply(subset(flowers, select = -Species), MARGIN = 2, mean)
## Error in subset(flowers, select = -Species): object 'flowers' not found
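If you do not want to name the non-numeric columns by hand, one possible approach is to let R find the numeric columns for you. The sketch below uses R's built-in iris dataset as a stand-in for the course's flowers file (an assumption made so that the example runs on its own); num_cols and col_means are hypothetical names.

```r
# Built-in iris data, used here as a stand-in for the course's flowers file
data(iris)

# Logical vector: TRUE for columns that hold numbers
num_cols <- sapply(iris, is.numeric)

# Column means of the numeric columns only
col_means <- apply(iris[, num_cols], MARGIN = 2, mean)
col_means

# colMeans() gives the same result
all.equal(col_means, colMeans(iris[, num_cols]))
```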
We can also use the aggregate function.
?aggregate
aggregate(Sepal.Length ~ Species, data = flowers, FUN = mean)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
‘.’ in a formula, such as the one below, means “all variables”.
aggregate(. ~ Species, data = flowers, FUN = mean)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
aggregate(. ~ Species, data = flowers, FUN = sd)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
There are four types of graphic functions in R (some of which we have already encountered in module 1):
High level plotting functions, which create complete plots. Examples: plot(), hist(), barplot(), boxplot(), qqnorm(), qqplot(), pairs().
Low level plotting functions, which add features to a plot. Examples: lines(), points(), text(), mtext(), abline(), qqline(), title().
Interactive plotting functions.
par(), which changes plot settings.
Histograms display the distribution of your records along a continuous variable. To make a histogram, use hist() on a numeric, continuous vector:
hist(flowers$Sepal.Length)
## Error in hist(flowers$Sepal.Length): object 'flowers' not found
We can specify the approximate number of bins using the breaks argument.
hist(flowers$Sepal.Length, breaks=5)
## Error in hist(flowers$Sepal.Length, breaks = 5): object 'flowers' not found
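Because breaks is only a suggestion when given as a single number, it can help to inspect the bins that R actually chose. The following sketch does this with the built-in iris dataset, used as a stand-in for flowers (an assumption); the object name h is hypothetical.

```r
data(iris)  # built-in dataset, used as a stand-in for flowers

# plot = FALSE returns the histogram object without drawing it
h <- hist(iris$Sepal.Length, breaks = 5, plot = FALSE)

h$breaks               # bin boundaries that R actually chose
length(h$breaks) - 1   # resulting number of bins (may differ from 5)
h$counts               # number of observations in each bin
```

To force exact bin boundaries, you can instead pass a vector of break points to breaks.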
With the ggplot2 package, you can get a similar result using qplot():
library(ggplot2)
qplot(flowers$Sepal.Length)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## Error in eval_tidy(mapping$x, data, caller_env): object 'flowers' not found
If the vector is in a data frame, you can use the following syntax:
qplot(Sepal.Length, data=flowers, binwidth=.25)
## Error in eval_tidy(mapping$x, data, caller_env): object 'flowers' not found
Alternatively, we can use the function ggplot. This function allows greater flexibility when modifying the plot, which is done by “summing” plotting functions.
ggplot(flowers, aes(x=Sepal.Length)) +
geom_histogram(binwidth=.25, colour = "black")
## Error in ggplot(flowers, aes(x = Sepal.Length)): object 'flowers' not found
Here is a way to color the data by species:
ggplot(flowers, aes(x=Sepal.Length, fill=Species)) +
geom_histogram(binwidth=.25, alpha=.75, colour = "black")
## Error in ggplot(flowers, aes(x = Sepal.Length, fill = Species)): object 'flowers' not found
A scatter plot shows the distribution of your records along two continuous variables. Use plot() on a vector of x values followed by a vector of y values or, as in the example below, on a formula of the form y ~ x:
names(flowers)
## Error in eval(expr, envir, enclos): object 'flowers' not found
plot(Sepal.Length ~ Sepal.Width, data = flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
With qplot():
qplot(flowers$Sepal.Width, flowers$Sepal.Length)
## Error in `geom_point()`:
## ! Problem while computing aesthetics.
## ℹ Error occurred in the 1st layer.
## Caused by error in `FUN()`:
## ! object 'flowers' not found
# Same output, but using the `data` argument
qplot(Sepal.Width, Sepal.Length, data = flowers)
## Error in ggplot(data, mapping, environment = caller_env): object 'flowers' not found
With ggplot():
ggplot(flowers, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()
## Error in ggplot(flowers, aes(x = Sepal.Width, y = Sepal.Length)): object 'flowers' not found
Making line graphs with plot() is similar to making scatterplots, but you set type to “l”.
?pressure
plot(pressure$temperature, pressure$pressure, type="l")
To add points and/or multiple lines, use the functions points() and lines(). Those functions are not standalone; plot() must be called first (i.e., each line in the following chunk will not run by itself, and the whole chunk has to be called at once for it to work).
plot(pressure$temperature, pressure$pressure, type="l")
points(pressure$temperature, pressure$pressure)
lines(pressure$temperature, pressure$pressure/2, col="red")
points(pressure$temperature, pressure$pressure/2, col="red")
With ggplot2, you can draw a line graph using qplot() with geom=“line”:
qplot(pressure$temperature, pressure$pressure, geom="line")
If the two vectors are already in the same data frame:
qplot(temperature, pressure, data=pressure, geom="line")
Which is equivalent to:
ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()
Lines and points together:
qplot(temperature, pressure, data=pressure, geom=c("line", "point"))
Which is equivalent to:
ggplot(pressure, aes(x=temperature, y=pressure)) +
geom_line() +
geom_point()
Or a plot with both lines.
# A fixed colour goes outside aes(); inside aes() it would be treated as data
ggplot(pressure) +
geom_line(aes(x=temperature, y=pressure)) +
geom_point(aes(x=temperature, y=pressure)) +
geom_line(aes(x=temperature, y=pressure/2), colour = "red") +
geom_point(aes(x=temperature, y=pressure/2), colour = "red")
Check the Canvas supplement on resources for learning R; included there are cheat sheets on the various options available in ggplot2. The R graph gallery is strongly recommended: https://www.r-graph-gallery.com/.
Bar graphs represent the relationship between a continuous variable, plotted on the y-axis, and a categorical one, plotted on the x-axis.
?BOD
barplot(BOD$demand, names.arg=BOD$Time)
Sometimes we want a bar graph to represent the number of cases in each level of a categorical variable. In this sense, the barplot is similar to a histogram, but in the latter the x-axis is continuous and the y-axis represents frequency, not counts.
Let’s say that in the mtcars dataset, we want to know the number of cars for each category of number of cylinders. This information is not given explicitly in the dataset, but we can generate it using the table function.
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14
There are 11 cases of the value 4, 7 cases of 6, and 14 cases of 8. Simply pass the table to barplot() to generate a bar graph.
barplot(table(mtcars$cyl))
With the ggplot2 package, you can get a similar result. If you generate a bar graph with information that is explicitly in the dataset, use geom_bar() with stat=“identity”. Notice the difference in the output when the x variable is continuous and when it is discrete:
ggplot(data=BOD, aes(Time, demand)) +
geom_bar(stat="identity")
# Converting a numeric variable to a factor results in it being treated as a
# discrete variable
ggplot(data=BOD, aes(factor(Time), demand)) +
geom_bar(stat="identity")
When you want to generate counts that are not included in the dataset:
qplot(factor(cyl), data=mtcars)
Which is equivalent to:
ggplot(mtcars, aes(x=factor(cyl))) + geom_bar()
Boxplots also allow you to explore the relationship between a categorical variable and a continuous one; in addition, they allow you to see, within each category, several summary statistics, such as the median, the quartiles, and the minimum and maximum. To make a box plot, use plot() on a factor of x values and a numeric vector of y values.
plot(as.factor(flowers$Species), flowers$Sepal.Length)
## Error in is.factor(x): object 'flowers' not found
If the two vectors are in the same data frame, you can also use a formula.
boxplot(Sepal.Length ~ Species, data = flowers)
## Error in eval(m$data, parent.frame()): object 'flowers' not found
Check the help file of boxplot to see what the boxes and lines mean.
You can plot the interaction of two variables.
?ToothGrowth
table(ToothGrowth$dose)
##
## 0.5 1 2
## 20 20 20
boxplot(len ~ supp + dose, data = ToothGrowth)
With the ggplot2 package, you can get a similar result using qplot(), with geom=“boxplot”:
qplot(flowers$Species, flowers$Sepal.Length, geom="boxplot")
## Error in `geom_boxplot()`:
## ! Problem while computing aesthetics.
## ℹ Error occurred in the 1st layer.
## Caused by error in `FUN()`:
## ! object 'flowers' not found
If the two vectors are already in the same data frame, you can use the following syntax:
qplot(Species, Sepal.Length, data=flowers, geom="boxplot")
## Error in ggplot(data, mapping, environment = caller_env): object 'flowers' not found
Which is equivalent to:
ggplot(flowers, aes(x=Species, y=Sepal.Length)) + geom_boxplot()
## Error in ggplot(flowers, aes(x = Species, y = Sepal.Length)): object 'flowers' not found
It’s also possible to make box plots for multiple variables, by combining the variables using the function interaction(). In this case, the dose variable is numeric. interaction() converts it to a factor before combining it with another variable.
qplot(
interaction(ToothGrowth$supp, ToothGrowth$dose),
ToothGrowth$len,
geom="boxplot"
)
Alternatively, when the variables are in the same data frame:
qplot(interaction(supp, dose), len, data=ToothGrowth, geom="boxplot")
Which is equivalent to:
ggplot(ToothGrowth, aes(x=interaction(supp, dose), y=len)) + geom_boxplot()
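To see what interaction() produces before plotting it, you can inspect the combined factor directly. This is a small sketch on the same ToothGrowth dataset; the object name grp is hypothetical.

```r
data(ToothGrowth)

# dose is stored as a number, supp as a factor
class(ToothGrowth$dose)
class(ToothGrowth$supp)

# interaction() builds one factor with a level for every combination
grp <- interaction(ToothGrowth$supp, ToothGrowth$dose)
class(grp)
levels(grp)
table(grp)   # 10 observations per combination
```

Each level of grp corresponds to one box in the plots above.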
To plot a function, use curve() on an expression using the object x, which does not need to be an object in your environment.
curve(x ^ 3 - 5 * x, from= -4, to= 4)
curve(x ^ 2, from= -10, to =10)
You can plot any function that takes a numeric vector as input and returns another numeric vector, including functions that you define yourself. Using add=TRUE will add a curve to the previously created plot.
myfun <- function(xvar) {
1/(1 + exp(-xvar + 10))
}
curve(myfun(x), from= 0, to= 20)
# Adding a line to plot defined in previous line of code
curve(1 - myfun(x), add = TRUE, col = "red")
With the ggplot2 package, you can get a similar result using ggplot() with stat_function(), passing your function through the fun argument and setting geom=“line”.
ggplot(data.frame(x=c(0, 20)), aes(x=x)) +
stat_function(fun=myfun, geom="line")
We will work with the volcano dataset.
?volcano
data(volcano)
head(volcano)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 100 100 101 101 101 101 101 100 100 100 101 101 102 102
## [2,] 101 101 102 102 102 102 102 101 101 101 102 102 103 103
## [3,] 102 102 103 103 103 103 103 102 102 102 103 103 104 104
## [4,] 103 103 104 104 104 104 104 103 103 103 103 104 104 104
## [5,] 104 104 105 105 105 105 105 104 104 103 104 104 105 105
## [6,] 105 105 105 106 106 106 106 105 105 104 104 105 105 106
## [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
## [1,] 102 102 103 104 103 102 101 101 102 103 104 104
## [2,] 103 103 104 105 104 103 102 102 103 105 106 106
## [3,] 104 104 105 106 105 104 104 105 106 107 108 110
## [4,] 105 105 106 107 106 106 106 107 108 110 111 114
## [5,] 105 106 107 108 108 108 109 110 112 114 115 118
## [6,] 106 107 109 110 110 112 113 115 116 118 119 121
## [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
## [1,] 105 107 107 107 108 108 110 110 110 110 110 110
## [2,] 107 109 110 110 110 110 111 112 113 114 116 115
## [3,] 111 113 114 115 114 115 116 118 119 119 121 121
## [4,] 117 118 117 119 120 121 122 124 125 126 127 127
## [5,] 121 122 121 123 128 131 129 130 131 131 132 132
## [6,] 124 126 126 129 134 137 137 136 136 135 136 136
## [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
## [1,] 110 110 108 108 108 107 107 108 108 108 108 108
## [2,] 114 112 110 110 110 109 108 109 109 109 109 108
## [3,] 120 118 116 114 112 111 110 110 110 110 109 109
## [4,] 126 124 122 120 117 116 113 111 110 110 110 109
## [5,] 131 130 128 126 122 119 115 114 112 110 110 110
## [6,] 136 135 133 129 126 122 118 116 115 113 111 110
## [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## [1,] 107 107 107 107 106 106 105 105 104 104 103
## [2,] 108 108 108 107 107 106 106 105 105 104 104
## [3,] 109 109 108 108 107 107 106 106 105 105 104
## [4,] 109 109 109 108 108 107 107 106 106 105 105
## [5,] 110 110 109 109 108 107 107 107 106 106 105
## [6,] 110 110 110 109 108 108 108 107 107 106 106
We can represent the information in volcano with a heatmap.
image(volcano)
or with a contour plot.
contour(volcano)
We can also combine layers into a plot.
image(volcano, col = terrain.colors(50))
contour(volcano, add = TRUE, lwd = .5)
We can also produce a 3D perspective plot.
persp(volcano)
We can change the angle of the perspective using the theta and phi arguments. Check the help file of persp for more information.
persp(volcano, theta=20)
persp(volcano, theta=20, phi = 35)
The website R Graph Gallery contains useful instructions on reproducing 3D plots, including information on how to make them with ggplot2.
The locator function allows you to find values in a plot. The function, however, won’t work if you run it inside a chunk. Copy the entire code in the chunk and paste it to the console. The plot will appear in the lower right pane. Click on regions of interest to find the corresponding values. When you are done, hit Esc.
plot(Sepal.Length ~ Sepal.Width, data = flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
xy <- locator()
## Error in locator(): plot.new has not been called yet
xy
## Error in eval(expr, envir, enclos): object 'xy' not found
Another function that might interest you is identify(). Check its help file.
Take into account that interactive functions get very slow with large datasets.
The package plotly can help you with implementing advanced interactivity. If you are interested, check their website: https://plot.ly/r/
You can generate pretty much any plot using low-level functions, but it can be time-consuming. For example, here we will show you how to generate a cool thermometer display.
First, generate the x values.
x <- 1:2
Then, generate random temperature measurements.
y <- runif(2, 0, 100)
Now we will generate the plot. Because we will use low-level plotting functions, we require an open plot (see “Types of graphic functions” in this document). Thus, we need to run everything in a single chunk. Read the comments for information on what each function does.
# Generate an empty plot
par(mar=c(4,4,2,4))
plot(x, y, type='n', xlim=c(.5, 2.5), ylim=c(-10, 110), axes=FALSE, ann=FALSE)
# Create an axis for C scale
axis(2, at=seq(0, 100, 20))
mtext("Temp (C)", side=2, line=3)
# Create a treatment (x) axis
axis(1, at=1:2, labels=c("Trt 1", "Trt 2"))
# Create a secondary y-axis for F scale
axis(4, at=seq(0, 100, 20), labels=seq(0, 100, 20)*9/5 + 32)
mtext("Temp (F)", side=4, line=3)
# Add a box around the plot
box()
# Plot the temperature measurements
segments(x, 0, x, 100, lwd=20, col="dark grey")
segments(x, 0, x, 100, lwd=16, col="white")
segments(x, 0, x, y, lwd=16, col="light pink")
Alternatively, you can display the thermometer in two plots side by side. Note the mfrow argument inside par(). Check the corresponding help file to read how to use it.
# Generate an empty plot
par(
mfrow= c(1,2),
mar= c(4,4,2,4)
)
## Celsius
# ann=FALSE would hide ylab, so only the x label is blanked here
plot(1, y[1], type='n',
xlim=c(.5, 1.5), ylim=c(-10, 110), xlab= "", ylab= "Temp (C)",
axes=FALSE)
axis(2, at=seq(0, 100, 20))
axis(1, at=1, labels= "Trt 1")
box()
segments(1, 0, 1, 100, lwd=20, col="dark grey")
segments(1, 0, 1, 100, lwd=16, col="white")
segments(1, 0, 1, y[1], lwd=16, col="light pink")
## Fahrenheit
# ann=FALSE would hide ylab, so only the x label is blanked here
plot(1, y[2], type='n',
xlim=c(.5, 1.5), ylim=c(-10, 110), xlab= "", ylab= "Temp (F)",
axes=FALSE)
axis(1, at=1, labels= "Trt 2")
axis(2, at=seq(0, 100, 20), labels=seq(0, 100, 20)*9/5 + 32)
box()
segments(1, 0, 1, 100, lwd=20, col="dark grey")
segments(1, 0, 1, 100, lwd=16, col="white")
segments(1, 0, 1, y[2], lwd=16, col="light pink")
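When you run code like this in a script or at the console rather than in a chunk, the settings changed with par() stay in effect for later plots on the same graphics device. A common pattern, sketched below, is to save the previous settings and restore them afterwards. The built-in iris dataset is used as example data (an assumption), and old_par is a hypothetical name.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 4))

hist(iris$Sepal.Length)   # left panel (built-in iris used as example data)
hist(iris$Sepal.Width)    # right panel

# Restore the earlier settings so later plots use a single panel again
par(old_par)
```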
OK, now let’s create a publication-ready plot.
This is our original plot:
plot(Petal.Length ~ Petal.Width, data=flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
We can use par() to change plot settings. Note: these changes will be applied to all plots within a chunk. For example, to change the margins:
par(mar=c(4,4,2,2))
plot(Petal.Length~Petal.Width, data=flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
To change the typeface to Times or Times New Roman (depending on your operating system):
par(family="serif")
plot(Petal.Length ~ Petal.Width, data= flowers)
## Error in eval(m$data, eframe): object 'flowers' not found
To modify the axis labels:
plot(
Petal.Length ~ Petal.Width,
data= flowers,
xlab= "Petal width (mm)",
ylab= "Petal length (mm)"
)
## Error in eval(m$data, eframe): object 'flowers' not found
To adjust the orientation of the axis labels:
plot(
Petal.Length ~ Petal.Width,
data=flowers,
xlab="Petal width (mm)",
ylab="Petal length (mm)",
las = 1
)
## Error in eval(m$data, eframe): object 'flowers' not found
To adjust point type (pch) and size (cex):
plot(
Petal.Length ~ Petal.Width,
data= flowers,
xlab= "Petal width (mm)",
ylab= "Petal length (mm)",
las = 1,
pch = 21,
cex = 1.5
)
## Error in eval(m$data, eframe): object 'flowers' not found
We will create a palette for our plot. We will use it to color each species. Given that there are three species, we will select 3 colors from a palette from RColorBrewer.
library(RColorBrewer)
## custom 3-color palette from the "Set2" base palette in RColorBrewer
pal <- brewer.pal(3, "Set2")
pal
## [1] "#66C2A5" "#FC8D62" "#8DA0CB"
To look at the colors:
barplot(c(1,1,1), col = pal)
We can select elements of the color vector using indexing (numbers inside brackets).
barplot(c(1,1,1,1), col = pal[c(1,1,1,3)])
We can assign a color to each level of a categorical variable, such as Species in flowers. The categorical variable can be used as an indexing vector, but only if it is coded as a factor. The reason is that R uses a numeric vector to code the levels of a factor. For example, see the structure of the variable Species:
str(flowers$Species)
## Error in str(flowers$Species): object 'flowers' not found
We will now assign a color to each observation in flowers, according to its species.
# Species must be a factor for the code to work
flowers$Species <- factor(flowers$Species)
## Error in factor(flowers$Species): object 'flowers' not found
species.cols <- pal[flowers$Species]
## Error in eval(expr, envir, enclos): object 'flowers' not found
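The following self-contained sketch shows the mechanism on a made-up vector of species labels. The palette values are the three colors printed earlier for the “Set2” palette; the vector sp is hypothetical.

```r
# The three colors printed earlier in this module for the "Set2" palette
pal <- c("#66C2A5", "#FC8D62", "#8DA0CB")

# A made-up vector of species labels, converted to a factor
sp <- factor(c("setosa", "virginica", "versicolor", "setosa"))

levels(sp)       # levels are sorted alphabetically by default
as.integer(sp)   # integer codes: 1 3 2 1

# Indexing with the codes picks one color per observation
pal[as.integer(sp)]
```

Writing pal[sp] gives the same result, because R uses the integer codes of a factor when the factor is used as an index.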
Now, we will repeat our scatter plot showing petal length versus width. This time, though, we will color the dots according to species.
plot(
Petal.Length ~ Petal.Width,
data= flowers,
xlab= "Petal width (mm)",
ylab= "Petal length (mm)",
las = 1,
pch = 21,
cex = 1.5,
bg= species.cols, # Filling color
col= "black" # Outline color
)
## Error in eval(m$data, eframe): object 'flowers' not found
You can use the export button in the Plots pane of RStudio. To do that, however, you need to recreate the plot in the console (copy the code from the chunk to the console). When you export in this way, choose PDF, because it contains vector images, which are easier to edit later in image editing software (e.g., Adobe Illustrator, Inkscape, PowerPoint).
Alternatively, you can use code to export your image. We will show an example, in which we export a plot to a PDF. We need the functions pdf() and dev.off(). In the former, we can specify attributes of the file such as its name and the size of the exported plot (default is in inches). Any code generating and modifying the plot should be written between those two functions. All the code needs to be in the same chunk. Check the example below.
# Generate the PDF and open it for exporting a plot
pdf(file = "flowerplot.pdf", width = 6, height = 6)
# Change the margins of the graphic device (the PDF)
par(mar=c(4,4,2,2), family="Times")
# Generate the plot
plot(
Petal.Length ~ Petal.Width,
data= flowers,
xlab= "Petal width (mm)",
ylab= "Petal length (mm)",
las = 1,
pch = 21,
cex = 1.5,
bg= species.cols, # Filling color
col= "black" # Outline color
)
## Error in eval(m$data, eframe): object 'flowers' not found
# Close the PDF
dev.off()
## quartz_off_screen
## 2
Check your working directory. After running the above code, you should see the plot in a flowerplot.pdf file.
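You can also check from within R that the export worked. The sketch below repeats the pdf() and dev.off() pattern with the built-in iris dataset as a stand-in for flowers (an assumption) and writes to a temporary file, so that nothing in your working directory is overwritten. The name out_file is hypothetical.

```r
data(iris)  # built-in dataset, used as a stand-in for flowers

# Hypothetical output location: a temporary file, so nothing is overwritten
out_file <- tempfile(fileext = ".pdf")

pdf(file = out_file, width = 6, height = 6)
plot(Petal.Length ~ Petal.Width, data = iris, las = 1)
dev.off()   # the file is only complete once the device is closed

file.exists(out_file)   # TRUE if the export succeeded
file.size(out_file)     # size in bytes; greater than zero for a written file
```

If a PDF looks empty or cannot be opened, a likely cause is that dev.off() was never called.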
You need to do only one of the exercises.
Load the dataset sleep from the package datasets (take some time to look at it). Then, rearrange the data so that the values of the extra variable are in two columns, one for each drug. The columns should be labeled drug1 and drug2. The rearranged dataset, which should be a data frame, should not contain the group variable.
Using the barplot function, make two bar plots of the increase in hours of sleep, one for drug1 and another one for drug2. Plot them side by side. The plots should have different colors. In each one, label the y-axis with the variable name.
Generate the same barplots from the previous chunk, and in the same arrangement (side by side), but using functions from ggplot2. Tip: Use the function grid.arrange from package gridExtra to have both plots side by side. Check this link from the R Graph Gallery to learn how to use it: https://www.r-graph-gallery.com/261-multiple-graphs-on-same-page.html
Load the dataset VADeaths.
Make a bar graph of death rate vs. age group. Within each bar, the population groups should be stacked. Label the y-axis, and add a legend of those four population groups at the top left of the plot.
Load the crabs dataset.
Using the function ggplot, make a scatter plot of FL ~ CL, with point colors matching species colors (orange and blue). Hint: use the function scale_color_manual.
Add dashed regression lines for each species, using geom_smooth. Your code below should reproduce the entire plot with the added line.
Write code to save your plot as “crabsplot.pdf”.
Make a boxplot of BD with ggplot. The boxes should be sorted by sex. The box colors should match species colors (orange and blue).