This tutorial demonstrates how to import a time series dataset stored in .csv
format into R. It explores data classes for columns in a data.frame and
walks through how to convert a date, stored as a character string, into a
date class that R can recognize and plot efficiently.
After completing this tutorial, you will be able to:
- Open a .csv file in R using read.csv() and understand why we are using that file type.
- Work with data stored in different columns within a data.frame in R.
- Examine R object structures and data classes.
- Convert dates, stored as a character class, into an R date class.
- Create a quick plot of a time-series dataset using qplot().
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio loaded on your computer to complete this tutorial.
Install R Packages
More on Packages in R – Adapted from Software Carpentry.
The data used in this lesson were collected at the National Ecological Observatory Network's Harvard Forest field site. These data are proxy data for what will be available for 30 years on the NEON data portal for the Harvard Forest and other field sites located across the United States.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Data Related to Phenology
In this tutorial, we will explore atmospheric data (including temperature, precipitation and other metrics) collected by sensors mounted on a flux tower at the NEON Harvard Forest field site. We are interested in exploring changes in temperature, precipitation, Photosynthetically Active Radiation (PAR) and day length throughout the year -- metrics that impact changes in the timing of plant phenophases (phenology).
About .csv Format
The data that we will use are in .csv (comma-separated values) file format. The
.csv format is a plain text format, where each value in the dataset is
separated by a comma and each "row" in the dataset is separated by a line break.
Plain text formats work well across platforms (Mac, PC, Linux, etc.) and can be
read by many different tools. The plain text format is also less likely to
become obsolete over time.
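To make the format concrete, here is a minimal sketch that parses a few lines of comma-separated text. The column names and values are made up for illustration and are not taken from the lesson data; the `text` argument of read.csv() is used only so the example needs no file on disk.

```r
# A tiny, made-up .csv written out as plain text:
# values are separated by commas, rows by line breaks.
csv_text <- "date,airt,prec
2001-01-01,-3.1,0.0
2001-01-02,-2.4,0.0
2001-01-03,0.5,1.5"

# read.csv() can parse text directly through its `text` argument
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

nrow(toy)    # 3 data rows (the first line became the header)
names(toy)   # "date" "airt" "prec"
```

Because the file is plain text, the same content could be opened in any text editor or spreadsheet program.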
Import the Data
To begin, let's import the data into R. We can use base R functionality
to import a .csv file. We will use the ggplot2 package to plot our data.
# Load packages required for entire script.
# library(PackageName) # purpose of package
library(ggplot2) # efficient, pretty plotting - required for qplot function

# set working directory to ensure R can find the file we wish to import
# provide the location for where you've unzipped the lesson data
wd <- "~/Documents/"
Data Tip: Good coding practice -- install and load all libraries at top of script. If you decide you need another package later on in the script, return to this area and add it. That way, with a glance, you can see all packages used in a given script.
Once our working directory is set, we can import the file using read.csv().
# Load csv file of daily meteorological data from Harvard Forest
harMet.daily <- read.csv(
  file = paste0(wd, "NEON-DS-Met-Time-Series/HARV/FisherTower-Met/hf001-06-daily-m.csv"),
  stringsAsFactors = FALSE
)
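If read.csv() reports that it cannot open the file, the path is the usual cause. Note that paste0() joins strings exactly as given, so wd must end with a slash, whereas file.path() inserts the separator for you. The sketch below checks a path before reading it. It uses a temporary folder and a made-up file name (example-met.csv) as a stand-in for the lesson data, so those names are assumptions for illustration only.

```r
# Hypothetical stand-in for the lesson data: write a small file to a temp folder
wd <- tempdir()
csv_path <- file.path(wd, "example-met.csv")   # file.path() adds the separator
write.csv(data.frame(date = "2001-01-01", airt = -3.1),
          csv_path, row.names = FALSE)

# Check that the file exists before trying to read it
if (file.exists(csv_path)) {
  example <- read.csv(csv_path, stringsAsFactors = FALSE)
} else {
  stop("File not found: ", csv_path, " (check your working directory)")
}
```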
When reading in files we most often use stringsAsFactors = FALSE. This
setting ensures that non-numeric data (strings) are not converted to factors.
What Is A Factor?
A factor is similar to a category. However factors can be numerically interpreted (they can have an order) and may have a level associated with them.
Examples of factors:
- Month Names (an ordinal variable): Month names are non-numerical but we know that April (month 4) comes after March (month 3) and each could be represented by a number (4 & 3).
- 1s and 2s used to represent male and female sex (a nominal variable): a numerical coding of non-numerical data, with no order to the levels.
Data Tip: Read more about factors here.
After loading the data, it is easy to convert any field that should be a factor by
using as.factor(). Therefore it is often best to read in a file with
stringsAsFactors = FALSE.
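As a rough illustration of the conversion described above, the sketch below turns a made-up character vector of month names into a factor. It also shows one way, using base R's factor() with explicit levels, to record an order for an ordinal variable; the vector and object names are hypothetical.

```r
# Made-up month names stored as plain character strings
months_chr <- c("March", "April", "March", "January")

# as.factor() sorts the levels alphabetically
months_fct <- as.factor(months_chr)
levels(months_fct)   # "April" "January" "March"

# factor() with explicit levels and ordered = TRUE records calendar order
months_ord <- factor(months_chr,
                     levels = c("January", "March", "April"),
                     ordered = TRUE)
months_ord[1] < months_ord[2]   # TRUE: March comes before April
```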
Data.Frames in R
read.csv() imports our .csv into a data.frame object in R. data.frames
are ideal for working with tabular data - they are similar to a spreadsheet.
# what type of R object is our imported data?
class(harMet.daily)
Once the data are imported, we can explore their structure. There are several ways to examine the structure of a data frame:
- head(): shows us the first 6 rows of the data (tail() shows the last 6 rows).
- str(): displays the structure of the data as R interprets it.
Let's use both to explore our data.
# view first 6 rows of the dataframe
head(harMet.daily)

# View the structure (str) of the data
str(harMet.daily)
Data Tip: You can adjust the number of rows returned when using the head() and
tail() functions. For example, you can use head(harMet.daily, 10) to display the
first 10 rows of your data rather than 6.
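The same functions can be tried on any data frame. Here is a small sketch using a made-up 20-row data frame (toy) in place of harMet.daily; the column names and values are assumptions for illustration only.

```r
# Hypothetical 20-row data frame standing in for harMet.daily
toy <- data.frame(day = 1:20, airt = seq(-5, 4.5, by = 0.5))

head(toy)        # first 6 rows (the default)
head(toy, 10)    # first 10 rows
tail(toy, 3)     # last 3 rows
str(toy)         # one line per column: name, class, first few values
```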
Classes in R
The structure results above let us know that the attributes in our data.frame
are stored as several different data types or classes as follows:
- chr - Character: It holds strings that are composed of letters and words. Character class data cannot be interpreted numerically - that is to say, we cannot perform math on these values even if they contain only numbers.
- int - Integer: It holds whole numbers without decimals. Mathematical operations can be performed on integers.
- num - Numeric: It accepts data in a wide variety of numeric formats, including decimals (floating point values) and integers. Numeric also accepts larger numbers than int will.
Storing variables using different
classes is a strategic decision by R (and
other programming languages) that optimizes processing and storage. It allows:
- data to be processed more quickly & efficiently.
- the program (R) to minimize the storage size.
Differences Between Classes
Certain functions can be performed on certain data classes and not on others.
a <- "mouse"
b <- "sparrow"
class(a)
## [1] "character"
class(b)
## [1] "character"
# subtract a-b
a-b
## Error in a - b: non-numeric argument to binary operator
You cannot subtract two character values because they are not numbers.
c <- 2
d <- 1
class(c)
## [1] "numeric"
class(d)
## [1] "numeric"
# subtract c-d
c-d
## [1] 1
Additionally, performing summary statistics and other calculations of different types of classes can yield different results.
# create a new object
speciesObserved <- c("speciesb","speciesc","speciesa")
speciesObserved
## [1] "speciesb" "speciesc" "speciesa"
# determine the class
class(speciesObserved)
## [1] "character"
# calculate the minimum
min(speciesObserved)
## [1] "speciesa"
# create numeric object
prec <- c(1,2,5,3,6)
# view class
class(prec)
## [1] "numeric"
# calculate min value
min(prec)
## [1] 1
We can calculate the minimum value for speciesObserved, a character
data class; however, it does not return a quantitative minimum. It simply
returns the first element in alphabetical (rather than numeric) order.
In contrast, we can calculate the quantitative minimum value for prec, a numeric
data class.
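The difference matters most when numbers are stored as text. The sketch below uses made-up values held as character strings (prec_chr, a hypothetical name) to show that min() compares them as text until they are converted with as.numeric().

```r
# Numbers stored as text (made-up values)
prec_chr <- c("10", "9", "100")
class(prec_chr)          # "character"
min(prec_chr)            # "10": compared as text, character by character

# Convert to numeric to get the quantitative minimum
prec_num <- as.numeric(prec_chr)
class(prec_num)          # "numeric"
min(prec_num)            # 9
```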
Plot Data Using qplot()
Now that we've got classes down, let's plot one of the metrics in our data,
air temperature (airt). Given this is a time series dataset, we want to plot
air temperature as it changes over time. We have a date-time column, date, so
let's use that as our x-axis variable and airt as our y-axis variable.
We will use the qplot() (for quick plot) function in the ggplot2 package.
The syntax for qplot() requires the x- and y-axis variables and then the R
object that the variables are stored in.
Data Tip: Add a title to the plot using main="Title string".
# quickly plot air temperature
qplot(x=date, y=airt, data=harMet.daily, main="Daily Air Temperature\nNEON Harvard Forest Field Site")
We have successfully plotted some data. However, what is happening on the x-axis?
R is trying to plot EVERY date value in our data, on the x-axis. This makes it hard to read. Why? Let's have a look at the class of the x-axis variable - date.
# View data class for each column that we wish to plot
class(harMet.daily$date)
class(harMet.daily$airt)
In this case, the date column is stored in our data.frame as a character
class. Because it is a character, R does not know how to plot the dates as a
continuous variable. Instead it tries to plot every date value as a text string.
The airt data class is numeric, so that metric plots just fine.
Date as a Date-Time Class
We need to convert our date column, which is currently stored as a character,
to a date-time class that can be displayed as a continuous variable. Lucky
for us, R has a date class. We can convert the date field to a date class
using as.Date().
# convert column to date class
harMet.daily$date <- as.Date(harMet.daily$date)

# view R class of data
class(harMet.daily$date)

# view results
head(harMet.daily$date)
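as.Date() parses year-month-day strings without extra arguments; other layouts need a format string. The following sketch uses made-up date strings (not values from the lesson file) to show both cases, and that Date objects support arithmetic once converted.

```r
# Made-up date strings in two layouts
iso_dates <- c("2001-01-01", "2001-01-02")
us_dates  <- c("01/01/2001", "01/02/2001")

# ISO dates (year-month-day) are parsed without extra arguments
d1 <- as.Date(iso_dates)

# Other layouts need a format string: %m = month, %d = day, %Y = 4-digit year
d2 <- as.Date(us_dates, format = "%m/%d/%Y")

class(d1)       # "Date"
all(d1 == d2)   # TRUE: both layouts describe the same days
d1[2] - d1[1]   # a time difference of 1 day
```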
Now that we have adjusted the date, let's plot again. Notice that it plots
much more quickly now that R recognizes
date as a date class. R can
aggregate ticks on the x-axis by year instead of trying to plot every day!
# quickly plot the data and include a title using main=""
# In title string we can use '\n' to force the string to break onto a new line
qplot(x=date, y=airt, data=harMet.daily, main="Daily Air Temperature w/ Date Assigned\nNEON Harvard Forest Field Site")
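Recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated, so you may see a warning when running the code above. As a hedged alternative, the sketch below builds a similar scatter plot with ggplot() and geom_point(). It uses a small made-up data frame (toy) rather than harMet.daily, and the axis labels are assumptions for illustration.

```r
# Hypothetical stand-in for harMet.daily (made-up values)
toy <- data.frame(
  date = as.Date("2001-01-01") + 0:9,
  airt = c(-3.1, -2.4, 0.5, 1.2, -0.8, 2.3, 3.0, 1.1, -1.5, 0.2)
)

# Only build the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = date, y = airt)) +
    geom_point() +
    ggtitle("Daily Air Temperature\n(made-up example data)") +
    xlab("Date") +
    ylab("Air temperature")
  print(p)
}
```

Because date is a Date class here, the x-axis is drawn as a continuous scale, just as in the qplot() version.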
Challenge: Using ggplot2's qplot function
- Create a quick plot of the precipitation. Use the full time frame of data available in harMet.daily.
- Do precipitation and air temperature have similar annual patterns?
- Create a quick plot examining the relationship between air temperature and precipitation.
Hint: you can modify the X and Y axis labels using
xlab="label text" and ylab="label text".