Use the neonDataStackR Package to Access NEON Data

Megan A. Jones
This tutorial shows how to convert data downloaded from the NEON Data Portal as zipped month-by-site files into individual files containing all data from the selected site(s) and months. Temperature data are used as an example.

Download the Data

To start, you must have your data of interest downloaded from the NEON Data Portal.

The stacking function only works on zipped Comma Separated Value (.csv) files, not on NEON data stored in other formats (HDF5, etc.).

Your data will download in a single zipped file.

The example data used below are all single-aspirated air temperature data available from 1 January 2015 to 31 December 2016.

neonDataStackR package

This package was written to stack data downloaded in month-by-site files into a full table with all the data of interest from all sites in the downloaded date range.

For more information on the package, see the README in the associated GitHub repo, NEONScience/NEON-utilities.

First, we must install the package from the GitHub repo. You must have the devtools package installed and loaded to do this. Then load the package.

# install devtools - can skip if already installed
install.packages("devtools")

# load devtools
library(devtools)

# load neonDataStackR from GitHub
install_github("NEONScience/NEON-utilities/neonDataStackR", dependencies=TRUE)

## Downloading GitHub repo NEONScience/NEON-utilities@master
## from URL

## Installing neonDataStackR

## '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
##   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
##   '/private/var/folders/0p/x8phw1_156511_jqkryx2t8m2vn2t3/T/RtmpPD2fan/devtools52c155f1e8a2/NEONScience-NEON-utilities-5872bcd/neonDataStackR'  \
##   --library='/Users/mjones01/Library/R/3.4/library' --install-tests


# load neonDataStackR
library(neonDataStackR)


There is a single function to run in this package: stackByTable(). Its output is grouped into new files by table name. For example, the single-aspirated air temperature data product contains 1-minute and 30-minute interval data, so the output from this function is one .csv with the 1-minute data and one .csv with the 30-minute data.

Depending on your file size, this function may run for a while. The 2015 and 2016 single-aspirated air temperature data from two sites that I used for a 2017 workshop took about 25 minutes to complete.
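To make the idea of "stacking" concrete, here is a minimal sketch in base R. It is not the neonDataStackR implementation; it only illustrates combining several month-by-site tables that share the same columns into one table. The site names and temperature values are made up for illustration.

```r
# Toy illustration of stacking; NOT how neonDataStackR works internally.
# Two hypothetical month-by-site tables with identical columns (made-up values).
site_a_jan <- data.frame(
  siteID = "SITE_A",
  startDateTime = c("2015-01-01T00:00:00Z", "2015-01-01T00:30:00Z"),
  tempSingleMean = c(-1.2, -1.5),
  stringsAsFactors = FALSE
)
site_b_jan <- data.frame(
  siteID = "SITE_B",
  startDateTime = c("2015-01-01T00:00:00Z", "2015-01-01T00:30:00Z"),
  tempSingleMean = c(3.4, 3.1),
  stringsAsFactors = FALSE
)

# Stack the tables row-wise into a single table covering all sites and months.
monthly_tables <- list(site_a_jan, site_b_jan)
stacked <- do.call(rbind, monthly_tables)

print(stacked)
```

The result has one row per measurement interval per site, which is the shape of the stacked .csv files described in this tutorial.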

To run the stackByTable() function, simply use the file path to the downloaded and zipped file.

# stack files - Mac OSX file path shown (placeholder path; substitute your own)
stackByTable("~/Downloads/NEON_temp-air-single.zip")

Joining, by = c("domainID", "siteID", "horizontalPosition", "verticalPosition", 
"startDateTime", "endDateTime", "tempSingleMean", "tempSingleMinimum", 
"tempSingleMaximum", "tempSingleVariance", "tempSingleNumPts", "tempSingleExpUncert", 
"tempSingleStdErMean", "finalQF")
Joining, by = c("domainID", "siteID", "horizontalPosition", "verticalPosition", 
"startDateTime", "endDateTime", "tempSingleMean", "tempSingleMinimum", "tempSingleMaximum", 
"tempSingleVariance", "tempSingleNumPts", "tempSingleExpUncert", "tempSingleStdErMean", 

# Note that I've removed some of the "Joining" output for ease of reading

Finished: All of the data are stacked into  2  tables!
Copied the first available variable definition file to /stackedFiles and renamed as variables.csv
Stacked  SAAT_1min
Stacked  SAAT_30min

From the single-aspirated air temperature data we get two final tables: one with 1-minute intervals (SAAT_1min) and one with 30-minute intervals (SAAT_30min).

In the same directory as the zipped file, you should now have an unzipped directory of the same name. When you open it, you will still see all the month-by-site files, but you will also see a new directory called stackedFiles. This directory contains one or more .csv files (depending on the data product you are working with) with all the data from the months and sites you downloaded. There will also be a single copy of the associated variables.csv and validation.csv files.
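If you want to locate that directory from within R, the layout described above can be sketched as follows. The zip file name and location are hypothetical, so substitute the path to your own download.

```r
# Hypothetical location of the downloaded zip; substitute your own path.
zip_path <- "~/Downloads/NEON_temp-air-single.zip"

# The unzipped directory sits next to the zip and has the same name, minus ".zip".
unzipped_dir <- sub("\\.zip$", "", zip_path)

# The stacked tables are written to a stackedFiles directory inside it.
stacked_dir <- file.path(unzipped_dir, "stackedFiles")
print(stacked_dir)

# List the contents only if the directory actually exists on this machine.
if (dir.exists(stacked_dir)) {
  print(list.files(stacked_dir))
}
```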

These .csv files are now ready for use with the program of your choice.
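If you continue in R, one way to load the stacked tables is sketched below. read_stacked() is a hypothetical helper written for this example, not part of neonDataStackR, and the file name SAAT_30min.csv is assumed from the table names shown in the output above. So that the sketch runs anywhere, it is demonstrated on a throwaway directory holding one made-up file; point it at your own stackedFiles directory instead.

```r
# Hypothetical helper (not part of neonDataStackR): read every .csv in a
# directory into a named list of data frames, one element per file.
read_stacked <- function(stacked_dir) {
  csv_paths <- list.files(stacked_dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(csv_paths, read.csv, stringsAsFactors = FALSE)
  names(tables) <- tools::file_path_sans_ext(basename(csv_paths))
  tables
}

# Demonstration on a temporary directory with one made-up file.
demo_dir <- file.path(tempdir(), "stackedFiles")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
  data.frame(siteID = "SITE_A", tempSingleMean = 1.5),  # made-up values
  file.path(demo_dir, "SAAT_30min.csv"),
  row.names = FALSE
)

tables <- read_stacked(demo_dir)
print(names(tables))
print(head(tables$SAAT_30min))
```

Keeping the tables in a named list means the same code works whether a data product yields one stacked table or several.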

