Chapter 4 Data Frames
4.1 Tutorial Video
R Data Frames
4.2 Reference
Data Frames
A data frame in R is a data structure that stores data in tabular form, which makes it one of the most useful structures for data analysis. You can think of a data frame as R’s equivalent of an Excel spreadsheet.
A data frame has some restrictions and characteristics, for example:
- Every column (usually) needs a name
- Every column must contain the same number of items
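As a minimal sketch of the second restriction, the example below tries to build a data frame from two columns of different lengths. The vectors `ids` and `scores` are made up for illustration and are not part of the tutorial's data.

```r
# Hypothetical vectors of different lengths (3 items vs. 2 items)
ids    <- c(1, 2, 3)
scores <- c(90, 85)

# data.frame() stops with an error because the column lengths differ;
# tryCatch() captures the error message so the script keeps running
result <- tryCatch(
  data.frame(id = ids, score = scores),
  error = function(e) conditionMessage(e)
)
print(result)
```

Note that `data.frame()` recycles a shorter vector when its length divides the longer one evenly (for example 1 item against 3), so the error only appears when the lengths cannot be reconciled.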
There are different ways to make a data frame; only a few are listed here:
Making data frames by assembling atomic vectors
A data frame is used for storing data tables. It is a list of vectors of equal length. Remember the atomic vectors we created in the last chapter? Now it is time to put them to better use by making a data frame. Let’s make a variable people, a data frame that contains four atomic vectors: ourNames, ourWeights, ourHeights and ourMales.
Making the people data frame is as easy as calling the data.frame function:
people <- data.frame(
  ourNames   = c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony"),
  ourWeights = c(80, 85, 60, 65, 70, 71, 73),
  ourHeights = c(180.5, 190.3, 170.32, 172, 173, 176, 193),
  ourMales   = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)
You can now ask R to show the information stored in the people variable by typing print(people), or simply people, in your console. R answers with:
> people
  ourNames ourWeights ourHeights ourMales
1      Tom         80     180.50     TRUE
2     Jack         85     190.30     TRUE
3     Rose         60     170.32    FALSE
4     Kate         65     172.00    FALSE
5    Betty         70     173.00    FALSE
6     Dong         71     176.00     TRUE
7     Tony         73     193.00     TRUE
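If the atomic vectors already exist as separate variables, as they would after the previous chapter, they can be passed to data.frame directly. The following is a small sketch under that assumption; the vectors are re-created here only so the example runs on its own.

```r
# Atomic vectors, one per column (same values as in the text)
ourNames   <- c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony")
ourWeights <- c(80, 85, 60, 65, 70, 71, 73)
ourHeights <- c(180.5, 190.3, 170.32, 172, 173, 176, 193)
ourMales   <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Unnamed arguments take their column names from the variable names
people <- data.frame(ourNames, ourWeights, ourHeights, ourMales)

# str() summarises the structure: one line per column with its type
str(people)
```

Running str(people) is a quick way to confirm that each column kept the data type of the vector it came from.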
Making data frames by reading in existing data files
It is not always feasible to make a data frame by typing it into your R console. Another way is to read an external file into R, and R will ‘automatically’ make the data frame for you. We will cover this in a later chapter.
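As a brief preview, and only as a sketch, the example below writes a tiny CSV file to a temporary location and reads it back with read.csv. The file contents and the variable names `csv_path` and `people_from_file` are assumptions made for illustration; in practice you would point read.csv at your own file.

```r
# Write a small table to a temporary CSV file so the example is self-contained
# (tempfile() is used only so that no real file path has to be assumed)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ourNames,ourWeights,ourMales",
             "Tom,80,TRUE",
             "Rose,60,FALSE"), csv_path)

# read.csv() parses the file and returns a data frame
people_from_file <- read.csv(csv_path)
print(people_from_file)

# Remove the temporary file again
unlink(csv_path)
```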
Some useful things/functions for Data Frames
We will cover R’s functions in a later chapter. For now, you are advised to simply follow along with this tutorial.
R’s $ sign
Let’s say you want to extract only one column of data from the people data frame. The basic way is to use R’s $ sign after the data frame’s name. For example, to get all the weights from people:
> people$ourWeights
[1] 80 85 60 65 70 71 73
Likewise, you can get the other columns by typing people$ourNames, people$ourHeights or people$ourMales.
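A column extracted with $ is an ordinary vector, so vector functions can be applied to it directly. The sketch below uses a shortened copy of the people data frame (three rows only, for brevity) and also shows the equivalent double-bracket form.

```r
# Shortened copy of the people data frame, for illustration only
people <- data.frame(
  ourNames   = c("Tom", "Jack", "Rose"),
  ourWeights = c(80, 85, 60)
)

by_dollar  <- people$ourWeights        # column as a plain vector
by_bracket <- people[["ourWeights"]]   # same column, name given as a string

# Both forms return the same vector
print(identical(by_dollar, by_bracket))

# Vector functions such as mean() work on the extracted column
avg_weight <- mean(people$ourWeights)
print(avg_weight)
```

The double-bracket form is convenient when the column name is stored in a variable, since $ needs the name typed literally.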
attach and detach functions
But typing people$ every time can be a hassle if your data frame has many columns. Can we retrieve a specific column without typing the data frame’s name and the $ sign? Not directly; typing the column name on its own gives an error:
> ourHeights
Error: object 'ourHeights' not found
The attach() function comes to the rescue. If you run the commands below in order, you will find that you can get the weights without typing people$ any more:
> attach(people)
> ourWeights
[1] 80 85 60 65 70 71 73
Handy? However, be aware that this carries some risk when a column name is shared by more than one attached data frame, or when you already have variables with the same names as the columns of the data frame you attach(). In those cases R automatically prints a message about the masked objects.
You can undo attach(people) by running detach(people). After that, you should no longer be able to access the ourHeights, ourMales, ourNames and ourWeights variables directly in the R console.
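One way to sidestep the name-clash risk, offered here as an alternative sketch rather than as part of the original tutorial, is base R's with() function, which looks up column names inside a data frame for a single expression only. The example uses a shortened copy of people and also checks the search path before and after detach().

```r
# Shortened copy of the people data frame, for illustration only
people <- data.frame(
  ourNames   = c("Tom", "Jack", "Rose"),
  ourWeights = c(80, 85, 60)
)

# with() evaluates the expression using the data frame's columns,
# without attaching anything
avg_weight <- with(people, mean(ourWeights))
print(avg_weight)

# attach() adds an entry named "people" to the search path;
# detach() removes it again
attach(people)
attached_during <- "people" %in% search()
detach(people)
attached_after <- "people" %in% search()
print(c(attached_during, attached_after))
```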
head function
Instead of printing out the entire data frame with print(people), it is often desirable to preview it first with the head function, especially when the data frame has a large number of rows and columns.
> head(people)
  ourNames ourWeights ourHeights ourMales
1      Tom         80     180.50     TRUE
2     Jack         85     190.30     TRUE
3     Rose         60     170.32    FALSE
4     Kate         65     172.00    FALSE
5    Betty         70     173.00    FALSE
6     Dong         71     176.00     TRUE
As you can see, Tony’s record is not in the preview, because head shows only the first six rows by default.
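The number of rows in the preview can be changed with the n argument of head, and the companion function tail previews the end of the data frame instead. A minimal sketch, using a two-column copy of people for brevity:

```r
# Two-column copy of the people data frame, for illustration only
people <- data.frame(
  ourNames   = c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony"),
  ourWeights = c(80, 85, 60, 65, 70, 71, 73)
)

first_three <- head(people, n = 3)   # first 3 rows instead of the default 6
last_two    <- tail(people, n = 2)   # last 2 rows: Dong and Tony

print(first_three)
print(last_two)
```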
nrow/ncol/dim functions
The number of data rows in a data frame is given by the nrow function, the number of columns by the ncol function, and both together by the dim function:
> nrow(people)   # number of data rows
[1] 7
> ncol(people)   # number of columns
[1] 4
> dim(people)    # number of rows and columns
[1] 7 4
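The three functions are closely related: dim returns the row count followed by the column count. The short sketch below checks this on a small made-up data frame (`teamSizes` and its columns are hypothetical names used only for this example).

```r
# Hypothetical data frame with 3 rows and 2 columns
teamSizes <- data.frame(
  team    = c("A", "B", "C"),
  members = c(4, 6, 5)
)

# dim() is the pair (number of rows, number of columns)
same_as_dim <- identical(dim(teamSizes), c(nrow(teamSizes), ncol(teamSizes)))
print(dim(teamSizes))
print(same_as_dim)

# Individual entries of dim() can be picked out by position
n_rows <- dim(teamSizes)[1]
n_cols <- dim(teamSizes)[2]
```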
R Data Frame Exercises (optional)
Exercise 1
Create a dummy data frame named teammates for your team. Try to use a different data type (numeric, logical, character) for each column.
Exercise 2
Try the functions you have learned on your data frame. Can you modify the values in your data frame? Are there any ways to do that? (Hint: see the FAQ.)
FAQ
4.2.1 I am new to R, and still like MS Excel spreadsheet and its functionalities; is there a way in R that allows me to edit the table like the way in Excel, or even do some data analysis like in Excel?
Good question(s).
If you just need a quick and easy GUI way to edit an existing data frame in your R console, say the people data frame, all you need to do is run fix(people). A window named “Data Editor” opens, where you can click a cell to modify its value, and even change a column’s name and data type. Close the “Data Editor” for your edits to take effect.
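Values can also be changed from the console by assigning to a selected cell, which is useful when no GUI is available. The sketch below works on a shortened copy of people; the new values 82 and 86 are made up purely for illustration.

```r
# Shortened copy of the people data frame, for illustration only
people <- data.frame(
  ourNames   = c("Tom", "Jack", "Rose"),
  ourWeights = c(80, 85, 60)
)

# Change one cell by condition: the ourWeights value in the row
# where ourNames is "Tom" (82 is a made-up value)
people$ourWeights[people$ourNames == "Tom"] <- 82

# Change one cell by position: row 2, column "ourWeights" (86 is made up)
people[2, "ourWeights"] <- 86

print(people)
```

Unlike fix(), these assignments can be saved in a script and re-run, so the edits are reproducible.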
Some R packages, e.g., Rcmdr, do a much better job of data editing: they support copy/paste between an Excel spreadsheet and R, and offer advanced data analysis and modeling through a friendly GUI. In our teaching we barely scratch the surface and won’t cover the breadth and depth of most data analytics, but if you are interested I can recommend the book below.
R GUI Book
Using the R Commander: A Point-and-Click Interface for R
This book provides a general introduction to the R Commander graphical user interface (GUI) to R for readers who are unfamiliar with R. It is suitable for use as a supplementary text in a basic or intermediate-level statistics course. It is not intended to replace a basic or other statistics text but rather to complement it, although it does promote sound statistical practice in the examples. The book should also be useful to individual casual or occasional users of R for whom the standard command-line interface is an obstacle.