Chapter 4 Data Frames

4.1 Tutorial Video

R Data Frames

22-Jan-2021

4.2 Reference

Data Frames

In R, a data frame is a data structure that stores data in tabular form, and it is one of the most useful structures for data analysis. You can think of a data frame as R’s equivalent of an Excel spreadsheet.

A data frame has a few characteristics and restrictions, e.g.:

  • Each column should (usually) have a name
  • Every column must contain the same number of items
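Both points can be checked directly. In this minimal sketch the column names a and b are made up for illustration: names() returns the column names, and data.frame() refuses to combine columns of unequal length. tryCatch() captures the error message so the script keeps running.

```r
# A small data frame with two named columns of equal length (3 items each)
ok <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))
print(names(ok))   # the column names: "a" "b"

# Columns of unequal length (3 vs 2) cannot form a data frame;
# tryCatch() returns the error message instead of stopping the script
msg <- tryCatch(
  data.frame(a = c(1, 2, 3), b = c("x", "y")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

One nuance: data.frame() does recycle a shorter vector whose length divides the longest one evenly (for example, length 1 against length 3), so the stored columns still end up equal in length.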

There are several ways to make a data frame; here I list only a few:

Making data frames by assembling atomic vectors

A data frame is used for storing data tables; it is a list of vectors of equal length. Remember that in the last chapter we created some atomic vectors? Now it is time to put them to better use, i.e., to make a data frame. Let’s make a variable called people, a data frame containing four atomic vectors: ourNames, ourWeights, ourHeights and ourMales.

Making the people data frame is as easy as calling the data.frame function:

# Each argument becomes one column; every vector has 7 items
people <- data.frame(ourNames   = c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony"),
                     ourWeights = c(80, 85, 60, 65, 70, 71, 73),
                     ourHeights = c(180.5, 190.3, 170.32, 172, 173, 176, 193),
                     ourMales   = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE))

You can now view the contents of the people variable by typing print(people), or simply people, in your R console. R prints:

> people
  ourNames ourWeights ourHeights ourMales
1      Tom         80     180.50     TRUE
2     Jack         85     190.30     TRUE
3     Rose         60     170.32    FALSE
4     Kate         65     172.00    FALSE
5    Betty         70     173.00    FALSE
6     Dong         71     176.00     TRUE
7     Tony         73     193.00     TRUE
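The code above types the values inline. If the atomic vectors already exist, as they would after the previous chapter, you can pass them to data.frame() without names, and each variable’s name becomes the column name. This sketch assumes the vectors were saved under the same names as the columns. Because a data frame is a list of equal-length vectors, is.list() returns TRUE and length() counts the columns.

```r
# Atomic vectors created first, as in the previous chapter (7 items each)
ourNames   <- c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony")
ourWeights <- c(80, 85, 60, 65, 70, 71, 73)
ourHeights <- c(180.5, 190.3, 170.32, 172, 173, 176, 193)
ourMales   <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Unnamed arguments: data.frame() uses each variable's name as the column name
people <- data.frame(ourNames, ourWeights, ourHeights, ourMales)

# A data frame is a list of equal-length vectors (one per column)
print(is.list(people))   # TRUE
print(length(people))    # 4, the number of columns
str(people)              # each column's type and first few values
```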

Making data frames by reading in existing data files

It is not always feasible to build a data frame by typing values into your R console. Another way is to read an external data file into R, and R will ‘automatically’ make a data frame for you. We will cover this in a later chapter.
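As a small preview, here is a minimal sketch using base R’s read.csv(). To stay self-contained, it first writes a tiny CSV file (contents made up for illustration) to a temporary location; in practice you would pass the path of your own file instead.

```r
# Write a tiny CSV to a temporary file so the example needs no external data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ourNames,ourWeights,ourMales",
             "Tom,80,TRUE",
             "Rose,60,FALSE"), csv_path)

# read.csv() parses the file and returns a data frame;
# the first line of the file supplies the column names
from_file <- read.csv(csv_path)
print(from_file)
print(class(from_file))   # "data.frame"
```

read.csv() also guesses each column’s type, so the TRUE/FALSE column comes back as logical and the weights as numbers.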

Some useful operators and functions for Data Frames

We will cover R functions in a later chapter. For now, just follow along with this tutorial.

R’s $ sign

Say you want to extract only one column of data from the people data frame. The basic way is to put R’s $ sign after the data frame’s name, followed by the column name. For example, to get all weights from people:

 people$ourWeights
[1] 80 85 60 65 70 71 73

Likewise, you can get the other columns by typing people$ourNames, people$ourHeights and people$ourMales.
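Besides $, base R offers two other common ways to pull out a single column: double square brackets with the column name as a string, and [row, column] indexing with the row position left empty. This sketch uses a shortened people data frame (first three rows, two columns) to stay brief. All three forms return the same plain vector, which you can pass straight to functions such as mean().

```r
# A shortened version of the people data frame (first three rows)
people <- data.frame(ourNames   = c("Tom", "Jack", "Rose"),
                     ourWeights = c(80, 85, 60))

w1 <- people$ourWeights        # $ with the column name
w2 <- people[["ourWeights"]]   # [[ ]] with the name as a string
w3 <- people[, "ourWeights"]   # [row, column]; empty row means all rows

print(identical(w1, w2) && identical(w2, w3))   # TRUE
print(mean(w1))                                  # 75
```

The [[ ]] form is handy when the column name is stored in a variable: with col <- "ourWeights", people[[col]] works, whereas people$col would look for a column literally named col.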

attach and detach functions
However, typing people$ every time can be a hassle if your data frame has many columns.
Can we retrieve a specific column without typing the data frame’s name and the $ sign? Not directly; R reports an error:
ourHeights
Error: object 'ourHeights' not found
One way around this is R’s attach() function. If you run the following lines in order, you can get the weights without typing people$ any more:

attach(people)
ourWeights
[1] 80 85 60 65 70 71 73

Handy? However, be aware that this carries some risk when a column happens to share its name with a column of another attached data frame, or when you already have variables with the same names as the columns of the data frame you attach(). In such cases R warns you that the objects are masked, and typing the name gives you the global variable (or the column from the most recently attached data frame), not necessarily the column you expect.

You can remove the variables added by attach(people) by running detach(people). After that, you will no longer be able to access ourHeights, ourMales, ourNames and ourWeights directly in the R console.
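The attach/detach cycle can be checked with exists(), which reports whether a name is currently visible. This minimal sketch uses a two-column version of people and assumes a fresh session with no global variable named ourHeights. The last line uses base R’s with() function, which this tutorial does not otherwise cover: it makes the columns visible only inside one expression, so nothing needs to be detached afterwards.

```r
people <- data.frame(ourWeights = c(80, 85, 60),
                     ourHeights = c(180.5, 190.3, 170.32))

before <- exists("ourHeights")   # FALSE in a fresh session
attach(people)                   # columns become visible by name
during <- exists("ourHeights")   # TRUE
detach(people)                   # remove them from the search path again
after  <- exists("ourHeights")   # FALSE

# with() evaluates one expression using the columns, with no attach needed
avg_height <- with(people, mean(ourHeights))
print(c(before, during, after))
print(avg_height)
```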

head function
Instead of printing the entire data frame with print(people), it is often better to preview it first with the head function, especially when the data frame has many rows and columns. By default, head() shows the first six rows:
head(people)
  ourNames ourWeights ourHeights ourMales
1      Tom         80     180.50     TRUE
2     Jack         85     190.30     TRUE
3     Rose         60     170.32    FALSE
4     Kate         65     172.00    FALSE
5    Betty         70     173.00    FALSE
6     Dong         71     176.00     TRUE

As you can see, the record of Tony is not in the preview.
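head() also accepts an n argument for the number of rows to show, and its companion tail() previews the last rows instead. This sketch re-creates a two-column people data frame so that it runs on its own.

```r
people <- data.frame(ourNames   = c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony"),
                     ourWeights = c(80, 85, 60, 65, 70, 71, 73))

first3 <- head(people, n = 3)   # rows 1 to 3: Tom, Jack, Rose
last2  <- tail(people, n = 2)   # rows 6 to 7: Dong, Tony
print(first3)
print(last2)
```

tail() keeps the original row names (6 and 7 here), so you can still tell where the rows came from.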

nrow/ncol/dim functions
The nrow function gives the number of data rows in a data frame, the ncol function gives the number of columns, and the dim function gives both (rows first, then columns):
nrow(people)   # number of data rows
[1] 7
ncol(people)   # number of columns
[1] 4
dim(people)    # number of rows and columns
[1] 7 4
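Since dim() returns both numbers as a length-two vector, the row and column counts can be read off from it by position. This small sketch confirms that dim(people)[1] matches nrow() and dim(people)[2] matches ncol().

```r
people <- data.frame(ourNames   = c("Tom", "Jack", "Rose", "Kate", "Betty", "Dong", "Tony"),
                     ourWeights = c(80, 85, 60, 65, 70, 71, 73),
                     ourHeights = c(180.5, 190.3, 170.32, 172, 173, 176, 193),
                     ourMales   = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE))

d <- dim(people)                # c(rows, columns), here 7 and 4
print(d[1] == nrow(people))     # TRUE: the first element is the row count
print(d[2] == ncol(people))     # TRUE: the second element is the column count
```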

R Data Frame Exercises (optional)

Exercise 1

Create a dummy data frame named teammates for your team. Try to use a different data type (numeric, logical, character) for each column.

Exercise 2

Try the functions you have learned on your data frame. Can you modify the values in your data frame? Are there any ways to do that? (Hint: see the FAQ.)

FAQ

4.2.1 I am new to R and still like MS Excel spreadsheets and their functionality. Is there a way in R to edit a table the way I would in Excel, or even do some data analysis as in Excel?

Good question(s). If you just need a quick and easy GUI way to edit an existing data frame from your R console, say the people data frame, all you need to do is run fix(people). A window named “Data Editor” will open, where you can click a cell to modify its value, and even change a column’s name and data type. You’ll need to close the “Data Editor” window for the edits to take effect.
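fix() needs an interactive session with a GUI. If you prefer to change values in code (one possible answer to Exercise 2), ordinary assignment with $ or [row, column] indexing works on data frames too. This minimal sketch uses a shortened people data frame; the new values (86 and 171) and the new column name firstName are made up for illustration.

```r
people <- data.frame(ourNames   = c("Tom", "Jack", "Rose"),
                     ourWeights = c(80, 85, 60),
                     ourHeights = c(180.5, 190.3, 170.32))

people$ourWeights[2] <- 86                           # change Jack's weight (row 2)
people[3, "ourHeights"] <- 171                       # change Rose's height (row 3)
names(people)[1] <- "firstName"                      # rename the first column
people$ourWeights <- as.integer(people$ourWeights)   # change a column's data type
print(people)
```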

Some R packages, e.g., Rcmdr, do a much better job than fix(): they support data editing, copying and pasting between Excel spreadsheets and R, and advanced data analysis and modeling through a friendly GUI. In our teaching we barely scratch the surface and won’t cover the breadth and depth of most data analytics, but if you are interested, I recommend the book below.

R GUI Book

Using the R Commander: A Point-and-Click Interface for R

This book provides a general introduction to the R Commander graphical user interface (GUI) to R for readers who are unfamiliar with R. It is suitable for use as a supplementary text in a basic or intermediate-level statistics course. It is not intended to replace a basic or other statistics text but rather to complement it, although it does promote sound statistical practice in the examples. The book should also be useful to individual casual or occasional users of R for whom the standard command-line interface is an obstacle.