5 handy options in R data.table’s fread

Like all functions in the data.table R package, fread is fast. Extremely fast. But there is more to fread than speed. It has several helpful features and options for importing external data into R. Here are five of the most useful.

Note: If you’d like to follow along, download the New York Times CSV file of daily Covid-19 cases by U.S. county at https://github.com/nytimes/covid-19-data/raw/master/us-counties.csv.


Use fread’s nrows option

Is your file large? Would you like to examine its structure before importing all the data, without having to open it in a text editor or Excel? Use fread’s nrows option to import only a portion of a file for exploration.

The code below imports just the first 10 rows of the CSV.

mydt10 <- fread("us-counties.csv", nrows = 10)

If you just want to see column names without any data at all, you can use nrows = 0.
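Here is a minimal sketch of both uses of nrows. It writes a small stand-in CSV to a temporary file so it runs without the download; the file contents and the object names tmp, header_only, and first_two are assumptions for illustration only.

```r
library(data.table)

# Write a tiny stand-in CSV to a temporary file (illustrative rows only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,county,state,cases",
             "2020-01-21,Snohomish,Washington,1",
             "2020-01-22,Snohomish,Washington,1",
             "2020-01-23,Snohomish,Washington,1"), tmp)

# nrows = 0 reads the header only: column names, no data rows
header_only <- fread(tmp, nrows = 0)
names(header_only)

# nrows = 2 reads just the first two data rows
first_two <- fread(tmp, nrows = 2)
nrow(first_two)
```

Swapping tmp for "us-counties.csv" should give the same kind of quick look at the real file.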

Use fread’s select option

Once you know the file structure, you can choose which columns to import. fread’s select option lets you pick the columns you want to keep. select takes a vector of either column names or column-position numbers. If names, they need to be in quotation marks, like most vectors of character strings:

mydt <- fread("us-counties.csv", 
              select = c("date", "county", "state", "cases"))

As always, numbers don’t need quotation marks:

mydt <- fread("us-counties.csv", select = c(1,2,3,5))

You can use an R object holding a vector of column names within fread, as you can see in the next block of code. I create a vector my_cols with date, county, state, and cases, then I use that vector within fread.

my_cols <- c("date", "county", "state", "cases")
mydt <- fread("us-counties.csv", select = my_cols)

The opposite of select is drop. You can choose to import all columns except the ones you specify with drop, such as:

mydt <- fread("us-counties.csv", drop = c("fips", "deaths"))

As with select, drop takes a vector of column names or numerical positions.
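To see that these forms are interchangeable, here is a small sketch that reads the same file three ways and compares the resulting column names. The temporary file and its two rows are stand-ins for the real CSV, and the object names by_name, by_position, and by_drop are illustrative.

```r
library(data.table)

# Stand-in file with the same six columns as the Times data (illustrative rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,county,state,fips,cases,deaths",
             "2020-01-21,Snohomish,Washington,53061,1,0",
             "2020-01-22,Snohomish,Washington,53061,1,0"), tmp)

# Keep four columns by name, by position, or by dropping the other two
by_name     <- fread(tmp, select = c("date", "county", "state", "cases"))
by_position <- fread(tmp, select = c(1, 2, 3, 5))
by_drop     <- fread(tmp, drop = c("fips", "deaths"))

names(by_name)
identical(names(by_name), names(by_position))
identical(names(by_name), names(by_drop))
```

Names tend to be the safer choice when a file’s column order might change between downloads.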

Use fread with grep

If you’re familiar with Unix, you can execute command-line tools right from within fread. For example, if I just wanted California data, I could use grep to import only lines that contain the text “California.” Note that this searches each entire row as a text string, not a specific column, so your data has to be in a format where that makes sense.

ca <- fread("grep California us-counties.csv")

Unfortunately, grep doesn’t understand the original file’s column names, so you end up with default names.

head(ca)
           V1          V2         V3   V4 V5 V6
1: 2020-01-25      Orange California 6059  1  0
2: 2020-01-26 Los Angeles California 6037  1  0
3: 2020-01-26      Orange California 6059  1  0
4: 2020-01-27 Los Angeles California 6037  1  0
5: 2020-01-27      Orange California 6059  1  0
6: 2020-01-28 Los Angeles California 6037  1  0

However, fread lets us specify column names with the col.names option. I can set the names based on the names from mydt10 that I created above.

ca <- fread("grep California us-counties.csv", 
             col.names = names(mydt10))

head(ca)
         date      county      state fips cases deaths
1: 2020-01-25      Orange California 6059     1      0
2: 2020-01-26 Los Angeles California 6037     1      0
3: 2020-01-26      Orange California 6059     1      0
4: 2020-01-27 Los Angeles California 6037     1      0
5: 2020-01-27      Orange California 6059     1      0
6: 2020-01-28 Los Angeles California 6037     1      0

We can also use regular expressions, with grep’s -E option, letting us do more complex searches, such as looking for four states at once.

states4 <- fread(cmd = "grep -E 'Texas|Arizona|Florida|South Carolina' us-counties.csv", 
col.names = names(mydt10))

Once again, a reminder: This looks for each of those state names anywhere in the row, not just in the state column. If you run the code above and check which states are included in the results with unique(states4$state), you’ll see Oklahoma and Missouri in the state column along with Texas, Arizona, Florida, and South Carolina. That’s because both Oklahoma and Missouri have counties named Texas.

So, grep during file import is a way to filter out a lot of data you don’t want from a very large data set, but it doesn’t guarantee you get only what you want. After this type of import, you should still filter specifically on column data to make sure you didn’t get anything unexpected.
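That follow-up filter might look like the sketch below. To keep it runnable without grep or the download, it uses a small hand-built table, states4_demo, as a stand-in for the imported result; the table, its rows, and the names wanted and states4_clean are assumptions for illustration.

```r
library(data.table)

# Stand-in for the grep import: two wanted rows plus two rows that
# matched only because the county is named Texas
states4_demo <- data.table(
  county = c("Harris", "Texas", "Texas", "Maricopa"),
  state  = c("Texas", "Oklahoma", "Missouri", "Arizona")
)

wanted <- c("Texas", "Arizona", "Florida", "South Carolina")

# Keep only rows whose state column is one of the four states
states4_clean <- states4_demo[state %in% wanted]
unique(states4_clean$state)
```

With the real data, the same filter would be states4[state %in% wanted].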

Use fread’s colClasses option

You can set column classes during import, and for just a few columns, not every one. For example, the date column in this data comes in as character strings, even though it’s in year-month-day format. We can set the column named date to the data type Date during import using the colClasses option.

mydt <- fread("us-counties.csv", colClasses = c("date" = "Date"))

Now, dates are Dates.

> str(mydt)
Classes ‘data.table’ and 'data.frame': 322651 obs. of  6 variables:
 $ date  : Date, format: "2020-01-21" "2020-01-22" "2020-01-23" ...
 $ county: chr  "Snohomish" "Snohomish" "Snohomish" "Cook" ...
 $ state : chr  "Washington" "Washington" "Washington" "Illinois" ...
 $ fips  : int  53061 53061 53061 17031 53061 6059 17031 53061 4013 6037 ...
 $ cases : int  1 1 1 1 1 1 1 1 1 1 ...
 $ deaths: int  0 0 0 0 0 0 0 0 0 0 ...
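The sketch below sets classes for two columns of a small stand-in file and leaves the rest to fread’s own type detection. The temporary file, the choice to also read cases as numeric, and the object name typed are assumptions for illustration. The exact class vector of the date column may vary by data.table version, so the check uses inherits() rather than comparing class names.

```r
library(data.table)

# Stand-in file (illustrative rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,county,state,cases",
             "2020-01-21,Snohomish,Washington,1",
             "2020-01-24,Cook,Illinois,1"), tmp)

# Named vector: column name on the left, desired class on the right.
# Columns not listed (county, state) keep the type fread detects.
typed <- fread(tmp, colClasses = c("date" = "Date", "cases" = "numeric"))

inherits(typed$date, "Date")   # date column is a Date (or a subclass of it)
is.numeric(typed$cases)
is.character(typed$county)

# Date columns support date arithmetic directly
diff(range(typed$date))
```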

Use fread on zipped files

You can import a zipped file without unzipping it first. fread can import gz and bz2 files directly, such as mydt <- fread("myfile.gz"). If you need to import a zip file, you can unzip it with the unzip system command within fread, using the syntax mydt <- fread(cmd = 'unzip -cq myfile.zip').

For more R tips, head to InfoWorld’s Do More With R page.
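As a rough sketch of the gz case, the code below compresses a few stand-in lines with base R’s gzfile() and reads them back with fread. One thing to be aware of: fread relies on the R.utils package to decompress gz and bz2 files, so the example checks for it first. The file contents and the names csv_lines, gz_path, and from_gz are illustrative assumptions.

```r
library(data.table)

csv_lines <- c("date,county,state,cases",
               "2020-01-21,Snohomish,Washington,1",
               "2020-01-22,Snohomish,Washington,1")

# Write a gzip-compressed CSV using base R's gzfile() connection
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(csv_lines, con)
close(con)

# Read the compressed file directly, if R.utils is available
if (requireNamespace("R.utils", quietly = TRUE)) {
  from_gz <- fread(gz_path)
  print(from_gz)
} else {
  message("Install the R.utils package to read .gz files with fread")
}
```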
