Skip to content

Importing data directly from MS Excel

August 10, 2012

R is great for exploring, analysing and graphing your valuable data. No question about it. Unfortunately though, there’s no base package support for importing data directly from MS Excel. This means that you have to faff about saving it in another format, and THEN import this new file. This just adds another file to your computer, taking up more space or cluttering (if your machine looks anything like mine at least) your folders.

There are methods though around this. You CAN import files directly from Excel. You can even write Excel files.

Basically, there’s two options, the gdata package which relies on the perl language (dont fret, you dont have to learn perl too, just download it and forget about it!), or the xlsReadWrite package which uses Java (again, not to worry, this is used in web browsers, so you probably already have it).

gdata supports the xlsx format, while xlsReadWrite only supports xls. xlsReadWrite also makes creating xls files super easy! I use this for taking analysis results, putting them into a table in R, create the xls file with the table in and then copy that into Word for inclusion in my manuscripts. Can save lots of copying and pasting, especially if youre running lots of tests or have lots of means and SEs to put in a table.

Ive created a small xlsx file to show how it works. You can download it here: http://db.tt/zOCGC4ve or alternatively just cite the path in the call to the functions.

install.packages(pkgs="gdata")
library(gdata)
dat <- read.xls("http://db.tt/zOCGC4ve")
trying URL 'http://db.tt/zOCGC4ve'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 8759 bytes
opened URL
downloaded 8759 bytes

head(dat)
   Person  Potato Carrot Tomato
1       1       4     10      1
2       2      10      5      1
3       3       6     10      8
4       4       9      2      4
5       5       1      3      9
6       6      10      9      7

Easy huh? Theres loads of other options available too. If you have multiple sheets in your workbook, you can reference them by name or number:

dat <- read.xls("http://db.tt/zOCGC4ve", sheet="bread")
trying URL 'http://db.tt/zOCGC4ve'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 9363 bytes
opened URL
downloaded 9363 bytes

head(dat)
  Person White Brown Wholewheat
1      1     4     6          9
2      2     2     9         10
3      3     4     9          8
4      4     9     7          8
5      5     1     1          8
6      6     4     1          3

Handy! Other options are much the same as those in read.table and read.csv, such as head, skip, nrow, dec, etc.

If you dont install perl (referring here to gdata of course) that R cant find the executable and gives you an error message. If this happens you need to find the perl.exe file and specify it in the call to read.xls as below (this is where my perl exe is located, yours might be different).

read.xls(..., perl="C:/Strawberry/perl/bin/perl.exe")

The read.xls function in the xlsReadWrite package works in much the same way so I’m not going to go into it.

UPDATE: Ive just found there is also a package called XLConnect. Ive yet to play with that though…

UPDATE 2: Theres also an xlsx package which I am yet to try which seems to support both reading and writing to xlsx files, which is naturally handy to keep everything the same (and even adding sheets to the original data files perhaps…)

HTH

About these ads

From → R

19 Comments
  1. What about multi-sheet workbooks? Can you show read/write examples for that case?

    • That is what the second example demonstrates. Using gdata you just need to add another argument named sheet and give it either the sheet number (which tab in excel) or its name (“bread”) in the example.
      read.xls(“http://db.tt/zOCGC4ve”, sheet=”bread”)
      OR
      read.xls(“http://db.tt/zOCGC4ve”, sheet=2)
      Ive not actually tried writing to multi-sheet work books, but i would imagine it works the same way.

      HTH

  2. There’s also an xlsx package. I have a short post on how to write to xlsx, with reading achieved in a similar fashion. http://danganothererror.wordpress.com/2012/02/12/write-data-frame-to-excel-file/

  3. Is perl available only for windows? What about Mac?

  4. Dr. Khan Amir Maroof permalink

    why i m getting this error?

    kk kk<-read.xls("C:/Users/myname/Desktop/kk")
    Error in findPerl(verbose = verbose) :
    perl executable not found. Use perl= argument to specify the correct path.
    Error in file.exists(tfn) : invalid 'file' argument

    • As the error message says, gdata cant find your perl installation. You need to track down the perl.exe file. On my machine its under C:/Strawberry/perl/bin/perl.exe. Try using
      read.xls(“C:/Users/myname/Desktop/kk”, perl=”C:/Strawberry/perl/bin/perl.exe”)
      but you’ll probably have to change the path in the perl argument.

  5. Ankur permalink

    Is there a row limit that you can import with gdata ? I was trying to import an excel file with 650K rows it threw up some perl errors ? With csv it was a breeze !

    • Not that Ive ever found. Ive imported > 2000 rows quite happily. I have experienced memory issues though…perhaps if your data file also has many columns… I think this was more of a problem with xlsReadWrite though (and on a netbook so it wasnt really unexpected). gdata actually converts your file to a temporary csv file and then imports that so it shouldnt be a problem in any case.

  6. Thanks for this post. V. useful indeed.
    After a bit of frustration, I discovered that this doesn’t work with Excel 2010. You have to save your Excel file as an earlier version. Excel-97-2003 workbook works fine.

    • gdata works fine with xlsx files (which excel 2007 and above produces). xlsReadWrite on the other hand, only supports the older xls format that Excel 2003 and before use. Did you try gdata (after downlading and installing perl)? Other options are the XLConnect and xlsx packages which I too am yet to investigate.
      HTH

  7. Alec Zwart permalink

    (Possibly OT) I’ve gone in a different direction here – opting instead to purchase an Excel add-in allowing easy *export* of data to CSV format, thus avoiding all of Excel’s ‘Are you sure you want to save in this format?’ annoyances. This despite once hearing a comment from the maintainer of a large data repository who estimated their repository was currently storing ‘ten terabytes worth of commas’… :-)

  8. You can also try EasyXLS.

    http://www.easyxls.com

    It reads and writes xls, xlsx, xlsb and any other MS Excel file format.

  9. Lippo permalink

    In my experience, xlsx::read.xlsx() is way too slow for files with >10000 rows. But compare reading in 1000 rows and 14 cols from xlsx-file (size 425kB)

    > library(“RODBC”)
    > system.time(sqlFetch(odbcConnectExcel2007(“test.xlsx”), “data1000″))
    user system elapsed
    0.08 0.02 0.08
    > system.time(read.xlsx(“test.xlsx”, sheetName=”data1000″))
    user system elapsed
    19.97 0.01 20.01

    Fast, huh? At the moment the RODBC- package works for 32-bit R, but not for 64-bit.

    • Wow! Impressive speed increase! I’ll have to look into RODBC!! Thanks for pointing it out.
      I havent checked, but I get the impression that xlsx:::read.xlsx is faster than gdata:::read.xls too.

Trackbacks & Pingbacks

  1. Import/Export data to and from xlsx files | Insights of a PhD student

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: