Subsetting data frame in R after reading in data with scan

I'm reading in data from an HTTP access log. I've got a file with columns for the IP address, year, month, day, hour, HTTP verb and requested URL. I read the file in like this:

ipdata = scan(file="sample_r.log", what=list(ip="", year=0, month=0, day=0, hour=0, verb="", url=""))

This seems to work. RStudio says that ipdata is a list[7] and "names(ipdata)" returns

[1] "ip"    "year"  "month" "day"   "hour"  "verb"  "url"  

So that seems cool. I wanted to do something fun, like graph some data for a specific hour. I tried doing a subset:

s <- subset(ipdata, ipdata$hour==3)

This data looks remarkably different from the first data frame. s is a list[297275], and the following doesn't work right:

> table(ipdata$verb)

GET    POST 
2870709 1596748 

> table(s$verb)
character(0)

Am I going about this the correct way? What I typically do is wrap my data frame in a table() and then barplot or dotplot it. Is R a good way to do this? I want to say "Show me all of the top URLs in hour 3", for example. Or "How many times did this IP address show up per hour?"

Update: It looks like using read.table instead of scan gets me a data frame. Apparently scan returns a list of lists or something? Definitely confusing to a n00b like myself, but I'm feeling good about it now.
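For anyone comparing the two readers, here is a minimal sketch of the difference described in the update. The log lines are made-up sample data standing in for sample_r.log, and the names log_lines, raw and dat are just illustrative. With a list passed as "what", scan returns a plain list holding one vector per field, while read.table returns a data frame.

```r
# Hypothetical sample lines standing in for sample_r.log
log_lines <- c(
  "10.0.0.1 2012 4 5 3 GET /index.html",
  "10.0.0.2 2012 4 5 3 POST /login",
  "10.0.0.1 2012 4 5 4 GET /about.html"
)

# scan() with a list for `what` returns a plain list: one vector per field
raw <- scan(text = log_lines, quiet = TRUE,
            what = list(ip = "", year = 0, month = 0, day = 0,
                        hour = 0, verb = "", url = ""))
class(raw)        # a plain list, not a data frame
class(raw$hour)   # each element is an ordinary vector

# read.table() returns a data frame with one row per log line
dat <- read.table(text = log_lines,
                  col.names = c("ip", "year", "month", "day",
                                "hour", "verb", "url"),
                  stringsAsFactors = FALSE)
class(dat)
str(dat)
```

With a real file, text = log_lines would be swapped for the file name, as in the original scan call.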

2012-04-05 13:53
by Dave


If you ran

dat <- as.data.frame(ipdata)
str(dat)

... you would probably see that it is pretty much the same as the result of your read.table() operation. read.table is a wrapper for scan and does a lot of formatting and consistency checking.
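To make that concrete, here is a rough sketch of the whole flow on made-up log lines (the sample data and the names log_lines, verb_counts, url_counts and ip_by_hour are assumptions for illustration, not from the question). Once the scan result is converted to a data frame, subset() filters rows. On the bare list, subset() falls back to its default method, which indexes the seven list elements with the logical vector instead of filtering rows, and that would explain the odd list[297275] result.

```r
# Hypothetical sample lines standing in for sample_r.log
log_lines <- c(
  "10.0.0.1 2012 4 5 3 GET /index.html",
  "10.0.0.2 2012 4 5 3 POST /login",
  "10.0.0.1 2012 4 5 4 GET /about.html",
  "10.0.0.3 2012 4 5 3 GET /index.html"
)

# Same scan() call as in the question, reading from text instead of a file
ipdata <- scan(text = log_lines, quiet = TRUE,
               what = list(ip = "", year = 0, month = 0, day = 0,
                           hour = 0, verb = "", url = ""))

# Convert the list of equal-length vectors into a data frame
dat <- as.data.frame(ipdata, stringsAsFactors = FALSE)

# subset() now filters rows, so the per-hour questions become simple tables
s <- subset(dat, hour == 3)
verb_counts <- table(s$verb)                          # verbs seen in hour 3
url_counts  <- sort(table(s$url), decreasing = TRUE)  # top URLs in hour 3
ip_by_hour  <- table(dat$ip, dat$hour)                # hits per IP per hour

print(verb_counts)
print(url_counts)
print(ip_by_hour)
```

Any of these tables can then be passed to barplot() or dotchart() in the usual way.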

2012-04-05 16:20
by 42-
Ah! Does read.table essentially do an "as.data.frame" when it's done? - Dave 2012-04-05 19:15
Well, read.table assigns "data.frame" as the class of its returned object. It does a lot of checking of names, lengths and classes before it assigns the class. Just type read.table at your console. In addition to seeing the amount of consistency enforcement, you will get an appreciation for why it is sometimes slow. - 42- 2012-04-05 19:22
Awesome, thanks again. I'm still getting used to R and RStudio, and just this morning learned about the help feature. - Dave 2012-04-06 15:21