Retrieve links from XMLNodeList


0

How can I get the links from these nodes?

script <- getURL("www.r-bloggers.com")
doc <- htmlParse(script)
li <- getNodeSet(doc, "//ul[@class='xoxo blogroll']")

Thanks in advance for any hints.

2012-04-04 23:04
by Kay


3

You can extract the <a> elements inside the list and call xmlGetAttr on each of them to read its href attribute.

library(RCurl)
library(XML)
script <- getURL("www.r-bloggers.com")
doc <- htmlParse(script)
li <- getNodeSet(doc, "//ul[@class='xoxo blogroll']//a")
sapply(li, xmlGetAttr, "href")

You can also use xpathSApply directly, which runs the XPath query and applies the function in one step:

xpathSApply(doc, 
  "//ul[@class='xoxo blogroll']//a", 
  xmlGetAttr, "href"
)
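
To try this without hitting the network, here is a minimal sketch that runs the same XPath query against a small inline HTML string. The markup, the example.com URLs, and the names html, links, titles and blogroll are made up for illustration; only the class name 'xoxo blogroll' comes from the question. It also pulls the link text with xmlValue, in case you want the blog names alongside the URLs.

```r
library(XML)

# Hypothetical HTML standing in for the downloaded page (not the real site markup)
html <- '<html><body>
<ul class="xoxo blogroll">
  <li><a href="http://example.com/a">Blog A</a></li>
  <li><a href="http://example.com/b">Blog B</a></li>
</ul>
<ul class="other">
  <li><a href="http://example.com/c">Not in blogroll</a></li>
</ul>
</body></html>'

# Parse the string directly instead of fetching a URL
doc <- htmlParse(html, asText = TRUE)

# Same XPath as above: every <a> below the blogroll list
path <- "//ul[@class='xoxo blogroll']//a"
links  <- xpathSApply(doc, path, xmlGetAttr, "href")  # href attribute of each link
titles <- xpathSApply(doc, path, xmlValue)            # visible text of each link

# Combine both into a data frame, one row per link
blogroll <- data.frame(title = titles, url = links, stringsAsFactors = FALSE)
print(blogroll)
```

Note that the predicate @class='xoxo blogroll' matches the class attribute as an exact string, so the link in the second list is not returned.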
2012-04-05 03:47
by Vincent Zoonekynd
Many thanks, again. - Kay 2012-04-05 06:23