How can I get the links from these nodes?
script <- getURL("www.r-bloggers.com")
doc <- htmlParse(script)
li <- getNodeSet(doc, "//ul[@class='xoxo blogroll']")
Thanks in advance for any hints.
You can extend the XPath to select the a elements inside the list and call xmlGetAttr on each of them.
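library(RCurl)
library(XML)
# Download the page and parse it into an HTML document tree
script <- getURL("www.r-bloggers.com")
doc <- htmlParse(script)
# Select every a element nested inside the blogroll list
li <- getNodeSet(doc, "//ul[@class='xoxo blogroll']//a")
# Read the href attribute of each selected node
sapply(li, xmlGetAttr, "href")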
You can also use xpathSApply directly, which selects the nodes and applies the function in one step:
xpathSApply(doc,
"//ul[@class='xoxo blogroll']//a",
xmlGetAttr, "href"
)
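If you want to try this without hitting the network, here is a minimal sketch that runs the same XPath against an inline HTML string. The markup and the example.com URLs are made up for illustration and are not the real page content; only the class name 'xoxo blogroll' comes from the question.

```r
library(XML)

# Hypothetical stand-in for the downloaded page (assumption: the real
# markup is larger, but the blogroll is a ul with this class attribute).
html <- '<html><body>
<ul class="xoxo blogroll">
  <li><a href="http://example.com/a">Blog A</a></li>
  <li><a href="http://example.com/b">Blog B</a></li>
</ul>
<ul class="other">
  <li><a href="http://example.com/c">Not in the blogroll</a></li>
</ul>
</body></html>'

# asText = TRUE tells htmlParse that the argument is markup, not a file name
doc <- htmlParse(html, asText = TRUE)

# Same XPath as above: every a element nested inside the blogroll list
path <- "//ul[@class='xoxo blogroll']//a"

# href attribute of each matching node
links <- xpathSApply(doc, path, xmlGetAttr, "href")

# Link text of each matching node, in case the labels are wanted as well
labels <- xpathSApply(doc, path, xmlValue)

print(links)
print(labels)
```

Only the two links inside the blogroll list are returned; the link in the other list is skipped because the XPath predicate matches the class attribute exactly.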