How should I avoid 404 errors when scraping with R?



I'm accessing web pages by looping over a couple of variables that I insert into the URL.

There will be occasional 404 errors.
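Roughly, the loop looks like this (the site, paths, and variable names here are just illustrative):

    library(XML)

    years  <- 2010:2012
    months <- 1:12

    for (y in years) {
      for (m in months) {
        url <- sprintf("http://example.com/stats/%d/%02d.html", y, m)
        # readHTMLTable() throws an error when the page is a 404,
        # which stops the whole loop
        tables <- readHTMLTable(url)
      }
    }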

How do I insert some sort of catch for these pages so they don't break the code? I currently use the XML package, but could of course load others if appropriate.

TIA

2012-04-04 20:25
by pssguy
try this - Justin 2012-04-04 20:28
@Justin Thanks, I used that as a basis. Did you want to make it an answer? - pssguy 2012-04-05 00:32



Most of the time I use RCurl::url.exists(). If you have a list or a data frame containing all the URLs, you can try this:

    purrr::map(p, ~ifelse(RCurl::url.exists(.), ., NA))
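For example, with a made-up vector of URLs (the second one doesn't exist):

    p <- c("https://www.r-project.org/", "https://www.r-project.org/no-such-page")

    checked <- purrr::map(p, ~ifelse(RCurl::url.exists(.), ., NA))

    # drop the NAs to keep only the URLs that actually respond
    good_urls <- unlist(checked[!is.na(checked)])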

HTH!

2017-12-05 20:37
by Tito Sanz