I am using this code to run queries from a list of words against Google and extract the number of search results for each. It worked fine, but since last night it keeps failing with this error after about 200 queries (I guess Google flagged me!):
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.google.com/sorry/?continue=http://www.google.com/...
The query "red" is just an example.
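import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Google {
    public static void main(String[] args) throws IOException {
        String query = "red";
        // Wrap the query in double quotes to request an exact-phrase search
        String urlName = "http://www.google.com/search?q=\"" + query + "\"";
        URL url = new URL(urlName);
        URLConnection conn = url.openConnection();
        // Send a browser-like User-Agent header with the request
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)");
        // Matches e.g. <div>About 1,620,000 results</div>
        Pattern pattern = Pattern.compile("<div>About (.*?) results</div>");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = pattern.matcher(line);
                if (m.find()) {
                    System.out.println(m.group(1)); // group(1) is the result count, e.g. 1,620,000
                }
            }
        }
    }
}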
Any solutions or suggestions?
You've been flagged as a bot, probably because of the frequency of your queries. Try running this from a different IP address (before that one gets flagged as a bot too).
Regardless, you should probably use the Google Custom Search API. From the site https://developers.google.com/custom-search/v1/overview:
Free quota
Usage is free for all users, up to 100 queries per day.
Paid Usage
Any usage beyond the free usage quota will fail if you are not signed up for billing. Once you have enabled billing, you will continue to receive 100 free queries per day. However, you will be billed for all additional requests at the rate of $5 per 1000 queries, for up to 10,000 queries per day. If you need additional quota, please request additional quota from the console.
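For illustration, here is a minimal sketch of what a request to the Custom Search JSON API might look like. It assumes the endpoint https://www.googleapis.com/customsearch/v1 with key, cx and q parameters, and that the response JSON carries the count in a totalResults field under searchInformation; the class name, method names and placeholder credentials are hypothetical, so check the current API documentation before relying on any of this. It needs Java 10+ for the URLEncoder overload, and it only builds the URL and parses a sample response, so it runs without network access.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomSearchCount {
    // Builds the request URL. apiKey and engineId are placeholders for your own credentials.
    static String buildUrl(String apiKey, String engineId, String query) {
        return "https://www.googleapis.com/customsearch/v1"
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&cx=" + URLEncoder.encode(engineId, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Extracts the digits of "totalResults" with a regex (assumed field name).
    // A real JSON parser would be more robust than this.
    static String totalResults(String json) {
        Matcher m = Pattern.compile("\"totalResults\"\\s*:\\s*\"?(\\d+)\"?").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Exact-phrase query, as in the question
        System.out.println(buildUrl("YOUR_API_KEY", "YOUR_ENGINE_ID", "\"red\""));
        // Hand-written sample response, not real API output
        String sample = "{\"searchInformation\": {\"totalResults\": \"1620000\"}}";
        System.out.println(totalResults(sample));
    }
}
```

The URL returned by buildUrl can be fetched with the same URLConnection code as in the question; unlike the scraped HTML, the JSON response does not depend on the page layout.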
From the error page specifically:
"Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot."
Since you obviously are a robot hitting their page, they have put measures in place to curb the traffic you are costing them, and they do not wish for you to continue the practice.
Having said that, you will need to verify your identity with Google at some point. The way this page recommends doing so is to present the author (yourself) with the image, solve the captcha manually, and then save the resulting cookie for use by your program.
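As a rough sketch of the "save the cookie for use by your program" step: once you have copied the cookie from your browser after solving the captcha, it can be sent back as a Cookie request header. The class name, method names and the cookie name and value below are hypothetical placeholders, and whether Google keeps honouring such a cookie for automated requests is an assumption.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieRequest {
    // Joins name/value pairs into a single Cookie header value: "a=1; b=2"
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Prepares a connection that carries the saved cookies.
    // Nothing is sent until the caller reads from the connection.
    static URLConnection openWithCookies(String urlName, Map<String, String> cookies) throws IOException {
        URLConnection conn = new URL(urlName).openConnection();
        conn.setRequestProperty("Cookie", toCookieHeader(cookies));
        return conn;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> saved = new LinkedHashMap<>();
        // Placeholder: replace with the cookie copied from your browser
        saved.put("EXAMPLE_COOKIE", "value-copied-from-browser");
        URLConnection conn = openWithCookies("http://www.google.com/search?q=red", saved);
        System.out.println(conn.getURL() + " with Cookie: " + toCookieHeader(saved));
    }
}
```

The connection returned by openWithCookies can then be read exactly as in the question, by calling conn.getInputStream().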