Java: get HTML contents

-1

I have an HTML file containing some JavaScript tags. When I open this file in a browser such as IE, some content is loaded from its source and displayed in the browser (for example, the weather of some cities). How can I run this HTML file and get the contents of the web page as it would be displayed in a web browser? I don't want to display the contents in my application; I want to parse the returned data and extract specific content (for example, the weather of each city). Can anyone guide me, please?

2012-04-04 07:15
by sajad
This question is far too unspecific and, as it seems, has nothing to do with java. I cut the java ta - yunzen 2012-04-04 07:25
I want a Java application to use on a server. It gets input and returns the data retrieved from the site. I need a Java library to parse the HTML file or the contents received from the web server and extract the tags I am interested in. So my question is about Java - sajad 2012-04-04 07:35


1

What you're trying to do is called HTML scraping.

Your best option is to get help in the form of a library, since this is a common and complex task.

See this question: Options for HTML scraping?
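
To give a rough idea of what the extraction step looks like, here is a minimal sketch that pulls the text out of every occurrence of one tag. It deliberately swaps in the HTML parser that ships with the JDK (javax.swing.text.html) so that it runs without extra dependencies; a dedicated scraping library would normally be more robust. The class name HtmlScrapeSketch, the method extractText and the sample weather table are made up for illustration. Note that this only parses HTML you already have as a string and does not execute any JavaScript, so content that the page loads through scripts will not appear in it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
class HtmlScrapeSketch {

    // Returns the trimmed text content of every <tagName> element in the given HTML.
    static List<String> extractText(String html, String tagName) {
        final HTML.Tag wanted = HTML.getTag(tagName.toLowerCase(Locale.ROOT));
        final List<String> results = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Non-null only while the parser is inside a wanted element.
            private StringBuilder current;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == wanted) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == wanted && current != null) {
                    results.add(current.toString().trim());
                    current = null;
                }
            }
        };

        try {
            // The last argument tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a downloaded page.
        String html = "<html><body><table>"
                + "<tr><td>Tehran</td><td>21 C</td></tr>"
                + "</table></body></html>";
        for (String cell : extractText(html, "td")) {
            System.out.println(cell);
        }
    }
}
```

In a server application you would first download the page (for example with java.net.http.HttpClient) and pass the response body to extractText. If the data you need is only filled in by JavaScript after the page loads, a plain parser like this is not enough and you need a tool that actually runs the scripts.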

2012-04-04 07:28
by daveb


0

Selenium is a good bet. It supports HtmlUnit, Firefox, Chrome amongst other browsers.

Link: http://seleniumhq.org/

2012-04-04 07:30
by Alp