I want to index the Wikipedia XML files into Solr, but indexing fails with an error. Solr expects XML files in a specific format, so I changed schema.xml and data-config.xml to match the tags in the Wikipedia files. It is still unable to index them. What I actually want to index is the full Wikipedia dump, which is a single XML file of 30 GB.
How would I go about indexing all of Wikipedia into Solr?
Basically, you use the DataImportHandler and some XPath to pull the metadata you care about out of the Wikipedia XML, and put it in flat Solr field listings.
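As a rough illustration of that approach, a data-config.xml along these lines might work. This is only a sketch under assumptions: the dump path in `url` is hypothetical, the XPaths assume the usual MediaWiki export layout (`/mediawiki/page/...`), and the column names `id`, `title`, and `text` must match fields you have defined in your own schema.xml.

```xml
<dataConfig>
  <!-- Read the dump from the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- stream="true" lets XPathEntityProcessor read the file
         incrementally instead of loading all 30 GB into memory -->
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/wikipedia-pages-articles.xml"
            transformer="RegexTransformer">
      <!-- Hypothetical path above; each <page> element becomes one Solr document -->
      <field column="id"    xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="text"  xpath="/mediawiki/page/revision/text" />
      <!-- Skip redirect pages, whose text starts with #REDIRECT -->
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text" />
    </entity>
  </document>
</dataConfig>
```

Assuming the handler is registered in solrconfig.xml under a path such as `/dataimport`, the import would then be started by requesting that path with `command=full-import`. A full import normally commits when it finishes, but it is worth checking the handler's status output to confirm that documents were actually added.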
*:*
- Dan Fitch 2012-04-03 20:35
<commit> operation after adding the file? See this page for how to do that - most Solr libraries and wrappers will make this very easy - Dan Fitch 2012-04-04 20:54