JSON parser for Talend


I need some help devising a strategy to parse JSON docs within a Talend job (Java job, not Perl). I am using Talend Version: 5.0.2 and developing on a Mac, planning to run on a Linux box.

Unfortunately, I cannot use the tFileInputJSON component because of the format of my files -- each file contains several hundred JSON docs, with a complete JSON doc taking up one line in the file. I think the right solution is to read the file line by line, pass each line into a JSON parser, and from there send the results to the rest of the job.

As I see it my options are:

a) send the line input to some sort of Java JSON parser. If that's the strategy I need to take, I'd like some advice on how to deal with the output and getting

b) find a Talend component that parses JSON docs, but within a flow as opposed to on a single file in valid JSON format.

I've searched around for this component but can't seem to find it. From my search, it seems even the tFileInputJSON component is relatively new.

I definitely know this is something Java can do pretty easily. My problem is getting the whole thing synced up within the Talend framework.

Anyone have some advice on where I should turn next?

Thanks in advance.

2012-04-03 23:50
by badgley
Hey everyone -- I ended up going with the answer provided by the people over at Talend Forge -- http://www.talendforge.org/forum/viewtopic.php?pid=82606#p82606 -- While I made progress with llaen's suggestion, the more hackish approach suggested at Talend Forge gets the job done for me now - badgley 2012-04-11 17:51


Have you tried creating a custom routine? You can do so under Code (in the repository window on the left), right click on Routines and create your custom routine. This lets you write a Java function which can then be called from somewhere in your job (tMap, tJava, whatever). You could read your input file and call a function on each line/element or whatever that does something you want.

Like any Java function, the routine can then write to file, print to screen or return some list object that you can further work on in another tJava, tJavaFlex, tJavaRow or whatever Talend components in your job.

It may feel a little hacky, but you can do a lot just using custom routines.
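As a rough sketch of what such a routine could look like: the class name JsonLineRoutine and the method getStringField are made up for illustration, and the regex is only a dependency-free stand-in for a real JSON parser (it finds the first string field with the given name and nothing more). In a real job you would most likely swap the method body for a call into whatever JSON library you add to the job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical routine. In Talend it would be created under Code > Routines
// (routines are normally generated into the "routines" package).
public class JsonLineRoutine {

    /**
     * Returns the value of the first string field named `key` found in one
     * line of JSON, or null if no such string field is present.
     *
     * ASSUMPTION: the regex is a placeholder so this sketch has no external
     * dependencies. It ignores nesting, skips non-string values, and returns
     * escape sequences undecoded. Replace it with a real JSON parser.
     */
    public static String getStringField(String jsonLine, String key) {
        if (jsonLine == null || key == null) {
            return null;
        }
        // Matches "key" : "value", allowing backslash escapes inside value.
        Pattern p = Pattern.compile(
            "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(jsonLine);
        return m.find() ? m.group(1) : null;
    }
}
```

If the file is read one line per row (for example into a single column called line, which is an assumption about your schema), a tJavaRow could then call it with something like output_row.name = JsonLineRoutine.getStringField(input_row.line, "name");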

If you want to go all the way and create your own component, this may be a good way to start: http://www.talendforge.org/forum/viewtopic.php?id=17650 Of course, creating components is much more time-consuming, but may be useful if you think you'll be reusing this code in multiple projects/cases.

2012-04-09 00:10
by sdragnev
Alright -- this seems like a good direction to explore -- going to give it a go. Will report back! Thanks - badgley 2012-04-11 00:12
While I ended up going with the suggestion from Talend Forge, I am accepting this answer -- thanks for the help. I managed to get a routine to work and think it was a great direction to take. Thanks again - badgley 2012-04-11 17:52


Read the file line by line, and construct a JSON Object for each line.

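// Assumes the org.json library (org.json.JSONObject) is on the classpath.
final BufferedReader br = new BufferedReader(new FileReader(file));
String line;

while ((line = br.readLine()) != null) {          // read until EOF
  final JSONObject json = new JSONObject(line);   // one complete JSON doc per line
  // ... hand json off to the rest of the job here ...
}
br.close();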

2012-04-04 00:13
by Greg Kopff
greg -- thanks for getting things started. I don't think I was clear enough in my question. I really need help slapping this sort of functionality into the Talend component framework -- something I am struggling with right now. I'll edit my pos - badgley 2012-04-06 16:02