Best approach for bringing 180K records into an app: Core Data? CSV vs. XML?

I've built an app with a tiny amount of test data (clues & answers) that works fine. Now I need to think about bringing in a full set of clues & answers, which is roughly 180K records (it's a word game). I am worried about speed and memory usage, of course. Looking around the intertubes and my library, I have concluded that this is probably a job for Core Data. Within that approach, however, I guess I can bring the data in as CSV or as XML (I can create either one from the raw data using a scripting language). I found some resources about how to handle each case. What I don't know is anything about overall speed and other issues one might expect when using CSV vs. XML. The CSV file is about 3.6 MB and all of the fields are strings.

I know this is dangerously close to a non-question, but I need some advice as either approach requires a large coding commitment. So here are the questions:

1. For a file of this size and with these characteristics, would one expect CSV or XML to be the better approach? Is there some other format/protocol/strategy that would make more sense?
2. Am I right to focus on Core Data?

Maybe I should throw some fake code in here so the system doesn't keep warning me about asking a subjective question. But I have to try! Thanks for any guidance; links to discussions are appreciated.

2012-04-04 19:41
by Bryan Hanson

I would avoid CSV unless you have a really fast, efficient parser. See my results here - progrmr 2012-04-04 20:32

Thanks: I'll study that. I think I could also write the data in some binary form - Bryan Hanson 2012-04-04 20:53

Here you have only two fields, not 100, and both are strings, so no further interpretation is needed once the comma is found. That makes it an easier parsing problem - DRVic 2012-04-04 22:38
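To make the two-field point concrete: with exactly two string fields, parsing a line comes down to locating one comma. Below is a minimal sketch in plain C (so it compiles unchanged in an Objective-C project). The helper name split_clue_answer is hypothetical, and the sketch assumes answers never contain a comma, so it splits at the last comma and lets clues keep any commas of their own. CSV quoting is not handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split one "clue,answer" line in place.
 * Assumption: the answer never contains a comma, so we split at the LAST
 * comma and a clue may contain commas of its own. No CSV quoting support.
 * Returns 0 on success, -1 if the line contains no comma at all. */
static int split_clue_answer(char *line, char **clue, char **answer)
{
    assert(line != NULL && clue != NULL && answer != NULL);
    line[strcspn(line, "\r\n")] = '\0';   /* drop a trailing "\n" or "\r\n" */
    char *comma = strrchr(line, ',');
    if (comma == NULL)
        return -1;
    *comma = '\0';                        /* the clue ends where the comma was */
    *clue = line;
    *answer = comma + 1;                  /* both fields are plain strings: no conversion */
    return 0;
}
```

Because the split happens in place, no extra allocation is needed per line; the two returned pointers stay valid only as long as the line buffer does.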

Database speed will always trump plain-text storage speed, because to skip over data a database only has to add a fixed number of bytes to its pointer position, whereas plain text has to be parsed to find the next position (XML can do this ahead of time, but that just moves the slowdown somewhere else). However, it really doesn't matter unless you start noticing it in your app - borrrden 2012-04-04 23:56
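The "add a fixed number of bytes" idea (and the binary-form idea above) can be illustrated with fixed-width records: if every record has the same size, record n starts at byte n × RECORD_SIZE, so reaching it is one seek instead of a scan. This is a sketch under stated assumptions, not how Core Data or SQLite actually lay out their files; the field widths, the Record type and the write_record/read_record helpers are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-width layout: every record occupies exactly RECORD_SIZE
 * bytes (clue field, then answer field, each NUL-padded). The widths are
 * assumptions; in practice pick them from the longest clue/answer in the data. */
#define CLUE_LEN    96
#define ANSWER_LEN  32
#define RECORD_SIZE (CLUE_LEN + ANSWER_LEN)

typedef struct {
    char clue[CLUE_LEN];
    char answer[ANSWER_LEN];
} Record;

/* Append one record, truncating each field to its width and NUL-padding it. */
static int write_record(FILE *f, const char *clue, const char *answer)
{
    assert(f != NULL && clue != NULL && answer != NULL);
    char buf[RECORD_SIZE] = {0};
    strncpy(buf, clue, CLUE_LEN - 1);
    strncpy(buf + CLUE_LEN, answer, ANSWER_LEN - 1);
    return fwrite(buf, RECORD_SIZE, 1, f) == 1 ? 0 : -1;
}

/* Jump straight to record n: offset = n * RECORD_SIZE, with no parsing of
 * the records before it. Returns 0 on success, -1 on seek/read failure. */
static int read_record(FILE *f, long n, Record *out)
{
    assert(f != NULL && out != NULL);
    if (fseek(f, n * (long)RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    char buf[RECORD_SIZE];
    if (fread(buf, RECORD_SIZE, 1, f) != 1)
        return -1;
    memcpy(out->clue, buf, CLUE_LEN);
    memcpy(out->answer, buf + CLUE_LEN, ANSWER_LEN);
    out->clue[CLUE_LEN - 1] = '\0';       /* guarantee termination */
    out->answer[ANSWER_LEN - 1] = '\0';
    return 0;
}
```

The trade-off is size: at 128 bytes per record, 180K records come to about 23 MB, against the 3.6 MB CSV. That is one reason a compact file plus a separate table of record offsets is a common alternative to padding every record.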

What about SQLite? If I can write the data into that format, would it be faster or more straightforward to read from it than from CSV, XML, or binary? Again, I'm asking about the big picture here. Thank you all - Bryan Hanson 2012-04-05 01:12

As for file size, CSV will always be smaller than the equivalent XML file, because it contains only the raw data (in ASCII) plus separators. Consider the following 3 rows and 3 columns.

Column1, Column2, Column3
1, 2, 3
4, 5, 6
7, 8, 9

Compare that to its XML counterpart below, which doesn't even include schema information. It is also ASCII, but the rowX and ColumnX tag names have to be repeated multiple times throughout the file (once in each opening tag and once in each closing tag). Compression could of course help with this, but I'm guessing the CSV would still be smaller even with compression.

<root>
    <row1>
        <Column1>1</Column1>
        <Column2>2</Column2>
        <Column3>3</Column3>
    </row1>
    <row2>
        <Column1>4</Column1>
        <Column2>5</Column2>
        <Column3>6</Column3>
    </row2>
    <row3>
        <Column1>7</Column1>
        <Column2>8</Column2>
        <Column3>9</Column3>
    </row3>
</root>
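To put a number on the difference, this C sketch counts the bytes each sample above would occupy. It assumes "\n" line endings, no blank lines between the CSV rows, the ", " separators and 4-space XML indentation shown, and no XML declaration; csv_bytes and xml_bytes are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes needed for the CSV sample: a "Column1, Column2, ..." header line,
 * then one line per row with values separated by ", ". */
static size_t csv_bytes(int rows, int cols, const int *values)
{
    assert(values != NULL && rows >= 0 && cols >= 0);
    size_t n = 0;
    for (int c = 0; c < cols; c++)
        n += (size_t)snprintf(NULL, 0, "%sColumn%d", c ? ", " : "", c + 1);
    n += 1;                                   /* newline after the header */
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++)
            n += (size_t)snprintf(NULL, 0, "%s%d", c ? ", " : "", values[r * cols + c]);
        n += 1;                               /* newline after each row */
    }
    return n;
}

/* Bytes needed for the XML sample: every value wrapped in <ColumnK>...</ColumnK>
 * inside <rowN>...</rowN>, indented 4 spaces per nesting level. */
static size_t xml_bytes(int rows, int cols, const int *values)
{
    assert(values != NULL && rows >= 0 && cols >= 0);
    size_t n = (size_t)snprintf(NULL, 0, "<root>\n");
    for (int r = 0; r < rows; r++) {
        n += (size_t)snprintf(NULL, 0, "    <row%d>\n", r + 1);
        for (int c = 0; c < cols; c++)
            n += (size_t)snprintf(NULL, 0, "        <Column%d>%d</Column%d>\n",
                                  c + 1, values[r * cols + c], c + 1);
        n += (size_t)snprintf(NULL, 0, "    </row%d>\n", r + 1);
    }
    n += (size_t)snprintf(NULL, 0, "</root>\n");
    return n;
}
```

Called with the values 1 through 9, it counts 50 bytes for the CSV and 345 bytes for the XML, roughly a 7x difference for this tiny example. For real clue/answer data the ratio depends on how long the strings are compared with the tag names.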

As for your other questions, sorry, I can't help there.

2012-04-04 19:50
by Dan P

Thanks. I would have guessed the same, but being new to iOS and Objective-C, I don't have any real sense of how fast the information might be absorbed. I suppose that might be only partly linked to the size of the file itself - Bryan Hanson 2012-04-04 19:54

This is large enough that the I/O time difference will be noticeable, and since the CSV is (what? 10x smaller?), any difference in processing time (whichever format is faster) will be negligible compared to the difference in reading it in. And CSV should be faster outside of I/O too.

Whether to use Core Data depends on which of its features you hope to exploit. I'm guessing the only one is querying, and it might be worth it for that, although if it's just a simple mapping from clue to answer, you might just want to read the whole CSV file into an NSMutableDictionary. Access will be faster.

2012-04-04 20:32
by DRVic

I was thinking of reading it into an NSDictionary. The data will not change, only be queried. Clues will be drawn at random, so I suppose not all of it needs to be in memory at once - Bryan Hanson 2012-04-04 20:55

The only reason I suggested NSMutableDictionary is that you can then read in a line, put it in the dictionary, read in another line, and so on; you never need to buffer more than one line - DRVic 2012-04-04 22:34
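The streaming approach described here (read a line, store it, read the next) can be sketched in plain C. In the app itself the table would be an NSMutableDictionary filled with setObject:forKey:; in this sketch the POSIX hcreate/hsearch hash table stands in for it so the example is self-contained C. That substitution is mine, not part of the suggestion. load_clues, answer_for and MAX_LINE are hypothetical, and the sketch assumes no line exceeds MAX_LINE bytes and that answers contain no comma.

```c
#define _XOPEN_SOURCE 700   /* expose hsearch() and strdup() on strict compilers */
#include <assert.h>
#include <search.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 512        /* assumption: no line is longer than this */

/* Read "clue,answer" lines one at a time from f and enter each pair into the
 * process-wide POSIX hash table. Only one line is buffered at any moment.
 * hcreate() must already have been called with room for every record.
 * Returns the number of pairs entered, or -1 on allocation/table failure
 * (sketch only: nothing is cleaned up on failure). */
static long load_clues(FILE *f)
{
    char line[MAX_LINE];
    long count = 0;
    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        char *comma = strrchr(line, ',');      /* answers assumed comma-free */
        if (comma == NULL)
            continue;                          /* skip malformed lines */
        *comma = '\0';
        ENTRY e;
        e.key = strdup(line);                  /* the table stores pointers, so copy */
        e.data = strdup(comma + 1);
        if (e.key == NULL || e.data == NULL || hsearch(e, ENTER) == NULL)
            return -1;
        count++;
    }
    return count;
}

/* Look up the answer for a clue; returns NULL if the clue is unknown. */
static const char *answer_for(const char *clue)
{
    ENTRY query;
    query.key = (char *)clue;
    query.data = NULL;
    ENTRY *hit = hsearch(query, FIND);
    return hit ? (const char *)hit->data : NULL;
}
```

Call hcreate() once, with a capacity comfortably above the record count, before load_clues(). Note that hsearch's ENTER leaves an existing entry unchanged, so a duplicate clue keeps its first answer, and hdestroy() does not free the strdup'd keys and answers.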