Daily file maintenance job in C/C++



I'm running a daily maintenance job that resets and syncs up a set of related flat-file databases.

Problem statement: If the maintenance application dumps (crashes) partway through, I have to start over from the beginning instead of continuing from where I left off.

Tried solution: To fix this, I started logging the processed offset to a file. That way, when the job restarts, I can check its completion status instead of starting over from the beginning.

Issue: Logging the offset for every record being processed drastically increases the processing time.

Can someone suggest a better way to handle this situation?
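For context, here is a minimal sketch of the offset-logging approach described above, as I understand it from the details the asker gives in the comments below (512-byte records, a fixed slot per data file in the status file). All identifiers are illustrative, not the asker's actual code:

    #include <cstdio>

    /* After each 512-byte record is processed, seek to this data file's fixed
       slot in the status file and overwrite the last processed offset. The
       per-record seek/write/flush is the extra I/O the question is about. */
    bool update_status(FILE *status, long slot_pos, long processed_offset)
    {
        if (std::fseek(status, slot_pos, SEEK_SET) != 0)
            return false;
        if (std::fwrite(&processed_offset, sizeof processed_offset, 1, status) != 1)
            return false;
        return std::fflush(status) == 0;
    }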

2012-04-04 19:20
by user1168037
This just indicates that your logger is slow, but we can't tell you how to improve on it without actually seeing how it's implemented. A start may be to have the logger run on a separate thread, so your primary thread only enqueues the logging data without waiting for it to be written/flushed - ildjarn 2012-04-04 19:43
As the previous comment said, tell us how you're doing the logging. Naive implementations can take orders of magnitude longer than they need to - je4d 2012-04-04 19:50
@ildjarn - I'm trying to find better logic for this design. Btw, to answer the questions about the design: I have a general process-file function which accepts a file name and a function pointer. This function reads the given file in 512-byte chunks and calls the function passed in through the function pointer. That callback processes the data and updates the status file with the latest processed offset - user1168037 2012-04-04 20:55
And you're saying that updating the status file with the latest processed offset is what drastically increases the processing time, correct? That's the implementation that we need to see - ildjarn 2012-04-04 20:57
@ildjarn Okay, so this is how it happens. For every record processed, on successful completion the offset number is incremented by 512 (or the size of the data read). The status file has a predefined offset for every file, e.g. File A at 1000, File B at 2000, etc. So, for example, if I need to update the offset number for File A, I can seek directly to offset 1000 and update the current processed offset number. This means an additional I/O for every record being processed, creating the overhead - user1168037 2012-04-04 22:29
The implementation is slow, clearly, so we need to actually see the implementation in order to see how to improve it. Like I said, I think updating the status file on a second thread would be my approach - ildjarn 2012-04-04 23:20
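A rough illustration of the second-thread idea suggested in these comments, assuming C++11 and a simple producer/consumer queue (the class and all names are invented for the sketch): the processing thread only enqueues offsets, and a background thread drains the queue and does the actual status-file writes.

    #include <condition_variable>
    #include <cstdio>
    #include <deque>
    #include <mutex>

    struct OffsetLogger {
        std::deque<long> queue;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;

        // Called by the processing thread: cheap, no disk I/O here.
        void push(long offset) {
            { std::lock_guard<std::mutex> lk(m); queue.push_back(offset); }
            cv.notify_one();
        }

        // Signal shutdown after the last record, then join the logger thread.
        void stop() {
            { std::lock_guard<std::mutex> lk(m); done = true; }
            cv.notify_one();
        }

        // Body of the logger thread: drains the queue and writes only the
        // newest offset to this file's slot in the status file.
        void run(FILE *status, long slot_pos) {
            for (;;) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [this] { return done || !queue.empty(); });
                if (queue.empty()) { if (done) break; else continue; }
                long offset = queue.back();   // intermediate offsets can be skipped
                queue.clear();
                lk.unlock();
                std::fseek(status, slot_pos, SEEK_SET);
                std::fwrite(&offset, sizeof offset, 1, status);
                std::fflush(status);
            }
        }
    };

The processing thread would launch run() on a std::thread, call push() after each record, and call stop() followed by join() once the pass is complete.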



Yes. Add a counter to your program. Every 1000 blocks, write your information to the log file. That will reduce your logging I/O by three orders of magnitude, at the cost of having to redo up to 1000 blocks in the event of a restart.
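In code, the idea might look roughly like this (the 1000-block interval, the status-slot layout, and the function name are illustrative assumptions):

    #include <cstdio>

    /* Processes one data file in 512-byte blocks and records its offset in the
       status file only every `interval` blocks, plus once at the end. */
    void process_with_checkpoints(FILE *data, FILE *status, long slot_pos)
    {
        const long interval = 1000;           // blocks between status writes
        char block[512];
        long offset = 0;
        long since_checkpoint = 0;
        size_t n;

        while ((n = std::fread(block, 1, sizeof block, data)) > 0) {
            /* ... process the block here ... */
            offset += (long)n;
            if (++since_checkpoint >= interval) {
                std::fseek(status, slot_pos, SEEK_SET);
                std::fwrite(&offset, sizeof offset, 1, status);
                std::fflush(status);          // one status I/O per 1000 blocks
                since_checkpoint = 0;
            }
        }
        std::fseek(status, slot_pos, SEEK_SET);   // final checkpoint at end of file
        std::fwrite(&offset, sizeof offset, 1, status);
        std::fflush(status);
    }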

2012-09-16 00:49
by EvilTeach
The issue with this might be 'can you detect which records were done but not recorded as done'? If you can't, then the recovery code might do the updates on data that has already been updated, which would probably make someone unhappy - Jonathan Leffler 2012-09-16 01:12
Well, I envision that at checkpoint time, all of the seek positions of all the input and output files are stored, so that the restart code can simply load those seek positions and continue on from that exact point. In that case, the rewriting is a minor bit of overhead - EvilTeach 2012-09-16 01:14
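As a sketch of that restart scheme (the struct layout, file counts, and names are assumptions, not taken from the answer): store every file's seek position in one checkpoint record, and on restart reload it and seek each file back into place before continuing.

    #include <cstdio>

    enum { NUM_INPUTS = 4, NUM_OUTPUTS = 4 };     // counts are illustrative

    struct Checkpoint {
        long input_pos[NUM_INPUTS];    // seek positions of the input files
        long output_pos[NUM_OUTPUTS];  // seek positions of the output files
    };

    /* Persist the current seek positions of all files in one write. */
    bool save_checkpoint(FILE *status, const Checkpoint &cp)
    {
        std::rewind(status);
        return std::fwrite(&cp, sizeof cp, 1, status) == 1
            && std::fflush(status) == 0;
    }

    /* On restart, reload the record and seek every file back into place. */
    bool restore_checkpoint(FILE *status, FILE *inputs[NUM_INPUTS],
                            FILE *outputs[NUM_OUTPUTS])
    {
        Checkpoint cp;
        std::rewind(status);
        if (std::fread(&cp, sizeof cp, 1, status) != 1)
            return false;                          // no checkpoint yet: start fresh
        for (int i = 0; i < NUM_INPUTS; ++i)
            if (std::fseek(inputs[i], cp.input_pos[i], SEEK_SET) != 0)
                return false;
        for (int i = 0; i < NUM_OUTPUTS; ++i)
            if (std::fseek(outputs[i], cp.output_pos[i], SEEK_SET) != 0)
                return false;
        return true;
    }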