How do we process records in a large table using paging when the state of the records keeps changing, thereby affecting the contents of a given page?


0

There is a table with more than 100,000 records matching a particular criterion of interest. A job runs to get 1,000 records at a time from this set and performs some operation on each record. Other processes may be changing the records in this table, which can affect the set of interest. Because of these changes, the records that are supposed to be in the pages when the job started keep shifting. This leads to either missing certain records or processing certain records more than once.

Though I understand that the system should have been designed to avoid this situation, redesigning it is not an option at this point. So I was wondering: is this a common scenario, and if so, what solution patterns exist? I tried googling but couldn't find any meaningful hits.

Thanks.

2012-04-05 02:10
by Vikdor
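The skipping the question describes is easy to reproduce with OFFSET-based paging: if another process deletes rows that belonged to an earlier page mid-run, every later page shifts. A minimal sketch using Python and SQLite (the `records` table and the deleted ids are hypothetical, chosen only to show the drift):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)",
                 [(i,) for i in range(1, 21)])
conn.commit()

PAGE = 10
seen = []

# Page 1 via LIMIT/OFFSET: rows 1..10.
seen += [r[0] for r in conn.execute(
    "SELECT id FROM records ORDER BY id LIMIT ? OFFSET 0", (PAGE,))]

# Meanwhile another process deletes two rows that were in page 1...
conn.execute("DELETE FROM records WHERE id IN (2, 3)")
conn.commit()

# ...so page 2's OFFSET now starts two positions later than intended.
seen += [r[0] for r in conn.execute(
    "SELECT id FROM records ORDER BY id LIMIT ? OFFSET ?", (PAGE, PAGE))]

missed = sorted(set(range(1, 21)) - set(seen))
print(missed)  # rows 11 and 12 were silently skipped
```

Paging on a stable key (`WHERE id > last_seen_id ORDER BY id LIMIT n`) instead of OFFSET avoids this particular failure, though rows can still change between pages.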


1

One approach is to clone the database while it is idle and then have the job work on the copy. The data will be slightly out of date, but if the database is constantly being updated while you are trying to process its entire contents, being fully "up to date" isn't possible anyway.

2012-04-05 02:12
by Vinnie Falco
Thanks much for the response, Vinnie Falco. Since the table is large and can potentially be updated at any time, we would have to lock it exclusively to take a snapshot, which is a little unfeasible at this point - Vikdor 2012-04-05 02:16
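For cases where a brief quiet window is available, the snapshot idea might be sketched like this in Python with SQLite (the `records` table, its `status` column, and the per-record operation are hypothetical). The job pages over a frozen copy, so concurrent changes to the live table cannot shift the pages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records "
             "(id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
conn.executemany("INSERT INTO records (payload, status) VALUES (?, ?)",
                 [(f"row-{i}", "pending") for i in range(5000)])

# Take the snapshot during the quiet window; after this, updates to
# "records" no longer affect what the job sees.
conn.execute("CREATE TABLE records_snapshot AS "
             "SELECT * FROM records WHERE status = 'pending'")
conn.commit()

PAGE_SIZE = 1000
processed = 0
last_id = 0
while True:
    # Keyset paging on the immutable snapshot: no OFFSET, so no drift.
    page = conn.execute(
        "SELECT id, payload FROM records_snapshot "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE)).fetchall()
    if not page:
        break
    for row_id, payload in page:
        processed += 1          # stand-in for the real per-record operation
    last_id = page[-1][0]

print(processed)  # every snapshot row is visited exactly once
```

The trade-off is exactly the one noted above: the job operates on data as of snapshot time, not live data.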


1

Assuming that grabbing your 1,000 records and performing calculations on them isn't a time-consuming process, you could wrap your code in a transaction. Of course, this means other connections will not be able to modify the table of interest at the same time; they basically have to wait until the transaction completes before their update/insert statements are run. So if your code doesn't take a large amount of time and the timeouts of the other connections are reasonable, this is a possible solution.

2012-04-05 02:22
by Dan P
Thanks for the response, Dan. It appears that the records have to be locked exclusively - Vikdor 2012-04-06 03:32
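A minimal sketch of the per-page transaction idea in Python with SQLite (the `records` table, its `status` column, and the page size are hypothetical). Each page is selected and marked done inside one transaction, so writers are blocked only for the duration of a single page and no row can be picked up twice:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so we can
# manage transaction boundaries explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE records "
             "(id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
conn.executemany("INSERT INTO records (payload, status) VALUES (?, ?)",
                 [(f"row-{i}", "pending") for i in range(2500)])

PAGE_SIZE = 1000
total = 0
while True:
    # BEGIN IMMEDIATE takes the write lock up front, so other connections'
    # updates wait until this page's transaction commits.
    conn.execute("BEGIN IMMEDIATE")
    page = conn.execute(
        "SELECT id FROM records WHERE status = 'pending' "
        "ORDER BY id LIMIT ?", (PAGE_SIZE,)).fetchall()
    if not page:
        conn.execute("COMMIT")
        break
    ids = [r[0] for r in page]
    # ... per-record work would happen here ...
    # Mark the page done in the same transaction: the "pending" set shrinks
    # atomically, so the next page cannot re-select these rows.
    conn.executemany("UPDATE records SET status = 'done' WHERE id = ?",
                     [(i,) for i in ids])
    total += len(ids)
    conn.execute("COMMIT")

print(total)  # each pending row processed exactly once
```

Because the job always re-selects `status = 'pending'` rather than using OFFSET, rows changed by other writers between pages are naturally included or excluded based on their current state.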