I am trying to figure out a better way to extract data from a large number of files, run additional calculations on some data, and finally format it into a something readable by a spreadsheet application. Below is how I am currently doing this but I am convinced there must be an easier way.
First I create a definition that will search the files and extract the data. I use a definition since sometimes the data will be compiled from several locations. I am doing this via a mixture of regular expressions and find all. This can be formatted differently if needed. The result of this is something like:
RawData=
[[M1,A1,305.91,288.12,Variable_M1_A1],
[M1,A2,319.07,303.70,Variable_M1_A2],
[M2,A1,237437.32,191460.91,Variable_M2_A1],
[M2,A2,270773.28,192581.05,Variable_M2_A2]]
What I wish to do with some of the data is organize it in such a way that a grid is created from the variables A and M and then the data from a column is placed in the correct grid location. Realistically looks like a simple two dimension matrix with the first row and first column indicating the variables.
What the csv file should look like:
Placeholder,A1,A2
M1,Variable_M1_A1,Variable_M1_A2
M2,Variable_M2_A1,Variable_M2_A2
What I currently am doing is creating an empty matrix (3x3 in this case) calling it Result then running the following code. The basically iterates over all lines and all variables when the variables match, it assigns the Result matrix a value from the RawData.
MVar=[M1,M2]
AVar=[A1,A2]
for a in range(len(RawData):
for b in range(len(MVar)):
for c in range(len(AVar)):
if RawData[a][0]==Mvar[b] and RawData[a][1]==AVar[c]:
Result[b+1][c+1]=RawData[a][4]
I wrote a chunk of code that will take the RawData matrix and create the Result matrix based off all possible combinations of M1 and A1. If I want to outport this to .csv I simply use csv.writer. So this works great if I just want to organize the data that is already there. However when I want to do calculations with the data- such as generating another column that is based off values in the RawData that did not end up in the matrix- it becomes difficult. For example to take what the csv file output could look like:
Placeholder,A1,A2,NewA
M1,Variable_M1_A1,Variable_M1_A2,(RawData[0][3]*RawData[1][2])
M2,Variable_M2_A1,Variable_M2_A2,(RawData[2][3]*RawData[3][2])
Notice that the data for the calculations require the same M value but different A values. While this can be done it quickly becomes very convoluted.
Is there a simpler way to do this?
Edit: Auto generation of the M and A list using:
[MethodList[i] for i,x in enumerate(MethodList) if x not in MethodList[i+1:]]
This seems makes it easier to work with, however its still a convolute process!
This was completed with the for loops and corresponding if conditions. I am still convinced there is a better way but this is entirely possible to do.