I have some hourly acquired data files. The filenames look like:
20120101-00.raw
20120101-01.raw
...
YYYYMMDD-HH.raw
I have to aggregate hourly files to daily, daily to monthly, and so on. The syntax of the aggregate script is as follows:
aggregate output-file input-file1 input-file2 ...
The aggregation scheme is:
20120101-[0-2][0-9].raw -> 20120101.raw
201201[0-3][0-9].raw -> 201201.raw
etc.
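For instance, the two aggregation steps might look like this (a sketch: "aggregate" below is a stand-in that merely concatenates its inputs, since the real command's behavior isn't specified, and the sample files are made up):

```shell
#!/bin/sh
# Stand-in for the real aggregate command: aggregate output input...
aggregate() { out="$1"; shift; cat "$@" > "$out"; }
set -e

dir=$(mktemp -d); cd "$dir"
echo hour00 > 20120101-00.raw
echo hour01 > 20120101-01.raw

aggregate 20120101.raw 20120101-*.raw   # hourly -> daily
aggregate 201201.raw 201201??.raw       # daily  -> monthly
```

Note that the glob `201201??.raw` matches the daily files but not `201201.raw` itself, so the output never feeds back into its own inputs.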
I am trying to write a Makefile to automate the process, but I am completely stuck: I don't know how to deal with the problem of extensions, since the source and target files have the same extension. I use:
$(shell find . -type f | grep -e "\.raw$$" | cut -c 8 | sort -u )
to find the files I have to generate.
I agree with Oli Charlesworth that Make isn't the best tool for this job; I'd use a Perl script. But if you want to use Make, it can be done. Here's a not-too-horrible hack using calls to sed. It can be tightened up a little, but I'm going for readability.
FILES := $(shell ls *-??.raw)
DAYS := $(sort $(shell ls *-??.raw | sed 's/\(........\).*/\1.raw/'))
MONTHS := $(sort $(shell ls *-??.raw | sed 's/\(......\).*/\1.raw/'))
YEARS := $(sort $(shell ls *-??.raw | sed 's/\(....\).*/\1.raw/'))

all.raw: $(YEARS)
	aggregate $@ $^

$(YEARS): %.raw : $(MONTHS)
	aggregate $@ $(filter $*%, $^)

$(MONTHS): %.raw : $(DAYS)
	aggregate $@ $(filter $*%, $^)

$(DAYS): %.raw :
	aggregate $@ $(filter $*%, $(FILES))
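To see what those sed substitutions compute, here is a small standalone check that mirrors the DAYS/MONTHS/YEARS definitions on a few sample filenames (no make involved; the sample names and temp directory are invented for the demo, and `sort -u` stands in for Make's deduplicating $(sort)):

```shell
#!/bin/sh
# Mirror the DAYS/MONTHS/YEARS computations on sample hourly filenames.
set -e
dir=$(mktemp -d); cd "$dir"
for f in 20120101-00 20120101-01 20120102-00 20120201-00; do
    : > "$f.raw"        # create empty sample files
done
days=$(ls *-??.raw   | sed 's/\(........\).*/\1.raw/' | sort -u)
months=$(ls *-??.raw | sed 's/\(......\).*/\1.raw/'   | sort -u)
years=$(ls *-??.raw  | sed 's/\(....\).*/\1.raw/'     | sort -u)
echo $days      # 20120101.raw 20120102.raw 20120201.raw
echo $months    # 201201.raw 201202.raw
echo $years     # 2012.raw
```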
201201.raw will become 201201.r.raw in the DAYS variable - Brian Swift 2012-04-06 21:02
ls - Beta 2012-04-06 21:48
If I wrote a script for this, it would:
- read a list of .raw filenames and sort the list;
- for each filename, create a shortened name by deleting the last two digits;
- if this shortened name is the same as the previous shortened name, add the full filename to a list to be aggregated;
- if the shortened name differs from the previous shortened name, create the output-file name based on the last entry added to the list;
- if the output-file already exists and is newer than the last entry added to the list, do nothing because it is already up-to-date; otherwise run the aggregate command using the output-file name and the list of input files.
To use the script, first run it with all the hourly files, then run it again with all the daily files (and, if desired, run it again with all the monthly files to produce yearly files).
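The outline above can be sketched in plain shell. This is a sketch under assumptions: "aggregate" is stubbed out as concatenation, and only the hourly-to-daily pass is shown (the daily-to-monthly pass is the same loop with a shorter suffix stripped):

```shell
#!/bin/sh
# Sketch of the outlined script, hourly -> daily pass only.
# "aggregate" is a stand-in that concatenates; replace with the real one.
aggregate() { out="$1"; shift; cat "$@" > "$out"; }
set -e

dir=$(mktemp -d); cd "$dir"
for f in 20120101-00 20120101-01 20120102-00 20120102-01; do
    echo "$f" > "$f.raw"                 # sample hourly files
done

prev=""; inputs=""
flush() {
    [ -n "$inputs" ] || return 0
    out="$prev.raw"
    run=0
    for i in $inputs; do                 # re-run unless $out is newer
        [ "$out" -nt "$i" ] || run=1     # than every input file
    done
    if [ "$run" -eq 1 ]; then aggregate "$out" $inputs; fi
    inputs=""
}

for f in $(ls *-??.raw | sort); do
    short=${f%-??.raw}                   # 20120101-00.raw -> 20120101
    if [ "$short" != "$prev" ]; then
        flush                            # group changed: aggregate it
        prev="$short"
    fi
    inputs="$inputs $f"
done
flush                                    # aggregate the last group
```

The up-to-date check in flush gives the same skip-if-newer behavior the outline asks for, without Make's dependency tracking.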
There are some constraints on the outlined script: