I have some hourly acquired data files. The filenames look like:
20120101-00.raw
20120101-01.raw
...
YYYYMMDD-HH.raw
I have to aggregate hourly files to daily, daily to monthly, and so on. The syntax of the aggregate script is as follows:
aggregate output-file input-file1 input-file2 ...
The aggregation scheme is:
20120101-[0-2][0-9].raw -> 20120101.raw
201201[0-3][0-9].raw -> 201201.raw
etc.
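For instance, the two aggregation steps might look like this (a sketch: "aggregate" below is a stand-in that merely concatenates its inputs, since the real command's behavior isn't specified, and the sample files are made up):

```shell
#!/bin/sh
# Stand-in for the real aggregate command: aggregate output input...
aggregate() { out="$1"; shift; cat "$@" > "$out"; }
set -e

dir=$(mktemp -d); cd "$dir"
echo hour00 > 20120101-00.raw
echo hour01 > 20120101-01.raw

aggregate 20120101.raw 20120101-*.raw   # hourly -> daily
aggregate 201201.raw 201201??.raw       # daily  -> monthly
```

Note that the glob `201201??.raw` matches the daily files but not `201201.raw` itself, so the output never feeds back into its own inputs.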
I am trying to write a Makefile to automate the process, but I am completely stuck: I don't know how to deal with the problem of extensions, since the source and target files have the same extension. I use:
$(shell find . -type f | grep -e "\.raw$$" | cut -c 8 | sort -u )
to find the files I have to generate.
I agree with Oli Charlesworth that Make isn't the best tool for this job; I'd use a Perl script. But if you want to use Make, it can be done. Here's a not-too-horrible hack using calls to sed. It can be tightened up a little, but I'm going for readability.
FILES := $(shell ls *-??.raw)
DAYS := $(sort $(shell ls *-??.raw | sed 's/\(........\).*/\1.raw/'))
MONTHS := $(sort $(shell ls *-??.raw | sed 's/\(......\).*/\1.raw/'))
YEARS := $(sort $(shell ls *-??.raw | sed 's/\(....\).*/\1.raw/'))

all.raw: $(YEARS)
	aggregate $@ $^

$(YEARS): %.raw : $(MONTHS)
	aggregate $@ $(filter $*%, $^)

$(MONTHS): %.raw : $(DAYS)
	aggregate $@ $(filter $*%, $^)

$(DAYS): %.raw :
	aggregate $@ $(filter $*%, $(FILES))
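To see what those sed substitutions compute, here is a small standalone check that mirrors the DAYS/MONTHS/YEARS definitions on a few sample filenames (no make involved; the sample names and temp directory are invented for the demo, and `sort -u` stands in for Make's deduplicating $(sort)):

```shell
#!/bin/sh
# Mirror the DAYS/MONTHS/YEARS computations on sample hourly filenames.
set -e
dir=$(mktemp -d); cd "$dir"
for f in 20120101-00 20120101-01 20120102-00 20120201-00; do
    : > "$f.raw"        # create empty sample files
done
days=$(ls *-??.raw   | sed 's/\(........\).*/\1.raw/' | sort -u)
months=$(ls *-??.raw | sed 's/\(......\).*/\1.raw/'   | sort -u)
years=$(ls *-??.raw  | sed 's/\(....\).*/\1.raw/'     | sort -u)
echo $days      # 20120101.raw 20120102.raw 20120201.raw
echo $months    # 201201.raw 201202.raw
echo $years     # 2012.raw
```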
201201.raw will become 201201.r.raw in the DAYS variable - Brian Swift 2012-04-06 21:02
ls - Beta 2012-04-06 21:48
If I wrote a script for this, it would:
- read a list of .raw filenames and sort the list;
- for each filename, create a shortened name by deleting the last two digits;
- if this shortened name is the same as the previous shortened name, add the full filename to a list to be aggregated;
- if the shortened name differs from the previous shortened name, create the output-file name based on the last entry added to the list;
- if the output-file already exists and is newer than the last entry added to the list, do nothing because it is already up-to-date; otherwise run the aggregate command using the output-file name and the list of input files.
To use the script, first run it with all the hourly files, then run it again with all the daily files (and, if desired, run it again with all the monthly files to produce yearly files).
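The outline above can be sketched in plain shell. This is a sketch under assumptions: "aggregate" is stubbed out as concatenation, and only the hourly-to-daily pass is shown (the daily-to-monthly pass is the same loop with a shorter suffix stripped):

```shell
#!/bin/sh
# Sketch of the outlined script, hourly -> daily pass only.
# "aggregate" is a stand-in that concatenates; replace with the real one.
aggregate() { out="$1"; shift; cat "$@" > "$out"; }
set -e

dir=$(mktemp -d); cd "$dir"
for f in 20120101-00 20120101-01 20120102-00 20120102-01; do
    echo "$f" > "$f.raw"                 # sample hourly files
done

prev=""; inputs=""
flush() {
    [ -n "$inputs" ] || return 0
    out="$prev.raw"
    run=0
    for i in $inputs; do                 # re-run unless $out is newer
        [ "$out" -nt "$i" ] || run=1     # than every input file
    done
    if [ "$run" -eq 1 ]; then aggregate "$out" $inputs; fi
    inputs=""
}

for f in $(ls *-??.raw | sort); do
    short=${f%-??.raw}                   # 20120101-00.raw -> 20120101
    if [ "$short" != "$prev" ]; then
        flush                            # group changed: aggregate it
        prev="$short"
    fi
    inputs="$inputs $f"
done
flush                                    # aggregate the last group
```

The up-to-date check in flush gives the same skip-if-newer behavior the outline asks for, without Make's dependency tracking.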
There are some constraints on the outlined script: