Annotated ETL Code Examples with Make


When we make data at DataMade, we use GNU make to achieve a reproducible data transformation workflow.

For more background on make, see our overview of make & makefiles.

For ETL best practices, see our DataMade ETL styleguide.


Example Make Rules

General Structure of a Rule

target: dependencies
    recipe
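
As a minimal sketch of that structure, the rule below builds one file from another. The file names raw.csv and clean.csv are hypothetical and only serve as illustration. In the recipe, $@ expands to the target and $< to the first dependency. In a real makefile the recipe line must begin with a tab character.

```makefile
# Hypothetical file names, for illustration only.
# $@ is the target (clean.csv); $< is the first dependency (raw.csv).
clean.csv: raw.csv
	tr -d '\r' < $< > $@
```

Running `make clean.csv` strips carriage returns from raw.csv, and make skips the recipe on later runs unless raw.csv has changed.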

1. Setting up phony targets

.PHONY: all clean

clean:
    rm -Rf finished/*
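
For context, here is a sketch of how phony targets might sit in a complete makefile. The GENERATED_FILES variable and finished/example.csv are assumptions made for this example, not names taken from a DataMade project. Declaring all and clean as phony tells make they are actions rather than files, so they run even if a file named "all" or "clean" exists.

```makefile
# Hypothetical output file; a real project would list its own finished files.
GENERATED_FILES = finished/example.csv

.PHONY: all clean

# Build every generated file.
all: $(GENERATED_FILES)

# Empty the finished/ directory.
clean:
	rm -Rf finished/*

# Stand-in rule so the sketch runs on its own.
finished/example.csv:
	mkdir -p finished
	echo "id,name" > $@
```

Because all is the first ordinary target, running `make` with no arguments builds it by default.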

2. Downloading a zip directory

    wget --no-use-server-timestamps \
        <url> -O $@

3. Unzipping a zip directory

.INTERMEDIATE: chicomm.shp
chicomm.shp: chicomm.zip
    unzip -o $<

4. Converting Excel to CSV

.INTERMEDIATE: parcel_survey.csv
parcel_survey.csv: parcel_survey.xlsx
    in2csv $< > $@

5. Grabbing select columns from an Excel doc and creating a CSV with a new header

school_id_lookup.csv: School_data_8-3-14.xlsx
    in2csv $< |\
    csvcut -c "1,2" |\
    (echo "school_id,school_name"; tail -n +2) > finished/$(notdir $@)
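
The header swap at the end of that pipeline can be tried on its own. In this minimal sketch, original.csv and renamed.csv are hypothetical file names. The subshell first prints the new header, then tail passes through every line after the old header.

```makefile
# Hypothetical file names, for illustration only.
# echo writes the new header; tail -n +2 copies stdin starting at line 2,
# which drops the old header row.
renamed.csv: original.csv
	(echo "school_id,school_name"; tail -n +2) < $< > $@
```

Grouping the two commands in parentheses lets a single redirect capture the output of both.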

6. Joining CSVs, using an implicit rule

%hourly.joined.csv: %hourly.csv stations.csv
    csvjoin -c "3,4" $< stations.csv > finished/$(notdir $@)
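
To see how the % stem is matched, here is a smaller pattern rule that needs only standard tools. The .header.txt suffix and the ohare_ prefix are hypothetical names chosen for this sketch.

```makefile
# Hypothetical pattern rule. The % stem must be the same in target and dependency.
# `make ohare_hourly.header.txt` sets the stem to "ohare_", so make looks for
# ohare_hourly.csv and runs the recipe with $< set to that file.
%hourly.header.txt: %hourly.csv
	head -n 1 $< > $@
```

One rule written this way covers every file whose name ends in hourly.csv, so adding a new station's data needs no new rule.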