Annotated ETL Code Examples with Make

Intro

When we make data at DataMade, we use GNU make to achieve a reproducible data transformation workflow

For more background on make, see our overview of make & makefiles

For ETL best practices, see our DataMade ETL styleguide

hover (or click if you're on a touchscreen) on highlighted text for annotations

Example Make Rules

General Structure of a Rule

target: dependencies
    recipe

1. Setup phony targets

.PHONY: all clean

all: $(GENERATED_FILES)

clean:
    rm -Rf finished/*

2. Downloading a zip directory

parcels.zip:
    wget --no-use-server-timestamps \
    http://maps.indiana.edu/download/Reference/Land_Parcels_County_IDHS.zip -O $@

3. Unzipping a zip directory

.INTERMEDIATE: chicomm.shp
chicomm.shp: chicomm.zip
    unzip -o $<

4. Converting excel to csv

.INTERMEDIATE: parcel_survey.csv
parcel_survey.csv: parcel_survey.xlsx
    in2csv $< > $@

5. Grabbing select columns from an excel doc, & creating a csv with a new header

school_id_lookup.csv: School_data_8-3-14.xlsx
    in2csv $< |\
    csvcut -c "1,2" |\
    (echo "school_id,school_name"; tail +2) > finished/$(notdir $@)

6. Join csvs, using an implicit rule

%hourly.joined.csv: %hourly.csv stations.csv
    csvjoin -c "3,4" $< stations.csv > finished/$(notdir $@)