At TrueJob we've spent a significant amount of time building up an email list to use for marketing and outreach. Inevitably, some processes have gotten a bit messy over time - we store contacts in our CRM, but we've been sending out email campaigns via Sendgrid. Sendgrid ends up with important information not reflected in our CRM (bounced emails, blocks, spam reports, etc).

Maintaining good email list 'hygiene' is essential for continuing to get your messages delivered (and it's the law), so we're careful about removing any email addreses from our list that have delivery problems.

Bad Old Days

The workflow when creating a list for an email campaign used to look something like this:

  • Download list of relevant contacts from the CRM
  • Download a bunch of 'suppression' reports from Sendgrid (separate reports for bounces, blocks, spam, invalid addresses, etc.)
  • Copy the CRM contacts list into a new spreadsheet
  • Copy all the 'suppressions' into another column in that spreadsheet
  • Use some excel formulas (e.g. VLOOKUP) to filter out all the suppressed emails from the list of contacts
  • If there were any specific organizations we didn't want to email, search the resulting list for any with that domain
  • Copy the final filtered list into a CSV ready for upload

This didn't take a huge amount of time (say 10-15 minutes) but it was annoying and repetitive.

Good New Days

The new process looks like this:

  • Download contacts from the CRM and all the reports from Sendgrid into one folder
  • Open a spreadsheet and run a macro that does all the necessary filtering and writes out the final file for upload

Why use Excel macros and not a real programming language like Python? Sometimes it is nice to skip the command line and open a spreadsheet with a big button that says 'GO'. More importantly spreadsheets make it easy to provide configuration visually (by updating cell values), and when working with CSV files it's also nice to be able to view and search entries when debugging.


Step 1: Import all the CSVs

The first step was to write a macro to import all the data we need into one spreadsheet. This isn't strictly necessary - but it looks cool, makes things easier later, and leaves all the data in one place if there is any debugging to do.

This macro opens every CSV file in a directory one at a time and copies the contents into a new worksheet of the main workbook.

Sub import_all_csvs()
  Dim s As String
  Dim totalSheets As Long
  Dim thisBook As String
  Dim baseDir As String
  baseDir = Range("input_directory").Value
  thisBook = Application.ActiveWorkbook.Name
  s = Dir(baseDir + "*.csv")
  Do Until s = ""
    If s <> Range("out_file_name").Value Then
      Workbooks.Open (baseDir + s)
      totalSheets = Workbooks(thisBook).Worksheets.Count
      Workbooks(s).Worksheets(1).Copy after:=Workbooks(thisBook).Worksheets(totalSheets)
    End If

    s = Dir
End Sub

Note the Dir function - call once in a target directory to get the first matching file, then subsequent calls without arguments return the next matching file (see generators).

The result when you see this run (a rapid opening and closing of spreadsheets) will make you look like an Excel wizard.

Step 2: Read the CRM contacts

In our case, a user's email address might appear in one of several columns that start with email (e.g., 'Email (Personal)' or 'Email (Work)'). The function below handles this very typical data cleaning problem by storing all potential email columns in a Dictionary, then iterating through each potential email column for each row.

Function get_starting_list() As Dictionary
  Dim cur As Worksheet
  Dim c As Long
  Dim r As Long
  Dim key As Variant
  Dim emailColumns As New Dictionary
  Dim emails As New Dictionary
  For Each cur In Worksheets
    'In our case this identifies which sheet has the list of contacts
    If Left(cur.Name, 7) = "truejob" Then
      'First find the columns that might hold email addresses
      c = 1
      Do Until cur.Cells(1, c) = ""
        If LCase(Left(cur.Cells(1, c), 5)) = "email" Then
          emailColumns.Add c, 1
        End If
        c = c + 1
      'Now go through each row and find an email address'
      r = 2
      Do Until cur.Cells(r, 1) = ""
        For Each key In emailColumns.Keys()
          If cur.Cells(r, key) <> "" Then
            emails.Add LCase(cur.Cells(r, key).Value), 1
            Exit For
          End If
        r = r + 1
    End If
  Set get_starting_list = emails
End Function

Step 3: Read the 'Suppressions'

Now, we loop through each worksheet - skipping the sheet we read the CRM contacts from - and build up a list of email addresses to suppress. These are the ones we don't want to appear in the final output. This is straightforward so I'm leaving this code out: on each sheet we find the column with a heading that starts with 'email' and just read down the list.

Of course in our case there is some extra complexity which comes from another common data cleaning challenge: some of the CSVs we download come in a slightly different format with a single column of email addresses and no header. We use a simple and reliable test: check for an @ sign in the contents of the first cell (alternatively, you could use a regex).

Step 4: Write the output file

Finally, we put it all together. One main function (do_all) calls each of the steps in turn, and another writes the output file. It's particular easy to write CSV files that only have a single column (maybe version 2.0 will include additional information like first and last names for more personalized campaigns). Note I've included a convenient ability to filter out particular domains.

Sub do_all()
  Call import_all_csvs
  Dim startingList As Dictionary
  Dim suppressionsList As Dictionary
  Set startingList = get_starting_list()
  Set suppressionsList = get_suppressions_list()
  Call write_output_file(startingList, suppressionsList)
End Sub

Sub write_output_file(start As Dictionary, suppress As Dictionary)
  Dim domainsSuppress As Dictionary
  Dim out As String
  Dim email As Variant
  Dim fn As Long
  Dim i As Long
  Dim domain As String
  'get_excluded_domains reads values from a named range on the spreadsheet'
  Set domainsSuppress = get_excluded_domains()
  out = Range("input_directory").Value + Range("out_file_name").Value
  fn = FreeFile
  i = 0

  Open out For Append As #fn
    Print #fn, "email"
    For Each email In start.Keys()
      domain = Split(email, "@")(1)
      If Not suppress.Exists(email) And Not domainsSuppress.Exists(domain) Then
        Print #fn, email
        i = i + 1
      End If
  Close #fn

  MsgBox Str(start.Count) & " initial emails" & Chr(13) & Str(i) & " written to file"
End Sub

Fill in the relevant fields, click GO, and:

You can find the spreadsheet code on Github. To use it, you'll first need to add it to a spreadsheet with the appropriate named ranges.

The ideal world might see us with a better system to keep our data in sync, but sometimes simple is better and I'm guessing this spreadsheet will keep things working for some time.