Getting author-level blog performance from Google Analytics

Blogging has become common practice among businesses looking to connect with their customers. Behind every blogging operation, though, is a team of authors with industry know-how. Giving writers a way to track their performance can encourage them to write better and more often. Using R and Google Analytics, I was able to aggregate blog traffic by author for my coworkers. Google Analytics (GA) is a commonly used tool for tracking site usage, and the googleAnalyticsR package is a great API wrapper for getting its data into R. GA reports important metrics for individual web pages, but it doesn’t record who wrote each post. That makes aggregating blog traffic by author a pain, particularly for companies with hundreds of blog posts around the web.

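library(tidyverse)         # dplyr, purrr, stringr and the starwars data used below
library(rvest)             # scraping the author name from each post
library(googleAnalyticsR)  # wrapper for the Google Analytics Reporting API
ga_auth()                  # authenticate with the Google account that can read your GA view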

We start by querying Google Analytics for the data we need. Here, the only metric I want is organic sessions for each blog post. Luckily, our URL structure puts “blog/” in every blog post’s URL, so a single regular-expression filter on the landing page isolates them.

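organic <- segment_ga4("organic", segment_id = "gaid::-5")   # built-in Organic Traffic segment
blog_filter <- filter_clause_ga4(list(dim_filter("landingPagePath", "REGEXP", "blog/")))

ga_id <- 123456789                            # your GA view ID here
date_range <- c("2019-01-01", "2019-06-30")   # replace with your date range

pages <- google_analytics(ga_id,
                          date_range = date_range,
                          dimensions = c("landingPagePath"),
                          metrics = c("sessions"),
                          dim_filters = blog_filter,
                          segments = organic,
                          max = -1)           # return all rows, not just the default 1,000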
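
Google Analytics treats a REGEXP filter as a partial match, so “blog/” keeps any landing page whose path contains that text anywhere, including a blog index page that has no single author. Base R’s grepl() applies the same unanchored rule, which makes it a quick offline preview of what a pattern will keep. The paths below are hypothetical, and the stricter pattern is only one option, assuming your posts sit directly under /blog/.

```r
# Hypothetical landing page paths, for illustration only
paths <- c("/blog/choosing-a-magnet/",
           "/products/",
           "/resources/blog/old-post/",
           "/blog/")

# Unanchored, like the GA filter: "blog/" may appear anywhere in the path
keep_partial <- grepl("blog/", paths)
keep_partial

# Anchored alternative: path must start with "/blog/" and continue past it,
# which drops both the nested path and the bare index page
keep_anchored <- grepl("^/blog/.+", paths)
keep_anchored
```

Passing a pattern like "^/blog/.+" to dim_filter() instead would be one way to leave the index page out of the author totals.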

Google Analytics returns landing page paths rather than full URLs, so we prepend the host name to get addresses we can visit to scrape each post’s author.

pages <- pages %>% mutate(page = paste0("https://www.alloymagnetic.com",landingPagePath))
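
Landing page paths in GA can include a query string, so the same post may show up on several rows (for example once with and once without UTM parameters), which would inflate its post count and scrape it twice. A minimal sketch, assuming you want one row per post: strip anything after a “?” or “#” before pasting on the host, then sum sessions per URL. build_page_url() is a hypothetical helper, not part of any package, and the rows are made up.

```r
# Hypothetical helper (not from any package): drop anything after a "?" or "#"
# in a GA landing page path, then prefix the site's host name
build_page_url <- function(path, host = "https://www.alloymagnetic.com") {
  clean_path <- sub("[?#].*$", "", path)
  paste0(host, clean_path)
}

# Made-up rows: the same post recorded with and without UTM parameters
pages_demo <- data.frame(
  landingPagePath = c("/blog/post-a/",
                      "/blog/post-a/?utm_source=newsletter",
                      "/blog/post-b/"),
  sessions = c(40, 10, 25)
)
pages_demo$page <- build_page_url(pages_demo$landingPagePath)

# Sum sessions per URL so each post is counted (and scraped) only once
per_page <- aggregate(sessions ~ page, data = pages_demo, FUN = sum)
per_page
```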

To retrieve the author names, the rvest package comes in handy; scraping is a generally useful way to add context that standard reports lack. On our site, each post’s byline sits in a span element with class author, so the CSS selector span.author picks it out.

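# possibly() turns a failed request (e.g. a 404) into NA instead of an error message
pages <- pages %>%
  mutate(author = map_chr(page, possibly(~ read_html(.x) %>%
                                            html_node("span.author") %>%
                                            html_text(),
                                          otherwise = NA_character_)))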
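
Because read_html() also parses a string of markup directly, a selector can be tried offline before pointing it at live URLs. The markup below is hypothetical (your CMS theme decides the real byline structure), and span.byline is an invented selector used only to show what happens when nothing matches.

```r
library(rvest)

# Hypothetical markup standing in for a blog post page
post_html <- '<html><body>
  <h1>Choosing a magnet</h1>
  <span class="author"> by Jane Doe</span>
  <p>Post body...</p>
</body></html>'

post <- read_html(post_html)   # a string containing markup is parsed as literal HTML

# html_node() returns the first element matching the CSS selector
byline <- html_text(html_node(post, "span.author"))
byline

# A selector that matches nothing yields NA rather than an error
no_match <- html_text(html_node(post, "span.byline"))
no_match
```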

Expect HTTP errors for some pages, particularly posts that received traffic during your chosen range but have since been unpublished.
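
One way to keep those errors from stopping the loop is purrr::possibly(), which wraps a function so that any error returns a fallback value instead. In this sketch fetch_author() is a hypothetical stand-in for the read_html()/html_node()/html_text() chain, and the “old-post” URL simply simulates an unpublished page.

```r
library(purrr)

# Hypothetical stand-in for scraping one page; the "old-post" URL fails
# the way an unpublished page would
fetch_author <- function(url) {
  if (grepl("old-post", url)) stop("HTTP error 404.")
  "Jane Doe"
}

# possibly() returns `otherwise` instead of raising, so one dead page
# doesn't abort the whole map over URLs
safe_fetch <- possibly(fetch_author, otherwise = NA_character_)

urls <- c("https://www.alloymagnetic.com/blog/post-a/",
          "https://www.alloymagnetic.com/blog/old-post/")
authors_found <- map_chr(urls, safe_fetch)
authors_found
```

Compared with a bare try(), which hands back the error message as a string, this keeps failures out of the author column as NA, and those rows can be dropped before grouping.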

Finally, we group the landing page traffic by author to get total sessions and the number of posts for each writer.

authors <- pages %>%
  filter(!str_detect(author, "404")) %>%            # drop failed fetches; NA authors are dropped too
  mutate(author = str_remove(author, " by ")) %>%   # strip the " by " byline prefix before grouping
  group_by(author) %>%
  summarise(sessions = sum(sessions),
            `# of blogs` = n()) %>%
  arrange(desc(sessions))
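
A small sketch of the aggregation on made-up rows: the author strings, including the 404 message, are invented for illustration, and the byline is cleaned before grouping so that each writer ends up in a single group.

```r
library(dplyr)
library(stringr)

# Made-up scraped rows: two posts by one author, one by another,
# and one page whose fetch failed with a 404-style error message
scraped_demo <- data.frame(
  author   = c(" by Jane Doe", " by Jane Doe", " by Sam Roe", "Error : HTTP error 404."),
  sessions = c(120, 30, 75, 10)
)

authors_demo <- scraped_demo %>%
  filter(!str_detect(author, "404")) %>%           # drop the failed fetch
  mutate(author = str_remove(author, " by ")) %>%  # " by Jane Doe" -> "Jane Doe"
  group_by(author) %>%
  summarise(sessions = sum(sessions), `# of blogs` = n()) %>%
  arrange(desc(sessions))

authors_demo
```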

# Swap real names for Star Wars characters to protect author identity
authors %>%
  mutate(name = starwars$name[seq_len(n())]) %>%
  select(name, sessions, `# of blogs`) %>%
  knitr::kable()
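| name               | sessions | # of blogs |
|:-------------------|---------:|-----------:|
| Luke Skywalker     |     1530 |          5 |
| C-3PO              |     1519 |          6 |
| R2-D2              |      644 |          4 |
| Darth Vader        |      378 |          6 |
| Leia Organa        |      303 |         22 |
| Owen Lars          |      246 |          7 |
| Beru Whitesun lars |      175 |          5 |
| R5-D4              |      137 |          4 |
| Biggs Darklighter  |       96 |          5 |
| Obi-Wan Kenobi     |       12 |          4 |
| Anakin Skywalker   |        8 |          4 |
| Wilhuff Tarkin     |        1 |          1 |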