Getting Author-level blog performance from Google Analytics
Jan 20, 2019
Blogging has become common practice among businesses looking to connect with their customers. Behind every blogging operation, though, is a team of authors with deep industry know-how. Giving writers a way to track their performance can motivate them to write better and more often, so I used R and Google Analytics to aggregate blog traffic by author for my coworkers.

Google Analytics (GA) is a widely used tool for tracking site usage, and the googleAnalyticsR package is a great API wrapper for pulling its data into R. GA reports useful metrics for individual pages, but it has no field identifying a post’s author. That makes aggregating traffic by author a pain, particularly for companies with hundreds of blog posts on the web.
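```r
library(tidyverse)        # dplyr, purrr, stringr, and friends
library(rvest)            # web scraping
library(googleAnalyticsR) # Google Analytics Reporting API client

ga_auth() # authenticate with your Google account (interactive on first run)
```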
We start by querying the Google Analytics data we want. Here, the only metric I need is sessions from organic traffic for each blog post. Luckily, our URL structure puts “blog/” in every post’s URL, so a single regular-expression filter on the landing page path isolates the blog.
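```r
ga_id <- 123456789                     # placeholder: your GA view ID
range <- c("2018-01-01", "2018-12-31") # placeholder: your date range
organic <- segment_ga4("organic", segment_id = "gaid::-5") # built-in Organic Traffic segment
# Keep only landing pages whose path matches the regex "blog/"
blog_filter <- dim_filter("landingPagePath", "REGEXP", "blog/") %>% list() %>% filter_clause_ga4()
pages <- google_analytics(ga_id,
                          date_range = range,
                          dimensions = c("landingPagePath"),
                          metrics = c("sessions"),
                          dim_filters = blog_filter,
                          segments = organic,
                          max = -1) # fetch all rows rather than the default 1,000
```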
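GA’s REGEXP operator does a partial match, so “blog/” keeps any path that contains it anywhere, and dim_filter() is case-insensitive by default. A quick way to sanity-check the pattern before spending API quota is to try it on a few paths locally. This sketch uses made-up paths and base R’s grepl(), whose regex engine differs from GA’s RE2 but should behave the same for a simple literal pattern like this one.

```r
# Hypothetical landing page paths, for illustration only
paths <- c("/blog/how-magnets-work", "/products/", "/Blog/old-post",
           "/news/blog/roundup", "/blogroll")

# Partial, case-insensitive match, mirroring the "blog/" REGEXP filter
matched <- paths[grepl("blog/", paths, ignore.case = TRUE)]
print(matched)

# Anchoring the pattern keeps only paths that start with /blog/
strict <- paths[grepl("^/blog/", paths, ignore.case = TRUE)]
print(strict)
```

If the looser match picks up pages you don’t consider blog posts, the anchored pattern can be passed to dim_filter() in the same way.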
Google Analytics returns landing page paths (everything after the domain) rather than full URLs, so we prepend the host name to build a URL for each post that we can then scrape for its author.
```r
# Prepend the site's host name to each landing page path to get a full URL
pages <- pages %>% mutate(page = paste0("https://www.alloymagnetic.com", landingPagePath))
```
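One thing worth checking at this step: depending on your view settings, landingPagePath can include query strings (for example, UTM campaign tags), so the same post may appear as several rows and be counted more than once in the per-author post count later. A minimal sketch, assuming dplyr and a few hypothetical rows, strips the query string and re-sums sessions before building the URLs:

```r
library(dplyr)

# Hypothetical GA output: one post reached with and without a UTM query string
pages <- tibble(
  landingPagePath = c("/blog/post-a", "/blog/post-a?utm_source=newsletter", "/blog/post-b"),
  sessions = c(100, 20, 5)
)

pages_clean <- pages %>%
  mutate(landingPagePath = sub("\\?.*$", "", landingPagePath)) %>% # drop the query string
  group_by(landingPagePath) %>%
  summarise(sessions = sum(sessions), .groups = "drop") %>%       # one row per post
  mutate(page = paste0("https://www.alloymagnetic.com", landingPagePath))

print(pages_clean)
```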
To retrieve the author names, the rvest package comes in handy; scraping like this is generally useful for adding context to standard reports. On our blog, the byline sits in a span element with the class “author”, so the CSS selector span.author picks it out.
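```r
# Scrape each post's byline; try() stores the error message instead of stopping on a failed request
pages <- pages %>% mutate(author = map(page, ~try(read_html(.x) %>%
                                                    html_node("span.author") %>%
                                                    html_text(), silent = TRUE)) %>%
                            unlist())
```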
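A wrong selector fails quietly: html_node() returns a missing node and html_text() returns NA rather than raising an error. It can therefore be worth testing the selector against a snippet of post markup first. The HTML below is a hypothetical byline, not our site’s actual markup.

```r
library(rvest)

# Hypothetical byline markup; check your own theme's HTML for the real structure
post_html <- '<html><body><h1>Post title</h1>
  <p class="meta"><span class="author"> by Jane Doe</span></p></body></html>'
doc <- read_html(post_html)

byline <- doc %>% html_node("span.author") %>% html_text()
print(byline)    # the byline text, e.g. " by Jane Doe"

no_match <- doc %>% html_node("span.byline") %>% html_text()
print(no_match)  # NA: no element matches this selector
```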
Expect HTTP errors (for example, 404 Not Found) for some pages, particularly if posts that received traffic during your chosen date range have since been unpublished. For those pages, the error message ends up in the author column.
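An alternative to try() is tryCatch() with an error handler that returns NA. This is a sketch rather than the approach used above, and get_author() is a hypothetical helper name. With it, failed pages become NA, which can be removed with filter(!is.na(author)), instead of error strings that have to be matched on “404”. It also trims the byline with html_text(trim = TRUE).

```r
library(rvest)
library(purrr)

# Hypothetical helper: return the trimmed byline for one URL, or NA if the
# request fails (404, timeout, etc.); pages without span.author also give NA
get_author <- function(url) {
  tryCatch(
    read_html(url) %>% html_node("span.author") %>% html_text(trim = TRUE),
    error = function(e) NA_character_
  )
}

# Usage on the scraped pages: map_chr() returns one string per page
# pages <- pages %>% mutate(author = map_chr(page, get_author))

# Offline check with an inline HTML string and a path that does not exist
inputs <- c('<html><body><span class="author"> by Jane Doe</span></body></html>',
            "no/such/page.html")
authors <- map_chr(inputs, get_author)
print(authors)
```

Because the byline is trimmed, the later str_remove() pattern would need to be "^by " rather than " by " if you adopt this helper.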
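Finally, we group the landing page traffic by author, summing sessions and counting posts for each writer. Rows whose “author” is a 404 error message are dropped, and the “ by ” prefix is stripped from each byline.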
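```r
authors <- pages %>%
  group_by(author) %>%
  summarise(sessions = sum(sessions), `# of blogs` = n()) %>% # total sessions and post count per author
  filter(!str_detect(author, "404")) %>%       # drop 404 error messages; NA authors are dropped too
  mutate(author = str_remove(author, " by ")) %>% # strip the byline prefix
  arrange(desc(sessions))
```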
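The pipeline above groups on the raw byline text. If the same author’s byline is scraped with slightly different whitespace on different pages, that person ends up split across two rows. A minimal sketch, using hypothetical rows and assuming failed scrapes are stored as NA, cleans the text before grouping:

```r
library(dplyr)
library(stringr)

# Hypothetical scraped rows: one author's byline with inconsistent whitespace,
# plus one failed scrape stored as NA
pages <- tibble(
  sessions = c(120, 30, 50, 7),
  author   = c(" by Jane Doe", "by  Jane Doe\n", " by John Roe", NA)
)

authors <- pages %>%
  filter(!is.na(author)) %>%                       # drop failed scrapes
  mutate(author = str_squish(author),              # trim ends, collapse inner whitespace
         author = str_remove(author, "^by ")) %>%  # strip the leading "by "
  group_by(author) %>%
  summarise(sessions = sum(sessions), `# of blogs` = n()) %>%
  arrange(desc(sessions))

print(authors)
```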
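To protect my coworkers’ identities, the table below replaces each author’s name with a Star Wars character from dplyr’s starwars dataset.

```r
authors %>%
  mutate(name = starwars$name[seq_len(n())]) %>% # pseudonyms in place of real author names
  select(name, sessions, `# of blogs`) %>%
  knitr::kable()
```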
name | sessions | # of blogs |
---|---|---|
Luke Skywalker | 1530 | 5 |
C-3PO | 1519 | 6 |
R2-D2 | 644 | 4 |
Darth Vader | 378 | 6 |
Leia Organa | 303 | 22 |
Owen Lars | 246 | 7 |
Beru Whitesun lars | 175 | 5 |
R5-D4 | 137 | 4 |
Biggs Darklighter | 96 | 5 |
Obi-Wan Kenobi | 12 | 4 |
Anakin Skywalker | 8 | 4 |
Wilhuff Tarkin | 1 | 1 |