Migrating to Google Analytics 4

By Ruslan Hrebenozhko

As you probably know, Google is replacing Universal Analytics (UA) with its successor, Google Analytics 4 (GA4), and Google Support has been urging users to begin the migration. Standard Universal Analytics stopped collecting data on July 1, 2023, and starting from July 1, 2024, you will no longer have access to the Universal Analytics interface and API.

There are numerous differences between Universal Analytics and Google Analytics 4 (GA4), and not all UA features are available in GA4, so the migration process is not straightforward. In this article, I will share our experience migrating an application from the UA APIs to the GA4 APIs, highlight the challenges we encountered, and discuss how we addressed them.

Our project uses Google’s Java library and Scala 2.13; however, the discussion does not depend on Scala-specific details and applies equally to a project with a different stack.

Initial application details

Let’s examine our initial setup: our application is integrated with Universal Analytics (UA) and uses the Analytics Management API and the Analytics Reporting API v4 to retrieve data such as the lists of accounts, properties, and goals. We then use this data to run reports and obtain analytics data for internal purposes. The application also uses the Google Auth Library for authorization.

To integrate with Google Analytics 4, we rely on both the Google Analytics Admin API and the Google Analytics Data API. Before migrating our code to Google Analytics 4, a few non-code steps are necessary. First, we need to migrate the website to use a Google Analytics 4 property. Second, we must enable the Google Analytics APIs. This can be done through the buttons provided in the official documentation for the Admin API and Data API, or manually in the Google Cloud Platform (GCP) console. Lastly, we include the required libraries in our build.sbt file.

ThisBuild / version := "1.0.0"

ThisBuild / scalaVersion := "2.13.13"

lazy val root = (project in file("."))
  .settings(
    name := "MigratingToGA4"
  )

libraryDependencies ++= Seq(
  // Analytics Reporting API v4 (UA)
  "com.google.apis" % "google-api-services-analyticsreporting" % "v4-rev174-1.25.0",
  // Analytics Management API v3 (UA)
  "com.google.apis" % "google-api-services-analytics" % "v3-rev169-1.25.0",
  // Google Auth Library
  "com.google.auth" % "google-auth-library-oauth2-http" % "1.16.1",
  // Google Analytics Admin API (GA4)
  "com.google.analytics" % "google-analytics-admin" % "0.46.0",
  // Google Analytics Data API (GA4)
  "com.google.analytics" % "google-analytics-data" % "0.47.0"
)

We have a trait (interface) and a set of models to abstract our application logic from the Google library. This allows us to concentrate solely on implementing this interface for GA4. Consequently, the main focus of this article will be on detailing the adaptation of these models and the implementation of the interface for GA4.

Changes in Analytics Models

We’ll start by considering models and methods for converting instances to and from Google’s libraries.

AnalyticsAccount

The first model we’ll consider is AnalyticsAccount, which represents an account and its child properties. We build it from the Account Summary resources of both UA and GA4.

import scala.jdk.CollectionConverters._

case class AnalyticsAccount(id: String, name: String, properties: List[AnalyticsProperty])

object AnalyticsAccount {
  def fromUAAccountSummary(account: com.google.api.services.analytics.model.AccountSummary): AnalyticsAccount =
    AnalyticsAccount(
      account.getId,
      account.getName,
      account.getWebProperties.asScala.toList.map(AnalyticsProperty.fromUAWebPropertySummery)
    )

  def fromGA4AccountSummary(account: com.google.analytics.admin.v1beta.AccountSummary): AnalyticsAccount =
    AnalyticsAccount(
      account.getAccount,     // resource name: "accounts/{account_id}"
      account.getDisplayName, // human-readable account name
      account.getPropertySummariesList.asScala.toList.map(AnalyticsProperty.fromGA4PropertySummery)
    )
}

In Universal Analytics, the account ID is a plain numeric string (the “UA-XXXXXXXXX-Y” tracking ID identifies a property and embeds the account ID as its middle part). The GA4 AccountSummary instance, however, has no method named getId. Instead, we use getAccount, which returns a resource name in the format “accounts/{account_id}”. This value serves as the account's unique identifier in our model.

Moreover, for GA4 we use getDisplayName rather than getName. In the GA4 Account Summary, the getName method returns a resource name in the format “accountSummaries/{account_id}”, whereas we need the actual account name, which getDisplayName provides.

AnalyticsProperty

AnalyticsProperty represents a GA4 property or a UA web property.

import com.google.analytics.admin.v1beta.PropertySummary
import com.google.api.services.analytics.model.WebPropertySummary
import scala.jdk.CollectionConverters._

case class AnalyticsProperty(id: String, name: String, profiles: List[AnalyticsProfile])

object AnalyticsProperty {
  def fromUAWebPropertySummery(property: WebPropertySummary): AnalyticsProperty =
    AnalyticsProperty(
      property.getId,
      property.getName,
      property.getProfiles.asScala.toList.map(AnalyticsProfile.fromProfileSummary)
    )

  def fromGA4PropertySummery(property: PropertySummary): AnalyticsProperty =
    AnalyticsProperty(
      property.getProperty,    // resource name: "properties/{property_id}"
      property.getDisplayName,
      profiles = List.empty    // data streams are filled in later by the service
    )
}

As with AnalyticsAccount, the GA4 PropertySummary has no getId method, unlike its Universal Analytics (UA) counterpart. Instead, we use the getProperty method, which returns a resource name in the format “properties/{property_id}”.

The challenge here lies in the differing hierarchies of Universal Analytics and Google Analytics 4. In Universal Analytics, the structure is Account -> Properties -> Views (Profiles), whereas in Google Analytics 4, it is Account -> Properties -> Data Streams.

IMPORTANT NOTE: data streams are not equivalent to views. However, for our use case, we treat them as views. Carefully evaluate whether this holds in your use case.

A further obstacle is that data streams cannot be retrieved from PropertySummary directly. Consequently, we initialize the list as empty and fill it in later with separate API calls.

AnalyticsProfile

As mentioned earlier, GA4 does not have Profiles like Universal Analytics. Therefore, the AnalyticsProfile now represents a DataStream in GA4.

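import com.google.analytics.admin.v1beta.DataStream
import com.google.api.services.analytics.model.ProfileSummary

case class AnalyticsProfile(id: String, name: String)

object AnalyticsProfile {
  def fromProfileSummary(profile: ProfileSummary): AnalyticsProfile =
    AnalyticsProfile(
      profile.getId,
      profile.getName
    )

  def fromDataStream(dataStream: DataStream): AnalyticsProfile =
    AnalyticsProfile(
      // Keep only the trailing digits (the stream ID); fall back to the
      // full resource name instead of throwing if the format is unexpected
      """\d+$""".r.findFirstIn(dataStream.getName).getOrElse(dataStream.getName),
      dataStream.getDisplayName
    )
}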

The getName method of the DataStream object returns a name in the format “properties/{property_id}/dataStreams/{stream_id}”. Later on, we only need the {stream_id}, so we use a regex to keep only the trailing digits (the stream ID).
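The same extraction can also be written as a pattern match that returns both IDs and makes the unexpected-shape case explicit. This is only a minimal sketch: DataStreamName is a hypothetical helper, not part of Google's library, and it assumes both IDs are purely numeric.

```scala
// Hypothetical helper (not part of Google's client library).
// Assumes data stream names look like "properties/123/dataStreams/456".
object DataStreamName {
  private val Pattern = """properties/(\d+)/dataStreams/(\d+)""".r

  /** Returns (propertyId, streamId), or None if the name has another shape. */
  def parse(name: String): Option[(String, String)] = name match {
    case Pattern(propertyId, streamId) => Some((propertyId, streamId))
    case _                             => None
  }
}
```

Returning an Option leaves the decision about malformed names to the caller instead of hiding it inside the mapper.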

AnalyticsGoal

In Universal Analytics, we had goals, but in Google Analytics 4, we have conversion events. They are quite distinct concepts, but we can use them interchangeably in our model.

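import com.google.analytics.admin.v1beta.ConversionEvent
import com.google.api.services.analytics.model.Goal

case class AnalyticsGoal(id: String, name: String)

object AnalyticsGoal {
  def fromGoal(goal: Goal): AnalyticsGoal =
    AnalyticsGoal(goal.getId, goal.getName)

  // The event name doubles as the ID, matching getGoals in the service:
  // report metrics refer to a conversion by its event name
  def fromGA4ConversionEvent(conversion: ConversionEvent): AnalyticsGoal =
    AnalyticsGoal(conversion.getEventName, conversion.getEventName)
}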

AnalyticsReportRow

AnalyticsReportRow represents a row of data in an analytics report. It contains dimensions and metrics. We build it from a UA ReportRow and from a GA4 Row.

case class AnalyticsReportRow(dimensions: List[String], metrics: List[Float])

object AnalyticsReportRow {
  def fromUAReportRow(row: ReportRow): AnalyticsReportRow = AnalyticsReportRow(
    Option(row.getDimensions).map(_.asScala.toList).getOrElse(List.empty),
    row.getMetrics.asScala.flatMap(_.getValues.asScala.map(_.toFloat)).toList
  )

  def fromGA4ReportRow(row: Row): AnalyticsReportRow =
    AnalyticsReportRow(
      row.getDimensionValuesList.asScala.map(_.getValue).toList,
      row.getMetricValuesList.asScala.map(_.getValue).map(_.toFloat).toList)
}

AnalyticsReport

AnalyticsReport represents a complete analytics report. It includes dimensions, metrics, metric types, row count, totals, and a list of rows (AnalyticsReportRow). We transform it from a Universal Analytics (UA) Report and a Google Analytics 4 (GA4) BatchRunReportsResponse.

import scala.jdk.CollectionConverters._
import scala.util.Try

case class AnalyticsReport(dimensions: List[String], metrics: List[String], metricTypes: List[String],
                           rowCount: Int, totals: List[Float], rows: List[AnalyticsReportRow])

object AnalyticsReport {
  def fromUAReport(report: Report): AnalyticsReport = {
    val headers = report.getColumnHeader.getMetricHeader.getMetricHeaderEntries.asScala.toList
    AnalyticsReport(
      report.getColumnHeader.getDimensions.asScala.toList,
      headers.map(_.getName),
      headers.map(_.getType),
      report.getData.getRowCount,
      report.getData.getTotals.asScala.flatMap(_.getValues.asScala.map(_.toFloat)).toList,
      report.getData.getRows.asScala.toList.map(AnalyticsReportRow.fromUAReportRow)
    )
  }

  // A batch response carries one report per request, hence the list
  def fromGA4BatchRunReport(response: BatchRunReportsResponse): List[AnalyticsReport] =
    response.getReportsList.asScala.toList.map(report =>
      AnalyticsReport(
        report.getDimensionHeadersList.asScala.map(_.getName).toList,
        report.getMetricHeadersList.asScala.map(_.getName).toList,
        report.getMetricHeadersList.asScala.map(_.getType.name).toList,
        report.getRowCount,
        // Fall back to 0.0 for total values that are not numeric
        report.getTotalsList.asScala.flatMap(_.getMetricValuesList.asScala.map(_.getValue)
          .map(metricValue => Try(metricValue.toFloat).getOrElse(0.0F))).toList,
        report.getRowsList.asScala.map(AnalyticsReportRow.fromGA4ReportRow).toList
      ))
}
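The GA4 totals conversion falls back to 0.0 when a value cannot be parsed, while the row conversions call toFloat directly and would throw on a non-numeric value. If you want the same fallback everywhere, a small helper along these lines could be shared by the mappers. MetricParsing is a hypothetical name, and treating unparsable values as 0.0 is an assumption carried over from the totals code.

```scala
// Hypothetical helper; requires Scala 2.13 for toFloatOption.
object MetricParsing {
  /** Parses a metric value, returning `default` for non-numeric input. */
  def toFloatOrDefault(raw: String, default: Float = 0.0f): Float =
    raw.toFloatOption.getOrElse(default)
}
```

Silently mapping bad input to 0.0 can hide data problems, so logging the rejected value may be worth considering.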

AnalyticsReportRequest

AnalyticsReportRequest represents a request for an analytics report, including date ranges, metrics, dimensions, page number, and page size. It can be translated into a UA ReportRequest and a GA4 RunReportRequest.

case class AnalyticsReportRequest(dateRanges: List[AnalyticsDateRange], metrics: List[String],
                                  dimensions: List[String], goalId: Option[String], pageNumber: Int, pageSize: Int) {
  def toGoogleRequest(viewId: String): ReportRequest = {
    val metricsWithGoals = goalId.fold {
      metrics.map(metric => if (metric.contains("XX")) {
        metric.replaceAll("XX", "") + "All"
      } else metric)
    } {
      goalId =>
        metrics.map(metric =>
          if (metric.startsWith("ga:goal"))
            metric.replace("XX", goalId)
          else metric)
    }
    new ReportRequest()
      .setViewId(viewId)
      .setDateRanges(dateRanges.map(_.uaDateRange).asJava)
      .setMetrics(metricsWithGoals.map(new com.google.api.services.analyticsreporting.v4.model.Metric().setExpression(_)).asJava)
      .setDimensions(dimensions.map(new com.google.api.services.analyticsreporting.v4.model.Dimension().setName(_)).asJava)
      .setPageSize(pageSize)
      .setPageToken((pageSize * pageNumber).toString)
  }

  def toGA4Requests(propertyId: String, viewId: String): RunReportRequest = {
    val metricsWithConversionEvents = goalId.fold(metrics) { goalId =>
      metrics.map(metric =>
        if (List("sessionConversionRate", "userConversionRate").contains(metric)) s"$metric:$goalId"
        else metric)
    }
    RunReportRequest.newBuilder
      .setProperty(propertyId)
      .addAllDimensions(dimensions.map(com.google.analytics.data.v1beta.Dimension.newBuilder.setName(_).build).asJava)
      .addAllMetrics(metricsWithConversionEvents.map(com.google.analytics.data.v1beta.Metric.newBuilder.setName(_).build).asJava)
      .addAllDateRanges(dateRanges.map(_.ga4DateRange).asJava)
      .setLimit(pageSize)
      .setOffset(pageNumber * pageSize)
      .setDimensionFilter(FilterExpression.newBuilder
        .setFilter(Filter.newBuilder.setFieldName("streamId").setStringFilter(
          StringFilter.newBuilder.setMatchType(StringFilter.MatchType.EXACT).setValue(viewId)))) // filter by data stream to get data for a specific "view"
      .build
  }
}

In Universal Analytics (UA), we dynamically insert the goal ID into every metric associated with a goal. In Google Analytics 4 (GA4), however, only two built-in metrics can be tied to a specific conversion event: sessionConversionRate and userConversionRate, so we simply append the event name to them.

This substitution is not mandatory: you can pass the metrics as they are and add the goal ID dynamically later. That flexibility is particularly useful for custom metrics, which can be hard to specify upfront.
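To make the two rewriting rules easier to compare, here is a standalone sketch that mirrors the logic of toGoogleRequest and toGA4Requests on plain strings. MetricNames and its method names are hypothetical; the metric names in the usage note are only sample inputs.

```scala
// Standalone sketch of the metric rewriting rules; no Google classes involved.
object MetricNames {
  // UA: metric templates carry an "XX" placeholder for the goal number.
  // Without a goal, the placeholder is dropped and "All" is appended.
  def forUA(metrics: List[String], goalId: Option[String]): List[String] =
    goalId match {
      case Some(id) =>
        metrics.map(m => if (m.startsWith("ga:goal")) m.replace("XX", id) else m)
      case None =>
        metrics.map(m => if (m.contains("XX")) m.replace("XX", "") + "All" else m)
    }

  private val conversionMetrics = Set("sessionConversionRate", "userConversionRate")

  // GA4: only the two conversion-rate metrics get the event name appended.
  def forGA4(metrics: List[String], eventName: Option[String]): List[String] =
    eventName.fold(metrics) { event =>
      metrics.map(m => if (conversionMetrics(m)) s"$m:$event" else m)
    }
}
```

For example, forUA(List("ga:goalXXCompletions"), Some("3")) yields List("ga:goal3Completions"), and forGA4(List("userConversionRate"), Some("purchase")) yields List("userConversionRate:purchase").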

AnalyticsRequest

AnalyticsRequest encapsulates the information needed to request analytics data. It is converted to a GetReportsRequest in UA and a BatchRunReportsRequest in GA4.

case class AnalyticsRequest(token: String, propertyId: String, viewId: String, requests: List[AnalyticsReportRequest]) {
  def toUAGetReportsRequest: GetReportsRequest = {
    new GetReportsRequest()
      .setReportRequests(requests.map(request => request.toGoogleRequest(viewId)).asJava)
  }

  def toGA4BatchRunReportsRequest: BatchRunReportsRequest = {
    BatchRunReportsRequest
      .newBuilder
      .setProperty(propertyId)
      .addAllRequests(requests.map(request => request.toGA4Requests(propertyId, viewId)).asJava)
      .build
  }
}

AnalyticsDateRange

The AnalyticsDateRange case class represents a date range for analytics reporting, consisting of a startDate and an endDate. This case class provides two instances of date range objects tailored for UA and GA4 reporting.

case class AnalyticsDateRange(startDate: String, endDate: String) {
  val uaDateRange: com.google.api.services.analyticsreporting.v4.model.DateRange =
    new com.google.api.services.analyticsreporting.v4.model.DateRange()
      .setStartDate(startDate).setEndDate(endDate)
  val ga4DateRange: com.google.analytics.data.v1beta.DateRange =
    com.google.analytics.data.v1beta.DateRange
      .newBuilder.setStartDate(startDate).setEndDate(endDate).build
}

MetadataType

MetadataType is an enumeration that defines two values: Dimension and Metric. These values represent the possible types of metadata.

object MetadataType extends Enumeration {
  type MetadataType = Value
  val Dimension, Metric = Value
}

AnalyticsMetadata

AnalyticsMetadata represents metadata associated with analytics columns. No GA4 conversion is defined here because GA4 metadata is split into two distinct types, DimensionMetadata and MetricMetadata; we convert them at the point where we retrieve the metadata.

case class AnalyticsMetadata(id: String, metadataType: MetadataType.Value, group: String, name: String)

object AnalyticsMetadata {
  def fromUAColumn(metadata: Column): AnalyticsMetadata = {
    val attributes = metadata.getAttributes

    AnalyticsMetadata(
      metadata.getId,
      if (attributes.get("type") == "DIMENSION") MetadataType.Dimension else MetadataType.Metric,
      attributes.get("group"),
      attributes.get("uiName")
    )
  }
}

GA4 Service Implementation

As mentioned previously, our application has an interface that separates the application logic from the Google Analytics API calls.

trait AnalyticsService {
  def getAccounts(token: String): List[AnalyticsAccount]
  def getMetadata(token: String, propertyId: String): List[AnalyticsMetadata]
  def getGoals(token: String, accountId: String, propertyId: String, viewId: String): List[AnalyticsGoal]
  def getReports(request: AnalyticsRequest): List[AnalyticsReport]
}

It defines a set of methods that serve as a contract for interacting with analytics data and metadata from different platforms. In this section, we will only discuss the GA4 service implementation. However, the code for implementing the UA service can be found on GitHub.

Authentication

To use any of these Google APIs, OAuth 2.0 Client ID credentials are required. Google's documentation describes how to create and obtain these credentials.

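import java.io.{FileInputStream, InputStreamReader}

import com.google.analytics.admin.v1beta.{AnalyticsAdminServiceClient, AnalyticsAdminServiceSettings}
import com.google.analytics.data.v1beta.{BetaAnalyticsDataClient, BetaAnalyticsDataSettings}
import com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets
import com.google.api.client.json.gson.GsonFactory
import com.google.api.gax.core.FixedCredentialsProvider
import com.google.auth.oauth2.UserCredentials

class AnalyticsServiceGA4 extends AnalyticsService {
  // Replace "<client_secret_file_location>" with the actual path to your client secret JSON file
  private val keyFileLocation = "<client_secret_file_location>"
  private val jsonFactory = GsonFactory.getDefaultInstance
  private val clientSecrets = GoogleClientSecrets.load(jsonFactory, new InputStreamReader(new FileInputStream(keyFileLocation)))

  // Builds user credentials from the client secret and the caller's token
  private def getCredentials(token: String) =
    UserCredentials.newBuilder()
      .setClientId(clientSecrets.getDetails.getClientId)
      .setClientSecret(clientSecrets.getDetails.getClientSecret)
      .setRefreshToken(token)
      .build

  // Admin API client: accounts, properties, data streams, conversion events
  private def getAdminClient(token: String) = {
    val analyticsAdminServiceSettings =
      AnalyticsAdminServiceSettings
        .newBuilder
        .setCredentialsProvider(FixedCredentialsProvider.create(getCredentials(token)))
        .build
    AnalyticsAdminServiceClient.create(analyticsAdminServiceSettings)
  }

  // Data API client: metadata and reports
  private def getDataClient(token: String) = {
    val betaAnalyticsDataSettings = BetaAnalyticsDataSettings
      .newBuilder
      .setCredentialsProvider(FixedCredentialsProvider.create(getCredentials(token)))
      .build()
    BetaAnalyticsDataClient.create(betaAnalyticsDataSettings)
  }
}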

The location of the client secret JSON file is specified with the keyFileLocation variable. It is hard-coded here for simplicity, but in a real application it should come from a configuration file. The JSON file is loaded using the GoogleClientSecrets.load method, which takes the jsonFactory and an InputStreamReader wrapping the file input stream.
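As one possible way to avoid the hard-coded path, the location could be read from the environment. This is only a sketch: KeyFileConfig and the GA_CLIENT_SECRET_FILE variable name are hypothetical, and a configuration library would work just as well.

```scala
// Hypothetical configuration lookup; the variable name is an assumption.
object KeyFileConfig {
  val VariableName = "GA_CLIENT_SECRET_FILE"

  /** Returns the configured path, or an error message if it is missing or blank. */
  def keyFileLocation(env: Map[String, String] = sys.env): Either[String, String] =
    env.get(VariableName)
      .map(_.trim)
      .filter(_.nonEmpty)
      .toRight(s"$VariableName is not set")
}
```

Taking the environment as a parameter keeps the lookup easy to test without touching real environment variables.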

The getCredentials method creates user credentials from the loaded client secret details and the provided token, which it sets as the refresh token.

The getAdminClient and getDataClient methods initialize an AnalyticsAdminServiceClient and a BetaAnalyticsDataClient, respectively. Each uses a credentials provider built from the user credentials obtained earlier.

getAccounts

getAccounts retrieves a list of analytics accounts associated with the provided authentication token.

  override def getAccounts(token: String): List[AnalyticsAccount] = {
    val listAccountSummariesRequest = ListAccountSummariesRequest
      .newBuilder
      .build()
    Using(getAdminClient(token))(adminClient =>
      adminClient
        .listAccountSummaries(listAccountSummariesRequest).iterateAll.asScala.toList
        .map(AnalyticsAccount.fromGA4AccountSummary)
        .map(accountSummary =>
          // PropertySummary carries no data streams, so fetch them per property
          accountSummary.copy(properties = accountSummary.properties.map(property =>
            property.copy(profiles = adminClient.listDataStreams(property.id).iterateAll.asScala.toList
              .map(AnalyticsProfile.fromDataStream)))))).get
  }

The Using block ensures proper resource management: the adminClient must be closed to release resources such as threads. The result is wrapped in a Try, which signals that potential errors should be handled. For simplicity, we call .get here, but production code should use more robust error handling.
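One way to avoid .get is to turn the Try into an Either and let the caller decide what to do with a failure. The sketch below uses a stand-in client so that it runs without Google's libraries; SafeCalls and FakeClient are hypothetical names, and reducing the error to its message is just one possible choice.

```scala
import scala.util.Using

object SafeCalls {
  // Stand-in for a Google client; it exists only to make the sketch runnable.
  final class FakeClient(fail: Boolean) extends AutoCloseable {
    var closed = false
    def listNames(): List[String] =
      if (fail) throw new RuntimeException("API unavailable")
      else List("accounts/1")
    override def close(): Unit = { closed = true }
  }

  // Returns Left(message) instead of throwing; Using closes the client
  // whether the call succeeds or fails.
  def listNames(client: FakeClient): Either[String, List[String]] =
    Using(client)(_.listNames()).toEither.left.map(_.getMessage)
}
```

Applying this to the service would change the return types of the AnalyticsService trait, so it is a design decision rather than a drop-in fix.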

As mentioned in the AnalyticsProfile section, direct retrieval of data streams from properties is not possible, as it was in Universal Analytics. Therefore, after obtaining all accounts, we enrich them by including data about data streams through a nested map and API calls.

This approach might appear somewhat complicated, since it involves nested maps and additional object copies. An alternative would be to make this API call inside the mapper method of AnalyticsProperty. However, passing the adminClient into a method whose primary task is mapping one object to another, and making an API call there, would overload that method.
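A middle ground is to keep the mappers pure and move the enrichment into a single function that receives the stream lookup as a parameter. The sketch below uses simplified stand-in models so that it is self-contained; EnrichSketch, withStreams, and fetchStreams are hypothetical names, and fetchStreams stands in for the listDataStreams call.

```scala
// Simplified stand-ins for the article's models, to keep the sketch runnable.
object EnrichSketch {
  final case class Profile(id: String, name: String)
  final case class Property(id: String, name: String, profiles: List[Profile])
  final case class Account(id: String, name: String, properties: List[Property])

  // fetchStreams is called once per property with the property ID.
  def withStreams(accounts: List[Account])(fetchStreams: String => List[Profile]): List[Account] =
    accounts.map { account =>
      account.copy(properties = account.properties.map { property =>
        property.copy(profiles = fetchStreams(property.id))
      })
    }
}
```

Because the lookup is a plain function, the enrichment can be tested with an in-memory map and wired to the real client only inside the service.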

getMetadata

This method retrieves metadata, including metrics and dimensions, for a specific property using the Google Analytics 4 Data API.

  override def getMetadata(token: String, propertyId: String): List[AnalyticsMetadata] =
    Using(getDataClient(token))(dataClient => {
      // propertyId is expected in the "properties/{property_id}" form
      val metadata = dataClient.getMetadata(s"$propertyId/metadata")
      metadata.getMetricsList.asScala
        .map(metric => AnalyticsMetadata(metric.getApiName, MetadataType.Metric, metric.getCategory, metric.getUiName)).toList ++
        metadata.getDimensionsList.asScala
          .map(dimension => AnalyticsMetadata(dimension.getApiName, MetadataType.Dimension, dimension.getCategory, dimension.getUiName)).toList
    }).get

The getMetadata function is called on the dataClient with the provided property ID to retrieve the metadata information. The retrieved metrics and dimensions are then mapped to instances of AnalyticsMetadata and combined into a list. The result is a list of AnalyticsMetadata objects representing the metadata for the given property.

getGoals

This method retrieves conversion events.

  override def getGoals(token: String, accountId: String, propertyId: String, viewId: String): List[AnalyticsGoal] =
    Using(getAdminClient(token))(adminClient =>
      adminClient.listConversionEvents(propertyId).iterateAll.asScala.toList
        // the event name is used as both ID and name
        .map(conversion => AnalyticsGoal(conversion.getEventName, conversion.getEventName))).get

The method uses the listConversionEvents method of the AdminClient, specifically targeting the specified property by its ID.

It is noteworthy that, unlike Universal Analytics, Google Analytics 4 does not require the account ID (accountId) and view ID (viewId) parameters to retrieve conversion events. This is because, in GA4, conversion events are associated directly with properties, making the account and view distinctions unnecessary.

getReports

This method retrieves analytics reports from Google Analytics 4.

  override def getReports(request: AnalyticsRequest): List[AnalyticsReport] =
    AnalyticsReport.fromGA4BatchRunReport(Using(getDataClient(request.token)) { dataClient =>
      dataClient.batchRunReports(request.toGA4BatchRunReportsRequest)
    }.get)

The method invokes the batchRunReports method of the data client, passing the GA4-specific batch request built from the AnalyticsRequest. The result is a BatchRunReportsResponse, which is converted into a list of AnalyticsReport instances using the fromGA4BatchRunReport method.
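AnalyticsReportRequest turns pageNumber and pageSize into an offset (pageNumber * pageSize) and a limit, so page numbers are zero-based. A caller that needs every row can walk the pages until the offset passes the reported row count. The sketch below shows that loop under stated assumptions: Paging, Page, and fetchPage are hypothetical, with fetchPage standing in for one getReports call, and rowCount is taken to be the total number of rows in the full result, independent of limit and offset.

```scala
import scala.annotation.tailrec

object Paging {
  // rowCount: total rows in the full result; rows: the rows of this page only.
  final case class Page(rowCount: Int, rows: List[String])

  /** Collects all rows by requesting pages 0, 1, 2, ... of the given size. */
  def allRows(pageSize: Int)(fetchPage: Int => Page): List[String] = {
    @tailrec
    def loop(pageNumber: Int, acc: List[String]): List[String] = {
      val page = fetchPage(pageNumber)
      val collected = acc ++ page.rows
      val nextOffset = (pageNumber + 1) * pageSize
      // Stop on an empty page as well, to avoid looping on inconsistent counts
      if (page.rows.isEmpty || nextOffset >= page.rowCount) collected
      else loop(pageNumber + 1, collected)
    }
    loop(0, Nil)
  }
}
```

Each iteration corresponds to one API call, so a larger pageSize reduces the number of requests.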

Usage example

Here is a simple example of how you can use it:

    val googleGA4AnalyticsService = new AnalyticsServiceGA4
    val token = "<access_token>"

    val accounts = googleGA4AnalyticsService.getAccounts(token)
    val metadata = googleGA4AnalyticsService.getMetadata(token, accounts.head.properties.head.id) // metadata is requested per property
    val goals = googleGA4AnalyticsService.getGoals(token, accounts.head.id,
      accounts.head.properties.head.id,
      accounts.head.properties.head.profiles.head.id)
    val request = AnalyticsRequest(
      token,
      accounts.head.properties.head.id,
      accounts.head.properties.head.profiles.head.id,
      List(
        AnalyticsReportRequest(
          List(AnalyticsDateRange("2023-09-01", "2024-02-01")),
          List("totalRevenue", "userConversionRate", "sessionConversionRate"),
          List("campaignName"),
          Some("purchase"),
          0, // zero-based page number: 0 requests the first page
          10
        )
      )
    )
    val report = googleGA4AnalyticsService.getReports(request)

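To acquire the access token, open the following URL with your client ID and the redirect URL (previously set for your application in the Google Cloud Platform console). After successful authentication, Google redirects to the specified URL with the access_token included in the URL fragment, and that value can be used in the example. Note that getCredentials passes the token to setRefreshToken; if you supply a short-lived access token rather than a refresh token, build the credentials with setAccessToken instead.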

https://accounts.google.com/o/oauth2/auth
?client_id=<client_id>
&response_type=token
&redirect_uri=<redirect_url>
&scope=https://www.googleapis.com/auth/analytics.readonly
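When the URL is assembled in code rather than by hand, the parameter values should be URL-encoded. A minimal sketch using only the JDK follows; AuthUrl and build are hypothetical names, while the endpoint and scope are the ones shown above.

```scala
import java.net.URLEncoder

object AuthUrl {
  private val Endpoint = "https://accounts.google.com/o/oauth2/auth"
  private val Scope = "https://www.googleapis.com/auth/analytics.readonly"

  private def encode(value: String): String = URLEncoder.encode(value, "UTF-8")

  /** Builds the authorization URL with every parameter value URL-encoded. */
  def build(clientId: String, redirectUri: String): String = {
    val params = List(
      "client_id" -> clientId,
      "response_type" -> "token",
      "redirect_uri" -> redirectUri,
      "scope" -> Scope
    )
    params
      .map { case (key, value) => s"$key=${encode(value)}" }
      .mkString(s"$Endpoint?", "&", "")
  }
}
```

For example, AuthUrl.build("<client_id>", "<redirect_url>") produces a single-line version of the URL above, with the placeholder values encoded as well.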

In conclusion, using traits and models to abstract application logic significantly streamlines the migration, reducing it to writing mapping methods for the models and implementing the interface. The primary complexity lies in mapping instances from Google Analytics 4 to models initially designed for Universal Analytics. This article focuses on the APIs and their usage and does not cover the broader differences between GA4 and UA, such as data collection methods and other marketing-related aspects. Examine these differences carefully before starting the migration, because overlooking them may affect the logic of your application.

The full code can be found on GitHub.