CUSTOMER DATA INFRASTRUCTURE
Customer.io's identify and track calls look simple. But BigQuery has customer IDs that don't match Customer.io's id or email, nested STRUCT fields that need flattening before they can become attributes, and per-byte billing that punishes full-table scans. Meiro Pipes resolves the identity gap, transforms your warehouse data into Customer.io's schema in the transform layer (not in expensive SQL), and keeps profiles enriched in both directions — without the custom pipeline you'd otherwise have to build.
Free trial · No credit card · Live in minutes
Identity is the first structural problem. Customer.io identifies users by a customer id you define, with email as optional. BigQuery stores records keyed on Firebase installation IDs, internal user IDs, or Stripe customer IDs depending on the data source. When these don't map to Customer.io's customer id, identify calls create duplicates or miss the intended user — anonymous-to-identified lifecycle merges fail at whichever stage the identifier breaks.
BigQuery introduces two additional failure points. Its STRUCT and ARRAY types must be flattened before they can map to Customer.io's flat attribute model — and that flattening is a maintenance liability on every schema change. BigQuery also bills per byte scanned, so naive change-detection queries against large tables are a GCP cost problem as well as an engineering one. The identify versus track classification — which determines whether data becomes a persistent attribute or a behavioral trigger — still must be made explicitly regardless of the warehouse. B2B teams add a further layer: Customer.io Objects require a separate API endpoint, a different schema, and manual object-to-person relationship maintenance.
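The transform-layer flattening described above can be sketched in plain JavaScript. This is an illustrative sketch, not Pipes' actual API: the field names (`plan`, `features`) and the recursive helper are assumptions, and real transforms would also handle type coercion and schema drift.

```javascript
// Hypothetical sketch: flatten a nested BigQuery row into a flat
// attribute object outside the warehouse, so the SQL query itself
// never has to UNNEST (and never scans extra bytes to do so).
function flattenRow(row, prefix = '') {
  const flat = {};
  for (const [key, value] of Object.entries(row)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (Array.isArray(value)) {
      // Repeated (ARRAY) fields: serialize rather than explode into rows
      flat[name] = JSON.stringify(value);
    } else if (value !== null && typeof value === 'object') {
      // STRUCT fields: recurse with a prefixed key
      Object.assign(flat, flattenRow(value, name));
    } else {
      flat[name] = value;
    }
  }
  return flat;
}

// Example: a row with a nested `plan` STRUCT and a repeated `features` field
const flat = flattenRow({
  user_id: 'usr_8472',
  plan: { tier: 'enterprise', seats: 50 },
  features: ['sso', 'audit_log']
});
// flat.plan_tier === 'enterprise', flat.plan_seats === 50
```

Because the recursion happens after rows leave BigQuery, a schema change in the STRUCT surfaces as a new flattened key rather than a broken SQL unnest.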
Customer.io's warehouse export covers Redshift and BigQuery natively, but the reverse — BigQuery to Customer.io — still requires direct API integration. For teams that need both directions, the full loop is infrastructure work, not configuration.
Problem
BigQuery has Stripe IDs, internal user IDs, email addresses — keyed differently depending on the upstream system. Customer.io expects a customer id and optionally email. When these diverge, identify calls create duplicate profiles or miss the right user. Anonymous-to-known merges fail silently.
Meiro solves it
Pipes resolves identity across every identifier type — email, user_id, anonymous ID, Stripe customer ID, CRM contact ID — using deterministic matching. One unified profile, regardless of which identifier Customer.io sees at any given touchpoint.
Problem
BigQuery stores product event properties in nested STRUCTs and repeated ARRAYs. Customer.io requires flat attribute objects. Flattening in BigQuery SQL means unnesting at query time — more bytes scanned, higher costs on every sync run.
Meiro solves it
Pipes transform functions receive BigQuery rows and flatten nested fields in the JavaScript sandbox. Your BigQuery query stays simple: DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), CAST for type coercion, backtick-quoted table names. The transform layer handles the rest.
Problem
Customer.io uses identify for persistent attributes and track for behavioral events. Getting this wrong affects segmentation, triggers, and pricing. BigQuery data arrives as rows in tables — the identify/track split is a modeling decision that has to be made explicitly.
Meiro solves it
Pipes lets you model your BigQuery data before it reaches Customer.io. Decide what becomes a persistent attribute (identify call) versus a behavioral event (track call) at the infrastructure layer — visible, version-controlled, and changeable without touching Customer.io.
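Making that modeling decision explicit might look like the following sketch. The rule here (rows with an event name become track calls, everything else becomes identify attributes) and the column names are illustrative assumptions, not Pipes' or Customer.io's required contract.

```javascript
// Illustrative sketch: the identify/track split as an explicit,
// reviewable rule rather than logic buried in an undocumented script.
// Column names (event_name, plan_tier, ...) are assumed for the example.
const ATTRIBUTE_COLUMNS = ['email', 'plan_tier', 'is_paid_customer'];

function classify(row) {
  if (row.event_name) {
    // Rows carrying an event name model behavioral occurrences: track calls
    return {
      type: 'track',
      userId: row.user_id,
      event: row.event_name,
      timestamp: row.occurred_at,
      properties: { feature_key: row.feature_key }
    };
  }
  // Everything else models persistent state: identify attributes
  const traits = {};
  for (const col of ATTRIBUTE_COLUMNS) {
    if (row[col] !== undefined) traits[col] = row[col];
  }
  return { type: 'identify', userId: row.user_id, traits };
}
```

Version-controlling a rule like this is what makes the classification changeable without touching Customer.io itself.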
Problem
Customer.io's Data Warehouse destination exports engagement data out, but there is no native source connector for BigQuery. Getting BigQuery data into Customer.io requires direct API calls — and the loop is incomplete without both directions.
Meiro solves it
Pipes handles both directions natively. Customer.io engagement events flow into BigQuery. BigQuery data enriches profiles. Enriched profiles push back to Customer.io via identify and track calls. One platform, bidirectional, no workarounds.
Problem
SaaS users move from anonymous visitor to trial to paid customer, accumulating different identifiers at each stage. BigQuery may carry all of them in separate tables. Reconciling the full identity graph and keeping Customer.io synchronized across the entire lifecycle requires infrastructure above any single API call.
Meiro solves it
Pipes builds a cross-system identity graph that spans anonymous IDs, trial user IDs, paid customer IDs, and email — and keeps Customer.io profiles unified as users transition through lifecycle stages. No duplicate profiles. No dropped attributes.
Customer.io engagement data — email opens, clicks, conversions, campaign events — flows into Pipes via webhook or export. Events land without replacing your existing Customer.io setup.
Events land in BigQuery automatically. Pipes connects directly — browse datasets, map columns, join with product usage data, billing records, or any warehouse source. BigQuery stays your source of truth.
Pipes stitches profiles across Customer.io customer ids, email addresses, BigQuery user_ids, anonymous IDs, and Stripe or CRM identifiers. Deterministic matching with configurable limits. Full lifecycle coverage from anonymous to paid.
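The deterministic core of that stitching can be sketched as follows. This is a simplified illustration under assumed field names: real resolution also handles transitive merges across existing profiles and enforces the configurable merge limits mentioned above.

```javascript
// Simplified sketch of deterministic identity stitching: records that
// share any identifier value (email, user_id, stripe_id, ...) merge into
// one profile. Illustrative only; production stitching also applies
// merge limits to prevent false merges.
function stitch(records, keys = ['email', 'user_id', 'stripe_id']) {
  const profiles = [];
  const index = new Map(); // "key:value" -> profile object

  for (const rec of records) {
    // Find an existing profile sharing any identifier with this record
    let profile = null;
    for (const k of keys) {
      if (rec[k] && index.has(`${k}:${rec[k]}`)) {
        profile = index.get(`${k}:${rec[k]}`);
        break;
      }
    }
    if (!profile) {
      profile = {};
      profiles.push(profile);
    }
    Object.assign(profile, rec);
    // Re-index every identifier the merged profile now carries
    for (const k of keys) {
      if (profile[k]) index.set(`${k}:${profile[k]}`, profile);
    }
  }
  return profiles;
}

// An app record, a billing record, and a Stripe record collapse to one profile
const unified = stitch([
  { user_id: 'usr_8472', email: '[email protected]' },
  { email: '[email protected]', stripe_id: 'cus_9X' },
  { stripe_id: 'cus_9X', plan: 'enterprise' }
]);
// unified.length === 1
```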
Enriched profiles push back to Customer.io via correctly structured identify calls and track events. Nested BigQuery fields flattened in the transform layer. Scheduled or real time. No custom API client. No batch job to maintain.
Your SaaS product tracks every product activation milestone in BigQuery — when a user completes onboarding steps, connects integrations, or invites teammates. Those events land in BigQuery event tables with nested property STRUCTs. You want Customer.io to trigger a specific onboarding sequence when a user completes each milestone.
The problem: milestone events are in BigQuery, not in Customer.io. The user who hit the milestone may be identified by an internal user_id that doesn't match the customer id Customer.io uses. The event properties are nested in a STRUCT field.
Without Meiro: You'd write a BigQuery job using CAST and DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) to fetch recent events, unnest STRUCT fields (scanning more bytes), resolve the Customer.io customer id, and call the track API for each event. You'd maintain that job across schema changes, handle retries, and debug silent failures when identifiers don't match.
With Meiro Pipes: Milestone events from BigQuery are modeled as Customer.io track calls. The Pipes transform flattens nested STRUCT fields in the JavaScript sandbox — no expensive unnesting in BigQuery SQL. Pipes resolves internal user_id to Customer.io customer id using the identity graph. Milestone events push to Customer.io automatically with the correct event name, timestamp, and properties. Your lifecycle team triggers onboarding branches from those events without waiting for engineering.
Time from product milestone to triggered onboarding email: minutes, not days.
Your BigQuery query
SELECT
user_id,
email,
event_name,
occurred_at,
CAST(plan_tier AS STRING) AS plan_tier,
feature_key,
is_paid_customer,
company_id
FROM `project.analytics.product_events`
WHERE occurred_at > DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Pipes transform
// Pipes send function (Event Destination)
async function send(payload, headers) {
  return payload.events.map(row => ({
    type: 'track',
    userId: row.user_id,
    event: row.event_name,
    timestamp: row.occurred_at,
    properties: {
      plan_tier: row.plan_tier,
      feature_key: row.feature_key,
      is_paid_customer: row.is_paid_customer,
      company_id: row.company_id
    }
  }));
}
What Customer.io receives
{
  "type": "track",
  "userId": "usr_8472",
  "event": "integration_connected",
  "timestamp": "2026-03-15T09:24:00Z",
  "properties": {
    "plan_tier": "enterprise",
    "feature_key": "slack",
    "is_paid_customer": true,
    "company_id": "cmp_311"
  }
}
No custom API client code. No STRUCT unnesting in BigQuery SQL. Pipes handles nested field flattening in the transform layer, identity resolution, schema compliance, and delivery — and adapts when your BigQuery schema changes.
The standard stack
Meiro Pipes
A reverse ETL tool syncs rows. It doesn't resolve lifecycle identity, classify attributes versus events, or flatten nested BigQuery fields without blowing up your bytes-scanned bill. Meiro Pipes does all of that — and the pipeline that remains is one your team can actually understand.
You want to trigger Customer.io campaigns based on real product behavior — feature adoption, activation milestones, upgrade signals — data that your data team has in BigQuery but you can't access from Customer.io today.
You're tired of maintaining the BigQuery → Customer.io pipeline. The customer id resolution logic. The nested STRUCT unnesting that costs bytes. The identify/track classification code that lives in a script nobody documents.
Native connector. Sends identify calls (user attributes) and track calls (behavioral events) to Customer.io in the correct API format. Handles timestamp formatting, property serialization, and B2B Object API calls with relationship mapping.
Direct warehouse connection supporting backtick-quoted table references, DATE_SUB, TIMESTAMP_DIFF, and CAST. Browse datasets, map identifier columns to Meiro identity types. Model warehouse data as identify attributes, track events, or B2B Object records.
Deterministic stitching across Customer.io customer id, email, user_id, anonymous ID, Stripe ID, and CRM IDs. Full lifecycle coverage from anonymous visitor through paid customer. Configurable merge limits to prevent false merges.
Sandboxed JavaScript functions for schema translation. Flatten nested BigQuery STRUCT and ARRAY fields without expensive SQL unnesting. Classify data as identify or track calls. Map fields, coerce types, format timestamps. 47 allowlisted packages available.
Scheduled or real-time Live Profile Sync. Partition-aware change detection to minimize BigQuery bytes scanned. Push enriched profiles and events to Customer.io via identify and track calls. Full delivery history and retry logic.
Model BigQuery company and account records as Customer.io Objects. Pipes handles the Object API endpoint, schema differences, and person-to-object relationship maintenance — so B2B teams can sync account context alongside person records.
Identity is the first structural problem. Customer.io identifies users by a customer id you define, with email as an optional secondary identifier. BigQuery has customer records keyed on internal IDs, Stripe customer IDs, CRM contact IDs, or email depending on the data source. When these don't reconcile with Customer.io's customer id, identify calls create duplicate profiles or miss the intended user. No standard reverse ETL connector resolves this cross-system identity problem.
BigQuery's nested schema model adds a second structural layer. Product event data frequently arrives with nested STRUCT and ARRAY fields — properties that BigQuery stores efficiently but Customer.io cannot consume directly. Flattening those nested fields in BigQuery SQL means unnesting at query time, which increases bytes scanned and drives up per-query costs. The right approach is to flatten in the transform layer, keeping the warehouse query simple and cost-efficient.
The identify versus track decision is the third structural problem. Persistent user attributes belong in identify calls. Behavioral occurrences belong in track calls. Getting this classification wrong affects segmentation, trigger logic, and billing. BigQuery data arrives as rows in tables. The identify/track classification is a modeling decision that has to be made explicitly and maintained when the underlying data model changes.
The enrichment loop is the fourth gap. Customer.io's warehouse export can land engagement data in BigQuery, but the reverse — BigQuery to Customer.io — requires direct API integration that no native feature provides. One direction without the other leaves the loop incomplete.
Connect BigQuery and Customer.io through Meiro Pipes. Identity-resolved. Schema-aware. Bidirectional. Start free.