Meiro Pipes Integration

Connect Amplitude and BigQuery. Behavioral data plus commercial context, finally joined.

Amplitude captures product behavior. BigQuery has the rest — CRM records, billing tiers, BQML model outputs. Pipes resolves identity across both and closes the loop in both directions.

Talk to a Consultant

Free trial · No credit card · Live in minutes

Two teams. Same broken pipe.

You're in Amplitude. Retention funnels, feature adoption, experiment results — all there. What's missing is commercial context: which of the users who dropped off were on a paid plan? Which power users are up for renewal next quarter?

That data is in BigQuery, joined from CRM and billing tables. Getting it into Amplitude as user properties means a reverse sync — which works until you realize the BigQuery records are keyed on email and Amplitude has those users tracked by device_id from their first anonymous session. The properties sync. They land on partial matches. Your experiment results look clean. Your targeting cohorts aren't.

The Real Problem

Why connecting Amplitude and BigQuery is more complex than it looks

Amplitude's native BigQuery export handles the outbound leg. The gaps are in the return direction and in identity.

BigQuery stores Amplitude event exports with nested STRUCT fields — event properties live inside a repeated record, not as flat columns. Building enrichment models on top requires unpacking those nested structures before joining against CRM or product tables. When enriched records sync back into Amplitude, they need to be flat, correctly typed, and mapped to Amplitude's exact property schema.

A FLOAT64 column syncing to an Amplitude integer property fails silently. A nested field that wasn't fully flattened gets dropped. BigQuery's per-byte billing means the change detection query — finding which enriched records changed since the last sync — is a real GCP cost if written without partition pruning.
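The flatten-and-validate step can be sketched in a few lines. This is an illustrative sketch, not Pipes' actual transform API — the row shape and the `schema` map are assumptions for the example:

```javascript
// Illustrative sketch: flatten a nested event export row into dotted
// column names, then validate value types against a target property
// schema so mismatches fail loudly instead of silently downstream.
// Field names and the schema format are hypothetical.
function flattenEvent(row, prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(row)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(flat, flattenEvent(value, name)); // unpack nested STRUCT
    } else {
      flat[name] = value; // scalars (and arrays) pass through as-is
    }
  }
  return flat;
}

function validateTypes(flat, schema) {
  const errors = [];
  for (const [prop, expected] of Object.entries(schema)) {
    if (prop in flat && typeof flat[prop] !== expected) {
      errors.push(`${prop}: expected ${expected}, got ${typeof flat[prop]}`);
    }
  }
  return errors;
}
```

The point of returning errors rather than coercing blindly is exactly the failure mode above: a type mismatch should surface before the sync, not vanish inside it.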

Identity is the deeper problem. Amplitude builds its identity graph internally: anonymous device sessions merge into authenticated user records when a user logs in. That graph doesn't extend to BigQuery. CRM records are keyed on email. Product database records use account_id. Billing uses customer_id. A reverse ETL connector configured to match on user_id misses every warehouse record carrying a different identifier. Enriched properties land on a subset of the intended users — and Amplitude's UI gives no indication that the sync was incomplete.
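The deterministic stitching this requires can be sketched compactly. The record shapes and the `stitch` function below are hypothetical and illustrate the technique, not Pipes' internals: any two records sharing an identifier value collapse into one profile.

```javascript
// Illustrative sketch of deterministic identity stitching: records keyed
// on email, user_id, device_id, or account_id join into one profile when
// any identifier value is shared. Hypothetical shapes, not Pipes' code.
function stitch(records) {
  const profileByIdentifier = new Map(); // "email:a@b.com" -> profile
  const profiles = new Set();
  for (const record of records) {
    // Find an existing profile this record links to via any identifier.
    let profile = null;
    for (const [type, value] of Object.entries(record)) {
      const found = profileByIdentifier.get(`${type}:${value}`);
      if (found) { profile = found; break; }
    }
    if (!profile) { profile = { identifiers: {} }; profiles.add(profile); }
    // Attach all of the record's identifiers to that profile.
    for (const [type, value] of Object.entries(record)) {
      (profile.identifiers[type] ??= new Set()).add(value);
      profileByIdentifier.set(`${type}:${value}`, profile);
    }
  }
  return [...profiles];
}
```

A production version also needs union-find-style merging for the case where one record bridges two previously separate profiles; this sketch only shows the core matching idea.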

Pipes resolves identity across device_id, user_id, email, account_id, and any other identifier before data moves. Nested field flattening and type validation against Amplitude's schema happen in the transform layer before the API call. When the sync runs, the right properties reach the right Amplitude user, and failures surface before ingestion rather than silently inside it.

One platform. Collect, resolve, model, activate.

1

Collect

Pipes connects to Amplitude via its export API and warehouse connector. Events are ingested on a scheduled or near-real-time basis — no replacement of your existing Amplitude SDK or tracking plan required.

2

Load & Model

Events land in your BigQuery warehouse automatically. Pipes connects directly — browse tables, map columns, model data. Your warehouse stays your source of truth.

3

Resolve Identity

Pipes stitches user profiles across Amplitude events and BigQuery records using deterministic matching on email, user_id, device_id, or any identifier you define. Configurable merge limits prevent false matches on shared devices. No probabilistic guesswork.

4

Activate

Enriched profiles and segments flow back into Amplitude via scheduled or real-time sync. Your growth team gets warehouse-enriched cohorts directly in the tool they already use — no reverse ETL vendor required.

Use case: BQML churn scores surfaced as Amplitude user properties

Your data science team builds a churn propensity model using BQML. It combines Amplitude behavioral signals — feature adoption, session frequency, last active date — with commercial data from BigQuery: contract value, support ticket volume, renewal date proximity. The model writes a churn_risk_score per user back to a BigQuery table.

You want product and growth teams to filter Amplitude cohorts by churn_risk_score directly — without a SQL query on every request.

Without Pipes: you write a reverse ETL job that reads the BQML output table and calls Amplitude's Identify API. The model output carries email and account_id. Amplitude tracks users by device_id pre-login and user_id post-login. The join fails for anonymous users and multi-device users. Amplitude silently drops churn scores typed as FLOAT64 when the Amplitude property was created as an integer. The cohort your PM builds on "high churn risk + enterprise plan" misses 30% of the at-risk users.

With Pipes: the BQML output is a warehouse source. Pipes resolves email and account_id to the correct Amplitude user_id via the identity graph. FLOAT64 gets coerced to the right Amplitude property type in the transform layer before the API call. The churn_risk_score lands on the correct profile. Cohorts built on it are complete.
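The coercion step can be made concrete with a sketch. The `typeMap` of declared Amplitude property types is an assumption for illustration — Amplitude does not expose such a map — but the principle is the one described: a lossy FLOAT64-to-integer cast is rejected before the API call rather than dropped silently after it.

```javascript
// Illustrative sketch: coerce warehouse values to the property type each
// one will land on, and fail loudly on lossy casts. typeMap and the
// property names are hypothetical, not Amplitude's API.
function coerceForAmplitude(row, typeMap) {
  const out = {};
  for (const [prop, value] of Object.entries(row)) {
    const target = typeMap[prop];
    if (target === "number") {
      out[prop] = Number(value);            // FLOAT64 passes through intact
    } else if (target === "integer") {
      const n = Number(value);
      if (!Number.isInteger(n)) {
        throw new Error(`${prop}: lossy FLOAT64 -> integer cast (${value})`);
      }
      out[prop] = n;
    } else {
      out[prop] = String(value);            // default to string properties
    }
  }
  return out;
}
```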

The pain is real

Extracting full value usually requires a dedicated analyst or someone with strong technical skills to manage schemas, plan taxonomies, and validate events.
— Amplitude user review, G2
ETL tools often run into problems with the ever-changing nature of customer behavioral data, making this a sticking point where single source of truth initiatives break down.
— Data engineering community, 2024

Under the hood

Amplitude Connector

Connects to Amplitude via its export API and warehouse connector. Ingests events on a scheduled or near-real-time basis. Supports event filtering and transformation via Pipes sandbox functions. No replacement of your existing Amplitude SDK.

BigQuery Connector

Direct BigQuery connection via service account credentials. Browse datasets, tables, and nested `STRUCT` columns. Map identifier columns to Meiro identity types. Handles nested and repeated field flattening natively — no manual `UNNEST` before sync. Uses partition pruning for change detection to avoid full-table scans and unnecessary GCP query costs.
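The partition-pruning point can be sketched as a query builder. The table layout is an assumption for the example — an ingestion-time-partitioned table with an `updated_at` column — and this is not Pipes' generated SQL:

```javascript
// Illustrative sketch: a change-detection query that prunes to partitions
// touched since the last sync, so BigQuery bills only recent bytes.
// Table and column names (enriched_users, updated_at) are assumptions;
// _PARTITIONDATE is BigQuery's pseudo-column on ingestion-time tables.
function changeDetectionSQL(table, lastSyncISO) {
  return [
    `SELECT * FROM \`${table}\``,
    // Partition filter limits the scan to recent partitions...
    `WHERE _PARTITIONDATE >= DATE(TIMESTAMP '${lastSyncISO}')`,
    // ...and the row-level filter picks out the actual changed records.
    `AND updated_at > TIMESTAMP '${lastSyncISO}'`,
  ].join("\n");
}
```

Without the `_PARTITIONDATE` predicate, the same query is a full-table scan on every sync run — the GCP cost problem described above.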

Identity Resolution

Deterministic stitching across identifier types: email, user_id, device_id, cookie. Configurable merge limits (maxIdentifiers) and priority hierarchy prevent false merges. No probabilistic matching.
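A merge-limit check of the kind `maxIdentifiers` describes might look like this sketch; the profile shape and the guard's exact semantics are assumed for illustration:

```javascript
// Illustrative sketch: refuse a merge that would attach more than
// maxIdentifiers values of one type to a profile (e.g. dozens of
// user_ids accumulating on one shared kiosk device_id), instead of
// silently growing a mega-profile. Hypothetical shapes, not Pipes' API.
function canMerge(profile, type, value, maxIdentifiers = 5) {
  const existing = profile.identifiers[type] ?? new Set();
  if (existing.has(value)) return true;    // already linked, no growth
  return existing.size < maxIdentifiers;   // refuse past the cap
}
```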

Reverse ETL / Profile Sync

Scheduled exports or real-time Live Profile Sync. Push enriched profiles and audience segments back to Amplitude or any downstream destination via custom send functions.

Transform Layer

Sandboxed JavaScript functions for event transformation, filtering, and enrichment. Run inline — no external orchestrator needed.
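As a sketch of what such a function can express — the `(event) => event | null` contract here is an assumed shape for illustration, not Pipes' documented function signature:

```javascript
// Illustrative sketch of an inline transform: drop internal test traffic,
// enrich everything else with a sync timestamp. The event shape and the
// return-null-to-filter convention are assumptions for this example.
const transform = (event) => {
  if (event.email?.endsWith("@mycompany.test")) {
    return null;                                  // filter out test users
  }
  return {
    ...event,
    properties: { ...event.properties, synced_at: new Date().toISOString() },
  };
};
```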

Self-Hosted Option

Deploy on your own infrastructure for full data sovereignty. Or use Meiro Cloud. Your data never leaves your perimeter unless you want it to.

Live in minutes, not months

1

Connect Amplitude

Add Amplitude as a Source via its export API or warehouse connector. Events start landing in your pipeline.

2

Connect BigQuery

Add your BigQuery credentials. Browse tables, map identifiers, start modeling.

3

Resolve & Activate

Pipes stitches identity across both systems. Push enriched profiles back to Amplitude or anywhere in your stack.

Stop syncing BQML outputs to the wrong Amplitude users.

Connect Amplitude and BigQuery through Pipes. Resolve identity across device, email, and account. Start free.

Talk to a Consultant