CUSTOMER DATA INFRASTRUCTURE
Customer.io's identify and track calls look simple. But Databricks has Delta Lake tables with evolving schemas, Spark ML upgrade likelihood scores with StructType outputs, and Unity Catalog permission boundaries at every integration point. Meiro Pipes resolves the identity gap, adapts to Delta Lake schema evolution in the transform layer, and keeps ML-enriched profiles flowing to Customer.io — without a custom pipeline that breaks every time a data scientist updates a model.
Free trial · No credit card · Live in minutes
Identity is the first structural problem. Customer.io identifies users by a customer id you define, with email as optional. Databricks stores records keyed on internal IDs, Salesforce contact IDs, or other upstream-assigned identifiers. When these don't map to Customer.io's customer id, identify calls create duplicates or miss the intended user — anonymous-to-identified lifecycle merges fail at whichever stage the identifier breaks.
The identify versus track classification is the second problem. Persistent attributes (plan tier, feature flags) belong in identify calls; behavioral events (feature activations, milestones) belong in track calls. Getting this wrong affects segmentation, triggers, and billing. Databricks tables arrive without that label — and Delta Lake schema evolution can shift column types between runs, causing silent failures when Customer.io receives an unexpected property type. B2B teams add another layer: Customer.io Objects require a separate API endpoint, a different schema, and manual object-to-person relationship maintenance.
Customer.io's warehouse export targets Redshift and BigQuery natively — not Databricks. Getting engagement data into Databricks requires S3 exports or a third-party connector. The reverse direction requires direct API integration. Neither is configuration; both are infrastructure work.
Problem
Data scientists update model schemas between notebook runs — new columns, renamed fields, changed types. Delta Lake handles it. The downstream Customer.io sync doesn't. Changed upgrade score columns mean wrong identify calls or silent failures.
Meiro solves it
Pipes is schema-aware at the transform layer. When Delta Lake schemas evolve, you update the transform function — not the pipeline infrastructure. Version-controlled transforms mean schema changes are deliberate and auditable, not silent breaking changes.
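As a minimal sketch of what a schema-tolerant transform looks like, the function below accepts both the old and the renamed score column during a transition window. The field names and fallback logic are illustrative assumptions, not the actual Pipes internals or a real Delta schema.

```javascript
// Hypothetical sketch: a transform that survives a renamed model column.
// Field names are illustrative, not a real Delta Lake schema.
function mapRow(row) {
  // The data science team renamed upgrade_score to upgrade_likelihood_score;
  // accept either name so the sync keeps working across the transition.
  const score = row.upgrade_likelihood_score ?? row.upgrade_score;
  return {
    type: 'identify',
    userId: row.user_id,
    traits: { upgrade_score: score, account_tier: row.account_tier }
  };
}
```

Because the transform is version-controlled, dropping the fallback once the rename settles is a deliberate, reviewable change.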
Problem
Customer.io uses identify for persistent attributes and track for behavioral events. Getting this wrong affects segmentation, triggers, and billing. Databricks data doesn't arrive pre-classified — the identify/track split is a modeling decision that has to be made explicitly.
Meiro solves it
Pipes lets you model your Databricks data before it reaches Customer.io. Decide what becomes a persistent attribute versus a behavioral event at the infrastructure layer — visible, version-controlled, and changeable without touching Customer.io.
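The classification decision can be sketched as a function over one Delta row: persistent state becomes an identify call, occurrences become track events. Column names and the event are illustrative assumptions, not a real schema.

```javascript
// Illustrative sketch: classify one Delta row into an identify call
// (persistent attributes) plus zero or more track calls (behavioral events).
function classify(row) {
  const calls = [{
    type: 'identify',
    userId: row.user_id,
    // Plan tier and account tier are state — they belong on the profile.
    traits: { account_tier: row.account_tier, plan: row.plan }
  }];
  // A milestone is an occurrence, not a state — model it as a track event.
  if (row.milestone_reached) {
    calls.push({
      type: 'track',
      userId: row.user_id,
      event: 'Milestone Reached',
      properties: { milestone: row.milestone_reached }
    });
  }
  return calls;
}
```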
Problem
Spark ML upgrade likelihood models produce DoubleType scores, StructType prediction metadata, and ArrayType feature vectors. Customer.io's API requires flat attribute objects and property dictionaries. Converting Spark ML output types requires explicit transformation logic outside the notebook.
Meiro solves it
Pipes transform functions handle Spark type conversion in the JavaScript sandbox. DoubleType scores become float attributes. StructType metadata gets traversed and mapped to Customer.io traits. ArrayType feature vectors get summarized or selectively extracted. The transform layer bridges the type gap.
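A sketch of that type bridging, assuming StructType fields arrive as nested objects and ArrayType fields as arrays once the Delta row is serialized to JSON. The field names are illustrative, not a real model output schema.

```javascript
// Sketch: flatten Spark ML output types into Customer.io-compatible traits.
function flattenPrediction(row) {
  const summary = row.feature_summary || {};  // StructType -> nested object
  const vector = row.feature_vector || [];    // ArrayType -> plain array
  return {
    upgrade_score: Number(row.upgrade_likelihood_score), // DoubleType -> float
    top_feature: summary.top_feature,   // selectively extract one struct field
    feature_count: vector.length        // summarize the vector, don't send it raw
  };
}
```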
Problem
Databricks model training uses internal customer_id or numeric user IDs. Customer.io expects a customer id and optionally email. When these diverge, identify calls create duplicate profiles or miss the right user. Anonymous-to-known merges fail silently.
Meiro solves it
Pipes resolves identity across every identifier type — email, user_id, anonymous ID, Stripe customer ID, CRM contact ID — using deterministic matching. One unified Customer.io profile, regardless of which identifier Databricks model training used.
Problem
Your data science team builds upgrade likelihood models in Databricks. Outputs land in Delta tables. Getting those scores into Customer.io to trigger upgrade campaigns requires a pipeline that doesn't exist out of the box — and breaks when the model output schema changes.
Meiro solves it
Pipes connects directly to the Delta table where model outputs land. Upgrade likelihood scores become Customer.io identify attributes. Users who cross the upgrade threshold receive a track event that triggers the upgrade campaign. When the model schema evolves, you update the transform, not the pipeline.
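The score-to-campaign path can be sketched as one function: every scored user gets an identify call, and users above the 0.65 threshold also get the campaign-triggering track event. The event name and payload shape are illustrative assumptions.

```javascript
// Sketch: identify everyone, track only threshold-crossers.
const UPGRADE_THRESHOLD = 0.65; // matches the example threshold on this page

function toCalls(row) {
  const calls = [{
    type: 'identify',
    userId: row.user_id,
    traits: { upgrade_score: row.upgrade_score }
  }];
  if (row.upgrade_score > UPGRADE_THRESHOLD) {
    calls.push({
      type: 'track',
      userId: row.user_id,
      event: 'Upgrade Threshold Crossed', // illustrative event name
      properties: { score: row.upgrade_score }
    });
  }
  return calls;
}
```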
Customer.io engagement data — email opens, clicks, conversions, campaign events — flows into Pipes via webhook or export. Events land without replacing your existing Customer.io setup.
Events land in Databricks Delta tables automatically. Pipes connects via Unity Catalog — browse schemas, map columns, join with Spark ML model outputs or feature store tables. Databricks stays your source of truth for ML-enriched user intelligence.
Pipes stitches profiles across Customer.io customer ids, email addresses, Databricks customer_ids, and model training identifiers. Deterministic matching with configurable limits. Full lifecycle coverage from anonymous to paid.
Enriched profiles push back to Customer.io via correctly structured identify calls and track events. Spark ML type conversions handled in the transform layer. Delta schema evolution absorbed at the transform layer. Scheduled or real time.
Your data science team builds an upgrade likelihood model using Spark ML in Databricks. The model scores SaaS users on their probability of converting from free to paid, producing a Delta table with customer_id, upgrade_likelihood_score (DoubleType), account_tier, and a StructType feature_summary. Users who score above 0.65 should receive a targeted upgrade campaign in Customer.io.
The problem: the Delta table schema changed last week — the data science team added a confidence_interval field and renamed upgrade_score to upgrade_likelihood_score. Customer.io identifies users by customer id, not the internal customer_id the model uses. The StructType feature_summary needs to be unpacked before it can become a Customer.io attribute.
Without Meiro: You'd write a Databricks job that queries the Delta table using Spark SQL (::DOUBLE casts and DATEADD(DAY, -1, CURRENT_DATE()) for change detection), resolves Customer.io customer id from internal customer_id, converts StructType fields manually, classifies high-scoring users as identify calls (persistent attribute update) versus track calls (milestone event), and pushes via the Customer.io API. Every model schema change requires a pipeline rewrite.
With Meiro Pipes: The Delta table is connected via Unity Catalog. A Spark SQL query with DATEADD(DAY, -1, CURRENT_DATE()) fetches recent model outputs. The Pipes transform handles StructType traversal and type coercion in the JavaScript sandbox — the renamed field gets mapped to the correct Customer.io attribute without a pipeline rewrite. Pipes resolves internal customer_id to Customer.io customer id using the identity graph. Upgrade likelihood scores push as identify attributes. Users above the 0.65 threshold receive a track event that fires the upgrade campaign flow in Customer.io.
Time from Spark ML model output to triggered Customer.io upgrade campaign: hours, not sprints.
Your Databricks Delta table
SELECT
user_id,
email,
upgrade_likelihood_score::DOUBLE AS upgrade_score,
account_tier,
last_active_date
FROM catalog.ml_outputs.upgrade_scores
WHERE updated_at > DATEADD(DAY, -1, CURRENT_DATE())
Pipes transform
// Pipes send function (Event Destination)
async function send(payload, headers) {
return payload.events.map(row => ({
type: 'identify',
userId: row.user_id,
traits: {
email: row.email,
upgrade_score: row.upgrade_score,
account_tier: row.account_tier,
last_active_date: row.last_active_date
}
}));
}
What Customer.io receives
{
"type": "identify",
"userId": "usr_8472",
"traits": {
"email": "[email protected]",
"upgrade_score": 0.82,
"account_tier": "enterprise",
"last_active_date": "2026-03-15"
}
}
No custom API client code. Spark ML type conversion handled in the transform layer — not in Databricks notebooks. When the Delta table schema evolves, you update the transform function, not the pipeline infrastructure.
The standard stack
Meiro Pipes
A reverse ETL tool syncs rows. It doesn't handle Delta Lake schema evolution gracefully, convert Spark ML output types, or resolve lifecycle identity. Meiro Pipes does all of that — and the pipeline that remains is one your team can actually understand.
You want to trigger Customer.io upgrade campaigns, churn prevention flows, and retention sequences based on ML scores your data science team produces in Databricks — signals that exist today but never make it to Customer.io.
You're tired of maintaining the Databricks → Customer.io pipeline. The customer id resolution. The Spark ML type conversion code. The sync job that breaks silently every time a data scientist updates the model output schema.
Native connector. Sends identify calls (user attributes) and track calls (behavioral events) to Customer.io in the correct API format. Handles timestamp formatting, property serialization, and B2B Object API calls with relationship mapping.
Direct connection via Unity Catalog. Supports Spark SQL syntax including ::DOUBLE casts, DATEADD(DAY, -1, CURRENT_DATE()), and Delta table references. Browse catalogs, schemas, and tables. Model warehouse data as identify attributes, track events, or B2B Object records.
Deterministic stitching across Customer.io customer id, email, user_id, anonymous ID, Stripe ID, and CRM IDs. Full lifecycle coverage from anonymous visitor through paid customer. Configurable merge limits to prevent false merges.
Sandboxed JavaScript functions for schema translation. Handle Spark ML type conversions — DoubleType, StructType, ArrayType — to Customer.io-compatible flat JSON. Classify data as identify or track calls. Adapts to Delta Lake schema evolution without pipeline rewrites. 47 allowlisted packages available.
Scheduled or real-time Live Profile Sync. Delta table watermark-based change detection. Push ML-enriched profiles and events to Customer.io via identify and track calls. Full delivery history and retry logic.
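Watermark-based change detection can be sketched as two small steps: fetch only rows past the last watermark, then advance the watermark to the newest timestamp seen. How Pipes actually stores the watermark is internal; the query builder and field names below are assumptions.

```javascript
// Sketch: incremental sync driven by an updated_at watermark.
function buildIncrementalQuery(table, lastWatermark) {
  // lastWatermark is an ISO timestamp recorded after the previous run.
  return `SELECT * FROM ${table} WHERE updated_at > '${lastWatermark}'`;
}

function advanceWatermark(rows, lastWatermark) {
  // The next watermark is the max updated_at seen in this batch
  // (ISO timestamps compare correctly as strings).
  return rows.reduce(
    (max, r) => (r.updated_at > max ? r.updated_at : max),
    lastWatermark
  );
}
```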
Model Databricks company and account records as Customer.io Objects. Pipes handles the Object API endpoint, schema differences, and person-to-object relationship maintenance — so B2B teams can sync account context alongside person records from Delta tables.
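As a rough sketch of the modeling step, an account row might map to an object record with a person relationship like the shape below. The field names and payload structure are illustrative assumptions — Pipes produces the actual Customer.io Object API format.

```javascript
// Hypothetical sketch: an account row mapped to an object record
// with a person-to-object relationship. Not the real API payload.
function toObjectCall(accountRow, personId) {
  return {
    type: 'object',
    objectTypeId: 'account',            // assumed object type
    objectId: accountRow.account_id,
    attributes: {
      name: accountRow.account_name,
      plan: accountRow.plan
    },
    // Relationship: this person belongs to this account.
    relationships: [{ userId: personId }]
  };
}
```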
Delta Lake schema evolution is the first structural problem. Data science teams iterate on models between deployments. Delta Lake handles schema changes automatically. Downstream sync pipelines don't. A renamed field or a new confidence interval column silently breaks the Customer.io identify call that was working last week. A durable integration needs to be schema-aware at the transform layer.
The identify versus track decision is the second structural problem. Persistent user attributes — upgrade likelihood score, account tier, feature adoption flags — belong in identify calls. Behavioral occurrences — milestone completions, API calls, feature activations — belong in track calls. Getting this classification wrong affects segmentation, trigger logic, and billing. Databricks data arrives as rows in Delta tables. The identify/track classification is a modeling decision that has to be made explicitly and maintained when the underlying data model changes.
Spark ML type mapping adds a third layer. Databricks MLflow and Spark ML model outputs carry Spark-native types — DoubleType scores, StructType prediction metadata, ArrayType feature vectors — that Customer.io's API cannot consume directly. Converting these types requires explicit transformation logic that lives outside the Databricks notebook.
Identity reconciliation is the fourth gap. Databricks stores customer records using whatever identifier the model training pipeline used. Customer.io identifies users by a customer id you define, with email as an optional secondary identifier. When these don't reconcile, identify calls create duplicate profiles or miss the intended user.
Connect Databricks and Customer.io through Meiro Pipes. Identity-resolved. Schema-aware. Bidirectional. Start free.