CUSTOMER DATA INFRASTRUCTURE
Iterable expects userId or email, catalog events, and `dataFields` in a specific shape. Databricks has Delta Lake tables with evolving schemas, Spark ML model outputs with ArrayType and StructType fields, and Unity Catalog permission boundaries that add friction at every integration point. Meiro Pipes resolves the identity gap, translates your Delta Lake schema into Iterable's API format, and keeps ML-enriched profiles flowing to Iterable — without a custom pipeline that breaks every time a data scientist adds a column.
Free trial · No credit card · Live in minutes
Identity is the first obstacle. Iterable uses email as the canonical identifier in most deployment configurations. Databricks stores records by internal user ID, Salesforce account ID, or other upstream-assigned identifiers depending on the data source. When these don't resolve to an Iterable email, syncs silently fail, create duplicate profiles, or associate data with the wrong user. No standard connector resolves this at the Databricks layer.
Iterable has two distinct data models. User profiles update via a flat dataFields dictionary. Events use the Track API: eventName, createdAt, userId or email, and a typed dataFields object. Delta Lake schema evolution can shift column types between runs — which causes Iterable to silently reject events with inconsistent property types. Catalog event types like order.purchased have strict schemas used for revenue attribution; wrong shape means the record is ignored for those purposes.
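For illustration, the two payload shapes side by side. Values here are hypothetical; the endpoints are Iterable's users/update and events/track APIs:

POST /api/users/update
{
  "email": "jane@example.com",
  "dataFields": { "account_tier": "enterprise", "churn_risk_score": 0.82 }
}

POST /api/events/track
{
  "email": "jane@example.com",
  "eventName": "feature_used",
  "createdAt": 1742000000,
  "dataFields": { "feature": "dashboards" }
}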
List management is the final obstacle. Iterable is list-first: syncing audiences means computing membership deltas and calling subscribe/unsubscribe APIs separately from profile updates. Getting Iterable behavioral data back into Databricks requires an S3 export pipeline, not a native integration. The complete collect-enrich-activate loop needs multiple tools.
Problem
Data scientists add columns between notebook runs. Delta Lake handles it — your downstream sync doesn't. A churn score field gets renamed, a StructType prediction output gets split, and the Iterable sync that was working last week now sends wrong data or fails silently.
Meiro solves it
Pipes is schema-aware at the transform layer, not at the connector level. When Delta Lake schemas evolve, you update the transform function — not a brittle column mapping. Version-controlled transforms mean schema changes are auditable and deliberate.
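A minimal sketch of what that looks like in practice, using the send-function shape from the walkthrough below (field names illustrative): the transform maps the fields it forwards explicitly, so a column a data scientist adds next week simply doesn't flow until you choose to map it.

// Illustrative Pipes transform: explicit mapping, tolerant of added columns.
// New Delta table columns are ignored until deliberately mapped here,
// and the change ships as a reviewed, version-controlled edit to this function.
async function send(payload, headers) {
  return payload.events.map(row => ({
    email: row.email,
    dataFields: {
      churn_risk_score: Number(row.churn_risk_score), // coerce in case the column type shifted
      account_tier: String(row.account_tier)
    }
  }));
}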
Problem
Spark ML model outputs contain ArrayType, StructType, and MapType fields that have no direct JSON equivalent. Iterable's API requires flat JSON with typed values. Converting Spark ML output types into Iterable-compatible payloads requires transformation logic that lives outside the notebook.
Meiro solves it
Pipes transform functions handle Spark type conversion in the JavaScript sandbox. ArrayType fields become flat arrays. StructType metadata gets traversed and mapped to Iterable dataFields. MapType categorical encodings get resolved. The transform layer bridges the type gap.
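A sketch of that conversion, assuming a hypothetical model-output row where top_features arrives as an array, prediction as a nested struct, and encodings as a map (all field names illustrative):

// Hypothetical Spark ML output row: ArrayType, StructType, and MapType
// fields arrive as JSON arrays and objects and must be flattened for Iterable.
async function send(payload, headers) {
  return payload.events.map(row => ({
    email: row.email,
    dataFields: {
      // ArrayType -> flat array of strings
      top_features: (row.top_features || []).map(String),
      // StructType -> traverse and pull out the scalar fields Iterable needs
      churn_risk_score: row.prediction ? Number(row.prediction.probability) : null,
      model_version: row.prediction ? String(row.prediction.model_version) : null,
      // MapType -> resolve one categorical encoding to a flat value
      plan_segment: row.encodings ? row.encodings["plan_segment"] : null
    }
  }));
}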
Problem
Databricks model training uses internal customer_id or model training IDs. Iterable uses email as the primary identifier in most configurations. When churn scores land in Databricks keyed on customer_id, getting them to the right Iterable profile requires cross-system identity resolution.
Meiro solves it
Pipes resolves identity across email, customer_id, user_id, and any other identifier — using deterministic matching with configurable merge limits. ML model scores reach the correct Iterable profile regardless of which key the model training pipeline used.
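Conceptually, deterministic matching is a precedence walk over known identifiers with a merge cap. This simplified sketch is not Pipes internals; identityGraph.lookup, the priority order, and the limit are all illustrative:

// Simplified illustration of deterministic matching -- not Pipes internals.
// Walk identifiers in priority order until one resolves to a known profile.
const MATCH_PRIORITY = ["email", "user_id", "customer_id"];
const MAX_IDENTIFIERS = 10; // configurable merge limit per stitched profile

function resolveProfile(row, identityGraph) {
  for (const key of MATCH_PRIORITY) {
    const value = row[key];
    if (!value) continue;
    const profile = identityGraph.lookup(key, value); // null if never seen
    if (profile && profile.identifiers.length < MAX_IDENTIFIERS) {
      return profile; // deterministic: first match in priority order wins
    }
  }
  return null; // no match -> create a new profile rather than guess
}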
Problem
Databricks Unity Catalog adds a permission boundary at every integration point. New service principals need to be provisioned. Table grants need to be configured. Each new sync job or connector requires another round of access management before data flows.
Meiro solves it
Pipes uses a single, auditable service principal connection to Databricks. One permission grant, one place to manage access. Unity Catalog ACLs are respected — Pipes only sees what you grant it. No proliferation of service accounts across sync jobs.
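The grant itself is ordinary Unity Catalog SQL. A sketch, with `pipes-sp` standing in for your service principal (in practice you grant to the principal's application ID) and the catalog, schema, and table names matching the example below:

-- One-time, auditable grant to a single Pipes service principal.
-- `pipes-sp` is an illustrative name; grant to your principal's application ID.
GRANT USE CATALOG ON CATALOG catalog TO `pipes-sp`;
GRANT USE SCHEMA ON SCHEMA catalog.analytics TO `pipes-sp`;
GRANT SELECT ON TABLE catalog.analytics.user_churn_scores TO `pipes-sp`;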
Problem
Your data science team builds churn propensity or feature adoption models in Databricks. Outputs land in Delta tables. Getting those scores into Iterable to trigger lifecycle sequences requires a pipeline that doesn't exist out of the box — and breaks when the model output schema changes.
Meiro solves it
Pipes connects directly to the Delta table where Spark ML outputs land. Model scores become Iterable dataFields on user profiles. Qualifying users are subscribed to the correct Iterable list. The lifecycle sequence fires automatically — and the transform adapts to model output schema changes.
Iterable engagement data — email opens, clicks, conversions, custom events — flows into Pipes via webhook or export. Events land without replacing your existing Iterable setup.
Events land in Databricks Delta tables automatically. Pipes connects directly — browse Unity Catalog schemas, map columns, join with Spark ML model outputs or feature tables. Databricks remains your source of truth for ML-enriched user intelligence.
Pipes stitches profiles across Iterable userIds, email addresses, Databricks customer_ids, and model training identifiers. Deterministic matching with configurable limits. No duplicate profiles. No dropped records.
Enriched profiles push back to Iterable with correctly formatted dataFields, properly shaped catalog events, and list membership changes. Spark ML type conversions handled in the transform layer. Scheduled or real time. No custom ETL.
Your data science team builds a churn propensity model using Spark ML in Databricks. The model runs weekly and outputs a Delta table with user_id, churn_risk_score (DoubleType), account_tier (StringType), and a feature importance StructType. Users with a churn_risk_score above 0.7 should trigger a targeted retention sequence in Iterable.
The problem: the Delta table output schema changes between model iterations — data scientists add feature columns. Iterable identifies users by email, not user_id. The feature importance StructType needs to be simplified before it can become a dataFields value.
Without Meiro: You'd write a Databricks notebook or job that queries the Delta table using Spark SQL (with ::DOUBLE casts and DATEADD(DAY, -1, CURRENT_DATE()) for change detection), resolves email from user_id via a join, converts StructType fields manually, calls Iterable's user update API in batches, and subscribes qualifying users to the retention list. Every model iteration that changes the output schema breaks the job.
With Meiro Pipes: The Delta table is connected directly via Unity Catalog. A Spark SQL query with DATEADD(DAY, -1, CURRENT_DATE()) fetches recent model outputs efficiently. The Pipes transform handles StructType traversal and type coercion in the JavaScript sandbox. Pipes resolves user_id to Iterable email using the identity graph. Enriched profiles — including churn_risk_score and account_tier — push to Iterable as dataFields. Qualifying users are subscribed to the retention list automatically. When the model output schema evolves, you update the transform function — not the pipeline infrastructure.
Time from Spark ML model output to live Iterable retention campaign: hours, not sprints.
Your Databricks Delta table
SELECT
user_id,
email,
churn_risk_score::DOUBLE AS churn_risk_score,
account_tier,
last_active_date,
updated_at
FROM catalog.analytics.user_churn_scores
WHERE updated_at > DATEADD(DAY, -1, CURRENT_DATE())

Pipes transform
// Pipes send function (Event Destination)
async function send(payload, headers) {
  // Each row returned by the Delta table query becomes one Iterable user update
  return payload.events.map(row => ({
    email: row.email,        // Iterable's primary identifier
    userId: row.user_id,     // kept as a secondary key
    dataFields: {
      churn_risk_score: row.churn_risk_score,
      account_tier: row.account_tier,
      // Iterable expects ISO 8601 strings for date fields
      last_active_date: new Date(row.last_active_date)
        .toISOString()
    }
  }));
}

What Iterable receives
{
"email": "[email protected]",
"userId": "usr_8472",
"dataFields": {
"churn_risk_score": 0.82,
"account_tier": "enterprise",
"last_active_date": "2026-03-15T00:00:00.000Z"
}
}

No raw API construction. Spark ML type conversion handled in the transform layer, not in Databricks notebooks. Pipes handles identity resolution, schema compliance, and delivery — and adapts when your Delta table schema evolves.
The standard stack vs. Meiro Pipes
A reverse ETL tool syncs rows. It doesn't handle Delta Lake schema evolution gracefully, convert Spark ML output types, or resolve identity across Databricks and Iterable. Meiro Pipes does all of that.
You want to build Iterable campaigns that trigger based on Spark ML churn scores and feature adoption signals — data your data science team produces in Databricks but that never makes it to Iterable today.
You're tired of maintaining the Databricks → Iterable pipeline. The `user_id`-to-email resolution. The Spark ML type conversion code. The sync job that breaks silently every time a data scientist adds a column to the model output Delta table.
Native connector. Pushes user profile updates, custom events, and catalog events (order.purchased, cart.abandon, etc.) to Iterable in the correct API format. Handles dataFields serialization, ISO 8601 date formatting, and list subscribe/unsubscribe calls.
Direct connection via Unity Catalog. Supports Spark SQL syntax including ::DOUBLE casts, DATEADD(DAY, -1, CURRENT_DATE()), and Delta table references. Browse catalogs, schemas, and tables. Model warehouse data as profile attributes, events, or audience definitions.
Deterministic stitching across email, userId, customer_id, phone, and model training identifiers. Configurable maxIdentifiers and merge priority. Resolves the Databricks customer_id → Iterable email gap automatically — even as model training pipelines change.
Sandboxed JavaScript functions for schema translation. Convert Spark ML output types — ArrayType, StructType, MapType — to Iterable-compatible flat JSON. Map fields, coerce types, construct dataFields dictionaries. Adapts to Delta Lake schema evolution without pipeline rewrites. 47 allowlisted packages available.
Scheduled or real-time Live Profile Sync. Push ML-enriched profiles, events, and list membership changes to Iterable. Delta table watermark-based change detection. Full delivery history and retry logic.
Model Databricks-derived ML audiences as Iterable list memberships. Pipes computes membership deltas between runs and issues the correct subscribe/unsubscribe API calls. No manual delta logic required.
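The delta logic itself is simple set arithmetic; a minimal sketch of what Pipes computes on your behalf each run (function name and inputs illustrative):

// Illustrative delta computation between the last synced membership
// snapshot and the current audience query result.
function membershipDelta(previous, current) {
  const prev = new Set(previous);  // emails in the list after the last run
  const curr = new Set(current);   // emails qualifying this run
  return {
    subscribe: [...curr].filter(e => !prev.has(e)),   // newly qualifying
    unsubscribe: [...prev].filter(e => !curr.has(e))  // no longer qualifying
  };
}
// Pipes then issues the matching Iterable list subscribe/unsubscribe calls.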
Delta Lake schema evolution is the first obstacle. Data scientists add columns, rename fields, and change model output schemas between notebook runs. Delta Lake handles this gracefully. Downstream sync jobs don't. Every schema change silently breaks the pipeline — either sending wrong values to Iterable dataFields or failing on type mismatches. A durable integration needs to be schema-aware at the transform layer, not brittle at the column mapping level.
Spark ML type mapping is the second obstacle. Databricks MLflow and Spark ML model outputs carry Spark-native types: ArrayType for lists of model features, StructType for nested prediction metadata, MapType for categorical encodings. Iterable's API requires flat JSON with typed values. Converting these types requires explicit transformation logic that lives outside the Databricks notebook and outside the warehouse.
Identity is the third obstacle. Databricks stores customer records keyed on internal IDs or model training identifiers. Iterable's identity model is built around email (or userId as a secondary key). Resolving the gap between a Databricks customer_id and an Iterable email requires cross-system identity resolution that no standard connector provides.
Unity Catalog permissions add a fourth layer. Every new integration point requires provisioning a service principal and configuring table grants. Fine-grained access control is a feature — but it creates operational overhead that multiplies when sync jobs proliferate.
List management compounds all of this. Iterable is a list-first platform. Getting Databricks-derived ML audiences into Iterable as list memberships means computing current state, calculating deltas, and issuing subscribe and unsubscribe API calls separately from the profile update API. This is not a feature most reverse ETL tools provide.
Connect Databricks and Iterable through Meiro Pipes. Identity-resolved. Schema-aware. Bidirectional. Start free.