CUSTOMER DATA INFRASTRUCTURE
Spark SQL complex types don't map to Braze JSON. Delta Lake schema evolution breaks downstream tools between pipeline runs. Meiro Pipes translates Delta Lake schemas, resolves identity, and syncs enriched profiles in both directions — without Hightouch, Census, or a custom Spark job you'll be debugging at 2am.
Free trial · No credit card · Live in minutes
Identity is the first problem. Databricks stores records keyed on whatever upstream systems assigned — internal user IDs, Salesforce account IDs, emails. Braze expects an external_id. When these don't align, syncs silently drop records or create duplicate profiles. No standard Databricks connector reconciles cross-system identity.
Braze's data model adds two layers. Its event model is strict: every custom event requires a name, ISO 8601 timestamp, and a typed JSON properties object under 100 KB — one event per row, no reserved key names. CDI requires a PAYLOAD column with a handcrafted JSON string. That means writing change-detection logic against Delta Lake's change data feed, handling insert/update/delete cases separately, and rebuilding the payload every time a source schema changes. Delta Lake's schema evolution is useful for analytics; it doesn't help you maintain a Braze payload template.
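The constraints above can be pictured with a short sketch. This is our illustration, not Braze's SDK or the CDI spec: the field names (`external_id`, `name`, `time`, `properties`) follow Braze's documented /users/track event shape, and the size check mirrors the 100 KB limit described above. `buildEventPayload` is a hypothetical helper.

```javascript
// Illustrative only: build one Braze-style custom event row and the kind of
// handcrafted JSON string a CDI PAYLOAD column expects. The helper name and
// validation are ours; the field shape follows Braze's /users/track events.
function buildEventPayload(externalId, name, time, properties) {
  const event = { external_id: externalId, name, time, properties };
  const payload = JSON.stringify({ events: [event] }); // one event per row
  if (Buffer.byteLength(payload, "utf8") > 100 * 1024) {
    throw new Error(`event payload for "${name}" exceeds 100 KB`);
  }
  return payload;
}

const payload = buildEventPayload(
  "usr_8472",
  "subscription_renewed",
  new Date("2026-03-15").toISOString(), // ISO 8601 timestamp, as Braze requires
  { plan: "enterprise", seats: 40 }     // typed JSON properties object
);
```

Every field in that string has to be rebuilt by hand whenever the source schema changes — which is exactly the maintenance burden described above.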
Every attribute sync costs a Braze data point; events count against your contract. Teams overspend because attribute-versus-event tradeoffs happen in SQL rather than at the data model layer. Braze CDI is also one-directional — closing the enrichment loop from Braze behavioral data back through Databricks requires a separate reverse ETL vendor or additional custom plumbing.
Problem
Databricks ArrayType, StructType, and MapType columns are first-class in your Delta tables. Braze CDI can't handle them. Every complex Spark type has to be explicitly mapped to flat JSON before it can sync — and that mapping breaks every time a data scientist updates the feature table schema.
Meiro solves it
Pipes transform functions handle Spark complex type translation in JavaScript — unpack StructType fields, map ArrayType elements, flatten MapType entries into Braze-compatible attribute shapes. When the Delta table schema evolves, you update the transform once, not every downstream query.
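A minimal sketch of what such a transform does, assuming complex Spark types arrive as plain JSON once read from Delta (StructType as a nested object, ArrayType as an array, MapType as an object with arbitrary keys). `flattenRow` and its underscore naming scheme are illustrative, not the Pipes API:

```javascript
// Recursively flatten nested Delta row values into flat, Braze-compatible
// attribute names. Naming convention (underscore-joined paths) is our choice.
function flattenRow(value, prefix = "") {
  const out = {};
  if (Array.isArray(value)) {
    // ArrayType -> indexed keys: tags_0, tags_1, ...
    value.forEach((v, i) => Object.assign(out, flattenRow(v, `${prefix}_${i}`)));
  } else if (value !== null && typeof value === "object") {
    // StructType / MapType -> path-joined keys: scores_churn, ...
    for (const [k, v] of Object.entries(value)) {
      Object.assign(out, flattenRow(v, prefix ? `${prefix}_${k}` : k));
    }
  } else {
    out[prefix] = value; // scalar leaf
  }
  return out;
}

const flat = flattenRow({
  user_id: "usr_8472",
  scores: { churn: 0.82, upsell: 0.31 }, // StructType
  tags: ["enterprise", "priority"],      // ArrayType
});
// -> { user_id: "usr_8472", scores_churn: 0.82, scores_upsell: 0.31,
//      tags_0: "enterprise", tags_1: "priority" }
```

When the feature table gains a nested field, a recursive flattener like this picks it up automatically; an explicit mapping would have to be rewritten.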
Problem
Delta tables support schema evolution as a feature. For Braze CDI, it's a liability. A column added or renamed between pipeline runs silently breaks the CDI sync — change detection stops working, payloads stop matching, and the pipeline goes quiet without alerting anyone.
Meiro solves it
Pipes detects schema changes at the connector level and surfaces them before they cause silent failures. Your transforms are version-controlled and explicit about what they consume — schema drift in the Delta table triggers a review, not a midnight outage.
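The idea can be sketched in a few lines. This is our illustration of a drift check, not the Pipes connector API: compare the columns a version-controlled transform declares against the columns the Delta table currently exposes, and report the difference before syncing.

```javascript
// Illustrative drift check: expected columns come from the version-controlled
// transform; actual columns come from the live Delta table metadata.
function diffSchema(expected, actual) {
  const exp = new Set(expected);
  const act = new Set(actual);
  return {
    added: actual.filter(c => !exp.has(c)),     // new columns -> flag for review
    removed: expected.filter(c => !act.has(c)), // missing columns -> fail loudly
  };
}

const drift = diffSchema(
  ["user_id", "churn_score", "updated_at"],
  ["user_id", "churn_score", "account_tier", "updated_at"]
);
// drift.added -> ["account_tier"], drift.removed -> []
```

A non-empty result is the trigger for review; an empty one means the sync can proceed against a schema the transform actually understands.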
Problem
Databricks has internal user IDs, email addresses, Salesforce IDs from upstream CRM data. Braze has external_id. No standard CDI or pipeline tool reconciles them. Duplicate profiles, dropped records, broken segments.
Meiro solves it
Pipes resolves identity across every identifier type — email, user_id, device_id, phone, CRM ID — using deterministic matching with configurable merge limits. One unified profile, regardless of which system the identifier came from.
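To make the mechanism concrete, here is a toy deterministic stitcher — our simplified illustration, not Meiro's engine. Records sharing any identifier value merge into one profile, up to a configurable identifier cap that guards against runaway merges:

```javascript
// Toy deterministic identity stitching: any shared "type:value" identifier
// links records into the same profile. maxIdentifiers caps profile growth.
function stitch(records, maxIdentifiers = 10) {
  const profiles = [];
  const index = new Map(); // "type:value" -> profile
  for (const rec of records) {
    const keys = Object.entries(rec).map(([t, v]) => `${t}:${v}`);
    let profile = keys.map(k => index.get(k)).find(Boolean);
    if (!profile) {
      profile = { ids: new Set() };
      profiles.push(profile);
    }
    for (const k of keys) {
      if (profile.ids.size >= maxIdentifiers && !profile.ids.has(k)) continue;
      profile.ids.add(k);
      index.set(k, profile);
    }
  }
  return profiles;
}

const merged = stitch([
  { user_id: "u1", email: "a@x.com" },           // Databricks record
  { email: "a@x.com", external_id: "usr_8472" }, // Braze record
]);
// -> one profile carrying all three identifiers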
Problem
Databricks Unity Catalog requires precise permissions at metastore, catalog, schema, and table level for every integration. Granting CDI or reverse ETL access to the right Delta tables means navigating Unity Catalog's full permission hierarchy — and repeating that work for every new dataset or destination.
Meiro solves it
Pipes maintains one managed connection to Databricks with scoped Unity Catalog permissions. Add datasets, adjust access, rotate credentials — all in one place. No per-sync permission configuration scattered across CDI, Hightouch, and custom pipelines.
Problem
Braze CDI pulls Delta table data in. It doesn't push Braze behavioral events back to Databricks for Spark ML / MLflow model retraining, and it can't close the loop — Braze events → Delta table → MLflow model → scored profiles → Braze — without a separate reverse ETL tool.
Meiro solves it
Pipes collects from both directions. Braze behavioral events flow into Databricks. MLflow model outputs enrich profiles. Enriched profiles flow back to Braze via scheduled or real-time sync. One platform, bidirectional, identity-resolved.
Braze engagement data — opens, clicks, conversions, custom events — flows into Pipes via Currents or webhook. Events land without replacing your Braze SDK.
Events land in Databricks Delta tables automatically. Pipes connects directly — browse Unity Catalog, map columns, join with Spark ML feature tables or any Delta source. Databricks stays your source of truth.
Pipes stitches profiles across Braze external_ids, Databricks user_ids, CRM emails, device IDs — any identifier. Deterministic matching with configurable limits. No duplicate profiles. No dropped records.
Enriched profiles push back to Braze in the exact schema Braze expects — Spark complex types translated, Delta schema evolution handled, attributes as JSON payloads, events properly formatted. Scheduled or real-time. No Hightouch. No Census.
Your data science team builds a churn propensity model in Databricks using Spark ML and MLflow. It combines product usage data (from Braze events landed in Delta tables) with commercial data — contract value, support ticket volume, NPS scores — stored as feature tables in Unity Catalog.
The MLflow model writes predictions back to a Delta table: a churn_risk_score for every customer, alongside Spark StructType metadata from the prediction run.
Without Meiro: Getting that score back into Braze means writing a Databricks job that flattens the StructType prediction metadata, formats the score as a JSON payload in Braze CDI's exact shape, navigates Unity Catalog permissions to give CDI access, sets up the sync, and then rebuilds everything when the MLflow model output schema changes between experiment runs. Or paying Hightouch $10K+/yr to handle it.
With Meiro Pipes: The churn_risk_score is modeled as an attribute in Meiro. The transform function handles the StructType metadata, extracts the score, and maps it to Braze attribute names. Pipes resolves identity between the Databricks user_id and the Braze external_id. The enriched profile — including the score — pushes to Braze as a custom attribute in the correct format. Your lifecycle team builds a Canvas that triggers a retention campaign for anyone with churn_risk > 0.7. No StructType flattening. No CDI payload debugging. No Unity Catalog permission archaeology.
Time from MLflow model output to live Braze campaign: hours, not sprints.
Your Databricks Delta table
SELECT
  user_id,
  email,
  churn_score::DOUBLE,
  last_purchase_date,
  account_tier,
  updated_at
FROM analytics.customer_scores
WHERE updated_at > DATEADD(DAY, -1, CURRENT_DATE())
Pipes transform
// Pipes send function (Event Destination)
async function send(payload, headers) {
  return payload.events.map(row => ({
    external_id: row.user_id,
    attributes: {
      churn_risk_score: row.churn_score,
      account_tier: row.account_tier,
      last_purchase_date: new Date(row.last_purchase_date).toISOString()
    }
  }));
}
What Braze receives
{
  "external_id": "usr_8472",
  "attributes": {
    "churn_risk_score": 0.82,
    "account_tier": "enterprise",
    "last_purchase_date": "2026-03-15T00:00:00.000Z"
  }
}
No manual StructType flattening. No `PAYLOAD` column construction. No Unity Catalog permission debugging. Pipes handles Delta Lake schema translation, Spark type mapping, and delivery — and surfaces schema drift before it causes silent failures.
The standard stack
Meiro Pipes
Braze CDI is a data pipe. Hightouch is a sync tool. Neither handles Spark type mapping, Delta schema evolution, or identity resolution. Meiro Pipes does all three — and the pipeline that remains is one you can actually maintain without a Databricks specialist on call.
You want to build a Braze Canvas that targets high-value customers at risk of churning — using Spark ML model outputs and feature table data from Databricks you can't currently access.
You're tired of maintaining the Databricks → Braze pipeline. The StructType flattening SQL. The Unity Catalog permissions archaeology. The CDI config that silently breaks when a data scientist adds a new column to the MLflow output table.
Native connector. Pushes attributes, events, and purchases to Braze in the exact /users/track API format. Handles JSON serialization, ISO 8601 date formatting, and property type validation.
Direct Delta Lake connection with Unity Catalog support. Browse catalogs, schemas, and Delta tables including complex Spark types. Map identifier columns to Meiro identity types. Handles Spark SQL type coercion and schema drift detection between pipeline runs.
Deterministic stitching across email, external_id, user_id, device_id, phone — any identifier. Configurable maxIdentifiers and priority to prevent false merges. Cross-system, not per-tool.
Sandboxed JavaScript functions for schema translation. Handle Databricks StructType, ArrayType, and MapType columns. Flatten Delta table complex types into Braze-compatible payloads. No raw Spark SQL. 47 allowlisted packages available.
Scheduled or real-time Live Profile Sync. Push enriched profiles and segments to Braze or any destination. On-demand exports for backfills. Full delivery history and retry.
Model data before it reaches Braze. Decide at the infrastructure layer what becomes an attribute (costs data points), event (costs events), or event property (free). Stop overspending on Braze's pricing model.
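A back-of-envelope version of that decision, following the cost framing above (attribute syncs consume data points, events count against the events allotment, event properties ride along free). The function and numbers are illustrative; check your own Braze contract:

```javascript
// Illustrative cost model: project monthly consumption for each field
// depending on whether it is synced as an attribute, event, or event property.
function estimateMonthlyCost(fields, profiles) {
  let dataPoints = 0;
  let events = 0;
  for (const f of fields) {
    if (f.as === "attribute") dataPoints += profiles * f.changesPerMonth;
    else if (f.as === "event") events += profiles * f.changesPerMonth;
    // "property" -> attached to an existing event, no separate charge
  }
  return { dataPoints, events };
}

const cost = estimateMonthlyCost(
  [
    { name: "churn_risk_score", as: "attribute", changesPerMonth: 4 },
    { name: "purchase", as: "event", changesPerMonth: 2 },
    { name: "cart_value", as: "property", changesPerMonth: 2 },
  ],
  100_000 // profiles
);
// cost.dataPoints -> 400000, cost.events -> 200000
```

Demoting a frequently-changing field from attribute to event property is the kind of modeling decision that is cheap here and expensive once it is buried in sync SQL.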
Spark's complex types are the first wall. ArrayType, StructType, and MapType are natural in Delta tables built by data science teams. MLflow model output tables frequently include StructType prediction metadata. Feature tables built for Spark ML use nested types throughout. Braze CDI cannot ingest any of this directly. Every complex column requires explicit type mapping before it reaches Braze — and that mapping becomes a maintenance liability the moment any upstream schema changes, which in Databricks happens constantly.
Delta Lake's schema evolution is a feature that becomes a liability at the integration boundary. Delta supports adding columns, changing types, and renaming fields between pipeline runs — that's the design. For Braze CDI, a schema change between runs silently breaks the sync. Payloads stop matching expected fields. Change detection queries return unexpected results. The pipeline goes quiet and nobody notices until a campaign stops updating and someone asks why.
Unity Catalog adds permission complexity at every integration boundary. Granting any external tool access to Delta tables requires navigating the full Unity Catalog hierarchy — metastore, catalog, schema, table — with appropriate grants at each level. As teams add datasets and destinations, this permission overhead compounds.
Identity remains the foundational problem. Databricks stores records with whatever identifiers data engineering assigned — internal user IDs, email addresses, Salesforce account IDs from upstream CRM data. Braze identifies users by external_id. The gap between these systems is where records get dropped or duplicated.
Connect Databricks and Braze through Meiro Pipes. Delta Lake schema-aware. Spark types translated. Identity-resolved. Bidirectional. Start free.