Job Description
Job Title: Databricks Data Lead
Location
- San Rafael, CA - Bay Area, California (Local candidates only)
- Hybrid: 3 days/week onsite at client office in San Rafael, CA
Experience Level
- 8-12 years in data engineering, analytics engineering, or distributed data systems
Role Overview
We are seeking a Databricks Data Lead to support the design, implementation, and optimization of cloud-native data platforms built on the Databricks Lakehouse Architecture. This is a hands-on, engineering-driven role requiring deep experience with Apache Spark, Delta Lake, and scalable data pipeline development, combined with early-stage architectural responsibilities.
The role involves close onsite collaboration with client stakeholders, translating analytical and operational requirements into robust, high-performance data architectures, while adhering to best practices for data modeling, governance, reliability, and cost efficiency.
Key Responsibilities
- Design, develop, and maintain batch and near-real-time data pipelines using Databricks, PySpark, and Spark SQL
- Implement Medallion (Bronze/Silver/Gold) Lakehouse architectures, ensuring proper data quality, lineage, and transformation logic across layers
- Build and manage Delta Lake tables, including schema evolution, ACID transactions, time travel, and optimized data layouts
- Apply performance optimization techniques such as partitioning strategies, Z-Ordering, caching, broadcast joins, and Spark execution tuning
- Support dimensional and analytical data modeling for downstream consumption by BI tools and analytics applications
- Assist in defining data ingestion patterns (batch, incremental loads, CDC, and streaming where applicable)
- Troubleshoot and resolve pipeline failures, data quality issues, and Spark job performance bottlenecks
- Collaborate onsite with client data engineers, analysts, and business stakeholders to:
  - Gather technical requirements
  - Review architecture designs
  - Validate implementation approaches
- Maintain technical documentation covering data flows, transformation logic, table designs, and architectural decisions
- Contribute to code reviews, CI/CD practices, and version control workflows to ensure maintainable and production-grade solutions
Required Skills & Qualifications
- Strong hands-on experience with Databricks Lakehouse Platform
- Deep working knowledge of Apache Spark internals, including:
  - Spark SQL
  - DataFrames/Datasets
  - Shuffle behavior and execution plans
- Advanced Python (PySpark) and SQL development skills
- Solid understanding of data warehousing concepts, including:
  - Star and snowflake schemas
  - Fact/dimension modeling
  - Analytical vs. operational workloads
- Experience working with cloud data platforms on AWS, Azure, or GCP
- Practical experience with Delta Lake, including:
  - Merge/upsert patterns
  - Schema enforcement and evolution
  - Data compaction and optimization
- Proficiency with Git-based version control and collaborative development workflows
- Strong verbal and written communication skills for client-facing technical discussions
- Ability and willingness to work onsite 3 days/week in San Rafael, CA
Nice-to-Have Skills
- Exposure to Databricks Unity Catalog, data governance, and access control models
- Experience with Databricks Workflows, Apache Airflow, or Azure Data Factory for orchestration
- Familiarity with streaming frameworks (Spark Structured Streaming, Kafka) and/or CDC patterns
- Understanding of data quality frameworks, validation checks, and observability concepts
- Experience integrating Databricks with BI tools such as Power BI, Tableau, or Looker
- Awareness of cost optimization strategies in cloud-based data platforms
- Prior life sciences domain experience
Education
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
Why This Role
- Hands-on ownership of Databricks Lakehouse implementations in a real-world enterprise environment
- Direct client-facing exposure with a leading Bay Area organization
- Opportunity to evolve from senior data engineering into formal data architecture responsibilities
- Strong growth path toward Senior Databricks Architect / Lead Data Platform Engineer
Job Tags
Work at office, Local area, 3 days per week