Job Description

Key Responsibilities:

  • Databricks Platform Expertise:
      • Develop, manage, and optimize data pipelines on the Databricks platform.
      • Debug and troubleshoot Spark applications to ensure reliability and performance.
      • Implement best practices for Spark compute and optimize workloads.
  • Python Development:
      • Write clean, efficient, and reusable Python code using object-oriented programming principles.
      • Design and build APIs to support data integration and application needs.
      • Develop scripts and tools to automate data processing and workflows.
  • MongoDB Management:
      • Integrate, query, and manage data within MongoDB.
      • Ensure efficient storage and retrieval processes tailored to application requirements.
      • Optimize MongoDB performance for large-scale data handling.
  • Collaboration and Problem Solving:
      • Work closely with data scientists, analysts, and other stakeholders to understand data needs and deliver solutions.
      • Proactively identify and address technical challenges related to data processing and system design.

Required Qualifications:

  • Proven experience working with Databricks and Spark compute.
  • Proficiency in Python, including object-oriented programming and API development.
  • Familiarity with MongoDB, including querying, data modeling, and optimization.
  • Strong problem-solving skills and ability to debug and optimize data processing tasks.
  • Experience with large-scale data processing and distributed systems.

Preferred Qualifications:

  • Knowledge of other big data technologies such as Delta Lake, Hadoop, or Kafka.
  • Experience with cloud platforms (e.g., AWS, Azure, or GCP).
  • Familiarity with CI/CD pipelines and version control systems such as Git.
  • Strong understanding of data architecture, ETL processes, and data warehousing concepts.