Job Description
Key Responsibilities:
- Databricks Platform Expertise (see the first sketch after this list):
  - Develop, manage, and optimize data pipelines on the Databricks platform.
  - Debug and troubleshoot Spark applications to ensure reliability and performance.
  - Apply Spark best practices to keep compute usage efficient and workloads optimized.
- Python Development (see the second sketch after this list):
  - Write clean, efficient, and reusable Python code using object-oriented programming principles.
  - Design and build APIs to support data integration and application needs.
  - Develop scripts and tools to automate data processing and workflows.
- MongoDB Management (see the third sketch after this list):
  - Integrate, query, and manage data within MongoDB.
  - Ensure efficient storage and retrieval tailored to application requirements.
  - Optimize MongoDB performance for large-scale data handling.
- Collaboration and Problem Solving:
  - Work closely with data scientists, analysts, and other stakeholders to understand data needs and deliver solutions.
  - Proactively identify and address technical challenges in data processing and system design.
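The sketches below are illustrative only, not part of the role's requirements. First, a minimal PySpark example of the kind of Databricks pipeline work described above; the table names (raw_events, analytics.daily_event_counts) and columns are hypothetical.

```python
# Minimal PySpark pipeline sketch; source/target tables and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-events-pipeline").getOrCreate()

# Read a raw source table registered in the workspace catalog.
raw = spark.read.table("raw_events")

# Basic cleanup and a daily aggregation per user.
daily_counts = (
    raw
    .filter(F.col("event_ts").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "user_id")
    .count()
)

# Persist as a Delta table, partitioned by date for efficient downstream reads.
(daily_counts.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.daily_event_counts"))
```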
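Second, a minimal sketch of the object-oriented Python and API work described above. FastAPI is used only as one common framework choice; the posting does not name a framework, and the classes, routes, and in-memory store are hypothetical stand-ins for a real backend.

```python
# Hypothetical OO data-access layer exposed through a small HTTP API.
from dataclasses import dataclass
from typing import List

from fastapi import FastAPI, HTTPException


@dataclass
class Event:
    user_id: str
    event_type: str


class EventStore:
    """Tiny in-memory store standing in for a real data backend."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add(self, event: Event) -> None:
        self._events.append(event)

    def by_user(self, user_id: str) -> List[Event]:
        return [e for e in self._events if e.user_id == user_id]


app = FastAPI()
store = EventStore()


@app.get("/users/{user_id}/events")
def list_events(user_id: str) -> list:
    # Return the user's events, or 404 if none exist.
    events = store.by_user(user_id)
    if not events:
        raise HTTPException(status_code=404, detail="No events for user")
    return [e.__dict__ for e in events]
```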
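Third, a minimal pymongo sketch of the MongoDB querying and indexing work described above; the connection string, database, collection, and field names are placeholders.

```python
# Hypothetical pymongo example: index the hot query path, then read with a projection.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
collection = client["analytics"]["events"]

# Compound index covering the most common query to keep reads fast at scale.
collection.create_index([("user_id", ASCENDING), ("event_date", ASCENDING)])

# Query recent events for one user, returning only the fields the application needs.
recent = (
    collection.find(
        {"user_id": "u123", "event_date": {"$gte": "2024-01-01"}},
        projection={"_id": 0, "event_type": 1, "event_date": 1},
    )
    .sort("event_date", ASCENDING)
    .limit(100)
)

for doc in recent:
    print(doc)
```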
Required Qualifications:
- Proven experience with Databricks and Apache Spark.
- Proficiency in Python, including object-oriented programming and API development.
- Familiarity with MongoDB, including querying, data modeling, and optimization.
- Strong problem-solving skills and ability to debug and optimize data processing tasks.
- Experience with large-scale data processing and distributed systems.
Preferred Qualifications:
- Knowledge of other big data technologies like Delta Lake, Hadoop, or Kafka.
- Experience with cloud platforms (e.g., AWS, Azure, or GCP).
- Familiarity with CI/CD pipelines and version control systems like Git.
- Strong understanding of data architecture, ETL processes, and data warehousing concepts.