Responsibilities:
- Design, develop, and maintain scalable data processing solutions using Python and PySpark.
- Collaborate with data engineers, data scientists, and other cross-functional teams to understand data requirements and implement effective solutions.
- Develop and optimize Spark jobs for efficient data processing and analysis.
- Implement data pipelines for ETL (Extract, Transform, Load) processes using PySpark (see the sketch after this list).
- Work with big data technologies and frameworks to process large-scale datasets efficiently.
- Troubleshoot and optimize existing PySpark applications for performance and reliability.
- Collaborate with database administrators and data architects to ensure data consistency and integrity.
- Stay up to date with developments in PySpark and related technologies and recommend improvements and best practices.
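
For illustration, here is a minimal sketch of the kind of PySpark ETL pipeline this role involves. The file paths, column names, and aggregation logic below are hypothetical placeholders chosen for the example, not specifics of the role.

    # Minimal PySpark ETL sketch; paths and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-example").getOrCreate()

    # Extract: read raw CSV data (placeholder path).
    raw = spark.read.csv("s3://bucket/raw/events.csv", header=True, inferSchema=True)

    # Transform: drop incomplete rows and derive a date column.
    clean = (
        raw.dropna(subset=["user_id", "event_ts"])
           .withColumn("event_date", F.to_date("event_ts"))
    )
    daily_counts = clean.groupBy("user_id", "event_date").agg(
        F.count("*").alias("events")
    )

    # Load: write the result as Parquet, partitioned by date (placeholder path).
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://bucket/curated/daily_counts"
    )

    spark.stop()
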
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Python developer with a focus on PySpark.
- Strong understanding of distributed computing concepts and frameworks.
- Hands-on experience with big data technologies such as Apache Spark.
- Proficiency in SQL and experience with relational databases.
- Familiarity with data modeling and design principles.
- Solid understanding of data processing and ETL concepts.
- Experience with version control systems, such as Git.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration abilities.
- Ability to work both independently and as part of a team.
Preferred:
- Experience with cloud platforms, such as AWS or Azure.
- Knowledge of data warehousing concepts.
- Familiarity with machine learning frameworks and libraries.