Job Title: Python Developer with PySpark
Location: Northampton
Job Type: Contract
About the Role:
We are seeking a skilled Python Developer with expertise in PySpark to join our dynamic team. The ideal candidate will have strong experience in building and optimizing large-scale data processing pipelines and a deep understanding of distributed data systems. You will play a key role in designing and implementing data solutions that drive critical business decisions.
Key Responsibilities:
- Develop, optimize, and maintain large-scale data pipelines using PySpark and Python.
- Collaborate with data engineers, analysts, and stakeholders to gather requirements and implement data solutions.
- Perform ETL (Extract, Transform, Load) processes on large datasets and ensure efficient data workflows.
- Analyze and debug data processing issues to ensure accuracy and reliability of pipelines.
- Work with distributed computing frameworks to handle large datasets efficiently.
- Develop reusable components, libraries, and frameworks for data processing.
- Optimize PySpark jobs for performance and scalability.
- Integrate data pipelines with cloud platforms such as AWS, Azure, or Google Cloud, where applicable.
- Monitor and troubleshoot production data pipelines to minimize downtime and data issues.
Key Skills and Qualifications:
Technical Skills:
- Strong programming skills in Python with hands-on experience in PySpark.
- Experience with distributed data processing frameworks (e.g., Spark).
- Proficiency in SQL for querying and transforming data.
- Understanding of data partitioning, serialization formats (Parquet, ORC, Avro), and data compression techniques.
- Familiarity with Big Data technologies such as Hadoop, Hive, and Kafka (preferred but not required).
Cloud Platforms (Preferred):
- Hands-on experience with AWS services like S3, EMR, Glue, or Redshift.
- Knowledge of Azure Data Lake, Databricks, or Google BigQuery is a plus.
Additional Tools and Frameworks:
- Familiarity with CI/CD pipelines and version control tools (Git, Jenkins).
- Experience with orchestration tools like Apache Airflow or Luigi.
- Understanding of containerization and orchestration tools like Docker and Kubernetes (preferred).
Education and Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in Python programming.
- 4+ years of hands-on experience with PySpark.
- Experience with Big Data ecosystems and tools.