Site Reliability Engineer II (Apache Spark, Python, AWS/GCP)

Location: Bengaluru, India

Job Description

Work Location Options: Hybrid

You Lead the Way.

We've Got Your Back.

With the right backing, people and businesses have the power to progress in incredible ways.

When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other.

Here, you'll learn and grow as we help you create a career journey that's unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally.

At American Express, you'll be recognized for your contributions, leadership, and impact. Every colleague has the opportunity to share in the company's success.

Together, we'll win as a team, striving to uphold our powerful backing promise to provide the world's best customer experience every day.

And we'll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong.

Join Team Amex and let's lead the way together.

How will you make an impact in this role?

We are seeking an experienced Site Reliability Engineer to join our Big Data infrastructure team.

This role focuses on ensuring the reliability, scalability, and performance of our Apache Spark-based data processing systems and broader big data ecosystem.

The ideal candidate will have 5+ years of hands-on experience with distributed systems, data platforms, and SRE practices.

Key Responsibilities:

Infrastructure Management & Reliability

Design, implement, and maintain highly available Apache Spark clusters and big data infrastructure across cloud and on-premises environments

Monitor and optimize performance of distributed data processing workloads, ensuring SLA compliance and minimal downtime

Implement comprehensive monitoring, alerting, and observability solutions for big data pipelines and infrastructure components

Lead incident response and post-mortem analysis for data platform outages, implementing preventive measures to avoid recurrence

Automation & Operations

Develop and maintain Infrastructure as Code (IaC) solutions using tools like Terraform, Ansible, or CloudFormation for big data infrastructure provisioning

Build automated deployment pipelines and CI/CD workflows for Spark applications and data platform components

Create and maintain runbooks, operational procedures, and disaster recovery plans for critical data systems

Implement capacity planning and auto-scaling solutions to handle varying data processing workloads efficiently

Platform Engineering & Optimization

Collaborate with data engineering teams to optimize Spark job configurations, cluster sizing, and resource allocation

Design and implement data platform governance, security, and compliance measures

Evaluate and integrate new big data technologies and tools to improve platform capabilities and performance

Establish best practices for code deployment, configuration management, and system maintenance

Required Skills and Experience:

Technical Expertise

5+ years of experience in Site Reliability Engineering, DevOps, or similar roles with a focus on distributed systems

Deep hands-on experience with Apache Spark (Scala, Python/PySpark) and Spark cluster management (YARN, Kubernetes, or standalone)

Proficiency with big data ecosystem technologies including Hadoop, HDFS, Hive, Kafka, Airflow, and data lakes/warehouses

Strong experience with cloud platforms (AWS, GCP, or Azure) and their big data services (EMR, Dataproc, HDInsight, etc.)

Advanced knowledge of containerization technologies (Docker, Kubernetes) and orchestration in data processing contexts

Infrastructure & Monitoring

Experience with infrastructure monitoring and observability tools (Prometheus, Grafana, ELK stack, Datadog, or similar)

Proficiency in Infrastructure as Code tools (Terraform, CloudFormation, Ansible) for managing big data infrastructure

Strong Linux/Unix system administration skills and experience with configuration management tools

Knowledge of networking, security, and performance tuning in distributed computing environments

Programming & Automation

Proficient in at least one programming language (Python, Scala, Java, or Go) for automation and tooling development

Experience with CI/CD pipelines and version control systems (Git, Jenkins, GitLab CI, or similar)

Strong scripting skills (Bash, Python) for automation and operational tasks

Understanding of software engineering best practices including testing, code review, and documentation

Preferred Qualifications

Experience with stream processing frameworks (Kafka Streams, Apache Flink, or Spark Streaming)

Knowledge of data governance, data quality, and data lineage tools

Familiarity with machine learning operations (MLOps) and model deployment at scale

Experience with database technologies (SQL, NoSQL) and data warehouse solutions

Relevant certifications in cloud platforms or big data technologies

We back you with benefits that support your holistic well-being so you can be and deliver your best.

This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:

Competitive base salaries

Bonus incentives

Support for financial well-being and retirement

Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)

Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need

Generous paid parental leave policies (depending on your location)

Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)

Free and confidential counseling support through our Healthy Minds program

Career development and training opportunities

American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law.

Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.
