Fulcrum Digital Inc.

System Reliability Engineer (Big Data)

21 March 2025
£65,000 / year

Job Description

Who we are

Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, healthcare, and manufacturing.

The Role

  • Plan, manage, and oversee all aspects of the production environment for Big Data platforms.
  • Define strategies for application performance monitoring and optimization in the production environment.
  • Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
  • Ensure that batch production scheduling and processes are accurate and timely.
  • Create and execute queries against big data platforms and relational data tables to identify process issues or perform mass updates (preferred).
  • Handle ad hoc requests from users, such as data research, file manipulation/transfer, and investigation of process issues.
  • Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recovery.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
  • Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and take the lead on DevOps automation and best practices.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Work with a global team spread across tech hubs in multiple geographies and time zones.
  • Share knowledge and explain processes and procedures to others.

Requirements

  • Experience with Linux and knowledge of ITSM/ITIL.
  • Experience with Big Data technologies (Hadoop, Spark, NiFi, Impala).
  • 4+ years of experience running Big Data production systems.
  • Experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Maven (good to have).
  • Solid grasp of SQL or Oracle fundamentals.
  • Experience with scripting, pipeline management, and software design.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to help debug and optimize code and automate routine tasks.
  • Ability to support many different stakeholders; experience dealing with difficult situations and making decisions with a sense of urgency is needed.
  • Appetite for change and for pushing the boundaries of what can be done with automation.
  • Experience working across development, operations, and product teams to prioritize needs and build relationships is a must.
  • Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is desired.
  • Good handle on the change management and release management aspects of software delivery.