Fulcrum Digital Inc.
System Reliability Engineer (Big Data)
Job Description
Who are we
Fulcrum Digital is an agile, next-generation digital accelerator providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, health care, and manufacturing.
The Role
- Plan, manage, and oversee all aspects of a Production Environment for Big Data platforms.
- Define strategies for Application Performance Monitoring and optimization in the production environment.
- Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
- Ensure that batch production scheduling and processes are accurate and timely.
- Create and execute queries against big data platforms and relational data tables to identify process issues or to perform mass updates (preferred).
- Perform ad hoc requests from users, such as data research, file manipulation/transfer, and investigation of process issues.
- Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recovery.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
- Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Work with a global team spread across tech hubs in multiple geographies and time zones.
- Share knowledge and explain processes and procedures to others.
Requirements
- Experience with Linux and knowledge of ITSM/ITIL.
- Experience with Big Data technologies (Hadoop, Spark, NiFi, Impala).
- 4+ years of experience running Big Data production systems.
- Experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Maven (good to have).
- Solid grasp of SQL or Oracle fundamentals.
- Experience with scripting, pipeline management, and software design.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive; ability to help debug and optimize code and automate routine tasks.
- Ability to support many different stakeholders; experience dealing with difficult situations and making decisions with a sense of urgency is needed.
- Appetite for change and for pushing the boundaries of what can be done with automation; experience working across development, operations, and product teams to prioritize needs and build relationships is a must.
- Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort (desired).
- Good handle on the Change Management and Release Management aspects of software delivery.