Oracle
Site Reliability Engineer – AIOps
Job Description
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Career Level – IC5
This is a unique opportunity to shape a cutting-edge AI Ops offering from the ground up, collaborating closely with cloud architects, data engineers, and SREs to transform how we ensure reliability at scale.
Key Responsibilities
1. AI Ops Development
- Design, build, and deploy AI/ML models to analyze large-scale monitoring, logging, and telemetry data for insights and automation.
- Develop algorithms for anomaly detection, root cause analysis, and predictive maintenance.
- Implement AI-powered automation to streamline incident detection, response, and resolution.
2. Data Analytics and Engineering
- Collaborate with data engineering teams to design data pipelines that aggregate and preprocess monitoring and log data from diverse cloud environments.
- Ensure data quality and integrity for AI/ML processing.
- Build dashboards and visualizations to present AI-driven insights to engineering and operations teams.
3. SRE Collaboration and Integration
- Partner with the SRE team to align AI Ops initiatives with reliability goals.
- Integrate AI-driven tools into observability platforms and incident management workflows.
- Provide recommendations for optimizing cloud resources and improving system resilience.
4. Innovation and Strategy
- Research and implement state-of-the-art AI Ops tools and techniques.
- Drive the strategic direction of AI Ops within the organization, advocating for its adoption and expansion.
- Mentor junior engineers in AI/ML methodologies and operational practices.
Qualifications
Required Skills and Experience
• AI/ML Expertise:
3+ years of experience in machine learning, data science, or AI-driven automation.
Proficiency in Python, TensorFlow, PyTorch, or similar AI/ML frameworks.
• Cloud Operations Knowledge:
In-depth experience with cloud platforms like OCI, AWS, Azure, or GCP.
Hands-on knowledge of cloud monitoring and logging tools (e.g., Prometheus, Grafana, OpenSearch).
• Data Handling Skills:
Experience processing and analyzing large-scale datasets using tools like Apache Kafka, Apache Spark, or similar.
• SRE Background:
Familiarity with SRE principles, including incident response, SLIs/SLOs, and resilience engineering.
• Automation Focus:
Proven track record in building automation solutions for cloud operations or DevOps processes.
Preferred Skills
Experience with AI Ops platforms such as Moogsoft, BigPanda, or Dynatrace.
Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Background in observability and telemetry standards (e.g., OpenTelemetry).
Education
• Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
• Relevant certifications (e.g., AWS Certified Machine Learning, GCP Professional Data Engineer) are a plus.
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.
When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.
We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.
Disclaimer:
Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
* Which includes being a United States Affirmative Action Employer