Software Engineer III, Data Platform
Job Description
Remote – United States

Reddit is a community of communities. It’s built on shared interests, passion, and trust and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about.
With 100,000+ active communities and approximately 101M+ daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit [redditinc.com](http://redditinc.com/).
The Data Platform team is looking to hire a Software Engineer who is excited to solve large-scale data platform and efficiency challenges. Reddit’s mission is to bring community and belonging to everyone in the world.
Reddit is a community of communities where people can dive into anything through experiences built around their interests, hobbies, and passions. With more than 50 million people visiting 100,000+ communities daily, it is home to the most open and authentic conversations on the internet. From pets to parenting, skincare to stocks, there’s a community for everybody on Reddit.
For more information, visit redditinc.com. Our community of users generates over 100B analytics events per day, each of which is ingested into a data warehouse that sees 55,000+ daily queries.
We use this data to enable both batch and streaming-based data analytics at the company. Critical teams such as ads, feed generation, and ML experimentation rely on the Data Warehouse to generate revenue for Reddit.
As a software engineer, you will partner with your team and partner teams like machine learning and ads to create and improve scalable, fault-tolerant, self-serve systems. You will also:

- Refine and maintain our data infrastructure technologies to support ML and analytics on data collected from hundreds of millions of users.
- Own the Data Warehouse Platform used for long-term storage of this data and the Airflow Platform used to efficiently orchestrate how this data is processed.
- Take part in building opinionated guardrails to drive improvements in data quality, cost efficiency, and data governance.
- Build automation software that minimizes toilsome work for data users at Reddit and provides a declarative, self-service experience for working with data.
- Handle monitoring and alerting for our core systems and the mechanisms built on top.
If you have a passion for building and maintaining high-quality code, want to improve how Reddit makes strategic decisions at the company level, and are excited about applying engineering best practices to one of the most powerful corpora of data in the world, then this is the team for you!

**In your day-to-day, you can expect to:**

- Collaborate effectively with a team of proficient software engineers to develop and maintain the fundamental platform that powers Reddit’s cutting-edge data infrastructure.
- Engage in the complete data lifecycle at Reddit, participating in the development process and working with one of the world’s most extensive and richest datasets.
- Design, build, and deliver end-to-end data solutions to improve the reliability, scalability, latency, and efficiency of Reddit’s Data Platform.
- Implement automation for key elements of the development process, including data quality, alert management, and critical infrastructure operations.
- Collaborate and share on-call responsibilities, including incident management.

**Who you might be:**

- 2+ years of software engineering experience in a production setting writing clean, maintainable, and well-tested code.
- Proficient in object-oriented programming languages like Python, Scala, Go, or Java.
- Demonstrated expertise in designing and implementing large-scale systems, diligently monitoring project progress, and showcasing proactive leadership as a self-starter on diverse projects.
- Experience working with cloud services, Airflow, Kubernetes, CI/CD, Spark, Flink, and/or modern cloud-based infrastructure.