Redwood Materials is looking for motivated and talented data engineers to help model and manage data assets in a data lake architecture, overseeing the full data lifecycle from ingestion to processing to consumption. The ideal candidate is experienced in both data engineering and in creating, managing, and supporting AWS infrastructure. This is an opportunity to join during a critical growth phase and build greenfield software experiences and capabilities that will have a significant impact on the company's day-to-day operations and its ability to scale.
Responsibilities include:
- Build and manage a data lake in AWS, leveraging and augmenting the existing Lake Formation-based architecture.
- Build and maintain data pipelines from a variety of sources, including streaming datasets, APIs, and other data stores, using PySpark and AWS Glue.
- Create datasets from the data lake to support use cases such as business analytics, dashboards, reports, and machine learning.
- Drive technical decisions on the best ways to serve data consumers.
- Leverage existing AWS architectures and design new ones where needed, using the AWS Cloud Development Kit (CDK).
- Operationalize data workloads in AWS, automating pipelines and implementing appropriate monitoring.
- Work with cross-functional teams to discover business needs and design appropriate data flows.
Desired qualifications:
- Bachelor's degree in computer science, a related technical field, or equivalent practical experience.
- Minimum 3 years of hands-on experience developing data solutions in a modern cloud environment.
- Fluency in Python.
- Experience authoring and maintaining ETL jobs (PySpark experience a plus).
- Experience designing and interacting with relational and non-relational data stores.
- Experience with the AWS ecosystem and with infrastructure-as-code methodologies (CDK a plus).
- Demonstrated ability to manage production data workloads.
The position is full-time. Compensation will be commensurate with experience.