LLM Engineer (Data Platform)

42dot is an AI technology company specializing in autonomous driving and large-scale AI systems.
Pangyo-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, South Korea
Data
Senior Software Engineer
Hybrid
5+ years of experience
AI

Description For LLM Engineer (Data Platform)

42dot is seeking a Senior LLM Engineer for its Data Platform team to develop systems that manage petabyte-scale text, image, and video data for generative model training. The role focuses on building efficient data management systems that integrate with ML training pipelines and reliably supply the data those pipelines need, thereby improving service quality. The position offers the chance to work with cutting-edge AI technologies and data engineering tools while developing technical leadership in LLM training data design and optimization. The ideal candidate has strong experience in distributed systems, cloud platforms, and data engineering, along with the ability to handle large-scale data processing and optimization. By combining data engineering expertise with AI/ML knowledge, the role offers a direct impact on large-scale AI model development.

Responsibilities For LLM Engineer (Data Platform)

  • Design and develop data collection, processing, storage, and utilization pipelines for petabyte-scale text, image, and video data to improve model performance
  • Generate and manage large-scale synthetic data to improve model training quality
  • Define data quality metrics and design/build automated quality verification and monitoring systems
  • Maximize data preprocessing efficiency using industry-standard formats and tools such as Parquet, WebDataset, TorchData, TFRecord, and datatrove
  • Automate version management and labeling processes for continuously changing datasets
  • Develop storage and transmission technologies that ensure data integrity and security while complying with regulations and internal security policies

Requirements For LLM Engineer (Data Platform)

Python
  • 5+ years of experience in software/data engineering
  • Experience with large-scale distributed processing environments (Spark, Hadoop)
  • Development experience in cloud environments (AWS, GCP, Azure)
  • Ability to use cloud-based storage and distributed processing platforms (S3, EMR, Dataproc)
  • Experience in optimizing large datasets through compression, indexing, and sharding
  • Strong software engineering skills, with high proficiency in programming languages such as Python and C++
  • Understanding of model training, preprocessing, and optimization processes, plus strong collaboration skills

Jobs Related To 42dot LLM Engineer (Data Platform)

Senior Software Engineer - Trading Data Fabric

Senior Software Engineer position at Belvedere Trading, focusing on building and managing data and research platforms for high-volume trading operations using cloud technologies.

Senior Data Engineer

Senior Data Engineer position at Titan Wealth's Cape Town Tech Hub, focusing on Azure data solutions with hybrid work options and comprehensive benefits.

Senior Data Engineer - Integrations Services

Senior Data Engineer position at StackAdapt, building scalable data pipelines and integrations for a leading programmatic advertising platform.

Senior Python Data Engineer (Finance Technology)

Senior Python Data Engineer position at Crypto.com, focusing on building financial data pipelines and transformation systems using Python, Airflow, and SQL.

Sr Data Platform Engineer

Senior Data Platform Engineer position at Apollo, working remotely on large-scale data infrastructure and analytics projects with competitive compensation and benefits.