Data Scientist, Data Operations

Responsibilities:

Design, develop, and oversee the implementation of processes and tools to assess the quality of data used in LLM training.
Create metrics and KPIs to evaluate data accuracy, consistency, and relevance.
Work with Engineering teams to develop automated logic checks that identify inconsistencies and potential issues in the training data.
Lead the integration of quality processes into existing data pipelines.
Collaborate with Data Scientists to scrutinize annotation data and develop strategies for continuous data quality improvement.
Establish feedback loops and ensure that data quality aligns with annotation guidelines.
Engage with Machine Learning Engineers to determine how variations in data quality influence LLM performance.
Recommend adjustments to data collection, preprocessing, and utilization based on model performance analysis.
Keep abreast of the latest trends and advancements in data quality management.
Recommend and implement enhancements to our quality processes, tools, and methodologies based on industry best practices.

Requirements:

7+ years of design/test/implementation/consulting experience in data quality management for machine learning model training
Understanding of machine learning principles, especially in the context of NLP and LLMs
Fundamental knowledge of relevant programming languages and tools (e.g., Python, SQL)
Demonstrated experience in project management and cross-functional collaboration
Exceptional analytical, problem-solving, and organizational skills
Proven ability to think strategically about business, product, and technical challenges

Location	Singapore
Discipline	Information & Communications Technology
Job Reference	BBBH135476_1705892800
Salary	Negotiable

Consultant Email	[email protected]
EA License No.	02C3423