Data Scientist (LLM)

Location: Singapore
Discipline: Information & Communications Technology
Job Reference: BBBH135670_1706172406
Salary: Negotiable
Consultant Email: [email protected]
EA License No.: 02C3423


Responsibilities:

  • Design, develop, and oversee the implementation of processes and tools to assess the quality of data used in LLM training.
  • Create metrics and KPIs to evaluate data accuracy, consistency, and relevance.
  • Work with Engineering teams to develop automated logic checks that identify inconsistencies and potential issues in the training data.
  • Lead the integration of quality processes into existing data pipelines.
  • Collaborate with Data Scientists to scrutinize annotation data and develop strategies for continuous data quality improvement.
  • Establish feedback loops and ensure that data quality stays aligned with annotation guidelines.
  • Engage with Machine Learning Engineers to determine how variations in data quality influence LLM performance.
  • Recommend adjustments to data collection, preprocessing, and utilization based on model performance analysis.
  • Keep abreast of the latest trends and advancements in data quality management.
  • Recommend and implement enhancements to our quality processes, tools, and methodologies based on industry best practices.
Requirements:

  • 7+ years of experience in designing, testing, implementing, or consulting on data quality management for machine learning model training
  • Understanding of machine learning principles, especially in the context of NLP and LLMs
  • Fundamental knowledge of relevant programming languages and tools (e.g., Python, SQL)
  • Demonstrated experience in project management and cross-functional collaboration
  • Exceptional analytical, problem-solving, and organizational skills
  • Proven ability to think strategically about business, product, and technical challenges