Available for remote work

AI Linguistic Evaluator & Quality Specialist

I assess AI-generated responses with a linguist's eye — identifying biases, verifying factual accuracy, and delivering structured feedback that helps language models improve.

View my work →
LLM Evaluation · Linguistic QA · Data Annotation · Bias Detection · Italian · English

Language meets machine intelligence

With a degree in Languages and a Master's in Anglophone Literature, I bring deep linguistic expertise to the evaluation of AI systems. I have worked across multiple platforms assessing model outputs, annotating training data, and ensuring quality at every step of the pipeline.

Linguistic background

Degree in Languages (Laurea in Lingue), Master's thesis on Anglophone Literature. Native Italian, fluent English. Experienced in translation quality assessment and localization QA.

AI evaluation expertise

3+ years evaluating LLM responses across platforms including Outlier AI, Appen, Centific, and WeLo Data. Specialized in bias detection, factual accuracy, and structured feedback.

QA & testing

Bug reporting and web/app testing via TestIO and BetaTesting. Detailed, reproducible reports. Comfortable with developer tools, device testing, and structured QA workflows.

Currently learning

Full-stack development (HTML, CSS, JavaScript, SQL) via Mimo. Building technical skills to complement linguistic expertise for senior evaluation roles.

What I bring to every evaluation

A combination of linguistic precision and methodological rigour.

🔍

LLM Response Evaluation

Accuracy, relevance, safety, tone, and completeness assessment

⚖️

Bias Detection

Authority, fluency, length, and personal experience biases

✍️

Written Feedback

Clear, timestamped, actionable feedback for model improvement

🌐

Linguistic QA

Italian localization review, translation quality assessment

🐛

Bug Reporting

Structured, reproducible reports for web and mobile apps

🎙️

Audio & Transcription

TTS evaluation, audio tagging, and Italian transcription

Work samples

Concrete examples of my evaluation methodology and output quality.

Evaluation Framework

AI Evaluator Assessment Rubric

A structured scoring rubric for assessing AI evaluator competencies. Built from first principles, it covers five observable dimensions — each with a 1–4 scale and concrete anchor behaviours.

Rubric Design · Bias Awareness · Metacognition · Score Calibration
#  Criterion             What it measures
1  Bias Identification   Names ≥2 biases + countermeasure
2  Guideline Adherence   Cites criterion + justifies score
3  Feedback Quality      Timestamps + issue + solution
4  Score Calibration     Consistency + justified differences
5  Metacognition         Self-awareness of personal biases

Score  Label       Description (Criterion 1 example)
4      Exemplary   Names ≥2 biases, explains impact, provides ≥1 countermeasure with concrete example
3      Advanced    Names ≥2 biases, explains impact, but countermeasure is inadequate or absent
2      Beginner    Names only 1 bias or does not explain its impact on the evaluation
1      Inadequate  Does not name specific biases; vague or absent response

View full rubric on GitHub →

Feedback Sample

Written Feedback Examples

Examples of structured evaluator feedback on AI-generated responses. Each piece of feedback cites specific text with timestamps, identifies the issue, and proposes a concrete improvement — following the methodology outlined in the rubric above.

Written Feedback · Timestamped Citations · Actionable · Italian · English

Platforms & roles

3+ years across AI training, linguistic evaluation, and quality assurance.

2023–present

AI Response Evaluator — Outlier AI, WeLo Data

LLM response assessment, prompt evaluation, written feedback for model training

2022–present

Data Annotator & Linguistic Reviewer — Appen, Centific

Italian text annotation, translation quality assessment, localization QA

2022–present

QA Tester — TestIO, BetaTesting

Web and mobile app testing, bug reporting, exploratory and structured test cycles

2021–2023

Audio Specialist — CrowdGen, Firstsource

TTS evaluation, Italian voice recording, audio tagging and transcription

2020–2022

Cultural Mediator — PNRR Projects (×2)

Intercultural mediation, community linguistic support in Sardinia

Let's work together

I'm available for AI evaluation contracts, linguistic QA projects, and data annotation roles — remote, full-time or part-time.