Senior+ Data Scientist - ML & Image Generation

Hapiko

Software Engineering, Data Science

Brooklyn, NY, USA

USD 150k-250k / year + Equity

Posted on Jan 28, 2026

Full time

Build and optimize the ML pipeline behind Stickerbox, an AI-powered voice-to-sticker printer for kids.

About Hapiko

Hapiko is a Brooklyn-based company building the future of play.

Description

Founded by Arun Gupta (former CEO of Grailed, which sold to GOAT Group in 2022) and Bob Whitney (Anthropic, NYT Games), we're on a mission to create safe, hands-on AI experiences that fuel kids' imaginations rather than replace them.

Our first product, Stickerbox, is the world's first voice-to-sticker printer: a device that instantly transforms a child's spoken ideas into printable, colorable stickers. We sold out our first run, which shipped for the holidays, and Stickerbox is already being called "one of the first products to make AI feel magical for kids and grounded for parents."

Our $7M funding round was led by Maveron (backers of Lovevery), Serena Ventures, and Ai2 (The Allen Institute). Stickerbox is bringing imagination to life for kids nationwide!

Why are we hiring?

The technical challenge is real. We're running real-time audio transcription, proprietary content safety systems, and custom image generation, all serving thousands of concurrent users with sub-second latency. We're training our own models from scratch, optimizing for kid-friendly aesthetics, and building safety guardrails that actually work. We need a Data Scientist to own data quality, evaluation, and ML optimization across this entire pipeline. You'll work with the team to define what to train on, how to measure success, and how to make our models better every day.

What you'll do

As our first Data Science hire, you'll collaborate with us on:

Model Training & Data

- Build and curate large-scale image datasets for training custom models

- Design annotation pipelines and data quality processes

- Analyze training runs and model outputs to guide iteration

- Work with our team to define what to train on and how to evaluate it

ML Pipeline Optimization

- Optimize our transcription pipeline for accuracy and latency

- Improve image generation quality, prompt adherence, and consistency

- Identify bottlenecks and failure modes across the pipeline

- Run experiments and A/B tests to measure improvements

Safety & Content Moderation

- Refine content safety systems for child-appropriate outputs, and develop new ones

- Build on our evaluation datasets for safety edge cases

- Analyze moderation performance and reduce false positives/negatives

- Stay current on best practices for AI safety in generative systems

Evaluation & Metrics

- Build evaluation frameworks to measure model performance at scale

- Define metrics that correlate with user satisfaction (aesthetic quality, relevance, safety)

- Develop automated evaluation pipelines (LLM-as-judge, CLIP scores, human eval)

- Track experiments and communicate findings to the team

Prompt Engineering

- Optimize prompts for transcription accuracy and image generation quality

- Develop systematic approaches to prompt testing and iteration

- Build prompt templates and guidelines for different use cases

What we're looking for

- 5+ years in data science or applied ML

- Experience optimizing production ML systems

- Strong statistical and analytical skills

- Familiarity with LLMs and image generation models

- Python proficiency; comfortable with PyTorch

- Experience building evaluation frameworks

- Track record of improving ML system performance through data and experimentation

Nice to have

- Experience with content moderation or trust & safety

- Background in speech/audio ML or computer vision

- Experience with human annotation pipelines (Label Studio, Scale AI)

- Familiarity with prompt engineering techniques and LLM-based evaluation

Location: NYC only, on-site in our Brooklyn-based office, close to most major train lines. We're flexible on WFH but like to be in the office the majority of the week.

Salary Range: $150k - $250k base + equity and benefits
