StagePilot

Full Description

Cybergrooming remains a persistent and evolving threat to youth in online environments, highlighting the urgent need for scalable, proactive educational interventions. We propose StagePilot, a reinforcement learning (RL)–based dialogue agent designed to simulate the stage-wise progression of grooming behaviors for prevention-focused training.

The agent is trained via offline deep RL (DRL), where the policy selects the next conversational stage based on a reward function that jointly considers user sentiment and proximity to the final grooming stage. To maintain both interpretability and behavioral realism, stage transitions are constrained to adjacent stages. We evaluate StagePilot across multiple dimensions, including stage completion rate, dialogue length, and sentiment quality, using large language model (LLM)–based simulations to assess goal-directed planning and emotional engagement.
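As a rough illustration of the two mechanisms described above, the sketch below shows an adjacency mask over stage indices and a reward that combines a sentiment score with normalized proximity to the final stage. It is not the authors' implementation. The number of stages, the equal weights, the sentiment range of [-1, 1], the choice to treat "stay in the current stage" as an allowed move, and the function names are all assumptions made for the example.

```python
from typing import List

# Assumption: the source does not state how many stages there are.
NUM_STAGES = 6


def allowed_next_stages(current: int, num_stages: int = NUM_STAGES) -> List[int]:
    """Return the stage indices reachable from `current` under an adjacency constraint.

    Assumption: "adjacent" is taken to mean one step back, staying put,
    or one step forward, clipped to the valid index range.
    """
    candidates = (current - 1, current, current + 1)
    return [s for s in candidates if 0 <= s < num_stages]


def reward(
    sentiment: float,
    next_stage: int,
    num_stages: int = NUM_STAGES,
    w_sentiment: float = 0.5,  # hypothetical weight
    w_progress: float = 0.5,   # hypothetical weight
) -> float:
    """Weighted sum of a sentiment score and proximity to the final stage.

    `sentiment` is assumed to lie in [-1, 1]; proximity is the stage index
    normalized to [0, 1], where 1 means the final stage.
    """
    proximity = next_stage / (num_stages - 1)
    return w_sentiment * sentiment + w_progress * proximity


if __name__ == "__main__":
    stage = 2
    for nxt in allowed_next_stages(stage):
        print(nxt, reward(sentiment=0.4, next_stage=nxt))
```

In an offline setting, a mask like this would restrict the action set that the learned policy scores at each turn, so the agent can never skip a stage regardless of what the value estimates suggest.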

Experimental results show that StagePilot generates realistic, emotionally coherent conversations aligned with grooming dynamics, establishing a foundation for AI-driven educational tools aimed at increasing cybergrooming awareness. Among tested variants, our Implicit Q-Learning plus Advantage-Weighted Actor–Critic (IQL+AWAC) agent achieved the best balance between strategic progression and emotional coherence, reaching the final stage up to 43% more frequently than baselines while maintaining over 70% sentiment alignment. This framework provides a foundation for AI systems that promote digital safety through strategic behavior modeling.

Publications

Under double-blind review.

Authors: Heajun An, Qi Zhang, Minqian Liu, Xinyi Zhang, Sang Won Lee, Lifu Hwang, Pamela Wisniewski and Jin-Hee Cho

Link: Artifacts
