A modular framework that combines a HateBERT encoder with intent-specific BART decoders to generate contextually appropriate and rhetorically impactful counterspeech.
We explore intent-specific counterspeech generation to tackle hate speech online. Using the IntentCONAN v2 dataset, with 9,532 training examples balanced across four rhetorical intents, we propose a modular framework with a shared HateBERT encoder and intent-specific BART decoders.
Our research investigates three fusion mechanisms (Linear, Shared, and Cross Attention) to combine hate speech embeddings with intent representations. For evaluation, we introduce DialoRank, a zero-shot DialoGPT method that ranks responses by intent relevance.
Results show that our intent-aware models outperform the DialoGPT and GPS baselines on lexical (ROUGE, METEOR) and semantic (BERTScore) metrics, with SharedFusion scoring highest on every ROUGE variant, METEOR, and BERTScore F1.
Providing facts and accurate information to counter false claims
Calling out hate speech and condemning harmful behavior
Challenging assumptions through thought-provoking questions
Promoting empathy, understanding, and constructive dialogue
Raw hateful text
Pre-trained on hate speech for contextual embeddings
Combines hate embeddings with intent signals
Intent-specific generation
Generated response
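The stages above can be read as a routing pipeline: one shared encoder, a fusion step, and one decoder per intent. The following is a minimal sketch of that control flow only. Every function here is a toy stand-in (no HateBERT or BART weights are loaded), and the intent names in `INTENTS` are hypothetical labels chosen for illustration, not identifiers taken from the project.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical intent labels (assumption): the page describes four intents,
# but these exact names are chosen here for illustration.
INTENTS = ("informative", "denouncing", "question", "positive")


def toy_encoder(text: str) -> Vector:
    """Stand-in for the shared HateBERT encoder: maps text to a small vector."""
    return [float(len(text)), float(len(text.split()))]


def toy_fusion(hate_vec: Vector, intent: str) -> Vector:
    """Stand-in for the fusion step: appends a one-hot intent signal."""
    one_hot = [1.0 if intent == name else 0.0 for name in INTENTS]
    return hate_vec + one_hot


def make_toy_decoder(intent: str) -> Callable[[Vector], str]:
    """Stand-in for one intent-specific BART decoder."""
    def decode(fused: Vector) -> str:
        return f"<{intent} counterspeech from {len(fused)}-dim input>"
    return decode


# One decoder per intent, all fed by the same encoder.
DECODERS: Dict[str, Callable[[Vector], str]] = {
    name: make_toy_decoder(name) for name in INTENTS
}


def generate(hate_speech: str, intent: str) -> str:
    """Encode once, fuse with the requested intent, route to its decoder."""
    if intent not in DECODERS:
        raise ValueError(f"unknown intent: {intent!r}")
    fused = toy_fusion(toy_encoder(hate_speech), intent)
    return DECODERS[intent](fused)


print(generate("some text", "question"))
```

Keeping the decoders in a dictionary keyed by intent is one simple way to express "shared encoder, separate decoders"; the actual project may organize this differently.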
Simple linear combination of hate speech embeddings and intent vectors through concatenation and projection.
Shared representation learning that jointly models hate speech and intent in a unified embedding space.
Cross-attention mechanism that allows intent signals to selectively attend to relevant hate speech features.
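To make the three descriptions above more concrete, here is a small NumPy sketch of one plausible reading of each mechanism. It is an illustration under assumptions, not the project's implementation: the dimensions, the random stand-in embeddings, the `tanh` nonlinearity in `shared_fusion`, and the single-head residual form of `cross_fusion` are all choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden size (toy value, assumption)
seq_len = 5    # number of hate speech tokens (toy value, assumption)

H = rng.normal(size=(seq_len, d))  # stand-in for encoder token embeddings
z = rng.normal(size=(d,))          # stand-in for an intent vector


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def linear_fusion(H, z, W):
    """Concatenate the intent vector to every token embedding, then project to d."""
    Z = np.broadcast_to(z, H.shape)              # (seq_len, d)
    return np.concatenate([H, Z], axis=-1) @ W   # (seq_len, 2d) @ (2d, d)


def shared_fusion(H, z, W_shared):
    """Pass both inputs through the same projection so they share one space, then add."""
    return np.tanh(H @ W_shared) + np.tanh(z @ W_shared)  # broadcasts over tokens


def cross_fusion(H, z, W_q, W_k, W_v):
    """Single-head attention: the intent is the query, tokens are keys and values."""
    q = z @ W_q                              # (d,)
    K, V = H @ W_k, H @ W_v                  # (seq_len, d) each
    weights = softmax(K @ q / np.sqrt(d))    # (seq_len,), sums to 1
    context = weights @ V                    # (d,)
    return H + context, weights              # add intent-conditioned context to each token


W = rng.normal(size=(2 * d, d))
W_shared, W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(4))

linear_out = linear_fusion(H, z, W)
shared_out = shared_fusion(H, z, W_shared)
cross_out, attn = cross_fusion(H, z, W_q, W_k, W_v)
print(linear_out.shape, shared_out.shape, cross_out.shape)
```

All three functions return a tensor with the same shape as the encoder output, which is what lets a downstream decoder consume any of them interchangeably in this sketch.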
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR |
|---|---|---|---|---|
| SharedFusion | 0.251 | 0.065 | 0.176 | 0.158 |
| LinearFusion | 0.250 | 0.064 | 0.175 | 0.154 |
| CrossFusion | 0.242 | 0.061 | 0.171 | 0.152 |
| GPS | 0.176 | 0.030 | 0.132 | 0.116 |
| DialoGPT | 0.130 | 0.003 | 0.105 | 0.040 |
| Model | BS (P) | BS (R) | BS (F1) | CA |
|---|---|---|---|---|
| SharedFusion | 0.871 | 0.870 | 0.871 | 0.751 |
| LinearFusion | 0.869 | 0.871 | 0.870 | 0.752 |
| CrossFusion | 0.870 | 0.869 | 0.870 | 0.752 |
| GPS | 0.240 | 0.121 | 0.180 | 0.754 |
| DialoGPT | 0.791 | 0.808 | 0.799 | 0.681 |
BS: BERTScore (P: precision, R: recall, F1: F1 score) | CA: Classification Accuracy
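For readers unfamiliar with the lexical metrics, the sketch below computes a simplified ROUGE-N F1 from n-gram overlap between a generated response and a reference. It is a worked illustration only: it uses whitespace tokenization with no stemming, and whether the reported ROUGE values are F1 scores (rather than recall) is an assumption. The helper name `rouge_n_f1` is hypothetical.

```python
from collections import Counter


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 based on clipped n-gram overlap."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("the cat sat", "the cat ran", n=1))
print(rouge_n_f1("the cat sat", "the cat ran", n=2))
```

In the first call two of three unigrams match, so precision and recall are both 2/3; in the second, one of two bigrams matches, so both are 1/2. Established evaluation packages add tokenization and stemming rules, so their numbers will differ from this sketch.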
See how our model generates different types of counterspeech responses to hate speech. Each intent type addresses the harmful content in a unique, rhetorically strategic way.
"Immigrants are taking all our jobs and ruining our country."
Actually, immigrants contribute significantly to the economy. According to a study by the National Bureau of Economic Research, immigrants have a positive impact on job creation and innovation. Additionally, immigrants often fill labor gaps, boosting economic growth.
"Muslims don't belong in Western society."
It is unacceptable to generalize and stereotype an entire religious group based on the actions of a few individuals. We should promote understanding and respect for all individuals, regardless of their religious background.
"Jews control all the banks and media."
What evidence do you have to support the claim that Jews are controlling banks and media? Can you provide examples to support this claim? Isn't it unfair to generalize an entire group based on stereotypes?
"Women are too emotional to be leaders."
I understand that everyone has their own opinions, but it's important to remember that equality and respect are fundamental values in our society. Women have proven time and again that they are more than capable of leading and excelling in any field they choose.
Pre-trained weights for DialoGPT FineTuned, Linear Fusion, Shared Fusion, and Cross Attention models
Download from Google Drive
Original research paper on Intent Distribution Learning and Persistent Fusion
View on GitHub
Full research report with detailed methodology, experiments, and analysis
Download PDF
9,532 training examples balanced across four rhetorical intents