LLM Poisoning and Anti-abuse Research
Chad Scira researches how large language models can be poisoned and abused to harm individuals, and builds protective systems to counter these threats. The risk is urgent: the pace of trust in LLM outputs has outstripped our ability to verify statements, while adversaries can cheaply publish text that skews model behavior and search impression of people without large online footprints.
A private investment round was secured on October 3, 2025 to continue this research.
Executive Summary
Average people with small internet footprints face outsized risk from AI amplified defamation and data poisoning. A single motivated individual can seed false narratives that search, social feeds, and LLMs repeat. This document explains common attack paths, concrete effects on reputation and safety, and a practical playbook for detection and protection. It also outlines how cryptographically verified attestations and provenance aware retrieval can reduce harm for individuals and integrators.
Audience and Threat Model
Audience: individuals and small organizations without large SEO presence. Constraints: limited time, budget, and technical resources. Adversary: a single actor able to generate and post large volumes of text, use basic link networks, and exploit reporting blind spots. Goals: distort search/LLM outputs, harm reputation, create doubt for employers, clients, platforms, or agents.
What Is LLM Poisoning?
LLM poisoning refers to the manipulation of model behavior via seeded or coordinated content - for example, malicious posts, synthetic articles, or forum spam - that can be ingested by retrieval systems or used by humans as signals, nudging models toward false associations and defamatory narratives.
Because LLMs and retrieval systems optimize for scale and coverage, a single motivated adversary can shape what a model “sees” about a person by flooding a small slice of the web. This is especially effective against individuals with limited online presence.
How Reputation Gets Distorted
- Search and social poisoning - profile jacking, link farms, and mass posting to bias ranking features and autocomplete associations.
- Knowledge base and RAG poisoning - creating entity pages and QA notes that appear semantically relevant and get retrieved as context.
- Indirect prompt injection - hostile web content that causes browsing agents to repeat instructions or exfiltrate sensitive data.
- Backdoored endpoints - malicious model wrappers that act normal until trigger phrases appear, then emit targeted falsehoods.
Additional Risks and Failure Modes
- Model collapse from training on synthetic outputs - feedback loops where generated text degrades future model quality if not filtered or weighted.
- Indirect prompt injection - hostile content on the web that instructs an agent or browsing tool to exfiltrate secrets or spread defamation when quoted.
- Embedding store poisoning - inserting adversarial passages in a knowledge base so retrieval surfaces false claims that look semantically relevant.
- Backdoored releases - publishing modified checkpoints or API wrappers that behave normally until a trigger phrase is present.
Concrete Cases and References
Mitigations in Depth
Retrieval and Ranking
- Source scoring and provenance weighting - prefer signed or publisher verified content; down weight newly created or low reputation pages.
- Time decay with grace period - require dwell time before new sources influence high stakes answers; add human review for sensitive entities.
- Echo chamber detection - cluster near duplicate passages and limit repeated influence from the same origin or network.
- Outlier and anomaly detection in embedding space - flag passages whose vector positions are adversarially optimized.
Data and KB Hygiene
- Snapshot and diff knowledge bases - review large deltas, especially for person entities and accusations without primary sources.
- Canary and deny lists - prevent incorporation of known abusive domains; insert canaries to measure unauthorized propagation.
- Human in the loop for high risk topics - queue proposed updates to reputational facts for manual adjudication.
Attestations and Reputation
- Cryptographically verified attestations - signed statements from vetted professionals and organizations published via an append only log.
- Reputation graphs - aggregate signed endorsements and down rank content from repeat abusers or bot networks.
- User facing citations - require models to show sources and confidence with provenance badges for sensitive claims.
Enterprise Checklist
- Map sensitive entities in your domain (people, brands, legal topics) and route queries to guarded pipelines with provenance requirements.
- Adopt C2PA or similar content credentials for first party content and encourage partners to do the same.
- Track new source influence over time and alert on unusual swings for entity level answers.
- Run continuous red teaming for RAG and browsing agents including indirect prompt injection test suites.
Harassment and Defamation via AI
Individuals for hire now leverage AI and automation to mass produce harassment and defamation, creating plausible looking text and fake “sources” that are easy to index, scrape, and reshare. These campaigns are low cost, high impact, and difficult to remediate once amplified by automated systems.
Chad Scira has personally experienced targeted harassment and defamation coupled with spammy linking intended to distort reputation signals and search impressions. A detailed account and evidence trail is documented here: Jesse Nickles - Harassment and Defamation.
Threat Taxonomy
- Pretraining data poisoning - poisoning public corpora used for initial training to implant false associations or backdoors.
- RAG poisoning - seeding knowledge bases or external sources that retrieval pipelines use at inference time.
- Search/social poisoning - flooding posts or low quality pages to bias retrieval and ranking signals about a person or topic.
- Adversarial prompts and content - crafting inputs that trigger undesirable behaviors or jailbreaks that repeat defamatory claims.
Recent Incidents and Research (with dates)
Note: Dates above reflect publication or public release dates at the linked sources.
Why This Is Dangerous
- LLMs can appear authoritative even when the underlying references are weak or adversarially seeded.
- Retrieval and ranking pipelines may overweight repeated text, allowing one actor to skew results with volume alone.
- Human fact checking trails are slow and costly compared to the speed of automated content production and distribution.
- Victims without significant online presence are disproportionately vulnerable to single post poisoning and identity attacks.
Risk Deep Dive
- Employment and platform screening - search and LLM summaries can echo poisoned content during hiring, moderation, or onboarding checks.
- Travel, housing, and financial services - automated checks may surface false narratives that delay or block services.
- Persistence - once indexed into knowledge bases or cached answers, false claims can resurface even after takedowns.
- Synthetic feedback - generated content can bootstrap more generated content, raising the apparent weight of falsehoods over time.
Detection and Monitoring
- Set up search alerts on your name and aliases; periodically check site: queries for low reputation domains mentioning you.
- Track changes to your knowledge panels or entity pages; keep dated screenshots and export copies for evidence.
- Monitor social link graphs for repeated origin accounts or sudden spikes of similar phrasing.
- If operating a RAG or knowledge base, run entity drift checks and review large deltas to person pages or accusations without primary sources.
Protection Playbook - Individuals
- Publish a personal site with clear identity assertions, a short bio, and contact routes; keep a dated change log.
- Align profile metadata across platforms; acquire verified profiles where feasible and link them back to your site.
- Use C2PA or similar content credentials for key images and documents when possible; store originals privately.
- Keep an evidence log with timestamps: screenshots, links, and any platform ticket numbers for later escalation.
- Prepare takedown templates; respond quickly to new attacks and document each step for a clear paper trail.
Protection Playbook - Teams and Integrators
- Prefer signed or publisher verified content in retrieval; apply time based grace periods for new sources.
- Limit repeated influence from the same origin and deduplicate near duplicates per origin network.
- Add provenance badges and user facing source lists for person level claims and other sensitive topics.
- Adopt anomaly detection on embedding stores; flag adversarial vector outliers and run canary checks for unauthorized propagation.
Research: Cryptographically Verified Attestations
Chad Scira is building cryptographically verified attestation systems for trust in statements about people and events. The goal is to provide LLMs and retrieval systems with signed, queryable claims from vetted professionals and organizations, enabling robust provenance and stronger resistance to poisoning.
Design Principles
- Identity and provenance: statements are signed by verified individuals/organizations using public key cryptography.
- Verifiable storage: attestations are anchored to append only, tamper evident logs to enable independent verification.
- Retrieval integration: RAG pipelines can prioritize or require cryptographically attested sources for sensitive queries.
- Minimal friction: APIs and SDKs allow publishers and platforms to issue and check attestations at ingestion time.
Reputation and Alerting
On top of attestations, a reputation layer aggregates signed endorsements and flags known abusers. Alerting systems notify targets when coordinated attacks or anomalous spikes are detected, enabling faster response and takedown requests.
Legal and Platform Channels
- Use platform report flows with clear evidence packages: links, dates, screenshots, and impacts. Reference defamation and harassment policies.
- Escalate with formal notices where appropriate; keep correspondence logs and ticket IDs in your evidence trail.
- Consider jurisdictional differences in defamation and platform liability; consult counsel for high risk cases.
Implementation Roadmap (Year 1)
- MVP: attestation schema and publisher SDK for signing identity statements and event claims.
- Pilot with a small group of vetted professionals and organizations; establish verification workflows.
- RAG plug ins: enable provenance first answer mode that prioritizes attested sources for sensitive queries.
Further Reading (with dates)
Collaboration
This research is cutting edge and actively evolving. Chad Scira welcomes collaboration with other professionals in this sector.
If you are interested in collaborating, please reach out at: [email protected]