Detecting LLM‑Generated 404s on Your Site

I tried this after repeatedly landing on nonexistent pages while using ChatGPT. I thought: is bugsink.com on the receiving end of this too? So I asked ChatGPT how to detect its own hallucinations.

The Proof of Concept

Let’s start with the result, in 3 pictures.

The Setup

First, we ask ChatGPT to generate a bogus URL and click it.

Ask ChatGPT to generate a bogus URL, then try to visit it.

Confirmation

ChatGPT asks for confirmation before proceeding (I’ve also seen this box for links that were not specifically generated to be bogus, but I suppose it’s some kind of indication that ChatGPT “knows” what’s up):

ChatGPT's interface asks for a final confirmation.

The Result

Finally, we land on a 404 page that detects the source to be an LLM, and displays a custom message.

The result: a custom 404 page that detects LLM-generated URLs

Code

Example in Django; put this through your favorite LLM to convert it to your framework of choice. The view checks two signals: the utm_source query parameter and the host of the Referer header.

from django.shortcuts import render
from django.views.decorators.csrf import requires_csrf_token
from urllib.parse import urlparse

AI_DOMAINS = (
    "chat.openai.com", "chatgpt.com",
    "perplexity.ai", "gemini.google.com", "bard.google.com",
    "copilot.microsoft.com", "claude.ai", "mistral.ai",
)

LLM_UTMS = (
    "chatgpt.com", "perplexity.ai",
    "gemini.google.com", "bard.google.com",
    "copilot.microsoft.com", "claude.ai", "mistral.ai",
)

@requires_csrf_token
def page_not_found(request, exception, template_name="404.html"):
    is_llm_referral = False

    utm = request.GET.get("utm_source", "").lower()
    if utm in LLM_UTMS:
        is_llm_referral = True

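    ref = request.META.get("HTTP_REFERER", "")
    if ref:
        try:
            # .hostname lowercases the host and drops any port
            host = urlparse(ref).hostname or ""
        except ValueError:  # malformed Referer header
            host = ""
        # Match the domain itself or a subdomain, not any host that merely ends with it
        if any(host == d or host.endswith("." + d) for d in AI_DOMAINS):
            is_llm_referral = True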

    return render(request, template_name, {
        "is_llm_referral": is_llm_referral,
    }, status=404)
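
For Django to call this view at all, it has to be registered as the 404 handler in the root URLconf. A minimal sketch, assuming the view lives in a hypothetical module named mysite.views:

```python
# urls.py (root URLconf)
# Assumption: "mysite.views" is a placeholder for wherever page_not_found lives.
handler404 = "mysite.views.page_not_found"
```

Note that Django only uses a custom 404 handler when DEBUG is False; with DEBUG = True you get the technical 404 page instead.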

Try it yourself

https://bugsink.com/clearly-made-up-url-by-chatgpt?utm_source=chatgpt.com

The above link is “cheating”, of course, since it has the utm_source parameter set to chatgpt.com. For a real test, follow the screenshots above: ask ChatGPT to generate a bogus URL, then try to visit it.
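
The detection logic can also be checked locally, without a browser or an LLM. What follows is a minimal sketch, not the code running on bugsink.com: is_llm_referral and host_matches are hypothetical helper names, and using one domain list for both the utm_source and the Referer check is a simplifying assumption.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: one shared list for both checks; adjust to taste.
AI_DOMAINS = (
    "chat.openai.com", "chatgpt.com",
    "perplexity.ai", "gemini.google.com", "bard.google.com",
    "copilot.microsoft.com", "claude.ai", "mistral.ai",
)


def host_matches(host, domains=AI_DOMAINS):
    """True if host is a listed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def is_llm_referral(url, referer=""):
    """Hypothetical helper: inspect utm_source of `url` and the Referer host."""
    try:
        query = parse_qs(urlparse(url).query)
        utm = query.get("utm_source", [""])[0].lower()
    except ValueError:  # malformed URL
        utm = ""
    try:
        ref_host = urlparse(referer).hostname or ""
    except ValueError:  # malformed Referer
        ref_host = ""
    return utm in AI_DOMAINS or host_matches(ref_host)


if __name__ == "__main__":
    # The "cheating" link: utm_source alone triggers detection.
    print(is_llm_referral(
        "https://bugsink.com/clearly-made-up-url-by-chatgpt?utm_source=chatgpt.com"
    ))
    # A Referer from a look-alike host should not trigger it.
    print(is_llm_referral("https://bugsink.com/x", referer="https://notchatgpt.com/"))
```

Matching on the exact domain or a dot-prefixed subdomain avoids treating look-alike hosts such as notchatgpt.com as LLM referrals. Keep in mind that both signals are only hints: browsers can strip the Referer header, and anyone can add a utm_source parameter by hand.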

Article write‑up also ChatGPT‑assisted for maximum irony. Scrubbed for bogus, which was not limited to URLs.