Detecting LLM‑Generated 404s on Your Site
I tried this after repeatedly landing on nonexistent pages while using ChatGPT. I thought: is bugsink.com on the receiving end of this too? So I asked ChatGPT how to detect its own hallucinations.
The Proof of Concept
Let’s start with the result, in 3 pictures.
The Setup
First, we ask ChatGPT to generate a bogus URL and click it.
Confirmation
ChatGPT will ask for confirmation before proceeding. (I’ve seen this box for links that were not specifically generated to be bogus too, but I suppose it’s some kind of indication that ChatGPT “knows” what’s up.)
The Result
Finally, we land on a 404 page that detects the source to be an LLM, and displays a custom message.
Code
Example in Django; put this through your favorite LLM to convert to your framework of choice.
from urllib.parse import urlparse

from django.shortcuts import render
from django.views.decorators.csrf import requires_csrf_token

# Hosts of LLM chat interfaces, matched against the Referer header.
AI_DOMAINS = (
    "chat.openai.com",
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)

# utm_source values that are treated as LLM referrals.
LLM_UTMS = (
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)


@requires_csrf_token
def page_not_found(request, exception, template_name="404.html"):
    is_llm_referral = False

    # Signal 1: the utm_source query parameter.
    utm = request.GET.get("utm_source", "").lower()
    if utm in LLM_UTMS:
        is_llm_referral = True

    # Signal 2: the Referer header.
    ref = request.META.get("HTTP_REFERER", "")
    if ref:
        try:
            host = urlparse(ref).netloc.lower()
            # Match the domain itself or a subdomain, not any host
            # that merely ends in the same characters.
            if any(host == d or host.endswith("." + d) for d in AI_DOMAINS):
                is_llm_referral = True
        except ValueError:
            # A malformed Referer is simply not treated as an LLM referral.
            pass

    return render(request, template_name, {
        "is_llm_referral": is_llm_referral,
    }, status=404)
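The detection itself does not depend on Django: it only looks at the utm_source query parameter and the Referer header. As a rough, framework-agnostic sketch (the function name is_llm_referral, the helper _is_ai_host, and the use of a single domain list for both checks are my own simplifications, not part of the view above), the same idea in plain Python could look like this. Note that it matches a listed domain or one of its subdomains, which is slightly stricter than a plain suffix check.

```python
from urllib.parse import urlparse

# Assumption: one list is used for both the utm_source and the Referer check.
AI_DOMAINS = (
    "chat.openai.com",
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)


def _is_ai_host(host):
    """True if host is a listed domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in AI_DOMAINS)


def is_llm_referral(utm_source="", referer=""):
    """Hypothetical helper: guess whether a visit came from an LLM chat UI."""
    if utm_source.strip().lower() in AI_DOMAINS:
        return True
    if not referer:
        return False
    try:
        # .hostname is lowercased and excludes any port or credentials.
        host = urlparse(referer).hostname or ""
    except ValueError:
        # Malformed Referer values are treated as "not an LLM".
        return False
    return _is_ai_host(host)


if __name__ == "__main__":
    print(is_llm_referral(utm_source="chatgpt.com"))
    print(is_llm_referral(referer="https://www.perplexity.ai/search"))
    print(is_llm_referral(referer="https://notchatgpt.com/"))
```

In a view, you would pass in whatever your framework exposes as the utm_source query parameter and the Referer header, and hand the boolean to your 404 template.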
Try it yourself
https://bugsink.com/clearly-made-up-url-by-chatgpt?utm_source=chatgpt.com
The above link is “cheating”, of course, since it has the utm_source parameter set to chatgpt.com. For a real test, do as in the screenshots above: ask ChatGPT to generate a bogus URL, then try to visit it.
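If you do want "cheating" links for quickly checking your own 404 page, they are easy to build. The sketch below is only an illustration: the helper name with_utm_source and the example.com URL are made up, and it uses nothing beyond the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_utm_source(url, source="chatgpt.com"):
    """Hypothetical helper: return url with utm_source set to source."""
    parts = urlparse(url)
    # Keep existing query parameters, but drop any previous utm_source.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "utm_source"
    ]
    query.append(("utm_source", source))
    return urlunparse(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_utm_source("https://example.com/clearly-made-up-url"))
```

Opening the resulting URL in a browser should show the LLM-specific 404 message if the utm_source check is wired up; it says nothing about whether the Referer check works.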
Article write‑up also ChatGPT‑assisted for maximum irony. Scrubbed for bogus, which was not limited to URLs.
