Show HN: I built an SDK that scrambles HTML so scrapers get garbage

12 points • by larsmosr • today at 1:27 PM • 29 comments • view on HN

Hey HN -- I'm a solo dev. Built this because I got tired of AI crawlers reading my HTML in plain text while robots.txt did nothing.

The core trick: shuffle characters and words in your HTML using a seed, then use CSS (flexbox order, direction: rtl, unicode-bidi) to put them back visually. Browser renders perfectly. textContent returns garbage.

On top of that: email/phone RTL obfuscation with decoy characters, AI honeypots that inject prompt instructions into LLM scrapers, clipboard interception, canvas-based image rendering (no img src in DOM), robots.txt blocking 30+ AI crawlers, and forensic breadcrumbs to prove content theft.

What it doesn't stop: headless browsers that execute CSS, screenshot+OCR, or anyone determined enough to reverse-engineer the ordering. I put this in the README's threat model because I'd rather say it myself than have someone else say it for me. The realistic goal is raising the cost of scraping -- most bots use simple HTTP requests, and we make that useless.

TypeScript, Bun, tsup, React 18+. 162 tests. MIT licensed. Nothing to sell -- the SDK is free and complete.

Best way to understand it: open DevTools on the site and inspect the text.

GitHub: https://github.com/obscrd/obscrd

Comments

lich_king • today at 2:21 PM

You break highlighting and copy-and-paste. If I want to share or comment on a piece of your website... I can't. I guess this can be a "feature" in some rare cases, but a major usability pain otherwise.

I'm not a fan of all the documentation and marketing content for this project evidently being AI-generated because I don't know which parts of it are the things you believe and designed for, and which are just LLM verbal diarrhea. For example, your GitHub threat model says this stops "AI training crawlers (GPTBot, ClaudeBot, CCBot, etc.)" - is this something you've actually confirmed, or just something that AI thinks is true? I don't know how their scrapers work; I'd assume they use headless browsers.

➕ show 2 replies

ozgurozkan • today at 2:57 PM

[flagged]

➕ show 1 reply

yesitcan • today at 3:02 PM

The irony of building an anti-AI project but writing your marketing and HN post with AI.

obsrcdsucks • today at 2:33 PM

    function decodeObscrd(htmlOrElement) {
      let root;
      if (typeof htmlOrElement === 'string') {
        root = new DOMParser().parseFromString(htmlOrElement, 'text/html').body;
      } else {
        root = htmlOrElement || document;
      }
    
      const container = root.querySelector('[class*="obscrd-"]');
      if (!container) { return; }
    
      const words = [...container.children].filter(el => el.hasAttribute('data-o'));
      words.sort((a, b) => +a.dataset.o - +b.dataset.o);
    
      const result = words.map(word => {
        const chars = [...word.querySelectorAll('[data-o]')]
          .filter(el => el.querySelector('[data-o]') === null);
        chars.sort((a, b) => +a.dataset.o - +b.dataset.o);
        return chars.map(c => c.textContent).join('');
      }).join('');
    
      console.log(result);
      return result;
    }

➕ show 1 reply

dec0dedab0de • today at 2:21 PM

Reminds me of when AOL broke all the script kiddy tools in 1996 by adding an extra space to the title of the window. I didn't have AOL, but my friend made one of those tools, and I helped him figure it out.

lokimedes • today at 2:30 PM

All I want is an API for my AI, you can ask me for my public key, if you want my human identity verified. The collateral damage of this bot hunting is the emergence of personal AIs. Do we really want that? It feels regressive. (I see the hypocrisy here, we are fighting the scrapers that feed the LLMs that runs our personal agents)

➕ show 1 reply

dwa3592 • today at 2:21 PM

Nice. I have been working on something which utilizes obfuscation, honeypots etc and I have come to a few realizations-

- today you don't have to be a dedicated/motivated reverse engineer- you just need Sonnet 4.6 and let it do the work.

- you need to throw constant/new gotchas to LLMs to keep them on their tows while they try to reverse engineer your website.

➕ show 1 reply

costco • today at 2:36 PM

This is an interesting idea... it'd be a fun side project to implement enough of a CSS engine to undo this

➕ show 1 reply

verse • today at 2:45 PM

couldn't read the hero text on my phone

it's white text and the shader background is also mostly white

➕ show 1 reply

GaryBluto • today at 2:41 PM

> Your content, obscured.

Is that supposed to be a good thing?

➕ show 1 reply

gzread • today at 2:33 PM

Another thing you can do is to install a font with jumbled characters: "a" looks like "x", "b" looks like "n", and so on. Then instead of writing "abc" you write "jmw" and it looks like "abc" on the screen. This has been used as a form of DRM for eBooks.

It breaks copy/paste and screen readers, but so does your idea.

➕ show 1 reply

well_ackshually • today at 2:43 PM

I too, hate people that:

* Copy text

* use a screen reader for accessibility purposes (not just on the web, but on mobile too. Your 'light' obfuscation is entirely broken with TalkBack on Android. individual words/characters read, text is not a single block)

* use an RSS feed

* use reader mode in their browser

If you don't want your stuff to be read, and that includes bots, don't put it online.

> Built this because I got tired of AI crawlers reading my HTML in plain text while robots.txt did nothing.

You could have spent that time working on your project, instead of actively making the web worse than it already is.

➕ show 1 reply

h2zizzle • today at 2:44 PM

I hate everything about this, please use your time on this planet to make life better for people instead of worse.

It is better for a million AI crawlers to get through than for even one search index crawler, that might expose the knowledge on your site to someone who needs it, to be denied.

➕ show 1 reply

kevinsync • today at 2:42 PM

I'm surprised that you don't appear to be using it on obscrd.dev lol

➕ show 1 reply

mystraline • today at 1:29 PM

This is also what Facebook does.

Same result: screen readers and assistive software is rendered useless. Basically is a sign of "I hate disabled people, and AI too"

➕ show 1 reply

larsmosr • today at 1:30 PM

[dead]

alt Hacker News

Show HN: I built an SDK that scrambles HTML so scrapers get garbage

Comments