SumYou Logo
SumYou
March 25, 2026·5 min read·SumYou Team

Change detection explained: content hash, diff, and AI

How does a tool know whether a website has changed — without comparing every word? A look at hashing, diffs and their limitations.

The problem: what does "changed" even mean?

Most people intuitively think of change detection like this: compare the old content with the new content, and if something differs, it's a change. Sounds simple. In practice it's a minefield.

An average website has three layers:

  1. Real content — articles, product info, prices, press releases
  2. Structural chrome — navigation, footer, cookie banner, ads
  3. Dynamic noise — timestamps like "3 minutes ago", live counters, rotating ads

Good change detection has to recognise layer 1 and ignore layers 2 and 3. Otherwise you get constant false alarms.

Method 1: visual pixel diff

Tools like Visualping diff screenshots pixel by pixel.

  • Works on any page, including JavaScript-heavy apps
  • Problem: extremely sensitive. A new ad triggers an alert; a button shifting position triggers another
  • Workaround: you can usually exclude regions — but that's manual work per source

Method 2: DOM-based comparison

Compare the HTML tree (DOM) using CSS selectors that target exactly the elements you care about.

  • Works very precisely if you know what you want
  • Problem: per-page setup is work. As soon as the target site changes its layout, the selectors break
  • Tool: Distill.io is the classic example

Method 3: content hash (how SumYou does it)

The page is first reduced to its editorial content (layer 1 above), and that content is then run through a cryptographic hash — typically SHA-256 or MD5.

Example:

```

Original text: "Apple iPhone 15 Pro - $1,299"

SHA-256: a4f8e2c9d1b7...

New text: "Apple iPhone 15 Pro - $1,199"

SHA-256: 9c2e1f6a4d8b... (completely different hash)

```

  • Pro 1: fast. Hash comparison is O(1) instead of O(n) like a text diff
  • Pro 2: storage-efficient. 64 characters instead of the full page
  • Pro 3: precise. With good content extraction there are no false positives from layout changes
  • Prerequisite: the content extraction has to be solid. If the cookie banner sneaks in, the hash flips on every banner update

Method 4: hash + diff (the full picture)

A hash only tells you "something changed". It doesn't tell you what changed. So SumYou also keeps the original text and computes a classic text diff on change — a list of added and removed lines.

With hash + diff you get:

  • Fast change detection
  • A precise description of what changed
  • Low storage cost

Method 5: AI as a reading layer

Even a diff is tedious for humans. "Line 145 removed, line 146 added — what does that mean?"

SumYou hands the diff to a large language model (GPT-4o-mini, Claude as fallback) and asks it for a 2-3 sentence summary. The model only receives the changed regions, not the whole page, plus a strict prompt:

> "Describe in at most three sentences what changed. Do not speculate. If unclear, say 'unclear'."

That turns a raw diff into a readable sentence like:

> "Apple cut the price of the iPhone 15 Pro by $100 to $1,199."

Where the method hits its limits

Hash-based detection is robust but not perfect:

  1. Timestamps in the content — "updated 3 minutes ago" flips the hash without anything substantial changing. SumYou tries to detect and normalise such stamps during extraction.
  2. Personalised content — if a page returns different content based on geo-IP, the hash can flip between checks even though nobody edited anything
  3. A/B tests — some pages show variant A to 50 % and variant B to the rest. The hash will flip randomly
  4. Real but unimportant changes — for example typo fixes. The hash doesn't distinguish between "comma fixed" and "price halved"

Point 4 is exactly where the AI layer helps: it can classify importance (low / medium / high / breaking) so you don't get an email for every typo fix.

Conclusion

Change detection sounds simple but is a tradeoff between sensitivity (catch everything) and precision (only flag what matters). Hash-based detection with good content extraction is today's gold standard for textual change — combined with an AI layer that makes the result readable.

Try SumYou for free and feel for yourself how hash + diff + AI behaves in practice.

Ready to get started?

Start free with 10 sources and get AI summaries.

Start Free