Skipping unchanged RSS feeds with conditional GET

Say you’re polling a bunch of RSS feeds on a schedule. The naive way to do this is to download every feed, every time, and diff it against what you already have. That works, but it’s wasteful. Most of the time the feed hasn’t changed at all, and you just burned a few hundred kilobytes (and some goodwill with the feed’s server) to find that out.

There’s a much politer way: conditional GET. You ask the server “has this changed since the last time I looked?” and if it hasn’t, you get back a tiny 304 Not Modified with no body.

The two headers that matter

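HTTP gives you two validators for this, and feed servers usually support at least one:

ETag: an opaque version tag for the feed. You send it back in the If-None-Match request header.

Last-Modified: the timestamp of the feed’s last change. You send it back in the If-Modified-Since request header.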

The flow is the same for both. The server gave you these on the last successful fetch, you stored them, and now you hand them back so the server can decide if anything is new.
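To make “the server can decide” a bit more concrete, here’s roughly what that decision looks like from the server’s side. This is a simplified sketch, not any real server’s code: isNotModified and the shape of the current object are made up for illustration, and the ETag list parsing is naive (it splits on commas and does a weak comparison).

```javascript
// Hypothetical helper: models the server's side of a conditional GET.
// `requestHeaders` uses lowercase header names; `current` describes the
// feed as it exists right now: { etag: string, lastModified: Date }.
function isNotModified(requestHeaders, current) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  const ifModifiedSince = requestHeaders["if-modified-since"];

  // If-None-Match takes precedence: when it is present,
  // If-Modified-Since is ignored.
  if (ifNoneMatch) {
    if (ifNoneMatch.trim() === "*") return true;
    // Weak comparison: ignore a leading W/ on either side.
    const strip = (tag) => tag.trim().replace(/^W\//, "");
    return ifNoneMatch.split(",").map(strip).includes(strip(current.etag));
  }

  if (ifModifiedSince) {
    const since = Date.parse(ifModifiedSince);
    if (Number.isNaN(since)) return false; // unparseable date: send the full feed
    return current.lastModified.getTime() <= since;
  }

  return false; // plain GET: always send the body
}
```

When this returns true the server answers 304 with no body; otherwise it sends a normal 200 with the feed.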

Sending the request

On the way out, if I have a stored etag or lastModified for this feed, I add them to the headers. If this is the first time I’ve ever fetched the feed they’re both empty, so it’s just a plain GET.

const headers = { "User-Agent": userAgent };

if (etag) headers["If-None-Match"] = etag;
if (lastModified) headers["If-Modified-Since"] = lastModified;

After that first fetch I always send back whatever the server last gave me.

Handling the response

The first thing I check on the way back is the status code. A 304 Not Modified means nothing changed. There’s no body to read, so I stop right here.

if (res.status === 304) {
  return { notModified: true };
}

That early return is the whole point. No body to download, nothing to parse, no articles to diff. A feed that updates once a day but gets polled every fifteen minutes sees 96 polls a day, and roughly 95 of them end on that line.

If it’s a normal 200, I pull the fresh validators off the response before reading the body, so I have them for next time.

const result = {
  status: res.status,
  etag: res.headers.get("etag"),
  lastModified: res.headers.get("last-modified"),
};

Store them for next time

The trick that makes this actually work is persistence. After each successful fetch I save the returned ETag and Last-Modified against the feed (a column each in my feeds table) and read them back on the next poll. Store both exactly as the server sent them, quotes and all; they’re meant to be echoed back unchanged. No storage, no conditional GET: you’d send a plain GET every time and always get the full feed.
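Here’s a minimal sketch of that save-and-load step. It assumes an in-memory Map standing in for the feeds table, and the names validatorStore, loadValidators and saveValidators are hypothetical. The one design choice worth calling out is that a 304 or an error never wipes what’s already stored.

```javascript
// Hypothetical in-memory stand-in for the feeds table: feedUrl -> validators.
const validatorStore = new Map();

function loadValidators(feedUrl) {
  return validatorStore.get(feedUrl) ?? { etag: null, lastModified: null };
}

// Decide what to keep after a fetch. `result` is assumed to have the
// shape { status, etag, lastModified } used elsewhere in this post.
function saveValidators(feedUrl, result) {
  const stored = loadValidators(feedUrl);

  if (result.status === 200) {
    // Fresh copy: replace both, even if one of them is now missing.
    validatorStore.set(feedUrl, {
      etag: result.etag,
      lastModified: result.lastModified,
    });
  } else if (result.status === 304) {
    // Unchanged: keep the old values unless the 304 carried new ones.
    validatorStore.set(feedUrl, {
      etag: result.etag ?? stored.etag,
      lastModified: result.lastModified ?? stored.lastModified,
    });
  }
  // Any other status (errors and so on): leave the stored values alone.
}
```

In a real setup the Map would be a pair of reads and writes against the feed’s database row, but the rules for what to overwrite stay the same.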

Putting it all together

Here’s the trimmed version of the fetch. Pass in the etag and lastModified you saved last time, send them as conditional headers, and bail out early on a 304.

async function fetchFeed(feedUrl, etag, lastModified) {
  const headers = { "User-Agent": userAgent };

  if (etag) headers["If-None-Match"] = etag;
  if (lastModified) headers["If-Modified-Since"] = lastModified;

  const res = await fetch(feedUrl, { headers });

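  const result = {
    status: res.status,
    etag: res.headers.get("etag"),
    lastModified: res.headers.get("last-modified"),
  };

  // Nothing changed: no body, stop here. A 304 may omit the validators,
  // so fall back to the ones we sent instead of returning null.
  if (res.status === 304) {
    result.notModified = true;
    result.etag ??= etag;
    result.lastModified ??= lastModified;
    return result;
  }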

  result.body = await res.text();
  return result;
}
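
For completeness, here’s one way a single poll step might call it. This is a sketch under assumptions: pollFeed, the feed row shape ({ url, etag, lastModified }) and the onNewBody callback are all hypothetical, and the fetch function is passed in as a parameter so the step can be tried out without touching the network.

```javascript
// Hypothetical poll step. `feed` is a stored row: { url, etag, lastModified }.
// `fetchFeedFn` has the same signature as fetchFeed(feedUrl, etag, lastModified).
// `onNewBody` is whatever parses and diffs the feed.
async function pollFeed(feed, fetchFeedFn, onNewBody) {
  const result = await fetchFeedFn(feed.url, feed.etag, feed.lastModified);

  if (result.notModified) {
    return feed; // 304: nothing to parse, row unchanged
  }

  if (result.status !== 200) {
    return feed; // error or unexpected status: keep the old validators
  }

  await onNewBody(result.body);

  // Return the updated row for the caller to persist.
  return { ...feed, etag: result.etag, lastModified: result.lastModified };
}
```

A scheduler would call something like pollFeed(row, fetchFeed, parseAndDiff) for each feed and write the returned row back.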

One caveat: not every server plays along. Some ignore the headers entirely and hand you the full feed every time, and a few send a brand-new ETag on every request so you never get a 304. You can’t fix that from the client, so just send both validators when you have them and let the server decide. For the servers that do honor it (most of them), you go from downloading a whole feed to downloading a status line and a handful of headers.