Building a Self-Hosted CDN for BSD Cafe Media

Introduction

For just over a year, BSD Cafe's media was hosted in a jail on a physical FreeBSD server with 250 Mbit/s of outgoing bandwidth. To mitigate bandwidth congestion, I initially integrated Cloudflare, serving media (and only media) through a Cloudflare Tunnel. The goal was to georeplicate the media and reduce the load on my server. To do this, the media server had to be on a separate domain managed by Cloudflare, since the DNS for the primary bsd.cafe domain was managed by Bunny.net.

The first step (mainly because I discovered that Bunny's DNS does not support IPv6) was to bring the DNS back in-house using two FreeBSD jails (running on different VPS providers), both powered by PowerDNS. PowerDNS supports LUA records, which would come in handy later.

In line with the principles of self-hosting and data ownership, I decided to remove Cloudflare. I created a dedicated subdomain (media.bsd.cafe) and configured the reverse proxy in front of the jail running Minio to respond to that domain. I also reconfigured Mastodon for the new address, and after some fine-tuning, everything worked seamlessly. However, this led to some bandwidth congestion when media was posted, resulting in slower download speeds for users, especially during peak times. This is because, once content is published and federated servers are notified, they all attempt to download the newly published content - media included - almost simultaneously.

Not wanting to abandon my media server (a dedicated jail with spinning disks, offering 4 TB of storage), I opted for a different approach that I’ll describe here, as it might be useful for similar setups.

While this setup was implemented on FreeBSD, the configuration and tools - Nginx, Varnish, WireGuard and PowerDNS - are compatible with many operating systems, including Linux, with only minor adjustments required.

The Approach: Building a Self-Hosted CDN

The idea is to create reverse proxies with local caching. These proxies would cache the content on the first request and serve it directly afterward. The proxies would be distributed across different regions, and the DNS would route requests to the nearest proxy based on the caller’s location. All this is achieved without relying on external CDNs, using self-managed tools instead.

To establish a direct connection between Minio and the reverse proxies, I configured WireGuard inside the jail. The reverse proxies connect via WireGuard, allowing them to access Minio securely as if they were on the same LAN.

No further changes were needed on the media jail itself.
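As a rough sketch, the WireGuard side on the Minio jail only needs one [Peer] block per reverse proxy. All addresses, keys, and the port below are placeholders, not my actual configuration:

```ini
# /usr/local/etc/wireguard/wg0.conf on the Minio jail (sketch, placeholder values)
[Interface]
Address = 10.100.0.1/24
ListenPort = 51820
PrivateKey = <minio-jail-private-key>

# One peer per reverse proxy
[Peer]
PublicKey = <proxy1-public-key>
AllowedIPs = 10.100.0.2/32

[Peer]
PublicKey = <proxy2-public-key>
AllowedIPs = 10.100.0.3/32
```

Each proxy's own configuration then sets its Endpoint to the media server's public address, and the proxies reach Minio at 10.100.0.1 as if they were on the same LAN.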

Setting Up the Reverse Proxies

I began configuring the reverse proxies, hosted on different providers and running on FreeBSD jails, OpenBSD, and NetBSD (the OpenBSD and NetBSD setups are described in other posts). The choice of Varnish is based on several factors, chiefly its ability to keep the cache in RAM (which means it can run on read-only systems) and to flush the cache remotely. For example, with each change to my blog, I can choose between an immediate flush (for a new article or an error) and waiting for the cache's "natural" expiration (for a typo or other minor, non-critical change).

First, I connected them via WireGuard to the Minio jail (I won’t detail the steps here; I’ve covered similar setups in other posts). Then I set up Nginx and Varnish. A more granular setup would run Varnish in a separate jail, but this way I can move the reverse proxy jails to different hosts with minimal hassle. Currently, these reverse proxies also serve this blog.

Next, I installed and configured Varnish inside the jail:

pkg install varnish7

I created the directory /usr/local/etc/varnish and wrote a custom VCL file to manage this setup, named default.vcl:

vcl 4.1;
import std;

# Backend - it-notes.dragas.net
backend it_notes {
    .host = "itnotesip";
    .port = "itnotesport";
}

# Backend - media.bsd.cafe
backend media_bsd {
    .host = "minioWGip";
    .port = "minioport";
}

# ACL - IPs allowed to purge - it-notes.dragas.net
acl purge_it_notes {
    "a.b.c.d";
}

# ACL - IPs allowed to purge - media.bsd.cafe
acl purge_media_bsd {
    "e.f.g.h";
}

sub vcl_recv {

    # it-notes.dragas.net
    if (req.http.Host == "it-notes.dragas.net") {
        set req.backend_hint = it_notes;
        set req.http.Host = "it-notes.dragas.net";

        # PURGE - it-notes.dragas.net
        if (req.method == "PURGE") {

            std.log("Purge request received for " + req.url);

            if (!std.ip(req.http.X-Real-IP, "0.0.0.0") ~ purge_it_notes) {
                return (synth(405, "Not allowed."));
            }

            if (req.url == "/" || req.url == "/*") {
                ban("req.http.host == " + req.http.host);
                return (synth(200, "Entire cache has been cleared."));
            }
            return (purge);
        }

    # media.bsd.cafe
    } elsif (req.http.Host == "media.bsd.cafe") {
        set req.backend_hint = media_bsd;
        set req.http.Host = "media.bsd.cafe";

        # PURGE - media.bsd.cafe
        if (req.method == "PURGE") {
            if (!std.ip(req.http.X-Real-IP, "0.0.0.0") ~ purge_media_bsd) {
                return (synth(405, "Not allowed."));
            }
            if (req.url == "/" || req.url == "/*") {
                ban("req.http.host == " + req.http.host);
                return (synth(200, "Entire cache has been cleared."));
            }
            return (purge);
        }

    } else {
        # Other domains - 404
        return (synth(404, "Domain not found"));
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pipe);
    }

    return (hash);
}

sub vcl_backend_response {
    # TTL - it-notes.dragas.net
    if (bereq.http.host == "it-notes.dragas.net") {
        if (bereq.url ~ "\.(gif|jpg|jpeg|png|ico|css|js)$") {
            set beresp.ttl = 1w;
            set beresp.grace = 1d;
            set beresp.keep = 7d;
            unset beresp.http.Set-Cookie;
            unset beresp.http.Cache-Control;
            set beresp.http.Cache-Control = "public, max-age=604800";
        } else {
            set beresp.ttl = 15m;
            set beresp.grace = 48h;
            set beresp.keep = 7d;
        }

    # TTL - media.bsd.cafe
    } elsif (bereq.http.host == "media.bsd.cafe") {
        if (bereq.url ~ "\.(mp4|mp3|wav|flac|ogg)$") {
            set beresp.ttl = 1d;
            set beresp.grace = 6h;
            set beresp.keep = 3d;
            unset beresp.http.Set-Cookie;
            unset beresp.http.Cache-Control;
            set beresp.http.Cache-Control = "public, max-age=86400";
        } else {
            set beresp.ttl = 30m;
            set beresp.grace = 12h;
            set beresp.keep = 3d;
        }
    }

    # Remove some headers
    unset beresp.http.Server;
    unset beresp.http.X-Powered-By;
    unset beresp.http.Via;

    return (deliver);
}

sub vcl_deliver {
    # ADD header X-Cache
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }

    std.log("Delivering content for " + req.url + " - Cache: " + resp.http.X-Cache);

    # Remove Varnish headers
    unset resp.http.Via;
    unset resp.http.X-Varnish;

    return (deliver);
}

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}

sub vcl_hit {
    return (deliver);
}

sub vcl_miss {
    return (fetch);
}

sub vcl_purge {
    std.log("Purge executed for " + req.url);
    return (synth(200, "Purge successful"));
}

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    synthetic( {"<!DOCTYPE html>
        <html>
            <head>
                <title>"} + resp.status + " " + resp.reason + {"</title>
            </head>
            <body>
                <h1>Status "} + resp.status + " " + resp.reason + {"</h1>
                <p>"} + resp.reason + {"</p>
                <h3>Guru Meditation:</h3>
                <p>XID: "} + req.xid + {"</p>
                <hr>
                <p>Varnish cache server</p>
            </body>
        </html>
    "} );
    return (deliver);
}

This setup allows Varnish to handle both domains with distinct configurations but within the same cache.
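With the ACLs above in place, a flush is just an HTTP request using the PURGE method from an allowed address. A hypothetical example (the path is a placeholder; the request must originate from an IP listed in purge_it_notes, since Nginx passes the client address to Varnish in X-Real-IP):

```shell
# Purge a single cached object
curl -X PURGE https://it-notes.dragas.net/a-cached-page/

# Purge the entire cache for the domain (matches the "/" rule in vcl_recv)
curl -X PURGE https://it-notes.dragas.net/
```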

To enable Varnish, I updated the /etc/rc.conf file with the following lines, setting a maximum cache size of 2GB:

varnishd_enable="YES"
varnishd_listen="127.0.0.1:8080"
varnishd_config="/usr/local/etc/varnish/default.vcl"
varnishd_storage="default,2000M"

You can now start Varnish:

service varnishd start
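Before pointing Nginx at it, you can check that Varnish answers on the loopback listener and that caching works. The object path below is a placeholder for something Minio actually serves:

```shell
curl -sI -H "Host: media.bsd.cafe" http://127.0.0.1:8080/some-object | grep X-Cache
```

The first request should report X-Cache: MISS; repeating it should report X-Cache: HIT.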

The next step is to create two virtual hosts on Nginx (one for it-notes.dragas.net and one for media.bsd.cafe) that will listen on both IPv4 and IPv6 for HTTP and HTTPS. HTTP connections will be redirected to HTTPS, and incoming HTTPS traffic will be passed to Varnish, which will either return cached data or fetch it from the original server (Minio via WireGuard, for media.bsd.cafe). Let's see the media.bsd.cafe part:

server {
    server_name media.bsd.cafe;

    [...]

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 300;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        chunked_transfer_encoding off;

        expires 12h;
        add_header Cache-Control public;

        add_header X-Cache-Status $upstream_cache_status;
        add_header X-Content-Type-Options nosniff;

        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
        add_header Referrer-Policy "no-referrer-when-downgrade";
        add_header Permissions-Policy "geolocation=(), microphone=(), camera=()";

        proxy_pass http://127.0.0.1:8080;

        [...]

    }

    [...]

}

server {
    if ($host = media.bsd.cafe) {
        return 301 https://$host$request_uri;
    }
    listen 80;
    listen [::]:80;
    server_name media.bsd.cafe;
    return 404;
}

This configuration isn’t complete, but it provides a good idea of how to set up Nginx - each setup will vary. TTL and caching sizes will also differ based on the characteristics of each reverse proxy. For example, one of the proxies has an 8GB cache since I have ample resources there.

Generating certificates is another important aspect. In this case, as the reverse proxies are distributed, they all need to respond to the same addresses. One approach is to generate the certificate on one proxy and distribute it to the others. In my case, I opted to use lego, which, through PowerDNS’s API, adds a DNS record for validation. This way, each reverse proxy can independently generate and renew its certificates when needed.
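As a sketch of that setup, lego's PowerDNS provider reads the API endpoint and key from environment variables. The URL, key, and e-mail address below are placeholders:

```shell
# API of one of the authoritative PowerDNS servers
export PDNS_API_URL="http://192.0.2.10:8081"
export PDNS_API_KEY="secret"

lego --accept-tos --email hostmaster@bsd.cafe --dns pdns --domains media.bsd.cafe run
```

The same invocation with renew instead of run can be scheduled from cron on each proxy, with Nginx reloaded afterwards to pick up the new certificate.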

Configuring DNS for Optimal Routing

Once everything is set up, it’s important to ensure that DNS responds correctly. In my case, I implemented a strategy like this:

  • Track reverse proxies that respond on port 443 (further refinements are possible and will be done later).
  • Return the closest reverse proxy based on the client’s IP address.

Unfortunately, the PowerDNS package on FreeBSD does not include GeoIP support by default, but I have my poudriere instance ready to build and install packages with the options I need. Alternatively, you could compile it within the jail using the ports tree.

After that, I installed the geoipupdate package (which requires a free license from MaxMind), updated the IP list, and configured PowerDNS to use the GeoIP database. I added the GeoIP backend alongside the existing SQLite3 backend and specified the database to use:

launch=gsqlite3,geoip
geoip-database-files=mmdb:/usr/local/share/GeoIP/GeoLite2-City.mmdb

Finally, I created a LUA record to return the correct address:

testclosest.bsd.cafe   60      IN      LUA     A    "ifportup(443, {'proxy1ip', 'proxy2ip', 'proxy3ip'}, {selector='pickclosest'})"
testclosest.bsd.cafe   60      IN      LUA     AAAA "ifportup(443, {'proxy1ip6', 'proxy2ip6', 'proxy3ip6'}, {selector='pickclosest'})"
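To check the routing, you can query the record from machines in different regions and confirm that each one gets the address of its nearest healthy proxy:

```shell
dig +short testclosest.bsd.cafe A
dig +short testclosest.bsd.cafe AAAA
```

If a proxy stops answering on port 443, ifportup drops it from the candidate list and clients are routed to the remaining ones.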

And voilà! We now have a small, self-hosted CDN, keeping full control and ownership of our data. Adding a new reverse proxy is straightforward — simply clone an existing proxy, update the WireGuard configuration (adding a peer on the Minio jail and changing the keys on the new proxy), and add it to the DNS.

Happy caching!