Quick comparison: Plausible vs logs
Tags
#selfhosting
About a month ago, I started collecting website usage data using both Plausible.io and logs generated by Caddyserver, my reverse proxy. The goal was to compare the data sources, just like Marko Saric did in a post on the Plausible blog.
Here's a quick overview of the results. For more details, read the post mentioned above: my results are nearly identical, and Marko does a great job of explaining them.
Results
Quantitative data
The table below summarizes key metrics computed by both Plausible and GoAccess (based on Caddyserver logs). The data was collected between June 13th and July 13th.
Metric | Plausible.io | Logs + GoAccess | Δ factor |
---|---|---|---|
Visitors | 32.1k | 76.9k | x2.4 |
Pageviews | 44.5k | 468.6k | x10.5 |
Bandwidth | - | 16.6 GiB | - |
Just as Marko noticed, the logs show much higher numbers of visitors and pageviews. This is likely due to crawlers and bots, which show up in the logs but do not run JavaScript and are therefore not picked up by Plausible.
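The Δ factor column in the table is simply the log-based figure divided by the Plausible figure. As a minimal sketch, the helper below (`delta_factor` is a name made up for this illustration) reproduces the two factors from the rounded values in the table:

```shell
# Hypothetical helper: ratio of the log-based figure to the Plausible figure,
# rounded to one decimal. Both arguments are in thousands, as in the table.
delta_factor() {
  awk -v logs="$1" -v plausible="$2" 'BEGIN { printf "x%.1f\n", logs / plausible }'
}

delta_factor 76.9 32.1    # visitors  -> x2.4
delta_factor 468.6 44.5   # pageviews -> x10.5
```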
I could compare other metrics like referrers and top pages, but again, I suggest you read the post on the Plausible blog.
I'd like to add that the logs also provide information about bandwidth usage and which files are downloaded the most. This allows you to make informed decisions when optimizing caching and file loading. Plausible can't help with this data; you need logs for it.
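GoAccess reports this out of the box, but the same question can also be asked of the raw logs. The sketch below is not part of my setup; it assumes Caddy's JSON access log format with a numeric `size` field (response bytes) and a `request.uri` field, and the function name `top_files_by_bytes` is made up for the example. It sums the bytes served per URI and lists the heaviest ones first:

```shell
# Hypothetical helper: read Caddy JSON access-log lines on stdin and print
# "<total bytes><TAB><uri>", largest total first.
# Assumes each entry has a numeric .size and a .request.uri field.
top_files_by_bytes() {
  jq --raw-output '
      select(.size != null and .request.uri != null)
      | [.size, .request.uri] | @tsv' \
    | awk -F '\t' '
        { bytes[$2] += $1 }                      # accumulate bytes per URI
        END { for (uri in bytes) printf "%.0f\t%s\n", bytes[uri], uri }' \
    | sort -k1,1 -n -r
}

# Usage, with the same log location as in the script under Methodology:
#   zcat -f logs/access* | top_files_by_bytes | head -n 20
```

Streaming through `jq` and `awk` avoids loading a whole month of logs into memory, which slurping everything into a single `jq` array would do.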
Qualitative data
Plausible was more convenient to use than GoAccess: the Plausible dashboard loads in seconds, whereas GoAccess took 3 minutes to process the logs and generate the results.
Conclusion
Both methods have advantages and disadvantages. Plausible gives fast and precise results but potentially impacts page load (although minimally). Server logs don't impact page load and can provide bandwidth stats, but they inflate the numbers because of the traffic noise generated by search engines, crawlers and bots. Personally, I will continue using both for the foreseeable future.
Methodology
Plausible
Visit the Plausible.io website and simply look at the website's stats.
Caddy logs
Logs were collected using the following snippet in the Caddyfile:
log {
    output file /var/log/caddy/access.log {
        roll_size 100MiB
        roll_keep 10
        roll_keep_for 2160h
    }
}
GoAccess
As GoAccess cannot read Caddy logs directly, a small bash script is needed:
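# Midnight UTC 30 days ago, as a date and as a Unix timestamp
start_date=$(date -u --date="30 days ago" +"%Y-%m-%d")
start_ts=$(date -u --date="$start_date" +%s)
# Convert the Caddy JSON logs to CSV and feed them to GoAccess
goaccess <(zcat -f logs/access* | jq --raw-output --argjson start_ts "$start_ts" '
    # Strip the trailing ":port" so the address can be compared below
    .request.remote_addr |= sub(":[0-9]+$"; "") |
    # Exclude specific addresses (1.1.1.1 and 2.2.2.2 are placeholders)
    select(.request.remote_addr != "1.1.1.1") |
    select(.request.remote_addr != "2.2.2.2") |
    # Keep only entries from the last 30 days
    select(.ts >= $start_ts) |
    [
        .common_log,
        .request.headers.Referer[0] // "-",
        .request.headers."User-Agent"[0],
        .duration
    ] | @csv') \
    --log-format='"%h - - [%d:%t %^] ""%m %r %H"" %s %b","%R","%u",%T' \
    --time-format='%H:%M:%S' \
    --date-format='%d/%b/%Y'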
This was adapted from the bash script described by Alessandro in this blog post.