Optimizing weechat log usage
weechat is a curses-based IRC client and a very sane alternative to irssi. For those with IRC bouncers, or those who spend a great deal of time on IRC, popular channels can start to accrue rather large logs. By default, those logs are quite verbose and are never rotated. In my case, weechat’s logs were taking up 2GB worth of disk space on my VPS.
Investigating the issue
My first question, when my monitoring showed that disk usage had passed the 90% mark, was “what’s growing and can it be trimmed?” After using du to find which user directories were the issue, I was able to pinpoint a couple of culprits. The biggest one was weechat.
$ du -h -d1 ~ ... elided ... 2.0G ~/.weechat ... elided ...
It makes sense that weechat’s logs would be large, since it’s constantly running on this machine. Still, my next question was “which channels are taking up the most space?”
$ ls -lhS | head total 2.0G -rw-rw-r-- 1 irc git 574M Nov 2 12:14 irc.freenode.##news.weechatlog -rw-rw-r-- 1 irc git 335M Nov 2 12:14 irc.freenode.##programming.weechatlog -rw-rw-r-- 1 irc git 149M Apr 9 2016 irc.freenode.##c++.weechatlog -rw-rw-r-- 1 irc git 102M Nov 2 12:20 irc.freenode.##slackware.weechatlog -rw-rw-r-- 1 irc git 93M Nov 2 12:22 irc.freenode.#clojure.weechatlog -rw-rw-r-- 1 irc git 78M Nov 2 12:26 irc.freenode.#osdev.weechatlog -rw-rw-r-- 1 irc git 59M Apr 9 2016 irc.freenode.##opengl.weechatlog -rw-rw-r-- 1 irc git 52M Nov 2 12:26 irc.freenode.#nixos.weechatlog -rw-rw-r-- 1 irc git 52M Apr 8 2016 irc.freenode.#lisp.weechatlog -rw-rw-r-- 1 irc git 51M Apr 9 2016 irc.freenode.##c++-general.weechatlog
Much to my initial surprise, the ##news channel log was significantly larger than anything else. The freenode ##news channel contains various bots for sharing links to news as it comes out. In fact, it’s not actually something I care to log, since it’s just links to news articles.
Asking weechat to ignore certain buffers
Given that the ##news channel logs aren’t going to be useful to me, I then looked to disable logging on that channel entirely.
/set logger.level.irc.freenode.##news 0 /save
From there, I was safe to delete the ##news weechat log. That brought the disk usage down to 1.4G already.
Choosing what you log
After inspecting the documentation for weechat’s logger plugin,
I saw that the default logging level for each channel was
9. Here’s a breakdown
of the levels, from the documentation:
1: user message, notice, private 2: nick change 3: server message 4: join/part/quit 9: all other messages
In my logs, I don’t care about join/part/quit events, and whatever other messages there may be; dropping the level down to 3 for every channel seems perfectly reasonable.
/set logger.level.irc 3
Cleaning up existing logs
Once weechat was logging more minimally, and not logging specific channels,
there was still the issue of 1.4GB worth of logs, all of which had been formed
using weechat’s level
9 setting. So I investigated how much that difference
actually was, knowing that my
»»» and my
# Total size of the largest log (335MB) $ wc -c irc.freenode.##programming.weechatlog 350526327 irc.freenode.##programming.weechatlog # Total number of joins/parts/quits in that file (1.3 million) $ egrep -c "»»»|«««" irc.freenode.##programming.weechatlog 1363355 # Total size of the log without those joins/parts/quits (208MB) $ wc -c <(sed '/»»»\|«««/d' irc.freenode.##programming.weechatlog) 208668563 /dev/fd/63
Alright! I could trim off 97MB from my largest buffer, just by removing the joins/parts/quits. Even better, some channels have a much higher join/part/quit to message ratio.
# This may take a few moments # Note that this assumes GNU sed; BSD sed treats -i a bit differently for log in *.weechatlog; do sed -i '/»»»\|«««/d' "$log"; done
And now for the final numbers:
$ ls -lhS | head total 721M -rw-rw-r-- 1 irc git 200M Nov 2 13:05 irc.freenode.##programming.weechatlog -rw-rw-r-- 1 irc git 71M Nov 2 13:05 irc.freenode.##slackware.weechatlog -rw-rw-r-- 1 irc git 49M Nov 2 13:05 irc.freenode.#osdev.weechatlog -rw-rw-r-- 1 irc git 47M Nov 2 13:05 irc.freenode.##c++.weechatlog -rw-rw-r-- 1 irc git 34M Nov 2 13:05 irc.freenode.#osdev-offtopic.weechatlog -rw-rw-r-- 1 irc git 30M Nov 2 13:05 irc.freenode.#nixos.weechatlog -rw-rw-r-- 1 irc git 26M Nov 2 13:05 irc.freenode.##c++-social.weechatlog -rw-rw-r-- 1 irc git 24M Nov 2 13:05 irc.freenode.##opengl.weechatlog -rw-rw-r-- 1 irc git 24M Nov 2 13:05 irc.freenode.##csharp.weechatlog
Removing the join/part/quit events cut the total log size in half, from 1.4GB to 721MB. As far as I’m concerned, the useful content is still there.
Compressing, rotating, and next steps
This setup could be further improved by rotating large logs and compressing non-active logs. Out of the box, this would impede on the ability to use those logs within weechat, via grep.py or similar. I found an open issue on weechat’s Github that requests for gzip functionality. Hopefully something like this could be officially supported.