chatdb, IRC log query, and analysis

IRC communication is important for many people, and being able to find a conversation in your logs again later can be just as important.

While many folks likely do not use IRC or haven't used it in years, it is still a terrific collaboration tool for communicating with users in a community, providing support, or just basic socializing.  I use IRC heavily, often to discover new information or get help on things, so being able to find that information again in my logs is important to me.

So after many years of grepping log files, I decided to write a tool called chatdb, which parses my log files and puts them in a database.  Indices make queries almost instantaneous.  The tradeoff is extra setup complexity and disk space, but depending on your needs the faster queries may be worth the cost.
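
To make the idea concrete, here is a minimal sketch of parsing log lines into an indexed database.  It is not chatdb's code (chatdb is Perl); it uses Python and SQLite for illustration, and the table layout, the `load_log` helper, and the simplified irssi-style line pattern are all assumptions made up for this example.

```python
import re
import sqlite3

# Assumed, simplified irssi-style message line: "HH:MM <@nick> text".
# Real logs also contain day-change markers, joins, parts, and so on.
MESSAGE_RE = re.compile(r"^(\d{2}:\d{2}) <[ @+%]?([^>]+)> (.*)$")


def load_log(conn, channel, day, lines):
    """Insert the message lines of one day's log; return how many were stored."""
    rows = []
    for line in lines:
        match = MESSAGE_RE.match(line)
        if match is None:
            continue  # not a plain message line; this sketch just skips it
        time, nick, text = match.groups()
        rows.append((channel, f"{day} {time}", nick, text))
    conn.executemany(
        "INSERT INTO messages (channel, ts, nick, text) VALUES (?, ?, ?, ?)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, channel TEXT, ts TEXT, nick TEXT, text TEXT)"
)
# Indexes on the columns used for filtering are what keep lookups fast.
conn.execute("CREATE INDEX idx_messages_channel_ts ON messages (channel, ts)")
conn.execute("CREATE INDEX idx_messages_nick ON messages (nick)")

# Made-up sample lines, only for demonstration.
sample = [
    "--- Log opened Mon Jan 01 00:00:00 2024",
    "12:01 <@alice> anyone know how to rotate logs?",
    "12:02 < bob> logrotate handles that",
    "12:03 -!- carol [~carol@example.org] has joined #help",
]
stored = load_log(conn, "#help", "2024-01-01", sample)
print(stored, "message lines stored")
```

The indexes are where the extra disk space goes: each one is a second, sorted copy of the indexed columns.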

Finding Information

Finding something in a huge pool of information is really a matter of filtering it down to the important bits.  chatdb has many filters to narrow down the results: (fuzzy!) date ranges, nick, channel, words or phrases, etc.  Once you have narrowed things down a bit, you can use grep, a pager, or whatever is comfortable to find what you're looking for.  If you need some context, chatdb can do that too!
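
As an illustration of combining filters, the sketch below builds one SQL query from whichever filters are set.  It is not chatdb's interface: the `search` function, its parameters, and the `messages` table are assumptions for this example, and the date bounds here are exact string comparisons, not the fuzzy ranges chatdb offers.

```python
import sqlite3


def search(conn, channel=None, nick=None, since=None, until=None, phrase=None):
    """Return (ts, channel, nick, text) rows matching every filter that is set."""
    clauses, params = [], []
    if channel is not None:
        clauses.append("channel = ?")
        params.append(channel)
    if nick is not None:
        clauses.append("nick = ?")
        params.append(nick)
    if since is not None:
        clauses.append("ts >= ?")  # inclusive lower bound
        params.append(since)
    if until is not None:
        clauses.append("ts < ?")  # exclusive upper bound
        params.append(until)
    if phrase is not None:
        clauses.append("text LIKE ?")  # substring match
        params.append(f"%{phrase}%")
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT ts, channel, nick, text FROM messages WHERE {where} ORDER BY ts"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (channel TEXT, ts TEXT, nick TEXT, text TEXT)")
# Made-up sample rows, only for demonstration.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("#help", "2024-01-01 12:01", "alice", "anyone know how to rotate logs?"),
        ("#help", "2024-01-01 12:02", "bob", "logrotate handles that"),
        ("#perl", "2024-01-01 12:05", "bob", "logrotate is not a Perl module"),
        ("#help", "2024-01-02 09:00", "bob", "did logrotate work?"),
    ],
)

hits = search(
    conn, channel="#help", since="2024-01-01", until="2024-01-02", phrase="logrotate"
)
for row in hits:
    print(*row)
```

Values are passed as bound parameters, never pasted into the SQL string, so a search phrase containing quotes cannot break the query.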

Contextual Information

Often a single line in a conversation isn't enough to reconstruct the information, so chatdb can show you N lines around a message in order to pick up all the context you need.  Ideally, you narrow in on a particular message, and then chatdb digs out the lines before and after it for you.
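
One way to picture the context lookup is sketched below.  This is an illustration, not chatdb's implementation: the `context` function and the table are hypothetical, and it assumes row ids increase in the order the lines were logged.

```python
import sqlite3


def context(conn, message_id, n=2):
    """Return up to n rows before and after message_id, from the same channel."""
    row = conn.execute(
        "SELECT channel FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        return []
    channel = row[0]
    before = conn.execute(
        "SELECT id, nick, text FROM messages WHERE channel = ? AND id < ? "
        "ORDER BY id DESC LIMIT ?",
        (channel, message_id, n),
    ).fetchall()
    # The matched message itself plus the n rows that follow it.
    after = conn.execute(
        "SELECT id, nick, text FROM messages WHERE channel = ? AND id >= ? "
        "ORDER BY id LIMIT ?",
        (channel, message_id, n + 1),
    ).fetchall()
    return before[::-1] + after


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, channel TEXT, nick TEXT, text TEXT)"
)
# Made-up sample rows; two channels are interleaved on purpose.
conn.executemany(
    "INSERT INTO messages (channel, nick, text) VALUES (?, ?, ?)",
    [
        ("#help", "alice", "my cron job never runs"),
        ("#perl", "dave", "use strict, always"),
        ("#help", "bob", "is crond actually running?"),
        ("#help", "alice", "oh. it is not"),
        ("#perl", "dave", "and warnings"),
        ("#help", "bob", "systemctl start crond"),
        ("#help", "alice", "that fixed it, thanks"),
    ],
)

for msg_id, nick, text in context(conn, 4, n=1):
    print(msg_id, nick, text)
```

Restricting the lookup to the matched message's channel matters because lines from different channels are interleaved in the table.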

Statistical Information

Channel stats can be useful for users or moderators of a channel.  chatdb has a number of ways to create statistics, such as activity in a channel by number of lines or by user, what channels a user has been in, how much activity you've seen from a particular user, etc.  I've written the stats that sounded useful to me over the time I've been building chatdb, but suggestions are always welcome.
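
Stats like these reduce to grouping and counting.  The sketch below shows two of them under assumed names: `lines_per_nick`, `channels_for`, and the table layout are hypothetical and are not taken from chatdb.

```python
import sqlite3


def lines_per_nick(conn, channel):
    """Return (nick, line_count) pairs for a channel, busiest nick first."""
    return conn.execute(
        "SELECT nick, COUNT(*) AS n FROM messages WHERE channel = ? "
        "GROUP BY nick ORDER BY n DESC, nick",
        (channel,),
    ).fetchall()


def channels_for(conn, nick):
    """Return the channels in which a nick has been seen speaking."""
    rows = conn.execute(
        "SELECT DISTINCT channel FROM messages WHERE nick = ? ORDER BY channel",
        (nick,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (channel TEXT, nick TEXT, text TEXT)")
# Made-up sample rows, only for demonstration.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("#help", "alice", "hi"),
        ("#help", "bob", "hello"),
        ("#help", "alice", "quick question"),
        ("#perl", "bob", "anyone around?"),
        ("#help", "alice", "never mind, solved"),
    ],
)

print(lines_per_nick(conn, "#help"))
print(channels_for(conn, "bob"))
```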

Tracking Users

Unfortunately, IRC can also involve moderation, which sometimes means understanding a user's history.  Some users come in and blatantly invite moderation; others repeatedly disrupt a channel almost to the point of moderation but pull back before they get themselves in trouble.  Still others come back under different nicks to evade moderation, bans, etc.  chatdb has a few queries to help identify these users, although it's never going to be perfect.
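
The post does not say how chatdb's queries identify such users.  One common heuristic, shown below purely as an assumption, is to look for several nicks joining from the same host.  The `joins` table and the `nicks_by_host` function are hypothetical, and the hosts are documentation-range addresses.

```python
import sqlite3
from collections import defaultdict


def nicks_by_host(conn, min_nicks=2):
    """Map each host to its sorted nicks, keeping hosts seen with >= min_nicks."""
    hosts = defaultdict(set)
    for nick, host in conn.execute("SELECT nick, host FROM joins"):
        hosts[host].add(nick)
    return {
        host: sorted(nicks)
        for host, nicks in hosts.items()
        if len(nicks) >= min_nicks
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joins (nick TEXT, host TEXT)")
# Made-up join events, only for demonstration.
conn.executemany(
    "INSERT INTO joins VALUES (?, ?)",
    [
        ("troll1", "203.0.113.7"),
        ("alice", "198.51.100.2"),
        ("totally_new", "203.0.113.7"),
        ("alice", "198.51.100.2"),
    ],
)

print(nicks_by_host(conn))
```

Shared hosts (NAT, bouncers, network cloaks) produce false positives, and a changed host hides a returning user, which is one reason this kind of matching is never going to be perfect.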

Where does chatdb get all this info?

Your own log files.  Currently, irssi and ZNC logs are supported, and WeeChat is coming soon.  Any channel you're in can be logged by your client, and chatdb can parse those logs.  There is no central location to 'get' logs, and chatdb is (currently!) not designed to be accessed by anyone except you.  The query interface is purely command line.  I have considered a web front end a few times, since it would be more convenient for me in some cases, but UI dev is not one of my talents and so far I just haven't taken the time to really dive in and work on it.

Patches welcome!

Seriously: suggestions, bug reports, new log formats, etc. are all welcome.  I will be adding WeeChat since I have begun to use it quite a bit recently, but I do not plan on supporting clients that I don't use myself.  The log parsing and queries are all designed around IRC, but I see no reason the same approach could not handle other chat logs, such as Matrix or Discord, provided you have the log files in a format you can parse.

chatdb is written in Perl, with some bash shell scripts to support it.

Where do I get it, or get more info?

The chatdb GitLab repo has the code and documentation.  There will be more code as I find more things to do, and there can always be more documentation, so if you start using chatdb, keep an eye on the repo for changes.