How to strip color codes used by mIRC users?
T

6

8

I'm writing an IRC bot in Python using irclib and I'm trying to log the messages on certain channels.
The issue is that some mIRC users and some bots write using color codes.
Any idea how I could strip those parts and keep only the plain ASCII text of the message?

Toothache answered 9/6, 2009 at 14:55 Comment(1)
Working mIRC colour removal in Python: https://stackoverflow.com/questions/68968234/do-you-need-to-remove-strip-mirc-colour-format-codes-in-python Plumbery
E
14

Regular expressions are your cleanest bet in my opinion. If you haven't used them before, this is a good resource. For the full details on Python's regex library, go here.

import re
# Raw string, so that \d reaches the regex engine untouched; re itself understands \x03.
regex = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

The regex searches for ^C (\x03 in ASCII; you can confirm this with chr(3) in the Python interpreter), optionally followed by one or two [0-9] characters, which are in turn optionally followed by a comma and another one or two [0-9] characters.

(?: ... ) is a non-capturing group: it tells the engine not to store what was matched inside the parentheses (we don't need to backreference it). ? means match 0 or 1 times, and {n,m} means match n to m repetitions of the preceding item. Finally, \d matches a digit, [0-9].

The rest can be decoded using the links I refer to above.

>>> regex.sub("", "blabla \x035,12to be colored text and background\x03 blabla")
'blabla to be colored text and background blabla'

chaos' solution is similar, but it may eat more than the maximum of two digits per colour number, and it will not remove any loose ^C characters that may be hanging about (such as the one that closes the colour command).
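
To make those two points concrete, here is a small sketch that runs the same pattern on a few edge cases. The helper name strip_colors is only for illustration; it is not part of irclib.

```python
import re

# Same pattern as above, written as a raw string.
COLOR_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

def strip_colors(text):
    """Remove mIRC colour codes: ^C plus an optional fg[,bg] number pair."""
    return COLOR_RE.sub("", text)

# Only two digits belong to the colour number; the third is message text.
print(repr(strip_colors("\x03123 apples")))  # '3 apples'
# A comma that is not followed by a digit is ordinary text and is kept.
print(repr(strip_colors("\x034,hello")))     # ',hello'
# A bare ^C (the one that closes the colour command) is removed as well.
print(repr(strip_colors("red\x03 plain")))   # 'red plain'
```

Keeping the digit counts bounded with {1,2} is what stops the pattern from swallowing numbers that are part of the actual message.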

Ethic answered 9/6, 2009 at 15:17 Comment(1)
Perfect, thanks. Nice answer and great explanation. I've added \x1f|\x02| so that it also filters bold and underline: re.compile(r"\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE) Toothache
C
7

The second-rated and following suggestions are defective: they attach the optional digits to some other control character rather than to the color code character (\x03), so the color numbers are left behind in the text.

I have improved and combined all posts, with the following results:

  • the reverse character is removed as well
  • color codes are removed without leaving digits in the text.

Solution:

regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
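
If the bare hex escapes are hard to read, the same idea can be sketched with named constants. This is only an illustration: the names FORMAT_CHARS and strip_formatting are made up here, \x1d (italic) is taken from another answer in this thread, and \x12 is kept only because it appears in the regex above, which does not say what it stands for.

```python
import re

# Control characters used for formatting; labels are for readability only.
FORMAT_CHARS = {
    "bold": "\x02",
    "italic": "\x1d",
    "underline": "\x1f",
    "reverse": "\x16",
    "listed_without_explanation": "\x12",
    "reset": "\x0f",
}

# ^C followed by an optional fg[,bg] pair of one- or two-digit numbers.
_COLOR = r"\x03(?:\d{1,2}(?:,\d{1,2})?)?"
# None of the characters above is special inside a character class.
_FORMAT = "[" + "".join(FORMAT_CHARS.values()) + "]"

FORMATTING_RE = re.compile(_COLOR + "|" + _FORMAT)

def strip_formatting(text):
    """Remove colour codes and the formatting characters listed above."""
    return FORMATTING_RE.sub("", text)

print(strip_formatting("\x02bold\x02 \x1funder\x1f \x0304,01col\x0f end"))
# bold under col end
```

Because the digits are only ever matched as part of the \x03 alternative, no stray colour numbers are left in the output.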

Crabber answered 17/8, 2010 at 15:21 Comment(0)
H
1

As I found this question useful, I figured I'd contribute.

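I added a couple of things to the regex:

regex = re.compile(r"\x1f|\x02|\x16|\x0f|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

\x16 removes the "reverse" character. \x0f removes the reset character, which turns off all formatting. The optional digit group has to follow \x03, so the color code goes last in the alternation.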

Highbrow answered 16/3, 2010 at 1:12 Comment(0)
D
1

AutoDl-irssi had a very good one written in perl, here it is in python:

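import re

def stripMircColorCodes(line):
    # Colour codes with foreground and background numbers
    line = re.sub(r"\x03\d\d?,\d\d?", "", line)
    # Colour codes with a foreground number only
    line = re.sub(r"\x03\d\d?", "", line)
    # Any remaining control characters
    line = re.sub(r"[\x01-\x1F]", "", line)
    return line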

Dardan answered 26/3, 2015 at 13:30 Comment(1)
It is incorrect to do it in several steps; you must do it with one substitution. Your code will, for example, convert '\x03\x033,012345' to '45', but it should convert it to '2345'. Kemme
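
The difference described in the comment can be reproduced with a short sketch. The function names below are made up for the comparison, and the single-pass pattern is just one possible way to fold the three steps into one alternation.

```python
import re

def strip_in_steps(line):
    # The three separate substitutions from the answer above.
    line = re.sub(r"\x03\d\d?,\d\d?", "", line)
    line = re.sub(r"\x03\d\d?", "", line)
    line = re.sub(r"[\x01-\x1f]", "", line)
    return line

# One alternation applied in a single pass: each code is consumed where it
# stands, so removing one code can never glue a ^C onto digits after it.
SINGLE_PASS_RE = re.compile(r"\x03(?:\d\d?(?:,\d\d?)?)?|[\x01-\x1f]")

def strip_in_one_pass(line):
    return SINGLE_PASS_RE.sub("", line)

sample = "\x03\x033,012345"
print(strip_in_steps(sample))     # 45
print(strip_in_one_pass(sample))  # 2345
```

In the stepwise version, the first substitution removes '\x033,01', which leaves the leading ^C sitting directly in front of '2345'; the second substitution then treats '23' as a colour number. Note that both versions also drop every other control character in the \x01-\x1f range, including tabs.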
A
1

I know I originally asked for a regex solution because it could be cleaner, but I have written a non-regex solution that works for me.

DIGITS = '0123456789'

def colourstrip(data):
    """Strip mIRC colour codes and some formatting characters without regex."""
    out = []
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if ch == '\x03':
            i += 1
            # Foreground colour: up to two digits (a leading '0' counts too).
            start = i
            while i < n and i - start < 2 and data[i] in DIGITS:
                i += 1
            # Background colour: a comma plus up to two digits, only valid
            # directly after a foreground colour.
            if i > start and i + 1 < n and data[i] == ',' and data[i + 1] in DIGITS:
                i += 1
                start = i
                while i < n and i - start < 2 and data[i] in DIGITS:
                    i += 1
        elif ch in '\x1d\x1f\x16\x0f':
            # Italic, underline, reverse and reset characters.
            i += 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

datastring = '\x0312,4This is coolour \x032,4This is too\x03'    
print(colourstrip(datastring))

Thank you for all the help everyone.

Akron answered 15/4, 2015 at 8:19 Comment(0)
B
0

I also had to add '\x0f', whatever it is used for:

regex = re.compile(r"\x0f|\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
msg = regex.sub('', msg)
Busywork answered 20/10, 2009 at 1:12 Comment(0)
