Telegram does not escape some markdown characters
B

13

40

Telegram does not escape some markdown characters, for example:

  • This works fine

    _test\_test_

  • But this returns a parse error

    *test\*test*

What am I doing wrong?

Boudreaux answered 16/11, 2016 at 8:7 Comment(1)
Have you faced a similar issue with MarkdownV2? core.telegram.org/bots/api#markdownv2-style - Militarize
R
38
String escapedMsg = toEscapeMsg
    .replace("_", "\\_")
    .replace("*", "\\*")
    .replace("[", "\\[")
    .replace("`", "\\`");

Do not escape the ] character. Once [ is escaped, ] is treated like a normal character.
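
For readers working in Python, the same idea can be sketched as below. This is only an illustration of the replacement chain above, assuming the legacy Markdown parse mode; the names LEGACY_SPECIALS and escape_legacy_markdown are made up for this example and are not part of any Telegram library.

```python
# Characters the answer above escapes for the legacy "Markdown" parse mode.
# Note that "]" is deliberately left out, as the answer recommends.
LEGACY_SPECIALS = "_*[`"


def escape_legacy_markdown(text: str) -> str:
    """Prefix each legacy Markdown special character with a backslash."""
    return "".join("\\" + ch if ch in LEGACY_SPECIALS else ch for ch in text)


print(escape_legacy_markdown("snake_case [draft] 2*3"))
# prints: snake\_case \[draft] 2\*3
```

This covers text outside of formatting entities. It does not by itself make an escaped character inside an already open *...* or _..._ entity parse, which is the case the question runs into.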

Rambling answered 19/4, 2018 at 15:7 Comment(6)
This works fine for me with the current Telegram version 1.5.6 - Berthold
This fails on text like "test_test". E.g. it will produce the message "test\_test". - Acolyte
For escaping more than one character, it'd be best to use regular expressions with the global flag. - Wayless
This fixed it for me! Cheers! - Shirt
When escaping multiple dots (.), .replace(/\./g, "\\.") worked for me. - Trunnion
There are a lot more symbols to replace. Try sending the whole ASCII range to test. It's better to use ParseMode.HTML if you can't control the message content. - Janiecejanifer
A
14

The only workaround is to use HTML in the parse_mode

https://core.telegram.org/bots/api#html-style
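
A minimal Python sketch of this workaround, assuming the message text comes from an untrusted source and should be shown in bold. The helper name build_html_message is hypothetical; html.escape is from the Python standard library, and quote=False limits the escaping to &, < and >.

```python
import html


def build_html_message(user_text: str) -> dict:
    """Wrap untrusted text in <b> after escaping &, < and >."""
    safe = html.escape(user_text, quote=False)
    return {"parse_mode": "HTML", "text": f"<b>{safe}</b>"}


payload = build_html_message("test*test <3 & more")
print(payload["text"])
# prints: <b>test*test &lt;3 &amp; more</b>
```

The asterisk needs no treatment at all in HTML mode, which is why this sidesteps the Markdown escaping problem.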

Abecedarian answered 28/3, 2018 at 15:37 Comment(2)
So sad but true! - Vicky
You still need to replace "<" with "&lt;", because otherwise it will be read as the start of a tag, probably an unrecognized one. - Janiecejanifer
P
12

According to the Telegram Bot API documentation:

In all other places characters '_', '*', '[', ']', '(', ')', '~', '`', '>', '#', '+', '-', '=', '|', '{', '}', '.', '!' must be escaped with the preceding character '\'.

So here is the function I've used:

text
  .replace(/\_/g, '\\_')
  .replace(/\*/g, '\\*')
  .replace(/\[/g, '\\[')
  .replace(/\]/g, '\\]')
  .replace(/\(/g, '\\(')
  .replace(/\)/g, '\\)')
  .replace(/\~/g, '\\~')
  .replace(/\`/g, '\\`')
  .replace(/\>/g, '\\>')
  .replace(/\#/g, '\\#')
  .replace(/\+/g, '\\+')
  .replace(/\-/g, '\\-')
  .replace(/\=/g, '\\=')
  .replace(/\|/g, '\\|')
  .replace(/\{/g, '\\{')
  .replace(/\}/g, '\\}')
  .replace(/\./g, '\\.')
  .replace(/\!/g, '\\!')

But keep in mind: if you intend *some text* to be bold, this script escapes the asterisks too, so the message shows a literal *some text* without the bold effect.
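
One way around that limitation is to escape only the dynamic parts and add the formatting markers afterwards. The Python sketch below assumes MarkdownV2 and uses the character list quoted above; escape_mdv2 and bold are hypothetical helper names, not functions from a Telegram library.

```python
import re

# The character list quoted from the Bot API documentation above.
_MDV2_SPECIALS = re.compile(r"([_*\[\]()~`>#+\-=|{}.!])")


def escape_mdv2(text: str) -> str:
    """Backslash-escape every MarkdownV2 special character in text."""
    return _MDV2_SPECIALS.sub(r"\\\1", text)


def bold(text: str) -> str:
    """Hypothetical helper: the bold markers stay raw, the content is escaped."""
    return "*" + escape_mdv2(text) + "*"


message = bold("Price: 1.50 (approx.)") + escape_mdv2(" - updated!")
print(message)
# prints: *Price: 1\.50 \(approx\.\)* \- updated\!
```

Only the two outer asterisks reach Telegram unescaped, so they are the only characters interpreted as formatting.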

Passionate answered 1/3, 2022 at 19:53 Comment(2)
Yeah, use a capture group instead lol: text.replace(/([|{\[\]*_~}+)(#>!=\-.])/gm, '\\$1'); - Suzannasuzanne
I have ended up using this line in Python: text = re.sub(r'[_*[\]()~>#\+\-=|{}.!]', lambda x: '\\' + x.group(), text) - Radiance
E
8

Actually, both produce an error.

{
  "ok": false,
  "error_code": 400,
  "description": "Bad Request: Can't parse message text: Can't find end of the entity starting at byte offset 11"
}

It sounds like Telegram doesn't support escape characters for Markdown, so I suggest using HTML instead: <b>test*test</b>

Elsaelsbeth answered 16/11, 2016 at 12:4 Comment(2)
This displays the bold string test\test without the '*'. But I want to display a bold test*test... - Boudreaux
Yes, you're right. Telegram has some issues with escape characters. I've edited my answer and suggested another way. Maybe report this bug to Telegram bot support. - Elsaelsbeth
H
6

Ironically, if the parse_mode argument is set to markdown instead of the ParseMode.MARKDOWN_V2 constant, the Python send_message function works fine without escaping any characters.

Harvey answered 29/11, 2021 at 16:54 Comment(0)
B
4

You should use '\\' to escape markup tokens *_[`, i.e. send this instead:

*test\\*test*

Basset answered 12/4, 2018 at 13:9 Comment(2)
@Oleg, you are wrong. Have you tried sending this sequence directly in Insomnia or another REST tool, or from your code? If your language uses \ as the escape character, you have to double it. - Basset
It isn't related to the language, because Telegram itself doesn't support this feature. For testing I used Postman. Furthermore, you can find in the Telegram documentation that they provide some HTML entities (which is the same as escaping): core.telegram.org/bots/api#markdown-style - Abecedarian
S
2

Use MarkdownV2.

{
    "chat_id": telegram_chat_id,
    "parse_mode": "MarkdownV2",
    "text": "*test\*test*"
}

Just escape these characters:

'_', '*', '[', ']', '(', ')', '~', '`', '>', '#', '+', '-', '=', '|', '{', '}', '.', '!'

and Telegram will handle the rest.
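
When the payload is serialized as JSON, the backslash itself gets doubled on the wire. A small Python sketch, assuming the request body is built with the standard json module; chat_id is a placeholder value here and no request is actually sent.

```python
import json

chat_id = 123456789  # placeholder value, not a real chat

payload = {
    "chat_id": chat_id,
    "parse_mode": "MarkdownV2",
    # Raw string: Telegram should receive exactly one backslash before the inner '*'.
    "text": r"*test\*test*",
}

body = json.dumps(payload)
print(body)
# prints: {"chat_id": 123456789, "parse_mode": "MarkdownV2", "text": "*test\\*test*"}
```

The two backslashes in the printed body are JSON's own escaping. After decoding, the text again contains a single backslash, which is what the MarkdownV2 parser expects.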

Saltine answered 9/11, 2021 at 4:6 Comment(1)
Nice answer. If you need to escape, just escape. - Airt
B
2
function escapeMarkdown(text) {
    return text.replace(/[_*[\]()~`>#\+\-=|{}.!]/g, '\\$&'); 
}

// Example usage

const originalText = '_*Hello [World]*~';
const escapedText = escapeMarkdown(originalText);
console.log(escapedText); // Result: '\\_\\*Hello \\[World\\]\\*\\~'
Bare answered 19/8, 2023 at 5:11 Comment(0)
M
0
{{ $json.content.replace(/_|[*`~>#+=|{}()\\.!-]/g, "\\$&") }}

This version worked for me.

Mediatorial answered 19/6, 2023 at 22:42 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. - Perron
U
0

In Python

import re

REFACTOR_REGEX = r"(?<!\\)(_|\*|\[|\]|\(|\)|\~|`|>|#|\+|-|=|\||\{|\}|\.|\!)"
text = re.sub(REFACTOR_REGEX, lambda t: "\\"+t.group(), text)

In JS

const REFACTOR_REGEX = /(?<!\\)(_|\*|\[|\]|\(|\)|\~|`|>|#|\+|-|=|\||\{|\}|\.|!)/g;
text = text.replace(REFACTOR_REGEX , (match) => "\\" + match);
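
The point of the negative lookbehind is that characters which are already escaped are left alone, so running the function twice does not double the backslashes. The Python sketch below shows this with the same idea written as a character class; UNESCAPED_SPECIAL and escape_once are names chosen for this example only.

```python
import re

# Same idea as the pattern above, written as a character class.
UNESCAPED_SPECIAL = re.compile(r"(?<!\\)([_*\[\]()~`>#+\-=|{}.!])")


def escape_once(text: str) -> str:
    """Escape special characters that are not already preceded by a backslash."""
    return UNESCAPED_SPECIAL.sub(r"\\\1", text)


once = escape_once("v1.2 (beta)")
twice = escape_once(once)
print(once)           # prints: v1\.2 \(beta\)
print(once == twice)  # prints: True
```

One caveat under this approach: a literal backslash in the source text that happens to sit before a special character is treated as an existing escape and is not touched.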
Uncanny answered 12/4 at 21:33 Comment(0)
M
0

I had an issue with escaping symbols in my C# app. The problem with plain replacement is that it works poorly with formatting symbols that were added intentionally (such as `, ```, _ or * blocks).

Now that I've faced the exact same issue in Python, I've made a rough Python translation of the C# code.

The logic: we iterate through the text character by character, check whether each symbol is a special or a formatting symbol, check whether it is closed on the same line, and treat the two cases differently.

``` blocks are handled similarly, except that an unterminated block is closed at the end of the text. I might have missed a few cases, but so far it works for me.

SPECIAL_SYMBOLS = "[]()~>#+-=|{}.!''"
FORMAT_SYMBOLS = "*_~"

def escape_string(input, add_closing_code_block=True):
    if add_closing_code_block and len(input.split("```")) % 2 == 0:
        input += "\n```"

    inside_code_block = False
    inside_inline_code = False
    
    inside_blocks = {
        "*": False,
        "**" : False,
        "_": False,
        "__": False
    }
    result = []

    i = 0
    while i < len(input):
        if code_block_start_at(input, i):
            inside_code_block = not inside_code_block
            result.append("```")
            i += 3
            continue

        if inside_code_block:
            i = handle_inside_code_block(input, result, i)
            i += 1
            continue

        if inside_inline_code:
            inside_inline_code = handle_inside_inline_code(input, result, i)
        else:
            i, inside_inline_code, inside_blocks = handle_outside_inline_code(input, result, i, inside_blocks)

        i += 1

    return ''.join(result)

def handle_inside_code_block(input, sb, index):
    if special_symbol_at(input, index):
        sb.append(input[index])
    elif inline_code_at(input, index):
        sb.append("\\`")
    elif format_symbol_at(input, index):
        sb.append(input[index])
    elif code_block_start_at(input, index):
        sb.append("\\`\\`\\`")
        index += 2
    else:
        sb.append(input[index])
    return index

def handle_inside_inline_code(input, sb, index):
    inside_inline_code = True
    is_special = special_symbol_at(input, index)
    is_format = format_symbol_at(input, index)
    if is_special or is_format:
        sb.append('\\')
        sb.append(input[index])
    elif code_block_start_at(input, index):
        sb.append("\\`\\`\\`")
        index += 2
    elif inline_code_at(input, index):
        inside_inline_code = False
        sb.append('`')
    else:
        sb.append(input[index])
    return inside_inline_code

def handle_outside_inline_code(input, sb, index, inside_blocks):
    inside_inline_code = False
    if input[index:index+2] == '**':
        if inside_blocks["**"]:
            sb.append('**')
            inside_blocks["**"] = False
            index += 1
        elif inline_code_has_closing_in_line(input, index, '**'):
            sb.append('**')
            inside_blocks["**"] = True
            index += 1
        else:
            sb.append('\\**')
            index += 1
    elif input[index:index+2] == '__':
        if inside_blocks["__"]:
            sb.append('__')
            inside_blocks["__"] = False
            index += 1
        elif inline_code_has_closing_in_line(input, index, '__'):
            sb.append('__')
            inside_blocks["__"] = True
            index += 1
        else:
            sb.append('\\__')
            index += 1
    elif input[index] == '*':
        if inside_blocks["*"]:
            sb.append('*')
            inside_blocks["*"] = False
        elif inline_code_has_closing_in_line(input, index, '*'):
            sb.append('*')
            inside_blocks["*"] = True
        else:
            sb.append('\\*')
    elif input[index] == '_':
        if inside_blocks["_"]:
            sb.append('_')
            inside_blocks["_"] = False
        elif inline_code_has_closing_in_line(input, index, '_'):
            sb.append('_')
            inside_blocks["_"] = True
        else:
            sb.append('\\_')
    elif special_symbol_at(input, index):
        sb.append('\\')
        sb.append(input[index])
    elif format_symbol_at(input, index):
        sb.append('\\')
        sb.append(input[index])
    elif inline_code_at(input, index):
        if inline_code_has_closing_in_line(input, index, '`'):
            inside_inline_code = True
            sb.append('`')
        else:
            sb.append("\\`")
    elif code_block_start_at(input, index):
        sb.append("\\`\\`\\`")
        index += 2
    else:
        sb.append(input[index])
    return index, inside_inline_code, inside_blocks

def code_block_start_at(input, index):
    return index + 2 < len(input) and input[index] == '`' and input[index + 1] == '`' and input[index + 2] == '`'

def inline_code_at(input, index):
    return input[index] == '`' and not code_block_start_at(input, index)

def special_symbol_at(input, index):
    return input[index] in SPECIAL_SYMBOLS

def format_symbol_at(input, index):
    return input[index] in FORMAT_SYMBOLS

def inline_code_has_closing_in_line(input, index, symbol):
    return has_closing_symbol_in_line(input, index, symbol)

def has_closing_symbol_in_line(input, index, symbol):
    search_start = index + len(symbol)
    end_of_line = input.find('\n', search_start)
    if end_of_line == -1:
        end_of_line = len(input)
    possible_closing_index = input.find(symbol, search_start)
    return possible_closing_index != -1 and possible_closing_index <= end_of_line and possible_closing_index != index + 1

Malia answered 15/5 at 19:15 Comment(1)
...nvm, I think I've fixed the double __ or ** cases. - Malia
E
0

To add to the list, here's a Rust version of the MarkdownV2 escape:

use std::fmt::{self, Write};

pub struct MD2Esc<'a>(pub &'a str);
impl<'a> fmt::Display for MD2Esc<'a> {
    fn fmt(&self, ft: &mut fmt::Formatter<'_>) -> fmt::Result {
        for ch in self.0.chars() {
            match ch {
                '_' | '*' | '[' | ']' | '(' | ')' | '~' | '`' | '>' | '#' | '+' | '-' | '='
                | '|' | '{' | '}' | '.' | '!' => {
                    ft.write_char('\\')?;
                    ft.write_char(ch)?
                }
                _ => ft.write_char(ch)?,
            }
        }
        fmt::Result::Ok(())
    }
}

fn main() {
    println!("{}", MD2Esc("a-b-c"));
}


Eckardt answered 8/6 at 5:9 Comment(0)
W
-2

pdenti's answer only replaces the first character found in the message when ported to JavaScript, where replace with a string pattern stops after the first match. Use regular expressions with the global flag to replace all of them.

const escapedMsg = toEscapeMsg
    .replace(/_/g, "\\_")
    .replace(/\*/g, "\\*")
    .replace(/\[/g, "\\[")
    .replace(/`/g, "\\`");
Wayless answered 10/9, 2019 at 20:17 Comment(1)
That is not correct. Every occurrence gets replaced, per the Javadocs: docs.oracle.com/javase/8/docs/api/java/lang/… - Rambling
