Is there a way to validate telegram bot api's markdown syntax?

I'm currently working with the Telegram Bot API, and I need to validate Markdown syntax to prevent parse errors. However, the Telegram Bot API's Markdown doesn't follow regular Markdown syntax, so I'm struggling with how to do it. Is there a proper way to validate it? Or is there a library that I can use?

Coolie answered 5/2, 2021 at 4:11 Comment(0)
H
1

Probably not the answer OP was expecting, but still sharing so others may find this useful.

I've been trying to 'validate' Markdown messages to prevent the Bad Request: can't parse entities error returned by Telegram. For example, the same issue this user encountered.

Unfortunately, I was unable to validate this with 100% accuracy, probably because (as you already mentioned) Telegram doesn't use the default Markdown syntax.


My 'solution', which I've implemented in quite a few bots and which works decently:

After sending a message (I've created a custom function to avoid duplicating code), check whether Telegram responded with the Bad Request: can't parse entities error. If so, send the same message again, but this time with the HTML parse_mode. As long as the text contains no unescaped <, > or & characters, the second attempt won't hit a parse error.

It's not the cleanest solution, but it gets the message to the user, and that was my greatest concern.
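
The retry described above could be sketched roughly as follows in TypeScript. This is only an illustration, not the code used in those bots: `Sender`, `isParseError`, `escapeHtml` and `sendWithFallback` are hypothetical names, and `send` is assumed to be your own wrapper around the Bot API's sendMessage method that resolves to Telegram's JSON response. Escaping the text before the HTML retry is an added assumption, so that stray <, > or & characters cannot cause a second parse error.

```typescript
type ParseMode = 'Markdown' | 'HTML'
type SendResult = { ok: boolean; description?: string }
// Hypothetical wrapper around the Bot API sendMessage call.
type Sender = (text: string, parseMode: ParseMode) => Promise<SendResult>

// True when Telegram rejected the message because of broken markup.
const isParseError = (result: SendResult): boolean =>
  !result.ok && (result.description ?? '').indexOf("can't parse entities") !== -1

// Escape the three characters that are special in Telegram's HTML mode.
const escapeHtml = (text: string): string =>
  text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')

// Try Markdown first; on a parse error, resend the escaped text as HTML.
const sendWithFallback = async (send: Sender, text: string): Promise<SendResult> => {
  const first = await send(text, 'Markdown')
  if (!isParseError(first)) return first
  return send(escapeHtml(text), 'HTML')
}
```

Note that the fallback message shows the Markdown characters literally, since the text is no longer parsed as Markdown.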

Hades answered 5/2, 2021 at 14:46 Comment(0)
E
1

I have implemented this validation function in Go. It works pretty well for Telegram's Markdown:

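import "strings"

// GetUnclosedTag returns the tag that is still open at the end of markdown,
// or "" if every tag is closed.
func GetUnclosedTag(markdown string) string {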
    // order is important!
    var tags = []string{
        "```",
        "`",
        "*",
        "_",
    }
    var currentTag = ""

    markdownRunes := []rune(markdown)

    var i = 0
outer:
    for i < len(markdownRunes) {
        // skip escaped characters (only outside tags)
        if markdownRunes[i] == '\\' && currentTag == "" {
            i += 2
            continue
        }
        if currentTag != "" {
            if strings.HasPrefix(string(markdownRunes[i:]), currentTag) {
                // turn a tag off
                i += len(currentTag)
                currentTag = ""
                continue
            }
        } else {
            for _, tag := range tags {
                if strings.HasPrefix(string(markdownRunes[i:]), tag) {
                    // turn a tag on
                    currentTag = tag
                    i += len(currentTag)
                    continue outer
                }
            }
        }
        i++
    }

    return currentTag
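}

// IsValid reports whether every opened tag in markdown is also closed.
func IsValid(markdown string) bool {
    return GetUnclosedTag(markdown) == ""
}

// FixMarkdown appends the missing closing tag, if there is one.
func FixMarkdown(markdown string) string {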
    tag := GetUnclosedTag(markdown)
    if tag == "" {
        return markdown
    }
    return markdown + tag
}
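
For bots written in TypeScript, the same scan could be sketched along these lines. It is an illustrative port, not part of the Go code above; `TAGS` and `getUnclosedTag` are names chosen for this example, and the escape handling mirrors the Go version (a backslash only escapes outside a tag).

```typescript
// Tags of Telegram's legacy Markdown mode; ``` comes first so it wins over `.
const TAGS = ['```', '`', '*', '_']

// Returns the tag that is still open at the end of the text, or '' if none.
const getUnclosedTag = (markdown: string): string => {
  let current = ''
  let i = 0
  while (i < markdown.length) {
    // A backslash escapes the next character, but only outside a tag.
    if (markdown[i] === '\\' && current === '') {
      i += 2
      continue
    }
    if (current !== '') {
      // Inside a tag: only the matching closing tag ends it.
      if (markdown.startsWith(current, i)) {
        i += current.length
        current = ''
        continue
      }
    } else {
      // Outside a tag: the first tag that matches at this position opens it.
      const opened = TAGS.find((tag) => markdown.startsWith(tag, i))
      if (opened !== undefined) {
        current = opened
        i += opened.length
        continue
      }
    }
    i++
  }
  return current
}
```

With this sketch, `getUnclosedTag('Calculation is: 15 * 10')` returns `'*'`, because the lone asterisk opens a bold entity that never closes, while `getUnclosedTag('*bold* and `code`')` returns `''`.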
Entrap answered 1/4, 2023 at 7:47 Comment(0)
S
0

A better approach seems to be converting the Markdown to HTML, because in my case a message such as "Calculation is: 15 * 10" makes Telegram's parser fail. Here is a TypeScript function that works fine for now:

export const markdownToHtml = (markdown: string): string => {
  // Escape HTML special characters first, so only the tags added below are parsed as markup
  markdown = markdown.replace(/[&<>"']/g, (match) => {
    switch (match) {
      case '&':
        return '&amp;'
      case '<':
        return '&lt;'
      case '>':
        return '&gt;'
      case '"':
        return '&quot;'
      case "'":
        return '&#39;'
      default:
        return match
    }
  })
  // Convert headers (# to ######) to bold, since Telegram HTML has no heading tags
  markdown = markdown.replace(/^#{1,6} (.*)$/gm, '<b>$1</b>')

  // Convert code blocks
  markdown = markdown.replace(/```(\w+)?\n([\s\S]*?)```/g, (_, p1, p2) => {
    if (p1) {
      // Telegram expects the language as a class on a <code> tag nested in <pre>
      return `<pre><code class="language-${p1}">${p2}</code></pre>`
    }
    return `<code>${p2}</code>`
  })

  // Convert inline code
  markdown = markdown.replace(/`([^`]+)`/g, '<code>$1</code>')

  // Convert bold, italic, and strikethrough (bold first, so ** and __ are not consumed by the italic rule)
  markdown = markdown.replace(/(\*\*|__)(.*?)\1/g, '<b>$2</b>') // Bold
  markdown = markdown.replace(/(\*|_)(.*?)\1/g, '<em>$2</em>') // Italic
  markdown = markdown.replace(/~~(.*?)~~/g, '<s>$1</s>') // Strikethrough

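  // Convert images to plain links (Telegram HTML has no image tag); this must run
  // before the link rule, which would otherwise consume the [alt](url) part
  markdown = markdown.replace(/!\[([^\]]*)\]\(([^)]+)\)/g, '<a href="$2">$1</a>')

  // Convert links
  markdown = markdown.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>')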

  // Telegram rejects empty messages, so fall back to an invisible filler character
  return markdown.trim() || 'ㅤ'
}
Sulphuryl answered 19/6, 2024 at 12:58 Comment(0)
