How can I download the chat history of a group in Telegram?
I

5

22

I would like to download the chat history (all messages) posted in a public group on Telegram. How can I do this with Python?

I've found this method in the API, https://core.telegram.org/method/messages.getHistory, which looks like what I'm trying to do. But how do I actually call it? There seem to be no Python examples for the MTProto protocol they use.

I also looked at the Bot API, but it doesn't seem to have a method to download messages.

Intermediacy answered 9/6, 2017 at 22:3 Comment(2)
bot api has no access to messages. You need telegram API for this.Pulchritude
You may also want to take a look at github.com/LonamiWebs/Telebackup for inspiration (the project still has a lot of flaws). It is a Python program whose sole purpose is to download the history of a chat, hence creating a backup of it.Rachal
P
13

You can use Telethon. The Telegram API is fairly complicated, and with Telethon you can start using it in a very short time without any prior knowledge of the API.

pip install telethon

Then register your app at https://my.telegram.org/ to obtain the api_id and api_hash used below.

Then to obtain message history of a group (assuming you have the group id):

chat_id = YOUR_CHAT_ID
api_id=YOUR_API_ID
api_hash = 'YOUR_API_HASH'

from telethon import TelegramClient
from telethon.tl.types.input_peer_chat import InputPeerChat

client = TelegramClient('session_id', api_id=api_id, api_hash=api_hash)
client.connect()
chat = InputPeerChat(chat_id)

total_count, messages, senders = client.get_message_history(
                        chat, limit=10)

for msg in reversed(messages):
    # Format the message content
    if getattr(msg, 'media', None):
        content = '<{}> {}'.format(  # The media may or may not have a caption
            msg.media.__class__.__name__,
            getattr(msg.media, 'caption', ''))
    elif hasattr(msg, 'message'):
        content = msg.message
    elif hasattr(msg, 'action'):
        content = str(msg.action)
    else:
        # Unknown message, simply print its class name
        content = msg.__class__.__name__

    # Five placeholders for five arguments (the sender name is a fixed placeholder here)
    text = '[{}:{}] (ID={}) {}: {}'.format(
        msg.date.hour, msg.date.minute, msg.id, "no name",
        content)
    print(text)

This example is adapted and simplified from the Telethon examples.

Pulchritude answered 11/6, 2017 at 1:36 Comment(6)
Please note that if your group is a supergroup, you need to treat it as a channel and use InputPeerChannel, which takes both the chat_id and the access_hash of the group.Pulchritude
Also note that InteractiveTelegramClient will stop being shipped with the pip version, since it was only ever meant to be an example. Use TelegramClient instead.Rachal
@Rachal Thanks for the tip. Updated the answer to use TelegramClient instead of InteractiveTelegramClient. Also thanks for posting an example using TelegramClient on the github page. It was really helpful!Pulchritude
How many messages can I download with this script? 5k or unlimited? Thanks!Friederike
Could you please update the code? It raises an error, as shown here: gist.github.com/aabdulwahed/5e785a0bd911d9f4f47256b5b25f3c42Boone
For people coming here in the future, this answer applies to Telethon prior to 1.0 (that is, 0.x versions). For a 1.x solution see my answer below.Rachal
H
7

Since an update in August 2018, the Telegram Desktop application can export chat history directly. You can save it as JSON or HTML.

To use this feature, make sure you have the latest version of Telegram Desktop installed on your computer, then click Settings > Export Telegram data.

https://telegram.org/blog/export-and-more
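
If you pick the JSON format, the exported file can be post-processed with Python. Below is a minimal sketch, not a definitive parser. It assumes a single-chat export with a top-level "messages" list whose items carry "date", "from" and "text" fields, where "text" is either a plain string or a list mixing strings and dicts with a "text" key. These field names are my understanding of the export format and may differ between Telegram Desktop versions, so check them against your own file. The helper names flatten_text and read_export are made up for illustration.

```python
import json


def flatten_text(text):
    """Return message text as a plain string.

    Assumption: formatted messages are stored as a list that mixes plain
    strings with dicts carrying a "text" key.
    """
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


def read_export(data):
    """Yield (date, sender, text) tuples from a parsed export dict."""
    for msg in data.get("messages", []):
        # Entries without a "from" value are labelled as unknown.
        sender = msg.get("from") or "<unknown>"
        yield msg.get("date"), sender, flatten_text(msg.get("text", ""))


# Tiny hand-written sample standing in for the parsed contents of the export file.
sample = json.loads("""
{"name": "Example group", "messages": [
  {"id": 1, "date": "2018-08-27T10:00:00", "from": "Alice", "text": "hello"},
  {"id": 2, "date": "2018-08-27T10:01:00", "from": "Bob",
   "text": ["see ", {"type": "link", "text": "https://telegram.org"}]}
]}
""")

rows = list(read_export(sample))
for date, sender, text in rows:
    print(date, sender, text)
```

With a real export you would replace the sample with the result of json.load() on the exported file.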

Hornbeam answered 2/12, 2018 at 19:6 Comment(3)
this is the real thing.Perfectible
Yes, but doing it programmatically is useful for automating the process. Imagine you need to save 100 chats: do you click export manually on each of them, or do you write a nice script you can run as a cron job? I prefer the python way tbh ;)Irretentive
When I tried to export history, Telegram just failed to download some images. This is why I'm researching ways to schedule downloading of history and conveniently convert it to some formatChihuahua
R
6

The currently accepted answer is for very old versions of Telethon. With Telethon 1.0, the code can and should be simplified to the following:

# chat can be:
# * int id (-12345)
# * str username (@chat)
# * str phone number (+12 3456)
# * Peer (types.PeerChat(12345))
# * InputPeer (types.InputPeerChat(12345))
# * Chat object (types.Chat)
# * ...and many more types
chat = ...
api_id = ...
api_hash = ...

from telethon.sync import TelegramClient

client = TelegramClient('session_id', api_id, api_hash)

with client:
    # 10 is the limit on how many messages to fetch. Remove or change for more.
    for msg in client.iter_messages(chat, 10):
        print(msg.sender.first_name, ':', msg.text)

Applying any formatting is still possible, but hasattr is no longer needed. For example, if msg.media is enough to check whether the message has media.
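
As a rough sketch of such formatting, the helper below builds one line per message using only attributes already mentioned in this thread (date, id, media, text). The name format_message and the stand-in objects built with SimpleNamespace are made up for illustration. With a real client you would pass it the objects yielded by client.iter_messages.

```python
from datetime import datetime
from types import SimpleNamespace


def format_message(msg):
    """Build a one-line summary; `msg` only needs date, id, media and text."""
    if msg.media:
        # Show the media class name, plus the message text if there is any.
        content = '<{}> {}'.format(type(msg.media).__name__, msg.text or '').strip()
    else:
        content = msg.text or ''
    return '[{:%H:%M}] (ID={}) {}'.format(msg.date, msg.id, content)


class FakePhoto:
    """Stand-in media class so the sketch runs without a Telegram connection."""


plain = SimpleNamespace(date=datetime(2020, 4, 2, 12, 29), id=1, media=None, text='hi')
photo = SimpleNamespace(date=datetime(2020, 4, 2, 12, 30), id=2, media=FakePhoto(), text='')

print(format_message(plain))  # [12:29] (ID=1) hi
print(format_message(photo))  # [12:30] (ID=2) <FakePhoto>
```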

Note that if you're using Jupyter, you need to use async directly:

from telethon import TelegramClient

client = TelegramClient('session_id', api_id, api_hash)

# Note `async with` and `async for`
async with client:
    async for msg in client.iter_messages(chat, 10):
        print(msg.sender.first_name, ':', msg.text)
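
To keep the history instead of printing it, the same loop can write each message to a file. This is a minimal sketch under a few assumptions: the client behaves like the synchronous one from telethon.sync shown earlier, save_history is a hypothetical helper name, and FakeClient exists only so the example runs offline.

```python
import json
import os
import tempfile
from datetime import datetime
from types import SimpleNamespace


def save_history(client, chat, path, limit=None):
    """Write messages from `chat` to `path` as JSON Lines; return the count.

    `client` is expected to offer iter_messages(chat, limit) like the
    synchronous Telethon client.
    """
    count = 0
    with open(path, 'w', encoding='utf-8') as fh:
        for msg in client.iter_messages(chat, limit):
            record = {
                'id': msg.id,
                'date': msg.date.isoformat(),
                'sender_id': msg.sender_id,
                'text': msg.text,
            }
            fh.write(json.dumps(record, ensure_ascii=False) + '\n')
            count += 1
    return count


class FakeClient:
    """Offline stand-in that mimics only iter_messages()."""

    def iter_messages(self, chat, limit=None):
        msgs = [
            SimpleNamespace(id=2, date=datetime(2020, 4, 2, 12, 30), sender_id=7, text='world'),
            SimpleNamespace(id=1, date=datetime(2020, 4, 2, 12, 29), sender_id=7, text='hello'),
        ]
        return iter(msgs[:limit])


out_path = os.path.join(tempfile.mkdtemp(), 'history.jsonl')
saved = save_history(FakeClient(), '@chat', out_path, limit=10)
print(saved, 'messages written to', out_path)
```

With a real client this would be called inside `with client:` as save_history(client, chat, 'history.jsonl'). Leaving limit as None asks Telethon for the entire history.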
Rachal answered 2/4, 2020 at 12:29 Comment(4)
Should the chat variable be set with chat = InputPeerChat(chat_id)?Manizales
Then what is the chat variable here? is it the same as chat_id?Manizales
The chat is just an example indicating that "anything that can be interpreted as a chat" will work. Yes, input peers or chat IDs work fine.Rachal
How can I fetch message history from a group chat instead?Tevis
R
2

Now, you can use TDesktop to export chats.

Here is the blog post about the August 2018 update.


Original Answer:

Telegram MTProto is hard for newbies to use, so I recommend telegram-cli.

You can also use the third-party tg-export script, but that is not easy for newbies either.

Rhinitis answered 9/6, 2017 at 22:45 Comment(0)
S
0

You can use the Telethon library. For this you need to register your app and connect your client code to it (look at this). Then, to obtain the message history of an entity (such as a channel, group or chat):

from telethon.sync import TelegramClient
from telethon.errors import SessionPasswordNeededError

# username (the session name), api_id, api_hash, phone and chat_id must be defined beforehand.
# If Telegram is banned in your country, you can use a proxy; otherwise remove the proxy argument.
client = TelegramClient(username, api_id, api_hash, proxy=("socks5", proxy_ip, proxy_port))
client.connect()

# Log in if this session is not authorized yet
if not client.is_user_authorized():
    client.send_code_request(phone)
    try:
        client.sign_in(phone, input('Enter the code: '))
    except SessionPasswordNeededError:
        client.sign_in(password=input('Password: '))

messages = []
# With telethon.sync a plain for loop works outside of async code
for message in client.iter_messages(chat_id, wait_time=0):
    messages.append(message)
    # write your code
Sportswoman answered 20/4, 2021 at 21:41 Comment(0)
