Save HTML of some website in a txt file with python

Asked 19/6, 2014 at 1:5 Answered 26/9, 2019 at 6:47

Solved python html parsing python-3.x urllib

I need to save the HTML code of any website in a txt file. It is a very easy exercise, but I am stuck because I have a function that is supposed to do this:

import urllib.request

def get_html(url):
    f=open('htmlcode.txt','w')
    page=urllib.request.urlopen(url)
    pagetext=page.read() ## Save the html and later save in the file
    f.write(pagetext)
    f.close()

But this doesn't work.

Tieratierce answered 19/6, 2014 at 1:5 Comment(9)

You can ask your browser to save the HTML for a page. Why do it this way? There are programs like wget (on Unix/Linux, probably also on OSX, and also on Windows as part of CygWin) that can download a complete website. – Edition 19/6, 2014 at 1:9

Lots of programmers use python to download urls. I do. I guess I could hire a bunch of people to click save from the browser. I could send them email telling them which pages I want. But python is less expensive. – Baronage 19/6, 2014 at 1:13

I got a strange error, something like: "No str, needed bytes" – Tieratierce 19/6, 2014 at 1:41

Great! The problem is that read() returns bytes, which you need to convert to a string. pagetext = page.read().decode() is probably all you need; decode() assumes UTF-8 by default. – Baronage 19/6, 2014 at 2:3
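
A minimal sketch of the question's function with that fix applied, in case it helps. The path parameter, the charset lookup from the response headers, and the fallback to UTF-8 are assumptions added for illustration; they are not part of the original code.

```python
import urllib.request


def get_html(url, path="htmlcode.txt"):
    """Download url and save its HTML as text in path."""
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # read() returns bytes, not str
        # Prefer the charset declared in the headers; UTF-8 is an assumed fallback.
        charset = page.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")
    # The file is always written as UTF-8, whatever the source charset was.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Calling get_html("http://www.example.com/test.html") would then write the decoded page to htmlcode.txt.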

Yes, you're right! I finally got it, thanks for everything :D – Tieratierce 19/6, 2014 at 2:20

Don't put answers in the question. – Darice 28/10, 2014 at 19:3

I applied everything you all said using that code, but I'm still getting an error saying UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position :'( – Izettaizhevsk 27/11, 2017 at 3:39
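
That error usually means the output file was opened with a platform default encoding that cannot represent the character. A small sketch of one way around it, assuming the page text is already a str; save_text is a hypothetical helper name, not something from the thread.

```python
def save_text(text, path):
    """Write text to path as UTF-8, independent of the platform default."""
    # Passing encoding explicitly avoids falling back to a locale codec
    # (for example cp1252 on Windows), which cannot encode U+2665.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```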

@MohammedAminAIMEUR Hello, have you tried this one? (Python 2)

import urllib2

def get_html(url):
    file("my_file.txt", "w").write(urllib2.urlopen(url).read())

if __name__ == '__main__':
    url = raw_input("Say me a website: ")
    get_html("http://" + url)

– Tieratierce 27/11, 2017 at 12:40

Well, I've found a solution. I'm using Python 3 and the code I used is the following:

import shutil
import urllib.request

page = urllib.request.urlopen(url1)
f = open("./offlinesaved.html", "wb")
shutil.copyfileobj(page, f)
f.close()

– Izettaizhevsk 27/11, 2017 at 15:49
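
The same binary-copy idea can be sketched with context managers so that both the response and the file are closed even if the copy fails. Because the bytes are written unchanged, no decoding or encoding step is involved. The function name save_raw and its parameters are illustrative assumptions.

```python
import shutil
import urllib.request


def save_raw(url, path):
    """Copy the response body to path byte for byte."""
    # "wb" writes the bytes as received, so no codec is involved.
    with urllib.request.urlopen(url) as page, open(path, "wb") as f:
        shutil.copyfileobj(page, f)
```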

The easiest way would be to use urlretrieve. In Python 2:

import urllib

urllib.urlretrieve("http://www.example.com/test.html", "test.txt")

For Python 3.x the code is as follows:

import urllib.request    
urllib.request.urlretrieve("http://www.example.com/test.html", "test.txt")

Sulphuryl answered 19/6, 2014 at 1:18 Comment(1)

Thanks! I did it the following way, and it works (Python 2):

import urllib2

def Obtener_Html(url):
    file("my_file.txt", "w").write(urllib2.urlopen(url).read())

if __name__ == '__main__':
    url = raw_input("Say me a website: ")
    Obtener_Html("http://" + url)

– Tieratierce 19/6, 2014 at 1:42

I use Python 3. After installing the requests library (pip install requests), you can save a web page to a txt file:

import requests

url = "https://mcmap.net/q/656050/-save-html-of-some-website-in-a-txt-file-with-python"

r = requests.get(url)
# An explicit encoding avoids UnicodeEncodeError where the default is not UTF-8.
with open('file.txt', 'w', encoding='utf-8') as file:
    file.write(r.text)

Joeannjoed answered 26/9, 2019 at 6:47 Comment(1)

You might also want to check status_code to make sure you are not running into an HTTP 404 or some server error. It should be 200, in which case r.ok is True. – Arand 27/6, 2021 at 21:51
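
A rough sketch of that check, assuming the requests library is installed. save_response and download are hypothetical helper names, and the 10-second timeout is an arbitrary choice; the check is kept in its own function so it works with any object that has status_code and text attributes.

```python
from pathlib import Path


def save_response(response, path):
    """Write response.text to path, but only for an HTTP 200 reply."""
    if response.status_code != 200:
        raise RuntimeError(f"unexpected HTTP status: {response.status_code}")
    Path(path).write_text(response.text, encoding="utf-8")


def download(url, path):
    """Fetch url with requests and save it through save_response."""
    import requests  # third-party (pip install requests), imported lazily

    response = requests.get(url, timeout=10)
    save_response(response, path)
```

requests also provides response.raise_for_status(), which raises an exception for 4xx and 5xx replies and could replace the manual comparison.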
