Mechanize: too many values to unpack (expected 2)

I have written the following code in Python 3.7; it should just open a web browser with the website(s) fed to it on the command line:

Example.py

import sys

from mechanize import Browser
browser = Browser()

browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)

# pretend you are a real browser
browser.addheaders = [('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36')]

listOfSites = sys.argv[1:]
for i in listOfSites:
    browser.open(i)

I have entered the following command in the cmd:

python Example.py https://www.google.com

And I get the following traceback:

Traceback (most recent call last):
  File "Example.py", line 19, in <module>
    browser.open(i)
  File "C:\Python37\lib\site-packages\mechanize\_mechanize.py", line 253, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "C:\Python37\lib\site-packages\mechanize\_mechanize.py", line 283, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "C:\Python37\lib\site-packages\mechanize\_opener.py", line 188, in open
    req = meth(req)
  File "C:\Python37\lib\site-packages\mechanize\_urllib2_fork.py", line 1104, in do_request_
    for name, value in self.parent.addheaders:
ValueError: too many values to unpack (expected 2)

I am very new to Python, and this is my first script. I am stuck on the traceback above and haven't found a solution yet. I have searched a lot of questions on the SO community as well, but they didn't seem to help. What should I do next?

UPDATE:

As suggested by @Jean-François-Fabre in his answer, I have added 'User-agent' to the header. There is no traceback now, but there is still an issue: my link is not opened in a browser.

Here is what the addheaders line looks like now:

browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36')]
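
To check whether the request itself actually goes through (assuming mechanize's response object exposes code and read() like urllib's responses do), the loop can print the status and the first bytes of the page instead of expecting a browser window to appear:

listOfSites = sys.argv[1:]
for i in listOfSites:
    response = browser.open(i)
    # if this prints 200 and some HTML, mechanize fetched the page fine;
    # it just never launches a visible browser window
    print(response.code)
    print(response.read()[:200])
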
Plutonian answered 5/2, 2019 at 6:24 Comment(0)

I have found a workaround for this, even though the issue above still exists. I am posting it only to let readers know that it can be done this way too:

Instead of using the mechanize package, we can use the webbrowser package and write the following Python code in Example.py:

import webbrowser
import sys

#This is an upgrade suggested by @Jean-François Fabre
listOfSites = sys.argv[1:]

for i in listOfSites:
    webbrowser.open_new_tab(i)

Then we can run this Python code by executing the following command in the terminal/command prompt:

python Example.py https://www.google.com https://www.bing.com

The example command above will open two sites at once: Google and Bing.
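
If you want to force a specific browser rather than the system default, the webbrowser module also lets you pick one by name via webbrowser.get(). A small sketch, assuming a browser named 'firefox' is registered on your system (it falls back to the default otherwise):

import webbrowser
import sys

# try to use Firefox explicitly; fall back to the system default browser
try:
    controller = webbrowser.get('firefox')
except webbrowser.Error:
    controller = webbrowser.get()

for i in sys.argv[1:]:
    controller.open_new_tab(i)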

Plutonian answered 5/2, 2019 at 9:25 Comment(2)
This is the correct solution as mechanize does not open a graphical user interface (AFAIK).Haleakala
@ThomasHesse are you sure? When I was trying the implementation above, using Mechanize was one of several suggested methods, and the code worked for some time; but the next day when I ran it again, this issue appeared. Not sure what happened in a single night.Plutonian

I don't know mechanize at all, but the traceback and variable names (and some googling) can help.

You're initializing addheaders with a list of strings. Other examples (e.g. Mechanize Python and addheader method - how do I know the newest headers?) show a list of tuples, which matches what the traceback expects. Ex:

browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

so that each entry unpacks properly into name and value in the loop

for name, value in whatever.addheaders:

You have to add the 'User-agent' header name (you can also pass other, less common headers besides the browser identification).
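
Applied to your script, the whole assignment would look something like this (the header values are only examples; any extra headers follow the same (name, value) shape):

browser.addheaders = [
    ('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'),
    ('Accept-Language', 'en-US,en;q=0.5'),
]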

Frederickafredericks answered 5/2, 2019 at 6:28 Comment(11)
I have added 'User-agent' to the header. Now the code runs fine and there is no traceback, but the link is still not opened in the browser.Plutonian
Thanks a lot for this upgrade listOfSites = sys.argv[1:]Plutonian
that probably doesn't help much for your issueNepenthe
your question has a +3 score now. If it's not answered, you can place a bounty in a few days; it could need one to attract more readers. Note: my answer would not be eligible (and it doesn't need to be)Nepenthe
Yeah, that's what I am thinking of doing, since no one is answering the mechanize issue correctly.Plutonian
I think it's not answerable as is, but people can comment and ask for clarifications later. Even a 50-point bounty attracts a lot of peopleNepenthe
yeah, but no one is doing that either, so the last option is a bounty. Although I have figured out a way around this issue by using the webbrowser libraryPlutonian
By using webbrowser.open_new_tab(i) instead of browser.open(i)Plutonian
maybe you can answer your own question then, with this solutionNepenthe
Yeah let me try thatPlutonian
I have just answered my own questionPlutonian

Let me try to answer your question in parts:

  • You're correct to add "browser headers". Many servers might outright drop your connection without them, since their absence is a definite sign of being crawled by a bot.

  • mechanize, as stated by the docs, "is a tool for programmatic web browsing".
    This means it is primarily used to crawl webpages, parse their contents, fill forms, click on things and send requests, but without a "real" web browser and without features such as CSS rendering. For example, you cannot open a page and take a screenshot, because nothing is actually "rendered"; to achieve that you would need to save the page and render it using another solution (see the short sketch after this list).

  • If this suits your needs, check out headless browsers as a technology; there are a lot of them. In the Python ecosystem, other than mechanize, I'd look at headless Chromium, as PhantomJS is unfortunately discontinued.
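
To make the "programmatic web browsing" point concrete, here is a minimal sketch of how mechanize is typically used (the User-agent value is just a placeholder): it fetches and parses the page, but nothing is ever rendered on screen.

from mechanize import Browser

browser = Browser()
browser.set_handle_robots(False)
browser.addheaders = [('User-agent', 'Mozilla/5.0')]

response = browser.open('https://www.google.com')
print(browser.title())        # the page <title>, parsed from the HTML
print(response.read()[:200])  # first bytes of the raw HTML
# you could also inspect browser.links() or browser.forms() here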

But if I understand correctly, you need the actual web browser to open up with the webpage, right? For this reason, you actually need, well, a browser in your system to take care of that!

Case 1 : Use your native system's browser

Find out where your browser's executable lies on your system. For example, my Firefox executable is at "C:\Program Files\Mozilla Firefox\firefox.exe"; add that folder to your PATH so that firefox can be launched from the command line.

As you're using Windows, use the start menu to navigate to Advanced System Settings --> Advanced --> Environment Variables, and add the path above to your PATH variable.

If you're using Linux, export PATH=$PATH:"/path/to/your/browser" will take care of things.

Then, your code can run as simply as

import subprocess
import sys

listOfSites = sys.argv[1:]
# build one argument list: firefox -new-tab <url> -new-tab <url> ...
args = ["firefox"]
for i in listOfSites:
    args += ["-new-tab", i]
print(args)
subprocess.run(args)

Firefox will open new tabs, one for each of the links you have provided.

Case 2 : Use selenium

Then comes Selenium, which in my opinion is the most mature solution to browser-related problems and what most people use. I've used it in a production setting with very good results. It provides the UI/frontend of a real browser that renders the webpages, and it also allows you to work with those webpages programmatically.

It needs some setup: for example, if you're using Firefox, you'll need to download the geckodriver executable from their releases page and then add it to your PATH variable again.

Then you define your webdriver, spawn one for each of the websites you need to visit, and get the webpage. You can also take a screenshot, as proof that the page has been rendered correctly.

from selenium import webdriver
import sys

listOfSites = sys.argv[1:]
for i in listOfSites:
    driver = webdriver.Firefox()
    # only prepend a scheme if the URL doesn't already include one
    url = i if i.startswith(('http://', 'https://')) else 'http://' + i
    driver.get(url)
    # build a screenshot filename that is valid on Windows (no ':' or '/')
    driver.save_screenshot(url.replace('://', '_').replace('/', '_') + '-screenshot.png')

# When you're finished
# driver.quit()

I've tested both of these code snippets, and they work as expected. Please let me know how all of this sounds, and if you need any additional information! ^^

Uribe answered 11/2, 2019 at 13:10 Comment(10)
I am just trying to create a script to surf the net; using an actual web browser is not a requirement, and Selenium is something mostly used by automation testers. I primarily need to focus on Python, as it is something I need to learn as well as possible for AI development. Can you please elaborate on how I can avoid an actual browser but still surf the net?Plutonian
What does "surf the net" consist of? If you want to render and display the webpages, you need a browser, and I don't think you can replace its functionality, (or at least I don't know of a way).. If you want to just crawl from URL to URL and obtain the HTML only, mechanize or webbrowser are good to go, but I'd recommend also looking at the requests package, but you won't have a graphical user interface. Selenium is indeed primarily used for automation testing, but it's quite simple to grasp, and I think has its uses for other tasks in the industry.Uribe
It's okay, I don't need a graphical interface. All I need is to search, get the search results, and process form data: what data should be passed and what value I need back from that query. That's what I am hoping for by "surf the net".Plutonian
How can I use the mechanize / webbrowser / requests packages for this, since I am very new to Python? Is there any link or API reference I can use for these packages?Plutonian
Ah, okay, now I understand your requirements in a better way, thanks for the clarification! For your specific use case, the best thing IMO is indeed mechanize. Keep the official docs nearby, as well as these and these tutorials, and start hacking away! StackOverflow is also here for questionsUribe
Final thing, if you need to load and execute javascript in your webpages, Selenium is a must, but for now, I think you're free to ignore it.Uribe
I don't need to load JavaScript into the webpage; I am just trying to create a bot that can surf the net. Let's see what else I can explore while making this. I am seriously excited to learn about AI and how I can make one using Python.Plutonian
Have fun in doing so! Sentiment Analysis in webpages is a good idea that can be explored, and could be straightforward if you have the parsed data. Alexa's Top 500 list is a nice starting point.Uribe
Is there any way to debug a Python script in Notepad++, or can you suggest some other Python editor that can help me debug the code too?Plutonian
Notepad++ (and most text editors) don't provide debugging capabilities AFAIK, so you'd have to either use pdb or gdb to debug from the command line, or a proper Python IDE such as PyCharm, Eclipse, or Spyder; you have lots to choose from.Uribe

Here you go :)

import sys
from mechanize import Browser, Request


browser = Browser()

browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)

# setup your header, add anything you want
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1', 'Referer': 'http://whateveritis.com'}


url_list = sys.argv[1:]
for url in url_list:
    request = Request(url=url, data=None, headers=header)
    response = browser.open(request)
    print(response.read())
    response.close()
Teyde answered 11/2, 2019 at 13:17 Comment(8)
I am getting an error: "ModuleNotFoundError: No module named 'urllib2'"Plutonian
what is urllib2 here?Plutonian
I am still having this error "NameError: name 'urllib2' is not defined"Plutonian
I have used your updated code, but I am still getting an errorPlutonian
"NameError: name 'urllib2' is not defined" this is the errorPlutonian
Do I need to install the Request package manually for this?Plutonian
Updated to use the mechanize library onlyTeyde
Thanks man, it worked too. The answer I was seeking was @hyperTrashPanda's, but your updates were well suited too, +1 for thatPlutonian

Same as above. Guess I didn't read all the answers before digging in. LOL

import sys
import webbrowser

from mechanize import Browser
browser = Browser()

browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
browser.set_handle_redirect(True)
browser.set_handle_referer(True)
browser.set_handle_robots(False)

# pretend you are a real browser
browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36')]

listOfSites = sys.argv[1:]
for i in listOfSites:
    webbrowser.open(i)
Ghats answered 7/2, 2019 at 16:37 Comment(1)
This didn't help. Otherwise, it wouldn't be a questionPlutonian
