Using Python to ask a web page to run a search

Asked 19/12, 2012 at 22:2 Answered 25/6 at 12:20

I have a list of protein names in the "Uniprot" format, and I'd like to convert them all to the MGI format. If you go to www.uniprot.org and type the uniprot protein name into the "Query" bar, it will generate a page with a bunch of information about that protein, including its MGI name (albeit much further down the page).

For example, one Uniprot name is "Q9D880", and by scrolling down, you can see that its corresponding MGI name is "1913775".

I already know how to use Python's urllib to extract the MGI name from a page once I get to that page. What I don't know is how to write Python code that gets the main page to run a query for "Q9D880". My list contains 270 protein names, so it would be nice to avoid copying and pasting each protein name into the Query bar.

I saw the "Google Search from a Python App" post, and I have a firmer understanding of this concept, but I suspect that running a google search is different from running the search function on some other website, like uniprot.org.

I'm running Python 2.7.2, but I'm open to implementing solutions that use other versions of Python. Thanks for the help!

Rosalynrosalynd answered 19/12, 2012 at 22:2 Comment(2)

Look at the url you get when you perform a Query: uniprot.org/uniprot/Q9D880 If you look really hard you can figure out where your query went... – Mcalpine 19/12, 2012 at 22:4

I know nothing about web developing, but even I should have been able to see this! Thanks! – Rosalynrosalynd 19/12, 2012 at 22:31

An easier way to do this is with the requests library. My solution also grabs the information itself from the page using BeautifulSoup4.

All you'd have to do, given a list of your protein names (my_protein_list), is:

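import requests
from bs4 import BeautifulSoup as BS

for protein in my_protein_list:
    # Fetch the entry page for this UniProt name
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text, 'html.parser')
    # The MGI cross-reference link is identified by its onclick handler
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    MGI = MGI[4:]  # drop the leading "MGI:" prefix
    print(protein + ' - ' + MGI)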

Diffractometer answered 19/12, 2012 at 22:20 Comment(5)

I'm getting a name error when my program reaches soup = BS(text) It says name 'BS' is not defined. Any ideas? – Rosalynrosalynd 20/12, 2012 at 0:21

yeah sorry, forgot to explicitly say the import, look now – Diffractometer 20/12, 2012 at 0:37

Thanks, that helped. Unfortunately, I'm running into another problem now. The line MGI = soup.find(name.... is returning a "None" type. I know that that element IS on the webpage, so I went to the troubleshooting section of the BS documentation. It suggested upgrading my parser by downloading lxml. Is that something you've done already? If so, maybe you could help me figure out how to download the two requirements, "libxml2 2.6.21 or later" and "libxslt 1.1.15 or later". At this url xmlsoft.org/libxml2 I just see a really long and confusing list of files. Don't know where to begin. – Rosalynrosalynd 21/12, 2012 at 0:5

@Rosalynrosalynd No, no need for any of that, it was my bad--I put the URL as http://www.uniprot.org/' + protein instead of http://www.uniprot.org/uniprot/' + protein, as it should be. Try again, look at my update. – Diffractometer 21/12, 2012 at 0:17

@Rosalynrosalynd glad to hear that. If your question has been answered, it would benefit the community for you to check the best answer to your question. – Diffractometer 21/12, 2012 at 3:39

Running the search appears to do a GET on

http://www.uniprot.org/?dataset=uniprot&query=Q9D880&sort=score&url=&lucky=no&random=no

Which eventually redirects you to

http://www.uniprot.org/uniprot/Q9D880

So you should be able to use urllib or an http library (I use httplib2) to do a GET on that address, parameterizing the protein name in the URL so you can search for whichever protein name you want.
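As a rough sketch of that parameterizing step (written for Python 3, standard library only), the two URLs above could be built like this. The helper names entry_url and search_url are made up for illustration, and the parameter list is simply copied from the search URL shown above; whether the site still accepts it is an assumption.

```python
from urllib.parse import urlencode

BASE = "http://www.uniprot.org"


def entry_url(protein):
    """Direct entry URL, i.e. the address the search redirects to."""
    return BASE + "/uniprot/" + protein


def search_url(protein):
    """Search URL with the same query-string parameters as the example above."""
    params = [
        ("dataset", "uniprot"),
        ("query", protein),
        ("sort", "score"),
        ("url", ""),
        ("lucky", "no"),
        ("random", "no"),
    ]
    return BASE + "/?" + urlencode(params)


if __name__ == "__main__":
    # Only builds and prints the addresses; nothing is fetched here.
    for protein in ["Q9D880"]:
        print(entry_url(protein))
        print(search_url(protein))
```

Either string can then be passed to urllib or httplib2 for the actual GET.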

Bumbailiff answered 19/12, 2012 at 22:14 Comment(1)

I have a similar problem, but it doesn't redirect to a predictable URL. All of the answers to this question are for this particular problem, but don't address the more general problem. – Karmen 25/12, 2020 at 17:19

You can also do this with PyQuery:

>>> from pyquery import PyQuery as pq
>>> url = "http://www.uniprot.org/uniprot/{name}"
>>> name = "Q9D880"
>>> html = pq(url=url.format(name=name))
>>> print(html("a").filter(lambda e: pq(this).text().startswith("MGI:")).text())
MGI:1913775
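
If you would rather not install anything, the same "find the link whose text starts with MGI:" idea can be sketched with the standard library's html.parser (Python 3). This is only a sketch: MGILinkFinder and extract_mgi_ids are names invented here, and it assumes the page still shows the identifier as link text of the form "MGI:1913775", as in the output above.

```python
from html.parser import HTMLParser


class MGILinkFinder(HTMLParser):
    """Collects the text of every <a> element that starts with 'MGI:'."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._chunks = []
        self.mgi_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start collecting the text of a new link
            self._in_link = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_link:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._chunks).strip()
            if text.startswith("MGI:"):
                # Keep only the part after the "MGI:" prefix
                self.mgi_ids.append(text[len("MGI:"):])
            self._in_link = False


def extract_mgi_ids(html):
    """Return a list of MGI identifiers found in link text (may be empty)."""
    finder = MGILinkFinder()
    finder.feed(html)
    finder.close()
    return finder.mgi_ids
```

An empty list means no such link was found, which is easier to check for than the None that comes back from a failed lookup.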

Koster answered 19/12, 2012 at 22:32 Comment(0)

The query is in the URL, so you can call:
http://www.uniprot.org/uniprot/?query=1913775&sort=score

I didn't have time to test this script since I don't have 2.x installed anymore, but the code in 2.x should be something like this:

import urllib
MGIName = "1913775"
print urllib.urlopen(
    "http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read()

The code in 3.2 I ran was this and it worked fine:

>>> import urllib.request
>>> MGIName = "1913775"
>>> print(urllib.request.urlopen("http://www.uniprot.org/uniprot/?query="+ MGIName +"&sort=score").read())

Just loop MGIName over your list of names.
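
A minimal sketch of such a loop in Python 3 might look like the following. The function names fetch and fetch_all are hypothetical, and the URL pattern is just the one used above. The fetching function is passed in as a parameter so the loop can be tried with a stand-in before hitting the real site, and a failed request is recorded as None instead of stopping the whole run.

```python
import urllib.request


def fetch(url):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(names, fetch_page=fetch):
    """Query each name in turn; map name -> page text (None on failure)."""
    pages = {}
    for name in names:
        url = "http://www.uniprot.org/uniprot/?query=" + name + "&sort=score"
        try:
            pages[name] = fetch_page(url)
        except OSError:
            # urllib's URLError is a subclass of OSError in Python 3
            pages[name] = None
    return pages
```

For example, fetch_all(["Q9D880"]) would make one request, while fetch_all(names, fetch_page=some_fake) lets you check the loop offline.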

Cinematography answered 19/12, 2012 at 22:22 Comment(0)

You can use the webbrowser library, with code like this (where URL is the address of the page you want to open):

import webbrowser

webbrowser.open(URL)

Schopenhauerism answered 25/6 at 12:20 Comment(0)
