Why is this Python request returning the same value each time?

I'm using the Python requests library for the first time, and I am confused. When I run the function below in a for loop with different base URLs, it appears to receive a response, but the content that is returned is the same for both URLs.

If I look at the API URL in my browser, then I see that it's the content for the first URL that's being returned both times. What am I missing?

import requests

base_urls = ['http://kaweki.wikia.com/', 'http://solarmovie.wikia.com/']


def getEdits(wikiObj, limit=500):
    # wikiObj is the wiki's base URL, including the trailing slash
    payload = {'limit': limit}
    r = requests.get('{}api/v1/Activity/LatestActivity'.format(wikiObj),
                     params=payload)
    edits = r.json()
    return edits['items']

for url in base_urls:
    print(getEdits(url))
Amberambergris answered 5/3, 2015 at 20:41 Comment(5)
I looked it over and can't see anything that's obviously problematic. Assuming both of the destination URLs behave the same and have different content, I can't explain why the results would appear the same. - Melitamelitopol
Could it be a bug on their side? Maybe it's caching where it shouldn't? - Amberambergris
I doubt it's caching that's the problem exactly, but I agree that it's totally possible the API is broken. It may also simply be hard to use, returning irrelevant results because of a non-obvious usage detail. - Melitamelitopol
It's not only Python; the same thing happens when you make the request with curl. Funnily enough, when you run the requests in your browser (Chrome) you will get different results. I tried adding the header 'Cache-Control': 'no-cache' to the request, but it didn't solve it. To debug further, we would need to see the server-side Apache logs to find out why it treats the requests as identical when they come from code or curl, and as different when they come from a browser. - Yellowgreen
@Amberambergris problem solved - see more details in my answer below. - Yellowgreen

There appears to be a bug on the server side: for a short period of time it keeps serving the previous response, ignoring Cache-Control headers and the like.

Introducing a sleep of 5 seconds between requests (a shorter delay may also be enough) works around the bug. I've marked the added lines below:


import requests
import json
from time import sleep  # ADDED

base_urls = ['http://kaweki.wikia.com/', 'http://solarmovie.wikia.com/']


def getEdits(wikiObj, limit=500):
    # wikiObj is the wiki's base URL, including the trailing slash
    payload = {'limit': limit}
    url = '{}api/v1/Activity/LatestActivity'.format(wikiObj)
    r = requests.get(url, params=payload)
    edits = json.loads(r.content)
    return edits['items']

for url in base_urls:
    print(getEdits(url))
    sleep(5)  # ADDED: pause so the next request is not served the previous response

OUTPUT

[{u'article': 1461, u'revisionId': 14, u'user': 26127114, u'timestamp': 1424389645}, {u'article': 1461, u'revisionId': 13, u'user': 26127114, u'timestamp': 1424389322}, {u'article': 1461, u'revisionId': 12, u'user': 26127114, u'timestamp': 1424389172}, {u'article': 1461, u'revisionId': 5, u'user': 26127114, u'timestamp': 1424388924}]
[{u'article': 1461, u'revisionId': 14, u'user': 26127165, u'timestamp': 1424389107}, {u'article': 1461, u'revisionId': 7, u'user': 26127165, u'timestamp': 1424388706}]
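
A fixed sleep is a guess at how long the stale response lingers. A slightly more defensive variation, sketched below for Python 3, checks the `basepath` field that the API returns alongside `items` and retries only when it names a different wiki. This assumes that `basepath` reliably identifies the wiki whose data was served, which the responses quoted in this thread suggest but do not prove. The names `matches_wiki`, `get_edits_checked` and the `fetch` parameter are hypothetical and not part of requests or the Wikia API.

```python
import time


def matches_wiki(payload, base_url):
    """Return True if the payload's 'basepath' names the wiki we asked for."""
    # Assumption: 'basepath' identifies the wiki whose data was actually served.
    return payload.get('basepath', '').rstrip('/') == base_url.rstrip('/')


def get_edits_checked(base_url, fetch, retries=3, delay=5):
    """Fetch the activity payload, retrying while it belongs to another wiki.

    `fetch` is any callable that takes a base URL and returns the parsed
    JSON payload as a dict. Passing it in keeps this sketch testable offline.
    """
    for attempt in range(retries):
        payload = fetch(base_url)
        if matches_wiki(payload, base_url):
            return payload['items']
        if attempt < retries - 1:
            # Wait before asking again, as in the sleep-based workaround.
            time.sleep(delay)
    raise RuntimeError(
        'no matching response for {} after {} attempts'.format(base_url, retries))
```

With requests, `fetch` could be something like `lambda u: requests.get(u + 'api/v1/Activity/LatestActivity', params={'limit': 500}).json()`. If the server never returns a matching payload, the function raises instead of silently returning another wiki's edits.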
Yellowgreen answered 5/3, 2015 at 22:2 Comment(2)
Hmm, very interesting. But I think the Host: header might be a red herring. After all, requests should already add that by itself. And it does in fact seem to work with just the 5s delay. - Kiloliter
@LukasGraf yes, you are right indeed! I'll update the answer - thanks! - Yellowgreen

The API endpoints are "broken". Refreshing the two endpoints repeatedly in a browser has them switching back and forth between two responses. You can replicate it by refreshing one request half a dozen times, then refreshing the other request half a dozen times, and switching back and forth every half a dozen requests.

Request A:

http://solarmovie.wikia.com/api/v1/Activity/LatestActivity

Request B:

http://kaweki.wikia.com/api/v1/Activity/LatestActivity

Response 1:

{
    items: [
        {
            article: 1461,
            user: 26127114,
            revisionId: 14,
            timestamp: 1424389645
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 13,
            timestamp: 1424389322
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 12,
            timestamp: 1424389172
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 5,
            timestamp: 1424388924
        }
    ],
    basepath: "http://kaweki.wikia.com"
}

Response 2:

{
    items: [
        {
            article: 1461,
            user: 26127165,
            revisionId: 14,
            timestamp: 1424389107
        },
        {
            article: 1461,
            user: 26127165,
            revisionId: 7,
            timestamp: 1424388706
        }
    ],
    basepath: "http://solarmovie.wikia.com"
}
Organdy answered 5/3, 2015 at 21:16 Comment(7)
While it's true, it's not really helpful/useful! Further, can you reproduce the browser behavior with curl/Python? - Yellowgreen
I've sent an email to their API team to let them know that it looks like there's a bug. Thanks so much! - Amberambergris
@alfasin the question has the Python that originally showed the results. How is answering the question not useful? - Organdy
I experienced the same behavior while testing this. So I think the correct answer here is in fact "this API is broken". - Kiloliter
BTW: All the subdomains on wikia.com are just aliases (CNAME records) for wikia.com. So the distinction between different wikis has to happen via some sort of name-based virtual hosting based on the Host: header, which the API or some caching proxy in between seems to mess up. - Kiloliter
I believe that @LukasGraf is correct. Something weird, though: if I hardcode the Host header (i.e. 'Host': 'kaweki.wikia.com'), I get a different answer per request, but when I extract the host from wikiObj the bug is back. There's something messy here, which I believe is a combination of Python's dynamic binding and the subdomain configuration on the server side. - Yellowgreen
@Organdy The OP showed a problem and you said "hey, there's a problem", and then went into more detail on how to reproduce the problem in a browser, but you didn't pinpoint the issue or introduce a way to fix or work around it. That's why I wrote that it's not useful. - Yellowgreen

I downloaded and ran the script and got apparently identical output. There doesn't seem to be anything wrong with the script, though! I think the output is simply identical, for some reason. Try to change return edits['items'] to just return edits and you'll see that the output is different in that case. If there really is a bug in the code, that should help you isolate it; if not, then maybe you can figure out why the real output is like that.
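
As a rough illustration of that suggestion (Python 3), the sketch below keeps the whole payload for each requested URL and compares its `basepath` with the URL that was asked for. The sample data is abridged from the responses quoted elsewhere in this thread and models the case where both requests received the kaweki response; `find_mismatches` is a hypothetical helper name, and treating `basepath` as the marker of the serving wiki is an assumption.

```python
# Abridged copy of the kaweki response quoted in this thread.
kaweki_payload = {
    'items': [
        {'article': 1461, 'user': 26127114, 'revisionId': 14,
         'timestamp': 1424389645},
    ],
    'basepath': 'http://kaweki.wikia.com',
}

# Requested base URL -> full payload received. Here both requests got
# the same payload, which is the behavior described in the question.
responses = {
    'http://kaweki.wikia.com/': kaweki_payload,
    'http://solarmovie.wikia.com/': kaweki_payload,
}


def find_mismatches(responses):
    """Return {requested_url: basepath} for payloads from another wiki."""
    mismatched = {}
    for requested, payload in responses.items():
        basepath = payload.get('basepath', '')
        if basepath.rstrip('/') != requested.rstrip('/'):
            mismatched[requested] = basepath
    return mismatched


print(find_mismatches(responses))
```

An empty result would mean every payload came from the wiki that was requested; any entry points at a request that was answered with another wiki's data.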

Melitamelitopol answered 5/3, 2015 at 21:5 Comment(0)
