Not able to get Country of a Tweet - Twython API
R

4

9

I am using the following code to collect tweets on a certain topic, but in every tweet I have extracted the 'place' attribute is None. Am I doing something wrong? Note that the code is meant to fetch existing tweets, so I am not looking for a Streaming API solution such as this one: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API

import sys
from twython import Twython

api =   Twython(consumer_key, consumer_secret, access_key, access_secret)

tweets                          =   []
MAX_ATTEMPTS                    =   200
COUNT_OF_TWEETS_TO_BE_FETCHED   =   10000
in_max_id = sys.argv[1]
next_max_id = ''
for i in range(0,MAX_ATTEMPTS):

    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break # we have collected enough tweets

    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if(0 == i):
        # Query twitter for data. 
        results    = api.search(q="#something",count='100',lang='en',max_id=in_max_id,include_entities='true',geo= True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results    = api.search(q="#something",include_entities='true',max_id=next_max_id,lang='en',geo= True)

    # STEP 2: Save the returned tweets
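    for result in results['statuses']:
        # Keep the tweet so the count check at the top of the loop can trigger
        tweets.append(result)

        temp = ""
        tweet_text = result['text']
        temp += tweet_text + " "
        hashtags = result['entities']['hashtags']
        # Use a separate name here so the outer loop counter i is not overwritten
        for hashtag in hashtags:
            temp += hashtag['text'] + " "
        print(result)
        #temp += result["place"]["country"] + "\n"
        #output_file.write(temp)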




    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get max_id to be passed in consequent call.
        next_results_url_params    = results['search_metadata']['next_results']
        next_max_id        = next_results_url_params.split('max_id=')[1].split('&')[0]
    except (KeyError, IndexError):
        # No more next pages
        break
Rozanna answered 12/12, 2015 at 13:47 Comment(3)
Are you getting an error? If so, what type of error? - Flummery
No errors. It is just that the "place" attribute is empty! - Rozanna
Edit your code according to my answer, then it should work fine. - Flummery
B
1

If the place field is a MUST for every tweet that your app will process, then you can limit your search to an area to make sure all the results will have it.

You can do so by setting the geocode parameter (latitude,longitude,radius[km/mi]), which restricts your search to that area.

An example of such a request via Twython:

geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
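
A small sketch of building the geocode string and then keeping only the results whose place is actually populated, in case some results still come back without one. `build_geocode` and `with_place` are hypothetical helper names used for illustration; they are not part of Twython, and the `api.search` call is left commented out because it needs real credentials.

```python
def build_geocode(latitude, longitude, radius, unit="mi"):
    """Format the 'latitude,longitude,radius' string used by the geocode parameter."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    return "{},{},{}{}".format(latitude, longitude, radius, unit)


def with_place(statuses):
    """Keep only the tweets whose 'place' field is populated."""
    return [status for status in statuses if status.get("place")]


geocode = build_geocode(25.032341, 55.385557, 100)
# results = api.search(q="#something", count=100, geocode=geocode)
# located = with_place(results["statuses"])
```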
Brout answered 15/12, 2015 at 7:47 Comment(3)
This answer is not technically correct. This filter just ensures that results are more likely to have a place assigned. - Cryptograph
This answer is also not correct because search/tweets actually does return place tags. However, the place tag is almost always empty. Only about 1% of all tweets have data in the place tag. - Goldfish
@Goldfish I've rephrased my answer. And it's not "technically" wrong; it depends entirely on the app's needs and how you want to use the result. - Archiphoneme
G
1

The short answer is no, you are doing nothing wrong. All your place tags are empty because, statistically, they are very unlikely to contain data. Only about 1% of all tweets have data in their place tag, because users rarely tweet their location: location sharing is off by default.

Download 100 or more tweets and you will probably find place tag data.
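
To check this on your own data, a rough sketch like the following counts how many fetched tweets carry a place. `place_coverage` is a hypothetical helper name, and the sample list is made up for illustration; in practice you would pass the accumulated `results['statuses']` lists.

```python
def place_coverage(statuses):
    """Return (tweets_with_place, total_tweets, fraction_with_place)."""
    total = len(statuses)
    tagged = sum(1 for status in statuses if status.get("place"))
    fraction = tagged / total if total else 0.0
    return tagged, total, fraction


# Made-up sample data: only one of these tweets has a populated place
sample = [{"place": None}, {"place": {"country": "Eesti"}}, {"place": None}, {}]
tagged, total, fraction = place_coverage(sample)
print("{} of {} tweets have a place ({:.1%})".format(tagged, total, fraction))
```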

Goldfish answered 14/1, 2016 at 20:48 Comment(0)
F
0

Not all tweets have every field (tweet_text, place, country, language, etc.).

So, to avoid KeyError use the following approach. Modify your code so that when the key that you're looking for is not found, a default value is returned.

(result.get('place') or {}).get('country')

Here, the above line means "search for the key country after fetching the key place if it exists, otherwise return None "
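
The same idea wrapped in a small function, as a sketch. `get_country` is a hypothetical helper name; it handles both a missing 'place' key and a 'place' that is present but set to None, which is the common case in search results.

```python
def get_country(tweet, default=None):
    """Return the tweet's country, or default when no place data is attached."""
    # 'place' may be absent or explicitly None; treat both as an empty dict
    place = tweet.get("place") or {}
    return place.get("country", default)


print(get_country({"text": "hello", "place": None}))
print(get_country({"text": "tere", "place": {"country": "Eesti"}}))
```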

Flummery answered 20/12, 2015 at 12:18 Comment(2)
Thanks for the answer, but as I have written, there is no error. It's just that the place attribute is None, so this will not be helpful in this case. - Rozanna
You're not doing anything wrong. You need to get more tweets. I inspected ~50k tweets, but I was able to find only about a hundred populated "place" fields; the rest were "null". Inspect the fetched JSON before processing. - Flummery
C
0

kmario is right. Most tweets don't have this information, but a small percentage do. Doing a location search will increase this chance, e.g. https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1

  "place": {
    "id": "cba60fe77bc80469",
    "url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
    "place_type": "city",
    "name": "Tallinn",
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "contained_within": [],
    "bounding_box": {
      "type": "Polygon",
      "coordinates": [
        [
          [
            24.5501404,
            59.3518286
          ],
          [
            24.9262886,
            59.3518286
          ],
          [
            24.9262886,
            59.4981855
          ],
          [
            24.5501404,
            59.4981855
          ]
        ]
      ]
    },
    "attributes": {}
  },
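
Once a tweet does carry a place object like the one above, the country can be read directly from it. A minimal sketch, assuming the bounding box holds [longitude, latitude] pairs as in the sample; `summarize_place` is a hypothetical helper name, and the center is just the average of the box corners.

```python
def summarize_place(place):
    """Pull the country fields and a rough center point out of a place object."""
    ring = place["bounding_box"]["coordinates"][0]  # corner points of the box
    longitudes = [point[0] for point in ring]
    latitudes = [point[1] for point in ring]
    return {
        "country": place["country"],
        "country_code": place["country_code"],
        "full_name": place["full_name"],
        # Average of the corners as (longitude, latitude)
        "center": (sum(longitudes) / len(longitudes),
                   sum(latitudes) / len(latitudes)),
    }


# Trimmed copy of the place object shown above
place = {
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [24.5501404, 59.3518286],
            [24.9262886, 59.3518286],
            [24.9262886, 59.4981855],
            [24.5501404, 59.4981855],
        ]],
    },
}
summary = summarize_place(place)
print(summary["country"], summary["country_code"], summary["center"])
```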
Cryptograph answered 13/1, 2016 at 20:58 Comment(0)
