Batch posting on Blogger using the GData Python client
I'm trying to copy all my LiveJournal posts to my new blog on blogger.com. I do so by using a slightly modified example that ships with the gdata Python client. I have a JSON file with all of my posts imported from LiveJournal. The issue is that blogger.com limits new blog entries to 50 per day, so you can imagine that the 1300+ posts I have will take about a month to copy, since I can't programmatically enter the captcha after 50 imports.

I recently learned that there's also a batch operation mode somewhere in GData, but I couldn't figure out how to use it. Googling didn't really help.

Any advice or help will be highly appreciated.

Thanks.

Update

Just in case, this is the code I use:

#!/usr/local/bin/python
import json

from gdata import service
import gdata
import atom
import getopt
import sys

from datetime import datetime as dt
from datetime import timedelta as td
from datetime import tzinfo as tz

import time

allEntries = json.load(open("todays_copy.json", "r"))

class TZ(tz):
    def utcoffset(self, dt): return td(hours=-6)

class BloggerExample:
    def __init__(self, email, password):
        # Authenticate using ClientLogin.
        self.service = service.GDataService(email, password)
        self.service.source = "Blogger_Python_Sample-1.0"
        self.service.service = "blogger"
        self.service.server = "www.blogger.com"
        self.service.ProgrammaticLogin()

        # Get the blog ID for the first blog.
        feed = self.service.Get("/feeds/default/blogs")
        self_link = feed.entry[0].GetSelfLink()
        if self_link:
            self.blog_id = self_link.href.split("/")[-1]

    def CreatePost(self, title, content, author_name, label, time):
        LABEL_SCHEME = "http://www.blogger.com/atom/ns#"
        # Create the entry to insert.
        entry = gdata.GDataEntry()
        entry.author.append(atom.Author(atom.Name(text=author_name)))
        entry.title = atom.Title(title_type="xhtml", text=title)
        entry.content = atom.Content(content_type="html", text=content)
        entry.published = atom.Published(time)
        entry.category.append(atom.Category(scheme=LABEL_SCHEME, term=label))

        # Ask the service to insert the new entry.
        return self.service.Post(entry, 
            "/feeds/" + self.blog_id + "/posts/default")

    def run(self, data):
        for year in data:
            for month in year["yearlydata"]:
                for day in month["monthlydata"]:
                    for entry in day["daylydata"]:
                        # print year["year"], month["month"], day["day"], entry["title"].encode("utf-8")
                        atime = dt.strptime(entry["time"], "%I:%M %p")
                        hr = atime.hour
                        mn = atime.minute
                        ptime = dt(year["year"], int(month["month"]), int(day["day"]), hr, mn, 0, tzinfo=TZ()).isoformat("T")
                        public_post = self.CreatePost(entry["title"],
                            entry["content"],
                            "My name",
                            ",".join(entry["tags"]),
                            ptime)
                        print "%s, %s - published, Waiting 30 minutes" % (ptime, entry["title"].encode("utf-8"))
                        time.sleep(30*60)


def main(data):
    email = "[email protected]"
    password = "MyPassW0rd"

    sample = BloggerExample(email, password)
    sample.run(data)

if __name__ == "__main__":
    main(allEntries)
Engird answered 26/8, 2014 at 5:37 Comment(6)
Can you bypass the API and just manually write each record from one database to the other via a standalone Python script? Not familiar with LiveJournal or Blogger, but I've had to batch a large amount of posts, so I'd be interested in helping.Stichometry
@Joaq2Remember I'm sorry, I'm not really following, would you please clarify? Thanks.Engird
Potentially establish two connections to both databases, LiveJournal and Blogger. Select from LiveJournal, then write the replica to Blogger, or establish the Blogger DB connection and write to it by parsing the JSON.Stichometry
@Joaq2Remember I wish I could do that, but Blogger provides only a REST API, so I don't have direct access to their database.Engird
So your issue is the hard limit on their API post requests? Is there any other way of posting items besides the API? I might recommend using a bot to publish these posts through the CMS, or trying to get in touch with someone at Blogger to see if they can help you out.Stichometry
@Joaq2Remember dude, that's what I'm doing; if you look at the Python script above, it does exactly that: it posts every 30 minutes, so 48 posts a day. I did write to the Blogger help forum and the answer was no, there's no way to increase that limit. But my question is really about the batch functionality that is contained in GData, which I don't have a single clue how to use. As far as I understood, each batch request is still one request, so it shouldn't hit the limit. I'm not sure though. That's why I'm asking here.Engird
I would recommend using Google Blog Converters instead (https://code.google.com/archive/p/google-blog-converters-appengine/).

To get started you will have to go through:

https://github.com/google/gdata-python-client/blob/master/INSTALL.txt - steps for setting up the Google GData API
https://github.com/pra85/google-blog-converters-appengine/blob/master/README.txt - steps for using the Blog Converters

Once you have everything set up, you have to run the following command (it takes your LiveJournal username and password):

livejournal2blogger.sh -u <username> -p <password> [-s <server>]

Redirect its output into a .xml file. This file can then be imported into a Blogger blog directly by going to the Blogger Dashboard, then your blog > Settings > Other > Blog tools > Import Blog.
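
If you prefer to capture the converter's output from Python rather than with shell redirection, here is a minimal sketch (assuming livejournal2blogger.sh is executable in the current directory; the output file name is just an example, and the username/password strings are placeholders):

import subprocess

# Run the LiveJournal-to-Blogger converter and write its Atom XML output
# to a file that can then be imported from the Blogger Dashboard.
out = open("livejournal-export.xml", "w")
subprocess.check_call(["./livejournal2blogger.sh",
                       "-u", "<username>", "-p", "<password>"],
                      stdout=out)
out.close()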

In the import dialog, remember to check the "Automatically publish all imported posts and pages" option. I have tried this once before with a blog of over 400 posts, and Blogger successfully imported and published them without issue.

In case you suspect Blogger might have issues (because the number of posts is quite high), or you have other Blogger blogs in your account, then as a precaution create a separate Blogger (Google) account and try importing the posts there first. After that you can transfer the admin controls to your real Blogger account (to transfer, you will first have to send an author invite, then raise your real Blogger account to admin level, and lastly remove the dummy account. The option for sending the invite is at Settings > Basic > Permissions > Blog Authors).

Also make sure that you are using Python 2.5, otherwise these scripts will not run. Before running livejournal2blogger.sh, change the following line (thanks to Michael Fleet for this fix: http://michael.f1337.us/2011/12/28/google-blog-converters-blogger2wordpress/ )

PYTHONPATH=${PROJ_DIR}/lib python ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*

to

PYTHONPATH=${PROJ_DIR}/lib python2.5 ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*

P.S. I am aware this is not the answer to your question, but since the objective of this answer is the same as your question (to import more than 50 posts in a day), I shared it anyway. I don't have much knowledge of Python or the GData API; I set up the environment and followed these steps to answer this question (and I was able to import posts from LiveJournal to Blogger with it).

Tonita answered 3/9, 2014 at 8:34 Comment(2)
OK, this one looks like a legit answer. I could transfer 1000 posts in one import; looks like 1000 is the new limit. Way better than 50. So I'll accept this one. Thanks @PrayagVerma.Engird
So, this is actually a complete answer. The 1000-post limitation turned out to be a LiveJournal limitation, not a Blogger one.Engird
import atom
import gdata.base

# build the batch request feed
request_feed = gdata.base.GBaseItemFeed(atom_id=atom.Id(text='test batch'))
# format each object 
entry1 = gdata.base.GBaseItemFromString('--XML for your new item goes here--')
entry1.title.text = 'first batch request item'
entry2 = gdata.base.GBaseItemFromString('--XML for your new item here--')
entry2.title.text = 'second batch request item'

# Add each blog item to the request feed 
request_feed.AddInsert(entry1)
request_feed.AddInsert(entry2)

# Execute the batch processes through the request_feed (all items)
result_feed = gd_client.ExecuteBatch(request_feed)
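
This snippet leaves out a couple of pieces. Continuing from it, here is a minimal sketch of what they might look like, assuming the old gdata.base.service.GBaseService client (the feed classes above belong to the Google Base service), authenticated with ClientLogin just like the Blogger sample in the question, and per-entry results reported via the GData batch protocol. Credentials and the source string are placeholders:

import gdata.base.service

# gd_client used above: an authenticated Google Base service client
# (placeholder credentials; ClientLogin, same mechanism as the question's code).
gd_client = gdata.base.service.GBaseService("[email protected]", "MyPassW0rd")
gd_client.source = "Batch_Insert_Sample-1.0"
# Google Base writes may also require a developer key, e.g. gd_client.api_key = "..."
gd_client.ProgrammaticLogin()

# After result_feed = gd_client.ExecuteBatch(request_feed), each returned entry
# should report its own outcome (per the GData batch protocol):
for entry in result_feed.entry:
    if entry.batch_status:
        print entry.batch_status.code, entry.batch_status.reason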
Stichometry answered 29/8, 2014 at 17:2 Comment(4)
Oh man, that looks very promising. I'll test it right away. Thanks and +1. I'll mark it as a solution as soon as I test it properly.Engird
I've also found different documentation. Building the request feed seems to differ, but there is some documentation at gdata-python-client.googlecode.com/hg/pydocs/… . To build the feed it uses GDataFeed. Let me know what you think.Stichometry
@Engird bounty's coming to a close soon... let me know how you're getting on so it can be awarded/not or whatever the case may be :)Downright
@JonClements unfortunately I haven't been able to make it work so far. That code is missing some other essential parts, and I'm not entirely sure how to use it. I've tried different things, but without luck so far. Another thing is that I don't have a whole lot of time to check all the options, but I'll keep trying.Engird
