Reading appengine backup_info file gives EOFError
Asked Answered
B

1

3

I'm trying to inspect my appengine backup files to work out when a data corruption occured. I used gsutil to locate and download the file:

gsutil ls -l gs://my_backup/ > my_backup.txt
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info file://1.backup_info

I then created a small python program, attempting to read the file and parse it using the appengine libraries.

#!/usr/bin/python

APPENGINE_PATH='/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/'
ADDITIONAL_LIBS = [
'lib/yaml/lib'
]
import sys
sys.path.append(APPENGINE_PATH)
for l in ADDITIONAL_LIBS:
  sys.path.append(APPENGINE_PATH+l)

import logging
from google.appengine.api.files import records
import cStringIO

def parse_backup_info_file(content):
  """Returns entities iterator from a backup_info file content."""
  reader = records.RecordsReader(cStringIO.StringIO(content))
  version = reader.read()
  if version != '1':
    raise IOError('Unsupported version')
  return (datastore.Entity.FromPb(record) for record in reader)


INPUT_FILE_NAME='1.backup_info'

f=open(INPUT_FILE_NAME, 'rb')
f.seek(0)
content=f.read()
records = parse_backup_info_file(content)
for r in records:
  logging.info(r)

f.close()

The code for parse_backup_info_file was copied from backup_handler.py

When I run the program, I get the following output:

./view_record.py 
Traceback (most recent call last):
  File "./view_record.py", line 30, in <module>
    records = parse_backup_info_file(content)
  File "./view_record.py", line 19, in parse_backup_info_file
    version = reader.read()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 335, in read
    (chunk, record_type) = self.__try_read_record()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 307, in __try_read_record
    (length, len(data)))
EOFError: Not enough data read. Expected: 24898 but got 2112

I've tried with a half a dozen different backup_info files, and they all show the same error (with different numbers.) I have noticed that they all have the same expected length: I was reviewing different versions of the same model when I made that observation, it's not true when I view the backup files of other Modules.

EOFError: Not enough data read. Expected: 24932 but got 911
EOFError: Not enough data read. Expected: 25409 but got 2220

Is there anything obviously wrong with my approach?

I guess the other option is that the appengine backup utility is not creating valid backup files. Anything else you can suggest would be very welcome. Thanks in Advance

Bernetta answered 10/1, 2014 at 2:30 Comment(0)
F
2

There are multiple metadata files created when an AppEngine Datastore backup is run:

LongAlphaString.backup_info is created once. This contains metadata about all of the entity types and backup files that were created in datastore backup.

LongAlphaString.[EntityType].backup_info is created once per entity type. This contains metadata about the the specific backup files created for [EntityType] along with schema information for the [EntityType].

Your code works for interrogating the file contents of LongAlphaString.backup_info, however it seems that you are trying to interrogate the file contents of LongAlphaString.[EntityType].backup_info. Here's a script that will print the contents in a human-readable format for each file type:

import cStringIO
import os
import sys

sys.path.append('/usr/local/google_appengine')
from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.ext.datastore_admin import backup_pb2

ALL_BACKUP_INFO = 'long_string.backup_info'
ENTITY_KINDS = ['long_string.entity_kind.backup_info']


def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)


print "*****" + ALL_BACKUP_INFO + "*****"
with open(ALL_BACKUP_INFO, 'r') as myfile:
    parsed = parse_backup_info_file(myfile.read())
    for record in parsed:
        print record

for entity_kind in ENTITY_KINDS:
    print os.linesep + "*****" + entity_kind + "*****"
    with open(entity_kind, 'r') as myfile:
        backup = backup_pb2.Backup()
        backup.ParseFromString(myfile.read())
        print backup
Frag answered 6/10, 2014 at 21:19 Comment(1)
According to Google, the google.appengine.api.files package is now deprecated, and I don't see any replacement for those functions inside the suggested Google Cloud Storage library. Is our data locked in Google Cloud Datastore forever now? cloud.google.com/appengine/docs/standard/python/refdocs/…Demarcate

© 2022 - 2024 — McMap. All rights reserved.