I am trying to access and modify data in a newline-delimited JSON file pulled from Google Cloud Storage in Google Cloud Functions. The results always show up as numbers, even though those numbers are not the data in the JSON.
I see that download_as_string() on a Blob object returns bytes (https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/blob.html#Blob.download_as_string), but in every reference I can find, people seem able to access their data just fine.
I am doing this in Cloud Functions but I think my question would apply in any GCP tool.
My example below should simply load the newline-delimited JSON data, add each record to a list, select the first two dictionary entries, convert them back to newline-delimited JSON, and write the result to a JSON file on GCS. Samples, code, and bad output are listed below.
Sample newline JSON input
{"Website": "Google", "URL": "Google.com", "ID": 1}
{"Website": "Bing", "URL": "Bing.com", "ID": 2}
{"Website": "Yahoo", "URL": "Yahoo.com", "ID": 3}
{"Website": "Yandex", "URL": "Yandex.com", "ID": 4}
Code in Cloud Function
import requests
import json
import csv
from datetime import datetime, timedelta
import sys
from collections import OrderedDict
import os
import random
from google.cloud import bigquery
from google.cloud import storage
def importData(request, execution):
    # Read the data from Google Cloud Storage
    read_storage_client = storage.Client()
    # Set buckets and filenames
    bucket_name = "sample_bucket"
    filename = 'sample_json_output.json'
    # get bucket with name
    bucket = read_storage_client.get_bucket('sample_bucket')
    # get bucket data as blob
    blob = bucket.get_blob('sample_json.json')
    # download as string
    json_data = blob.download_as_string()
    # create list
    website_list = []
    for u,y in enumerate(json_data):
        website_list.append(y)
    # select first two
    website_list = website_list[0:2]
    # Create new-line JSON
    results_ready = '\n'.join(json.dumps(item) for item in website_list)
    # Write the data to Google Cloud Storage
    write_storage_client = storage.Client()
    write_storage_client.get_bucket(bucket_name) \
        .blob(filename) \
        .upload_from_string(results_ready)
Current output in sample_json_output.json file
123
34
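For reference, the two numbers match what Python 3 produces when you iterate over a bytes object. This is a minimal sketch using a hard-coded sample line in place of the real blob; the variable name `raw` is only for illustration and stands in for whatever download_as_string() returns.

```python
# Stand-in for the bytes object returned by blob.download_as_string().
raw = b'{"Website": "Google", "URL": "Google.com", "ID": 1}\n'

# Iterating over bytes yields one integer (the byte value) per element,
# not characters and not lines.
first_two = list(raw)[:2]
print(first_two)  # [123, 34]

# Those integers are the byte values of the first two characters.
first_two_chars = [chr(b) for b in first_two]
print(first_two_chars)  # ['{', '"']
```

So 123 and 34 appear to be the first two bytes of the file, `{` and `"`, each serialized by json.dumps as a bare integer.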
Expected output
{"Website": "Google", "URL": "Google.com", "ID": 1}
{"Website": "Bing", "URL": "Bing.com", "ID": 2}
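One way the expected output could be produced is sketched below. It works on an in-memory bytes value rather than a real blob, assumes the file is UTF-8 encoded, and uses a hypothetical helper name (`first_records`) that is not part of any Google library.

```python
import json

# Stand-in for the bytes returned by blob.download_as_string().
raw = (
    b'{"Website": "Google", "URL": "Google.com", "ID": 1}\n'
    b'{"Website": "Bing", "URL": "Bing.com", "ID": 2}\n'
    b'{"Website": "Yahoo", "URL": "Yahoo.com", "ID": 3}\n'
    b'{"Website": "Yandex", "URL": "Yandex.com", "ID": 4}\n'
)


def first_records(data: bytes, count: int) -> str:
    """Return the first `count` records of newline-delimited JSON bytes as NDJSON text."""
    # Assumption: the file is UTF-8 encoded.
    lines = data.decode("utf-8").splitlines()
    # Parse each non-empty line into a dict so the contents can be inspected or modified.
    records = [json.loads(line) for line in lines if line.strip()]
    # Serialize the selected records back to one JSON object per line.
    return "\n".join(json.dumps(record) for record in records[:count])


results_ready = first_records(raw, 2)
print(results_ready)
```

In the Cloud Function, `raw` would be replaced by the downloaded blob contents and `results_ready` passed to upload_from_string() as before.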
Update 6/6: If I write a file straight from the download_as_string() result, it writes the JSON file perfectly, but I need to access the contents before writing.
import requests
import json
import csv
from datetime import datetime, timedelta
import sys
from collections import OrderedDict
import os
import random
from google.cloud import bigquery
from google.cloud import storage
def importData(request, execution):
    # Read the data from Google Cloud Storage
    read_storage_client = storage.Client()
    # Set buckets and filenames
    bucket_name = "sample_bucket"
    filename = 'sample_json_output.json'
    # get bucket with name
    bucket = read_storage_client.get_bucket('sample_bucket')
    # get bucket data as blob
    blob = bucket.get_blob('sample_json.json')
    # download as bytes (no decoding or parsing)
    json_data = blob.download_as_string()
    # Write the data to Google Cloud Storage
    write_storage_client = storage.Client()
    write_storage_client.get_bucket(bucket_name) \
        .blob(filename) \
        .upload_from_string(json_data)
Update 6/6 Output
{"Website": "Google", "URL": "Google.com", "ID": 1}
{"Website": "Bing", "URL": "Bing.com", "ID": 2}
{"Website": "Yahoo", "URL": "Yahoo.com", "ID": 3}
{"Website": "Yandex", "URL": "Yandex.com", "ID": 4}