AWS CLI for S3 Select

Asked 17/8, 2020 at 6:21 Answered 16/11, 2020 at 5:14

sql amazon-web-services amazon-s3 amazon-s3-select

I have the following code, which runs a SQL query on a keyfile located in an S3 bucket. This runs perfectly. However, I do not want the output written to an output file. Could I see the output on the screen instead (my preference #1)? If not, is it possible to append to the output file rather than overwrite it (my preference #2)? I am using the AWS CLI binaries to run this query. If there is another way, I am happy to try it (as long as it works within bash).

aws s3api select-object-content \
    --bucket "project2" \
    --key keyfile1 \
    --expression "SELECT * FROM s3object s where Lower(s._1) = '[email protected]'" \
    --expression-type 'SQL' \
    --input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
    --output-serialization '{"CSV": {"FieldDelimiter": ":"}}' "OutputFile"

Kashakashden answered 17/8, 2020 at 6:21 Comment(3)

I see you put a bounty on the question. Does that mean the answer with /dev/stdout does not work? Can you provide any info on why it does not work? Any errors? – Danyelldanyelle 24/8, 2020 at 2:30

Hi Marcin, thanks for checking in. I tried it on Cygwin but it didn't work: no errors, no output. I could try it on my Ubuntu machine as well, in case you think this could be OS-specific. – Kashakashden 24/8, 2020 at 6:6

No problem. You could comment at @jellycsc so that he gets notified. Maybe he knows how to do it on cygwin? But on ubuntu or any other linux I don't see a reason why it would not work. Thus I asked in the first place. – Danyelldanyelle 24/8, 2020 at 9:31

Yes, you can do this with the AWS CLI, since stdout is exposed as a special file (/dev/stdout) on Linux.

aws s3api select-object-content \
    --bucket "project2" \
    --key keyfile1 \
    --expression "SELECT * FROM s3object s where Lower(s._1) = '[email protected]'" \
    --expression-type 'SQL' \
    --input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
    --output-serialization '{"CSV": {"FieldDelimiter": ":"}}' /dev/stdout

Note the /dev/stdout at the end, in place of the output file name.
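For the second preference (appending instead of overwriting), here is a minimal sketch that builds on the same /dev/stdout idea. The function name s3_select_append and its argument order are hypothetical, made up for illustration; the serialization settings are copied from the question. It pipes the CLI output through tee -a rather than redirecting with >>, because a direct redirect may not append as expected if the CLI opens /dev/stdout in truncating mode. I have not verified this on Cygwin.

```shell
# Hypothetical helper (not part of the AWS CLI): run an S3 Select query,
# show the rows on the terminal, and append them to a file.
# Usage: s3_select_append BUCKET KEY SQL OUTFILE
s3_select_append() {
    # /dev/stdout is passed as the CLI's output file, so the rows go into
    # the pipe; tee -a prints them and appends them to OUTFILE ($4).
    aws s3api select-object-content \
        --bucket "$1" \
        --key "$2" \
        --expression "$3" \
        --expression-type 'SQL' \
        --input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
        --output-serialization '{"CSV": {"FieldDelimiter": ":"}}' \
        /dev/stdout | tee -a "$4"
}
```

For example, s3_select_append project2 keyfile1 "SELECT * FROM s3object s" results.txt (results.txt is a placeholder name) should print the matching rows and add them to the end of results.txt on each run. If you only want the file and no screen output, replace tee -a "$4" with cat >> "$4".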

Dullish answered 17/8, 2020 at 13:33 Comment(1)

@jellycsc, could you please confirm whether this would work on Cygwin? – Kashakashden 24/8, 2020 at 13:23

The AWS CLI does not offer such options.

However, you are welcome to instead call it via an AWS SDK of your choice.

For example, in the boto3 Python SDK, there is a select_object_content() function that returns the data as a stream. You can then read, manipulate, print or save it however you wish.

Seamanship answered 17/8, 2020 at 6:26 Comment(2)

Thanks John. I am trying to move away from Python (long story why). Is there any other workaround here? Could I append to a single output file maybe? – Kashakashden 17/8, 2020 at 7:25

You can use any of the other programming languages, too: C++, Go, Java, .NET, Node, PHP, Ruby. FYI, the AWS CLI is written in Python and simply calls the above command. – Seamanship 17/8, 2020 at 7:35

I think it opens /dev/stdout twice, causing chaos.

Pouched answered 16/11, 2020 at 5:14 Comment(0)
