Script to use Google Image Search with local image as input

I'm looking for a batch or PowerShell script that searches for similar images on Google Images, using a local image as input.

My research so far

The syntax for an image search using a URL rather than a local file is as follows:
https://www.google.com/searchbyimage?image_url=TEST
where TEST can be replaced with any image URL you have.
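
A minimal sketch of that URL-based variant, assuming the searchbyimage endpoint still accepts the image_url parameter as described above. The function name Get-GoogleSearchByImageUrl and the example.com address are hypothetical, chosen only for illustration. The image URL is percent-encoded so that its own special characters do not break the query string.

```powershell
function Get-GoogleSearchByImageUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [string] $ImageUrl
    )

    # percent-encode the image URL so its own ?, & and / survive as a single query value
    'https://www.google.com/searchbyimage?image_url=' + [Uri]::EscapeDataString($ImageUrl)
}

# example.com is a placeholder; uncomment to open the search in the default browser
# Start-Process (Get-GoogleSearchByImageUrl 'http://example.com/cat.jpg')
```

The function only builds the address, so the result can be inspected or logged before anything is opened.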

I experimented with cURL for Windows, using imgur as a temporary image host. I was able to upload a file to imgur from a batch file and then use the resulting image URL to search for similar images on Google.

But I wonder if it is possible without using any temporary cache like imgur or any other online picture service. Just a batch, curl, Google and me.

Just a thought: could a VBScript search Google Images with a local file as input?
Or are similar web services like Tineye better suited for that task?


This PowerShell snippet opens Google's Image Search.

$IE = New-Object -ComObject InternetExplorer.Application
$IE.Navigate2("https://www.google.com/imghp?hl=en")
while ($IE.Busy) {
    Start-Sleep -Milliseconds 50
}
$IE.Visible = $true

The next steps would be to get the IDs of some buttons and click them programmatically to select the local file, but I'm not experienced enough to do that.

Gamber answered 31/1, 2013 at 20:13 Comment(2)
You would probably need to find out exactly which container is responsible for processing the image search. I don't know the answer to that question, but I did put your link to ../imghp?hl=en into my browser and it took me to the actual site. I think what you need to do is figure out how to pass the location of the image into the container that requires it. This is obvious, I know... - Anhanhalt
This is a great question, I really like the challenge. I've been working on it in Fiddler, and from what I can tell so far, some sort of encoding is being applied to the file name and appended to the URL. I'm still looking into it with Fiddler. Hopefully it will show me the encoding at some point, but I'm relatively new to Fiddler, so it may just take a more experienced hand. I hope this helps someone solve the issue. - Justus

Cool question! I spent far too much time tinkering with this, but I think I finally got it :)

In a nutshell, you have to upload the raw bytes of your image, embedded and properly formatted along with some other stuff, to images.google.com/searchbyimage/upload. The response to that request will contain a new URL which sends you to the actual results page.

This function returns the results page URL. You can do whatever you want with it, but to simply open the results in a browser, pass it to Start-Process.

Of course, Google could change the workflow for this at any time, so don't expect this script to work forever.

function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: image/jpeg


"@
    $part2 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="image_content"


-----------------------------7dd2db3297c2202--

"@

    # grab the raw bytes composing the image file
    $imageBytes = [Io.File]::ReadAllBytes($imagePath)

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=---------------------------7dd2db3297c2202'  # must match the delimiter in the body, above
    $request.ContentLength = $data.Length

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}

Usage:

$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url

Edit/Explanation

Here's some more detail. I'll basically just take you through the steps I took as I figured this out.

First, I just went ahead and did a local image search.

Google image search

The URL it sends you to is very long (~1500 chars in the case of longcat), but not nearly long enough to fully encode the image (60KB). So you can tell right off the bat that it's more complex than simply doing something like a base64 encoding.

Next, I fired up Fiddler and looked at what's actually going on when you do a local image search. After browsing/selecting the image, you see some traffic to images.google.com/searchbyimage/upload. Viewing that request in detail reveals the basic mechanism.

Fiddler session

  1. The data is sent as multipart/form-data, and you need to specify the string of characters that separates the different fields (red boxes). multipart/form-data is a web standard (RFC 2388, since replaced by RFC 7578), but the details really don't matter for this example.
  2. You need to (or at least should) include the original file name (orange box). Perhaps this factors into the search results.
  3. The full, raw image is included in the encoded_image field (green box).
  4. The response does not contain the actual results; it is simply a redirect to the actual results page (purple boxes).

There are a few fields not shown here, way at the bottom. They aren't super interesting.

Once I figured out the basic workflow, it was only a matter of coding it up. I just copied the web request I saw in Fiddler as closely as I could, using standard .NET web request APIs. The answers to this SO question demonstrate the APIs you need in order to properly encode and send body data in a web request.

From some experimentation, I found that you only need the two body fields I included in my code (encoded_image and image_content). Going through the web UI includes more, but apparently they are not required.
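
As a sketch of how a body with just those two fields could be assembled, the helper below writes every line break as an explicit CRLF instead of relying on the line endings of a here-string. The name New-MultipartBody and its parameters are hypothetical, and this layout follows the general multipart/form-data rules (each delimiter is two dashes plus the boundary, the closing one ends with two more dashes). Whether Google still accepts such a request is not something this sketch can promise.

```powershell
function New-MultipartBody
{
    param(
        [string] $Boundary,
        [string] $FileName,
        [byte[]] $ImageBytes
    )

    $crlf = "`r`n"

    # first part: headers of the encoded_image field, then a blank line before the raw bytes
    $head = "--$Boundary$crlf" +
            "Content-Disposition: form-data; name=`"encoded_image`"; filename=`"$FileName`"$crlf" +
            "Content-Type: image/jpeg$crlf$crlf"

    # after the bytes: an empty image_content field, then the closing delimiter
    $tail = "$crlf--$Boundary$crlf" +
            "Content-Disposition: form-data; name=`"image_content`"$crlf$crlf$crlf" +
            "--$Boundary--$crlf"

    $ascii = [Text.Encoding]::ASCII

    # the leading comma keeps the result a single byte[] instead of unrolling it
    ,([byte[]] ($ascii.GetBytes($head) + $ImageBytes + $ascii.GetBytes($tail)))
}
```

The value passed as -Boundary would also go, without the leading dashes, into the boundary= part of the request's ContentType.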

More experimentation revealed that none of the other headers or cookies shown in Fiddler are really required.

For our purposes, we don't actually want to access the results page, only get a pointer to it. Thus we should set AllowAutoRedirect to $false. That way, Google's 302 redirect is given to us directly and we can extract the results page URL from it.
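
The extraction step can be sketched as a small helper. Get-RedirectTarget is a hypothetical name, and the sketch assumes the 302 body is a short HTML page with a single HREF attribute, possibly with ampersands escaped as &amp;amp;. It checks the result of -match first, because $Matches keeps its old content when a match fails.

```powershell
function Get-RedirectTarget
{
    param([string] $Html)

    # -match is case-insensitive and fills $Matches only when it succeeds
    if ($Html -match 'HREF="([^"]+)"') {
        # assumption: the body is HTML, so undo the escaping of ampersands
        $Matches[1] -replace '&amp;', '&'
    }
}
```

When nothing matches, the function returns nothing, so a caller can test the result before passing it to Start-Process.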

While writing this edit, I slapped my forehead and realized that PowerShell v3 has the Invoke-WebRequest cmdlet, which could potentially eliminate the need for the .NET web API calls. Unfortunately, I could not get it to work properly after tinkering for 10 minutes, so I gave up. It seems like some issue with the way the cmdlet encodes the data, though I could be wrong.

Hypotaxis answered 16/2, 2013 at 2:10 Comment(5)
+1 This works. It will most likely bring you the bounty. One small note: could you add a second answer that explains every line of code, so that other users and I can customize it in the future if Google changes something on their side? It would also be cool if you noted some possible problems. Your answer deserves more attention, and these explanations would help. - Gamber
I found a bug. In your example, if you replace the file name with so[me]pic.jpg, the file path can't be resolved because PowerShell interprets [] as wildcard characters. Maybe you need to use -LiteralPath somewhere? Please keep in mind that I can't use PowerShell v3; it has to be a solution for PowerShell v2, although it seems that v3 has some fixes for -LiteralPath. - Gamber
I've updated with more detail. Regarding the path issue - I'm afraid I have to say "meh." I think that going over general PowerShell gotchas with Path/LiteralPath will distract from the main purpose of this question: how to script against Google image search. Such issues would be perfect for a separate, follow-up question with dedicated responses. - Hypotaxis
I am trying to do something similar using PHP, I get stuck at the 302 error pages, I follow the URL once, then I get another 302, which redirects me to google's homepage Any idea how to work around this? Take a look at that goo.gl/IItOlHuonghupeh
@Hypotaxis Could it be that the boundary/delimiter got changed? At least for me it's not working anymore :( - Gamber
function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
--7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: application/octet-stream`r`n`r`n
"@
    $part2 = @"
`r`n--7dd2db3297c2202--`r`n
"@

    # grab the raw bytes composing the image file
    $imageBytes = [Io.File]::ReadAllBytes($imagePath)

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=7dd2db3297c2202'  # must match the delimiter in the body, above

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}
$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url
Yalonda answered 3/11, 2017 at 22:51 Comment(1)
Thanks for the heads-up. Unfortunately, it doesn't seem to work for me; I always get a 404 error. - Gamber

How about using the GoogleImageSearch module for PowerShell?

Disclaimer: I'm a developer of this module, and I built it on the previous answers.

Sudiesudnor answered 30/11, 2019 at 7:52 Comment(0)