The Problem
On Google Flights, search information is encoded in a URL parameter, presumably so users can share flight searches with each other easily. The URL format looks like this:
https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8SCjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj___________8BQAFIAZgBAQ
I am trying to write a program that can generate flight search URLs given flight information (origin, destination, flight dates, passengers, etc.). To do this I need to know how the information is encoded in the URL so I can recreate it.
What I've tried
I know that the flight info is encoded in base64 or some variant of it (I've been using base64decode.org for testing). For a round-trip flight from HNL to SFO, departing 2021-09-13 and returning 2021-09-17, Google Flights has this URL:
https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8SCjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj___________8BQAFIAZgBAQ
The part of the tfs query parameter before the underscores decodes to
jHNL
2021-09-13rSFOjSFO
2021-09-17rHNLp
which contains some (but not all) recognizable flight info. What I don't understand is the whitespace between the recognizable information. Using this site, I learned that the whitespace is a mix of characters:
U+0008 : <control> BACKSPACE [BS]
U+001C : <control> INFORMATION SEPARATOR FOUR {file separator (FS)}
U+0010 : <control> DATA LINK ESCAPE [DLE]
U+0002 : <control> START OF TEXT [STX]
U+001A : <control> SUBSTITUTE [SUB]
U+001E : <control> INFORMATION SEPARATOR TWO {record separator (RS)}
U+006A : LATIN SMALL LETTER J
U+0007 : <control> BELL [BEL]
U+0008 : <control> BACKSPACE [BS]
U+0001 : <control> START OF HEADING [SOH]
U+0012 : <control> DEVICE CONTROL TWO [DC2]
U+0003 : <control> END OF TEXT [ETX]
U+0048 : LATIN CAPITAL LETTER H
U+004E : LATIN CAPITAL LETTER N
U+004C : LATIN CAPITAL LETTER L
...
This suggests that I'm not decoding the data properly. I've tried some other variants of base64, but haven't had any luck.
Does anyone know how this info is encoded? Another thing I haven't been able to figure out is how the information after the underscores (8BQAFIAZgBAQ) is encoded. Based on the behavior of the Google Flights site, I think it encodes passenger information, but it base64-decodes to only whitespace characters.
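For reference, here is a minimal sketch of decoding the whole tfs value in one pass instead of splitting it at the underscores. It assumes the value uses the URL-safe base64 alphabet (where `-` and `_` stand in for `+` and `/`) with the trailing `=` padding stripped; under that assumption the underscores are ordinary base64 digits, not a separator. The helper name `decode_tfs` is hypothetical, and the sketch only recovers the raw bytes; it does not interpret them.

```python
import base64

# The tfs value copied from the example URL above.
TFS = "CBwQAhoeagcIARIDSE5MEgoyMDIxLTA5LTEzcgcIARIDU0ZPGh5qBwgBEgNTRk8SCjIwMjEtMDktMTdyBwgBEgNITkxwAYIBCwj___________8BQAFIAZgBAQ"


def decode_tfs(value: str) -> bytes:
    """Decode a tfs value, assuming unpadded URL-safe base64 (an assumption)."""
    # urlsafe_b64decode needs the length to be a multiple of 4,
    # so restore any '=' padding that was stripped from the URL.
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


raw = decode_tfs(TFS)

# Show the bytes as hex so the non-printable ones stay visible.
print(raw.hex(" "))
```

Printing the result as hex rather than as text keeps the non-printable bytes visible; a text-oriented decoder tends to render them as blank space, which may be why they look like whitespace.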
Additional Context
Two years ago I made a working version of the program which produced URLs like
https://www.google.com/flights?hl=en#flt=ORD.MCO.2021-07-16*MCO.ORD.2021-07-19;c:USD;e:1;px:2,2,0,0;sd:1;t:f
Several months ago Google changed the format they use from the above to the encoded version. I want to figure out how to recreate the encoded URLs so I can update my program instead of retiring it.
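For comparison, the old fragment-style URL could be assembled with plain string formatting. The sketch below is a reconstruction from the single example above, not my original program: the function name `build_legacy_url` is hypothetical, and the trailing options string is copied verbatim from that example because the meaning of its individual fields is an assumption I am not making here.

```python
LEGACY_BASE = "https://www.google.com/flights?hl=en#flt="

# Copied verbatim from the example URL; the individual fields are left uninterpreted.
LEGACY_OPTIONS = "c:USD;e:1;px:2,2,0,0;sd:1;t:f"


def build_legacy_url(legs, options=LEGACY_OPTIONS):
    """Build an old-format URL from (origin, destination, date) tuples."""
    # Each leg is written as ORIGIN.DEST.YYYY-MM-DD, and legs are joined with '*'.
    flt = "*".join(f"{origin}.{dest}.{date}" for origin, dest, date in legs)
    return f"{LEGACY_BASE}{flt};{options}"


url = build_legacy_url([
    ("ORD", "MCO", "2021-07-16"),
    ("MCO", "ORD", "2021-07-19"),
])
print(url)
```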