Does Google BigQuery support Full-Text search
Asked Answered
A

3

10

I'm thinking about copying my text searchable content to Google's BigQuery and then perform full-text search using BigQuery API.

Does Google BigQuery support that scenario?

I could not find "search" command in Google BigQuery API: https://developers.google.com/bigquery/docs/reference/v2/

Arytenoid answered 29/6, 2012 at 15:43 Comment(0)
B
7

BigQuery support a collection of RegEx and String query functions, making it suitable for text search queries across STRING fields. However, there is a 64k per row (and field) limit for each BigQuery record, so it may not possible to support a totally unstructured, unlimited size, document text search case.

For a full text search capabilities in an App Engine application, I would suggest looking at the new Search API:

Blus answered 29/6, 2012 at 16:28 Comment(6)
How long would it take to perform text search against 1M records with 10K each? Would it be usable from end user's perspective?Arytenoid
Remember that that BigQuery returns aggregate results over table scans, so full text searching for very common values will return very large datasets. As for performance, you should just try this yourself on one of our public datasets. The publicdata:samples.wikipedia dataset contains over 300,000,000 rows. RegEx match: SELECT TOP(comment, 10), COUNT() FROM [publicdata:samples.wikipedia] WHERE (REGEXP_MATCH(comment,r'[Wisconsin]')); You can also use "CONTAINS" SELECT TOP(comment, 10), COUNT() FROM [publicdata:samples.wikipedia] WHERE comment CONTAINS "Britney Spears";Blus
Would that take more or less than 1 second?Arytenoid
The Britney Spears query takes 3.5 seconds. And it's missing an asterisk. Corrected version: SELECT TOP(comment, 10), COUNT(*) FROM [publicdata:samples.wikipedia] WHERE comment CONTAINS "Britney Spears"Ulceration
@DennisGorelik the wisconsin regex query took 7.6 seconds in the console. which for 300 million rows is pretty good goingBosporus
The litecene library for Java implements a user-friendly boolean query syntax with a transpiler to a BigQuery SQL predicate: github.com/sigpwned/litecene. It makes use of regular expressions and/or the new BigQuery SEARCH function, depending on the query and availability of a search index. Disclaimer: I am the library's author.Horny
P
7

10 years late and here we are. Today (07/04/22) BigQuery launched It equivalent of Full Text Search. Here is the doc:

https://cloud.google.com/blog/products/data-analytics/pinpoint-unique-elements-with-bigquery-search-features/

Polytonality answered 7/4, 2022 at 22:31 Comment(1)
The feature is obviously still in development, since the SEARCH function only implements the equivalent of single-word AND queries. But it'll be interesting to see how the feature evolves over time!Horny
H
2

The litecene library provides full-text search support for BigQuery using a "lucene light" syntax.

(smartphone OR "smart phone"~8 OR iphone OR "i phone" OR "apple phone" OR android OR "google phone" OR "windows phone") AND app*

It compiles the boolean query language down to regular expression matches. It also makes use of new BigQuery search features -- namely the SEARCH function and search indexes -- when possible, although at the time of this writing the searches supported by those features are fairly limited. Using litecene, full-text search can also be deployed against existing production datasets without any ETL changes or re-indexing using non-aggregate materialized views. The searches can target one or multiple columns.

Disclaimer: I am the author of the library.

Horny answered 13/6, 2022 at 15:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.