Azure blob document - Full text search

2

5

I am hosting my project on Azure and have chosen Blob storage for all my files (each is several megabytes and there are a great many of them). I need to search within all the files in Blob storage (something like full-text search). I tried integrating with Azure Search but had no luck, as its indexes appear to be built from SQL data only. Is there a way to add full-text search over blobs?

If not, what would be an effective way of storing the documents in Azure while still making them searchable (full-text search), as SharePoint does?

Lagasse answered 16/2, 2015 at 11:8 Comment(0)
10

I work on Azure Search. We just shipped preview support for indexing documents stored in Azure blob storage, with support for PDF, Office docs, HTML and a few other formats. Please see https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/ for more details.

Thanks, Eugene
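As an illustration of the blob indexing described above, here is a minimal Python sketch that builds the two REST requests involved: one defining a data source of type `azureblob`, and one defining an indexer that pulls from it. It is not taken from the linked article. The service name, API key, connection string, container, index name, and the `2020-06-30` API version are all placeholder assumptions, and the target index is assumed to exist already. The requests are only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a generally available API version; substitute the one your service uses.
API_VERSION = "2020-06-30"


def build_request(service, path, api_key, body):
    """Build (but do not send) a POST request to an Azure Search REST endpoint."""
    url = f"https://{service}.search.windows.net/{path}?api-version={API_VERSION}"
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def blob_datasource(name, connection_string, container):
    """Payload describing a blob container as an indexer data source."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def blob_indexer(name, datasource_name, index_name):
    """Payload describing an indexer that feeds an existing index from the data source."""
    return {
        "name": name,
        "dataSourceName": datasource_name,
        "targetIndexName": index_name,
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    ds = blob_datasource("docs-ds", "<storage-connection-string>", "documents")
    ix = blob_indexer("docs-indexer", "docs-ds", "docs-index")
    for path, body in (("datasources", ds), ("indexers", ix)):
        req = build_request("my-search-service", path, "<admin-api-key>", body)
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```

The indexer writes the text it extracts from each blob into the index, so the index needs a searchable string field to receive it (conventionally named `content`).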

Birr answered 19/2, 2015 at 16:57 Comment(5)
Hey Eugene, the files I have in Blob storage are Office documents and plain-text files. I'm not sure how to make Office documents searchable using Azure Search. I tried a few proofs of concept but never got any results from Azure Search. - Lagasse
Hi Ankit, you'll want to extract the textual content from the docs (because Office formats contain lots of XML and markup, not just text) using something like Apache Tika or IFilters, then call the Azure Search API to add your documents to an index (see msdn.microsoft.com/en-us/library/azure/dn798930.aspx). - Birr
I found this link earlier but was looking for something that Azure provides out of the box. Is it possible to open an Office file on an Azure Website with this? I assume it would require a Windows component. - Lagasse
Ankit, take a look at the following article, it may help: wp.sjkp.dk/azure-search-pdf-indexing - Birr
Is there something similar for doc, docx, xls, and xlsx files that works like a SQL Server FileTable? - Uterus
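The extract-then-push approach suggested in the comments can be sketched as follows. This is a hedged Python illustration, not code from the thread. It assumes the text has already been extracted from each file (for example with Apache Tika) and only shows how to shape the batch for the "add documents" REST call. The field names `id`, `name`, and `content`, the index name, the service name, and the API version are hypothetical. Document keys in Azure Search may contain only letters, digits, underscores, dashes, and equals signs, so the blob path is encoded with URL-safe Base64 here.

```python
import base64
import json
import urllib.request

# Assumption: substitute the API version your service supports.
API_VERSION = "2020-06-30"


def make_key(blob_name):
    """Encode a blob path as a valid document key (URL-safe Base64)."""
    return base64.urlsafe_b64encode(blob_name.encode("utf-8")).decode("ascii")


def build_upload_batch(extracted):
    """Turn {blob_name: extracted_text} into an 'add documents' batch payload."""
    return {
        "value": [
            {
                "@search.action": "upload",
                "id": make_key(name),   # hypothetical key field
                "name": name,           # hypothetical field holding the blob path
                "content": text,        # hypothetical searchable text field
            }
            for name, text in extracted.items()
        ]
    }


def build_upload_request(service, index, api_key, batch):
    """Build (but do not send) the POST request that pushes the batch to the index."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": api_key}
    data = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder text standing in for the output of a real extractor.
    extracted = {"reports/q1.docx": "Quarterly revenue summary ..."}
    batch = build_upload_batch(extracted)
    req = build_upload_request("my-search-service", "docs-index", "<admin-api-key>", batch)
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

With this approach the extraction step runs wherever you choose, and only plain text and metadata reach the search service.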
0

You can try Azure Search, which now supports cognitive search (in preview), including text recognition in images using OCR. It does a great job with PDFs and many other document types.

It works well even with scanned documents.

Microsoft has an online Azure Search demo that shows this off: https://jfk-demo.azurewebsites.net/

Shewchuk answered 25/10, 2018 at 5:19 Comment(0)
