Sitecore 7 PDF indexing

I am trying to index PDF files with Sitecore 7. I installed the PDF IFilter, but the crawling log shows the following error:

ManagedPoolThread #17 09:24:20 WARN  LuceneIndexOperations : Update : Could not build document data 4433434-3443-3223-91c4-233232. Skipping.
Exception: System.Runtime.InteropServices.COMException
Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
Source: mscorlib
   at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
   at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder.AddComputedIndexFields()
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.GetIndexData(IIndexable indexable, IIndexable latestVersion, IProviderUpdateContext context)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.BuildDataToIndex(IProviderUpdateContext context, IIndexable version, IIndexable latestVersion)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.<>c__DisplayClass7.<Update>b__0(Item version)

What do I have to do to make this work? According to the Sitecore documentation, it should work out of the box.
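To see which media items hit this error, the warnings can be pulled out of the crawling log. Below is a minimal Python sketch. It assumes every failure is logged on one line in the format shown above. The function name `skipped_item_ids` is hypothetical and not part of Sitecore.

```python
import re
from typing import Iterable, List

# Matches the one-line warning from the log excerpt above; group 1 is the item ID.
# Assumption: the ID contains no spaces and is followed by ". Skipping."
SKIPPED_RE = re.compile(r"Could not build document data (\S+?)\. Skipping\.")


def skipped_item_ids(lines: Iterable[str]) -> List[str]:
    """Return the IDs of items the crawler skipped, in log order, without duplicates."""
    seen = set()
    ids = []
    for line in lines:
        match = SKIPPED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids
```

For example, passing an open file handle for the crawling log lists each failing item once. Those items can then be re-checked after the fix.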

Grandma answered 1/8, 2013 at 15:47

I had the same issue and received the following response from Sitecore support (after applying it, everything worked):

1) Copy all the Adobe iFilter .dll files into the "\System32\Inetsrv" folder. This is the working directory for IIS on Windows Server. By default, the Adobe iFilter .dll files are stored in the "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin" folder. You can also use the "IFilter Explorer" tool to find the folder where the .dll files are stored (http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx). For more details, see this screenshot: http://screencast.com/t/xmWukanM+

2) Delete all the files under the "Website/App_Data/MediaCache" folder.

3) Rebuild the Sitecore search indexes (Sitecore -> Control Panel -> Indexing -> Indexing Manager).

4) Clear the Sitecore cache (using the http://{hostname}/sitecore/admin/cache.aspx tool).

5) Restart IIS.
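Steps 1 and 2 are plain file operations, so they could be scripted. Below is a Python sketch with several assumptions:

- The iFilter source path is the default quoted by Sitecore support.
- The `inetsrv` path assumes Windows is installed in `C:\Windows`.
- The script runs with administrator rights on the server.
- The function names are hypothetical.

Steps 3 to 5 (index rebuild, cache clear, IIS restart) are still done through the Sitecore UI and `iisreset`.

```python
import shutil
from pathlib import Path
from typing import List

# Default locations; verify both on your server (e.g. with IFilter Explorer)
# before relying on them. The inetsrv path assumes Windows lives in C:\Windows.
DEFAULT_IFILTER_BIN = Path(r"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin")
DEFAULT_IIS_DIR = Path(r"C:\Windows\System32\inetsrv")


def copy_ifilter_dlls(source_dir: Path, target_dir: Path) -> List[str]:
    """Step 1: copy every .dll from the iFilter bin folder into the IIS folder."""
    copied = []
    for dll in sorted(source_dir.glob("*.dll")):
        shutil.copy2(dll, target_dir / dll.name)  # overwrites existing copies
        copied.append(dll.name)
    return copied


def clear_media_cache(media_cache_dir: Path) -> int:
    """Step 2: delete everything under App_Data/MediaCache, keeping the folder itself."""
    removed = 0
    for entry in media_cache_dir.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

Only the contents of `MediaCache` are deleted, not the folder, matching the instruction to delete "all the files under" it. On Windows the `*.dll` pattern matches regardless of case.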

Indictment answered 1/8, 2013 at 16:0

Here is the solution I used, since I didn't like the idea of copying iFilter-related DLLs into a system folder.

  • Install Adobe IFilter 9 (I used this link). Note that version 9 is essential: starting with version X, Adobe abandoned the file-based interface.
  • Add the iFilter's bin folder to the PATH environment variable. In my case it was %ProgramFiles%\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\.
  • Run iisreset.
  • Go back to the Sitecore application and rebuild the necessary indexes.
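The PATH step appends the iFilter folder only if it is not already listed. Below is a minimal Python sketch of that check; `add_to_path` is a hypothetical helper. It only computes the new value. It does not persist it: that still has to be done at machine level, for example in System Properties > Environment Variables, before running iisreset. It also does not expand variables, so `%ProgramFiles%\...` and `C:\Program Files\...` count as different entries.

```python
# Folder from the answer above; adjust if the iFilter was installed elsewhere.
IFILTER_BIN = r"%ProgramFiles%\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin"


def add_to_path(path_value: str, directory: str, sep: str = ";") -> str:
    """Return path_value with directory appended, unless it is already listed.

    Entries are compared case-insensitively and without trailing backslashes,
    as Windows does. sep defaults to ";" (the Windows PATH separator).
    """
    def norm(p: str) -> str:
        return p.strip().rstrip("\\").lower()

    entries = [e for e in path_value.split(sep) if e.strip()]
    if any(norm(e) == norm(directory) for e in entries):
        return path_value  # already present: leave PATH untouched
    return sep.join(entries + [directory])
```

Checking before appending keeps repeated runs from piling duplicate entries onto PATH.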

For your consideration:

  • While trying to resolve the issue, I granted the app pool account full access to the IFilter folder. I don't think this is necessary: I removed the permission at the end and everything still worked fine.

After these steps, PDF indexing started working on my Sitecore 7 instance running on Windows 8.1.

Nicolette answered 25/11, 2013 at 23:56
Updated URL for those looking for it, since the link above doesn't work: download.adobe.com/pub/adobe/acrobat/win/9.x/… (Ipsambul)
Confirmed that these steps worked for me, and I did not need to modify any security settings on the IFilter folder. (Cahilly)

© 2022 - 2024 — McMap. All rights reserved.