Invalid characters for Lucene text search

In my IndexController I have:

    public function buildAction()
    {
        // Create a new index in application/indexes
        $index = Zend_Search_Lucene::create(APPLICATION_PATH . '/indexes');

        foreach ($this->pages as $p) {
            $doc = new Zend_Search_Lucene_Document();

            // Stored but not indexed: returned with hits to identify the page
            $doc->addField(Zend_Search_Lucene_Field::unIndexed('page_id', $p['page_id']));

            // Text fields: tokenized, indexed and stored
            $doc->addField(Zend_Search_Lucene_Field::text('page_name', $p['page_name']));
            $doc->addField(Zend_Search_Lucene_Field::text('page_headline', $p['page_headline']));
            $doc->addField(Zend_Search_Lucene_Field::text('page_content', $p['page_content']));

            $index->addDocument($doc);
        }
        // Merge index segments, then pass the document count to the view
        $index->optimize();
        $this->view->indexSize = $index->numDocs();
    }

and I am getting these errors:

    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 58
    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Field.php on line 222
    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 58
    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Field.php on line 222
    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php on line 58
    [Tue Jan 18 16:23:32 2011] [error] [client 127.0.0.1] PHP Notice:  iconv(): Detected an illegal character in input string in /usr/share/php/libzend-framework-php/Zend/Search/Lucene/Field.php on line 222

The variable $this->pages contains an array of text copied from Wikipedia, and I believe the errors are caused by characters such as — (an em dash, not a hyphen -) and ö. I found a relevant, similar question, Lucene foreign chars problem, but it doesn't explain where to make which change. I would be grateful to know where to do what, along with a little bit of explanation.
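One way to narrow this down is to check whether the page text is actually well-formed UTF-8, since the notices could come either from invalid bytes in the input or from iconv failing to convert perfectly valid characters. A minimal plain-PHP sketch (PHP 7+), assuming $this->pages has the shape used in buildAction(); the helper name findInvalidUtf8 and the sample data are made up for illustration:

```php
<?php
// Hypothetical helper (not part of Zend Framework): list the fields of each
// page whose value is not well-formed UTF-8.
function findInvalidUtf8(array $pages, array $fields): array
{
    $problems = [];
    foreach ($pages as $i => $page) {
        foreach ($fields as $field) {
            $value = isset($page[$field]) ? (string) $page[$field] : '';
            // With the /u modifier, preg_match() returns false (instead of 1)
            // when the subject is not valid UTF-8; no mbstring needed.
            if (preg_match('//u', $value) !== 1) {
                $problems[] = "page $i, field $field";
            }
        }
    }
    return $problems;
}

// Example data: page 0 is valid UTF-8 (em dash and ö); page 1 contains the
// single Latin-1 byte 0xF6 for "ö", which is not valid UTF-8.
$pages = [
    ['page_name' => 'Göttingen — city', 'page_content' => 'Möbius strip'],
    ['page_name' => "G\xF6ttingen", 'page_content' => 'plain ASCII'],
];

print_r(findInvalidUtf8($pages, ['page_name', 'page_content']));
```

If this reports nothing for the real data, the text is valid UTF-8 and the problem lies in how it is being converted during indexing.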

UPDATE: iconv section of phpinfo():

    iconv support          enabled
    iconv implementation   glibc
    iconv library version  2.12.1
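The same details can also be read from a script, which is handy when comparing machines, through the ICONV_IMPL and ICONV_VERSION constants defined by PHP's iconv extension. A minimal sketch:

```php
<?php
// Report which iconv implementation PHP was built with ("glibc", "libiconv", ...).
// ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension.
if (!extension_loaded('iconv')) {
    exit("The iconv extension is not loaded\n");
}

echo 'iconv implementation: ', ICONV_IMPL, "\n";
echo 'iconv library version: ', ICONV_VERSION, "\n";
```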
Limicolous asked 18/1, 2011 at 10:44
Can you check which version of iconv you are using from phpinfo()? – Loadstone
@Loadstone please have a look at the update. – Limicolous
Just as I suspected. I'm getting the same problem right now. Zend Lucene works well on Mac OS X and on Windows XP, where the iconv implementation is libiconv instead of glibc. I don't know the solution yet either. – Loadstone
So it's Linux that's giving me the problem? – Limicolous

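Try adding this to your bootstrap (for example, in an _init method of your application's Bootstrap class):

    // Parse search query strings as UTF-8
    Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
    // Use a UTF-8 aware, case-insensitive analyzer by default
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
    );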
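Why this helps, as far as the notices suggest: the default analyzer (Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive) converts its input to ASCII with iconv() before tokenizing, which is the call reported from Text.php line 58, and on a glibc setup like the one above that conversion fails for characters such as the em dash and ö. The Utf8 analyzers work on the UTF-8 text directly instead. The sketch below is a simplified plain-PHP illustration of that idea, not Zend's actual implementation; utf8Tokens is a hypothetical name, and it needs the mbstring extension (which Zend's Utf8 case-insensitive analyzer also relies on for lower-casing).

```php
<?php
// Simplified illustration only (NOT Zend's analyzer code): tokenize UTF-8
// text case-insensitively without first transliterating it to ASCII.
// Requires PCRE Unicode support and the mbstring extension.
function utf8Tokens(string $text): array
{
    // \p{L}+ matches runs of letters from any script; the /u modifier makes
    // PCRE treat both pattern and subject as UTF-8.
    if (preg_match_all('/\p{L}+/u', $text, $matches) === false) {
        return []; // the subject was not valid UTF-8
    }
    // Lower-case each token with a multibyte-safe function.
    return array_map(
        function ($token) {
            return mb_strtolower($token, 'UTF-8');
        },
        $matches[0]
    );
}

print_r(utf8Tokens('Göttingen — Möbius strip'));
```

The em dash is not a letter, so it simply separates tokens instead of having to be converted to ASCII.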
Cayenne answered 4/5, 2011 at 19:23
Yes, thank you very much. This got rid of the same error for me. Much appreciated. – Clydesdale

In addition to the bootstrap code, also pass the encoding as the third parameter when creating text fields for the index:

    $doc->addField(Zend_Search_Lucene_Field::text('page_name', $p['page_name'], 'UTF-8'));
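The notice from Field.php line 222 appears to come from converting the field value to UTF-8 using the field's declared encoding, so declaring 'UTF-8' is only correct if the strings really are UTF-8. If some source data might be in a legacy single-byte encoding, one option is to normalize each value first. A minimal sketch, assuming the legacy encoding is ISO-8859-1 (an assumption; substitute whatever your data actually uses); toUtf8 is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return $value as UTF-8 before declaring 'UTF-8' to
// Zend_Search_Lucene_Field::text(). Well-formed UTF-8 is returned unchanged;
// anything else is ASSUMED to be in $fallbackEncoding (ISO-8859-1 by default).
function toUtf8(string $value, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (preg_match('//u', $value) === 1) {
        return $value; // already valid UTF-8
    }
    // Every ISO-8859-1 byte maps to a Unicode code point, so the default
    // conversion cannot fail; other fallback encodings might.
    $converted = iconv($fallbackEncoding, 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert value from $fallbackEncoding to UTF-8");
    }
    return $converted;
}

// Hypothetical usage in the question's indexing loop:
// $doc->addField(Zend_Search_Lucene_Field::text('page_name', toUtf8($p['page_name']), 'UTF-8'));

echo toUtf8("G\xF6ttingen"), "\n";
```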
Camey answered 8/4, 2012 at 19:43
