Full-text search for static HTML files on CD-ROM via JavaScript

I will be delivering a set of static HTML pages on CD-ROM; these pages need to be fully viewable with no Internet access whatsoever.

I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-ROM with no software installation on the client machine.

A search engine implemented in JavaScript would be the perfect solution, but I'm having trouble finding one that looks solid, current and popular.

I did find these:

  + jsFind
  + js-search

but both projects seem rather inactive.

Another solution, besides a dedicated search engine in JavaScript, would be the ability to access local Lucene indices from JavaScript: the indices themselves would be built with Lucene and copied to the CD-ROM along with the HTML files.

Edit: built it myself (see below).

Beeson answered 31/8, 2009 at 12:12

Well, in fact I built it myself.

The existing solutions (that I could find) were unconvincing.

I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

It sounds a little weird to display such a long tree on one page, but with collapse / expand it's actually much more intuitive than separate pages, and since we're offline, download times are not a problem. Parsing times are, though, but Chrome is amazing ;-)

The "search" function provided by modern browsers (Firefox and Chrome, anyway) has two big problems: it only searches items visible on the page, and it can't search for non-consecutive words.

I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.
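
The last requirement, opening only the branches that contain hits, boils down to walking up from each hit towards the root. A minimal sketch, not the author's code, assuming a hypothetical `parentOf` map from each item id to its parent item id (top-level items have no entry); in the page itself the same walk could follow `parentElement` from each matching `li`:

```javascript
// Hypothetical helper: collect the ids of every ancestor branch that must be
// expanded so that each hit becomes visible.
// parentOf: Map<itemId, parentItemId>; top-level items have no entry.
function branchesToOpen(hitIds, parentOf) {
  const open = new Set();
  for (const id of hitIds) {
    let p = parentOf.get(id);
    // Stop early once we reach a branch already marked open: its own
    // ancestors were added when it was first reached.
    while (p !== undefined && !open.has(p)) {
      open.add(p);
      p = parentOf.get(p);
    }
  }
  return open;
}

// Tree: 1 > 2 > 3 and 4 > 5; items 3 and 5 matched.
const parentOf = new Map([[2, 1], [3, 2], [5, 4]]);
console.log([...branchesToOpen([3, 5], parentOf)]); // → [2, 1, 4]
```

Because every branch in the set already has all its ancestors in the set, hits that share a subtree stop walking as soon as they meet an opened branch.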

So, what I did was:

  1. create an inverted index mapping words to the ids of the list items containing them (via XSLT; approx. 4500 unique words in the document)
  2. convert this index to a bunch of JavaScript arrays (one word = one array, containing ids)
  3. when searching, intersect the arrays of the search words
  4. step 3 returns an array of ids that I can then open / highlight
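
Steps 1 to 3 can be sketched in plain JavaScript as follows. This is a sketch under assumptions, not the author's XSLT-based code: item ids are assumed to be numbers, tokenization is assumed to be "lowercase, split on anything that is not a letter or digit", and `buildIndex`, `search` and the `{ id, text }` item shape are hypothetical names. Keeping each id array sorted lets two arrays be intersected in one linear pass.

```javascript
// Assumed tokenizer: lowercase, split on runs of non-letter/non-digit chars.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build an inverted index: word -> sorted array of ids of items containing it.
function buildIndex(items) {
  const index = Object.create(null);
  for (const { id, text } of items) {
    for (const word of new Set(tokenize(text))) {
      (index[word] = index[word] || []).push(id);
    }
  }
  for (const word in index) index[word].sort((a, b) => a - b);
  return index;
}

// Linear-time intersection of two sorted id arrays.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push(a[i]); i++; j++; }
    else if (a[i] < b[j]) i++;
    else j++;
  }
  return out;
}

// AND query: ids of items containing every query word, in any order/position.
function search(index, query) {
  const words = [...new Set(tokenize(query))];
  if (words.length === 0) return [];
  // Shortest list first keeps intermediate results small.
  const lists = words.map((w) => index[w] || []).sort((a, b) => a.length - b.length);
  return lists.slice(1).reduce(intersectSorted, lists[0].slice());
}

const index = buildIndex([
  { id: 1, text: "one two three" },
  { id: 2, text: "one three" },
  { id: 3, text: "two three four" },
]);
console.log(search(index, "one three")); // → [1, 2]
```

Since a word missing from the index contributes an empty array and is sorted first, such queries come back empty immediately.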

It does exactly what I needed and it's really fast. Better yet, since it searches an independent "index" (arrays of ids), it can search even when the list is not loaded in the browser!
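
One detail that matters on a CD-ROM: many browsers block `fetch()`/`XMLHttpRequest` requests to `file://` URLs, while a plain `<script src="...">` tag still loads. A minimal sketch of how a build step (Node.js here, swapped in for the author's XSLT) might emit the index as a script; `toIndexScript`, `SEARCH_INDEX` and `search-index.js` are hypothetical names:

```javascript
// Hypothetical build-time helper: turn the word -> ids index into a script
// that defines a global variable, so the page can load it with
// <script src="search-index.js"></script> even from file:// URLs.
function toIndexScript(index, globalName = "SEARCH_INDEX") {
  // Refuse anything that is not a plain JavaScript identifier.
  if (!/^[A-Za-z_$][\w$]*$/.test(globalName)) {
    throw new Error("invalid global name: " + globalName);
  }
  return "var " + globalName + " = " + JSON.stringify(index) + ";\n";
}

const script = toIndexScript({ one: [1, 2], three: [1, 2, 3] });
console.log(script); // var SEARCH_INDEX = {"one":[1,2],"three":[1,2,3]};
// In Node, the file could then be written with:
// require("fs").writeFileSync("search-index.js", script);
```

Keeping the index in its own file means the page can load and query it without first parsing the large tree.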

Beeson answered 10/12, 2009 at 1:12

Desertion: Are there any examples of this we could look at?
Flight: Do you have this public anywhere?

The initial question was asked in 2009.

As of 2014, there is lunr.js, described as:

Simple full-text search in your browser

See the demo and the GitHub repo.

UPDATE September 2016: Fuse.js, lightweight fuzzy search in JavaScript: http://fusejs.io/

Schindler answered 1/4, 2014 at 22:16

Zoom Search Engine can do this.

I haven't used the CD version, but I use the PHP version for my website and it works very well.

Aventurine answered 31/8, 2009 at 12:17

Beeson: I did look at that, thanks, but it seemed quite complex to adapt to my specific needs.

I know a lot of people use Java to write CD search applets. I have a slightly elderly list of various free and commercial programs at Search Tools for CD-ROMs and DVDs.

Rafaelrafaela answered 17/9, 2009 at 23:23

Have a look at CLucene -

http://sourceforge.net/projects/clucene

http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=summary

Compiling the C++ sources into a console or Win32 executable would make the above possible while still using Lucene technology (which I assume you'd rather stick with).

Durra answered 18/9, 2009 at 7:31

Fullproof is a nifty little JavaScript library that can act as a text search engine for you. It would be useful in this context, but it's also useful in the "thick-JavaScript-webpage" model.

Bunni answered 8/2, 2013 at 2:25

By configuring a single YAML file with MkDocs, you can generate a static site with client-side search, provided you keep all your source files as valid Markdown. In combination with the Material for MkDocs theme, you also get a modern Material UI by setting the options you need in the mkdocs.yml config file.

Example of an mkdocs.yml with static client-side js search:

site_name: My Site
site_url: http://example.com/site
site_dir: ~/local/files/dir 
use_directory_urls: false

theme: 
    name: material
    highlightjs: false
    custom_dir: overrides
    features:
      - navigation.instant
      - navigation.tracking
      - navigation.expand
      - toc.follow
      - toc.integrate
      - search.highlight
      - header.autohide
# see also: https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/#anchor-tracking

# top-level key (not under theme): hides the generator notice in the footer
extra:
    generator: false

markdown_extensions:
    - toc:
        permalink: True
    - sane_lists

nav:
    - Top-level Category:
      - "Self-Defence Against Fresh Fruit": fruit.md
      - "Piranha Brothers": piranha.md
      ...

Installing mkdocs and mkdocs-material

pip install mkdocs
pip install mkdocs-material

You can see the JS search in action at both https://www.mkdocs.org/ and https://squidfunk.github.io/mkdocs-material/

Satiety answered 22/4, 2022 at 8:24