[PLUG] Swish++ is very good. (Re: Search engines, robots.txt)

Karl M. Hegbloom karlheg at pdxlinux.org
Tue Dec 3 21:05:52 UTC 2002


On Tue, 2002-12-03 at 09:54, Mark Griskey wrote:
> If you are compiling from source, you may need to lower the Word_Threshold 
> value in config.h.  I have found that even on the most robust Linux machines, 
> the default is too high, which according to the comments, was set for a SPARC 
> machine running Solaris.  Also make sure the defined temp 
> directory is adequate.  Actually, Swish++ won't compile until you define this 
> variable. 

I will assume, at least for now, that the Debian package's maintainer
has chosen an appropriate value for that threshold setting.  I have
changed the temp directory via the conffile, since "/tmp" on drizzle
is a tmpfs, and the comments in the sample conffile indicate that the
indexer can use a lot of temp space when many files are being indexed.
The /tmp tmpfs is capped at 128 MB, and that might fill up.
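
For anyone who wants to check a candidate temp directory before a big
indexing run, here is a rough Python sketch.  It is not part of Swish++;
the helper name has_room and the use of the 128 MB figure as a threshold
are my own assumptions for illustration.

```python
import shutil
import tempfile

# 128 MB, the tmpfs size limit mentioned above, used here as a threshold.
LIMIT = 128 * 1024 * 1024


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` of free space (hypothetical helper, not Swish++)."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    candidate = tempfile.gettempdir()  # substitute the directory you plan to use
    print(candidate, "has more than 128 MB free:", has_room(candidate, LIMIT))
```

This only reports free space at the moment it runs, so it is a sanity
check rather than a guarantee that indexing will fit.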

> Also, you can play around with the parameters in config.h; it defines things 
> such as what characters are letters, how many consecutive consonants a word 
> may have, etc.  This is very useful if you need to tailor the search to fit 
> your needs.  (We needed to be able to search for ISBN numbers.)

What exactly did you change to make it work for ISBN numbers?  I ask
in case I want to index BibTeX files with ISBN numbers in the entries.
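
My guess is that it comes down to which characters count as part of a
word.  The Python sketch below only illustrates that idea; it is not
Swish++ code (config.h is C++), and the names tokenize, LETTERS_ONLY and
WITH_ISBN are made up for the example.

```python
import string


def tokenize(text, word_chars):
    """Split `text` into maximal runs of characters found in `word_chars`."""
    words, current = [], []
    for ch in text:
        if ch in word_chars:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Assumed character sets, for illustration only.
LETTERS_ONLY = set(string.ascii_letters)
WITH_ISBN = LETTERS_ONLY | set(string.digits) | {"-"}

if __name__ == "__main__":
    entry = "isbn = {0-596-00027-8}"
    print(tokenize(entry, LETTERS_ONLY))  # the ISBN is lost entirely
    print(tokenize(entry, WITH_ISBN))     # the ISBN survives as one token
```

With letters only, the ISBN never becomes a token, so it could not be
searched for; once digits and hyphens count as word characters it is kept
whole.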

Do you know if the filters get run more than once, so that, for
instance, a .ps.gz file gets first unzipped and then has pstotext run on
it, or will I need a special filter set up for .ps.gz files?
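
To be concrete about what I mean by filters running more than once, here
is a small Python sketch of the behaviour I am hoping for.  I do not know
whether Swish++ actually works this way; the FILTERS table and the
filter_plan function are hypothetical.

```python
import os

# Hypothetical suffix-to-command table; not Swish++ configuration syntax.
FILTERS = {".gz": "gunzip -c", ".ps": "pstotext"}


def filter_plan(filename, filters=FILTERS):
    """Peel suffixes off `filename` one at a time, collecting the filter
    for each, until the remaining name has no suffix with a known filter."""
    plan = []
    name = filename
    while True:
        base, ext = os.path.splitext(name)
        if ext not in filters:
            return plan
        plan.append(filters[ext])
        name = base


if __name__ == "__main__":
    print(filter_plan("paper.ps.gz"))
    print(filter_plan("notes.txt"))
```

If filters are only applied once, "paper.ps.gz" would stop after the
gunzip step, and a dedicated .ps.gz filter would be needed instead.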





