[PLUG] Swish++ is very good. (Re: Search engines, robots.txt)
Karl M. Hegbloom
karlheg at pdxlinux.org
Tue Dec 3 21:05:52 UTC 2002
On Tue, 2002-12-03 at 09:54, Mark Griskey wrote:
> If you are compiling from source, you may need to lower the Word_Threshold
> value in config.h. I have found that even on the most robust Linux machines,
> the default is too high, which, according to the comments, was set for a
> Sparc machine running Solaris. Also make sure the defined temp directory
> is adequate. Actually, Swish++ won't compile until you define this
> variable.
I will assume, at least for now, that the Debian package's maintainer
has chosen an appropriate value for that threshold setting. I have
changed the temp directory via the conffile, since "/tmp" on drizzle
is a tmpfs, and the comments in the sample conffile indicate that
indexing can use a lot of temporary space when many files are being
indexed. The /tmp tmpfs is limited to 128 MB, and that might fill up.
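For anyone wanting to check how much room a candidate temp directory has before pointing the indexer at it, here is a minimal Python sketch. The helper name `has_room` is hypothetical and has nothing to do with Swish++ itself; it only asks the operating system how much free space the filesystem holding a path reports.

```python
import shutil

def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= needed_bytes
```

Something like `has_room("/var/tmp", 128 * 1024 * 1024)` would then say whether a directory offers more free space than the 128 MB tmpfs limit. How much space the indexer really needs is not something I know; the figure is only the limit mentioned above.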
> Also, you can play around with the parameters in config.h; it defines things
> such as what characters are letters, how many consecutive consonants a word
> may have, etc. This is very useful if you need to tailor the search to fit
> your needs. (We needed to be able to search for ISBN numbers.)
What exactly did you change to make it work for ISBN numbers, please,
in case I want to index BibTeX files with ISBN numbers in the entries?
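To make the question concrete: my guess, and it is only a guess, is that the change amounts to counting digits and hyphens as word characters. The Python sketch below is not Swish++ code; `split_words`, `LETTERS_ONLY`, and `WITH_DIGITS` are made-up names, and the ISBN shown is a dummy value. It only illustrates how the choice of word characters decides whether an ISBN survives as a single searchable word.

```python
import string

def split_words(text, word_chars):
    """Split text into maximal runs of characters drawn from word_chars."""
    words, current = [], []
    for ch in text:
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A non-word character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

# Two hypothetical definitions of "what characters are letters".
LETTERS_ONLY = set(string.ascii_letters)
WITH_DIGITS = LETTERS_ONLY | set(string.digits) | {"-"}

entry = "isbn = {0-000-00000-0}"  # dummy ISBN, for illustration only
print(split_words(entry, LETTERS_ONLY))  # ['isbn']
print(split_words(entry, WITH_DIGITS))   # ['isbn', '0-000-00000-0']
```

With letters only, the ISBN disappears from the word list entirely; with digits and hyphens allowed, it is kept whole.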
Do you know if the filters get run more than once, so that, for
instance, a .ps.gz file gets first unzipped and then has pstotext run on
it, or will I need a special filter set up for .ps.gz files?
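What I am hoping for is roughly the behaviour sketched below in Python: peel extensions off the file name one at a time and apply the matching filter for each. This is only an illustration of the chaining idea, assuming a hypothetical `FILTERS` table and `filter_chain` helper; it is not Swish++ configuration syntax, and I do not know whether Swish++ works this way.

```python
import os

# Hypothetical mapping from a file extension to the command that handles it.
FILTERS = {
    ".gz": ["gunzip", "-c"],
    ".ps": ["pstotext"],
}

def filter_chain(filename):
    """List the filter commands to apply, outermost extension first."""
    chain = []
    base = filename
    while True:
        base, ext = os.path.splitext(base)
        if ext not in FILTERS:
            break  # no more extensions we know how to filter
        chain.append(FILTERS[ext])
    return chain
```

For `paper.ps.gz` this gives the gunzip step followed by the pstotext step, while a file with no known extension gets an empty chain. If filters only run once per file, a dedicated `.ps.gz` entry that does both steps would be needed instead.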