[PLUG] Swish++ is very good. (Re: Search engines, robots.txt)
Karl M. Hegbloom
karlheg at pdxlinux.org
Tue Dec 3 21:43:21 UTC 2002
On Tue, 2002-12-03 at 13:22, Mark Griskey wrote:
> On Tuesday 03 December 2002 01:05 pm, Karl M. Hegbloom wrote:
> > What exact change did you make to make it work for ISBN numbers, please,
> > in case I want to index BibTeX files with ISBN numbers in the entries?
>
> I changed the value of Word_Min_Vowels (the minimum number of vowels a word
> must have in order to be indexed) to 0. I read somewhere in the source code
> that any character that is not a vowel (defined as aeiouy) is considered a
> consonant. Since our ISBN numbers are all numeric, with a few exceptions, I
> had to lower it to 0. I also raised the Word_Max_Consec_Consonants (...this
> many consecutive consonants) to 10. These changes were made to index
> bibliographic information for about 10,000 scientific titles that are often
> searched for using the ISBN, the title, or a committee member's name. We
> have not encountered any adverse effects by changing the vowel/consonant
> values.
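For anyone following along, here is a rough sketch of how a vowel/consonant
filter like the one described above might behave. This is not Swish++'s
actual code. The function name is_indexable and the default values 1 and 7
are placeholders made up for illustration; only the "aeiouy" vowel set and
the settings 0 and 10 come from the description above.

```python
# Illustrative sketch only, not taken from the Swish++ source.
# Vowel set as described in the quoted message; every other character,
# digits included, is counted as a consonant.
VOWELS = set("aeiouy")


def is_indexable(word, min_vowels=1, max_consec_consonants=7):
    """Return True if `word` passes both heuristics.

    The defaults (1 and 7) are hypothetical placeholders, not the
    values Swish++ ships with.
    """
    vowels = 0
    run = 0  # length of the current run of consecutive non-vowels
    for ch in word.lower():
        if ch in VOWELS:
            vowels += 1
            run = 0
        else:
            run += 1
            if run > max_consec_consonants:
                return False
    return vowels >= min_vowels


# A ten-digit string has no vowels and a non-vowel run of length 10.
sample = "0123456789"
print(is_indexable(sample))                                          # placeholder defaults
print(is_indexable(sample, min_vowels=0, max_consec_consonants=10))  # relaxed settings
```

Under these assumptions a purely numeric ten-character word is rejected by
the placeholder defaults and accepted once the minimum vowel count is 0 and
the consonant run limit is 10, which matches the reasoning given for the
two changes.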
Do you know about:
http://citeseer.nj.nec.com/cs
I wonder if their software is available, or if you have told their
system about your papers (provided they are CS related).
> > Do you know if the filters get run more than once, so that, for
> > instance, a .ps.gz file gets first unzipped and then has pstotext run on
> > it, or will I need a special filter set up for .ps.gz files?
>
> I am not sure on this one, but would be curious to know if it does require a
> special filter.
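To make the question concrete, this is the behaviour I am hoping for,
sketched as a small script. It only illustrates peeling extensions off one
at a time and collecting a filter for each; whether Swish++ really works
this way is exactly what is unknown. The FILTERS table and the function
name filter_chain are hypothetical, not Swish++ configuration.

```python
import os

# Hypothetical extension-to-command table, for illustration only.
FILTERS = {
    ".gz": "gunzip -c",
    ".ps": "pstotext",
}


def filter_chain(filename):
    """Return the filter commands to apply, outermost extension first."""
    chain = []
    name = filename
    while True:
        base, ext = os.path.splitext(name)
        command = FILTERS.get(ext.lower())
        if command is None:
            break  # no (further) known extension, stop peeling
        chain.append(command)
        name = base
    return chain


print(filter_chain("paper.ps.gz"))
print(filter_chain("notes.txt"))
```

If filters are applied only once per file, then only the first entry of
such a chain would run and a dedicated .ps.gz filter would be needed.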
OK, I'll try to remember to report the result back here.