[PLUG] Python and Natural Language Toolkit

Martin A. Brown martin at linux-ip.net
Thu Nov 19 06:19:08 UTC 2015


Good evening,

>>Oh, yes, and another item about the reference to 'data'.  The NLTK 
>>itself is software, but the project also has put a great deal of 
>>effort into independently versioned data sets that can be used by 
>>the software.  So, I'd suggest the following:
>>
>>  Step 1:  Install python-nltk (or python3-nltk).
>>  Step 2:  Find the data corpus.
>
>Thanks to you and the others who responded. 
>
>Evidently I left out the fact that I found python-nltk in Synaptic 
>package manager and installed it without error. That part is done. 
>My problem is loading the data. To repeat what I said about this 
>previously:

Ah, well that's a horse of a different feather!

>Then I tried to follow this to download the data:
>-----
>from the nltk.org/install page:
>To install the data, first install NLTK (see
>http://nltk.org/install.html), then use NLTK’s data downloader as
>described below.
>...
>Reading through the rest of the download options the only one that made
>any sense was:
>Run the command python -m nltk.downloader all
>-----
>
>But this command just gave a 404 (not found) error.
>
>How do I get the nltk data?

With python-nltk-3.0.2 (running under Python 2.7.8 on OpenSUSE 13.2) 
I tried that command:

  python -m nltk.downloader all

And it worked for me.  However, if you are getting a 404, you may 
find it helpful to look at what the URL is in your copy of the 
module called nltk.downloader.  Embedded Python software 
documentation is available in manpage-like format using a tool 
called 'pydoc'.  So, I ran:

  pydoc nltk.downloader

And found that http://www.nltk.org/nltk_data/ is the URL.
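
If you would rather ask Python directly than page through pydoc, 
here is a minimal sketch.  It assumes your copy of nltk.downloader 
exposes the URL as the class attribute Downloader.DEFAULT_URL (true 
of the versions I know of, but that is an assumption about your 
install), and the function name nltk_index_url is just something I 
made up for illustration.

```python
def nltk_index_url():
    """Return the data URL the installed NLTK uses, or None if unavailable."""
    try:
        from nltk.downloader import Downloader
    except ImportError:
        # NLTK is not importable from this interpreter.
        return None
    # Assumption: the URL lives in Downloader.DEFAULT_URL; fall back to None.
    return getattr(Downloader, "DEFAULT_URL", None)


print(nltk_index_url())
```

If this prints None, the interpreter you ran is probably not the one 
your distribution package installed NLTK for.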

By default, the nltk.downloader will choose to create a directory 
called $HOME/nltk_data.  Personally, I prefer to have control over 
that sort of thing, so I'd draw your attention to the command-line 
option:

  python -m nltk.downloader -d ~/wip/nltk_data/ all
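
One consequence of picking your own directory: NLTK then has to be 
told where to look.  The sketch below is one way to do that from 
Python, assuming nltk.data.path is the list NLTK searches and that 
nltk.download() accepts a download_dir argument.  The helper names 
use_nltk_data_dir and fetch are hypothetical, and ~/wip/nltk_data is 
only my example path.

```python
import os


def use_nltk_data_dir(data_dir):
    """Expand data_dir and add it to NLTK's search path if NLTK is importable."""
    expanded = os.path.abspath(os.path.expanduser(data_dir))
    try:
        import nltk.data
    except ImportError:
        # NLTK is not installed here; just hand back the expanded path.
        return expanded
    if expanded not in nltk.data.path:
        nltk.data.path.append(expanded)
    return expanded


def fetch(package, data_dir):
    """Download one package (or "all") into data_dir; needs network access."""
    import nltk
    return nltk.download(package, download_dir=use_nltk_data_dir(data_dir))
```

Calling fetch("all", "~/wip/nltk_data") should then do roughly what 
the command line above does.  Setting the NLTK_DATA environment 
variable to the same directory is an alternative to touching 
nltk.data.path in code.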

I'm not sure why you got an HTTP 404, but you could also run the 
nltk.downloader in interactive mode by leaving off the package 
name.  The documentation indicates that a graphical interface will 
be supplied if the Tkinter graphical toolkit (and presumably an X 
display somewhere) is available.

  python -m nltk.downloader -d ~/someplace/nltk_data/

Then you can see what URL your particular installation is using as 
the source for the NLTK data set.

If you continue to have trouble with it, you might ask them on their 
Googly group:

  https://groups.google.com/forum/#!forum/nltk-users

Enjoy and good luck,

-Martin

-- 
Martin A. Brown
http://linux-ip.net/

