This howto explains how to get Nutch, Nutch-Gui, the Sun JDK, and Tomcat 6.0.16 working on CentOS 5.x or 6.x while keeping the rest of the system functioning normally. CentOS 5.x currently ships with Tomcat 5.5. That version does run, but its default install has problems that produce persistent errors which are undocumented at this time. If you believe these errors have been addressed and can point to a fix, please use the contact form on this website to let us know. Everything installed by following this howto can be removed easily with either “rpm -e foo” (rpm -e takes the installed package name, not the .rpm file name) or “rm -rf /opt/foo”, returning your system to its original state.
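
As a rough sketch of that removal step, the helper below only prints the commands it would run, so the plan can be reviewed before anything is deleted. The package name tomcat6 and the directory /opt/nutch are placeholders assumed for illustration; substitute whatever you actually installed.

```shell
#!/bin/sh
# Print (do not run) the removal command for each item.
# Arguments under /opt/ are treated as directories to delete,
# other absolute paths are refused, and anything else is
# treated as an installed RPM package name.
removal_plan() {
    for item in "$@"; do
        case "$item" in
            /opt/?*) echo "rm -rf $item" ;;
            /*)      echo "refusing: $item is outside /opt" ;;
            *)       echo "rpm -e $item" ;;
        esac
    done
}

# Placeholder names, assumed for illustration only.
removal_plan tomcat6 /opt/nutch
```

Once the printed plan looks right, run the commands by hand as root.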

Applicable to CentOS Versions:

  • CentOS 5.x
  • CentOS 6.x

Requirements

  1. Root or sudo access with appropriate privileges to the system you intend to install on.
  2. A server preferably on a high-speed network.
  3. The Sun JDK self-extracting RPM installer (rpm.bin).
  4. Tomcat 6 rpm.
  5. Nutch 1.0 tar.gz.
  6. Nutch-Gui 0.2 tar.gz.

Doing the Work

  1. Install a few dependencies.
  2. Download and install the latest Sun JDK rpm.bin.
  3. Download and install Tomcat 6.
  4. Download and install Nutch 1.0.
  5. Edit /etc/profile.
  6. Configure Nutch to fetch URLs.
  7. Nutch “deepcrawler” script.
  8. Fetch URLs with Nutch via command line.
  9. Download and install Nutch-Gui 0.2.
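
The original commands for these steps are not reproduced here, so what follows is only a minimal sketch of steps 6 and 8 under stated assumptions: Nutch 1.0 is unpacked in /opt/nutch (the NUTCH_HOME default below), and the helper names urlfilter_rule, write_seeds, and crawl_command are made up for this example. It does not cover the “deepcrawler” script.

```shell
#!/bin/sh
# Assumed install location; change it to match your system.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"

# Print a crawl-urlfilter.txt accept rule for one domain,
# escaping the dots so they match literally in the regex.
urlfilter_rule() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf '+^http://([a-z0-9]*\\.)*%s/\n' "$escaped"
}

# Write one seed URL per line into <dir>/seed.txt.
# The file name seed.txt is arbitrary; Nutch reads every file in the directory.
write_seeds() {
    dir="$1"; shift
    mkdir -p "$dir"
    printf '%s\n' "$@" > "$dir/seed.txt"
}

# Print the one-shot crawl command:
#   <seed dir> <output dir> <link depth> <max pages per level>
crawl_command() {
    echo "$NUTCH_HOME/bin/nutch crawl $1 -dir $2 -depth $3 -topN $4"
}
```

For example, `urlfilter_rule example.com` prints a line to paste into conf/crawl-urlfilter.txt in place of the MY.DOMAIN.NAME rule, `write_seeds urls http://example.com/` creates the seed list, and `crawl_command urls crawl 3 50` prints the command to run from the directory holding urls. Nutch also expects http.agent.name to be set in conf/nutch-site.xml before it will fetch anything.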

Troubleshooting / How To Test

  1. Make sure the required packages are installed and that the JAVA_HOME variable is set in /etc/profile.
  2. Set Tomcat to start on boot.
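
A minimal sketch of both checks follows. The helper java_home_in_profile is a made-up name for this example, and the package and service names in the comments (jdk, tomcat6) are assumptions that depend on the RPMs you installed.

```shell
#!/bin/sh
# Report the value JAVA_HOME is exported with in a profile file.
# Returns non-zero if no export line is found.
java_home_in_profile() {
    profile="${1:-/etc/profile}"
    # Take the last matching export line, since later assignments win.
    line=$(grep -E '^[[:space:]]*export[[:space:]]+JAVA_HOME=' "$profile" 2>/dev/null | tail -n 1)
    if [ -z "$line" ]; then
        echo "JAVA_HOME not exported in $profile"
        return 1
    fi
    # Strip everything up to and including "JAVA_HOME=".
    echo "${line#*JAVA_HOME=}"
}

# On the CentOS machine itself, as root (names are assumptions):
#   rpm -q jdk tomcat6          # confirm the packages are installed
#   chkconfig tomcat6 on        # start Tomcat on boot
#   chkconfig --list tomcat6    # confirm the runlevels it is enabled for
```

Running `java_home_in_profile /etc/profile` prints the exported path, which should point at the JDK install directory.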

Disclaimer

We test this stuff on our own machines, really we do. But you may still run into problems; if you do, come to #centoshelp on irc.freenode.net.

Last Modified: 2 Feb, 2012 at 17:39:21