This page is intended to be a guide for new members of Java NLP. As well as helpful how-to information, it contains some important information on policies and practice. Please email suggestions/corrections to William Morgan.
Java NLP is a subgroup of Stanford's NLP group. We share a repository of utility and NLP-related Java code, which we are constantly improving and expanding.
Most code falls into one of three categories. There is utility code, which is not NLP specific. The second category is usable NLP code. We have a state-of-the-art parser, part-of-speech tagger, and sequence classifier (for doing tasks such as named entity recognition) to name a few. The last category is works-in-progress. People store their research code in Java NLP so that they can easily access other code.
Sometimes, parts of the repository are released to the
public, on the software page.
What are the benefits of being a member?
The primary benefit of being a member is that you have access to lots of other people's
really useful code. Would you like to use a sentence's parse tree in your current project?
A word's part-of-speech tag? A nice way of keeping/manipulating counts of things? Java NLP
has these things and many more. Also, you know the authors, so getting explanations of how
to make something work is easy. A secondary benefit is that you have more eyes looking for,
and sometimes fixing, bugs in your code. Oh, and you get free pizza.
What are the drawbacks of being a member?
As a member you must attend regular meetings. Currently we meet at 12:15 on Fridays in the room next to Chris's office.
At these meetings we discuss ways to improve Java NLP, which usually consist of
adding/removing/merging code, adding documentation and reorganizing code structure. We
assign tasks at these meetings and you should try to complete your task by the next meeting.
What are the alternatives to membership?
Sometimes people working with the NLP group have just been users of the code, and that's okay for particular projects if you just want to use a tool. But even as a user you should make sure you are familiar with, and abide by, the contents of this document. Note: JavaNLP is typically not available to people not working with the Stanford NLP Group.
We use Subversion to manage our source code. Please read the
JavaNLP SCM guide.
IntelliJ
Because Java is such a defective language, some people find it
useful to compensate by using an IDE such as IntelliJ. IntelliJ
has a lot of useful features. It can do tasks like
refactoring for you. Look at the IntelliJ
guide for more advice on getting started.
ant
We use "ant" to compile our code. Ant is a more verbose, less functional version of make for Java. Its inane XML syntax is pretty much a direct attack on productivity. We use it because the people who set it up have graduated and no one wants to touch it now.
In order to get the repository to compile, your classpath needs to
be set correctly. It must include the jars for all of the
third-party resources that we use. Just add the line source
/u/username/javanlp/bin/setup.csh to your .cshrc
file. For bash users, add setup.bash instead.
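As a rough sketch, the bash variant might look like the fragment below in your ~/.bashrc. It assumes that setup.bash sits in the same bin directory as setup.csh, which the text above suggests but does not state outright. JAVANLP_HOME is a hypothetical helper variable introduced only for this example, and "username" is a placeholder for your own account name.

```shell
# ~/.bashrc fragment (sketch). JAVANLP_HOME is a hypothetical helper variable;
# replace "username" with your own account name.
JAVANLP_HOME="/u/username/javanlp"

# Assumption: setup.bash lives in the same bin/ directory as setup.csh.
if [ -f "$JAVANLP_HOME/bin/setup.bash" ]; then
  # Source the script so the classpath it sets applies to this shell.
  . "$JAVANLP_HOME/bin/setup.bash"
else
  echo "JavaNLP setup script not found under $JAVANLP_HOME/bin" >&2
fi
```

After opening a new shell, echo "$CLASSPATH" is a quick way to check that the third-party jars were picked up.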
William wrote a helper so that bash can auto-complete Java class
names. Run ant complete from your JavaNLP directory
for instructions.
We have Javadoc for the
repository. When you update/add Javadoc comments in your code, you can
update the Javadoc on the
web immediately. But the Javadoc is rebuilt each night, so
most people just wait 24 hours.
Repository Browser
Coming soon!
Our Machines
See the machine info page.
Rules
Repository Must Compile
The repository MUST compile. Make sure your code compiles before
you commit it. If you edit/add more than one file, make sure you
commit all of the files. If you make a change to code that other
code uses, make sure to fix that code as well.
Repository Should Not Be Distributed
You should not give or distribute any part of the JavaNLP repository to third parties without first consulting with the JavaNLP membership. You should also make sure that checked out copies of JavaNLP code are appropriately protected so that not just anyone can read them. Where possible we try to make JavaNLP code available, and quite a bit of it is available on the Software page, but there are a number of dimensions to be respected, including grants with restrictive intellectual property clauses, unfinished and unpublished research in progress, and just the wishes of the primary authors of the code.
It's good if the JavaNLP Javadoc is useful and usable. We're not always perfect about providing good documentation for everything (although it is certainly a good thing to aspire to). However, you should make sure that you at least do the following things:
Thou shalt use space characters instead of tabs for indentation.
Thou shalt also use an indent width of 2 spaces. See JavaNLP coding conventions.
Log Messages
When you commit files, Subversion asks for a log message. Please write a human-understandable and specific log message. These help other people know what's happening, and will likely help you if you have to go back and figure something out about your code. They don't have to be long or fancy, or contain correct grammar; they just need to be a little bit informative.
svn diff
Be aware of which files you are changing, and how, when you are doing a
commit. It's easy to accidentally commit code with half-done changes
or added debugging code that you did not intend to commit, but which you
happened to write last week and which is still in your JavaNLP
directory. It's a really good idea to run svn status
and svn diff
before any svn commit so you really know for sure what
changes you are committing to the repository.
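One way to make this check a habit is a small shell function that always shows the status and diff before asking whether to commit. This is only a sketch: review_then_commit is a hypothetical helper, not something that ships with JavaNLP or Subversion, and it uses nothing beyond the standard svn status, svn diff, and svn commit -m commands.

```shell
# review_then_commit is a hypothetical helper, not part of JavaNLP.
# Usage: review_then_commit "specific, human-readable log message"
review_then_commit() {
  if [ -z "$1" ]; then
    echo "usage: review_then_commit \"log message\"" >&2
    return 2
  fi
  svn status || return 1   # which files are modified, added, or unversioned
  svn diff || return 1     # the exact changes that would be committed
  printf 'Commit these changes? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) svn commit -m "$1" ;;
    *)   echo "Commit cancelled."; return 1 ;;
  esac
}
```

Run it from the top of your JavaNLP working copy. Anything other than y at the prompt leaves the repository untouched.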
If you use IntelliJ to create files it will create a file header
for you. Counter-intuitively, this header won't even be proper
JavaDoc format. So if you use IntelliJ, please go to Options >
File Templates and fix it so it inserts something sensible.
java-nlp-list
As either a member or a user, you must subscribe to the
java-nlp-list mailing list (send a message body of
subscribe java-nlp-list to
majordomo@lists.stanford.edu). This is the only way you'll
hear that you've broken code or that there are problems or when meetings
are.
"dev" is a sister repository to Java NLP where people can store code that they are working
on, but aren't yet ready to add to Java NLP. Every member who wants one can have their own
subdirectory in dev in which to keep whatever they wish. Code in dev can, and usually will,
make use of code in Java NLP. While Java NLP is always expected to compile, this is not the
case with dev. But be careful - if someone changes code in Java NLP which makes code of
yours which is in Java NLP not compile, they will also go into your code and update it so that
it compiles again. They will do no such thing in your dev directory, meaning it's possible for
you to write something in dev which compiles, step out for a bathroom break, come back and
have it no longer compile. So you should be motivated to get things from dev into Java NLP
as quickly as possible.
Useful non-JavaNLP Tips
Viewing/Killing Jobs
There are many ways to view and kill jobs. The easiest is probably top, which you can type at the command
prompt to see the top jobs running on the machine, who's running them, how long they've been running, how much of the CPU and
memory they're taking up, etc. You can type 'u' and then a username to get that person's jobs. Also, each job has a
number (its process ID) next to it; to kill a job, type 'k' and then the number. And don't worry, you can't kill other people's jobs.
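The number top shows next to each job is its process ID (PID), and the same PID works with the plain kill command if you would rather not go through top. A minimal sketch, where sleep 60 stands in for a real long-running job:

```shell
# Sketch: stop a background job by PID instead of through top.
sleep 60 &        # stand-in for a long-running job
pid=$!            # PID of the job just started
kill "$pid"       # sends SIGTERM, the same default signal top's 'k' offers
# Reap the job; wait reports a non-zero status because it was killed.
wait "$pid" 2>/dev/null || echo "job $pid stopped"
```

As with top, kill only works on your own processes.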
Leaving Programs Running After You Logoff
If you want a process to not be terminated when you log off, nohup it:
nohup java myJavaProgram &
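It is often handy to send the job's output to a named log file and to note its PID so you can check on it or kill it later. The sketch below assumes nothing beyond standard nohup and shell redirection. The file name job.log is an arbitrary choice, and sleep 1 stands in for a real command such as java myJavaProgram.

```shell
# Sketch: start a job that survives logout and keep its output in a log file.
# "sleep 1" stands in for a real command such as: java myJavaProgram
nohup sleep 1 > job.log 2>&1 &
job_pid=$!        # PID of the background job, for a later kill if needed
echo "started job $job_pid; output goes to job.log"
```

Without the redirection, nohup appends output to a file named nohup.out in the current directory when run from a terminal.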
Which machine should I run my job on?
See William's guide to working with multiple machines.
More Questions?
Don't be shy! If you have a question, the easiest way to get an
answer is probably to ask at a JavaNLP meeting or to write to the
java-nlp-list mailing list, where multiple people can
help. Or, if you have other questions, you can try William, the Java NLP czar, or Chris
[manning@cs.stanford.edu],
our beloved advisor.
Site design by Bill MacCartney