STANFORD TOOLS

I am planning to release the following programs:

  • JavaGHKM (coming soon)
    A program that extracts synchronous grammar rules from any aligned corpus of target-language trees (e.g., English) and source-language sentences (e.g., Chinese). It is a new implementation of the work described in my NAACL-04 and ACL-06 papers. This distribution contains only a rule extractor and does not include a decoder. A decoder that can handle arbitrary (n-ary) synchronous grammar rules can be found here.

COLUMBIA TOOLS

The following programs were written while I was at Columbia University. I continue to maintain them, though they are still distributed by Columbia. Under restrictions imposed by Columbia, these programs may be used only for research and educational purposes. To get access to the first two programs in the list, each potential user needs to print and fax one license agreement per program (see below). Once the request is approved, download instructions will follow.

  • LCseg [ license agreement ]
    A domain-independent discourse segmenter based on lexical cohesion. It divides unrestricted texts into topically cohesive units. This work is described in my ACL-03 paper.
  • LexChainer [ license agreement ]
    A tool that uses WordNet to find lexical chains (chains of semantically related words) in unrestricted texts. It also performs word sense disambiguation to ensure that words appearing in the same chain have related meanings. This work is described in my IJCAI-03 paper.
  • NXT transcription extraction tool
    dump_meeting is a small program that creates plain-text meeting transcriptions and annotations from NXT-encoded meeting data. It currently supports extraction of transcriptions, extractive summaries, dialog acts, adjacency pairs, and topic segmentation. Various printing options are available (e.g., punctuation, case-sensitive, ASR-like), and turn segmentation may be defined in several ways (by dialog act units, by silence, or as specified by the user). Detailed usage is available here.
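
To make the idea of silence-based turn segmentation concrete, here is a minimal Python sketch. It is not dump_meeting's code and does not read NXT data; the `Word` record, the `segment_turns` and `render` functions, and the 0.5-second threshold are all hypothetical names and values chosen for illustration. It assumes the words are already sorted by start time and starts a new turn whenever the speaker changes or the pause since the previous word exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical time-aligned word record (not the NXT schema)."""
    speaker: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def segment_turns(words, max_silence=0.5):
    """Group time-ordered words into turns.

    A new turn starts when the speaker changes or when the gap between
    the previous word's end and this word's start exceeds max_silence.
    """
    turns = []
    for w in words:
        if turns:
            last = turns[-1][-1]
            same_speaker = w.speaker == last.speaker
            short_gap = (w.start - last.end) <= max_silence
            if same_speaker and short_gap:
                turns[-1].append(w)
                continue
        turns.append([w])
    return turns


def render(turns):
    """Format each turn as one plain-text line: 'SPEAKER: words ...'."""
    return [f"{t[0].speaker}: {' '.join(w.text for w in t)}" for t in turns]


if __name__ == "__main__":
    demo = [
        Word("A", 0.0, 0.3, "okay"),
        Word("A", 0.4, 0.8, "so"),
        Word("A", 2.0, 2.4, "next"),
        Word("B", 2.5, 2.9, "yes"),
    ]
    for line in render(segment_turns(demo)):
        print(line)
```

Segmenting by dialog act units would follow the same pattern, with the boundary test replaced by a change in a (hypothetical) dialog act identifier attached to each word.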
