The following programs were written while I was at Columbia
University. I continue to maintain them, though they are still
distributed by Columbia. Under restrictions imposed by Columbia,
these programs may be used only for research and educational
purposes. To get access to the first two programs in the list, a
prospective user needs to print and fax one license agreement for
each program (see below). Once the request is approved, download
instructions will follow.
- LCseg [ license agreement ]
A domain-independent discourse segmenter based on lexical cohesion. It
divides unrestricted texts into topically cohesive units. This work is
described in my ACL-03 paper.
- LexChainer [ license agreement ]
A tool that uses WordNet to find lexical chains (chains of
semantically related words) in unrestricted text.
It also performs word sense disambiguation to ensure that
words appearing in the same chain have related meanings.
This work is described in my IJCAI-03 paper.
- NXT transcription extraction tool
dump_meeting is a small
program that creates plain-text meeting transcriptions and
annotations from NXT-encoded
meeting data. It currently supports extraction of
transcriptions, extractive summaries, dialog acts, adjacency
pairs, and topic segmentation. Various printing options are
available (e.g., punctuation, case sensitivity, ASR-like output), and
turn segmentation can be defined flexibly (by dialog act units,
by silence, or as specified by the user). Detailed usage is
available here.