The US National Security Agency (NSA) seeks industry partners to license and commercialize KODA, a powerful software measure of text and data similarity. KODA has outstanding potential for applications such as text summarization, outline creation, data comparison, query-based content searches for search engines, knowledge management, data mining, and analysis.
KODA is a patent-pending method developed by the NSA for summarizing text using only the text itself, without relying on any information external to it. KODA identifies at least one set of textual units that best summarizes the text, where a textual unit may be one or more words, ASCII characters, graphics (such as musical notes), phrases, sentences, paragraphs, and so on. KODA requires neither a dictionary nor a collection of exemplary text on a particular topic. It was expressly designed to avoid dependence on summarization training data sets, which can introduce errors tied to formatting, linguistics, language, and definitional or introductory sentences and phrases.
Key advantages of KODA include:
- Enhanced performance on heterogeneous textual information, such as scientific data and unstructured documents
- Results do not depend on an article's “first sentence”
- Results are independent of language and linguistic patterns
- Does not use training data, dictionaries, or other look-up references
- Can be easily integrated into and adapted for a variety of systems and platforms
- Supports high data throughput on standard desktop computers
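KODA's own algorithm is not described in this announcement, so the following is not KODA. It is a hypothetical Python sketch of what extractive summarization that "relies solely on the text itself" can look like: each textual unit is scored by the cosine similarity between its character trigram counts and those of the whole text, and the top k units are kept in their original order. Character n-grams need no tokenizer, dictionary, or training data, and the score ignores a unit's position, which loosely mirrors the advantages listed above. The function names `char_ngrams`, `cosine`, and `summarize`, and the choice of trigrams, are assumptions made for this illustration.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (no tokenizer or dictionary needed)."""
    # Normalize case and collapse runs of whitespace so layout does not matter.
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(units: list[str], k: int = 1, n: int = 3) -> list[str]:
    """Return the k units most similar to the whole text, in original order."""
    whole = char_ngrams(" ".join(units), n)
    # The score depends only on content, never on a unit's position in the text.
    scores = [cosine(char_ngrams(u, n), whole) for u in units]
    best = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)[:k]
    return [units[i] for i in sorted(best)]


doc = [
    "The cat sat on the mat.",
    "The cat chased a mouse across the mat.",
    "Stock prices fell sharply on Tuesday.",
]
print(summarize(doc, k=2))
```

Scoring each unit against a text that contains it favors units that echo the text's dominant vocabulary, so the off-topic third sentence is dropped here. A production system would need a more careful notion of "best summarizes", which this sketch does not attempt to reproduce.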
KODA has been successfully tested on a variety of document types and compared against several other summarization methods. KODA is patent pending and is available for licensing from the National Security Agency. The software can be readily demonstrated, and collaborative research and development (R&D) with the NSA is also possible.