07. Extracting Patterns

Shimon J. Edelman, Psychology, and colleagues developed a computer algorithm for language learning and processing that can autonomously scan text in various languages, including English and Chinese, and infer the underlying rules of grammar without any prior information. The new method, automatic distillation of structure (ADIOS), successfully identifies complex patterns in raw text. It can take a body of text, abstract a collection of recurring patterns or rules from it, and then generate new material. The algorithm discovers patterns by repeatedly aligning sentences and looking for overlapping parts. It also works for other sequential data, such as sheet music or protein sequences. The development, which has a patent pending, has implications for speech recognition and other applications in natural language engineering, as well as for genomics and proteomics. It also offers new insight into language acquisition and psycholinguistics. The algorithm has been tested on child-directed language, the full text of the Bible in several languages, and musical notation, and it has been applied to biological data, including nucleotide base pairs and amino acid sequences.
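The idea of aligning sentences and looking for overlapping parts can be illustrated with a toy sketch. This is not the ADIOS algorithm itself, whose details are not given here; it only assumes that sentences are split on whitespace and that any run of two or more words shared by a pair of sentences counts as a candidate pattern. The function names `shared_segments` and `recurring_patterns` and the three-sentence corpus are hypothetical, chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def shared_segments(a, b, min_len=2):
    """Return word runs of at least min_len tokens that both sentences share."""
    # autojunk=False keeps frequent words (e.g. "the") from being ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        tuple(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_len
    ]


def recurring_patterns(sentences, min_len=2):
    """Align every pair of sentences and count the word runs they share."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter()
    for a, b in combinations(tokenized, 2):
        counts.update(shared_segments(a, b, min_len))
    return counts


# Illustrative corpus (an assumption, not data from the study).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a bird sat on the fence",
]
patterns = recurring_patterns(corpus)
print(patterns.most_common(1))  # [(('sat', 'on', 'the'), 3)]
```

In this sketch the shared run "sat on the" surfaces as a recurring frame, while the words around it ("cat", "dog", "bird") vary. Because the alignment operates on generic token lists, the same approach would apply to other symbol sequences, such as notes or amino acid codes.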

