Models for many natural language tasks benefit from the flexibility to use overlapping, non-independent features. For example, the need for labeled data can be drastically reduced by taking advantage of domain knowledge in the form of word lists, part-of-speech tags, character n-grams, and capitalization patterns. While it is difficult to capture such inter-dependent features with a generative probabilistic model, conditionally-trained models, such as conditional maximum entropy models, handle them well. There has been significant work with such models for greedy sequence modeling in NLP (Ratnaparkhi, 1996; Borthwick et al., 1998).

Conditional Random Fields (CRFs) (Lafferty et al., 2001) are undirected graphical models, a special case of which corresponds to conditionally-trained finite state machines. While based on the same exponential form as maximum entropy models, they have efficient procedures for complete, non-greedy finite-state inference and training. CRFs have recently shown empirical success in POS tagging (Lafferty et al., 2001), noun phrase segmentation (Sha and Pereira, 2003), and Chinese word segmentation (McCallum and Feng, 2003).

Given these models' great flexibility to include a wide array of features, an important remaining question is which features should be used. For example, in some cases capturing a word tri-gram is important; however, there is not sufficient memory or computation to include all word tri-grams. As the number of overlapping atomic features increases, so do the difficulty and importance of constructing only certain feature combinations.

This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase log-likelihood, the approach builds on that of Della Pietra et al. (1997), but is altered to work with conditional rather than joint probabilities, and with a mean-field approximation and other additional modifications that improve efficiency specifically for a sequence model. In comparison with traditional approaches, automated feature induction offers both improved accuracy and a significant reduction in feature count; it enables the use of richer, higher-order Markov models, and offers more freedom to guess liberally about which atomic features may be relevant.
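For concreteness, the following two standard forms are a sketch of the objects involved; they follow Lafferty et al. (2001) and Della Pietra et al. (1997) respectively, and the notation (p_\Lambda, Z_\Lambda, L_\Lambda, f_k, \lambda_k, g, \mu) is assumed here rather than taken from this excerpt. A linear-chain CRF defines the conditional probability of a label sequence y given an observation sequence x as

  p_\Lambda(y \mid x) = \frac{1}{Z_\Lambda(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),

where Z_\Lambda(x) normalizes over all label sequences and the f_k are the overlapping binary features discussed above. The induction principle then scores a candidate feature conjunction g by the gain in conditional log-likelihood it would contribute at its best single weight \mu,

  G_\Lambda(g) = \max_{\mu} \big[ L_{\Lambda \cup \{(g,\,\mu)\}} - L_\Lambda \big],

and only candidates with large gain are added to the model; the mean-field approximation mentioned above is what keeps evaluating many such candidates tractable for sequence data.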
