NLP

2014-09-11

If L separates every pair of distinct words in {w1, w2, …, wn} and L = L(M), then M must have at least n states
regular language
formal language

2014-09-16

finite state transducer
Definition: L separates strings w and v if, for some u, exactly one of wu, vu is in L.
Thm: if L=L(M) and Delta(q0, w)=Delta(q0, v) then L does not separate w and v
deterministic finite automaton
If L(M) separates every pair wi, wj (i != j) among w1, …, wn, then M has at least n states
- index(L) = “number of equivalence classes induced by ≡_L”; related: separation by a language, partition
- pumping lemma
- Myhill-Nerode theorem
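
The separation definition and the theorem above can be checked on a small example. A minimal sketch, assuming a hypothetical DFA for "binary strings with an even number of 1s"; the names `run`, `accepts` and `separates` are illustrative and not from the lecture. `separates` only tries extensions u up to a bounded length, so it is a brute-force check, not a proof.

```python
from itertools import product

# Hypothetical DFA M: accepts binary strings with an even number of 1s.
ALPHABET = "01"
START = "even"
ACCEPTING = {"even"}
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

def run(w, state=START):
    """Extended transition function Delta(state, w)."""
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state

def accepts(w):
    return run(w) in ACCEPTING

def separates(w, v, max_len=4):
    """Return some u (length <= max_len) with exactly one of wu, vu in L(M),
    or None if no such u is found within the bound."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            u = "".join(chars)
            if accepts(w + u) != accepts(v + u):
                return u
    return None

# "1" and "11" end in different states; the empty string already separates them.
print(repr(separates("1", "11")))  # ''
# "0" and "11" end in the same state, so no extension can separate them.
print(run("0") == run("11"), separates("0", "11"))  # True None
```

In this example "" and "1" are separated, so any DFA for the language needs at least 2 states, which the DFA above matches.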
nondeterministic finite automaton
- powerset construction
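
A minimal sketch of the powerset construction, assuming an NFA without epsilon moves given as a dict from (state, symbol) to a set of states. The example NFA (strings over {a, b} whose second-to-last symbol is a) and all function names are hypothetical. Only subsets reachable from the start state are built.

```python
from collections import deque

def powerset_construction(alphabet, nfa_delta, start, accepting):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves from every state in the current subset.
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa_delta.get((state, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting

def dfa_accepts(w, dfa_delta, start_set, dfa_accepting):
    state = start_set
    for symbol in w:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting

# Hypothetical NFA: q0 guesses that the current 'a' is the second-to-last symbol.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
states, delta, start, accepting = powerset_construction("ab", NFA_DELTA, "q0", {"q2"})
print(len(states))                                   # 4
print(dfa_accepts("bab", delta, start, accepting))   # True
print(dfa_accepts("abb", delta, start, accepting))   # False
```

The 3-state NFA becomes a 4-state DFA here. The strings aa, ab, ba, bb are pairwise separated by this language, so by the state lower bound above no DFA for it can have fewer than 4 states.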

2014-09-18

stochastic languages

Quotes Extraction

Using machine learning to extract quotes from text

methods

regular expressions & pattern matching
- on-the-record
machine learning
- citizen-quotes uses the [NLTK Maximum Entropy Classifier](http://www.nltk.org/_modules/nltk/classify/maxent.html) as its ML classifier
  - it is a supervised learning method
  - selected six features. Each feature is represented by a function that takes paragraph text as an input and returns either a boolean or categorical variable.
    1. common attribution words (said, asked, etc.)
    2. quote marks
    3. common attribution word within five words of a quote mark (“I love tacos,” Smith said.)
    4. how many words does it have in quotes (helps deal with the “air quotes” problem)
    5. what are the five words that fall immediately after a closed quote
    6. what is the last word in the paragraph
  - maxent learns a weight for each feature, so the most useful features count the most, and then scores the input paragraph
Google inquotes is an experimental tool that has since been shelved
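
The six paragraph features listed under machine learning could be written as plain functions along these lines. This is a sketch, not the citizen-quotes code: the attribution word list, the bucketing of the quoted word count, and the choice to look after the first quoted span are all assumptions made for illustration.

```python
import re

# Assumed word list; the notes only give "said, asked, etc."
ATTRIBUTION_WORDS = {"said", "asked", "says", "told", "added"}
QUOTE_MARKS = set('"“”')
QUOTED = re.compile(r'["“]([^"“”]*)["”]')  # text between double quotes

def tokens(text):
    """Split into words and double-quote marks; other punctuation is dropped."""
    return re.findall(r'\w+|["“”]', text)

def bucket(count):
    # Assumed buckets, so the count becomes a categorical value.
    if count == 0:
        return "0"
    return "1-2" if count <= 2 else "3+"

def extract_features(paragraph):
    toks = tokens(paragraph)
    lowered = [t.lower() for t in toks]
    quote_idx = [i for i, t in enumerate(toks) if t in QUOTE_MARKS]
    attr_idx = [i for i, t in enumerate(lowered) if t in ATTRIBUTION_WORDS]
    quoted_words = sum(len(re.findall(r"\w+", s)) for s in QUOTED.findall(paragraph))
    first = QUOTED.search(paragraph)  # assumption: use the first quoted span
    after = re.findall(r"\w+", paragraph[first.end():])[:5] if first else []
    words = [t for t in lowered if t not in QUOTE_MARKS]
    return {
        "has_attribution_word": bool(attr_idx),
        "has_quote_marks": bool(quote_idx),
        "attribution_near_quote": any(
            abs(a - q) <= 5 for a in attr_idx for q in quote_idx
        ),
        "quoted_word_count": bucket(quoted_words),
        "after_close_quote": " ".join(w.lower() for w in after),
        "last_word": words[-1] if words else "",
    }

print(extract_features("“I love tacos,” Smith said."))
print(extract_features("Smith called the plan a “disaster” on Tuesday."))
```

A dict like this, paired with a label, is the (featureset, label) shape that NLTK classifiers such as `nltk.classify.MaxentClassifier.train` take as training data.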

Coreference resolution

Named-entity recognition

Terms to Know

Conditional independence
Language model
Noisy channel model
Bayes’ rule
Marginal Probability
Bloom filter
N-gram
Maximum Likelihood
HMM
- transition probability, emission probability
- Viterbi algorithm
  - time complexity O(T**2 * n), where T is the number of tags and n is the length of the input sequence
- Log-linear model
- Multinomial logistic regression
  - Maximum entropy Markov model
  - Conditional random field
    - there are algorithms for getting feature functions, though in practice NLP experts usually write them by hand
    - Parameter learning uses gradient descent
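
A minimal sketch of the Viterbi algorithm for the HMM item above, to show where the O(T**2 * n) cost comes from. The tag set and all probabilities are made-up toy values, and the sketch assumes a non-empty observation sequence.

```python
def viterbi(observations, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence and its probability."""
    # best[i][t]: probability of the best path that ends in tag t at position i
    best = [{t: start_p[t] * emit_p[t].get(observations[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(observations)):       # n positions
        best.append({})
        back.append({})
        for t in tags:                          # T current tags
            # T previous tags: this inner max gives the T**2 factor
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(observations[i], 0.0)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (all numbers are assumptions for illustration).
TAGS = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}                   # initial tag probabilities
TRANS_P = {"N": {"N": 0.3, "V": 0.7},            # transition probabilities
           "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.8, "sleep": 0.2},      # emission probabilities
          "V": {"fish": 0.5, "sleep": 0.5}}

path, prob = viterbi(["fish", "sleep"], TAGS, START_P, TRANS_P, EMIT_P)
print(path)  # ['N', 'V']
```

For long sequences the products underflow, so a practical version would add log probabilities instead of multiplying.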

2014-10-23

Targeted projection pursuit
sequence model