Semantic Parsing
- Mixing Weak Learners in Semantic Parsing, Rodney D. Nielsen,
Sameer Pradhan, In Proceedings of EMNLP 2004
[PDF]
- Uses random forests for semantic parsing
- Suggests a cool feature-space dimensionality reduction algorithm.
- Features drawn from Gildea and Jurafsky 2002, Pradhan et al. 2003,
& Surdeanu et al. 2003.
- Pradhan et al. 2003 & Surdeanu et al. 2003 features
- Named entities
- Head word POS
- Content word
- Verb Cluster
- Half Path
- Introduces 2 features: governing preposition (GP), &
content word base (normalized content word - singular form,
no prefixes, digits replaced with 'n'); a rough sketch of this
normalization follows this entry.
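- A minimal sketch of the content word base normalization, assuming a
naive singularization rule; prefix stripping is omitted since the
exact rules aren't given in these notes:

      import re

      def content_word_base(word):
          # Rough approximation of the normalization described above;
          # the singularization heuristic is a guess, and prefix
          # stripping is left out because the exact rules are unclear.
          w = re.sub(r"\d", "n", word.lower())   # digits replaced with 'n'
          if w.endswith("es") and len(w) > 3:
              w = w[:-2]                         # crude singular form
          elif w.endswith("s") and len(w) > 2:
              w = w[:-1]
          return w

      print(content_word_base("Cars"))    # -> car
      print(content_word_base("1990s"))   # -> nnnn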
- Using Predicate-Argument Structures for Information Extraction,
Mihai Surdeanu, Sanda Harabagiu, John Williams, and Paul
Aarseth, In Proceedings of ACL 2003
[PDF]
- Uses semantic parsing for information extraction
- Introduces named entity based features, POS features, as well as
a 'content' word feature (i.e.
the content word is a heuristically determined informative word
that in many cases will be different from the head of a constituent)
- Semantic parsing based on a decision tree classifier (C5).
- The Necessity of Parsing for Predicate Argument Recognition, Daniel
Gildea, & Martha Palmer, In proceedings of ACL 2002
[PDF]
- Gildea & Jurafsky features and back-off based semantic parser
applied to (early version of) PropBank rather than FrameNet
- Compares and contrasts Framenet and PropBank results
- Using automatic parsers, performance on FrameNet is better -
(P:64.6,R:61.2) vs (P:57.7,R:50.0)
- Examines the value of accurate parses in the semantic role labeling
task
- Using 'gold standard' parses, PropBank performance increases to
(P:71.1,R:64.4), and subsequent filtering out of under-represented
examples further increases performance to (P:73.5,R:71.7).
- Conclusion --> Having good parses is critical to achieving good
performance
- Argues for the necessity of syntactic parsing in semantic role
labeling by comparing system that uses features extracted from
a parse tree to a system that relies on an idealized* base-level
constituent chunker (* - constituent chunks are based on gold standard
parses)
- Chunker based system is a bit of a 'straw man'; for a better one
see Hacioglu et al. 2003 (HLT proceedings)
- Chunker based system has terrible performance using even a
very liberal scoring metric (P:49.5,R:35.1), with strict scoring
the results are even worse (P:27.6,R:22.0)
- However, the system does illustrate the relative importance of
two features, head word & path, in the performance of the
parse tree based system (a sketch of the path feature follows
this entry).
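- A minimal sketch of computing the Gildea & Jurafsky path feature
from a parse tree; nltk trees are used, and the positions are
assumed to be the tree positions of the constituent and of the
predicate's preterminal:

      from nltk.tree import Tree

      def path_feature(tree, const_pos, pred_pos):
          # Lowest common ancestor = longest shared position prefix.
          i = 0
          while (i < min(len(const_pos), len(pred_pos))
                 and const_pos[i] == pred_pos[i]):
              i += 1
          # Labels from the constituent up to (and including) the LCA...
          ups = [tree[const_pos[:j]].label()
                 for j in range(len(const_pos), i - 1, -1)]
          # ...then down from below the LCA to the predicate.
          downs = [tree[pred_pos[:j]].label()
                   for j in range(i + 1, len(pred_pos) + 1)]
          return "↑".join(ups) + "".join("↓" + d for d in downs)

      t = Tree.fromstring("(S (NP (NNP He)) (VP (VBD ate) (NP (NN pasta))))")
      # Subject NP to the predicate 'ate': NP↑S↓VP↓VBD
      print(path_feature(t, (0,), (1, 0)))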
- Target Word Detection and Semantic Role Chunking using
Support Vector Machines, Kadri Hacioglu, & Wayne Ward,
In Proceedings of HLT-NAACL 2003
[PDF]
- Demonstrates that better than expected semantic role labeling
performance can be achieved by a chunking based system
- Competitive with Gildea & Jurafsky 2002
- achieved overall (P:67.6,R:55.9) - in comparison
Gildea & Jurafsky achieved (P:65.0,R:61.0).
- Used FrameNet data set with the same set of semantic role
mappings used in Gildea & Jurafsky 2002.
- Performed both the task of identification of the target word
and labeling of semantic roles
- achieved (P:76.8,R:73.1) when identifying the target word
- Features - all contained in a five-word sliding window around
the current word being labeled: word identities, part of
speech, constituent chunk information, and the classifier
labels assigned to the two preceding words (see the sketch
at the end of this entry)
- Implemented using Yamcha/TinySVM.
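- A minimal sketch of the five-word window features; the feature
names and exact window contents are assumptions, since the paper's
actual feature templates may differ:

      def window_features(words, pos_tags, chunks, prev_labels, i, w=2):
          # Features for token i drawn from positions i-2 .. i+2, plus
          # the classifier's labels for the two preceding words.
          feats = {}
          for off in range(-w, w + 1):
              j = i + off
              if 0 <= j < len(words):
                  feats["word[%d]" % off] = words[j]
                  feats["pos[%d]" % off] = pos_tags[j]
                  feats["chunk[%d]" % off] = chunks[j]
          feats["label[-1]"], feats["label[-2]"] = prev_labels
          return feats

      print(window_features(["He", "ate", "pasta"],
                            ["PRP", "VBD", "NN"],
                            ["B-NP", "B-VP", "B-NP"],
                            ("B-A0", "O"), 2))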
- Maximum Entropy Models for FrameNet Classification, Michael Fleischman,
Namhee Kwon, & Eduard Hovy, In Proceedings of EMNLP 2003
[PDF]
- Training & evaluation done using the same division of the FrameNet
data set seen in Gildea & Jurafsky 2002
- Uses maximum entropy to estimate model probabilities
- Using just the features drawn from G&J 2002, but with maximum
entropy rather than the G&J back-off model, increases
performance from 78.5% to 81.7% on the labeling task.
- Introduces 3 new features for use in the labeling task
- New Features:
- Order - the linear position of the frame element in
the context of the other frame elements to be labeled
(i.e. whether the frame element is the first, second, third, etc.
frame element in the sentence that is to be labeled)
- Syntactic pattern - a global feature for a sentence that
reflects the phrase type & logical function
(governing cat in G&J 2002) of
each frame element to be labeled, ordered according to
their linear position in the sentence with the position of the
target reflected by a 'target' entry in the list
- Previous role - a feature that reflects the role assigned to the
previous frame element or the roles assigned to the previous 2 frame
elements. Using this feature involves performing a Viterbi search
over possible assignments of roles to the frame elements (a sketch
follows this list)
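- A minimal sketch of that Viterbi search; the local scorer is
assumed to return log P(role | FE features, previous role) from
the max-ent model:

      def viterbi_roles(frame_elems, roles, score):
          # score(fe, role, prev_role) -> log probability under the model.
          best = {r: score(frame_elems[0], r, None) for r in roles}
          back = []
          for fe in frame_elems[1:]:
              ptrs, nxt = {}, {}
              for r in roles:
                  prev, s = max(((p, best[p] + score(fe, r, p))
                                 for p in roles), key=lambda x: x[1])
                  ptrs[r], nxt[r] = prev, s
              back.append(ptrs)
              best = nxt
          # Recover the best sequence by following back-pointers.
          last = max(best, key=best.get)
          seq = [last]
          for ptrs in reversed(back):
              seq.append(ptrs[seq[-1]])
          return list(reversed(seq))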
- Using syntactic pattern & order, in addition to the baseline
features,
increases performance to 83.6 on the labeling task.
- Using all three new features (+baseline features) increases
performance to 84.7 on the labeling task.
- Frame element identification - performed using only three sets of
features: path, path /\ target, & target /\ head word.
- Achieves (P:.736,R:.679) on the pure identification task
- When identification is combined with frame element classification,
the system's performance is (P:.6,R:.554).
- In comparison
G&J 2002 report (P:.726,R:.631) and (P:.67,R:.468), on
pure identification & identification + labeling, respectively.
- Experiments involving varying the size of the training set suggest that
further increases in the amount of data available could
substantially
improve performance on the classification task, but with smaller
gains on the identification task. This suggests a more sophisticated
model is necessary for further improvements in frame element
identification.
- SENSEVAL Automatic Labeling of Semantic Roles using Maximum Entropy,
Namhee Kwon, Michael Fleischman, & Eduard Hovy, In proceedings of
Senseval-3 (2004)
[PDF]
- Builds on Fleischman, Kwon & Hovy 2003
- Uses features drawn from Gildea & Jurafsky 2002, and
Fleischman, Kwon, & Hovy 2003.
- Incorporates three new features
motivated by the task definition for the senseval track on
automatic labeling of semantic roles.
- Frame - identity of the frame
- lexical unit - base form of the predicate being labeled
represented in conjunction with the predicate's
grammatical type (i.e. verb, noun, adjective)
- lexical unit type - grammatical type of the predicate
(i.e. verb, noun, adjective),
as derived from the representation of the lexical unit.
- Introduces a 'partial path' feature - identical to the standard path
feature iff the constituent being labeled is under the same "S" as the
target word. Otherwise, it is set to the value "nopath".
- Introduces a sentence segmentation step into the labeling pipeline
- New pipeline: segmentation->identification->tagging
- Segmentation consists of selecting the sequence of
constituents that cover the entire input sentence
and are at the highest level possible in the parse tree
while still allowing the target to be contained within its
own segment (see the sketch below)
- Advantages:
- Fewer candidate FEs during the identification step (results in
faster training)
- Allows the identification of FEs to be done using a
straightforward application of a MEMM to the resulting
sequence of segments. This allows for easy inclusion of features
that encode dependencies between FE labels in the sequence
- Disadvantage: while 85.8% of FEs correspond to a constituent in
the parse tree, only 79.5% of the FEs correspond to the resulting
segments. Thus, the upper bound on successful FE identification
is lowered.
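- A minimal sketch of that segmentation over an nltk tree;
target_pos is assumed to be the tree position of the target's
preterminal:

      from nltk.tree import Tree

      def segment(tree, target_pos):
          # Keep the highest constituents possible, recursing only into
          # nodes that dominate the target so it gets its own segment.
          def walk(node, pos):
              if isinstance(node, str):
                  return [[node]]
              dominates = pos == target_pos[:len(pos)]
              if not dominates or pos == target_pos:
                  return [node.leaves()]
              segs = []
              for i, child in enumerate(node):
                  segs.extend(walk(child, pos + (i,)))
              return segs
          return walk(tree, ())

      t = Tree.fromstring("(S (NP (NNP He)) (VP (VBD ate) (NP (NN pasta))))")
      print(segment(t, (1, 0)))   # -> [['He'], ['ate'], ['pasta']]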
- Performance
- Using the scoring script from Senseval -
Restricted task (id+labeling): (P:.802,R:.654) label overlap 0.784;
Unrestricted task (labeling): (P:.867,R:.858) label overlap 0.866
- Using exact match scores for the test set -
Restricted task (id+labeling): (P:.711,R:.585);
Unrestricted task (labeling): (P:.867,R:.858)
- Best performing Senseval-3 system (UTDMorarescu) -
Restricted task (id+labeling): (P:.899,R:.772) label overlap 0.882;
Unrestricted task (labeling): (P:.946,R:.907) label overlap 0.946
- SENSEVAL-3 Task: Automatic Labeling of Semantic Roles,
Kenneth C. Litkowski, In proceedings of Senseval-3 (2004)
[PDF]
- Summarizes senseval-3 automatic semantic role labeling task and
results
- Used subset of FrameNet 1.1 database
- Complete FrameNet 1.1 database - 132,968 annotated sentences,
487 frames, 696 distinctly-named frame elements
- For this task: 8,002 annotated sentences
(in the evaluation set), 40 frames
- Frames chosen at random from those with at least 370
annotations
- Answers submitted as a plain text file with the results for one
annotated sentence per line. Each such line indicates the frame
of the sentence, the sentence's unique id number, and the semantic
roles present in the frame as well as the character positions at
which those semantic roles occur in the sentence
(e.g. "Motion.1087911 Theme(82,88) Path(0,0)"); a sketch of
parsing this format follows below.
- Null instantiations, i.e. semantic roles that are conceptually
present but not explicitly represented in the sentence, can be
indicated by a system via a semantic role with start and end
positions of 0.
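- A minimal sketch of reading one such answer line (format as given
in the example above):

      import re

      def parse_answer_line(line):
          head, *rest = line.split()
          frame, sent_id = head.split(".")
          roles = []
          for tok in rest:
              m = re.fullmatch(r"(.+)\((\d+),(\d+)\)", tok)
              # (0,0) marks a null instantiation
              roles.append((m.group(1), int(m.group(2)), int(m.group(3))))
          return frame, sent_id, roles

      print(parse_answer_line("Motion.1087911 Theme(82,88) Path(0,0)"))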
- Two variations on the task
- Restricted - The information the system can use during
evaluation is restricted to: the sentence to be labeled,
the identity (/position of the) target predicate, &
the lexical unit
- Unrestricted - During evaluation the system can use all of
the information in the FrameNet database except for the
identity of the frame element to be labeled
- Scoring (a sketch of these metrics follows this list)
- Precision & Recall of semantic roles returned by the
system - whereby such semantic roles must overlap with the
manually annotated semantic roles by at least one character
position
- Overlap - the degree to which correctly labeled semantic roles
returned by the system overlap with manually annotated semantic
roles, as measured by the fraction of overlapping characters
- Attempted - number of semantic roles returned by the system
divided by the number of manually annotated semantic roles.
- Null instantiations did not affect any of the scoring metrics
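- A minimal sketch of these metrics; spans are (role, start, end)
character positions, and the overlap denominator is assumed to be
the gold span length, which may differ from the official script:

      def senseval_score(system, gold):
          def inter(a, b):   # number of shared character positions
              return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)
          correct, fractions = 0, []
          for s in system:
              for g in gold:
                  if s[0] == g[0] and inter(s, g) > 0:
                      correct += 1
                      fractions.append(inter(s, g) / (g[2] - g[1] + 1))
                      break
          return {"precision": correct / len(system),
                  "recall": correct / len(gold),
                  "overlap": sum(fractions) / len(fractions) if fractions else 0.0,
                  "attempted": len(system) / len(gold)}

      print(senseval_score([("Theme", 82, 88), ("Path", 10, 20)],
                           [("Theme", 80, 88), ("Path", 30, 40)]))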
- Submissions: 20 systems by 8 teams
- Results:
- Restricted task
- Average - Prec: 0.595, Rec: 0.481
- Best (UTDMorarescu) - Prec: 0.899, Rec: 0.772, Over: 0.882,
Att: 85.9
- Unrestricted task
- Average - Prec: 0.803, Rec: 0.757
- Best (UTDMorarescu) - Prec: 0.946, Rec: 0.907, Over: 0.946,
Att: 95.8
- Semantic Parsing Based on FrameNet, Cosmin Adrian Bejan,
Alessandro Moschitti, Paul Morarescu, Gabriel Nicolae,
and Sanda Harabagiu, In Proceedings of Senseval-3 (2004)
[PDF]
- Best results for SENSEVAL-3 Automatic labeling of semantic roles
bakeoff
- SVM based system
- For the labeling task, a separate multi-class SVM based
classifier was trained for each frame
- Each multi-class classifier used for the labeling task was
implemented as a set of one vs. all (OVA) binary classifiers.
In the case of two binary classifiers attempting to assign a label
to the same frame element, they "select(ed) the classification
which was assigned the highest score by the SVM". Presumably
this means the output of the SVM in which the FE was
furthest away from the decision boundary (see the sketch below)
- While not entirely clear from the paper, I would guess that
separate binary classifiers, one for each frame, were used for
the identification task as well
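- A minimal sketch of that OVA tie-breaking on toy data; LinearSVC
stands in for the paper's SVM, and decision_function gives the
signed distance to the separating hyperplane:

      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.RandomState(0)
      X, y = rng.randn(200, 10), rng.randint(0, 3, 200)   # 3 toy roles

      # One binary classifier per role label.
      clfs = {r: LinearSVC(max_iter=5000).fit(X, (y == r).astype(int))
              for r in range(3)}

      def predict(x):
          # If several classifiers claim the example, keep the label whose
          # SVM scores it furthest on the positive side of its boundary.
          scores = {r: c.decision_function(x.reshape(1, -1))[0]
                    for r, c in clfs.items()}
          return max(scores, key=scores.get)

      print(predict(X[0]))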
- Features
- Drawn from Gildea & Jurafsky 2002, Surdeanu et al. 2003,
and Pradhan et al. 2004.
- Gildea & Jurafsky 2002: phrase type, parse tree path,
position, voice, head word, governing category,
target word
- Surdeanu et al. 2003: Content word, part of speech of
head word, part of speech of content word,
named entity class of content word, boolean named entity
flags
- Pradhan et al. 2004: Parse tree path w/o direction of
transitions, partial path, first word in constituent,
last word in constituent, part of speech of first word,
part of speech of last word, left constituent phrase type,
left constituent head, left constituent head POS,
right constituent phrase type, right constituent head,
right constituent head POS, pp preposition, tree distance
to target
- Introduced a ton of new features
- Human - true if phrase is a personal pronoun or
is a hyponym of PERSON sense 1 in WordNet
- Support verb - if the target is in a VP, then is set
to the POS of the VP. Otherwise, it's set to NULL.
- Target type - the target's lexical class
- List constituent FEs - a list of the phrase types of the
frame elements in a sentence
- Grammatical function - as given in the FrameNet
database for each labeled FE
- List grammatical function - a list of the grammatical
functions of the frame elements in a sentence
- Number FE - the number of frame elements in a
sentence
- Frame name - the name of the frame under which the
frame elements are to be labeled
- Coverage - Boolean feature that indicates whether
or not the FE is perfectly covered by a constituent in the
parse tree
- Coreness - whether the frame element being labeled
is core, peripheral, or extra-thematic
- Sub corpus - the subcorpus that contains the sentence
being labeled
- For the identification task, the features used are those from
Gildea & Jurafsky 2002, Surdeanu et al. 2003, and Pradhan et al. 2004,
as well as support verb, target type, frame name, and sub corpus
from the new feature set defined above
- Parse normalization heuristics introduced:
- For the unlabeled task, if there is no single constituent
that exactly matches the boundaries of the FE but there is
a sequence of constituents that are exactly contained within it,
then a new NP-merge node is introduced to join the smaller
constituents (a sketch follows these heuristics)
- If the target word is a noun or an adjective, consecutive nouns
within the same noun phrase of the target are combined into a
new larger NP
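- A minimal sketch of the NP-merge normalization over an nltk tree;
the span-matching logic is simplified to an already-identified
contiguous sequence of sister constituents, and the node label is
illustrative:

      from nltk.tree import Tree

      def np_merge(parent, start, end):
          # If children start..end of `parent` exactly cover an FE that
          # no single constituent matches, splice in a new merged node.
          merged = Tree("NP-MERGE", [parent[i] for i in range(start, end + 1)])
          parent[start:end + 1] = [merged]
          return parent

      t = Tree.fromstring("(VP (VBD ate) (NP (DT the) (NN pasta)) (ADVP (RB quickly)))")
      # Suppose the FE spans 'the pasta quickly': merge children 1..2.
      print(np_merge(t, 1, 2))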
- Results
- Unrestricted: (P:.945,R:0.906)
- Restricted: (P:.824,R:.711)
- Calibrating Features for Semantic Role Labeling, Nianwen Xue and
Martha Palmer, In proceedings of EMNLP 2004
[PDF]
- Theme - Prior research has not fully exploited all of the information
that is present in a parse tree and useful for semantic parsing. Further,
it is possible to only use parse tree based features, along
with log-linear classifiers rather than SVMs, in order to
obtain performance that is very close to more sophisticated systems.
- Dataset - Propbank (version released on 2/4/2004)
- Classifier - Maximum entropy. Given the much smaller amount of
training time required for maximum entropy relative to SVMs, it can
be used to much more rapidly explore which features are most
effective.
- Features - based on features given in Gildea & Jurafsky 2002,
with a couple of new features, and with the rest being carefully
engineered conjunctions of features (a sketch of such conjunctions
follows this list):
- Identification
- Path
- Head
- Head part of speech
- Phrase type /\ predicate
- Head /\ Predicate
- Predicate /\ Distance from predicate
- New features (/feature combinations) for classification
- Syntactic frame - list of the FE phrase types being
labeled, with the target's position being indicated with
'target', and with flagging of the current FE
being labeled (e.g. np_v_NP_np, whereby NP corresponds
to the FE currently being labeled)
- Head of PP parent - if the constituent is immediately
embedded in a PP, then the head of that PP.
- Lexicalized constituent type - phrase type /\ predicate
- Lexicalized head word - head word /\ predicate
- Voice position combination - position /\ voice
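- A minimal sketch of such conjunction ('/\') features feeding a
max-ent (logistic regression) model; the feature names and toy
values are illustrative:

      from sklearn.feature_extraction import DictVectorizer
      from sklearn.linear_model import LogisticRegression

      def featurize(inst):
          # A conjoined feature is just the concatenation of two values.
          return {"path": inst["path"],
                  "head": inst["head"],
                  "ptype^pred": inst["ptype"] + "^" + inst["pred"],
                  "head^pred": inst["head"] + "^" + inst["pred"]}

      train = [{"path": "NP↑S↓VP↓VBD", "head": "he", "ptype": "NP",
                "pred": "eat", "label": "ARG0"},
               {"path": "NP↑VP", "head": "pasta", "ptype": "NP",
                "pred": "eat", "label": "ARG1"}]

      vec = DictVectorizer()
      X = vec.fit_transform(featurize(i) for i in train)
      clf = LogisticRegression().fit(X, [i["label"] for i in train])
      print(clf.predict(vec.transform(featurize(train[0]))))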
- Introduces a novel way of filtering arguments to be labeled
(sketched below). That is, starting
with the predicate, the system collects all of its immediate sisters
in the tree. If one of the sisters is a PP, then the system
collects all of the PP's immediate children as well. The process
then repeats for the parent of the predicate, and after that the
parent's parent. The process continues until the top of the tree is
reached. Using gold standard parses, 99.3% of the arguments are
captured. Using parses from the Collins parser, 88.9% of the
arguments are captured.
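- A minimal sketch of that filtering heuristic over an nltk tree;
pred_pos is assumed to be the position of the predicate's
preterminal:

      from nltk.tree import Tree

      def prune_candidates(tree, pred_pos):
          cands, pos = [], pred_pos
          while pos:                          # climb until the root
              parent = tree[pos[:-1]]
              for i, sister in enumerate(parent):
                  if pos[:-1] + (i,) == pos:  # skip the node we came from
                      continue
                  cands.append(sister)
                  # If a sister is a PP, also take its immediate children.
                  if isinstance(sister, Tree) and sister.label() == "PP":
                      cands.extend(list(sister))
              pos = pos[:-1]
          return cands

      t = Tree.fromstring("(S (NP (NNP He)) (VP (VBD ate) (NP (NN pasta)) (PP (IN with) (NP (NN joy)))))")
      for c in prune_candidates(t, (1, 0)):   # predicate = 'ate'
          print(c.label(), " ".join(c.leaves()))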
- Results
- Gold standard parses - Classification: 92.95%,
Identification & classification: 88.51 % (F-score)
- Automatic parses - Identification & classification:
76.21%
- Comparison made to Pradhan et al. 2004 (using gold standard
parses) -
Classification: 93.0%,
Identification & classification: 89.4% (F-score)
- Shallow Semantic Parsing using Support Vector Machines,
Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James H. Martin,
Dan Jurafsky, In Proceedings of HLT-NAACL 2004
[PDF]
- Theme - Using features from Gildea & Jurafsky 2002,
a couple of features from Surdeanu et al. 2003, and
a large number of novel features, the authors build what was
at the time of publication the best performing shallow
semantic parsing system. Experiments are also done to
assess the value of the various features used, explore the
use of HMM rescoring of a labeled sentence, and evaluate the
performance of the system on different versions of the PropBank
corpus and on a new corpus based on data from the AQUAINT project
- Dataset - PropBank July 2002 release, as well as a handful
of experiments with the PropBank Feb 2004 release.
- Classifier - SVM w/polynomial kernel degree 2 (Yamcha + TinySVM)
- Features:
- Drawn from Gildea & Jurafsky 2002 and Surdeanu et al. 2003
- Predicate (G&J)
- Path (G&J)
- Phrase type (G&J)
- Position (G&J)
- Voice (G&J)
- Head word (G&J)
- Sub-categorization - phrase structure rule used to expand
the predicate's parent node (listed as being from
Gildea & Jurafsky, although I don't remember this
feature being in that paper)
- Named entities in constituent (Surdeanu et al. 2003)
- Head word POS (Surdeanu et al. 2003)
- Novel features
- Verb clusterings - one of 64 clusters of verbs, whereby the
clusters were created using a collection of verb-direct-object
relationships collected with Minipar (Lin 1998)
- Verb sense information - as tagged in the PropBank corpus
(an "oracle" feature)
- Partial path - path from the constituent being labeled to the
constituent that is the lowest common ancestor of the constituent
being labeled and the predicate (a sketch follows this
feature list)
- Head word of prepositional phrases -
For prepositional phrases, the head of the first noun phrase
inside the prepositional phrase. The preposition is preserved
by attaching it to the phrase type (It sounds like this feature
replaces the traditional head feature for PP, rather than
just adding in another feature to the mix)
- First and Last Word/POS in Constituent
- Ordinal constituent position - linear distance,
in intervening words, to the predicate.
- Constituent tree distance - distance to the predicate as
measured by tree arcs
- Constituent relative features - Head, POS, and phrase type
of the constituent's parent as well as left & right siblings
- Temporal cue words - keywords that attempt to flag temporal
expressions often missed by the named entity tagger
- Dynamic class context - the predicted class of the previous
two arguments labeled
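- A minimal sketch of the partial path variant, using the same
position conventions as the path sketch earlier in these notes:

      from nltk.tree import Tree

      def partial_path_feature(tree, const_pos, pred_pos):
          # Longest shared position prefix = lowest common ancestor.
          i = 0
          while (i < min(len(const_pos), len(pred_pos))
                 and const_pos[i] == pred_pos[i]):
              i += 1
          # Only the upward half of the path, stopping at the LCA.
          ups = [tree[const_pos[:j]].label()
                 for j in range(len(const_pos), i - 1, -1)]
          return "↑".join(ups)

      t = Tree.fromstring("(S (NP (NNP He)) (VP (VBD ate) (NP (NN pasta))))")
      print(partial_path_feature(t, (0,), (1, 0)))   # -> NP↑S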
- HMM rescoring
- Uses Platt's algorithm to get probabilities from the SVMs
(a sketch of this conversion follows below)
- Trigram language model trained over core arguments
- Two variations - one with an identical representation
for all predicates
and one with predicates being represented by a specific lemma
- Baseline - (P:.900,R:.861,F:.880),
Shared predicate representation - (P:.908,R:.863,F:.885),
Specific predicate lemma - (P:.905,R:.874,F:.889)
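- A minimal sketch of the Platt conversion; A and B are illustrative
placeholders, since in practice they are fit on held-out data:

      import math

      def platt_probability(margin, A=-1.0, B=0.0):
          # Platt: P(y=1 | f) = 1 / (1 + exp(A*f + B)) for SVM margin f.
          return 1.0 / (1.0 + math.exp(A * margin + B))

      # A point far on the positive side gets a high probability.
      print(platt_probability(2.0))   # ~0.88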
- Best Performance
- July 2002 data, manual parses - (all args)
Classification: 91.0%, ID+Classification (P:.889,R:.846,F:.867);
(core args) Classification: 93.9%, ID+Classification
(P:.905,R:.874,F:.889)
- July 2002 data, automatic parses - (all args)
Classification: 90.0%, ID+Classification (P:.840,R:.753,F:.794)
- Feb 2004 data, manual parses - (all args)
Classification: 93.0%, ID+Classification (P:.899,R:.890,F:.894)
- AQUAINT data - (all args) Classification: 83.8%,
ID+Classification (P:.652,R:.615,F:.633)
Machine Translation
- Scaling Phrase-Based Statistical Machine Translation
to Larger Corpora and Longer Phrases, Chris Callison-Burch, Colin Bannard,
and Josh Schroeder, ACL 2005
[PDF]
- A Hierarchical Phrase-Based Model for Statistical Machine
Translation, David Chiang (ACL 2005 best paper award)
[PDF]