Summary by Ablaikhan Akhazhanov
The paper, an international collaboration between UPenn and CUNI, presents an original parsing approach that extends the conventional projective tree-based method to cover non-projective dependencies in text. The authors represent words and their relationships as the vertices and edges of a complete directed graph, respectively, and then employ the Chu-Liu-Edmonds algorithm to find the maximum spanning tree, which allows non-projective parses. Importantly, they demonstrate better accuracy for Czech and higher efficiency (O(n^2)) compared to Eisner's projective parsing (O(n^3)). The generality and unexpectedly low asymptotic complexity of the proposed approach attracted numerous researchers and earned the best student paper award at the HLT/EMNLP 2005 conference.
Conventionally, finding the maximum-scoring dependency tree corresponds to finding a maximum spanning tree (MST) in a directed graph, which can be solved with the Chu-Liu-Edmonds algorithm in the non-projective case or Eisner's algorithm in the projective case. The authors define the score of a tree as s(x,y) = sum over (i,j) in y of w * f(i,j), where (i,j) is an edge in the dependency tree y for sentence x, f(i,j) is a feature vector, and w is the weight vector that needs to be learned. To extend the procedure to non-projective trees, the authors apply the greedy Chu-Liu-Edmonds algorithm: for each vertex, only the incoming edge with the highest score is kept. If the resulting graph contains cycles, each cycle is contracted into a single vertex, and the algorithm iterates until it converges to a tree, which is guaranteed to be the MST. Tarjan's efficient implementation of the algorithm runs in O(n^2), which dispels the notion that non-projective parsing is "harder" than projective parsing, which takes O(n^3). To optimize the weight vector w, the authors employ a factored version of the Margin Infused Relaxed Algorithm (MIRA), which iteratively updates w for each sample (x_t, y_t) by solving min ||w_(t+1) - w_t|| subject to s(l,j) - s(k,j) >= 1 for every edge (l,j) in y_t and (k,j) not in y_t.
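To make the greedy edge selection and cycle contraction concrete, here is a minimal sketch of Chu-Liu-Edmonds over a dense score matrix. This is not the authors' implementation (they use Tarjan's O(n^2) variant); it is a naive recursive version, and `score[h, m]` merely stands in for the edge score w * f(h, m). All names and the toy numbers are illustrative.

```python
import numpy as np

def _find_cycle(head):
    """Return a list of nodes forming a cycle in the head map, or None."""
    for start in head:
        seen, node = set(), start
        while node in head and node not in seen:
            seen.add(node)
            node = head[node]
        if node == start:                      # walked back to where we began
            cycle, node = [start], head[start]
            while node != start:
                cycle.append(node)
                node = head[node]
            return cycle
    return None

def chu_liu_edmonds(score, root=0):
    """Maximum spanning arborescence of a dense directed graph.

    score[h, m] is the score of the arc head h -> modifier m
    (standing in for w * f(h, m)).  Returns {modifier: head}.
    """
    n = score.shape[0]
    # Greedy step: every non-root node keeps its highest-scoring incoming arc.
    head = {m: max((h for h in range(n) if h != m), key=lambda h: score[h, m])
            for m in range(n) if m != root}

    cycle = _find_cycle(head)
    if cycle is None:
        return head                            # already a tree, hence the MST

    cycle_set = set(cycle)
    cycle_score = sum(score[head[m], m] for m in cycle)

    # Contract the cycle into a single super-node c and rescore its arcs.
    outside = [v for v in range(n) if v not in cycle_set]
    idx = {v: i for i, v in enumerate(outside)}
    c = len(outside)
    new_score = np.full((c + 1, c + 1), -np.inf)
    enter_arc, leave_arc = {}, {}              # remember the original arcs

    for u in outside:
        for v in outside:
            if u != v:
                new_score[idx[u], idx[v]] = score[u, v]
        # Arc u -> cycle: entering at m replaces the cycle arc into m.
        best_m = max(cycle, key=lambda m: score[u, m] - score[head[m], m])
        new_score[idx[u], c] = (cycle_score + score[u, best_m]
                                - score[head[best_m], best_m])
        enter_arc[idx[u]] = (u, best_m)
        # Arc cycle -> u: take the best head inside the cycle.
        best_h = max(cycle, key=lambda h: score[h, u])
        new_score[c, idx[u]] = score[best_h, u]
        leave_arc[idx[u]] = (best_h, u)

    contracted = chu_liu_edmonds(new_score, root=idx[root])

    # Expand the contracted solution back to the original graph.
    result = {}
    for m_new, h_new in contracted.items():
        if m_new == c:                         # head of the super-node: break the cycle
            h, m = enter_arc[h_new]
            result.update({v: head[v] for v in cycle if v != m})
            result[m] = h
        elif h_new == c:                       # arc leaving the cycle
            h, v = leave_arc[m_new]
            result[v] = h
        else:                                  # arc entirely outside the cycle
            result[outside[m_new]] = outside[h_new]
    return result

# Toy score matrix for a 3-word sentence; node 0 is an artificial root.
s = np.array([[-np.inf,     9.0,    10.0,     9.0],
              [-np.inf, -np.inf,    20.0,     3.0],
              [-np.inf,    30.0, -np.inf,    30.0],
              [-np.inf,    11.0,     0.0, -np.inf]])
print(chu_liu_edmonds(s))  # {3: 2, 1: 2, 2: 0}: node 2 is headed by the root and heads 1 and 3
```

The greedy pass here picks a 1-2 cycle, which the contraction step resolves by swapping in the root arc, illustrating why the procedure must recurse rather than stop at the first greedy choice.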
To evaluate the proposed approach, the paper reports dependency parsing results for Czech on the Prague Dependency Treebank, which contains both projective and non-projective trees. The method achieves state-of-the-art performance on this dataset and promises broad applicability to numerous other languages at low computational cost. However, the paper only vaguely defines the high-dimensional feature vectors f(i,j), which play a crucial role in optimizing the weight vector w. Moreover, the lower accuracy and complete-tree rate of the proposed method on English call for a closer look at the optimization procedure. Finally, as the authors note, non-projective parsing does not generalize well to larger substructures, since the search becomes intractable.