Semi-Supervised Web Wrapper Repair via Recursive Tree Matching
Cohen, Joseph Paul
and
Ding, Wei
and
Bagherjeiran, Abraham
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords:
dblp
This idea is so badass! It takes Simple Tree Matching \cite{journals/spe/Yang91}, extends it to work with HTML, and then recursively searches an unseen document to align it with previously seen examples. The left side of the figure below gives an overview of the *shift* problem, and the right side shows the alignment.
http://i.imgur.com/b8EzP42.png
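
To make the matching step concrete, here is a minimal sketch of classic Simple Tree Matching plus a naive recursive search for the best-aligned subtree in an unseen document. This is not the paper's implementation: the `Node` class, the `best_match` helper, and the choice to compare nodes by tag name only are assumptions made for illustration, and the paper's HTML-specific extensions are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name and ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the largest order-preserving match between trees a and b."""
    # Assumption: two nodes match when their tag names are equal.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j] = best score using the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i][j - 1],
                table[i - 1][j],
                table[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    # +1 counts the matched roots themselves.
    return table[m][n] + 1


def best_match(example: Node, document: Node) -> Tuple[int, Node]:
    """Recursively visit every subtree of document and return the best
    (score, subtree) pair against the example. Ties keep the first found."""
    best = (simple_tree_matching(example, document), document)
    for child in document.children:
        candidate = best_match(example, child)
        if candidate[0] > best[0]:
            best = candidate
    return best


# Usage: a previously seen example and a "shifted" unseen document.
example = Node("div", [Node("span"), Node("a")])
target = Node("div", [Node("span"), Node("a"), Node("p")])
document = Node("html", [Node("body", [Node("div", [Node("p")]), target])])
score, node = best_match(example, document)
print(score, node is target)  # prints: 3 True
```

The search picks the second `div` even though it gained an extra `p` child, which is the kind of tolerance to structural change that makes tree matching attractive for repairing wrappers. Scoring every subtree like this is quadratic-ish and unoptimized; it is only meant to show the shape of the idea.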