Zhengjie Fan, Jérôme Euzenat, François Scharffe, Learning concise pattern for interlinking with extended version space, in: Dominik Ślęzak, Hung Son Nguyen, Marek Reformat, Eugene Santos (eds), Proc. 13th IEEE/WIC/ACM international conference on web intelligence (WI), Warsaw (PL), pp. 70-77, 2014
Many data sets on the web contain analogous data that represent the same real-world resources, so interlinking these data sets helps them share information. However, finding correct links is very challenging because there are many instances to compare. In this paper, we propose a method to interlink instances across different data sets. Its input consists of class correspondences, property correspondences, and a set of sample links that users assess as either "positive" or "negative". We apply a machine learning method, Version Space, to construct a classifier, called an interlinking pattern, that distinguishes correct links from incorrect ones between the two data sets. We extend the learning method so that it resolves the no-conjunctive-pattern problem, and we call the result Extended Version Space. Experiments confirm that, with only 1% of sample links, our method already reaches a high F-measure (around 0.96-0.99). The F-measure converges quickly and is nearly 10% higher than that of comparable approaches.
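To make the idea of a conjunctive interlinking pattern and the no-conjunctive-pattern problem more concrete, here is a minimal sketch. It is not the authors' Extended Version Space algorithm. It rests on several assumptions that do not come from the abstract. Each sample link is reduced to one boolean per property correspondence, indicating whether the two instances' values agree. Only the most specific boundary S of the version space over conjunctions is computed, in the style of Find-S. The "extension" is a hypothetical greedy disjunction of conjunctions. All function names (`covers`, `most_specific`, `learn_conjunctive`, `learn_disjunctive`, `classify`) are illustrative.

```python
from typing import FrozenSet, List, Optional, Sequence

# Assumption: one flag per property correspondence, True if the two values agree.
Link = Sequence[bool]
# A conjunctive pattern: indices of correspondences that must all agree.
Pattern = FrozenSet[int]


def covers(pattern: Pattern, link: Link) -> bool:
    """A conjunctive pattern accepts a link if every required correspondence agrees."""
    return all(link[i] for i in pattern)


def most_specific(links: List[Link]) -> Pattern:
    """Most specific conjunction covering all links (assumes at least one link)."""
    return frozenset(i for i in range(len(links[0])) if all(l[i] for l in links))


def learn_conjunctive(pos: List[Link], neg: List[Link]) -> Optional[Pattern]:
    """Return the S boundary if it rejects every negative link, else None."""
    s = most_specific(pos)
    # Any more general conjunction covers a superset of what S covers,
    # so if S accepts a negative link, no consistent conjunction exists.
    if any(covers(s, n) for n in neg):
        return None
    return s


def learn_disjunctive(pos: List[Link], neg: List[Link]) -> List[Pattern]:
    """Hypothetical greedy fallback: a disjunction of consistent conjunctions."""
    groups: List[List[Link]] = []
    for p in pos:
        for g in groups:
            candidate = most_specific(g + [p])
            if not any(covers(candidate, n) for n in neg):
                g.append(p)  # p fits an existing group without accepting a negative
                break
        else:
            if not any(covers(most_specific([p]), n) for n in neg):
                groups.append([p])  # start a new conjunctive pattern for p
            # Otherwise no conjunction separates p from some negative; p stays uncovered.
    return [most_specific(g) for g in groups]


def classify(patterns: List[Pattern], link: Link) -> bool:
    """A link is predicted correct if any of the learned patterns accepts it."""
    return any(covers(p, link) for p in patterns)
```

For example, with hypothetical correspondences (label, date, place), positives `(True, True, False)` and `(True, False, True)`, and negative `(True, False, False)`, the only conjunction shared by both positives is "label agrees". That pattern also accepts the negative, so `learn_conjunctive` returns `None`. The greedy fallback instead yields the two patterns `{label, date}` and `{label, place}`. Computing only S keeps the sketch small. A full version space would also maintain the general boundary G.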