An Author Verification Approach Based on Differential Features
Alberto Bartoli, Alex Dagri, Andrea De Lorenzo, , Fabiano Tarlao
6th Uncovering Plagiarism, Authorship and Social Software Misuse at the Conference and Labs of the Evaluation Forum (PAN-CLEF), held in Toulouse (France)
Winner of the challenge for the Spanish language
We describe the approach that we submitted to the 2015 PAN competition for the author identification task. The task consists of determining whether an unknown document was written by the same author as a set of documents known to share a single author. We propose a machine learning approach based on a number of features that characterize documents from widely different points of view. We construct non-overlapping groups of homogeneous features, use a random forest regressor for each feature group, and combine the outputs of all regressors by their arithmetic mean. We train a different regressor for each language. Our approach achieved first place in the final ranking for the Spanish language.
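The combination scheme described above (one random forest regressor per non-overlapping feature group, outputs averaged, one model per language) can be sketched with scikit-learn's `RandomForestRegressor`. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `GroupedForestVerifier`, the group names, the hyperparameters, and the synthetic data are all hypothetical. Each verification problem is assumed to be already encoded as one numeric feature vector (feature extraction is not shown), with label 1.0 for "same author" and 0.0 otherwise, so the averaged output lies in [0, 1].

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class GroupedForestVerifier:
    """Hypothetical sketch: one random forest regressor per feature group,
    with the group outputs combined by their arithmetic mean."""

    def __init__(self, feature_groups, n_estimators=50, random_state=0):
        # feature_groups: dict mapping a group name to a list of column indices.
        # The approach requires non-overlapping groups, so reject shared columns.
        all_cols = [c for cols in feature_groups.values() for c in cols]
        if len(all_cols) != len(set(all_cols)):
            raise ValueError("feature groups must not overlap")
        self.feature_groups = feature_groups
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.models = {}

    def fit(self, X, y):
        # X: (n_problems, n_features); y: 1.0 = same author, 0.0 = different (assumed).
        for name, cols in self.feature_groups.items():
            model = RandomForestRegressor(
                n_estimators=self.n_estimators, random_state=self.random_state
            )
            model.fit(X[:, cols], y)  # each regressor sees only its own group
            self.models[name] = model
        return self

    def predict(self, X):
        # Arithmetic mean of the per-group regressor outputs.
        preds = [m.predict(X[:, self.feature_groups[n]]) for n, m in self.models.items()]
        return np.mean(preds, axis=0)


# Usage on synthetic data; real features and labels would come from the PAN corpora.
rng = np.random.default_rng(0)
X = rng.random((40, 6))
y = (X[:, 0] > 0.5).astype(float)
groups = {"lexical": [0, 1, 2], "syntactic": [3, 4, 5]}  # hypothetical grouping

# One independently trained verifier per language (same toy data here for brevity).
verifiers = {lang: GroupedForestVerifier(groups).fit(X, y) for lang in ("english", "spanish")}
scores = verifiers["spanish"].predict(X[:5])
```

Averaging per-group outputs keeps each regressor focused on a homogeneous view of the documents, so no single large feature group can dominate the others inside one forest; training a separate verifier per language lets each model adapt to language-specific feature distributions.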