Learning from positive and unlabeled examples by enforcing statistical significance

Geurts, Pierre

Paper published in a journal (Scientific congresses and symposiums)

2011 • In Proceedings of Machine Learning Research, 15

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/87877

Files

Full Text

geurts11a-1.pdf

Publisher postprint (249.85 kB)

Details

Keywords :

machine learning; kernel methods; semi-supervised learning; bioinformatics; gene regulatory networks

Abstract :

Given a finite but large set of objects described by a vector of features, only a small subset of which have been labeled as ‘positive’ with respect to a class of interest, we consider the problem of characterizing the positive class. We formalize this as the problem of learning a feature-based score function that minimizes the p-value of a nonparametric statistical hypothesis test. For linear score functions over the original feature space or over one of its kernelized versions, we provide a solution to this problem computed by a one-class SVM applied to a surrogate dataset obtained by sampling subsets of the overall set of objects and representing them by their average feature vector shifted by the average feature vector of the original sample of positive examples. We carry out experiments with this method on the prediction of targets of transcription factors in two different organisms, E. coli and S. cerevisiae. Our method extends enrichment analysis commonly carried out in bioinformatics, and its results outperform common solutions to this problem.
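
The surrogate-dataset construction described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that random subsets have the same size as the positive sample, that the shift is "positive mean minus subset mean", and that the p-value of a linear score function w is estimated as the fraction of random subsets whose mean score reaches that of the positives (with the usual +1 Monte Carlo correction). The function names are hypothetical, and the one-class SVM step that would choose w is left out.

```python
import random


def mean_vector(rows):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def surrogate_dataset(features, positive_idx, n_samples, rng):
    """Build surrogate points from random subsets of all objects.

    Each subset has the size of the positive sample (an assumption) and is
    represented by its mean feature vector shifted by the positive mean.
    """
    k = len(positive_idx)
    pos_mean = mean_vector([features[i] for i in positive_idx])
    surrogate = []
    for _ in range(n_samples):
        subset = rng.sample(range(len(features)), k)
        sub_mean = mean_vector([features[i] for i in subset])
        # Assumed sign convention: positive mean minus subset mean.
        surrogate.append([p - s for p, s in zip(pos_mean, sub_mean)])
    return surrogate


def empirical_p_value(w, surrogate):
    """Estimate the p-value of the linear score x -> w.x.

    A surrogate point d with w.d <= 0 corresponds to a random subset whose
    mean score is at least the mean score of the positive sample.
    """
    hits = sum(
        1 for d in surrogate
        if sum(wi * di for wi, di in zip(w, d)) <= 0
    )
    return (hits + 1) / (len(surrogate) + 1)


# Toy data (hypothetical): 200 objects, the first 10 are the labeled positives
# and are the only ones with a non-zero first feature.
features = [[1.0 if i < 10 else 0.0, (i % 7) / 7.0] for i in range(200)]
positives = list(range(10))
points = surrogate_dataset(features, positives, 99, random.Random(0))
p_informative = empirical_p_value([1.0, 0.0], points)
p_reversed = empirical_p_value([-1.0, 0.0], points)
```

Under these assumptions, a score direction w that places every surrogate point on the positive side (w.d > 0) reaches the smallest p-value the sample allows, which is the kind of separation from the origin that a one-class SVM looks for.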

Research center :

Systems and modeling

Disciplines :

Computer science

Author, co-author :

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Learning from positive and unlabeled examples by enforcing statistical significance

Publication date :

April 2011

Event name :

Fourteenth International Conference on Artificial Intelligence and Statistics

Event organizer :

Geoffrey Gordon, David Dunson, and Miroslav Dudík

Event place :

Miami, United States

Event date :

April 11-13, 2011

Audience :

International

Journal title :

Proceedings of Machine Learning Research

eISSN :

2640-3498

Publisher :

Microtome Publishing, Brookline, United States - Massachusetts

Special issue title :

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics

Volume :

15

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 28 March 2011
