Support vector machines (SVMs) are a growing class of computational algorithms for solving classification problems using supervised learning. There are many freely available SVM codes with implementations of varying granularity in the core optimization task. The performance of an SVM implementation is governed by four key elements: (1) the size and dimension of the dataset on which it operates, (2) the granularity of its core SVM optimization implementation, (3) the kernel transformation applied to the data, and (4) the underlying hardware on which the implementation runs. To assess the performance of different SVM implementations, several freely available codes representing the spectrum of optimization granularity were built on a variety of hardware: two Linux clusters and a shared-memory machine. Binary classifiers were trained on datasets of varying size, containing anywhere from a few hundred to over 50,000 vectors. Performance of the methods was measured in terms of wall-clock training time, statistical quality of the resulting classifier, hardware performance (memory footprint, memory bandwidth, floating-point unit usage, I/O bandwidth, and instruction stalls), robustness, and portability.
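As a concrete illustration of the kind of measurement the abstract describes, the minimal sketch below trains a binary classifier with an RBF kernel and records wall-clock training time and test accuracy. It uses scikit-learn's SVC (whose core optimizer is LIBSVM, one widely used freely available SVM code); the synthetic dataset, parameter choices, and library are illustrative assumptions, not the chapter's actual codes or data.

```python
# Illustrative sketch only: timing SVM training and measuring classifier
# quality, in the spirit of the benchmarks described above.
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical binary classification problem; the chapter's datasets
# ranged from a few hundred to over 50,000 vectors.
X, y = make_classification(n_samples=10_000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel: one example of a kernel transformation applied to the data.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")

start = time.perf_counter()          # wall-clock training time
clf.fit(X_train, y_train)
elapsed = time.perf_counter() - start

print(f"training time: {elapsed:.1f} s")
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")  # statistical quality
```

Hardware-level metrics such as memory bandwidth and instruction stalls are not visible at this level; gathering them requires running the trainer under a profiling tool appropriate to the platform.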
Revised: July 22, 2010
Published: January 1, 2010
Citation
Oehmen, C.S., and B.M. Webb-Robertson. 2010. Evaluating the Computational Requirements of Using SVM Software to Train Data-Intensive Problems. In Machine Learning Research Progress. Hauppauge, New York: Nova Science Publishers. PNNL-SA-59493.