Hydraulic properties related to river flow affect salmon spawning habitat. Accurate prediction of salmon spawning habitat and an understanding of the properties that influence spawning behavior are of great interest for hydroelectric dam management. Previous research predicted salmon spawning habitat by deriving river-specific spawning suitability indices and applying a function estimation method such as logistic regression to several static river-flow-related properties, with some success. The objective of this study was twofold. First, dynamic river flow properties associated with upstream dam operation were successfully derived from a very large set of time series of both water velocity and water depth for about one fifth of a million habitat cells through principal component analysis (PCA) using nonlinear iterative partial least squares (NIPALS). Including the dynamic variables in the models greatly improved model prediction. Second, nine machine learning methods were applied to the data, and decision tree and rule induction methods were found to generally outperform the commonly used logistic regression. Specifically, random forest, an advanced decision tree algorithm, provided consistently better results. The over-prediction problem seen in previous studies was greatly alleviated.
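As a rough illustration of the PCA step mentioned above, the following sketch implements a textbook NIPALS iteration in plain Python. It is not the authors' code: the function name `nipals_pca`, the tolerance and iteration limit, and the synthetic "velocity" series (50 hypothetical habitat cells, 12 time steps each) are all assumptions made for illustration.

```python
import math
import random


def nipals_pca(rows, n_components=2, tol=1e-10, max_iter=500):
    """Return (scores, loadings) for the leading principal components.

    rows: list of equal-length lists (one row per cell, one column per time step).
    """
    n, m = len(rows), len(rows[0])
    # Center each column so the components describe variation around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest remaining sum of squares.
        col = max(range(m), key=lambda j: sum(x[j] ** 2 for x in X))
        t = [x[col] for x in X]
        if sum(v * v for v in t) == 0:
            raise ValueError("no remaining variance to extract")
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading vector: project the columns of X onto the current scores.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Updated scores: project the rows of X onto the loading vector.
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        scores.append(t)
        loadings.append(p)
        # Deflate: remove the extracted component before finding the next one.
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    return scores, loadings


# Synthetic example (assumed data): each cell follows a shared cyclic
# pattern with its own amplitude, plus a little noise.
random.seed(0)
cells = []
for _ in range(50):
    amplitude = random.uniform(0.5, 2.0)
    cells.append([amplitude * math.sin(2 * math.pi * k / 12) + random.gauss(0, 0.05)
                  for k in range(12)])

scores, loadings = nipals_pca(cells, n_components=2)
```

Each entry of `scores[0]` could then serve as one dynamic variable per cell in a downstream classifier. NIPALS extracts components one at a time, which is why it is often chosen when only a few leading components of a very large matrix are needed.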
Revised: February 10, 2009
Published: July 1, 2008
Citation
Xie Y., C.J. Murray, T.P. Hanrahan, and D.R. Geist. 2008. Data Mining on Large Data Set for Predicting Salmon Spawning Habitat. In Proceedings of The 2008 International Conference on Data Mining (DMIN'08), edited by R. Stahlbock, S. F. Crone and S. Lessmann, 1, 233-239. Athens, Nevada: CSREA Press. PNNL-SA-59284.