On the Detection and Selection of Informative Subsequences from Large Historical Data Records for Linear System Identification
by David Leonardo Arengas Rojas
Performing experiments for system identification on continuously operated plants may be restricted because they can negatively affect normal production. In such cases, historical logged data can become an attractive alternative for system identification. However, because operating points are rarely changed, parameter estimation methods can suffer from numerical problems.
Current approaches in this research area have three main drawbacks. First, detection tests are not adapted to dynamical systems. Second, methods for defining upper interval bounds are not robust to colored noise, which is more likely to be found in real applications. Third, model estimation with the retrieved data is not supported and the performance of the method cannot be assessed. In the method proposed in this work, called data selection for system identification (DS4SID), these drawbacks are addressed and robust tests are designed and implemented. The performance of DS4SID is evaluated on a simulated and a laboratory multivariate process. A process unit of the lab-scale factory “μPlant” is used as an industry-oriented case study. Models estimated with the selected data are shown to perform similarly to models estimated with the entire data set.






