Abstract

Time series forecasting plays an important role in many fields, including energy, financial markets, and retail. In the past, statistical models were used for time series forecasting. However, as models became increasingly complex, time series also became more complex, and statistical models were no longer able to cope. The emergence of deep learning models has therefore enabled the field to take a big step forward. Traditional point forecasting methods provide only a single predicted value, which gives users and companies limited information for decision-making. Probabilistic time series forecasting, which has now matured, instead provides probability distributions over future events, giving users and companies more information and enabling them to make better decisions. In recent years, with the advent of the big data era, modeling each time series with its own local model has become impractical. There has therefore been a shift towards global models, in which a single model is built for all time series. With global models, however, it has been found that including unrelated time series degrades model performance. We therefore reviewed previous literature on time series correlation and found that it has no clear definition: some works treat series of the same data type as correlated, while others define correlation using different metrics and clustering algorithms. In this thesis, we compare different definitions of correlation to determine which one yields the highest degree of correlation.

This research uses data on stocks listed in Taiwan, restricted to the opening price and trading volume. We then use predictability indicators to select stocks with predictive value as the subjects of the correlation analysis. To define time series correlation, we used shape-based, model-based, and industry-based approaches with the k-means algorithm, and selected the state-of-the-art STRIPE model in the time series field as our experimental model. Finally, based on the model's performance, we determine which definition yields the highest degree of correlation. In conclusion, feature-based analysis yields the highest degree of correlation, and we found that feature extraction provides three advantages: 1. It effectively reduces redundant features, which reduces overfitting. 2. It minimizes data variability within the same category and enhances variability between different categories. 3. It reduces the complexity of the data, making the model easier to interpret.
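
As a rough illustration of what feature-based grouping with k-means can look like, the sketch below summarizes each series by a small feature vector and clusters the standardized vectors. It is a minimal sketch under stated assumptions, not the procedure used in this thesis: the four features (mean, standard deviation, mean first difference, lag-1 autocorrelation), the farthest-point initialization, and the function names `extract_features`, `kmeans`, and `cluster_series` are all hypothetical choices made for the example.

```python
import numpy as np


def extract_features(series):
    """Summarize one series as a 4-dimensional vector (illustrative features only)."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    # Lag-1 autocorrelation; defined as 0 for a constant series.
    acf1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
    return np.array([x.mean(), x.std(), np.diff(x).mean(), acf1])


def kmeans(X, k, n_iter=100):
    """Plain k-means with a deterministic farthest-point initialization (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def cluster_series(series_list, k):
    """Extract features, standardize them column-wise, and cluster with k-means."""
    features = np.vstack([extract_features(s) for s in series_list])
    scale = features.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant feature columns
    standardized = (features - features.mean(axis=0)) / scale
    labels, _ = kmeans(standardized, k)
    return labels
```

Under this kind of scheme, the series that share a cluster label would be the ones pooled together when training a global model. Standardizing the features first keeps a large-scale feature such as the mean from dominating the Euclidean distance.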

Introduction