Assessment of Surface Water Quality by Using Multivariate Statistical Analysis Techniques: A Case Study of Nhue River, Vietnam

The aim of this study is to assess the spatial variability of surface water quality in the Nhue River, Viet Nam, and to determine its main contamination sources using multivariate statistical analysis techniques, including principal component analysis (PCA) and cluster analysis (CA). Eight water quality parameters were measured at 21 sites along the Nhue River and its tributaries during irrigated periods from 2016 to 2019. The spatial variability of water quality in the Nhue River and in its tributaries was determined separately by cluster analysis. The results identified two tributaries, Yen Xa Canal (NT9 monitoring site) and To Lich River (NT3 monitoring site), as the cause of severe pollution in the To Bridge (N4 monitoring site) region of the Nhue River. PCA reduced the data to two principal components that explained 45.75% of the total variance. The first PC indicated that water temperature (WT) and pH are the dominant polluting factors, which are attributed to craft villages, domestic sewage and industrial wastewater. The second PC was dominated by nitrate nitrogen (NO3−), which is related to fertilizer application on nearby farms. The results indicate that CA is a useful technique for assessing the spatial variability of water quality in a river with a number of tributaries.


I. INTRODUCTION
Rivers play an important role as a primary input for a huge array of human consumption and economic activities, including domestic and industrial water use, irrigation, agriculture, recreation and transport. Surface water quality is influenced by both natural processes and anthropogenic activities [1], such as the discharge of industrial and domestic wastewater, craft village effluents, urban runoff and agricultural drainage into rivers. At present, many river basins in Vietnam, such as the Cau, Nhue, Day and Dong Nai basins, are among the most polluted, threatening the health of surrounding residents and native ecosystems. Therefore, the government has been monitoring the water quality of these rivers for years in order to manage water resources efficiently. However, long-term monitoring datasets are large, with complex matrices comprising numerous physico-chemical parameters. Consequently, it is often difficult for planners to extract meaningful information from these datasets, identify significant parameters, and apportion pollution sources [2]. Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), have been widely accepted as valuable tools for: (1) analyzing and interpreting complex data matrices to better understand the water quality and ecological status of the studied systems; (2) verifying temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality; and (3) identifying the possible factors/sources that influence water systems [3]-[5]. Because multivariate statistical techniques can handle large volumes of spatial and temporal data from a variety of monitoring sites, they have become popular in recent years for better understanding water quality and ecological status [6].
In the present study, a large data matrix of 8 water quality parameters, collected from 2016 to 2019 at 21 monitoring sites in the Nhue River, Viet Nam, was subjected to PCA and CA to extract information from the water quality data. First, similarities and dissimilarities among the 10 sites along the Nhue River, and among the 11 sites on its tributaries, were classified by means of CA. Then, the complex water quality datasets were analyzed using PCA to extract latent water quality factors. Finally, the possible pollution sources affecting water quality were identified.

A. Study Area
The Nhue River takes its source from the Red River at the Lien Mac 1 Gate, north-west of Ha Noi (Fig. 1). The upstream part of the Nhue River, about 20 km long, lies mainly in the urban areas of Ha Noi city, while the rest of the river runs through areas where agriculture is the predominant land use [7]. Like other rivers in the Red River Delta of Viet Nam, the Nhue River flows south and southeast throughout its course without abrupt changes of direction or interruption. To prevent floods and to ensure irrigation during the crop season, the discharge in the river is entirely regulated by dams and sluice gates, such as Lien Mac 1, Nhat Tuu and Luong Co, which were constructed from 1936 to 1941 during the French colonial period. The river basin is bordered by the Red River to the north and east, the Day River to the west, and the Chau Giang River to the south. To the west, the Day River runs almost parallel to the Nhue River at a distance of 10 km. To the south, the Nhue River joins the Day River at Phu Ly city, 74 km from its source. To the east, the Nhue basin is effectively bounded by National Highway No. 1, which also runs parallel to the river. The Nhue River also has several significant inflows, such as the Dam, Cau Nga, Xuan La, Phu Do, Trung Van, La Khe, Yen Xa and To Lich. Among these, the To Lich River reportedly drains 77.5 km² of the Ha Noi area. The Duy Tien and Van Dinh Canals were built for irrigation and drainage of the surrounding paddy fields. The Nhue River is considered an inter-provincial hydro-agricultural system spanning different industrial and agricultural zones, and it is responsible for irrigating 81,710 ha. All along its course, the river feeds a dense network of canals that irrigate the paddy fields; these irrigation canals spread over the whole river basin and connect to the Nhue every 2 to 3 km.
At present, the Nhue River is seriously polluted by large amounts of untreated wastewater discharged directly into the river from domestic, hospital, industrial, agricultural and craft village sources. To deal with this situation, the Directorate of Water Resources (DWR) under the Ministry of Agriculture and Rural Development (MARD) has been monitoring water quality at 21 sites along the river (covering all surface water types, including the main river and its tributaries) to support decision-making and action planning for irrigation.

B. Monitoring Sites and Water Quality Parameters
In this study, datasets from 21 water quality monitoring sites on the Nhue River, collected during irrigated periods from 2016 to 2019, were used. Of these, 10 monitoring sites are on the main river and 11 are on its tributaries. The water quality parameters, namely water temperature (WT), pH, turbidity (Tur), dissolved oxygen (DO), total dissolved solids (TDS), ammonia nitrogen (NH4+), nitrate nitrogen (NO3−) and conductivity (EC), were selected and analyzed according to the American Public Health Association standard methods (APHA, 1998). All water quality parameters are expressed in mg/L, except temperature (°C), pH, and turbidity (NTU). The statistical summary of the water quality parameters monitored at the 21 sites is shown in Table II.

C. Data and Multivariate Statistical Methods
Descriptive statistics (mean) and multivariate statistical techniques, including cluster analysis (CA) and principal component analysis (PCA), were applied to the datasets using SPSS 25.0.
International Journal of Environmental Science and Development, Vol. 11, No. 10, October 2020
CA: CA is a well-known method that groups objects into clusters based on the similarity of their characteristics [8]. Hierarchical agglomerative clustering is the most common approach: it starts with each case in a separate cluster and joins clusters together step by step until only one cluster remains. The result is typically illustrated by a dendrogram (tree diagram) [9], which provides a visual summary of the clustering process, showing the groups and their proximity with a dramatic reduction in the dimensionality of the original data. The similarity between two samples is usually measured by the Euclidean distance, which is computed from the differences between the analytical values of the samples.
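For reference, the Euclidean distance mentioned above and the merging cost minimized by Ward's method (the linkage this study uses) can be written as below. The notation is ours rather than the source's: x_ik is the standardized value of parameter k at site i, p is the number of parameters (8 here), and c_A and n_A are the centroid and size of cluster A.

```latex
\begin{aligned}
d(\mathbf{x}_i, \mathbf{x}_j) &= \sqrt{\sum_{k=1}^{p} \left( x_{ik} - x_{jk} \right)^2} \\
\Delta(A, B) &= \frac{n_A \, n_B}{n_A + n_B} \, \lVert \mathbf{c}_A - \mathbf{c}_B \rVert^2
\end{aligned}
```

At each step, Ward's method merges the pair of clusters with the smallest Δ(A, B), that is, the merge that least increases the total within-cluster sum of squares.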
In this study, cluster analysis was applied to detect spatial similarity and to group the sites of the monitoring network. The analysis was performed on the standardized data using Ward's method, with Euclidean distance as the measure of similarity.
PCA: PCA is a pattern recognition technique that extracts information concisely by transforming many original, interrelated variables into a smaller set of uncorrelated variables called principal components (PCs). PCA retains the information carried by the most meaningful parameters, reducing the dimensionality of the original data with minimal loss of information. Reference [10] classified factor loadings as 'strong', 'moderate', and 'weak' for absolute loading values of >0.75, 0.75-0.50 and 0.50-0.30, respectively.
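The loading classification attributed to [10] can be expressed as a small helper. This is a minimal sketch: the function name is hypothetical, the treatment of the boundary values (exactly 0.75 and exactly 0.50 both counted as 'moderate') is our assumption because the stated ranges overlap, and the 'negligible' label for loadings below 0.30 is a hypothetical addition not named in the source.

```python
def classify_loading(loading: float) -> str:
    """Label a PCA factor loading by its absolute value.

    Thresholds follow the scheme attributed to reference [10]:
    strong > 0.75, moderate 0.50-0.75, weak 0.30-0.50.
    Assumptions: exactly 0.75 and exactly 0.50 count as 'moderate';
    'negligible' (below 0.30) is a hypothetical extra label.
    """
    magnitude = abs(loading)  # the sign of a loading does not affect its strength
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"


# Loadings reported in the text: WT and pH on PC1, NO3- on PC2.
reported = {"WT (PC1)": 0.971, "pH (PC1)": 0.935, "NO3- (PC2)": 0.968}
for name, value in reported.items():
    print(f"{name}: {value} -> {classify_loading(value)}")
```

Using the absolute value means a strongly negative loading is classed the same way as a strongly positive one, which matches how loading strength is usually interpreted.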
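Before performing CA and PCA, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were computed to examine the suitability of the data. A high KMO value (close to 1) generally indicates that PCA may be useful, while a significant Bartlett's test indicates that the correlation matrix differs from an identity matrix, i.e. that the variables are correlated.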
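Both suitability checks can be computed directly from the correlation matrix. The sketch below uses the standard formulas (Bartlett's statistic −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO ratio of squared correlations to squared correlations plus squared partial correlations). The function names and the randomly generated demo data are hypothetical; this is not the SPSS procedure the authors ran, only an illustration of what those statistics measure.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray) -> tuple[float, int]:
    """Bartlett's test of sphericity for an (n samples x p variables) array.

    Returns the chi-square statistic and its degrees of freedom; the test
    asks whether the correlation matrix differs from an identity matrix.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # ln|R|, numerically safer than log(det(R))
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return float(statistic), dof


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations (off-diagonal entries)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + a2))


# Hypothetical data: 60 samples of 8 parameters driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(60, 8))
stat, dof = bartlett_sphericity(X)
print(f"KMO = {kmo(X):.2f}, Bartlett chi-square = {stat:.1f} (df = {dof})")
```

A p-value for Bartlett's statistic can be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(stat, dof)`.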

A. Cluster Analysis
In this study, the spatial variability of water quality in the Nhue River and in its tributaries was determined separately by cluster analysis. The mean log-transformed values of each variable were calculated for every monitoring site, giving two data matrices with dimensions of 10 (sites) × 8 (parameters) for the main river and 11 (sites) × 8 (parameters) for the tributaries. For the main river, the resulting dendrogram (Fig. 2) grouped all ten sampling sites in the Nhue River into three statistically significant clusters. Cluster 1 includes Sites N2, N3, N5 and N7, located in the middle region of the Nhue River; Cluster 2 comprises Sites N1, N6, N8, N9 and N10, located in the upstream and downstream parts of the main river; and Cluster 3 consists only of Site N4. Site N4 at Cau To Bridge, placed in Cluster 3, corresponds to a relatively highly polluted (HP) region. At this section, the Nhue River receives wastewater from Yen Xa Canal and To Lich River through the Thanh Liet dam. The Thanh Liet dam is located on the downstream reach of the To Lich River, which receives all domestic wastewater from Ha Noi city.
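The site-level procedure described above (mean of log-transformed values per site, standardization, Ward's linkage on Euclidean distances, then cutting the dendrogram into three clusters) can be sketched with NumPy and SciPy. This is a minimal illustration under stated assumptions: the log base (log10), the z-score standardization, the helper names and the site IDs S1-S10 are all hypothetical, and the synthetic data are not the Nhue River measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def site_log_means(samples: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Average the log10-transformed measurements of each site.

    samples maps a site ID to an (observations x parameters) array of
    strictly positive values; returns the site IDs and a sites x parameters matrix.
    """
    sites = list(samples)
    matrix = np.vstack([np.log10(samples[s]).mean(axis=0) for s in sites])
    return sites, matrix


def ward_clusters(matrix: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster sites with Ward's linkage on Euclidean distances of z-scores."""
    z = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0, ddof=1)  # standardize each parameter
    tree = linkage(z, method="ward", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")  # cut the dendrogram into n_clusters groups


# Hypothetical data: 10 sites x 12 samplings x 8 parameters at three concentration levels.
rng = np.random.default_rng(1)
levels = {f"S{i}": 3.0 for i in range(1, 6)}
levels.update({f"S{i}": 10.0 for i in range(6, 10)})
levels["S10"] = 100.0  # a single hypothetical hotspot site
samples = {s: lvl * rng.lognormal(0.0, 0.1, size=(12, 8)) for s, lvl in levels.items()}

sites, matrix = site_log_means(samples)
labels = ward_clusters(matrix, n_clusters=3)
print(dict(zip(sites, labels.tolist())))
```

The hypothetical hotspot site ends up alone in its own cluster, analogous to the way Site N4 forms a single-site cluster in Fig. 2. Standardizing first keeps parameters with large numeric ranges from dominating the Euclidean distance.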
Applying cluster analysis separately to the monitoring sites on the main river and to those on the tributaries yields an important result: wastewater from the tributaries Yen Xa Canal (NT9) and To Lich River (NT3), which flows into the Nhue River at To Bridge (N4), causes severe pollution in this river region. Managers and decision-makers should therefore prioritize action plans to control pollution in these tributaries in order to reduce pollution in the middle section of the Nhue River.

B. Principal Component Analysis
PCA based on factor analysis was used to explore the possible sources of pollution and to identify the factors accounting for most of the variation in water quality. The Kaiser-Meyer-Olkin (KMO) and Bartlett's tests were performed on the parameter correlation matrix to examine the validity of the PCA. The KMO value was 0.68 and the Bartlett's test statistic was 667.499, indicating that PCA was useful for data reduction and that significant relationships were present among the variables. PCA was applied to the standardized dataset to identify the latent factors, with the primary aim of deriving a new, smaller set of factors from the original variables. The PCA revealed two PCs with eigenvalues > 1 that together explained about 45.75% of the total variance in the water quality datasets. Absolute loadings greater than 0.75 are considered "strong", 0.75-0.50 "moderate", and 0.50-0.30 "weak". Bold values indicate strong loadings (>0.70).
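The extraction step (PCA on standardized data, retaining components with eigenvalues > 1) can be sketched with NumPy as an eigendecomposition of the correlation matrix. This is a minimal, unrotated illustration under our own assumptions: the function name and demo data are hypothetical, SPSS may additionally apply a rotation that this sketch does not, and the signs of eigenvectors (and hence loadings) are arbitrary.

```python
import numpy as np


def pca_kaiser(data: np.ndarray):
    """Unrotated PCA on the correlation matrix, keeping components with eigenvalue > 1.

    Returns the retained eigenvalues, their percentage of total variance,
    and the loadings (variable-component correlations), shape p x k.
    """
    corr = np.corrcoef(data, rowvar=False)  # equivalent to PCA on z-scored data
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()  # eigenvalues of R sum to p
    keep = eigvals > 1.0  # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], explained[keep], loadings


# Hypothetical data: 8 parameters driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(60, 8))
eigvals, explained, loadings = pca_kaiser(X)
print("eigenvalues:", np.round(eigvals, 2))
print("% variance:", np.round(explained, 2), "cumulative:", round(float(explained.sum()), 2))
```

Scaling each eigenvector by the square root of its eigenvalue turns it into correlations between the original parameters and the component, which is the quantity compared against the strong/moderate/weak thresholds.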
The first PC accounted for 31.22% of the total variance and was positively correlated (loading > 0.70) with WT (r = 0.971) and pH (r = 0.935). The strong positive loading of WT is consistent with the fact that surface water in the Nhue River is influenced by anthropogenic activities such as domestic sewage and industrial wastewater from Ha Noi city. Moreover, the flow at the source of the Nhue River is regulated by the Lien Mac Gate. Discharge of municipal and industrial wastewater adds heavy metals to the river water, which also results in pH fluctuations [11]. The upstream and middle sections of the Nhue River host craft villages as well as mechanical instrument, paint, chemical and electronics industries.
PC2, accounting for 14.53% of the total variance, was correlated with NO3− (r = 0.968). NO3− originates from fertilizer application on farms, as agricultural land use strongly influences nitrate nitrogen levels. This is consistent with the cereal cultivation practiced by farmers around the area.
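As a quick consistency check on the reported figures, assume the percentages are the unrotated eigenvalue shares of PCA on the correlation matrix of the 8 standardized parameters, whose eigenvalues sum to p = 8. Each percentage can then be converted back to an eigenvalue; these back-calculated eigenvalues are ours and are not reported in the source.

```latex
\begin{aligned}
\lambda_1 &= 0.3122 \times 8 \approx 2.50, \qquad \lambda_2 = 0.1453 \times 8 \approx 1.16, \\
31.22\% + 14.53\% &= 45.75\% \text{ of the total variance.}
\end{aligned}
```

Both back-calculated eigenvalues exceed 1, consistent with the eigenvalue > 1 retention rule, and the two PCs together account for 45.75% of the variance.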
Overall, the analysis indicates that surface water in the Nhue River is severely polluted by agricultural sources, domestic sewage, craft villages and industries.

IV. CONCLUSION
In this case study, multivariate statistical techniques were used to evaluate spatial variation and to distinguish sources of pollution in the surface water of the Nhue River. Hierarchical cluster analysis, applied separately to the 10 monitoring sites on the main river and the 11 sites on its tributaries, provided useful information on the spatial variability of water quality. The results indicated that two tributaries, Yen Xa Canal (NT9) and To Lich River (NT3), cause severe pollution at To Bridge (N4) in the Nhue River. PCA revealed that parameters related to craft villages, industrial and agricultural activities, and domestic sewage discharge were the most important contributors to water quality variation in the Nhue River. These results can be applied to river management planning.
Nguyen Huu Thanh is a researcher at the Institute of Civil Engineering (ICE), Thuyloi University (TLU) in Hanoi, Vietnam. He has experience in environmental management systems, natural resource management, and natural hazard management. He is currently involved in related projects on increasing water self-purification capacity and on HEC-RAS and MIKE modelling.