Borehole and Near-Surface Greenhouse Gas Emission Monitoring System with Self-calibrating Algorithm and Zone-Based Data Analysis via Clustering Technique

Greenhouse gases are gases in the atmosphere that absorb energy within a given thermal range. These gases are also often found in solid waste management facilities, where they are produced by massive piles of garbage. This study was conducted to develop a system that can monitor greenhouse gas emissions in a Material Recovery Facility. The study was also directed at helping facilities comply with the environmental laws of the Philippines. The developed monitoring system is a low-cost, portable device capable of detecting the concentrations of gases commonly found in solid waste management facilities. The system is integrated with a web application that solid waste engineers, and even citizens, can use to monitor greenhouse gas levels and review potential actions. Self-calibration was performed to develop an algorithm, based on Linear Regression, that can be integrated into the device. Isolation Testing and Third-Party Testing were performed to ensure that the device's readings are accurate and reliable prior to Usability Testing. Test results indicate that emission readings from the prototype are within the acceptable and expected range based on standards. These results allowed the collection of data that were used with a Clustering Technique to understand the emission patterns and to provide a descriptive analysis of greenhouse gas emissions in two zones of a Material Recovery Facility.


I. INTRODUCTION
Greenhouse gases are often found in solid waste management facilities, where they are produced by massive piles of garbage. According to [1], the Philippines generates an estimated average of 50,000 metric tons of solid waste per day, of which only 35,000 metric tons are collected.
The most widely accepted option for disposing of solid waste is to dump it in a landfill. As the waste decomposes, it produces greenhouse gases such as methane, nitrous oxide, and carbon dioxide [2].
Republic Act (RA) 9003, also known as The Ecological Waste Management Act of 2000, was created to solve impending garbage problems in the Philippines by setting standards, protocols, and rules [3]. This law aims to prevent accidents and ensure the protection of public health and the environment. The law requires both public and private solid waste facility operators to create or adopt an ecological waste management program.
There were 991 disposal facilities in the Philippines in 2013. Of this total, only 45 were operated as sanitary landfills; that is, only about 4% of the disposal facilities are regulated while the other 96% remain uncontrolled. In uncontrolled solid waste management facilities, the waste being dumped is not monitored, so the facilities emit different kinds of gases that endanger the environment [4].
This study identified a closed landfill that had been converted into a Material Recovery Facility (MRF). According to the EPA, facilities like this still release greenhouse gases through vents and small cracks in the soil. This can be hazardous when not monitored, as it may cause fires and explosions because methane, one of the greenhouse gases, is flammable [5]. Currently, the MRF is used to recover methane from waste and for plastic-to-fuel conversion. It has boreholes that vent gas from the ground.
The general objective of this study is to elevate the process of monitoring greenhouse gases in a solid waste management facility (in the form of an MRF) in order to help provide preventive measures against possible explosions and health risks among the entities within the perimeter of the facility's affected area, while complying with the Solid Waste Management Act. Specifically, the study aimed to: 1) develop a greenhouse gas sensory system using IoT technology that will help in borehole well and near-surface detection of greenhouse gas emissions in solid waste management facilities; 2) provide an IoT-based device that will collect and monitor greenhouse gas emissions in real time in multiple zones of the facility, so that the data can be further used for analysis and description of gas emission patterns; and 3) develop a web-based application that can intelligently present real-time data on the greenhouse gas emissions produced by the solid waste facility, together with precautionary measures, to the engineers, officers, and citizens.

II. LITERATURE REVIEW
Experts have created different ways to measure gas quality, specifically for landfill areas and solid waste management facilities. A study on soil gas monitoring measured the concentration of chemicals in the vapor space of soils [6]. Another study used active sampling, in which a volume of gas is passed through a filter or chemical solution over an allotted time, and then either the solution or the filter is analyzed in the laboratory to identify events [7]. However, continuous and remote monitoring, which uses a device to analyze concentration over a short period of time in a specific area, is a suggested practice in solid waste facilities. Such measurement requires frequent repetition to ensure accurate readings, which suggests sampling at least twice daily in order to capture the dynamics of gas generation and migration. A study on monitoring industrial gas emissions used the same principle of continuous and remote monitoring by taking gas readings every 5 minutes and averaging them every hour [8]. That study measured gases in an industrial pipe that are also common greenhouse gases. For this study, however, near-surface and borehole well monitoring are considered, since most gases emanate from the ground and from the orifices of bore pipes in the soil of the facility.
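The 5-minute sampling with hourly averaging mentioned for [8] can be illustrated with a minimal Python sketch. The function name `hourly_means` and the sample values are hypothetical and are not taken from [8] or from this study.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(readings):
    """Average (timestamp, ppm) readings per clock hour.

    `readings` is an iterable of (datetime, float) pairs, for example
    one reading every 5 minutes. Returns a dict mapping the start of
    each hour to the mean ppm observed within that hour.
    """
    buckets = defaultdict(list)
    for stamp, ppm in readings:
        hour_start = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(ppm)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical sample values, for illustration only.
sample = [
    (datetime(2020, 1, 1, 8, 0), 10.0),
    (datetime(2020, 1, 1, 8, 5), 20.0),
    (datetime(2020, 1, 1, 9, 0), 30.0),
]
hourly = hourly_means(sample)
```

Grouping by clock hour is only one possible convention; a rolling 60-minute window would be an equally reasonable reading of "averaged every hour".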
Near-surface monitoring of gas is done near the surface of the source, such as soil. When monitoring gas near the surface of the ground, the best method to use is the "flux-box monitoring" technique. The flux box is an effective technique for measuring normal ground emission [9]. According to an Environment Agency, a flux box is an enclosed chamber in which changes in the concentration of methane gas above a small area are measured over time. Borehole wells, on the other hand, are placed in the soil to release gas from underground and to minimize the concentration of gas underground. When monitoring gas using borehole wells, the monitoring depth depends on the headspace depth within the borehole well. A study on gas management systems emphasizes placing devices at borehole wells in order to balance and maintain gas emissions. This makes it possible to recognize and address issues quickly, before high gas migration to the perimeter occurs. Reliable monitoring of gas migration to perimeter borehole wells requires relaying data in near real time to a base station, using sensors with a range capable of adequately monitoring gas events at all times [10].
Although GHGs are impossible to eliminate, there are acceptable GHG levels that need to be maintained. Methane's Lower Explosive Limit (LEL) is 5%, which is equivalent to 50,000 ppm. The National Institute for Occupational Safety and Health (NIOSH) has recommended a maximum methane concentration for workers of around 1,000 ppm [11]. Carbon monoxide (CO) is a type of greenhouse gas that can be found in small portions. According to [12], a level of more than 1,000 ppm could be an indicator of a landfill fire. The Emergency and Continuous Exposure Guidance Levels for Selected Submarine Contaminants warned that when hydrogen, an asphyxiant gas, reaches 4,100 ppm or higher, it may already be considered to be at the Lower Explosive Limit (LEL) [13].
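As an illustration of how the levels quoted above could be applied to a single reading, the following Python sketch flags a value as "High" or "Normal". The table layout, the function names, and the choice of strict versus inclusive comparison are assumptions made for this sketch; only the ppm figures come from the text.

```python
# Levels in ppm as quoted in the text; the second field marks whether the
# limit itself already counts as exceeded ("or higher" for hydrogen).
THRESHOLDS_PPM = {
    "CH4": (1_000, False),  # NIOSH recommended maximum for workers [11]
    "CO": (1_000, False),   # more than this may indicate a landfill fire [12]
    "H2": (4_100, True),    # level cited from [13]
}


def percent_to_ppm(percent):
    """Convert a volume percentage to parts per million (1% = 10,000 ppm)."""
    return percent * 10_000


def flag_reading(gas, ppm):
    """Return 'High' when the reading exceeds the quoted level, else 'Normal'."""
    limit, inclusive = THRESHOLDS_PPM[gas]
    exceeded = ppm >= limit if inclusive else ppm > limit
    return "High" if exceeded else "Normal"
```

For example, `percent_to_ppm(5)` reproduces the 50,000 ppm figure given for methane's LEL, and `flag_reading("CH4", 1200)` returns "High".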
A Clustering Technique was used in the study after the prototype was developed and deployed. This technique was employed to describe the actual gas emission data in the facility. According to a study that focuses on gas emissions, cluster analysis groups observations of different types in such a way that the resulting clusters are as homogeneous as possible within each group and as different as possible from each other. Among the clustering techniques, the most popular is the k-means method. The method is derived from the representation of each cluster by its average or weighted average. In this method, the number of clusters is assumed a priori, as is the number of iterations. The algorithm for creating clusters is strongly dependent on the value of k. The number of clusters should be large enough that the clusters reflect the specific characteristics of the data set. At the same time, however, the value of k must be significantly less than the number of objects in the data set, because that is the purpose of grouping [14].
The k-means algorithm is described in [15] as follows. A data set D containing n objects is given and the number of clusters k is assumed.
1) Arbitrarily choose k objects from D as the initial cluster centres;
2) Repeat steps (a) and (b) until there are no changes in the allocation of objects to clusters:
a) (re)assign each object p ∈ D to the cluster Ci to which the object is most similar, based on the mean value of the objects in the cluster Ci;
b) update the cluster means, i.e., calculate the mean value of the objects for each cluster.
The k-means method has several advantages that are also mentioned in the study. It is relatively simple, and the algorithm is relatively efficient compared with hierarchical methods. The reasons for the algorithm's popularity are its ease of interpretation, simplicity of implementation, speed of convergence, and adaptability to sparse data.
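The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used by the study (which relied on the WEKA Mining Tool). Taking the first k points as the "arbitrary" initial centres and capping the loop with `max_iter` are assumptions made here to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centres, labels). The first k points serve as the
    arbitrary initial centres, so results depend on input order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step (a): assign each object to the nearest cluster mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # allocation unchanged, so the procedure has converged
        labels = new_labels
        # Step (b): recompute the mean of each cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centres, labels
```

Calling `kmeans([(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0)], 2)` groups the two points near (1, 1.5) and the two points near (9, 8.5) into separate clusters.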

III. RESULTS AND DISCUSSIONS
Following a Modified Prototyping methodology for IoT based on the Nurun Process [16], the study performed several tests within the Model Stage, including Self-Calibration Testing, Isolation Testing, and Third-Party Testing with an Unpaired T-test, to ensure that the device's readings are accurate and reliable. In the Realize Stage of the methodology, Usability Testing was conducted. A Clustering Technique via k-means was employed to further analyze the actual gas emissions in two different zones of the facility. The general set-up of the study can be viewed in the succeeding figure. Fig. 1 shows the communication among the components of the monitoring system. Two sets of gas sensors were used, MQ2 and MQ135. These gas sensors send raw data to the microcontroller; for this study, an Arduino Mega was used. The Arduino Mega processes the raw data and sends it to the WeMos D1 Wi-Fi module, which in turn transmits the data to the cloud via a Wi-Fi connection. The cloud server processes and stores the data and handles user requests made through the websites.

A. Self-calibration Testing via Linear Regression
The Self-Calibration Test mathematically identifies the output based on the data sheets of the MQ2 and MQ135 gas sensors used in the prototype. The study exposed each sensor to different gases to acquire data for calibration. Each sensor has a data sheet, created by the manufacturer, that serves as the basis for testing whether the sensor meets the standards. The study based its procedures on those described in the literature. In computing the ppm value, the slope formula was used alongside the formula given by the manufacturer of the sensor to find the theoretical value, which was then used to calculate the percent error of the data produced by the sensors. In Fig. 2 and Fig. 3, the x-axis represents the concentration in parts per million (ppm) while the y-axis represents the Rs/Ro ratio. The Rs/Ro ratio indicates the sensor's resistance. Mathematically, a specific ppm value can be computed from the datasheet graph by identifying the Rs/Ro ratio. According to a real-time emission monitoring study that uses MQ sensors, the first step in solving the graph is to get three values: (1) the log value of x1, (2) the log value of y1, and (3) the slope. Two (2) points are needed in order to solve for the slope. The MQ2 sensor was used to measure the H2 and CH4 gases in this study. For H2 gas, the log value of the first point on the graph is (2.303895551, 0.317900444), and the log value of the second point is (4, -0.484567295). For CH4 gas, the log value of the first point is (2.303895551, 0.484567453), and the log value of the second point is (4, -0.157407608). The MQ135 sensor was used to measure the CO2 and CO gases. For CO2 gas, the log value of the first point is (1, 0.3617014), and the log value of the second point is (2.29977089, -0.098404383). For CO gas, the log value of the first point is (1, 0.452126985), and the log value of the second point is (2.29977089, 0.127658657).
The slope is then computed using the slope formula shown in Equation (1). For H2, m = (-0.484567295 - 0.317900444) / (4 - 2.303895551), which yields -0.47312401. For CH4, m = (-0.157407608 - 0.484567453) / (4 - 2.303895551), which yields -0.378499721. For CO2, m = (-0.098404383 - 0.3617014) / (2.29977089 - 1), which yields -0.353989912. For CO, m = (0.127658657 - 0.452126985) / (2.29977089 - 1), which yields approximately -0.249635. The computed values were used in creating an algorithm that computes the actual ppm value. In order to compute the ppm value per sensor, Equation (2) was used with the values computed from the datasheet. Value 1 is the log value of x1, value 2 is the log value of y1, and value 3 is the slope.
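The slope arithmetic above can be checked with a short Python sketch. The two-point slope is the standard formula m = (y2 - y1) / (x2 - x1). The conversion in `ppm_from_ratio` is an assumption: it inverts the point-slope form of a straight line on log-log axes, log10(ppm) = (log10(Rs/Ro) - y1) / m + x1, which is consistent with the three values listed (log x1, log y1, slope) but is not reproduced here from the paper's Equation (2). All function and variable names are illustrative.

```python
import math

# Log10 coordinates of two points per datasheet curve, as quoted in the text.
CURVE_POINTS = {
    "H2": ((2.303895551, 0.317900444), (4.0, -0.484567295)),
    "CH4": ((2.303895551, 0.484567453), (4.0, -0.157407608)),
    "CO2": ((1.0, 0.3617014), (2.29977089, -0.098404383)),
    "CO": ((1.0, 0.452126985), (2.29977089, 0.127658657)),
}


def slope(p1, p2):
    """Two-point slope m = (y2 - y1) / (x2 - x1)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def ppm_from_ratio(gas, rs_ro):
    """Assumed inversion: log10(ppm) = (log10(Rs/Ro) - y1) / m + x1."""
    p1, p2 = CURVE_POINTS[gas]
    x1, y1 = p1
    m = slope(p1, p2)
    return 10 ** ((math.log10(rs_ro) - y1) / m + x1)


# Slopes for all four gases, keyed by gas name.
slopes = {gas: slope(*pts) for gas, pts in CURVE_POINTS.items()}
```

Under this assumed conversion, an Rs/Ro ratio equal to the first datasheet point of a curve maps back to that point's ppm value, which gives a quick sanity check on the constants.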
The formula was converted into an algorithm for the microcontroller, as shown in Fig. 4. This was done to calibrate the sensors programmatically in an embedded system, based on the computed values stated above. Results were compared with those of another device that measures greenhouse gases to ensure that the self-calibration algorithm had properly calibrated the sensors. The Self-Calibration Testing was conducted in a clean-air environment to which the sensors had been exposed. A GasAlert MicroClip XL was used as the third-party device for measuring the concentration of the gases. The prototype was then exposed to the same environment as the third-party device. To test the accuracy of the sensors' concentration readings, the data were tabulated alongside the third-party results, as shown in Table I. For the purpose of comparison, the difference between the two (2) sets of results was computed. The results show small differences among the tests performed, which indicates good calibration of the sensors in the prototype.
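A comparison of this kind can be expressed as a percent difference relative to the reference device. The sketch below is an assumption about how such a figure might be computed; the exact formula behind Table I is not stated in the text, and the function name is hypothetical.

```python
def percent_difference(prototype_ppm, reference_ppm):
    """Absolute difference as a percentage of the reference reading."""
    if reference_ppm == 0:
        raise ValueError("reference reading must be non-zero")
    return abs(prototype_ppm - reference_ppm) / abs(reference_ppm) * 100.0
```

For example, a prototype reading of 98 ppm against a reference reading of 100 ppm gives a difference of 2%.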

B. Isolation Testing
An Isolation Test, shown in Fig. 5, was conducted to validate whether the data produced by the sensors fell within the expected range set by the International Organization for Standardization, in conformance with ISO 9705. The developers used the simulation of Sesseng and Reitan as the basis for this testing [17].
The developers used a box with dimensions of 18.5" × 12" × 12" (L × W × H) as the setup environment.
The gas sensors were set up inside the box, and a towel measuring 6" × 6" × 0.5" was used as the fuel load. The fuel load was burned for a minute before being extinguished inside the enclosed box containing the gas sensors (see Table II for the actual conduct of the Isolation Testing).
The developers conducted the Isolation Testing in the clean-air environment where the sensors underwent the Self-Calibration Testing. For this testing, CO and CO2 were used as variables to test the sensors' concentration readings for a certain dose of gas. The readings in Table II are within the acceptable and expected range based on the standards set by ISO. The ISO standards state the expected concentration of the gases when placed in an isolated area with the given dimensions. Hence, the sensors are expected to function according to the standards when deployed in the actual environment.

C. Third-Party Testing
Third-party testing was conducted to ensure that the readings of the prototype are the same as or close to the readings of a third-party device used in the industry. The developers went to a sewage plant to conduct the testing, where the readings of the prototype were compared with those of the third-party device. Based on this comparison, the percentage difference is relatively low. This indicates that the prototype was able to read gas emissions in the same way as devices used in the industry, as seen in Tables III to V. The actual Third-Party Test and the device used for comparison can be seen in Fig. 6. Based on the data gathered from the third-party testing, the readings from the gas sensors of the device were close (based on % accuracy) to the readings of the third-party gas monitoring device. To further analyze the readings of the developed device and the third-party device, an Unpaired T-Test was used to determine whether the results differ significantly. Based on the values collected from the two devices, no significant differences were found between the readings. The results of the Unpaired T-Test can be seen in Table VI.
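For readers unfamiliar with the unpaired t-test, the following Python sketch computes the two-sample t statistic with pooled variance (Student's form). Which variant the study used is not stated, so the equal-variance assumption is ours, and the function name is hypothetical. No p-value is computed; the statistic would be compared against a tabulated critical value for the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def unpaired_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in the
    two groups, which is an assumption of this sketch.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two readings")
    df = n_a + n_b - 2
    pooled = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both samples are constant; t is undefined")
    return (mean(sample_a) - mean(sample_b)) / std_err, df
```

With the illustrative samples [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6], the function returns t = -1.0 with 8 degrees of freedom. Since the magnitude of t is below the two-tailed 5% critical value of about 2.306 for 8 degrees of freedom, such a result would be read as no significant difference.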
International Journal of Environmental Science and Development, Vol. 11, No. 4, April 2020

D. Realtime Collecting and Monitoring of Greenhouse Gas Emissions
The device was set up in an identified Material Recovery Facility (MRF) in the Philippines after Self-Calibration and Isolation Testing. Fig. 7 shows the deployment set-up of the system in the facility. The gas sensors were placed on the borehole wells and flux box for data collection, and the data were sent to a common repository in the cloud server so that the collected GHG emissions data could be viewed online. The prototype was deployed in a zone consisting of one flux box, one borehole well, and the device itself. The physical set-up in a zone can be viewed in Fig. 8. The device components shown in Fig. 9 are the WeMos D1, Arduino Mega, MQ135 gas sensor, MQ2 gas sensor, 12V to 5V step-down power supply, and a 12V-5.0AH lead-acid battery. After development and testing, the prototype was deployed to collect actual GHG emissions from the identified Material Recovery Facility. Table VI shows sample data collected at 5-minute intervals. The results show that the device was able to detect CH4, CO, CO2, and H2 gases. The values highlighted in green in Table VI are acceptable values, and those highlighted in red are values that are not acceptable. Based on the values collected in the locale, the levels of greenhouse gases are mostly acceptable, although these values were limited to 4 hours of collection only. The acceptable values of the greenhouse gases were based on "Acceptable values of GHG in Landfill Areas".

G. Descriptive Analysis of Gas Emissions Collected via Clustering Technique
The prototype collected a total of 20 instances of gas emissions from two different zones in approximately 100 minutes, following the 5-minute recording interval. The attributes used for clustering (including all gas sensor values in both boxes and pipes) and their corresponding zone and class are as follows:
@attribute zone {zone1,zone2}
@attribute CH4Box numeric
@attribute CH4Pipe numeric
@attribute CO2Box numeric
@attribute CO2Pipe numeric
@attribute H2Box numeric
@attribute H2Pipe numeric
@attribute COBox numeric
@attribute COPipe numeric
@attribute class { Normal, High Emission }
A "Normal" class value was set for instances with no high values, that is, with acceptable gas emission values, while a "High" class value was set for instances with one or more high or unacceptable gas emission values, regardless of the type of gas. The actual gas emission data collected were cleansed, transformed, and converted into an acceptable ARFF file, which was then fed to the WEKA Mining Tool for clustering. Default settings in the WEKA Mining Tool, including training sets, were used, since the primary goal was just to identify interesting clusters that can describe the gas emissions in the facility.
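The labelling rule and the ARFF conversion described above can be sketched in Python as follows. The attribute names come from the list above; the function names, the relation name `ghg_emissions`, and the per-attribute `limits` dictionary are hypothetical. The class value "High Emission" is quoted in the output because ARFF requires quoting for nominal values that contain spaces.

```python
GAS_ATTRIBUTES = [
    "CH4Box", "CH4Pipe", "CO2Box", "CO2Pipe",
    "H2Box", "H2Pipe", "COBox", "COPipe",
]


def label_instance(readings, limits):
    """'High Emission' if any gas attribute exceeds its limit, else 'Normal'."""
    high = any(readings[name] > limits[name] for name in GAS_ATTRIBUTES)
    return "High Emission" if high else "Normal"


def _quote(value):
    """Quote nominal values that contain spaces, as ARFF requires."""
    return f"'{value}'" if " " in value else value


def to_arff(rows, limits, relation="ghg_emissions"):
    """Render rows (dicts with 'zone' plus the gas attributes) as ARFF text."""
    lines = [f"@relation {relation}", "", "@attribute zone {zone1,zone2}"]
    lines += [f"@attribute {name} numeric" for name in GAS_ATTRIBUTES]
    lines += ["@attribute class {Normal,'High Emission'}", "", "@data"]
    for row in rows:
        fields = [row["zone"]]
        fields += [str(row[name]) for name in GAS_ATTRIBUTES]
        fields.append(_quote(label_instance(row, limits)))
        lines.append(",".join(fields))
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file with an `.arff` extension and opened in WEKA; the limits passed in would be the acceptable values adopted by the study, which are not reproduced here.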
The number of clusters (k), also reported as the number of iterations, was three (3). Based on the results, the within-cluster sum of squared errors is 7.998429642683553. Fig. 10 shows the actual screenshot of the result of the clustering performed in the WEKA Mining Tool. The results in Fig. 10 show that both zones of the identified Material Recovery Facility exhibit both HIGH and NORMAL greenhouse gas emission values. Zone 2, on the other hand, shows normal or acceptable values of gas emissions within the period of testing. The results of clustering suggest that there is a need to validate and monitor the greenhouse gas emissions near zone 2, and to act upon them immediately for proper reinforcement.
Cluster 0 shows unacceptable emissions of CO gas in both the pipes and boxes of zone 1 in the facility. CO2 also shows high values in zone 1. While Cluster 2 shows zone 2 to have normal values, it is important to note that, among all attributes, CO and CO2 have the highest emission levels and should also be acted upon.
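The sum of squared errors reported for a clustering is the total squared distance from each instance to the centre of its assigned cluster. The Python sketch below shows that computation for purely numeric data. It is not expected to reproduce the figure reported by WEKA, since the tool may normalize numeric attributes and treats nominal attributes such as the zone differently; the function name is hypothetical.

```python
def within_cluster_sse(points, labels, centres):
    """Sum of squared Euclidean distances from each point to its centre."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(point, centres[label]))
        for point, label in zip(points, labels)
    )
```

For instance, with points (0, 0), (2, 0), and (10, 10), labels [0, 0, 1], and centres (1, 0) and (10, 10), the result is 2. A lower value indicates tighter clusters for the same k.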

H. Sample Screenshots of the Web Application (in Mobile View)
Fig. 11 shows a screenshot of the website's safety measures page. The module shows the list of safety measures to be taken when there is a high greenhouse gas concentration in the area. The Dashboard in Fig. 12 provides statistical data about the monitored greenhouse gas emissions in a specific zone. It provides the real-time status of gas emissions in a given cluster and of the device deployed in it. The export page in Fig. 13 shows a graphical representation of the gas data between two specified dates. The exported data is a readable Excel file that can be used as a reference by engineers, facility officers, and even researchers.
The citizens' dashboard page shown in Fig. 14 has the same function as the admin's dashboard page. The citizens' dashboard page can provide reports that comply with the Ecological Waste Management Act of 2000.
Fig. 15. Summary of usability test among facility officers.
Fig. 15 shows the usability rating of the prototype. The questions cover the functionality and responsiveness of the websites and the device. According to the data gathered among facility officers, the functionality of the device successfully met its objectives, giving it a mark of "Strongly Agree", while the websites' functionality got an average score of 3.33 with a mark of "Agree".

IV. CONCLUSIONS AND RECOMMENDATIONS
The study was able to elevate the process of monitoring greenhouse gases in a Material Recovery Facility. It was also able to support the implementation of policies set by The Ecological Waste Management Act of 2000, or RA 9003, specifically Section 42 Letter E, which requires solid waste management facility operators to have a means of monitoring the gas emissions produced by the facility. The study developed an IoT-based greenhouse gas monitoring system that works through borehole wells and near-surface (flux-box) monitoring. The system can help solid waste or landfill facilities detect greenhouse gas emissions such as methane (CH4), carbon dioxide (CO2), carbon monoxide (CO), and hydrogen (H2), and collect the actual concentrations of these gases in ppm using MQ2 and MQ135 gas sensors.
UAT results show "Strongly Agree" scores among Facility Experts, in terms of real-time detection of greenhouse gases through the developed device. This result answered the first specific objective of the study, which is to develop a greenhouse gas sensory system using IoT technology that will help in borehole well and near-surface detection of greenhouse gas emissions in solid waste management facilities.
Based on the comparison of the devices for monitoring gases, the developed prototype shows accurate readings when compared with third-party gas sensors. The prototype can read gases with minimal percent differences, with one test reaching nearly 100% accuracy during self-calibration. The Linear Regression process strengthened the capability of the sensors to mathematically convert raw values into actual ppm values, which is needed to provide an automatic calibration algorithm within the device. Isolation Test and Third-Party Test results reveal that the device performs and behaves within the expected range and values based on the standards set by ISO. Unpaired T-Test results also show no significant differences between the developed prototype and third-party devices in terms of readings and output. These results answer the second specific objective of the study, which is to provide an IoT-based device that will collect and monitor greenhouse gas emissions in real time in multiple zones of the facility.
Aside from the prototype, the study also developed a web-based application that can present real-time data on the greenhouse gas emissions produced by the facility. The website can provide historical data and generate raw data reports that include the elements, concentration values, and date/time of occurrences in each of the defined zones of the facility. The website can also display whether or not the device is transmitting data, which helps the officers to monitor and take action whenever necessary. The website is able to provide the measured greenhouse gas values (in ppm) with precautionary measures for the facility officers and the residents who live near the facility. The Usability Test for the website resulted in a score of "Agree", which means that the study was able to meet its third objective.
Finally, the Clustering Technique applied to the gas emission data collected within the facility allowed the study to establish a better analysis and description of the gas emissions in different zones. Clustering results show that both zones of the identified Material Recovery Facility exhibit both HIGH and NORMAL greenhouse gas emission values. It is important to note that, among all attributes, CO and CO2 had the highest emission levels manifested during testing. This helped the researchers in providing better information for the facility officers and reliable inputs for the website.
However, the study recommends that other gases, such as hydrogen sulfide and nitrous oxide, be included in the sensory system, since these gases are also commonly found in solid waste management facilities. In addition, it is recommended to analyze the results based on the influence of temperature and humidity on the output resistance ratio of the sensors used. By doing so, the values of resistance and ppm can be more stable and relatable. Data collection should be performed in multiple zones over a wide range of hours for better analysis of the gas emissions. While the study was able to show both high and normal gas values, a large collection of data from different set-ups covering more than two zones can elevate the study toward predicting the behavior of gas emissions, which can be used as input for the proper design of a recovery facility in the future.