PLoS ONE
Public Library of Science
Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019
Volume: 15, Issue: 5
DOI 10.1371/journal.pone.0232910

### Notes

Abstract

Background: The identification of statistical models for the accurate forecasting and timely detection of infectious disease outbreaks is very important for the healthcare system. This study was therefore conducted to assess and compare the performance of four machine learning methods in modeling and forecasting brucellosis time series data based on climatic parameters.

Methods: In this cohort study, human brucellosis cases and climatic parameters were analyzed on a monthly basis for Qazvin province, located in northwestern Iran, over a period of 9 years (2010–2018). The data were divided into two subsets, training (80%) and testing (20%). Two artificial neural network methods (radial basis function and multilayer perceptron), a support vector machine and a random forest were fitted to each set. The performance of the models was analyzed using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Relative Error (MARE), and R2 criteria.

Results: The incidence rate of brucellosis in Qazvin province was 27.43 per 100,000 during 2010–2019. Based on our results, the values of the RMSE (0.22), MAE (0.175) and MARE (0.007) criteria were smaller for the multilayer perceptron neural network than for the other three models. Moreover, the R2 value (0.99) was larger for this model. Therefore, the multilayer perceptron neural network exhibited better performance in forecasting the studied data. The average wind speed and mean temperature were the most influential climatic parameters for the incidence of this disease.

Conclusions: The multilayer perceptron neural network can be used as an effective method for detecting the behavioral trend of brucellosis over time. Nevertheless, further studies focusing on the application and comparison of these methods are needed to identify the most appropriate forecasting method for this disease.

Bagheri, Tapak, Karami, Hosseinkhani, Najari, Karimi, Cheraghi, and Tlelo-Cuautle: Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019

## Background

Brucellosis (Malta fever) is one of the most common zoonotic diseases and has long been one of the most important health concerns for humans and animals [1]. The significance of this disease is not limited to its physical complications; it is also one of the most important challenges to economic development in many countries, including Iran, where economic development still depends on agriculture and ranching [1–3]. Direct contact with infected livestock or dairy products is one of the most common routes of transmission, although the main transmission route is the consumption of raw milk and other unpasteurized dairy products [4–6]. Brucellosis is found worldwide; nevertheless, the highest prevalence is seen in the Mediterranean region, the Arabian Peninsula, the Indian subcontinent and parts of South and Central America [7–9]. The disease still persists as an undetected endemic disease in many developing countries [10,11]. According to the World Health Organization (WHO), 500,000 cases of infection are reported globally each year, and for every detected case, four cases go undetected [12–15]. Although brucellosis has been eradicated in many industrial countries, it is still a serious health threat in some countries, including Iran [16–18]. In Iran, brucellosis is recognized as an endemic disease that is reported every year at high rates in the northern and north-western parts of the country, where it leads to extensive problems [17]. To reduce the rate of this disease and prevent its associated problems, health officials and planners must undertake strategic planning and take control and prevention measures based on applied management. To this end, modelling techniques seem necessary for the timely detection of future epidemics and the early detection of changes in the trend of the disease over time.
Such detection is needed for the timely and appropriate execution of control measures, such as the sensitization and education of physicians on the diagnosis and treatment of these patients and the delivery of health messages regarding prevention.

To achieve this goal, quality data and forecast methods with the least errors are required [19]. The healthcare system is an important means of collecting, analyzing, interpreting and disseminating healthcare data, which are mainly used to prevent and control diseases and health events [20]. The health system has been designed to facilitate the detection of abnormal behavior of infectious diseases and other health events, and different statistical methods have been used to forecast infectious diseases. Time series models, which researchers have used for a long time, attempt to forecast the epidemiologic behavior of diseases using historical surveillance data. In the past, researchers have used various time series models to forecast the incidence of epidemics, such as exponential smoothing [21], generalized regression [22], analysis [23], and multilayered time series models [24]. However, the use of these models requires the determination of exact mathematical parameters and the establishment of underlying hypotheses, particularly the linearity of the regression association [25]. In recent years, time series models based on machine learning methods, such as the artificial neural network, have been used to model the time series incidence of infectious diseases [26]. It has been demonstrated that these methods forecast better than the classic methods. The artificial neural network (ANN) is a powerful non-linear data modeling technique that can model the complex connections between the forecasting variables and the target without any primary hypothesis or previous knowledge of the relations between the parameters under study [27]. Two pioneering neural network methods are the Radial Basis Function (RBF) and the Multilayer Perceptron (MLP) networks.
RBF is a more common type of neural network learning which responds to a limited section of the input space; it has a faster, more accurate and yet simpler network structure compared to other neural networks, while the MLP is more generalizable [28]. Another machine learning method is the Support Vector Machine (SVM), which is used owing to its good performance in regression and classification problems compared to classic models. This model employs a risk function that includes empirical error and a regularization term [29]. Its strong performance in practical applications is due to the structural risk minimization principle, which generalizes better than the empirical risk minimization principle. SVMs have been employed in different time series problems, such as the machinery industry [30], engine reliability forecasting [31] and economic time series forecasting [32,33]. The success of SVM in forecasting time series in different fields of science led us to use it for forecasting brucellosis time series. Many researchers have confirmed the good performance of these four techniques and their advantages in forecasting [28]. Nonetheless, in spite of the widespread application of these techniques, they have not, to our knowledge, been evaluated for Qazvin’s brucellosis data. The precise and timely forecast of trend changes is very important in outbreak control management, and the performance of the various methods depends on the data and may differ from one data set to another. Therefore, the goals of this study were to assess the performance of artificial neural networks (the RBF and MLP, separately), the SVM and the random forest in forecasting the number of brucellosis cases and to identify the model with the best forecasting ability.
This model may then be utilized in the public health system, to control and prevent the high incidence of brucellosis.

From the climatic perspective, it is essential to determine the epidemiologic conditions of brucellosis in terms of the environmental circumstances of different regions. This in turn demands the examination of the environmental factors of each region, among the most significant of which are its climatic and weather conditions. Given the bacterial nature of brucellosis, identifying climatic and weather characteristics and other influential factors can greatly help manage and control this disease. Hence, the other goal of this study was to use machine learning methods to determine the impact on the incidence of brucellosis of climatic factors, such as average temperature, minimum and maximum temperature, precipitation, wind speed and average wind speed, and of other variables, such as mean age, gender ratio, rural ratio, ratio of unpasteurized dairy product consumption, and contact with livestock. Thus, by determining the most appropriate model, the results of this research can prove beneficial to epidemiologists in preventing and controlling epidemics.

## Methods

### The data and area under study

This study was conducted on monthly time series data of brucellosis in Qazvin province using the following covariates: month, season, year, rural ratio, mean age, male ratio, ranchers’ ratio, ratio of contact with livestock, ratio of consumption of unpasteurized dairy products, and climatic parameters, including average temperature, minimum and maximum temperature, precipitation, wind speed and average wind speed. Qazvin is located in north-western Iran, on the southern skirts of the Alborz Mountain Range. It is cool in summer and cold in winter. Humidity is well distributed across Qazvin due to the effect of rain-producing air masses and altitude. Humidity is highest during winter and lowest during summer. Based on the most recent national geographical divisions made by the Ministry of Interior in 2013, Qazvin province has an area of 15,567 km2 and includes 6 counties: Qazvin, Buin-Zahra, Abyek, Avaj, Takestan and Alborz.

Based on national guidelines, the patients’ clinical and epidemiological data are registered online in the Health Surveillance System. Accordingly, patients with the following clinical–epidemiological symptoms of brucellosis were considered disease cases: fever, myalgia and para-clinical symptoms (the results of two routine lab tests for brucellosis) including, Wright’s (diagnostic test for brucellosis; values greater than 1.8 indicate presence of infection) and 2ME (Mercaptoethanol Brucella agglutination test) (brucellosis confirmatory test, which if greater than or equal to 1.4 is indicative of the presence of infection) [34,35].

Here the trend of the number of human brucellosis cases was analyzed using some covariates and monthly climatic parameters during 2010–2018 in Qazvin province. Data on the number of brucellosis cases and covariates (including rural ratio, mean age, gender ratio, ratio of contact with livestock, and ratio of unpasteurized dairy product consumption) were extracted from the databank of Qazvin University of Medical Sciences’ Deputy of Health, and data related to climatic parameters were obtained from Qazvin province’s Meteorological System. To examine the validity of the models applied in this study, the monthly data were divided into two sets, the training and test sets, following the usual approach to performance assessment for time series data. Studies conducted on time series data take 70 or 80 percent of the data (from the beginning of the series) as the training set and the remainder as the test set [36,37]. Therefore, here too, an 80 to 20 percent ratio was used for the training set (from April 2010 until August 2017) and the test set (from September 2017 until March 2018), respectively.
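The chronological split described above can be illustrated with a minimal sketch. It assumes only that the observations are ordered in time and that the first 80 percent form the training set; the function name `chronological_split` and the example counts are hypothetical and are not taken from the study.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into training and test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)  # index of the first test observation
    return series[:cut], series[cut:]


# Hypothetical monthly counts (not the study data): 10 consecutive months.
monthly_cases = [12, 15, 20, 31, 40, 38, 29, 22, 18, 14]
train, test = chronological_split(monthly_cases, train_fraction=0.8)
print(train)  # the first 8 months
print(test)   # the last 2 months
```

Keeping the test observations at the end of the series, rather than sampling them at random, means that the models are always evaluated on months that come after the ones they were trained on.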

#### Models

In this study, four machine learning methods, namely the radial basis function, multilayer perceptron, support vector machine and random forest time series, were employed to forecast monthly changes in brucellosis frequency using covariates and climatic parameters. An Auto Regressive Integrated Moving Average (ARIMA) model was fitted to the data with lags of 1–12 for the monthly brucellosis data, covariates and climatic parameters. A significance level of 0.05 was used.
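One way to picture how lagged values can serve as model inputs is the sketch below. It is an illustration under assumptions, not the authors' procedure: the helper `make_lagged_rows` is hypothetical, and the choice of which lags to keep (here 1 and 2) stands in for the lags that a significance test would retain.

```python
def make_lagged_rows(series, lags):
    """Build (inputs, targets) where each input row holds the values at t - lag."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


# Hypothetical monthly counts; lags 1 and 2 are assumed to be the retained lags.
series = [10, 12, 15, 11, 9, 14]
X, y = make_lagged_rows(series, lags=[1, 2])
print(X)  # each row: [value one month earlier, value two months earlier]
print(y)  # the value to be forecast for that month
```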

#### Support vector machine

The SVM is a machine learning method that is used owing to its good performance in regression and classification problems compared to the classic models. This model employs a loss function that includes empirical error and a regularization term [29]. When dealing with regression problems, this method attempts to estimate the relationship between the response variable and the covariates using a linear function in a higher-dimensional space instead of a non-linear function in the original data space. Suppose y(t) is a set of time series data that depends on time $t\in \left\{1,2,\dots ,n\right\}$. In time series problems, the goal is to create a forecast rule based on current and past data that can be used to estimate future values. Therefore, the function f(.) is defined as a function that returns an output used to forecast future values [29]. The following equation is a forecast function for non-linear regression:

$f\left(y\right)=w\cdot \varphi \left(y\right)+b$

The SVM maps data that are nonlinear in their input space into a higher-dimensional feature space through the mapping $\varphi \left(.\right)$ associated with the kernel function, which must be carefully selected. A linear problem is thereby obtained. In order to estimate the forecast rule, the weight vector w and the intercept b must be optimized.

There are a number of different kernels [38]. In our study, the performances of different kernel functions were examined and the kernel function with the best performance was used.
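As an illustration of how a kernel enters the forecast rule, the sketch below evaluates a support vector regression prediction in its kernel (dual) form, f(x) = Σ c_i k(x_i, x) + b. The source does not state which kernel was selected, so the Gaussian (RBF) kernel here is an assumption, and the support vectors, coefficients and bias are made-up values rather than fitted ones.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel form of the forecast rule: sum_i c_i * k(x_i, x) + b."""
    weighted = (c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return sum(weighted) + bias


# Hypothetical values for illustration only (not fitted to any data).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias=0.1, gamma=0.5)
print(prediction)
```

Because the kernel replaces the explicit inner product of mapped points, the mapping into the higher-dimensional space never has to be computed directly.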

#### Artificial neural networks

Artificial neural networks (ANN) are mathematical data processing tools used in many scientific fields for forecasting, pattern recognition and classification [36]. An ANN consists of several nodes and of weights that connect the nodes to each other. Several types of ANN exist, of which the MLP and RBF have been applied in many studies and compared with each other. The MLP is a special type of ANN which has non-linear activation functions, such as the sigmoid, in the hidden layer and a linear function in the output layer [36]. The relationship between the input and hidden layers is as below:

${y}_{j}=f\left(\sum _{i=1}^{N}{w}_{ji}{x}_{i}+{b}_{j}\right)$
in which x is the nodal value of the previous layer, y is the nodal value of the current layer, b is the intercept of the current layer, w represents the regression coefficients or weights, f is the activation function, and N is the number of nodes in the previous layer [39,40].

To fit the MLP model, one input layer, two hidden layers and one output layer were used in this study. Sigmoid and hyperbolic tangent functions were used in the hidden layers and the identity function was used in the output layer.
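The layer equation and the architecture described above can be sketched as a forward pass. This is a minimal illustration under assumptions: the first hidden layer is taken to use the sigmoid and the second the hyperbolic tangent (the source does not say which layer uses which), and all weights and layer sizes are arbitrary placeholders, not trained values.

```python
import math


def dense(inputs, weights, biases, activation):
    """One layer: y_j = activation(sum_i w_ji * x_i + b_j)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def mlp_forward(x, params):
    """Input -> sigmoid hidden layer -> tanh hidden layer -> identity output."""
    h1 = dense(x, params["W1"], params["b1"], sigmoid)
    h2 = dense(h1, params["W2"], params["b2"], math.tanh)
    return dense(h2, params["W3"], params["b3"], identity)[0]


# Placeholder parameters (assumed for illustration): 2 inputs, 1 node per hidden layer.
params = {
    "W1": [[1.0, -1.0]], "b1": [0.0],
    "W2": [[2.0]], "b2": [-1.0],
    "W3": [[3.0]], "b3": [0.7],
}
output = mlp_forward([2.0, 2.0], params)
print(output)
```

With these placeholder values the first hidden node receives 0 and outputs 0.5, the second receives 2 × 0.5 − 1 = 0 and outputs 0, so the network output reduces to the output bias.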

#### Random forest

The random forest (RF) technique is a regression and classification tool based on a set of tree predictors [41]. For a regression problem, RF combines the forecasts obtained from several regression trees, each of which is built by recursively splitting the predictor space; splitting continues until the resulting sub-spaces become homogeneous [42]. The RF algorithm includes three stages: 1) extracting many bootstrap samples from the original data to construct the training sets, 2) growing a regression tree for each of the training samples obtained, and 3) predicting the response variable for new data by aggregating the predictions obtained from all trees [43].
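The three stages above can be sketched in a deliberately simplified form. To keep the example short, each "tree" is assumed to be a single-split regression stump rather than a fully grown tree, so this shows the bootstrap-and-average idea and is not a complete random forest; the function names and data are hypothetical.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Find the single split (feature, threshold) with the lowest squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split in this sample: predict the sample mean
        mean = sum(targets) / len(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]


def fit_forest(rows, targets, n_trees, mtry, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # stage 1: bootstrap sample
        feats = rng.sample(range(p), mtry)           # random subset of mtry predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))  # stage 2
    return forest


def predict_forest(forest, row):
    """Stage 3: average the predictions of all stumps."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


# Hypothetical data: one predictor, response jumps from 10 to 20.
rows = [[1.0], [2.0], [3.0], [4.0]]
targets = [10.0, 10.0, 20.0, 20.0]
forest = fit_forest(rows, targets, n_trees=25, mtry=1, seed=42)
low, high = predict_forest(forest, [1.0]), predict_forest(forest, [4.0])
print(low, high)
```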

#### Model assessment criteria

To assess and compare the prediction accuracy and performance of the models used for time series modeling in this study, the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Relative Error (MARE), and R2 (coefficient of determination) criteria were used, calculated with the following equations [28,44]:

$RMSE=\sqrt{\frac{1}{n}\sum {\left({Y}_{obs}-{Y}_{pred}\right)}^{2}}$
$MAE=\frac{1}{n}\sum |{Y}_{obs}-{Y}_{pred}|$
${R}^{2}=1-\frac{\sum {\left({Y}_{obs}-{Y}_{pred}\right)}^{2}}{\sum {\left({Y}_{obs}-\overline{Y}\right)}^{2}}$
$MARE=\frac{1}{n}\sum \frac{|{Y}_{obs}-{Y}_{pred}|}{{Y}_{obs}}$

In the equations above, Yobs and Ypred represent the observed and predicted numbers of brucellosis cases, respectively, $\overline{Y}$ is the mean of the observed values, and n is the number of observations.
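The four criteria can be computed directly from their definitions, as in the following sketch. The observed and predicted values are made-up numbers chosen so that the results are easy to check by hand; they are not results from the study.

```python
import math


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mare(obs, pred):
    # Relative error is undefined when an observed value is zero.
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted monthly counts.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
print(rmse(observed, predicted))       # sqrt(18 / 4), about 2.1213
print(mae(observed, predicted))        # 8 / 4 = 2.0
print(mare(observed, predicted))       # 0.425 / 4 = 0.10625
print(r_squared(observed, predicted))  # 1 - 18 / 500 = 0.964
```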

#### Implementation and parameter tuning

To implement the models, the variables in Table 1 as well as the climatic variables of wind speed (m/s) and temperature (°C) were used as predictors, and the number of brucellosis cases observed was used as the output. Then, the three machine learning techniques (RF, SVM and ANN) were implemented for prediction. For all three models, some parameters had to be tuned. To this end, we first divided the data set into training and testing sets (80–20%). Then, we conducted a 10-fold cross-validation over the training set to find the optimum values. For the SVM, the two parameters C and gamma were tuned, and the optimum values obtained were 0.023 and 0.008, respectively. For the ANN, the number of hidden layers needed to be selected using cross-validation. We therefore considered 1–3 hidden layers, and an ANN with two hidden layers was selected as the optimum. This was the case for both the MLP and the RBF. For the random forest, the number of trees and mtry (the number of covariates randomly selected from all predictors to create each tree) were tuned, and an RF with 550 trees and mtry = 3 was selected as the optimum. The models were trained using the data in the training set and were tested on the testing set.
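The cross-validation step can be sketched as follows. This is an illustration under assumptions: the folds are taken as contiguous blocks (the source does not say whether observations were shuffled), and `score_fn` is a hypothetical stand-in for fitting a model on the training indices and returning its validation error.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def select_by_cv(candidates, score_fn, n_samples, k=10):
    """Return the candidate with the lowest mean validation score over k folds."""
    def mean_score(candidate):
        scores = [score_fn(candidate, train, valid)
                  for train, valid in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return min(candidates, key=mean_score)


# Hypothetical score: pretend the validation error is smallest for 2 hidden layers.
def toy_score(n_hidden_layers, train_idx, valid_idx):
    return abs(n_hidden_layers - 2)


folds = list(k_fold_indices(25, 10))
best_layers = select_by_cv([1, 2, 3], toy_score, n_samples=25, k=10)
print(len(folds), best_layers)
```

In practice the score function would train the model with the candidate setting on the training indices and return an error measure such as the RMSE on the validation indices.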

Table 1
Descriptive characteristics of brucellosis cases in Qazvin Province.

| Variable | Category | Frequency (percent) | P-value |
|---|---|---|---|
| Age group | 0–9 years | 163 (5.10) | 0.026 |
| | 10–19 years | 329 (10.30) | |
| | 20–29 years | 654 (20.47) | |
| | 30–39 years | 653 (20.44) | |
| | 40–49 years | 486 (15.21) | |
| | 50–59 years | 436 (13.65) | |
| | ≥60 years | 473 (14.80) | |
| Gender | Male | 2026 (63.43) | <0.001 |
| | Female | 1168 (36.57) | |
| Contact with livestock | Yes | 2452 (76.76) | <0.001 |
| | No | 742 (23.23) | |
| Habitat | Urban | 1113 (34.84) | <0.001 |
| | Rural | 2081 (65.15) | |
| Job type | Housewife | 982 (30.75) | <0.001 |
| | Rancher–Farmer | 1287 (40.29) | |
| | Student | 236 (7.38) | |
| | Employee | 49 (1.53) | |
| | Worker | 139 (4.35) | |
| | Private | 144 (4.50) | |
| | Others | 357 (11.17) | |
| Consumption of unpasteurized dairy products | Yes | 2612 (81.77) | <0.001 |
| | No | 582 (18.22) | |

#### Software

All analyses were done using R 3.4.2. In fitting the models, the covariates and dependent variables were normalized using the equation below [44]:

${Y}_{Normalized}=\frac{Y-{Y}_{\mathrm{min}}}{{Y}_{\mathrm{max}}-{Y}_{\mathrm{min}}}$
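A minimal sketch of this min-max normalization, together with the inverse transform needed to bring forecasts back to the original scale, is given below. The function names are hypothetical and the three example values are illustrative.

```python
def min_max_normalize(values):
    """Rescale values to the [0, 1] interval: (Y - Y_min) / (Y_max - Y_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Illustrative monthly counts.
cases = [7, 31, 62]
scaled = min_max_normalize(cases)
restored = denormalize(scaled, min(cases), max(cases))
print(scaled)    # the smallest value maps to 0 and the largest to 1
print(restored)  # approximately the original counts
```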

## Results

Examination of the data of 3194 registered brucellosis patients showed that during the years of the study (2010–2018) most patients (63.4%) were males and the remainder were females (Table 1). Their mean age was 38.43±10.28 years; after classifying the individuals into 5-year age groups we observed that the highest percentage of the disease had occurred in the third and fourth decades of life, i.e. between 25 and 39 years (31.18%) [Table 1]. The examination of employment status revealed that the most commonly affected occupation was ranching–farming (40.29%) [Table 1]. Comparing rural and urban regions, the highest frequency was seen in rural regions, at 65.15% [Table 1]. Among the probable risk factors of this disease, the consumption of unpasteurized dairy products (81.77%) and contact with livestock (76.76%) had the highest frequencies [Table 1]. We examined the monthly pattern of the disease and found that April (6.32%) and August (10.49%) had the lowest and highest percentages of disease, respectively [Table 2]. Regarding the seasonal pattern of disease, the highest and lowest percentage frequencies were seen in summer (30.08%) and autumn (19.94%), respectively [Table 2]. The year 2015 (16.03%) had the highest reporting rate of the disease among the years of the study, and the lowest frequency percentage was reported in 2010 (6.01%) [Table 2]. The 9-year mean weighted incidence of the disease for each of Qazvin’s counties indicated that Avaj (222.42 per 100,000 persons) and Takestan (42.63 per 100,000 persons) held the first and second positions, respectively, while the provincial incidence was 27.43 per 100,000 persons [Table 3, Fig 1].
We also extracted the descriptive statistics of the climatic parameters, which were as follows: mean temperature, 14.59±9.05; precipitation, 25.62±24.02; average wind speed, 1.88±0.34; maximum temperature, 28.21±9.63; minimum temperature, 1.59±8.52; wind speed, 14.36±4.09 (Table 4).

Fig 1
Average incidence rate of brucellosis in Qazvin Province during 2010–2019.

Table 2
Frequency of brucellosis cases by year and season in Qazvin Province.

| Year | Spring | Summer | Autumn | Winter | P-value | Total |
|---|---|---|---|---|---|---|
| | Number (Percent) | Number (Percent) | Number (Percent) | Number (Percent) | 0.166 | Number (Percent) |
| 2010 | 51 (26.56) | 69 (35.94) | 37 (19.27) | 35 (18.23) | 0.488 | 192 (6.01) |
| 2011 | 85 (27.16) | 87 (27.80) | 64 (20.45) | 77 (24.60) | 0.299 | 313 (9.79) |
| 2012 | 82 (25.87) | 109 (34.38) | 59 (18.61) | 67 (21.14) | 1.00 | 317 (9.92) |
| 2013 | 72 (24.08) | 90 (30.10) | 56 (18.73) | 81 (27.09) | 1.00 | 299 (9.36) |
| 2014 | 101 (22.54) | 149 (33.26) | 81 (18.08) | 117 (26.12) | 0.488 | 448 (14.02) |
| 2015 | 118 (23.05) | 153 (29.88) | 83 (16.21) | 158 (30.89) | 0.166 | 512 (16.03) |
| 2016 | 119 (28.95) | 120 (29.20) | 88 (21.41) | 84 (20.44) | 0.083 | 411 (12.86) |
| 2017 | 94 (26.93) | 92 (26.36) | 83 (23.78) | 80 (22.92) | 0.166 | 349 (10.92) |
| 2018 | 69 (19.55) | 92 (26.06) | 86 (24.36) | 106 (30.03) | 1.00 | 353 (11.05) |
| Total | 791 (24.76) | 961 (30.08) | 637 (19.94) | 805 (25.20) | 0.634 | 3194 (100) |

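Table 3
Annual brucellosis incidence rates (per 100,000) by counties of Qazvin Province.

| Year | Buin Zahra | Alborz | Abyek | Takestan | Qazvin | Avaj | Province total |
|---|---|---|---|---|---|---|---|
| 2010 | 31.57 | 5.41 | 28.77 | 24.28 | 10.59 | - | 15.97 |
| 2011 | 43.71 | 13.77 | 37.30 | 21.39 | 24.88 | - | 26.04 |
| 2012 | 47.23 | 9.94 | 47.88 | 23.14 | 23.22 | - | 26.06 |
| 2013 | 11.48 | 2.84 | 16.95 | 23.08 | 13.26 | - | 12.88 |
| 2014 | 23.00 | 11.01 | 45.62 | 83.35 | 14.36 | 283.12 | 35.99 |
| 2015 | 59.81 | 19.15 | 33.90 | 60.22 | 13.88 | 400.90 | 40.67 |
| 2016 | 36.78 | 15.30 | 33.79 | 56.65 | 14.72 | 255.65 | 32.60 |
| 2017 | 35.87 | 14.43 | 41.05 | 49.63 | 17.92 | 89.89 | 27.64 |
| 2018 | 36.60 | 14.42 | 29.38 | 42.05 | 22.64 | 87.05 | 27.92 |
| Mean weighted incidence | 35.85 | 12.01 | 34.95 | 42.63 | 17.27 | 222.42 | 27.43 |
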
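Table 4
Descriptive statistics of the monthly brucellosis cases in Qazvin Province.

| Variable | Statistic | Total | Training set | Test set |
|---|---|---|---|---|
| Number of brucellosis cases | Average (SD) | 31.3 (11.5) | 31.8 (12.1) | 29.3 (8.4) |
| | Min | 7.0 | 7.0 | 15.0 |
| | Max | 62.0 | 62.0 | 41.0 |
| Rural ratio | Average (SD) | 2.0 (2.4) | 2.6 (2) | 1.3 (0.6) |
| | Min | 0.3 | 0.3 | 0.5 |
| | Max | 16.0 | 16.0 | 2.6 |
| Average age | Average (SD) | 38.4 (10.3) | 37.9 (11.3) | 40.7 (3.5) |
| | Min | 13.7 | 13.7 | 34.8 |
| | Max | 125.9 | 125.9 | 47.1 |
| Male ratio | Average (SD) | 2.2 (1.3) | 2.3 (1.4) | 2.2 (1.3) |
| | Min | 0.2 | 0.2 | 0.2 |
| | Max | 9.0 | 9.0 | 9.0 |
| Ratio of ranchers | Average (SD) | 0.3 (0.1) | 0.3 (0.1) | 0.3 (0.1) |
| | Min | 0.1 | 0.1 | 0.2 |
| | Max | 0.6 | 0.6 | 0.6 |
| Ratio of contact history | Average (SD) | 3.6 (3.9) | 3.7 (4.2) | 3.0 (1.8) |
| | Min | 0.2 | 0.2 | 1.1 |
| | Max | 25.0 | 25.0 | 7.5 |
| Ratio of history of consumption of unpasteurized dairy | Average (SD) | 4.5 (4.3) | 4.1 (4.5) | 6.0 (3.3) |
| | Min | 0.6 | 0.6 | 1.0 |
| | Max | 27.0 | 27.0 | 14.0 |
| Average temperature (°C) | Average (SD) | 14.6 (9.1) | 14.9 (9.1) | 13.2 (8.9) |
| | Min | 0.9 | 0.9 | 2.7 |
| | Max | 30.3 | 28.2 | 30.3 |
| Precipitation | Min | 0.00 | 0.00 | 0.00 |
| | Max | 89.3 | 89.3 | 83.6 |
| Average wind speed (m/s) | Average (SD) | 1.9 (0.3) | 1.9 (0.4) | 2.0 (0.3) |
| | Min | 1.0 | 1.0 | 1.6 |
| | Max | 2.6 | 2.6 | 2.6 |
| Maximum temperature (°C) | Average (SD) | 28.2 (9.6) | 28.7 (9.6) | 26.4 (9.9) |
| | Min | 12.5 | 12.5 | 14.0 |
| | Max | 42.5 | 42.5 | 41.7 |
| Minimum temperature (°C) | Average (SD) | 2.0 (8.5) | 2.2 (8.6) | 0.8 (0.8) |
| | Min | -14.2 | -13.0 | -14.2 |
| | Max | 16.8 | 16.2 | 16.8 |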

The correlation between the descriptive variables (climatic and non-climatic) and monthly brucellosis cases is presented in Table 5. Fig 2 illustrates the time series of the number of monthly brucellosis cases in Qazvin province. As can be seen, the trends are nonlinear at the provincial level; thus, classic time series methods do not work efficiently for these data. Correlation analysis was done to select appropriate model inputs, and significant ARIMA coefficients were considered as the inputs. The four methods (radial basis function and multilayer perceptron neural networks, support vector machine and random forest) were fitted to each set. To compare the performance of the four models, the RMSE, MAE, MARE, and R2 criteria were calculated for the training and test sets (see Table 6). Given these results, the RMSE, MARE and MAE values for the MLP method were smaller than those of the other three methods (RBF, RF, SVM). Furthermore, the R2 value was closer to one for the MLP method than for the other three methods. Based on these findings, we may conclude that the MLP method performed better than the other three modeling and forecasting methods for Qazvin province’s monthly time series data, based on covariates and climatic parameters. The temporal changes of the observed cases of brucellosis and the values estimated by the four methods (MLP, RBF, RF and SVM) for the testing set are illustrated in Figs 3 and 4. As seen in the figure, the frequency of brucellosis increased during the months of spring. This figure also demonstrates that the values forecasted by the MLP method are closer to the observed values than those of the RBF, RF and SVM methods.

Fig 2
Time series diagrams of the number of monthly brucellosis cases in Qazvin Province during the years 2010–2019.
Fig 3
Forecasted number of brucellosis cases obtained from MLP, RBF, RF and SVM time series.
Fig 4
Graph of the residuals obtained from fitting the MLP, RBF, SVM and RF time series models.

Table 5
Correlation between descriptive statistics and the monthly brucellosis cases.

| Variable | Pearson correlation | P-value |
|---|---|---|
| Rural ratio | 0.20 | 0.420 |
| Average age | -0.06 | 0.530 |
| Male ratio | -0.17 | 0.086 |
| Ratio of ranchers | 0.06 | 0.556 |
| Ratio of contact history | 0.35 | <0.001 |
| Consumption of unpasteurized dairy | 0.33 | <0.001 |
| Average temperature (°C) | -0.33 | 0.001 |
| Wind speed (m/s) | 0.45 | <0.001 |
| Maximum temperature (°C) | 0.31 | 0.001 |
| Minimum temperature (°C) | 0.30 | 0.002 |
| Average wind speed (m/s) | 0.29 | 0.003 |
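
The Pearson correlation coefficients reported in Table 5 measure the linear association between each variable and the monthly case counts. A minimal sketch of the calculation follows; the function name and the example series are hypothetical and are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


# Hypothetical monthly values of a climatic variable and of case counts.
wind_speed = [1.0, 2.0, 3.0, 4.0]
cases = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(wind_speed, cases))  # about 0.8 for these made-up values
```
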
Table 6
Evaluation of the prediction models over the test set.

| Model | RMSE | MAE | MARE | R2 |
|---|---|---|---|---|
| Multilayer Perceptron networks | 0.22 | 0.18 | 0.01 | 1.00 |
| Random Forest | 9.25 | 7.64 | 0.33 | 0.01 |
| Support Vector Machine | 8.21 | 6.58 | 0.29 | 0.08 |

The residual graphs of the four methods are illustrated in Fig 4. The MLP method yielded smaller residuals; therefore, the performance of the MLP was better than that of the RBF, SVM and RF methods.

Moreover, Fig 5 uses scatter plots to compare the observed values with the estimated (forecasted) numbers of brucellosis cases resulting from the four methods. As can be seen, all the points fall in the first quadrant, which indicates that the estimated values are in agreement with the observed values. Moreover, the significance level of the fitted regression model was calculated for each of the four methods (MLP, RBF, RF and SVM) and was smaller than 0.001, which indicates a significant and valid agreement between the observed and forecasted values in the four models. Given the results in Fig 5, the slope of the regression line was closer to 1 in the MLP model than in the other three models, which, once again, indicates the better performance of this method.

Fig 5
Number of brucellosis cases observed and forecasted using four MLP, RBF, RF and SVM models in Qazvin Province.

The importance of the variables used in the MLP is shown in Fig 6; most climatic variables, particularly temperature and wind speed, influenced the number of brucellosis cases.

Fig 6
Climatic importance chart for forecasting the number of monthly brucellosis cases in Qazvin Province.

## Discussion

First, we discuss the descriptive epidemiologic results for brucellosis in Qazvin province between 2010 and 2018. The incidence of the disease was on average 27.43 per 100,000 persons over the 9 years of the study, which, according to Zeynali & Shirzadi’s classification, places the province among the highly infected regions (21–30 per 100,000). We must note, however, that reported statistics represent approximately 4 to 10 percent of the existing cases, a phenomenon that occurs even in developed countries. This happens because of the variety of clinical features, failure to visit a physician when clinical symptoms are mild, and incomplete registration and reporting [45–47]. Thus, we expect that the actual number of cases across the province is much higher than the official records show. Shoraka et al reported the incidence of brucellosis in North Khorasan’s Maneh and Samalghan counties at 25.2 and 38.6 per 100,000 persons, respectively, during 2008 and 2009 [48]. Farahani et al estimated the incidence rate in Arak at 60 cases per 100,000 persons during 2001–2010 [49]. In our study, the most frequently affected age group was the 25–35-year-old group. The disease was most prevalent in Qazvin’s rural areas and in men; thus, it was mostly seen in rural males whose main occupation was ranching and who were in contact with livestock. The high percentage of the disease in this age group may be explained by their heavy workload and their direct contact with livestock. The finding that males are more commonly affected than females is consistent with Farahani et al’s results in 2010 [50]. A similar study conducted abroad by Donno et al in 2010 also indicated a higher percentage of brucellosis among males (66.2%) [51]. However, the results of Zeynalian et al’s study in Esfahan state otherwise, i.e., that the disease is more common among females [52]. In industrial nations, brucellosis has more often been reported in slaughterhouse workers and butchers [53].
Here, the most frequently affected occupations were ranchers–farmers (40.2%) and housewives (30.7%). The high prevalence among the latter group may be explained by the fact that rural housewives very often work alongside their spouses in ranching and farming and are therefore in contact with livestock and dairy products, which exposes them to the risk of infection. In terms of occupation, the studies conducted by the Medical Universities of Semnan, Kordestan, Birjand and Lorestan reported the highest prevalence of brucellosis among housewives [54–56]. Moreover, the seasonal pattern indicated that the disease occurs most often in summer. Similarly, Esmail-nasab et al observed that brucellosis has a higher prevalence during the months of May, June and July [57]. In 2011, Hamzavi et al studied the prevalence of the disease in Kermanshah and found it to be more prevalent during the months of spring and in rural regions [58]. Elsewhere, in 2009, researchers observed that the prevalence of the disease reached 45 per 100,000 persons in East Azerbaijan and that it occurred more frequently during May and June [59]. Perhaps the higher prevalence of the disease during the warmer months of the year is due to the increased reproduction rate of livestock and greater contact with them. All the aforementioned results point towards one fact: although many countries have been reported as brucellosis free, the disease is still prevalent in Iran in spite of the considerable advances made in its control, and it remains a health problem, particularly in the western regions of the country and the outskirts of the Alborz mountain range, including Qazvin province [60]. Our results indicated that the prevalence of the disease is higher in the rural regions of Qazvin than in its urban regions, a finding which underscores the necessity of a greater focus on the control and prevention of brucellosis in this province, especially in its rural areas.
It seems that the habit of consuming local dairy products, which is regarded as an absolute must, together with the ranching occupation and contact with livestock among the people of this region, are the main reasons behind the relatively high prevalence of the disease. Given these findings, the residents of this province must be educated on the consumption of pasteurized dairy products. Another important point is collaboration between the University of Medical Sciences and the provincial Central Veterinary Office to encourage ranchers to vaccinate their livestock, which is essential for significantly reducing the prevalence of the disease. Moreover, lack of awareness of the disease is another major reason why it cannot be controlled. The people who are exposed to the disease, particularly rural residents and nomads, do not have adequate basic information about brucellosis; various studies across the country have shown low levels of awareness, knowledge and performance regarding this disease [61,62]. Furthermore, the highest incidence rates over the 9-year period were observed in Avaj (222.42 per 100,000 persons) and Takestan (42.63 per 100,000 persons), respectively; these rates are even higher than the provincial and national rates. Perhaps the rural nature of these two counties, as well as their adjacency to infected provinces such as Hamedan, contributes to this high prevalence.

The second part of this study deals with forecasting the incidence of brucellosis using machine learning data analysis methods and with comparing these methods in forecasting the incidence rate in Qazvin province for the years 2010–2018. The precise and timely detection of epidemics of infectious diseases plays an important role in their control and prevention. This can be supported through prevention strategies such as sensitizing physicians and raising their awareness of the rapid diagnosis of disease, correct treatment of patients, and delivery of health messages. Efficient statistical models of high precision can be useful tools for forecasting future infectious disease outbreaks [25]. The performance of statistical models depends on the time series data, and there is no single model that performs best in all cases. Therefore, it is very important to assess and compare the performance of various statistical methods, particularly machine learning–based methods, as this reveals important and applicable information about their strengths and weaknesses [63] and gives a better perspective on choosing forecast models. Theory–based machine learning models have exhibited good performance in different fields of science, including time series analysis. Based on the literature, theory–based machine learning methods are effective and efficient in health systems. These methods are naturally beneficial for forecasting in time series analyses of endemic diseases, as they are capable of modeling nonlinear relations and data complexities.

In this study, which was conducted on human brucellosis cases in Qazvin province between 2010 and 2018, a total of 3194 patients were detected, and the accuracy of four methods (MLP, RBF, RF and SVM) in modeling these data was compared. Based on our results, in comparison to the other three methods, the MLP method exhibited better performance in modeling the monthly changes of brucellosis and estimated a trend closer to the one observed. The trends forecasted by the RBF, RF and SVM models were very different from the one observed. A discrepancy between the observed and forecasted values will lead to misleading planning in the health system [64]. The values of monthly brucellosis cases estimated by MLP showed very good agreement with the values observed, whereas the values estimated by the other three methods (RBF, RF and SVM) did not. Since differences between the observed and estimated values can lead to errors in the healthcare system, this disagreement is of utmost importance. Based on the goodness-of-fit criteria (RMSE, MAE, MARE and R2) and on the graphs of the forecasted values, the MLP method forecasted the monthly number of brucellosis cases better than the other three models.
The MLP’s better performance, or, in other words, the smaller differences between its observed and forecasted values, may be attributed to the variables used in its modeling: historical data (values observed in the past 12 months) as forecasting variables; climatic parameters such as mean temperature, minimum temperature, maximum temperature, precipitation, wind speed and average wind speed; and other factors such as mean age, ratio of unpasteurized dairy product consumption, ratio of contact with livestock, males’ ratio, ranchers’ ratio, and rural ratio. Dissimilarity between the test and training data sets might severely affect a model and reduce its forecast power.
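The four goodness-of-fit criteria used in the comparison can be sketched as follows. This is a minimal illustration rather than the authors' code: MARE is assumed here to be the mean absolute relative error, R2 is assumed to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), and the example values are invented.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mare(obs, pred):
    """Assumed definition: mean absolute relative error (observations must be non-zero)."""
    return sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Assumed definition: coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Invented monthly case counts and forecasts, for illustration only.
observed = [30.0, 42.0, 55.0, 38.0]
forecast = [28.0, 45.0, 52.0, 40.0]
for name, fn in [("RMSE", rmse), ("MAE", mae), ("MARE", mare), ("R2", r_squared)]:
    print(name, round(fn(observed, forecast), 3))
```

Smaller RMSE, MAE and MARE and a larger R2 indicate closer agreement between observed and forecasted values, which is the sense in which the models are ranked above.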

Like many other studies, we too concluded that MLP performs better in estimating the monthly cases of brucellosis [65–67]. However, our results do not conform to those observed by Bayram et al., wherein RBF–based monthly brucellosis time series analysis performed better than the combination of RBF and KNN. Our results therefore showed that the MLP method can be effectively used in the monthly forecast of brucellosis. The MLP network is one of the most important artificial neural networks; it is normally formed of multiple layers, and the input signal is propagated through the network layer by layer. Given this structure, it has better generalizability in forecasting the output variable, which it achieves by identifying complicated temporal changes inside time series data [66]. Recently, studies in various countries have compared the performance of machine learning methods in forecasting health data. One of these is the study by Zhang et al. [19], in which the classic methods of ARIMA and exponential smoothing were compared to SVM, and SVM exhibited better performance. Guan et al. compared the performance of neural networks with classic statistical models in forecasting the incidence of hepatitis and showed that neural networks performed much better than classic statistical models [68]. In 2017, Oliveira et al. also compared a few data mining methods, including the K-nearest neighbor and MLP networks; of the methods employed, the MLP method was better than the rest [69]. Given that, to our knowledge, this study is the first of its kind in Qazvin province, we recommend that future studies compare the performance of other data analysis methods in the field of brucellosis and/or other diseases in this province.
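To make the layered structure of an MLP concrete, the sketch below runs one forward pass through a network with a single hidden layer. It is only an illustration: the tanh activation, the layer sizes and the weights are assumptions chosen for the example and do not describe the network fitted in this study.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> one tanh hidden layer -> one linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # The output unit is a weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical network with two inputs and two hidden units.
x = [0.5, -1.0]
w_hidden = [[1.0, 0.0], [0.0, 1.0]]
b_hidden = [0.0, 0.0]
w_out = [1.0, 1.0]
b_out = 0.5
print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```

In practice the weights are learned from the training data; the nonlinear hidden layer is what allows the network to represent the nonlinear relations mentioned above.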

Another objective of this research was to detect the climatic and other risk factors influencing brucellosis. Given the bacterial nature of the cause of this disease, environmental factors such as weather conditions and certain other influential factors can affect its occurrence. Thus, in addition to the 1–12 month lag variables, we used the following climatic data: average temperature, minimum temperature, maximum temperature, wind speed and average wind speed, and precipitation, together with other risk factors such as month, year, season, mean age, gender ratio, ratio of unpasteurized dairy product consumption, and ratio of contact with livestock. Their impact on the disease was then examined using the aforementioned methods. Based on the results of the MLP model, we found that temperature and wind were directly related to the incidence of brucellosis and were the most influential of the climatic parameters. Qazvin province is located in a cold mountainous area with lowlands, and ranching therefore thrives in this region. It appears that Qazvin’s climatic conditions significantly affect the incidence of this disease: when the temperature is suitable and the pastures are of good quality, the livestock thrive and reproduce more. In other words, it may be said that an average temperature of 15 degrees Centigrade can have the greatest effect among the climatic parameters one year later, meaning that this bacterium can remain alive in the environment for one year at this temperature in Qazvin. Undoubtedly, these bacteria survive for a shorter time at minimum and maximum temperatures, i.e. the incidence of the disease is lower during hot summers and cold winters, whereas the moderate climate of Qazvin during these two seasons aggravates the disease. High wind speeds reduce the incidence of this disease, because these bacteria survive for a shorter time in air.
An increase in air pressure aggravates the disease, as higher pressure indicates air stability, and it seems that the disease flourishes in a relatively stable climate and suitable temperatures.
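The 1–12 month lag variables mentioned above can be built from a monthly series as in the following sketch, which also holds out the last 20% of rows for testing. The function names and placeholder data are hypothetical, and the choice of a chronological (rather than random) 80/20 split is an assumption made for the illustration.

```python
def make_lagged_rows(series, max_lag=12):
    """Turn a monthly series into (features, target) pairs using lags 1..max_lag."""
    rows = []
    for t in range(max_lag, len(series)):
        # Lag k is the value observed k months before month t.
        features = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((features, series[t]))
    return rows


def chronological_split(rows, train_fraction=0.8):
    """Keep the earliest rows for training and the latest rows for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Placeholder values standing in for 9 years (108 months) of monthly case counts.
monthly_cases = list(range(1, 109))
rows = make_lagged_rows(monthly_cases)
train, test = chronological_split(rows)
print(len(rows), len(train), len(test))
```

Climatic covariates for the same month could be appended to each feature list in the same way; the first 12 months are lost because their lags are not available.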

Finally, there are various statistical models in medical sciences that can predict disease behavior. The Data Mining System for Infection Control Surveillance (DMSS) is one novel approach [70] for achieving this goal. Applying DMSS to health care data allows outbreaks to be predicted rapidly and accurately, which in turn supports timely and appropriate decisions by health policymakers and epidemiologists.

One of the limitations of this study is the limited length of the time series, which can partially reduce the forecast model’s performance. Another limitation is the lack of comparison between machine learning–based statistical methods and classic methods.

## Conclusion

Based on our results, the MLP artificial neural network model can be used for detecting changes in behavior of human brucellosis cases over time and based on changes in climatic parameters. Most climatic parameters were influential in the incidence of the disease, and the most influential one was temperature. Further studies on the practical application of time series models and detection of the best model for the control and prevention of brucellosis are warranted.

## Acknowledgements

We would hereby like to extend our gratitude to Qazvin University of Medical Sciences’ Head of Department of Disease Prevention and Control, Dr. Shiva Leghaee and her colleagues who helped in data extraction.

## References

1

H RM Hatami, H Eftekhar. Epidemiology and control of brucellosis. In: Comprehensive public health book. Tehran: Arjmand Press; 2004. pp. 1207–212.

2

M Namiduru, K Gungor, O Dikensoy, I Baydar, E Ekinci, I Karaoglan, et al. Epidemiological, clinical and laboratory features of brucellosis: a prospective evaluation of 120 adult patients. International journal of clinical practice. 2003;57(1):20–4.

3

JG Pérez-Rendón, JB Almenara, AM Rodríguez. The epidemiological characteristics of brucellosis in the primary health care district of Sierra de Cadiz. Atencion primaria. 1997;19(6):290–5.

4

Importance of zoonotic diseases in Iran. 2005.

5

JA Serra, PG Godoy. Incidence, etiology and epidemiology of brucellosis in a rural area of the province of Lleida. Revista espanola de salud publica. 2000;74(1):45–53.

6

E MG Young, J Bennett, R Dolin. Principles and practice of infectious diseases. 4th ed. New York: Churchill Livingstone; 1995.

7

M Minas, A Minas, K Gourgulianis, A Stournara. Epidemiological and clinical aspects of human brucellosis in Central Greece. Japanese journal of infectious diseases. 2007;60(6):362.

8

G Pappas, P Papadimitriou, N Akritidis, L Christou, EV Tsianos. The new global map of human brucellosis. The Lancet infectious diseases. 2006;6(2):91–9. doi: 10.1016/S1473-3099(06)70382-6

9

M Refai. Incidence and control of brucellosis in the Near East region. Veterinary microbiology. 2002;90(1–4):81–110. doi: 10.1016/s0378-1135(02)00248-1

10

M Sofian, A Aghakhani, AA Velayati, M Banifazl, A Eslamifar, A Ramezani. Risk factors for human brucellosis in Iran: a case–control study. International journal of infectious diseases. 2008;12(2):157–61. doi: 10.1016/j.ijid.2007.04.019

11

JJ McDermott, S Arimi. Brucellosis in sub-Saharan Africa: epidemiology, control and impact. Veterinary microbiology. 2002;90(1–4):111–34. doi: 10.1016/s0378-1135(02)00249-3

12

World Health Organization. Brucellosis. Fact sheet N173. Geneva, Switzerland: World Health Organization; 1997.

13

JD Radolf. Southwestern Internal Medicine Conference: brucellosis: don’t let it get your goat! The American journal of the medical sciences. 1994;307(1):64–75. doi: 10.1097/00000441-199401000-00012

14

S Purwar. Human brucellosis: a burden of half-million cases per year. Southern medical journal. 2007;100(11):1074. doi: 10.1097/SMJ.0b013e318157f6c5

15

H A-RM Samaha, RM Khoudair, HM Ashour. Multicenter study of brucellosis in Egypt. Emerg Infect Dis. 2008;14:1916–8.

16

M Moosazadeh, R Nikaeen, G Abedi, M Kheradmand, S Safiri. Epidemiological and clinical features of people with Malta fever in Iran: a systematic review and meta-analysis. Osong public health and research perspectives. 2016;7(3):157–67. doi: 10.1016/j.phrp.2016.04.009

17

R Mirnejad, FM Jazi, S Mostafaei, M Sedighi. Epidemiology of brucellosis in Iran: A comprehensive systematic review and meta-analysis study. Microbial pathogenesis. 2017;109:239–47. doi: 10.1016/j.micpath.2017.06.005

18

SM Alavi, ME Motlagh. A review of epidemiology, diagnosis and management of brucellosis for general physicians working in the Iranian health network. Jundishapur Journal of Microbiology. 2012;5(2):384.

19

X Zhang, T Zhang, AA Young, X Li. Applications and comparisons of four time series models in epidemiological surveillance data. PLoS One. 2014;9(2):e88075. doi: 10.1371/journal.pone.0088075

20

FF Nobre, ABS Monteiro, PR Telles, GD Williamson. Dynamic linear model and SARIMA: a comparison of their forecasting performance in epidemiology. Statistics in medicine. 2001;20(20):3051–69. doi: 10.1002/sim.963

21

C Farrington, N Andrews. Outbreak Detection: Application to Infectious Disease Surveillance. In: R Brookmeyer, D Stroup, editors. Monitoring the Health of Persons. Oxford University Press; 2003.

22

D Chadwick, B Arch, A Wilder-Smith, N Paton. Distinguishing dengue fever from other infections on the basis of simple clinical and laboratory features: application of logistic regression analysis. Journal of Clinical Virology. 2006;35(2):147–53. doi: 10.1016/j.jcv.2005.06.002

23

G González-Parra, AJ Arenas, L Jódar. Piecewise finite series solutions of seasonal diseases models using multistage Adomian method. Communications in Nonlinear Science and Numerical Simulation. 2009;14(11):3967–77.

24

M Spaeder, JC Fackler. A multi-tiered time-series modelling approach to forecasting respiratory syncytial virus incidence at the local level. Epidemiology & Infection. 2012;140(4):602–7.

25

L Tapak, O Hamidi, M Fathian, M Karami. Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran. BMC research notes. 2019;12(1):353. doi: 10.1186/s13104-019-4393-y

26

C-C Chang. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2:27:1–27:27. http://www.csie.ntu.edu.tw/~cjlin/libsvm

27

SS Baboo, IK Shereef. An efficient weather forecasting system using artificial neural network. International journal of environmental science and development. 2010;1(4):321.

28

N Shirmohammadi-Khorram, L Tapak, O Hamidi, Z Maryanaji. A comparison of three data mining time series models in prediction of monthly brucellosis surveillance data. Zoonoses and public health. 2019;66(7):759–72. doi: 10.1111/zph.12622

29

C-H Wu, J-M Ho, D-T Lee. Travel-time prediction with support vector regression. IEEE transactions on intelligent transportation systems. 2004;5(4):276–81.

30

P-F Pai, C-S Lin. Using support vector machines to forecast the production values of the machinery industry in Taiwan. The International Journal of Advanced Manufacturing Technology. 2005;27(1–2):205.

31

W-C Hong, P-F Pai. Predicting engine reliability by support vector machines. The International Journal of Advanced Manufacturing Technology. 2006;28(1–2):154–61.

32

Müller K-R, Smola AJ, Rätsch G, Schölkopf B, Kohlmorgen J, Vapnik V, editors. Predicting time series with support vector machines. International Conference on Artificial Neural Networks; 1997: Springer.

33

FE Tay, L Cao. Modified support vector machines in financial time series forecasting. Neurocomputing. 2002;48(1–4):847–61.

34

P Eini, F Keramat, M Hasanzadehhoseinabadi. Epidemiologic, clinical and laboratory findings of patients with brucellosis in Hamadan, west of Iran. Journal of research in health sciences. 2012;12(2):105–8.

35

M Zeinali, M Shirzadi, J Sharifian. National guideline for Brucellosis control. Tehran: Ministry of Health and Medical Education; 2009. pp. 10–7.

36

T Hastie, R Tibshirani, J Friedman. The elements of statistical learning. Springer series in statistics. Springer; 2001.

37

M Tominola, M Tynkkynen, J Lemmetty, P Harstel, L Sikanen. Estimating the Characteristics of a Marked Stand Using k-Nearest-Neighbour Regression. Journal of Forest Engineering. 1999;10(2):75–81.

38

H Wu, Y Cai, Y Wu, R Zhong, Q Li, J Zheng, et al. Time series analysis of weekly influenza-like illness rate using a one-year period of factors in random forest regression. Bioscience trends. 2017.

39

S Bayram, ME Ocal, E Laptali Oral, CD Atis. Comparison of multi layer perceptron (MLP) and radial basis function (RBF) for construction cost estimation: the case of Turkey. Journal of Civil Engineering and Management. 2016;22(4):480–90.

40

L Tapak, H Mahjub, O Hamidi, J Poorolajal. Real-data comparison of data mining methods in prediction of diabetes in Iran. Healthcare informatics research. 2013;19(3):177–85. doi: 10.4258/hir.2013.19.3.177

41

Segal MR. Machine learning benchmarks and random forest regression. 2004.

42

A Liaw, M Wiener. Classification and regression by randomForest. R news. 2002;2(3):18–22.

43

G James, D Witten, T Hastie, R Tibshirani. An introduction to statistical learning: Springer; 2013.

44

H Yoon, S-C Jun, Y Hyun, G-O Bae, K-K Lee. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. Journal of Hydrology. 2011;396(1–2):128–38.

45

SS Long, LK Pickering, CG Prober. Principles and practice of pediatric infectious disease: Elsevier Health Sciences; 2012.

46

GT Fosgate, TE Carpenter, BB Chomel, JT Case, EE DeBess, KF Reilly. Time-space clustering of human brucellosis, California, 1973–1992. Emerging infectious diseases. 2002;8(7):672. doi: 10.3201/eid0807.010351

47

A Zemestani, N Faghiri-Beirami, A Hosseinzadeh-Fasaghandis, R Hashemi-Aghdam, A Ebrahimzadeh. Descriptive Epidemiology of Human Brucellosis in Oskou County. Depiction of Health. 2016;7(1):34–42.

48

H HH Shoraka, A Sofizadeh, et al. Epidemiological study of brucellosis in Mane & Samalghan, North Khorasan province, 2008–2009. North Khorasan MUJ. 2010;2(3):67–8.

49

Farahani S, Shahmohamadi S, Navidi I, Sofian S. An investigation of the epidemiology of brucellosis in Arak City, Iran (2001–2010). 2012.

50

Taheri Sudjani MHL Mohammad; Capricorn great; Raisi Ahmad; Mohammadzadeh Morteza. Epidemiology of brucellosis in Shahrekord city. Journal of Jahrom University of Medical Sciences. 2016;14(1):1–7.

51

D Donev, Z Karadzovski, B Kasapinov, V Lazarevik. Epidemiological and public health aspects of brucellosis in the Republic of Macedonia. Prilozi. 2010;31(1):33–54.

52

MZ Dastjerdi, RF Nobari, J Ramazanpour. Epidemiological features of human brucellosis in central Iran, 2006–2011. Public health. 2012;126(12):1058–62. doi: 10.1016/j.puhe.2012.07.001

53

EJ Young. Brucella species. Principles and practice of infectious diseases. 2000:2386–91.

54

A Tohme, A Hammoud, M Germanos-Haddad, E Ghayad. Human brucellosis. Retrospective studies of 63 cases in Lebanon. Presse medicale (Paris, France: 1983). 2001;30(27):1339–43.

55

Shaikh S GR, Ghajarbaigi P. Epidemiological Study of brucellosis in Qazvin province. Proceeding of 2th National Iranian Congress on brucellosis. 2007; Shahid Beheshti University of Medical Sciences:267–9.

56

Moradi GH KS, Sofimajidpur M, Ghaderi A, Gharibi F. Epidemiological study of brucellosis in Kurdistan province. Proceeding of 2nd National Iranian Congress on brucellosis. 2007:151–2.

57

N BN Esmail Nasab, E Ghaderi, et al. Epidemiology of brucellosis in Kurdistan Province 2006. Azad Univ. 2007;1(3):53–8.

58

Y KN Hamzavi, M Ghazizadeh. Epidemiological study of brucellosis in Kermanshah province in 2011. J Kermanshah. 18(2):114–21.

59

A AS Soleymani, M Seyf, et al. Descriptive epidemiology of brucellosis in the province from the year 2005 to 2008. Tabriz J. 2012;3(4):64–9.

60

A Z. Theoretical overview on human brucellosis. Proceedings of the 2nd National Iranian Congress on Brucellosis. 2007 May 19–21; Tehran, Iran:47–74.

61

M Sofian, A-A Velayati, A Aghakhani, W McFarland, A-A Farazi, M Banifazl, et al. Comparison of two durations of triple-drug therapy in patients with uncomplicated brucellosis: A randomized controlled trial. Scandinavian journal of infectious diseases. 2014;46(8):573–7. doi: 10.3109/00365548.2014.918275

62

S BA Mahmudabad, MD Nabizadeh, J Ayatollahi. The Effect of Health Education on Knowledge, Attitude and Practice (KAP) of High School Students' Towards Brucellosis in Yazd. World Applied Sciences Journal. 2008;5:522–4.

63

M Karami. Validity of evaluation approaches for outbreak detection methods in syndromic surveillance systems. Iranian journal of public health. 2012;41(11):102–3.

64

X Zhang, T Zhang, J Pei, Y Liu, X Li, P Medrano-Gracia. Time series modelling of syphilis incidence in China from 2005 to 2012. PLoS One. 2016;11(2):e0149401. doi: 10.1371/journal.pone.0149401

65

M Ture, I Kurt. Comparison of four different time series methods to forecast hepatitis A virus infection. Expert Systems with Applications. 2006;31(1):41–6.

66

H Memarian, SK Balasundram. Comparison between multi-layer perceptron and radial basis function networks for sediment load estimation in a tropical watershed. Journal of Water Resource and Protection. 2012;4(10):870.

67

L Tapak, N Shirmohammadi-Khorram, O Hamidi, Z Maryanaji. Predicting the frequency of human brucellosis using climatic indices by three data mining techniques of radial basis function, multilayer perceptron and nearest neighbor: A comparative study. Iranian Journal of Epidemiology. 2018;14(2):153–65.

68

P Guan, D-S Huang, B-S Zhou. Forecasting model for the incidence of hepatitis A based on artificial neural network. World journal of gastroenterology: WJG. 2004;10(24):3579. doi: 10.3748/wjg.v10.i24.3579

69

A Oliveira, BM Faria, AR Gaio, LP Reis. Data mining in HIV-AIDS surveillance system. Journal of medical systems. 2017;41(4):51. doi: 10.1007/s10916-017-0697-4

70

SE Brossette, AP Sprague, WT Jones, SA Moser. A data mining system for infection control surveillance. Methods of information in medicine. 2000;39(04/05):303–10.

25 Feb 2020

PONE-D-20-00374

Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019

PLOS ONE

Dear Dr Cheraghi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Reviewers' comments are listed to perform the required changes before acceptance of your work.

We would appreciate receiving your revised manuscript by Apr 10 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Esteban Tlelo-Cuautle, Ph.D

PLOS ONE

You are encouraged to attend reviewers' comments to improve the impact of your work.

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Cover letter of your manuscript:

"This study was funded by the Vice-Chancellor of Research and Technology of Hamadan

University of Medical Sciences. The funders had no role in study design, data collection and analysis,

decision to publish, or preparation of the manuscript."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"None."

3. Thank you for stating the following in your Competing Interests section:

"None declared."

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

This information should be included in your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. Please amend your authorship list in your manuscript file to include author Hadi Bagheri, Leili Tapak, Manoochehr Karami, Zahra Hasankhani, Hamidreza Najari, Safdar Karimi

6. Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

7. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 5 in your text; if accepted, production will need this reference to link the reader to the Table.

8. Please include a copy of Table 10 which you refer to in your text on page 7.


Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Reviewer #1: Dear Editor

Bests,

The authors compared the performance of four machine learning methods in forecasting human brucellosis. Generally the subject is interesting and the manuscript is well written. In my concern, there are some minor issues with the manuscript as follows:

Are there other disease that could look like the case definition? How sensitive and specific to Brucellosis is that definition.

Do the patients have to have all of the signs at one time?

Do the authors mean random forest by “random accumulation”? Please correct them as random accumulation is not the usual term.

According to golden rules of reporting the numbers (BMJ publication) the numbers under 10, must be presented in letters!

The majority of the references (References: 5, 6, 20 and 22) are not up to date.

All formulas must be numbered.

Reviewer #2: Overall, I think this manuscript is well written. Just a few suggestions for the result section.

1. Table 1 and Table 2, add p values to compare the incidence between each characteristics.

2. Add a time series correlation matrix or plot to show the correlation between the characteristics in Table 4 and the time series of brucellosis.

3 Explain in detail how you optimize the parameters in your four machine learning models. For example, how did you choose the number of hidden layers for neural network. Please also list those parameters for the other models, and explain how did you determine these parameters.

4 In the discussion section, please include more discussion how to apply these models in real infectious disease surveillance.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: Yes: Xingyu Zhang

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

30 Mar 2020

Dear Editor,

Thank you for your constructive and valuable comments. Below we have provided a point-by-point response to all the comments.

The authors compared the performance of four machine learning methods in forecasting human brucellosis. Generally, the subject is interesting and the manuscript is well written. In my view, there are some minor issues with the manuscript, as follows:

Are there other diseases that could match the case definition? How sensitive and specific to brucellosis is that definition?

No, there are not. Although the clinical and epidemiological symptoms of brucellosis (fever, myalgia, and paraclinical symptoms) are very non-specific and may resemble those of several diseases, a definitive case in our study was defined by positive results on two routine laboratory tests for brucellosis: Wright's test (a diagnostic test for brucellosis; values greater than 1.8 indicate the presence of infection) and the 2ME test (Mercaptoethanol Brucella agglutination test; a confirmatory test for brucellosis, in which values greater than or equal to 1.4 indicate the presence of infection). Please refer to page 13, lines 132-35.

Do the patients have to have all of the signs at one time?

Not necessarily. Because this is a prospective cohort study, people may be affected by brucellosis at various time points.

Do the authors mean random forest by “random accumulation”? Please correct them as random accumulation is not the usual term.

Thanks, revised.

According to the golden rules for reporting numbers (BMJ publication), numbers under 10 must be written out in words!

Thanks, revised.

The majority of the references (References: 5, 6, 20 and 22) are not up to date.

Thanks, revised.

All formulas must be numbered.

Thanks, revised. Pages 5-6

Reviewer #2: Overall, I think this manuscript is well written. Just a few suggestions for the result section.

1. In Table 1 and Table 2, add p-values to compare the incidence across the characteristics.

Thanks, revised. Please see Tables 1 and 2.

2. Add a time series correlation matrix or plot to show the correlation between the characteristics in Table 4 and the time series of brucellosis.

3. Explain in detail how you optimized the parameters in your four machine learning models. For example, how did you choose the number of hidden layers for the neural network? Please also list the parameters for the other models and explain how you determined them.

Edited. Page 8, lines 198-210, as follows:

To implement the models, the variables in Table 1 as well as the climatic variables of wind speed (m/s) and temperature (°C) were used as predictors, and the number of brucellosis cases observed was used as the output….

4. In the discussion section, please include more discussion of how to apply these models in real infectious disease surveillance.

Thanks, revised, as follows: page 13, lines 372-75

24 Apr 2020

Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019

PONE-D-20-00374R1

Dear Dr. Cheraghi,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Esteban Tlelo-Cuautle, Ph.D

PLOS ONE

The updated manuscript is fine to be accepted

Reviewer's Responses to Questions

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: Dear Editor

There are no comments for this manuscript, and the authors provided appropriate answers.

The decision is to accept.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: No

28 Apr 2020

PONE-D-20-00374R1

Forecasting the monthly incidence rate of brucellosis in west of Iran using time series and data mining from 2010 to 2019

Dear Dr. Cheraghi:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Esteban Tlelo-Cuautle

PLOS ONE
