PLoS ONE
Public Library of Science
Genetic algorithm-based personalized models of human cardiac action potential
Volume: 15, Issue: 5
DOI 10.1371/journal.pone.0231695
•
•
•
• Altmetric

### Notes

Abstract

We present a novel modification of genetic algorithm (GA) which determines personalized parameters of cardiomyocyte electrophysiology model based on set of experimental human action potential (AP) recorded at different heart rates. In order to find the steady state solution, the optimized algorithm performs simultaneous search in the parametric and slow variables spaces. We demonstrate that several GA modifications are required for effective convergence. Firstly, we used Cauchy mutation along a random direction in the parametric space. Secondly, relatively large number of elite organisms (6–10% of the population passed on to new generation) was required for effective convergence. Test runs with synthetic AP as input data indicate that algorithm error is low for high amplitude ionic currents (1.6±1.6% for IKr, 3.2±3.5% for IK1, 3.9±3.5% for INa, 8.2±6.3% for ICaL). Experimental signal-to-noise ratio above 28 dB was required for high quality GA performance. GA was validated against optical mapping recordings of human ventricular AP and mRNA expression profile of donor hearts. In particular, GA output parameters were rescaled proportionally to mRNA levels ratio between patients. We have demonstrated that mRNA-based models predict the AP waveform dependence on heart rate with high precision. The latter also provides a novel technique of model personalization that makes it possible to map gene expression profile to cardiac function.

Smirnov, Pikunov, Syunyaev, Deviatiiarov, Gusev, Aras, Gams, Koppel, Efimov, and Rodríguez: Genetic algorithm-based personalized models of human cardiac action potential

## Introduction

Over the past half century, mathematical models of cardiac electrophysiology came a long way in terms of complexity, precision, and the area of application. Recent advances in computational cardiac electrophysiology make clinical application of computer models possible due to personalization of tissue geometry and fibers orientation [1]. However, while tissue-specific, person-specific and pathology-specific gene expression profiles affect AP waveform and propagation, these differences are usually not accounted for. The tissue-level simulations are usually based upon the same averaged single-cell elements of the model.

A number of publications utilized genetic algorithms (GA) to determine a set of cell model parameters reproducing experimental AP [25]. GA apply evolutionary principles to computational models aiming to find the optimal solution fitting experimental data. Initially, a number of model “organisms” with random parameter values are generated. After that, the “selection” operator is applied to the first generation of models, passing the models with higher values of fitness function to the “mating pool”. Usually, a fitness function is based on the Euclidean distance, as a squared difference between model and experiment. Mutation and crossover operators are then applied to the models in the mating pool. The former modifies model parameters according to a probability distribution function. In the simplest GA setting, the crossover operator exchanges the parameter values between organisms with a fixed probability, however more complicated modifications of crossover operators, such as Simulated Binary Crossover (SBX) were shown to improve algorithm performance [6]. Modified models are then passed to the next generation, new fitness function values are calculated, and the same set of genetic operators is applied iteratively until desirable goodness of fit is reached.

An obvious advantage of GAs is that these algorithms are perfectly parallel, making its parallelization straightforward: the slowest part of the algorithm, fitness function, could be performed independently for different organisms, while communication between tasks is limited to a small array of parameters. However, effective implementation of GA requires modification of genetic operators for each particular set of problems, which is often referred to as “no free lunch theorem ” [7]. One of the goals of the current study was to develop robust GA implementation making it possible to find the set of cardiac electrophysiology model parameters without premature convergence to sub-optimal solution.

Another limitation of optimization algorithms as applied to electrophysiological models is the absence of a unique solution. As was noted previously [810] same AP waveforms could be reproduced by computer models with different sets of parameters, in other words, model parameters are unidentifiable from the AP. Also, techniques combining stochastic pacing and complicated voltage-clamp protocols have been recently proposed to overcome the problem [4,8]. However, these techniques are limited to single-cell voltage-clamp recordings, which are not feasible in clinical electrophysiology or whole heart and cardiac tissue measurements. The aim of the current study was to develop a technique that would allow finding a unique solution using optical, microelectrode, or monophasic AP recordings from cardiac tissue or whole heart. To the best of our knowledge, this is a first study providing a technique suitable for optical recordings, where only normalized AP waveform is known, but not the exact transmembrane potential values. Arbitrary rescaling and shift of input AP waveform introduces a new dimension to parameters identifiability problem mentioned above. A possible approach to address this problem is to utilize so-called restitution property, which is AP dependence on heart rate or pacing cycle length (PCL). For example, a reduction of the sodium current would result not only in the reduction of the amplitude of AP, but also in change of the steady state intracellular ionic concentrations and consequently, in changes of the restitution curve. Several previous publications [2, 5] utilized restitution property for optimization of GA-based cardiac models. For example, Syed et. al. [2] have used atrial AP waveform at several PCLs as input for GA and paced every organism for 10 seconds before fitness function evaluation. As we demonstrate below in the Final Algorithm subsection of the Results section, this approach results in a poor convergence. We have identified two reasons behind this fact: firstly, intracellular concentrations require much more than 10 seconds to converge to a steady state; secondly, a model with the particular set of parameters may converge to different steady states depending on the initial conditions. In order to address these issues, we implemented a modification of GA allowing optimizing parameters and searching for steady state in the slow variables space simultaneously. After each short simulation, variables are saved, modified by genetic operators and reused as initial states for a new generation.

Finally, we verified the algorithm against the experimental optical AP recordings from the human heart. Since we could not measure ionic channel conductivities directly, we used the following assumption instead (Fig 1B): these conductivities should be proportional to corresponding proteins mRNA level of expression as measured by either Cap Analysis of Gene Expression (CAGE) [11] or RNA-seq. Thus, given that GA output model parameters represent actual ionic channels conductivities, the model rescaled in correspondence with mRNA expression profile differences between two patients, would reproduce AP restitution properties of both patients. Moreover, we have to note that this approach (i.e. combining GA with transcription profile) could be regarded as another technique of model personalization. As we show below, GA signal-to-noise ratio (SNR) requirements are rather strict and hard to accomplish in a clinical setting, while mRNA expression profile is possible to measure from tissue biopsy.

Fig 1
(A ) Genetic algorithm schematic diagram. Initially a set of organisms is generated, each of which is determined by a random vector of scaling factors for optimized model parameters (step 1). For each organism AP waveforms are calculated at several pacing frequencies (cycle lengths) and compared with the input APs (steps 3, 4). Organisms with the lowest Root Mean Square Error (RMSE) value are saved and directly copied into the subsequent generation, replacing the worst organisms (orange arrow). State vectors (intracellular concentration, gating variables etc) are also saved after each short simulation and reused as initial state during simulation in the next generation. After selection (step 6) the fittest organisms form the mating pool are modified by SBX crossover and Cauchy mutation (step 7, 8). Modified organisms move to the next generation (9). Process of AP and fitness function calculation, selection, crossover and mutation and elite replacement is repeated until the stop criterion is fulfilled. The algorithm is based on [3], modifications of the original algorithm are highlighted by green. (B) Algorithm verification with molecular (mRNA expression) and functional data (optical mapping). Patient 1 model parameters (MP1) were determined by GA. Patient 1 parameters (MP1) were rescaled proportional to Patient 2/Patient 1 mRNA expression level ratio (CAGE2/ CAGE1) and verified against functional data.Genetic Algorithm (GA) and CAGE-based personalization block diagrams.

## Materials and methods

### Computer simulations

We used O’Hara-Rudy model [12] to simulate human ventricular cell electrophysiology. The genetic algorithm (GA) is based on Bot et al. [3] (Fig 2A). Briefly: tournament selection was used for selection operator, i.e. two individuals are selected at random from two copies of the previous generation and the one with higher fitness function goes into the mating pool. Then random organisms are selected from the mating pool to modify their parameter by crossover (with a 0.9 probability) and mutation (with a 0.1 probability in the original algorithm, 0.9 probability in the modified algorithm) operators. Simulated binary crossover (SBX) [6] was used with polynomial probability density function (PDF) of 10th order and 0.5 genewise swap probability. Polynomial mutation with order 20 PDF was used in the original algorithm (see below the modifications to mutation operator, that we used in this study). After that, worst organisms in the mating pool are replaced by elite organisms (i.e. best organisms from the previous generation) organisms and the same sequence is repeated for the next generation.

Fig 2
(A,B) The single-cell O’Hara-Rudy model was paced at 1Hz frequency for 1000 s, from different initial intracellular concentrations (initial concentrations are listed on panel A). AP waveform on the 1st, 10th and 1000th beats are depicted by dotted, dashed and solid lines correspondingly. (C, D) [Na+]i and [Ca2+]nsr concentrations change during the simulations. (E, F) Box-and-whiskers plots depict GA output [Na+]i and [Ca2+]nsr concentrations distance from steady state values.Model convergence to steady state.

We introduced modifications to the algorithm (green-tinted boxes on Fig 1A) as described below, while their beneficial effects on the algorithm are discussed in the “Results” section.

• Input data. is steady state AP waveforms recorded at several PCL. The algorithm is optimized for optical AP recordings, where absolute transmembrane potential (TP) values are not known, and thus input data (both synthetic and experimental) is renormalized by the algorithm prior to every fitness function calculation. The following technique was used for renormalization. Firstly, input AP waveform is shifted along the time axis in order to superimpose compared waveforms; in particular, half-maximum depolarization of the waveforms to be compared is used as a reference point. After that, input AP is rescaled: Vrescaled = αV+β, where α and β coefficients are determined by the least-squares technique to minimize the deviation between input and output AP. In order to discard subthreshold depolarizations some large error value was assigned to low-amplitude APs (see below “Fitness function” subsection).
• AP calculations. O’Hara-Rudy [12] model was simulated with a custom C++ code using the Rush-Larsen integration technique [13] with an adaptive step as described in [12]. The minimal time step was set to 5e-3 ms.

Since cell-to-cell interactions affect AP waveform (S1A Fig), we simulated 1D tissue instead of a single cell when GA was applied to experimental data recorded from tissue. On the other hand, our simulations have shown that within the physiological range of conduction velocities (CV), 20–100 cm/s, exact gap junction conductivity value does not affect AP (S1B Fig). Therefore, during a GA run each model was simulated as 1D-tissue with 5 mS/μF conductivity between cells resulting in CV of 27 cm/s. The AP was recorded from the central cell of 30-cells long 1D tissue. We have found 30-cells long tissue to be sufficient to exclude boundary effects on the central cell of the tissue in case of CV slower than 27 cm/s (S1A Fig). 1D model simulations, while being less computationally expensive than 2D or 3D models, correspond to a plane wave propagating in a 3D tissue at a significant distance from the pacing electrode. Moreover, given that in a wide range of conductivities (S1B Fig) exact AP waveform was essentially independent from gap junctions conductivity, we can conclude that in case of a minor 2D or 3D wavefront curvature, additional perturbations by diffusion operator would not affect AP waveform as well.

• Fitness function. We used a Root-Mean-Square Error (RMSE) to evaluate how close is a given organism AP to input data at particular PCL:
$RMS{E}^{i}=\sqrt{\frac{1}{N}\sum _{t=0}^{N}{\left[{V}_{ref}^{i}\left(t\right)-{V}_{\text{mod}}^{i}\left(t\right)\right]}^{2}},$
where ${V}_{ref}^{i}$ is a baseline TP, ${V}_{\text{mod}}^{i}$- is a TP of simulated AP, N—is a number of samples. The fitness function was calculated as a weighted sum of SDi corresponding to different PCLs:
$RMS{E}_{tot}=\sum _{i=0}^{n}{w}_{i}\cdot RMS{E}^{i}$
where wi is a weight coefficient. Weights were taken equal for all PCLs unless otherwise noted. In order to eliminate subthreshold depolarizations, APs with an amplitude below 30 mV were discarded, i.e . large RMSE value was assigned to these organisms. Since photon scattering in the optical mapping setting is known to distort depolarization [14,15], initial depolarization phase (below -20 mV) was removed from compared AP prior to RMSE calculation.
• Save state vector. Computational cost of pacing every organism at every generation until reaching steady state during a GA run is prohibitively high. Thus, we saved each AP after a short simulation. We used 9 stimulations before fitness function evaluation since we have found odd number of stimulations helpful to exclude possible 1:1 alternans in the output model: in the case of alternating APs, the waveform (and, consequently, RMSE as well) was different every other generation. Thus, if alternating AP have a low RMSE value on generation N, the RMSE is going to increase on generation N+1. After that state vector (ionic concentrations, gating variables, etc.) are saved for each organism. These state vectors are used as initial state in the next GA generation. As a result, each organism approaches closer to steady state variables with every generation.
• “Elitism strategy”. Since genetic operators tend to spoil a “good” solution, the best organism is passed to the next generation without any changes replacing the worst [16]. More precisely: elite organisms do participate in mutation and crossover as usual, but an “unspoiled” copy is saved to replace the worst organisms in every generation. However, final state of elite organisms on generation N are still reused as initial state on generation N+1, thus slow variables get closer to steady state, while AP waveform and RMSE changes correspondingly (see “save state variables” above). We have found that using a high number of elite organisms (about 6–10% of the whole population) is optimal for our GA modification.
• Cauchy mutation.
• As was noted previously [17] Cauchy mutation in general results in better convergence for functions with many local minima. Therefore, we modified the mutation operator to use Cauchy distribution:
${f}_{X}\left(x\right)=\frac{c}{\pi }\left[\frac{\gamma }{{\left(x-{x}_{0}\right)}^{2}+{\gamma }^{2}}\right]$
where fX is a probability density of the distribution; x0 corresponds to unmutated parameter value; γ = 0.18*xO′Hara−Rudy is the half-width at half-maximum of the distribution (i.e. PDF is half the maximum value, when xx0 = γ); $\frac{c}{\pi }$ is a normalization constant resulting from a limited range of parameter values, varied between 0.01 · xO′Hara−Rudy and 4.0 · xO′Hara−Rudy; xO′Hara−Rudy is original O’Hara-Rudy model parameter value. The particular value of γ was chosen since test runs indicated best algorithm convergence in this case (see S3A and S3B Fig).
• Most commonly in genetic algorithms mutation operator is applied to each parameter separately with some fixed probability [18], we have found that it has adverse effects on algorithm convergence, because of the small probability to mutate several parameters simultaneously. Instead, we choose a random unit vector in multi-dimensional parameter-space and mutate the parameter vector in this direction. For example, in the case of the two-parameter problem: if ($\frac{\sqrt{2}}{2}$,$\frac{\sqrt{2}}{2}$) unit vector was chosen, then both parameters are going to be increased by the same amount after mutation. See also Fig 3 and the corresponding Results section.
• The initial values of the slow variables at each PCL (intracellular Na+ and network sarcoplasmic reticulum Ca2+ concentrations) are included in parameters vector and mutated as usual model parameters. The [Ca2+]NSR concentration was chosen, because significant amount of calcium is stored within SR when the cell is at resting potential. We did not include potassium concentrations in the optimized parameters vector, since 10 mM [K+]i concentration changes results only in approximately 2% Nernst potential change, and thus a major concentration changes have a minor immediate effect on AP waveform. This technique allows the algorithm to reach the model steady state more effectively (see corresponding part in the Results section). Note that intracellular concentrations are different for different PCL, consequently, there are separate values for each pacing frequency. Thus, when input data included 4 pacing frequencies the full set of parameters was: gNa, gKr, gK1, gKs, PCaL, gto, gNaK, gNCX, gpCa, Jrel, Jup, CMDN, CaMKII, [Na+i]300 ms, [Na+i]500 ms, [Na+i]1000 ms, [Na+i]2000 ms, [Ca2+NSR]300 ms, [Ca2+NSR]500 ms, [Ca2+NSR]1000 ms, [Ca2+NSR]2000 ms.
Fig 3
(A) Vector mutation: random mutation direction choice, blue points indicate parameter values after mutation. (B) Point mutation: each parameter value is mutated with fixed probability, which results in low probability to mutate parameter value in diagonal direction. (C) RMSE averaged over all organisms of 9 GA runs plotted against generation number. Red and blue lines correspond to point and vector mutations, respectively. (D, E) Objective parameters distribution for 9 GA runs with point mutation (red boxes) and vector mutation (blue boxes). Dashed line corresponds to the input model parameter values.Random mutation direction in multi-dimensional parameter space.

In order to verify the algorithm precision against synthetic data, simulated APs with an arbitrary set of model parameters were used as input data. The input model was paced until reaching steady state (1000 seconds) at several PCLs: 217 ms, 225 ms, 250 ms, 300 ms, 500 ms, 1000 ms, 2000 ms, unless noted otherwise.

### Multidimensional data visualization

• Principal Component Analysis technique was used to visualize convergence in multidimensional parametric space (note, that slow variables where not used for PCA analysis). Organisms parameters of p compared generations form a matrix X of size m×np, where m is the number of organisms and n is the number of optimized parameters. According to the principal component method, matrix X decomposed into a multiplication of two matrices T (scores matrix) and P (loading matrix) plus residual matrix E:
$X=T{P}^{T}+E={t}_{1}{p}_{1}^{T}+{t}_{2}{p}_{2}^{T}+E,$
where pi, (i = 1, 2) correspond to loading matrix rows, columns ti, (i = 1, 2) of matrix T are Principal Components (PC) being the organisms parameters in the new coordinate system. P is a transformation matrix from initial variables space to 2D space of principal components. The first two principal components explained 75% of parameters variance in the Fig 4 and S2 Fig.

In order to estimate organism parameters convergence in the principal components space we used Mean Cluster Error (MCE) and Standard Distance (SDist) metrics (S2A Fig):

• Mean Cluster Error represent the distance between (x0, y0) point corresponding to precise solution (input model value) and (xc, yc) corresponding to cluster geometric center calculated within a single generation:
$MCE=\sqrt{{\left({x}_{0}-{x}_{c}\right)}^{2}+{\left({y}_{0}-{y}_{c}\right)}^{2}}$
• Standard Distance was used to estimate cluster size:
$SDist=\sqrt{\frac{\sum _{i=1}^{n}{\left({x}_{i}-{x}_{c}\right)}^{2}}{n}+\frac{\sum _{i=1}^{n}{\left({y}_{i}-{y}_{c}\right)}^{2}}{n}},$
Where xi and yi are principal components of a given organism, (xc, yc) is the cluster geometric center and n is the total number of organisms within a generation.

Fig 4
Principal component analysis of convergence dependence on the number of elite organisms: 0% (red points), 0.3% (blue points), 3.3% (green points) and 6.6% (purple points). Higher number of elite organisms results in the faster clusterization around the input model parameters.Modified elitism strategy.

### Donor heart procurement

All studies using human heart tissue were approved by the Institutional Review Board (Office of Human Research) of the George Washington University. In total, for this study, we procured from Washington Regional Transplant Community in Washington, DC discarded ventricular tissues from 14 deidentified donor human hearts, which were unsuitable for transplantation. All hearts were arrested using the ice-cold cardioplegic solution in the operating room and tissue was transported to the laboratory for dissection and electrophysiological experiments.

For subsequent CAGE mRNA analysis left ventricular tissue samples were dissected, submerged in RNAlater (Invitrogen) for 24 hours at 4°C, and stored at -80° C until the extraction of RNA. Tissue for RNA-Seq analysis was collected from the right ventricular (RV) tissue close to the apical region. Total RNA was extracted from these samples using the RNeasy Fibrous Tissue Mini Kit (Qiagen) according to the manufacturer’s instructions, from approximately 30 mg samples. At the end of the extraction, spin columns were eluted with the eluate to increase the RNA yield. Total RNA concentration and purity were determined on an Eppendorf BioPhotometer D30. Acceptable purity, as quantified by both the 260/280 and 260/230 absorbance ratios, was between 1.8 and 2.2. CAGE samples were also evaluated for integrity by electrophoresis on 1.2% agarose gels with 0.25 μg/ml ethidium bromide. RNA-seq samples were evaluated for integrity by Agilent Bioanalyzer, samples with RIN >7 were used for sequencing.

### Optical recordings of action potentials

Human left ventricular wedge or right ventricular outflow tract (RVOT) preparations were used for experiments, as described previously [19]. Briefly, wedges from the posterolateral LV free wall perfused via the left marginal artery were dissected, cannulated, and mounted in a tissue chamber with 4 surfaces (epicardium, endocardium, and the 2 transmural sides) facing 4 CMOS cameras of the optical apparatus [19]. For Human RVOT preparations, the left coronary artery (LCA) and the right coronary artery (RCA) were cannulated to enable RVOT perfusion and the tissue suspended vertically in a bath to enable simultaneous dual-sided optical mapping with 2 CMOS cameras. Preparations were perfused with oxygenated Tyrode solution maintained at 37°C, with a perfusion pressure of 60 to 80 mm Hg. The preparation was washed with 2 L of Tyrode solution to remove excess transplant solution and restore basal electrophysiology. Tissue was immobilized by blebbistatin (10–15 μM) to suppress motion artifacts in optical recordings, without adverse electrophysiological effects [20]. Di-4-ANBDQBS was used to map transmembrane potential as described previously [21]. Platinum-iridium tipped bipolar pacing electrode was placed on the epicardial surface. Optical action potentials were mapped from ≈5×5 cm field of view from the 4 (LV) or 2 (RVOT) surfaces using 4 or 2 MiCAM05 (SciMedia, CA) CMOS cameras with high spatial and temporal resolutions (100×100 pixels; sampling frequency, 1 KHz).

### Cap-Analysis of Gene Expression (CAGE)

This study was done on RNA extracted from human tissues procured as described above. After total RNA was extracted, as previously described, 5 μg of total RNA (260/280 2.0±0.01, 260/230 1.9±0.15, RIN 8.0±0.66) were submitted for 5'nAnT-iCAGE libraries preparation according to standard protocol [11], sequenced and demultiplexed on Illumina HiSeq2500 High throughput mode (50nt single end). In silico processing of sequenced CAGE tags was performed by using Moirai system [22]. This protocol includes quality control (fastx_trimmer: -Q33, -l 47), N base, and rRNA trimming (rRNAdust v1.0) with subsequent alignment to the human genome version hg19 through Burrows Wheeler Aligner (BWA). The median mapping ratio was 0.88±0.02 and median depth of 11.7M. CTSS (CAGE transcription start sites) and clusters of CAGE signal generated by applying python scripts: level1 and level2 [23]. The first script generates CTSS (CAGE tag starting sites) files, where 5' end of the mapped CAGE reads are counted at a single base pair resolution. The second script performs signal clustering on CTSS files with a minimum 10 TPM (tags per million) in at least one sample and minimum distance between clusters of 20 base pairs. These resulted in median 1.27M of CTSS and 13.1K of putative promoter regions, where ~12.1K overlap promoters from FANTOM5 [24]. Finally, 11612 predicted promoters were associated with 10355 genes through RefSeq and Ensembl transcripts obtained from UCSC [25] by extending the searching area of its 5’ ends in ±500 base pairs. CAGE promoters for the key genes were manually curated by visualization in Zenbu browser [26]. TPM normalized CAGE counts were submitted to edgeR package for R for differential expression analysis according to the protocol [27].

### RNA-Seq analysis of gene expression

RNA sequencing was done at The George Washington Genomics Core facility. Library preparation was done with TruSeq Stranded mRNA Library Prep. Sequencing was performed on Illumina NextSeq 500 with a target of 30,000,000 reads per sample using 2x75 cycle High-Output kit. Raw reads were quality checked with FASTQC [28]. Trimming of the NextSeq sequencing adapters was done with Flexbar. Read alignment to the human genome 19 and transcript abundance were performed with Kallisto [29]. Summarization of the abundances into a matrix for the downstream analysis was done with tximport [30]. Transcript normalization was done with DESeq2 according to the published workflow [31].

### Optical mapping signals processing

In 2 left ventricular preparations and 7 right ventricular preparations after collecting tissue samples for CAGE optical APs were recorded from endocardial surface of the wedge. The tissue was paced until reaching steady state (100 seconds) at 4 different PCLs. Unless otherwise noted the PCLs used for recording AP waveform restitution were: 2000 ms, 1000 ms, 500 ms and 300 ms. We observed some variation in AP waveform over endocardial surface, therefore a single pixel recording with higher APD, upstroke velocity and overall signal-to-noise ratio was chosen manually and used as input data for GA. Low-pass filter was not used, to avoid AP waveform distortion. 60-Hz hum was removed with narrow band stop IIR Butterworth filter. Ensemble averaging over a series of APs recorded from the same pixel was used to improve signal to noise ratio (SNR).

### Algorithm verification against mRNA expression profile

When GA was applied to experimental recording, we could not directly measure ionic channel conductivities to verify the precision of GA output parameters. Instead, the following indirect approach based on transcription profiling was used (Fig 1B). Genome-wide transcription profile was measured via CAGE (for Patients 1–7) or RNA-Seq (for Patients 8–14) techniques as described above. We assumed ionic channels conductivity to be proportional to TPM counts (in case of the CAGE) or DESeq2-normalized counts (in case of RNA-Seq) of the mRNA encoding the corresponding pore-forming subunit protein. In particular, we considered differences in SCN5A, KCNH2, KCNJ2, KCNQ1, CACNA1C, KCNA4, ATP1A1, SLC8A1, ATP2B4, RYR2, ATP2A2, CALM1, CAMK2D genes to be represented in the model by INa, IKr, IK1, IKs, ICaL, Ito, INaK, INCX, IpCa, Jrel, Jup, CMDN, CaMKII parameters correspondingly (similar approach was used in [12]). Differences in these genes level of expression among Patients 1–7 left ventricular preparations are shown in S4 Fig. In the case of CAGE group of patients, functional data was not available for Patients 3–7, thus only Patients 1 and 2 were used for algorithm verification. Patients 1 and 2 are highlighted by orange and grey colors in S4 Fig. Patient 1 functional data from optical mapping was used as input for GA. Output Patient 1 model parameters (ionic channels conductivities) were rescaled proportional to Patient 2/Patient 1 ratio of corresponding mRNA TPM. The resulting mRNA-based Patient 2 model was compared to Patient 2 functional data as described below in “Results” section. Similarly Patient 8, 9 and 11 right ventricular AP waveform was used as input for GA, output parameters were rescaled proportional to mRNA profile as measured via RNA-Seq technique and compared to corresponding Patients 9–14 AP waveform recordings. Due to computational limitations, after preliminary analysis several patients available were excluded and not used as input to GA. In particular, Patient 10 was excluded because of a very long depolarization time (see “Experimental data” subsection of the “Results” section for a brief discussion of optical mapping artefacts affecting depolarization phase). Patients 12 and 13 were excluded because of the very short APD (below 300 ms, which indicated ischemia of the preparation). Patients 12 and 14 were excluded due to low signal-to-noise ratio, while at high frequencies alternans was also observed for Patient 14.

## Results

### Intracellular concentrations

The immediate consequence of using steady state AP waveform dependence on PCL as GA input data is that output model AP should be steady state as well. The direct approach to the problem is to pace every organism for a long time during a GA run. However, this solution is computationally very expensive: successful GA convergence requires at least 100 organisms and 100 generations [2,4]. On the other hand, arbitrary initial state of the model requires at least 100 s to stabilize intracellular ionic concentrations. Moreover, given different initial state, intracellular concentrations converge to different values [32], although this issue is very often neglected in cardiac electrophysiology studies. Fig 2A–2D exemplifies the case. O’Hara-Rudy model was paced as 1000 ms PCL from the following initial states:

• 1st state: [K+]i = 145 mmol, [Na+]i = 5 mmol, [Ca2+]NSR = 0.5 mmol
• 2nd state: [K+]i = 120 mmol, [Na+]i = 8 mmol, [Ca2+]NSR = 4 mmol.

Fig 2B shows that these differences may have significant effects on the steady state AP waveform: resting membrane potential (RMP) difference is 4.8 mV, AP duration difference is 15 ms.

Instead of the direct approach, we have evaluated the fitness function after a short run (9 stimulations at each PCL). The final state variables at each PCL are saved and reused as initial state at the next generation (Fig 1A). However, given that the set of parameters minimizing RMSE is itself a function of intracellular concentrations, this approach results in an optimizer solving essentially a new problem every generation. Moreover, as shown above (Fig 2B–2D) “fixed” initial state for the organisms should result in a “fixed” steady state that might be different from input data. The solution that we propose in this study is to perform a simultaneous search in the parametric and slow variables space, i.e. the initial values of slow variables were added to parameters vector making them susceptible to mutation and crossover operators. Why would this technique eventually result in concentrations converging to a steady state? The rationale is the following: if given parameters vector results in an acceptable AP waveform (i.e . close to the input AP), but far from steady state, then this solution is going to be discarded eventually, since waveform is going to change after few beats. For example, if AP of particular organism on generation N minimizes RMSE, but corresponds to one of the dotted lines in Fig 2B, then RMSE is going to be large for this particular organism on generation N+1 (i.e. after 9 beats, one of the dashed lines).

In order to test if this technique indeed results in a steady state we have run 9 GA simulations. After that, the output models on generation № 700 were paced for 1000 seconds. The observed difference between GA output state and steady state is depicted in Fig 2E and 2F. The output [Na+]i was 0.016±0.012 mM from steady state, the [Ca2+]NSR was 0.08±0.07 mM from steady state. S5 Fig demonstrates additional GA tests with different numbers of beats prior to fitness function. While using 5 or less beats increased output model distance to steady state, the increased number of beats resulted in minor improvement of the output model.

As we demonstrate below, this simultaneous optimization still might halt algorithm convergence after a number of generations; below we discuss several modifications to mutation operator and elitism strategy that improve optimizer performance.

### Mutation operator

The usual GA approach is to treat parameters separately by mutation operator, i.e . there is a fixed probability to mutate each parameter [18]. As demonstrated in Fig 3B this approach (“point mutation ”) results in a low probability to modify several parameters at the same time. RMSE dependence on generation number averaged on 9 GA runs (Fig 3C) demonstrate that slow convergence similar to coordinate descent algorithm [33] resulting from “point mutation” halts algorithm convergence after 100 generations. Therefore, in our GA implementation random direction in parameter space is chosen by mutation operation and the whole set of parameters is modified at the same time (“vector mutation ”, Fig 3A) resulting in better convergence (Fig 3C). Fig 3D shows that after 700 generations vector mutation estimates parameters much better than point mutation: for example, the error is 3±3% vs 7±8% for IK1, 1.6±1.6% vs 5±5% for IKr, 4±3% vs 11±8% for INa.

Cauchy distribution is a “pathological ” distribution with infinite variance and expected value. As was noted previously [17] Cauchy mutation tends to generate offspring far away from its parent. Consequently, it prohibits algorithm stagnation in local minima and generally results in better convergence for multimodal functions. Fig 5A compares two sample runs of GA with polynomial and Cauchy mutation. In case of polynomial mutation by generation 300 the whole population converged to a vicinity of a single solution that is different from input model set of parameters (red line). We observed slow convergence on subsequent generations simultaneous with slow intracellular concentration changes (compare to S6 Fig plotting corresponding intracellular concentration changes of the best organism). In the case of Cauchy mutation, every parameter except INaK converged to some vicinity of input model value by generation 300. At the same time, we did not observe algorithm stagnation: note, for example, the wide variation of IK1 and RyR parameters on generation 500. Consequently, we observed more robust algorithm convergence in case of Cauchy mutation for model fitting to synthetic AP data (Fig 5B). For example, the error of model parameters was 14±24% vs 17±75% for IKs, 11±25% vs 45 ±23% for Ito, 6±6% vs 13 ±15% for ICaL in GA runs with Cauchy and polynomial mutation correspondingly. These results might imply that wider polynomial mutation would result in better convergence as well. In order to test this assumption, we ran a number of GA tests with different distribution parameters. As shown by RMSE dependence on distribution parameters shown in S3 Fig, wider polynomial mutation did not improve the algorithm convergence.

Fig 5
(A ) Radar plots illustrating algorithm convergence in case of Cauchy (black) vs Polynomial (grey) mutations. Synthetic AP with parameter values shown by red line was used as input data. Axes show parameters ratio to corresponding O’Hara-Rudy model [12] values. (B) Objective parameters distribution for 9 GA runs with Polynomial mutation (red boxes) and Cauchy mutation (blue boxes). Dashed line corresponds to the input model parameter values. All values shown correspond to the best organism on generation 700.Cauchy mutation.

### Elitism strategy

Genetic operators tend to spoil a “good” solution; therefore, best organisms are passed to the next generation without any modifications [3,4]. We have found that given the wide parametric variation by Cauchy mutation in our GA implementation a large number of elite organisms is required for fast GA convergence. While wide exploration of parametric space is required at the initial stage of algorithm convergence, it is more effective to exploit the global minimum once it was localized (see [34] for discussion on exploration and exploitation). This could be achieved by a large number of elite organisms passing their parameters via crossover operator to siblings after clustering around the same minimum.

Principal component analysis (PCA) comparison of sample GA runs (Fig 4A–4D) shows that in case of higher proportion of elite organisms (6.6%) solution tends to cluster around the precise solution at generation № 100, while a GA run with 3.3% of elite organisms requires at least twice the number of generations to converge the population to the same cluster size. As further explained in S2 Fig high proportion (6.6% or 3.3%) of elite organisms results in the fast reduction of the cluster size (Standard Distance of the population, S2C Fig). This reduction, in turn, allows to exploit the solution by the algorithm: mean cluster error (MCE) has a clear trend after the reduction of the cluster size (S2B Fig), while MCE for the low proportion of elite organisms (0% or 0.3%) follows random fluctuations. In other words, large number of elite organisms result in two-stage optimization: initially the whole parametric space is explored, but eventually a number of elite organisms converge to the same minimum attracting the whole population to its vicinity and resulting in effective local optimization. Although PCA plots did not account for intracellular concentrations (see Methods section) it could be seen from comparison of S6 Fig (Cauchy mutation) and S2 Fig (6.6% elite) that actually convergence to global parametric minimum (MCE and SDist reduction) is simultaneous with concentration changes to some vicinity of input model steady state value.

The requirement of a high number of elite organisms indicates that interbreeding via crossover operator does not necessarily result in improved results. In order to test whether crossover operator is indeed essential to algorithm convergence we have ran 7 GA runs without the crossover . Indeed, as seen in S7 Fig, leaving out the crossover still results in decent convergence, however some output parameters are much less precise, in particular the error is 100±50% vs 19±23% for IKs, 93±54% vs 32±25% for SERCA, 200±140% vs 26±14% for RyR.

The RMSE dependence on the proportion of elite organisms to the whole population (S8 Fig). We observed slower algorithm convergence, when the number of elite organisms was below 4% of the whole population. On the other hand, the increase of the proportion of elite organisms above 10% typically resulted in algorithm stagnation after generation 100. Due to random nature of the algorithm convergence, the convergence speed is susceptible to fluctuations, but optimal proportion of elite organisms could be estimated as 6–10% of the whole population from S8 Fig.

### Final algorithm

In order to test if the new algorithm narrowed down the solution range, we have compared our algorithm with the original one [3] (Fig 6). We have re-implemented the original algorithm by Bot et.al. using Sastry toolbox [35] with the following modifications to it. Firstly, each organism was paced for 50 stimulations at every PCL, i.e. quasi-steady-state was used because it was computationally very expensive to reach actual steady state. Secondly, the normalized AP waveform at 7 PCLs was used as input data. Finally, the least squares technique was used to renormalize input data prior to fitness function evaluation as described in the Methods section. As shown in Fig 6 our algorithm determined IK1 conductivity with 3±3% precision, IKr− 1.6±1.6%, INa− 4±4%, ICaL—8±6% (vs. 17±17%, 11±10%, 16±15%, 86±94% error, respectively, compared to the original algorithm). Membrane currents that did not have profound effects on the AP waveform were less precise (IKs, Ito, INCX, INaK − 24±24%, 21±23%, 25±31%, 18±19% error, respectively), however, precision was still better then original algorithm (56±69%, 46±55%, 36±46%, 92±9%, respectively). Model parameters that did not affect AP waveform directly (RyR, SERCA and CAMKII) were also determined with an error less than 69%: 25±13% vs 25±32% for RyR, 25±22% vs 46±55% for SERCA, 14±16% vs 68±77% for CAMKII. As shown in Fig 6B the modified algorithm output intracellular concentration are also close to the input model precise values: for [Na+]i the difference was 0.2±0.6 mmol/l, while for [Ca2+]NSR the difference was 0.3±0.2 mmol/l.

Fig 6
(A) Parameter scaling by the presented algorithm (blue boxes) as compared to implementation of [3] (red boxes) after a long evolutionary run for 700 generations. Dashed line corresponds to input model parameter value. (B,C) Output intracellular concentrations comparison between 2 algorithms.Comparison of presented algorithm precision to Bot et al.

### Input data requirements: Restitution information

In the results described above, input data was APs simulated at 7 different PCLs (see Methods section). However, in real experimental setting it is preferable to reduce the recording time including the number of PCLs at which steady state AP waveforms are recorded. We have tested GA performance, when input AP was simulated at limited number of pacing frequencies. Particular PCLs we used to compare algorithm performance are listed in Fig 7A. We observed that for some parameters (Ito, SERCA: S9E and S9J Fig) single AP waveform recorded at 1000 ms was sufficient (i.e. algorithm precision did not increase when more detailed restitution properties were used). Two extreme points on the restitution curve (217 ms and 2000 ms) were required to determine IKr conductivity with 1.6±1.6% precision (S9B Fig). The parameters most sensitive to the AP restitution are shown in Fig 7B–7F. It is preferable to use full restitution curve (7 PCLs) for IKs, INa and ICaL. However, 4 points on restitution curve still result in acceptable accuracy for all other parameters. While using considerably different target parameter values would affect the results shown in this section, it provides an estimate of the amount of restitution information required to determine identifiable parameters. Consequently, we have recorded the AP waveform at 4 PCLs in the experiments described below.

Fig 7
(A) Input model AP was recorded at several PCLs listed in the table. Larger number of input baselines results in better algorithm performance. (B-F) Box-and-whiskers plots depict the parameters most sensitive to the changes in the number of input baselines (IK1, IKs, INa, ICaL, INaK) at generation 700 of 8 GA runs. Dashed line corresponds to the input model parameter value.Solution sensitivity to the number of input baselines.

### Input data requirements: Signal-to-noise ratio

The other inherent problem of experimental data that might spoil algorithm performance is noise. To test how noisy data would affect the algorithm precision we have added Gaussian noise to the input simulated AP. Sample AP waveforms with different signal to noise ratio (SNR) are depicted on Fig 8E and 8F and S10 Fig. As expected, noise amplitude heavily affected precision of the output parameters. As seen on Fig 8A–8D and S11 Fig 20 dB SNR completely breaks down algorithm: for example, error is up to 100% for IKr and up to 227% for IKs conductivities. 28 dB SNR, achievable in experimental setting gives better results: the observed error is 1±2% for IKr, 6±3% for INa, 17±9% for ICaL, 23±11% for INaK . As also demonstrated by histogram and probability plot on S12 Fig experimental noise in the optical mapping setting is indeed close to Gaussian (see corresponding section of S1 Text for details).

Fig 8
The input model AP waveform was distorted by Gaussian noise. (A) Parameters dependence on the SNR is illustrated at generation 700 of 9 GA runs. Dashed line depicts the input model parameter value. (B) Sample AP waveform at 28 dB and 20 dB SNR.Solution sensitivity to the input baselines signal-to-noise ratio (SNR).

### Experimental data

GA was also tested with optical APs as input data (see Fig 1B and “Methods” section). We observed some degree of APD heterogeneity in the most of human wedge preparations that we have optically mapped for this study. In particular, the Patient 1 APD80 was 314±63 ms (APD map is shown on S13 Fig). One possible explanation of this heterogeneity is uneven perfusion of the sample resulting in mild ischemia that shortens AP duration because of activation of the ATP-dependent potassium channels IK,ATP . Since ATP-dependent potassium current was not accounted for by the model, we have chosen an AP with the longest APD as and input to GA. Fig 9A compares GA output model with input data. The Patient 1 model faithfully reproduced AP waveform dependence on the PCL, however, we observed some deviations between model and experiment. The deviations between input data and the output model are listed in Table 1. While RMSE is close to the noise level, and APD80 error did not exceed 14 ms, the difference in the depolarization phase is very pronounced: (dV/dt)MAX was approximately 20 V/s for input data, while for the output model it ranged from 55 to 80 V/s. This effect might be due to photon scattering in optical mapping recordings [14, 15]. Photons emitted by fluorescent dye undergo multiple scattering events and thus the recorded signal from a given pixel is actually an averaged signal from thousands of myocytes. This effect is known to distort AP waveform during depolarization when differences of membrane potential across the tissue are significant due to propagation of wavefront of excitation. Thus, in order to reduce the effect of this experimental artifact on the model, initial depolarization phase (below -20 mV) was removed from compared AP prior to fitness function calculation (see also Methods section).

Fig 9
(A) Patient 2 (see text for details) model APs (red line) fitted by GA as compared to the optical APs (blue) recorded at CL of 2000 ms, 1000 ms, 500 ms and 300 ms. (B ) Patient 1 model was rescaled using mRNA expression data and compared to experimental APs (see Fig 1B). Personalized models based on mRNA expression data from 7 donor hearts ventricles demonstrate different AP waveform (C) and restitution properties (D).Algorithm verification.
Table 1
Patient 1 GA-optimized model error.
CL = 300 msCL = 500 msCL = 1000 msCL = 2000 ms
Δ(dV/dt)MAX38 V/s53 V/s34 V/s58 V/s
RMSE6.3 mV4.0 mV5.8 mV3.8 mV
ΔAPD8014 ms3 ms8 ms3 ms

As described above in the Methods section, in order to verify output parameter values the Patient 1 model parameters were rescaled proportional to the ratio of mRNA expression levels between Patient 2 and Patient 1. The technique is based on the assumption that ionic channels conductivities are proportional to the mRNA counts (and, consequently, to the amount of protein being expressed) as measured by CAGE. If GA output parameters are close to actual conductivities in ex vivo tissue sample, then a model with parameters rescaled proportionally to the ratio of mRNA levels is going to reproduce another patients AP waveform. Thus, we have compared rescaled Patient 2 model to the optical mapping data recorded from the corresponding heart. As shown on Fig 9B and Table 2Patient 2 model still faithfully reproduced AP waveform at every PCL, however, AP waveform error was more pronounced. Particular spots of AP recordings for Patient 1 and Patient 2 are shown in S13 Fig.

Table 2
Patient 2 CAGE-based model error.
CL = 300 msCL = 500 msCL = 1000 msCL = 2000 ms
Δ(dV/dt) MAX69 V/s64 V/s47 V/s36 V/s
RMSE8 mV12 mV11 mV16 mV
ΔAPD807 ms4 ms10 ms9 ms

Using a similar technique, we have reconstructed personalized models for other 5 patients, AP waveform and restitution curves are depicted in Fig 9C and 9D. As noted above, functional data was not available for these particular patients, but the variability between the models is within physiological range [36]. Patient 5 and Patient 7 APD was too short (below 200 ms), which is explained by the fact that mRNA levels of expression had the most extreme deviations from median value in these patients. Patient 5 was an outlier in terms of ATP1A1, ATP2A2, ATP1B3, ATP2C1, KCNJ5, KCNK3 genes. Patient 7 was an outlier in terms of ATP1A1, ATP1B1, ATP1B4, CACNA1C, CACNA2D1, CACNA2D3, CACCB1, CACNB2, CALM1, CALM3, KCNH2, KCNJ11, KCNJ3, KCNK1, KCNK3, CAMK2B, CAMK2D genes (see also S4 Fig). These deviations might indicate problems with heart preservation prior to tissue collection or undiagnosed heart diseases.

To further study the precision and limitations of mRNA-based rescaling, we have tested this technique on a number of RV preparations. We have to note here, that while apical RV region was used for tissue collection and consequent transcriptome analysis, RVOT preparation was used for optical mapping studies. As seen on APD maps on Fig 10 in general we observed shorter APD close to the pulmonary artery (upper half of the preparation). Given the heterogeneity of the sample, we supposed the longer AP to represent typical RV tissue. Particular spot used for GA input is shown by red cross in Fig 10. As shown in Fig 11 and Table 3 the GA output model reproduced the AP waveform and restitution.

Fig 10
Gray lines depict AP waveforms recorded from the wedge preparations. Experimental waveforms are aligned to match the time corresponding to (dV/dt)MAX. Red lines correspond to Patient 8 GA-output model with parameters rescaled according to Patients 9–14 mRNA profiles. Similarly, blue and green lines correspond to Patient 9 and 11 based models correspondingly. The pixels of AP waveform recording that was used as input to GA is labeled by “+” symbols on APD maps. The pixel of the recording that was used on Fig 11 to compare Patient 8 based models with experimental AP is marked by red x on the APD maps.RVOT experimental recordings.
Fig 11
Patient 8 GA-output model (top left corner) comparison with experimental AP recordings. Red, green yellow and blue lines correspond to AP recorded at 2000, 1000, 500 and 250 ms PCL correspondingly. The pixel of recording is shown on APD maps in Fig 10. Model restitution curve (black line) is superimposed on violin plots depicting APD60 distribution in the wedge preparation. Other panels use the same convention to compare Patient 8 model rescaled according to RNA-seq data to Patients 10–12 AP recordings.Patient 8 GA output model.
Table 3
Patient 8 GA-optimized model error.
CL = 250 msCL = 500 msCL = 1000 msCL = 2000 ms
RMSE3.9 mV3.2 mV3.6 mV5.0 mV
ΔAPD8010 ms1 ms7 ms7 ms

Similarly to the case described above, we have rescaled GA-optimized RV model parameters proportionally to mRNA expression ratios as measured via RNA-seq technique. Resulting models did reproduce the experimental AP waveform and restitution. The comparison of resulting Patients 9–11 AP waveforms with experimental recordings are given in Fig 10 and Table 4.

Table 4
Patient 9–11 RNA-seq based models error.
CL = 250 msCL = 300 msCL = 500 msCL = 1000 msCL = 2000 ms
Patient 9
RMSE4.9 mV5.2 mV3.7 mV5.3 mV
ΔAPD803 ms17 ms5 ms4 ms
Patient 10
RMSE5.9 mV5.4 mV6.4 mV5.7 mV
ΔAPD8019 ms4 ms30 ms13 ms
Patient 11
RMSE4.9 mV4.5 mV4.3 mV5.5 mV
ΔAPD804 ms8 ms9 ms13 ms

As demonstrated by violin plots in Fig 11 the sample was very heterogeneous, thus mRNA-based model reproduced an experimental recording from a particular spot of the preparation (the particular pixel that was used for AP comparison is shown on APD maps in Fig 10). Violin plots demonstrate that in the cases of Patients 8, 10 and 11 the distribution was bi-modal, while mRNA-based model AP represents the longer mode of distribution. Since the apical RV region was used for tissue collection, we hypothesize that the longer mode of the distribution represents typical RV tissue, while the shorter mode represents RVOT-specific tissue.

On the other hand, as shown in Fig 10 (red AP) the rescaled models failed to reproduce the experimental AP waveform for Patients 12–14. In the cases of Patients 12 and 13, the sample was most probably ischemic, since APD was below 300 ms. In the case of Patient 14 the lower half of the sample, that we hypothesize to represent typical RV tissue, was mostly very noisy. Thus, the longer mode of the APD distribution might be obscured in this particular case.

Fig 10 also demonstrates GA-optimized models of Patients 9 (blue line) and 11 (green line) as rescaled to all other RV preparations. Although GA-output models were relatively close to input AP waveform (RMSE was below 7.4 mV for every AP waveform), the mRNA-based rescaling failed to reproduce the other patients AP waveform. Table 5 lists the parameter values for three variants of Patient 9 model: GA-output model, and two transcription profile-based models. It should be noted that in the case of GA-output Patient 11 model the sodium current is particularly low, resulting in subthreshold depolarizations of corresponding mRNA-based Patients 9 and 10 models. As mentioned above, this artifact is caused by slow depolarization phase of optical signal, which resulted in low INa scaling parameter in this particular GA run. It is also interesting that, as seen on Fig 10, AP for two models of Patient 9 are relatively close: Patient 9 GA-output model (blue line) and Patient 8 based model after parameters rescaling (red line). However, parameter sets shown in Table 5 are very different, in particular INCX, INaK and SERCA conductivities are 1.5–3.5 times higher in the latter case, while INa is 2.5 times lower. Surprisingly, in this case, the GA-output model actually performed worse than the mRNA-based model, the difference is most prominent at 250 ms PCL: in the former case (Patient 9-based model) APD80 error was 26 ms, in the latter (Patient 8-based model) APD80 error was 3 ms. On the other hand, this difference resulted only in slight RMSE250ms difference: 5.7 vs 5.0 mV correspondingly. This implies high sensitivity of GA to AP perturbations that was shown above by noise sensitivity analysis: GA-output model should follow experimental AP waveform closely and minor experimental artefacts might cause major change in parameter values. Patient 8 based models reproduced AP waveforms for 4 hearts, which indicates model precision for this particular case. This is also in line with the fact that GA-output model total RMSE was lower for Patient 8 than for Patient 9: 15.6 mV against 24.5 mV. It is also interesting that despite major differences in parameters, models behavior was surprisingly similar not only in the case of Patient 9, but also in a number of other cases. At least partially, this could be explained by similarities in expression profiles: for example, differences between Patient 11 and Patient 13 genes expression corresponding to IKr, IKs, Ito, ICaL , RyR and CaMKII were less than 9% (see Table B in S1 Text).

Table 5
Parameter values (relative to baseline O’Hara-Rudy model [12]) of Patient 9 models derived via GA or via comparison of mRNA levels to reference patient.
Reference patientPatient 8 (mRNA-rescaled model)Patient 9 (GA-optimized model)Patient 10 (mRNA-rescaled model)
IK10.4570.2830.618
IKr1.2261.0020.806
IKs0.1490.1430.306
INa0.9892.5420.550
Ito8.2189.7434.227
ICaL0.6060.8300.405
INCX2.9911.7721.459
INaK4.1361.2203.353
IpCa1.0660.1700.307
SERCA4.6902.2273.436
CaM10.1123.7621.739
RyR0.8250.6900.470
CaMKII0.1630.2110.902

## Discussion

In this study, firstly, we have introduced a novel GA modification allowing one to personalize cardiac electrophysiology models using steady-state AP recordings dependence on PCL as input data. The algorithm modification is based on the idea that parametric optimization and slow variables steady state search should be performed simultaneously for effective convergence. Secondly, we have tested the modifications we introduced on synthetic action potentials proving our modifications to be advantageous for overall algorithm performance. Finally, we have tested the algorithm performance against cardiac optical mapping experimental data and mRNA expression profile. The output parameters precision was confirmed by the observation that mRNA-based models predict patients AP waveform and restitution. Essentially, a combination of GA with the mRNA expression measurements provides a novel technique for model personalization.

### The algorithm modification

Many different combinations of model parameters result in the same AP waveform. However, these solutions are mostly non-steady state: AP waveform diverges from experimental in the long run. Implicit steady state requirement allowed us to substantially narrow down the parameters range. As shown above (Fig 7, S9 Fig), using AP waveform dependence on PCL as input data further increased algorithm precision.

Albeit premature convergence to local minimum of RMSE hinders the solution of the optimization problem, it is preferable to exploit the vicinity of previously visited points once global minimum is localized. This is accomplished in this study by a large number of elite organisms. When elite organisms fall in some vicinity of RMSE minimum two scenarios are possible. If given solution is far from steady state they are going to be either discarded because of major variation of AP waveform in few generations (since these organisms are not susceptible to modification by mutation). If slow variables are close to steady state, elite organisms start to attract the population to the same vicinity. In the latter case, the whole population is going to exploit the vicinity of the optimization problem solution, thus working as a local optimizer. As seen from the comparison between S6C and S6D and S2B and S2C Figs intracellular concentrations and parametric convergence share similar dynamics when ratio of elite organisms is high. On the other hand, it might be more effective to use either classic gradient-based methods or Covariance Matrix Adaptation Evolution Strategy [41] once global minimum is localized (i.e. when a number of elite organisms converged to the same vicinity).

The convergence speed of the algorithm was improved by further modifications in the mutation operator. We found that mutating each parameter with a fixed probability (referred to as “point mutation ” above) resulted in a single parameter being modified by mutation operator (Fig 3B). This kind of descent is very slow in 27-dimensional parameter space (parameters include intracellular concentrations at each PCL) and usually stopped convergence after about 100 generations (Fig 3C). Instead, we chose random direction in multidimensional parameter space and mutated parameter in this direction using Cauchy distribution, resulting in better algorithm performance (Figs 3 and 5).

It is worth noting that all these modifications to GA essentially make the algorithm similar to particle-based algorithms. For example, multivariate jumps are used by Particle Swarm [42], Covariance Matrix Adaptation Evolution Strategy [41] and Cuckoo Flight [43] algorithms. The last one uses “pathological” distribution similar to Cauchy mutation for Levy-flight walk. Although it would be very interesting to compare these algorithms performance with GA modification proposed in this study, some custom modifications to the aforementioned algorithms are still required: as was noted above same AP waveform could be reproduced by different parameter sets [8]. In order to narrow down the output parameters range, not steady state solutions should be somehow automatically discarded by the algorithm. This kind of extensive benchmarking is beyond the scope of the current study.

### The algorithm verification against synthetic data

Test runs indicate that resulting algorithm evaluates the most important model parameters (IK1, IKr, INa and ICaL ) with high precision (Fig 6). However, low amplitude ionic currents and parameters affecting calcium transients were much less precise. We further tested the algorithm sensitivity to noise to find that 20 dB SNR breaks the algorithm, while SNR above 28 dB has minor effects on algorithm performance (Fig 8). Another point should be discussed in this regard: while the algorithm is relatively stable, SNR requirements are quite strict. It was possible to achieve 28 dB level noise in ex vivo optical mapping experiment; however, it still required some post-processing: hum removal and ensemble averaging in particular. These strict requirements might hinder algorithm application to clinical recordings prone to artifacts distorting the signal. As discussed below a combination of GA with mRNA expression profile could help to solve the problem of model personalization using clinically-measurable data.

### The algorithm verification against experimental data

We did not measure ionic channels conductivities in this study, thus we could not directly verify algorithm output parameters precision against experimental data. Instead, we used an indirect approach based on the following strong assumptions. Firstly, we assumed that given ionic channel conductivity should be proportional to corresponding proteins mRNA expression level. One thing to note here is that actual ionic channel is assembled from several proteins. For example, cardiac sodium channel is assembled from pore-forming subunit encoded by SCN5A gene and auxiliary subunits encoded by SCN1B and SCN2B genes [44]. The situation is more complicated in the case of IK1, which is most likely a heterotetrameric complex assembled from proteins encoded by KCNJ2, KCNJ4 and KCNJ12 genes in any combination [45]. Thus, our second assumption was that a single pore-forming subunit level of expression could predict ionic channel conductivity. If these assumptions hold and if GA output parameters correspond to actual ionic channels conductivities in one patient, then model of another patient model could be reproduced by simple rescaling of the conductivities.

Indeed, two-patient comparison of LV preparations have shown that mRNA-rescaled model reproduced Patient 2 AP waveform at every PCL (Figs 1B and 10). We have undertaken more extensive study on RV preparations that demonstrated that Patient 8 GA output model reproduced AP waveform and restitution in three cases (Patients 9–11 ) after parameters rescaling (Fig 11), however it failed to reproduce the AP waveform for 3 other patients (Patient 12–14 ) (Fig 10). One possible reason standing behind this fact is poor perfusion of the wedge preparation resulting in the activation of ATP-dependent potassium current. Indeed APD80 was below 300 ms for Patients 12 and 13 . Another possible reason is the heterogeneity within the right ventricle that we observed in all RVOT preparations (Fig 10). Since different regions of the ventricle were used for tissue collection and optical mapping, expression profile might not correspond to optically mapped RVOT region. This heterogeneity also explains why models shown in Fig 10 tend to overestimate the APD. In order to exclude possible effect of ATP-dependent shortening of AP, we have manually chosen the recording with the longest AP as GA-input. On the other hand, probable local tissue damage at the site of recording might result in an opposite effect, i.e . downregulation of K-channels [46] prolonging the APD.

In the Fig 10 and Table 5 we also demonstrate that minor changes in AP waveform may result in major changes of parameters values. Despite the differences in Patients 9-11-based models, in some cases the AP waveforms and restitution were very similar. Consequently, this imposes a limitation on the GA-output model: it has to follow experimental data very closely and perturbations of experimental AP waveform are likely to spoil algorithm performance. In our opinion, this fact also implies that the modifications we introduced to GA were crucial to the successful gene expression-based prediction of AP waveform.

As was noted previously [47] modern cardiac models coalesce multiple studies performed on different species, using different experimental conditions. Moreover, given the complexity of modern models, it is possible to describe particular dataset using different parameters [810] (see also Table 5 and Fig 10). These facts leave us with questions: even if the model describes the particular dataset underlying it, what is the predictive capability of computer simulations? Is it possible to extrapolate model predictions to different clinical or experimental conditions? For example, it is possible that the model description of a particular ionic current is imprecise, but other model components are instead tuned to counterbalance this imprecision. In this case, the model components rescaling would most probably upset the balance and result in model failure to predict AP waveform. In this study, we have shown that it is possible to use computational model to map expression profile to cardiac function and predict actual AP waveform (Figs 911). This fact makes a point not only in favor of particular GA-output parameters set, but also indicates the underlying computational model precision.

Our results also indicate that it might be possible to reconstruct the personalized model of in situ heart using a similar technique, i.e. transcription profile in combination with GA. In some cases (for example, post heart transplant patients) ventricular biopsy could be justified, and tissue samples might be obtained from the patient’s heart. In these cases model parameters could be recovered from an ex vivo heart experiment via GA, then another in situ personalized model could be rescaled using measured mRNA levels. Another possible implication of the provided technique is drug effects investigation. For example, ionic channel blockers often affect several ionic currents. Using GA to measure the effects of a drug provides a cheaper way to recover several ionic currents dose-response curve simultaneously then the patch-clamp technique.

## Limitations

The output conductivities of high-amplitude ionic currents were very precise; however, algorithm performance was much less accurate for important model parameters affecting calcium transients, RyR and SERCA in particular. Using multiparametric optical mapping [48] as input data could probably further increase algorithm precision, however, this question is beyond the scope of the current study.

We have observed a large number of factors that affect GA output values: strict SNR requirements, possible ischemia or heterogeneity of the preparation. Thus, additional measurements are still required to control the GA output parameters precision. Patient 11-model appears to underestimate the sodium current and consequently Patient 11-based models failed to depolarize in half of the cases. Patient 9-based model seem to overestimate APD and, consequently, in half of the cases it was impossible to pace the model at the fastest frequency (Fig 10). This implies the imprecision of Patients 9 and 11 GA-output parameter values.

We used a rough approach for mRNA-based personalized models: ionic channel conductivity was taken proportional to a single gene level of expression. Mostly we used the pore-forming protein for this purpose. More precise approach would require one to account for auxiliary subunits affecting ionic channels voltage-dependence.

## References

1

NA Trayanova, F Pashakhanloo, KC Wu, HR Halperin. . Imaging-Based Simulations for Predicting Sudden Death and Guiding Ventricular Tachycardia Ablation. Circ Arrhythm Electrophysiol. 2017;10, doi: 10.1161/CIRCEP.117.004743

2

Z Syed, E Vigmond, S Nattel, LJ Leon. . Atrial cell action potential parameter fitting using genetic algorithms. Med Biol Eng Comput. 2005;43: , pp.561–571. , doi: 10.1007/bf02351029

3

CT Bot, AR Kherlopian, FA Ortega, DJ Christini, T Krogh-Madsen. . Rapid Genetic Algorithm Optimization of a Mouse Computational Model: Benefits for Anthropomorphization of Neonatal Mouse Cardiomyocytes. Front Physiol. 2012;3, doi: 10.3389/fphys.2012.00421

4

W Groenendaal, FA Ortega, AR Kherlopian, AC Zygmunt, T Krogh-Madsen, DJ Christini. . Cell-Specific Cardiac Electrophysiology Models. AD McCulloch, editor. PLOS Comput Biol. 2015;11: , pp.e1004242, doi: 10.1371/journal.pcbi.1004242

5

S Dutta, D Strauss, T Colatsky, Z Li. . Optimization of an In Silico Cardiac Cell Model for Proarrhythmia Risk Assessment. Front in Physiol. 2017;8, doi: 10.3389/fphys.2017.00616

6

K Deb. . Simulated Binary Crossover for Continuous Search Space. Compl Sys. 1994; 9(3):, pp.115–148.

7

DH Wolpert, WG Macready. . No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1: , pp.67–82. , doi: 10.1109/4235.585893

8

T Krogh-Madsen, EA Sobie, DJ Christini. . Improving cardiomyocyte model fidelity and utility via dynamic electrophysiology protocols and optimization algorithms: Cardiomyocyte model optimization. J Physiol. 2016;594: , pp.2525–2536. , doi: 10.1113/JP270618

9

AX Sarkar, DJ Christini, EA Sobie. . Exploiting mathematical models to illuminate electrophysiological variability between individuals. J Physiol. 2012;590(11):, pp.2555–2567. , doi: 10.1113/jphysiol.2011.223313

10

JN Weiss, A Karma, WR MacLellan, et al. "Good enough solutions" and the genetics of complex diseases. Circ Res. 2012;111(4):, pp.493–504. , doi: 10.1161/CIRCRESAHA.112.269084

11

M Murata, H Nishiyori-Sueki, M Kojima-Ishiyama, P Carninci, Y Hayashizaki, M Itoh. Detecting Expressed Genes Using CAGE In: E Miyamoto-Sato, H Ohashi, H Sasaki, J Nishikawa, H Yanagawa, editors. Transcription Factor Regulatory Networks. New York, NY: Springer New York; 2014 pp. , pp.67–85.

12

T O’Hara, L Virág, A Varró, Y Rudy. . Simulation of the Undiseased Human Cardiac Ventricular Action Potential: Model Formulation and Experimental Validation. AD McCulloch, editor. PLoS Comput Biol. 2011;7: , pp.e1002061, doi: 10.1371/journal.pcbi.1002061

13

S Rush, H Larsen. . A practical algorithm for solving dynamic membrane equations. IEEE Trans Biomed Eng. 1978;25(4):, pp.389–392. , doi: 10.1109/TBME.1978.326270

14

G Kanaporis, I Martišienė, J Jurevičius, R Vosyliūtė, A Navalinskas, R Treinys, et al. Optical mapping at increased illumination intensities. J Biomed Opt. 2012;17: , pp.0960071, doi: 10.1117/1.JBO.17.9.096007

15

MJ Bishop, B Rodriguez, J Eason, JP Whiteley, N Trayanova, DJ Gavaghan. . Synthesis of voltage-sensitive optical signals: application to panoramic optical mapping. Biophys J. 2006415;90(8):, pp.2938–45. , doi: 10.1529/biophysj.105.076505

16

K Deb, A Pratap, S Agarwal, T Meyarivan. . A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput. 2002;6: , pp.182–197. , doi: 10.1109/4235.996017

17

X. Yao, Y. Liu, & G. Lin. Evolutionary programming made faster. IEEE Transactions on Evolutionary computation. 1999;3(2), , pp.82–102.

18

K Deb, D Deb. . Analyzing Mutation Schemes for Real-Parameter Genetic Algorithms. Int. J. Artif. Intell. Soft. Comput. 2014; 4(1): , pp.1–28. , doi: 10.1504/IJAISC.2014.059280

19

KK Aras, NR Faye, B Cathey, IR Efimov. . Critical Volume of Human Myocardium Necessary to Maintain Ventricular Fibrillation. Circ Arrhythm Electrophysiol. 2018;11, doi: 10.1161/CIRCEP.118.006692

20

Q Lou, W Li, IR Efimov. . The role of dynamic instability and wavelength in arrhythmia maintenance as revealed by panoramic imaging with blebbistatin vs. 2,3-butanedione monoxime. Am J Physiol-Heart Circ Physiol. 2012;302: , pp.H262–H269. , doi: 10.1152/ajpheart.00711.2011

21

C Gloschat, K Aras, S Gupta, NR Faye, H Zhang, RA Syunyaev, et al. RHYTHM: An Open Source Imaging Toolkit for Cardiac Panoramic Optical Mapping. Sci Rep. 2018;8: , pp.2921, doi: 10.1038/s41598-018-21333-w

22

A Hasegawa, C Daub, P Carninci, Y Hayashizaki, T Lassmann. . MOIRAI: a compact workflow system for CAGE analysis. BMC Bioinformatics. 2014;15: , pp.144, doi: 10.1186/1471-2105-15-144

23

https://github.com/Population-Transcriptomics/C1-CAGE-preview/blob/master/tutorial.md,

24

The FANTOM Consortium and the RIKEN PMI and CLST (DGT). . A promoter-level mammalian expression atlas. Nature. 2014;507: , pp.462–470. , doi: 10.1038/nature13182

25

UCSC Genome Browser, https://genome.ucsc.edu/

26

The FANTOM Consortium, J Severin, M Lizio, J Harshbarger, H Kawaji, CO Daub, et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nat Biotechnol. 2014;32: , pp.217–219. , doi: 10.1038/nbt.2840

27

DJ McCarthy, Y Chen, GK Smyth. . Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40: , pp.4288–4297. , doi: 10.1093/nar/gks042

28

Andrews S. FastQC: a quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

29

NL Bray, H Pimentel, P Melsted, L Pachter. . Near-optimal probabilistic RNA-seq quantification. Nature biotechnology. 2016;34(5):, pp.525, doi: 10.1038/nbt.3519

30

C Soneson, MI Love, MD Robinson. . Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. 2015;4.

31

MI Love, W Huber, S Anders. . Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15(12):, pp.550, doi: 10.1186/s13059-014-0550-8

32

L Livshitz, Y Rudy. . Uniqueness and stability of action potential models during rest, pacing, and conduction using problem-solving environment. Biophys J. 2009;97(5):, pp.1265–1276. , doi: 10.1016/j.bpj.2009.05.062

33

Stephen J. Wright ". Coordinate descent algorithms." Mathematical Programming 151.1 (2015): , pp.3–34.

34

M. Črepinšek, S. H. Liu, & M. Mernik. Exploration and exploitation in evolutionary algorithms: A survey. ACM Computing Surveys (CSUR). (2013); 45(3), , pp.3

35

Sastry K. Single and Multiobjective Genetic Algorithm Toolbox in C++. IlliGAL Report, No. 2007016 [online], 2007. http://illigal.org/category/source-code/

36

BJ Boukens, MS Sulkin, CR Gloschat, FS Ng, EJ Vigmond, IR Efimov. . Transmural APD gradient synchronizes repolarization in the human left ventricular wall. Cardiovasc Res. 2015;108: , pp.188–196. , doi: 10.1093/cvr/cvv202

37

ML Walker, X Wan, GE Kirsch, DS Rosenbaum. . Hysteresis effect implicates calcium cycling as a mechanism of repolarization alternans. Circulation. 2003;108(21):, pp.2704–9, doi: 10.1161/01.CIR.0000093276.10885.5B

38

EJ Pruvot, RP Katra, DS Rosenbaum, KR Laurita. . Role of calcium cycling versus restitution in the mechanism of repolarization alternans. Circ Res. 2004;94(8):, pp.1083–90. , doi: 10.1161/01.RES.0000125629.72053.95

39

Caroline R, Fu Siong N, Chowdhury R, Chang E, Patel P, Lyon A et al. Hysteresis of cardiac action potential duration restitution occurs in the absence of calcium transient duration hysteresis—a dual optical mapping study of ex vivo rat hearts. 2nd Congress of the European-Society-of-Cardiology Council on Basic. 2012; S63-S63.

40

M Jiang, Y Wang, GN Tseng. . Adult Ventricular Myocytes Segregate KCNQ1 and KCNE1 to Keep the I(Ks) Amplitude in Check Until When Larger I(Ks) Is Needed. Circ Arrhythm Electrophysiol. 20176;10(6). pii: , pp.e005084, doi: 10.1161/CIRCEP.117.005084

41

N. Hansen and A. Ostermeier, “. Completely derandomized self-adaptation in evolution strategies,” Evol. Comput. 2001; 9(2), , pp.159–195. , doi: 10.1162/106365601750190398

42

R. Eberhart and J. Kennedy, “. A new optimizer using particle swarm theory,” Proceedings of the Sixth International Symposium on Micro Machine and Human Science. 1995, pp. , pp.39–43.

43

X.-S. Yang and D. Suash, “. Cuckoo search: recent advances and applications,” Neural Comput. Appl. 2014; 24, , pp.169–174

44

L.S. Meadows, L.L. Isom, . Sodium channels as macromolecular complexes: Implications for inherited arrhythmia syndromes, Cardiovascular Research. 2005; 67(3), , pp.448–458. , doi: 10.1016/j.cardiores.2005.04.003

45

A.N. Lopatin, J.M.B Anumonwo. . Structural and Molecular Bases of Cardiac Inward Rectifier Potassium Channel Function, Cardiac Electrophysiology: from Cell to Bedside(2018), pp. , pp.38–48

46

M Jiang, C Cabo, J Yao, PA Boyden, G Tseng. . Delayed rectifier K currents have reduced amplitudes and altered kinetics in myocytes from infarcted canine ventricle. Cardiovasc Res. 2000;48(1):, pp.34–43. , doi: 10.1016/s0008-6363(00)00159-0

47

S.A. Niederer, M. Fink, D. Noble and N.P. Smith. A meta‐analysis of cardiac electrophysiology computational models. Experimental Physiology. 2009; 94: , pp.486–495. , doi: 10.1113/expphysiol.2008.044610

48

B Cathey, S Obaid, AM Zolotarev, RA Pryamonosov, RA Syunyaev, SA George, et al. Open-Source Multiparametric Optocardiography. Sci Rep. 2019;9: , pp.721, doi: 10.1038/s41598-018-36809-y

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

26 Sep 2019

PONE-D-19-25321

Genetic algorithm-based personalized models of human cardiac action potential.

PLOS ONE

Dear Dr. Syunyaev,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Two reviewers have provided very extensive and constructive feedback, and I encourage you to carefully consider and address their comments.

When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

B. Rodríguez

PLOS ONE

Journal Requirements:

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

[Note: HTML markup is below. Please do not edit.]

Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Reviewer #1: Genetic algorithm-based personalized models of human cardiac action potential

The authors propose and implement a genetic-algorithm approach to estimate maximal conductances (and similar parameters for e.g. J_up) an action potential model, based on optical mapping of voltage in human left-ventricular preparations.

Improvements to previous GA methods are presented, including a restitution based approach, and a novel mutation operator.

A synthetic data study is performed, after which the method is applied to real data, and validated by comparing the obtained parameters to mRNA expression levels.

The work is very interesting, the focus on clinically measurable signals is novel and important, and the data from donor hearts adds further value to this study.

However, the text needs major work.

The manuscript is not structured well, requiring the reader to skip back and forth between sections.

It would benefit from an introduction that clearly states the entire study's objectives (not just the GA improvement) and gives the reader a better idea of what is to come.

Many of the results are stated as fact in the methods (without making clear this is something the study will attempt to prove), which further adds to the confusion.

I believe this manuscript has the potential to be very interesting if the authors rewrite it significantly.

Some more work also needs to be done to show that the chosen strategy of varying part of the initial state in the optimisation still leads to a stable steady-state'; especially in the light of the reduced number of pre-pacing beats and the interesting observation that there are multiple steady-states, with the exact state reached depending on the initial state).

Major points:

- Results section: It seems the text has been written to have the methods section between the introduction and the results. As it is now at the end of the paper, some rewriting is necessary. I am not sure why the methods are placed at the end, either, this does not seem to be the norm for PLOS ONE in the papers I checked.

- Lines 120-122: Please explain why this strategy, with only 9 stimulations at each PCL, is adequate, and show data to support this claim.

- Lines 122-124: After showing that the O'Hara model has a steady-state that depends on the initial conditions, the authors vary these initial conditions at every iteration of the optimisation, and then run a limited number of repeats. It is not clear to me at all how this would help bring the system into a steady-state faster, or why this should result in a better model. Please add a more detailed explanation, and show how your approach compares to a traditional approach with longer pre-pacing periods. If this is computationally infeasible, at least show the effect of adding more beats (e.g. 5, 9, 18, and 36). I would also be interested to know if the authors consider the existence of multiple "steady states" (technically periodic orbits) as a flaw in the model or if its a realistic feature that gets 'personalised' by the GA.

- The introduction states that modifications to GA are needed, but does not argue this very well. It would be good to mention that this is in fact one of the results shown later in the paper. Similarly, the methods section states "We have found that the original polynomial mutation tends to trap soution in the local minima", but doesn't mention that this will be shown in figure 4. The reasonining for introducing a Cauchy distribution is also not clear to me.

- Lines 172-173: "as shown by Andrews curves in Fig 4C the whole population is in the immediate vicinity of a single solution". It is not clear to me how Fig 4C shows this. Figures 4D and 4E also do not look like an optimiser getting stuck in local minima.

- Lines 181-182: "Moreover, the best organism is unchanged for a number of generations". Please explain how choosing a different distribution to sample new mutations from has caused this effect?

- Lines 181-182: "the best organism is unchanged for a number of generations, allowing it to reach steady state". I'm not sure how much this figure says about the Cauchy distribution being better, and how much it says about 9 beats being too short to reach steady state. Would a Cauchy distribution still be better if you had more beats? Presumably there is some sort of trade-off between the number of beats per simulation and the number of iterations of the GA. Please explain this in greater detail.

- Lines 299-313: This is very nice work and deserves to be mentioned much more in the paper, which is almost entirely about the GA and the syntetic data study.

- Lines 551-552: "Output model parameters ... between two patients.". Please explain this in much greater detail, here in the methods section, but also in the introduction or in the relevant part of the results section.

Minor points

- The authors present a novel adatation of a genetic algorithm, but do not present any arguments why genetic algorithms are suitable optimisers in this case. I would suggest adding either a comparison to other optimisers (e.g. CMA-ES), or at least provide some arguments why other methods are unsuitable and a custom GA is required.

- Line 88-89: Please add more references here, e.g. to the work by Sarkar or the subsequent discussion by the Weiss group.

- Line 137: The term "SBX crossover" has not been introduced.

- Figure 2: The resolution of the provided figure is too low to read the text.

- Figure 2: Several things mentioned in the caption to figure 2 (e.g. mRNA expression), and adapting the patient-1 model for patient 2) have not been mentioned yet. The text would benefit from an introduction of these important concepts in the Introduction section.

- Line 149: "coordinate descent" this term has not been introduced. Please rewrite to do so, or avoid using this term here instead.

- Line 160: "SD dynamics" this term has not been explained at this point in the text.

- Line 166: "As mentioned above", the methods section is no longer above this sentence.

- Line 168: "Although the best organism is ... to the steady state.". This means that running the simulations has now become part of your mutation operator. This should be explained more clearly in the methods.

- Lines 200-201: Please rewrite this. 6-10% seems a fair estimate, but it is not demonstrated all that clearly from the graph (for example, what happens at ~7.5%?).

- Lines 289-291: "The membrane potential differences are mostly confined to depolarization phase and are probably due to photon scattering". Please explain this in more detail, preferably with evidence to back it up.

- Figure 4: How much of the observed differences in convergence are due to the type of distribution, and how much due to the distribution's parameters? What would happen with a narrower Cauchy distribution? Or a wider polynomial one?

- Figure 6b: A central idea in genetic algorithms is that cross-breeding good solutions will yield improved solutions. The red trace in Fig 6B suggests that this is not the case for this problem. Does this imply that the crossover step is not beneficial for this problem? If so, it seems this is more a particle based search method rather than a genetic algorithm per se? It would be good if the authors could experiment with leaving out the crossover, or trying different optimisers on this problem.

- Figure 6: The resolution of the attached figure is too low to make this out clearly.

- Figure 6: These results are very interesting, and show that a GA with a number of elite organisms can act both as a global and a local optimiser. It would be good to discuss this further in the text. At the same time, the very fast convergence of elite organisms to something very close to the true solution suggests that it may be more efficient to perform a two-stage optimisation. Please comment.

- Figure 7: It is not clear from the caption how this comparison was set up. Did you re-implement the method by Bot et al. and apply it to your data?

- Line 391: "Computer simulations". Please provide more detail here. Why were 1D simulations preferred over single cell simulations, and why 1D instead of 2 or 3D? What integration method was used (and with what time step or tolerances?). How was this implemented? How many cells were stimulated, and from which cell or cells was an AP recorded?

- Line 397-399: Please explain this shifting in far greater detail. Which of the two signals was shifted and did this happen once or at every iteration? (In the latter case, how did you stop this method from introducing identifiability problems or local minima?)

- Line 407: This is not a standard deviation but a root-mean-squared error.

- Line 407: The choice of the symbol t^i_max for the number of samples (which is not a time) is confusing.

- Lines 419-420: "we have found odd number of stimulations helpful to exclude possible 1:1 alternans in the output model." This needs a lot more explanation.

- Line 430-431: "This is especially important in terms of intracellular ionic concentrations" Why?

- Line 430: Local minima are a problem _for_ optimisation methods, not _of_ them (they are a property of the score function being optimised).

- Lines 434-438: Please add more detail. What is a "half-width of the half-maximum of the distribution", and why is 0.18 times the original value a good choice?

- Line 441: "We have found that it has adverse effects on algorithm convergence". Please explain where/if this is shown.

- Lines 440-444: Please list the parameters that were varied.

- Lines 441-443: Please explain how applying a mutation to each parameter seperatly differs from your approach. Is the mutation in a parameter i now correlated to the mutation in parameter j somehow?

- Line 453: 1000s? Or 1000*PCL ms?

- Lines 456: The Andrews plots are cool, but don't offer much insight into which parameters vary. Did you try simpler representations, e.g. radar plots, instead?

- Line 466 m x n * p. Please add parentheses and use x or . instead of *.

- Lines 524-525 "Libaries in silico processing performed in Morai system.". Please rewrite this.

- Line 529: "python scripts: level1 and level2". What do these scripts do?

- Line 541: "(100 s)", 100s or a 100 beats?

- Line 558: "below". This is no longer below.

Reviewer #2: 1. Is the manuscript technically sound?

I believe it is. There were multiple moments reading through the paper where I began wondering if the authors had considered something I considered an issue, and then found that they had as I continued reading. I consider this a good sign that the application of the genetic algorithm technique to the problem of personalised cardiac model generation here has been considered in a good amount of depth. The methodology also proves quite successful.

Although there are some points below where I comment on potential issues with the methodology, I do not feel that they compromise the core material of the manuscript (application of a modified GA to achieve action potential model personalisation). The authors should, however, clarify what is going on with the question raised on Line 123.

I cannot comment on the experimental procedures (obtaining imaged voltage data or genetic expression levels).

-----------------------------------

2. Has the statistical analysis been performed appropriately and rigorously?

I believe so, but it would improve the paper if the results were also presented in a statistical way (i.e. in text, comment on spread of performance across runs, instead of using the worst run as the discussed figure).

-----------------------------------

3. Have the authors made all data underlying the findings in their manuscript fully available?

I missed where in the paper this was discussed, but am trusting the authors when they state on the online submission that data has been made available. If it is not yet mentioned in the paper, I suggest a very short (1-2 sentence) new section at the end of Methods that points readers to where data can be obtained.

-----------------------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

I feel the work is rather clumsily written. There are many grammatical errors, but those are a separate issue. What I consider more important is that as a reader, I was given very little idea about how the paper sits within the literature. For example, a list of references of other works using the genetic algorithm in the cardiac context is provided, but then only one is further discussed (as the method there serves as the basis for the authors' modified method). Without going through all of those papers, a reader will have no idea whether those works used experimental/clinical data, or simply demonstrated an ability to recover chosen parameter values in the action potential model (i.e. fitting synthetic data). Flicking through the referenced papers, they do seem to work with experimental data, and so this is not a point of novelty.

Instead, a potential point of novelty is the consideration of steady state of the action potential model. The methodology in this paper has been specifically adjusted to recover the steady state as it goes, something which I suspect is novel. So then the questions are, is this indeed novel? If so, the authors should highlight it in the introduction, by pointing out other genetic algorithm applications in cardiac electrophysiology have not considered it (or considered it simply by incurring great computational cost). Additionally, if steady state is so important, the authors should highlight it. Figure 1 shows the effects of choosing different initial conditions, but that is a different issue - if anything, re-using the final state of an organism as the starting point for its offspring seems like it exacerbates this issue (more comments on this below).

Modifications are made to the genetic algorithm to improve its performance, but again it is not at all clear whether these are only novel in the field of cardiac electrophysiology, or novel ideas of the authors in the GA context. For example, using Cauchy distribution jumps is something I've seen in other optimisation (not GA) contexts (e.g. "Levy-flight" jumps in bio-inspired optimisation algorithms), and I assume has also been used in the GA context already. The authors should check up on this if they haven't, and provide an appropriate reference. Similarly, random vector jumps might indeed not be the norm in GA, but I'm sure they must've been used (many optimisation algorithms, including many that use multiple search agents as GA does, will be doing random multivariate Gaussian jumps, for example). Additionally, if these two modifications were so necessary to get good results here, one wonders how some of the other published GA papers in cardiac electrophysiology got away with not using them (or if they did use them, this should definitely be mentioned and emphasis on those aspects of the paper reduced!).

Instead, one major novelty of this work seems to be the success of a GA-fitted model to be used across different members of a population, simply by measuring differences in their relative gene expression levels. Although I don't have a great sense for how easy/hard this is to actually do clinically and so cannot comment too much on the impact level of this finding, this was certainly a unique (and interesting!) aspect of the paper... and one that isn't even mentioned in the introduction! Hence my comment about clumsy writing. As I state many times in this review, I feel this part of the work needs to be highlighted much more.

I also re-iterate that although the English is mostly understandable, there are many grammatical errors and that the manuscript could be improved significantly in this regard. I have highlighted some of the most glaring issues below.

-----------------------------------

Specific Issues:

Abstract: It feels a bit number-heavy, and the numbers quoted are not average (with standard deviation) error, but maximum error, which I don't think is typical. It also talks about things like "13mV accuracy" which doesn't make sense to me. Is that a maximum deviance of 13mV at any point along the action potential? A root mean square error of the voltage along the whole action potential? This is not clear. In general, I personally think the "story" of the paper could be much better told than just listing error numbers.

Line 50: There's no guarantee the parameters found by GA (with the benefit of multiple pacing frequencies) will be unique, but indeed use of multiple pacing frequencies helps a lot. If this wasn't explicitly considered, I recommend softening this language (similar to how it's expressed on line 97).

Line 98: Again, the intro doesn't mention use of CAGE at all. This seems like a huge mistake!

Line 110: It's nice to point out the issue with a sensitivity to initial condition in cardiac AP models. I believe this is an issue that has been commented in the literature, although I cannot think of a reference off the top of my head. One would be nice here (see also comment on Line 120, though).

Line 114: "Saving organisms state vectors" is not a good figure title, in my opinion. Most of the figure is discussing the behaviour of the action potential model. Speaking of, I feel the caption should mention the model used (O'Hara-Rudy), given the text hasn't yet, and the reader might get curious. I know I did.

Line 120: My issue here is, I don't see how the methodology introduced here actually helps with the issue just discussed, of different initial conditions giving different steady state behaviour. If anything, using the final state of an organism as the initial condition of its offspring seems like it will only cause the method to hone in on a certain behaviour, and fail to explore the range of possibilities represented by different initial conditions. I do not consider this a major failing of the work, as I do not think there are many works at all in cardiac electrophysiology that even consider, let alone deal with this issue. Instead, what I think should happen is that the issue is given less focus in the manuscript (if even brought up), and instead, Fig. 1A-C should highlight the importance of steady state (for example, by plotting an action potential at steady state, and one during the transient phase). Assuming steady state even is important, but one is lead to believe it is in this paper, given the focus on achieving it whilst also avoiding excessive computational cost. This change would make Fig. 1 contribute a lot more to the story of the paper.

Line 123: It was not originally clear to me how allowing intracellular ion concentrations to vary would speed up convergence. I now see that this is about convergence to steady state over the course of the genetic algorithm. Perhaps this could be reworded somehow to be more clear. Furthermore, I don't understand how [Na+]_i and [Ca2+]_nsr can be parameters chosen by the GA, given that they are variables in the model. Do the authors mean the *initial* value is allowed to vary within the GA? Or am I missing something? This needs to be made clear in the paper.

Line 125: "1.5mM precision" sounds not great for an intracellular ion concentration (well, maybe [K+]_i!). This seems to be the worst possible value, which is not a great way to report your error, in my opinion. Mean and standard deviation makes a lot more sense (and will look better!)

Line 132: SD is not defined yet as far as I saw.

Line 141: Patient 1 and patient 2 seem to be switched here (or in the figure). This was also an issue when the CAGE results were discussed later.

Line 154: Again, these error values seem to be worst-case.

Line 178: "Proper value" is mentioned here. Is that the value for synthetic data? I think this needs to be clearer.

Line 181: It's interesting that the best model organism not changing (implying stagnation of the algorithm) is here pointed out as a good thing. Can't argue with the results, though!

Line 215: The point here about "fine tuning" needs to be expressed better. I think I understand it now, but it took me a few reads. The idea is that with a small cluster, and offspring generated from the current population of organisms, the algorithm will only search locally and thus waste less time, right?

Line 304: This is not a clear way to discuss error, as commented on previously.

Discussion: I guess it is up to the editor and journal as to whether this numbered list of highlight points is suitable format for a discussion. It's not my personal preference.

Line 330/343/350: I'm not sure that the modifications to the GA are worth three separate discussion points. In particular, discussion points #3 and #4 should probably be combined and cut down a little, in my opinion.

Line 358: "algorithms" should be "organisms". There are many more grammatical issues, but this one is one that changes the meaning so I have commented on it specifically.

Line 376: Tacking on the CAGE results at the end here with a casual mention is not really doing them justice. Those results seem to me like a much bigger deal than the authors make them sound!

Line 466: This notation was unclear to me. Combining two different multiplication symbols leaves me guessing which is which. If I guessed right, I think you could just say m x np?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

12 Jan 2020

We sincerely thank both reviewers for the time and effort they put into very extensive reviews and insightful critiques to the manuscript. Their critiques served us as a welcome guide during extensive revisions and additional research. We apologize that it took us long time to address their concerns, but it was due to very extensive nature of the critiques. In reply to the reviewers’ questions we ran additional simulations and thoroughly revised the text. Both reviewers mentioned that RNA-based model verification deserves much more prominence in the text. We agree with this suggestion; however, the data that we have presented in the originally submitted manuscript was not sufficient to provide a reasonable discussion of the subject. Now, in reply to reviewers concern we were able to conduct much more extensive additional research of the mRNA-based models rescaling, and we are happy to report that our new significant data provides solid support to our original premise.

We do hope that the new revision is much more comprehensible and interesting for potential reader then the initial submission. To facilitate review process, in the files attached to the submission and labeled as "Response to Reviewers", we have restated original critiques in green font and provided our response in black font and new manuscript text in blue font.

Reviewer #1: Genetic algorithm-based personalized models of human cardiac action potential

The authors propose and implement a genetic-algorithm approach to estimate maximal conductances (and similar parameters for e.g. J_up) an action potential model, based on optical mapping of voltage in human left-ventricular preparations. Improvements to previous GA methods are presented, including a restitution based approach, and a novel mutation operator. A synthetic data study is performed, after which the method is applied to real data, and validated by comparing the obtained parameters to mRNA expression levels.

The work is very interesting, the focus on clinically measurable signals is novel and important, and the data from donor hearts adds further value to this study. However, the text needs major work. The manuscript is not structured well, requiring the reader to skip back and forth between sections.

Thank you very much for positive overall assessment of our work.

It would benefit from an introduction that clearly states the entire study's objectives (not just the GA improvement) and gives the reader a better idea of what is to come. Many of the results are stated as fact in the methods (without making clear this is something the study will attempt to prove), which further adds to the confusion.

We thank the reviewer for bringing up this issue of the manuscript, in the revised version the manuscript introduction is extended, and states aims of the study more clearly.

I believe this manuscript has the potential to be very interesting if the authors rewrite it significantly.

Some more work also needs to be done to show that the chosen strategy of varying part of the initial state in the optimisation still leads to a stable steady-state'; especially in the light of the reduced number of pre-pacing beats and the interesting observation that there are multiple steady-states, with the exact state reached depending on the initial state).

Thank you, in reply to this helpful comment we have ran significant additional simulations and paced GA output model for 1000 s to compare steady state with GA output state. The difference is shown in new Fig. 2 and corresponding text.

Major points:

- Results section: It seems the text has been written to have the methods section between the introduction and the results. As it is now at the end of the paper, some rewriting is necessary. I am not sure why the methods are placed at the end, either, this does not seem to be the norm for PLOS ONE in the papers I checked.

Thank you, as we have stated above the Introduction section is extended, while “Methods” section is placed in the beginning of the article.

- Lines 120-122: Please explain why this strategy, with only 9 stimulations at each PCL, is adequate, and show data to support this claim.

Thank you for this important comment. The strategy is explained in greater detail and a kind of proof that this technique still results in intracellular concentrations close to steady state is depicted in Fig.2 of the new version and corresponding lines 481-498 of the text. New text reads:

“Why would this technique eventually result in concentrations converging to a steady state? The rationale is the following: if given parameters vector results in an acceptable AP waveform (i.e. close to the input AP), but far from steady state, then this solution is going to be discarded eventually, since waveform is going to change after few beats. For example, if AP of particular organism on generation N minimizes RMSE, but corresponds to one of the dotted lines in Fig. 2B, then RMSE is going to be large for this particular organism on generation N+1 (one of the dashed lines).

In particular, we suggest including the initial intracellular sodium ([Na+]i) and calcium sarcoplasmic reticulum load ([Ca2+]NSR) at every PCL as model parameters. The [Ca2+]NSR concentration was chosen, because most of the calcium is stored within sarcoplasmic reticulum when the cell is at resting potential. We did not include potassium concentrations in the optimized parameters vector, since 10 mM K+i concentration changes results only in approximately 2% Nernst potential change, and thus a major concentration changes have a minor immediate effect on AP waveform. In order to test if this technique indeed results in a steady state we have run 9 GA simulations. After that, the output models on generation № 700 were paced for 1000 seconds. The observed difference between GA output state and steady state is depicted in Fig. 2E-F. The output [Na+]i was 0.016±0.012 mM from steady state, the [Ca2+]NSR was 0.08±0.07 mM from steady state.”

- Lines 122-124: After showing that the O'Hara model has a steady-state that depends on the initial conditions, the authors vary these initial conditions at every iteration of the optimisation, and then run a limited number of repeats. It is not clear to me at all how this would help bring the system into a steady-state faster, or why this should result in a better model. Please add a more detailed explanation, and show how your approach compares to a traditional approach with longer pre-pacing periods. If this is computationally infeasible, at least show the effect of adding more beats (e.g. 5, 9, 18, and 36). I would also be interested to know if the authors consider the existence of multiple "steady states" (technically periodic orbits) as a flaw in the model or if its a realistic feature that gets 'personalized' by the GA.

Thank you for this question! In part, we provided clarification in the reply to the previous question. Please also, find a clarification on bringing the system closer to steady state, while making “jumps” in slow variables space the lines 473-481 of the text:

“The final state variables at each PCL are saved and reused as initial state at the next generation (Fig 1A). However, given that the set of parameters minimizing RMSE is itself a function of intracellular concentrations, this approach results in an optimizer solving essentially a new problem every generation. Moreover, as shown above (Fig. 2 B-D) “fixed” initial state for the organisms should result in a “fixed” steady state that might be different from input data. The solution that we propose in this study is to perform a simultaneous search in the parametric and slow variables space, i.e. the initial values of slow variables were added to parameters vector making them susceptible to mutation and crossover operators.”

The traditional approach in some sense, was provided in the Fig 7 of the manuscript, where every organism of the original Bot et. al. algorithm was paced for 50 beats. It is computationally very expensive to provide more than 50 beats.

We have also ran additional simulations with our version of algorithm to investigate the effects of different number of beats. The results (distance from steady state) are presented on S10 Fig.

We have added additional discussion of experimental evidence of multiple steady state in lines 778-773 of the text. There is some circumstantial evidence of multiple steady states: hysteresis of APD and alternans was observed by several research groups previously. However, we cannot be sure that this phenomenon occurs in living myocytes. For example, the number of some ionic channels (IKs in particular) might be dynamically adjusted by cell in response to different factors, such as calcium stress. Consequently, there are multiple ways to explain hysteresis phenomena.

“We have to note here, that several groups of researchers did observe restitution hysteresis as well as alternans hysteresis [34, 35, 36]. Thus, we can hypothesize that multiple steady states is an inherent feature of a myocyte that might be personalized by the GA. On the other hand, living cardiomyocyte is a complicated dynamic system and ionic channel conductivities themselves could change over the time [37].”

- The introduction states that modifications to GA are needed, but does not argue this very well. It would be good to mention that this is in fact one of the results shown later in the paper. Similarly, the methods section states "We have found that the original polynomial mutation tends to trap soution in the local minima", but doesn't mention that this will be shown in figure 4. The reasonining for introducing a Cauchy distribution is also not clear to me.

Thank you, we have now mentioned this fact in the introduction. Please, see lines 113-119. New text reads:

“As we demonstrate below in the Final Algorithm subsection of the Results section, this approach results in a poor parameters convergence. We have identified two reasons behind this fact: firstly, intracellular concentrations require much more than 10 seconds to converge to a steady state; secondly, a model with the particular set of parameters may converge to different steady states depending on the initial conditions. In order to address these issues we implemented a modification of GA allowing one to perform parameters optimization and steady state search in the slow variables space simultaneously.”

Regarding the Cauchy distribution: our initial observation was that modifying the concentrations by polynomial mutation might have the following effect. The whole population prematurely converges to a single non-steady state solution. After that, even though best organism gets closer to steady state every generation (increasing RMSE), some organism from the population mutates back to the previous concentration values. Cauchy distribution reduced the probability of premature convergence and did not reduce the convergence speed at the same time. In reply to reviewers’ questions we have ran additional simulations comparing Cauchy and polynomial distributions. Although, the former on average did result in better state variables, the distance from steady state was not statistically significant on the generation № 700.

For Cauchy mutation the output [Na+]i on the generation № 700 was 0.016±0.012 mmol from steady state, the [Ca2+]NSR was 0.08±0.07 mmol from steady state.

For Polynomial mutation the output [Na+]i on the generation № 700 was 0.03±0.02 mmol from steady state, the [Ca2+]NSR was 0.10±0.12 mmol from steady state.

Therefore, we have modified the figure (Fig 6 in the new revision) and corresponding text in the lines 493-508 as follows:

“Cauchy distribution is a “pathological” distribution with infinite variance and expected value. As was noted previously [18] Cauchy mutation tends to generate offspring far away from its parent. Consequently, it prohibits algorithm stagnation in local minima and generally results in better convergence for multimodal functions. Fig 6A compares two sample runs of GA with polynomial and Cauchy mutation. In case of polynomial mutation by generation 300 the whole population converged to a vicinity of a single solution that is different from input model set of parameters (red line). We observed slow convergence on subsequent generations simultaneous with slow intracellular concentration changes (compare to S2 Fig plotting corresponding intracellular concentration changes of the best organism). In the case of Cauchy mutation every parameter except INaK converged to some vicinity of input model value by generation 300. At the same time, we did not observe algorithm stagnation: note, for example, the wide variation of IK1 and RyR parameters on generation 500. Consequently, when synthetic AP was used as input data, we observed more robust algorithm convergence in case of Cauchy mutation (Fig. 6B). For example, the error of model parameters was 14±24% vs 17±75% for IKs, 11±25% vs 45 ±23% for Ito, 6±6% vs 13 ±15% for ICaL in GA runs with Cauchy and polynomial mutation correspondingly.”

- Lines 172-173: "as shown by Andrews curves in Fig 4C the whole population is in the immediate vicinity of a single solution". It is not clear to me how Fig 4C shows this. Figures 4D and 4E also do not look like an optimiser getting stuck in local minima.

- Lines 181-182: "Moreover, the best organism is unchanged for a number of generations". Please explain how choosing a different distribution to sample new mutations from has caused this effect?

- Lines 181-182: "the best organism is unchanged for a number of generations, allowing it to reach steady state". I'm not sure how much this figure says about the Cauchy distribution being better, and how much it says about 9 beats being too short to reach steady state. Would a Cauchy distribution still be better if you had more beats? Presumably there is some sort of trade-off between the number of beats per simulation and the number of iterations of the GA. Please explain this in greater detail.

Thank you very much for expressing these valid concerns. As mentioned in reply to previous question we have revised this part of the text, we hope that the new Fig 6 provides better arguments in favor of Cauchy distribution.

- Lines 299-313: This is very nice work and deserves to be mentioned much more in the paper, which is almost entirely about the GA and the syntetic data study.

- Lines 551-552: "Output model parameters ... between two patients.". Please explain this in much greater detail, here in the methods section, but also in the introduction or in the relevant part of the results section.

Thank you for a nice comment! We have conducted additional experiments and extended the corresponding sections extensively to address your questions.

Minor points

- The authors present a novel adatation of a genetic algorithm, but do not present any arguments why genetic algorithms are suitable optimisers in this case. I would suggest adding either a comparison to other optimisers (e.g. CMA-ES), or at least provide some arguments why other methods are unsuitable and a custom GA is required.

We have added additional discussion of the subject in the lines 806-815. Although it would be interesting to compare different optimizer, some custom modification would be required to find the steady-state solution. This kind of benchmarking would be very complicated and we believe it to be beyond the scope of current study. But we have modified the text as follows:

“It is worth noting that all these modifications to GA essentially make the algorithm similar to particle-based algorithms. For example, multivariate jumps are used by Particle Swarm [38], Covariance Matrix Adaptation Evolution Strategy [39] and Cuckoo Flight [40] algorithms. The last one uses “pathological” distribution similar to Cauchy mutation for Levy-flight walk. Although it would be very interesting to compare these algorithms performance with GA modification proposed in this study, some custom modifications to the aforementioned algorithms are still required: as was noted above same AP waveform could be reproduced by different parameter sets [8]. In order to narrow down the output parameters range, not steady state solutions should be somehow automatically discarded by the algorithm. This kind of extensive benchmarking is beyond the scope of the current study.”

- Line 88-89: Please add more references here, e.g. to the work by Sarkar or the subsequent discussion by the Weiss group.

Thank you very much indeed! We have provided more references in lines 94-95 in the revised manuscript.

- Line 137: The term "SBX crossover" has not been introduced.

Thank you, the abbreviation is introduced in line 81 in the revised manuscript.

- Figure 2: The resolution of the provided figure is too low to read the text.

Thank you for pointing out this important oversight! We believe that this was an image conversion issue on the PlosOne website, we will make sure to double check the resolution after the new submission is completed.

- Figure 2: Several things mentioned in the caption to figure 2 (e.g. mRNA expression), and adapting the patient-1 model for patient 2) have not been mentioned yet. The text would benefit from an introduction of these important concepts in the Introduction section.

Thank you for pointing out this lack of clarity in our original submission. The revised Introduction section was expanded as mentioned above. New text in the lines 122-133 now reads:

“Finally, we have verified the algorithm against the experimental optical mapping AP recordings from the human heart. Since we could not measure ionic channel conductivities directly, we used the following assumption instead (Fig. 1B): these conductivities should be proportional to corresponding proteins mRNA level of expression as measured by Cap Analysis of Gene Expression (CAGE) [21] or RNA-seq. Thus, given that GA output model parameters represent actual ionic channels conductivities, the model rescaled in correspondence with mRNA expression profile differences between two patients, would reproduce AP restitution properties of both patients. Moreover, we have to note that this approach (i.e. combining GA with transcription profile) could be regarded as another technique of model personalization. As we show below, GA signal-to-noise ratio (SNR) requirements are rather strict and hard to accomplish in a clinical setting, while mRNA expression profile is possible to measure from tissue biopsy.”

- Line 149: "coordinate descent" this term has not been introduced. Please rewrite to do so, or avoid using this term here instead.

Thank you, we now provide a reference to the review paper, which describes coordinate descent algorithms on line 486 of the manuscript.

- Line 160: "SD dynamics" this term has not been explained at this point in the text.

Thank you, the “RMSE dependence on generation number” is used throughout the new version of the manuscript instead of the “SD dynamics”.

- Line 166: "As mentioned above", the methods section is no longer above this sentence.

Thank you, the methods section was shifted to the beginning part of the article.

- Line 168: "Although the best organism is ... to the steady state.". This means that running the simulations has now become part of your mutation operator. This should be explained more clearly in the methods.

Thank you for pointing out this lack of clarity in our original manuscript! We have now provided a more detailed explanation of elitism strategy in the Methods section. Please, see lines 232-238 of the text:

“More precisely: elite organisms do participate in mutation and crossover as usual, but an “unspoiled” copy is saved to replace the worst organisms in every generation. However, final state of elite organisms on generation N are still reused as initial state on generation N+1, thus slow variables get closer to steady state, while AP waveform and RMSE changes correspondingly (see “save state variables” above). We have found that using a high number of elite organisms (about 6-10% of the whole population) is optimal for our GA modification.”

- Lines 200-201: Please rewrite this. 6-10% seems a fair estimate, but it is not demonstrated all that clearly from the graph (for example, what happens at ~7.5%?).

Because of the random nature of the algorithm, any combination of algorithm parameters might accidentally result in fast or slow convergence for a particular run. We believe that slow convergence of algorithm at 7.5 % is a random fluctuation. We have revised lines 553-558 accordingly, while new S8 Fig demonstrates RMSE dependence on the number of elite organisms on generations 100, 150, 300 and 400 giving better sense of convergence to a reader.

“We observed slower algorithm convergence, when the number of elite organisms was below 4 % of the whole population. On the other hand, the increase of the proportion of elite organisms above 10 % typically resulted in algorithm stagnation after generation 100. Due to random nature of the algorithm convergence, the convergence speed is susceptible to fluctuations, but optimal proportion of elite organisms could be estimated as 6-10% of the whole population from S8 Fig.”

- Lines 289-291: "The membrane potential differences are mostly confined to depolarization phase and are probably due to photon scattering". Please explain this in more detail, preferably with evidence to back it up.

Thank you, more detailed explanation is added to lines 629-639, while (dV/dt)max values are provided to explain more clearly differences in depolarization. New revised text reads:

“The deviations between input data and the output model are listed in Table 1. While RMSE is close to the noise level, and APD80 error did not exceed 14 ms, the difference in the depolarization phase is very pronounced: (dV/dt)MAX was approximately 20 V/s for input data, while for the output model it ranged from 55 to 80 V/s. This effect might be due to photon scattering in optical mapping recordings [10, 33]. Photons emitted by fluorescent dye undergo multiple scattering events and thus the recorded signal from a given pixel is actually an averaged signal from thousands of myocytes. This effect is known to distort AP waveform during depolarization when differences of membrane potential across the tissue are significant due to propagation of wavefront of excitation. Thus, in order to reduce the effect of this experimental artifact on the model, initial depolarization phase (below -20 mV) was removed from compared AP prior to fitness function calculation (see also Methods section).”

- Figure 4: How much of the observed differences in convergence are due to the type of distribution, and how much due to the distribution's parameters? What would happen with a narrower Cauchy distribution? Or a wider polynomial one?

We hope that new S9 Fig comparing RMSE with different distribution parameters provides the answer to this question.

- Figure 6b: A central idea in genetic algorithms is that cross-breeding good solutions will yield improved solutions. The red trace in Fig 6B suggests that this is not the case for this problem. Does this imply that the crossover step is not beneficial for this problem? If so, it seems this is more a particle based search method rather than a genetic algorithm per se? It would be good if the authors could experiment with leaving out the crossover, or trying different optimisers on this problem.

Thank you for this excellent suggestion! We have ran a number of additional simulations without crossover.S7 Fig depicts the results, while corresponding discussion is provided in lines 545-551 of the revised text:

“The requirement of a high number of elite organisms indicates that interbreeding via crossover operator does not necessarily result in improved results. In order to test whether crossover operator is indeed essential to algorithm convergence we have ran 7 GA runs without the crossover. Indeed, as seen in S7 Fig., leaving out the crossover still results in decent convergence, however some output parameters are much less precise, in particular the error is 100±50% vs 19±23% for IKs, 93±54% vs 32±25% for SERCA, 200±140% vs 26±14% for RyR.”

- Figure 6: The resolution of the attached figure is too low to make this out clearly.

Thank you, like in the case of Fig. 2 we will double check the resolution after conversion on PlosOne website.

- Figure 6: These results are very interesting, and show that a GA with a number of elite organisms can act both as a global and a local optimiser. It would be good to discuss this further in the text. At the same time, the very fast convergence of elite organisms to something very close to the true solution suggests that it may be more efficient to perform a two-stage optimisation. Please comment.

Thank you for the suggestion, this problem is discussed in lines 537-544 and 784-797 of the revised manuscript.

“In other words, large number of elite organisms result in two-stage optimization: initially the whole parametric space is explored, but eventually a number of elite organisms converge to the same minimum attracting the whole population to its vicinity and resulting in effective local optimization. Although PCA plots did not account for intracellular concentrations (see Methods section) it could be seen from comparison of S2 Fig (Cauchy mutation) and Fig 5 (6.6 % elite) that actually convergence to global parametric minimum (MCE and SDist reduction) is simultaneous with concentration changes to some vicinity of input model steady state value.”

“Albeit premature convergence to local minimum of RMSE hinders the solution of the optimization problem, it is preferable to exploit the vicinity of previously visited points once global minimum is localized. This is accomplished in this study by a large number of elite organisms. When elite organisms fall in some vicinity of RMSE minimum two scenarios are possible. If given solution is far from steady state they are going to be either discarded because of major variation of AP waveform in few generations (since these organisms are not susceptible to modification by mutation). If slow variables are close to steady state, elite organisms start to attract the population to the same vicinity. In the latter case, the whole population is going to exploit the vicinity of the optimization problem solution, thus working as a local optimizer. As seen from the comparison between S2 Fig. C-D and Fig. 5B-C intracellular concentrations and parametric convergence share similar dynamics when ratio of elite organisms is high. On the other hand, it might be more effective to use either classic gradient-based methods or Covariance Matrix Adaptation Evolution Strategy [39] once global minimum is localized (i.e. when a number of elite organisms converged to the same vicinity).”

- Figure 7: It is not clear from the caption how this comparison was set up. Did you re-implement the method by Bot et al. and apply it to your data?

Thank you, we have provided a more detailed description in the lines 561-567 of the revised text:

“We have re-implemented the original algorithm by Bot et.al. using Sastry toolbox [32] with the following modifications to it. Firstly, each organism was paced for 50 stimulations at every PCL, i.e. quasi-steady-state was used because it was computationally very expensive to reach actual steady state. Secondly, the normalized AP waveform at 7 PCLs was used as input data. Finally, the least squares technique was used to renormalize input data prior to fitness function evaluation as described in the Methods section.”

- Line 391: "Computer simulations". Please provide more detail here. Why were 1D simulations preferred over single cell simulations, and why 1D instead of 2 or 3D? What integration method was used (and with what time step or tolerances?). How was this implemented? How many cells were stimulated, and from which cell or cells was an AP recorded?

Thank you for this question. We provided detailed explanations on lines 200-206 of the revised text:

“1D model simulations, while being less computationally expensive than 2D or 3D models, correspond to a plane wave propagating in a 3D tissue at a significant distance from the pacing electrode. Moreover, given that in a wide range of conductivities (S1B Fig) exact AP waveform was essentially independent from gap junctions conductivity, we can conclude that in case of a minor 2D or 3D wavefront curvature, additional perturbations by diffusion operator would not affect AP waveform as well.”

- Line 397-399: Please explain this shifting in far greater detail. Which of the two signals was shifted and did this happen once or at every iteration? (In the latter case, how did you stop this method from introducing identifiability problems or local minima?)

Thank you, a more detailed explanation is provided on lines 178-188. One particular trick that we used is assigning large RMSE values to subthreshold depolarizations. AP waveform rescaling was actually performed for every organism. Please also find a brief discussion on the identifiability problem in lines 102-121 of the new manuscript.

“The algorithm is optimized for optical mapping recordings, where absolute transmembrane potential (TP) values are not known, and thus input data (both synthetic and experimental) was normalized by the algorithm. The least-squares technique is used to find scaling and shift coefficients of AP waveform. For every organism, before fitness function evaluation, input AP was shifted along the time axis to superimpose half-maximum depolarization of compared waveforms, after that experimental AP is rescaled and shifted to minimize the deviation between the experiment and the model: , where and coefficients are determined by the least-squares technique. In order to discard subthreshold depolarizations some large error value was assigned to low-amplitude APs (see below “Fitness function” subsection).”

“To the best of our knowledge, this is a first study providing a technique suitable for optical mapping recordings, where only normalized AP waveform is known, but not the exact transmembrane potential values. Arbitrary rescaling and shift of input AP waveform introduces a new dimension to parameters identifiability problem mentioned above. A possible approach to solve this problem is to utilize so-called restitution property, which is AP dependence on heart rate or pacing cycle length (PCL). For example, the sodium current reduction would result not only in the reduction of the amplitude of AP, but also in change of the steady state intracellular ionic concentrations and consequently, in the restitution curve change. Previously, several publications [2,5] utilized restitution property for GA-based cardiac model optimization. For example, Syed et. al. [2] have used atrial AP waveform at several PCLs as input for GA and paced every organism for 10 seconds before fitness function evaluation.”

- Line 407: This is not a standard deviation but a root-mean-squared error.

Thank you, SD was changed to RMSE in text and figures.

- Line 407: The choice of the symbol t^i_max for the number of samples (which is not a time) is confusing.

Thank you, the symbol was changed to N.

- Lines 419-420: "we have found odd number of stimulations helpful to exclude possible 1:1 alternans in the output model." This needs a lot more explanation.

Thank you, we have provided additional explanations on lines 223-226:

“in the case of alternating APs, the waveform (and, consequently, RMSE as well) was different every other generation. Thus, if alternating AP have a low RMSE value on generation N, the RMSE is going to increase on generation N+1.”

- Line 430-431: "This is especially important in terms of intracellular ionic concentrations" Why?

As already mentioned above in this reply to reviewers we have substantially revised the section of the article describing the Cauchy mutation.

- Line 430: Local minima are a problem _for_ optimisation methods, not _of_ them (they are a property of the score function being optimised).

Thank you, the error was fixed, however these lines were removed in the revised manuscript.

- Lines 434-438: Please add more detail. What is a "half-width of the half-maximum of the distribution", and why is 0.18 times the original value a good choice?

The explanation is provided in line 246. “(i.e. PDF is half the maximum value, when )” We have also provided additional S9_Fig demonstrating convergence dependence on γ value.

- Line 441: "We have found that it has adverse effects on algorithm convergence". Please explain where/if this is shown.

Thank you a reference to corresponding Results section is provided on line 259 of the revised manuscript: “See also Fig. 3 and the corresponding Results section.”

- Lines 440-444: Please list the parameters that were varied.

Thank you. The list of parameters was provided in lines 237-240 of the revised text:

“Thus, when input data included 4 pacing frequencies the full list of parameters was: gNa, gKr, gK1, gKs, PCaL, gto, gNaK, gNCX, gpCa, Jrel, Jup, CMDN, CaMKII, [Na+]300 ms, [Na+]500 ms, [Na+]1000 ms, [Na+]2000 ms, [Ca2+]300 ms, [Ca2+]500 ms, [Ca2+]1000 ms, [Ca2+]2000 ms.”

- Lines 441-443: Please explain how applying a mutation to each parameter seperatly differs from your approach. Is the mutation in a parameter i now correlated to the mutation in parameter j somehow?

Additional explanations were provided on lines 257-259:

“For example, in the case of the two-parameter problem: if (,) unit vector was chosen, then both parameters are going to be increased by the same amount after mutation. See also Fig. 3 and the corresponding Results section.”

- Line 453: 1000s? Or 1000*PCL ms?

1000 seconds. We have changed “s” to “seconds” in the text.

- Lines 456: The Andrews plots are cool, but don't offer much insight into which parameters vary. Did you try simpler representations, e.g. radar plots, instead?

Thank you for this suggestion. Radar plots are provided in the revised Fig. 6.

- Line 466 m x n * p. Please add parentheses and use x or . instead of *.

Thank you, we have changed the formula to mxnp on the line 289.

- Lines 524-525 "Libaries in silico processing performed in Morai system.". Please rewrite this.

- Line 529: "python scripts: level1 and level2". What do these scripts do?

Thank you: the CAGE data processing was modified on lines 371-372 and 377-380:

“In silico processing of sequenced CAGE tags was performed by using Moirai system [24].”

“The first script generates CTSS (CAGE tag starting sites) files, where 5' end of the mapped CAGE reads are counted at a single base pair resolution. The second script performs signal clustering on CTSS files with a minimum 10 TPM (tags per million) in at least one sample and minimum distance between clusters of 20 base pairs.”

- Line 541: "(100 s)", 100s or a 100 beats?

We have changed it to 100 seconds, see line 401.

- Line 558: "below". This is no longer below.

Thank you, the Methods section was shifted to initial part of the article.

We would like to thank Reviewer #1 for most thorough and helpful review, which significantly improved our manuscript!

Reviewer #2: 1. Is the manuscript technically sound?

I believe it is. There were multiple moments reading through the paper where I began wondering if the authors had considered something I considered an issue, and then found that they had as I continued reading. I consider this a good sign that the application of the genetic algorithm technique to the problem of personalised cardiac model generation here has been considered in a good amount of depth. The methodology also proves quite successful.

Although there are some points below where I comment on potential issues with the methodology, I do not feel that they compromise the core material of the manuscript (application of a modified GA to achieve action potential model personalisation). The authors should, however, clarify what is going on with the question raised on Line 123.

I cannot comment on the experimental procedures (obtaining imaged voltage data or genetic expression levels).

Thank you for positive assessment of our work. We have thoroughly revised our manuscript to address your concerns and concerns of other reviewers.

-----------------------------------

2. Has the statistical analysis been performed appropriately and rigorously?

I believe so, but it would improve the paper if the results were also presented in a statistical way (i.e. in text, comment on spread of performance across runs, instead of using the worst run as the discussed figure).

Thank you! Mean and SD values are now provided in the revised manuscript.

-----------------------------------

3. Have the authors made all data underlying the findings in their manuscript fully available?

I missed where in the paper this was discussed, but am trusting the authors when they state on the online submission that data has been made available. If it is not yet mentioned in the paper, I suggest a very short (1-2 sentence) new section at the end of Methods that points readers to where data can be obtained.

Yes, we have uploaded the data to datadryad.org. The data is going to be available upon article publication with the following DOI https://doi.org/10.5061/dryad.stqjq2c09 . Meanwhile the data is available to reviewers and editors via the following link https://datadryad.org/stash/share/LZQuhX_jA7dq003OcNniS7B8ngFfIy3J6hEY4644x5o .

-----------------------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

I feel the work is rather clumsily written. There are many grammatical errors, but those are a separate issue. What I consider more important is that as a reader, I was given very little idea about how the paper sits within the literature. For example, a list of references of other works using the genetic algorithm in the cardiac context is provided, but then only one is further discussed (as the method there serves as the basis for the authors' modified method). Without going through all of those papers, a reader will have no idea whether those works used experimental/clinical data, or simply demonstrated an ability to recover chosen parameter values in the action potential model (i.e. fitting synthetic data). Flicking through the referenced papers, they do seem to work with experimental data, and so this is not a point of novelty.

Thank you for this important critique! We have extended introduction section and hope that new revision gives a better sense of novelty and reviews previous literature in the field. These are the main points of our motivation:

Firstly, to the best of our knowledge this is a first study discussing GA application to optical mapping recordings of human action potentials, where only a normalized AP waveform is available, but not the exact voltage values. Secondly, in the previous reports which relied on experimental data to unambiguously determine computer model parameters, only single-cell voltage-clamp recordings were utilized. That limits the scope of the GA method application. Most importantly, since it is almost impossible to measure all ionic channels conductivities none of the previous works verified output parameters of GA as applied to experimental data. Please, find new text in the lines 94-133 of the manuscript:

“Another limitation of optimization algorithms as applied to electrophysiological models is the absence of a unique solution. As was noted previously [8–10] same AP waveforms could be reproduced by computer models with different sets of parameters, in other words, model parameters are unidentifiable from the AP. Also, techniques combining stochastic pacing and complicated voltage-clamp protocols have been recently proposed to overcome the problem [4,8]. However, these techniques are limited to single-cell voltage-clamp recordings, which are not feasible in clinical electrophysiology or whole heart and cardiac tissue measurements. The aim of the current study was to develop a technique that would allow one to find a unique solution using optical mapping, microelectrode, or monophasic AP recordings from cardiac tissue or whole heart. To the best of our knowledge, this is a first study providing a technique suitable for optical mapping recordings, where only normalized AP waveform is known, but not the exact transmembrane potential values. Arbitrary rescaling and shift of input AP waveform introduces a new dimension to parameters identifiability problem mentioned above. A possible approach to solve this problem is to utilize so-called restitution property, which is AP dependence on heart rate or pacing cycle length (PCL). For example, the sodium current reduction would result not only in the reduction of the amplitude of AP, but also in change of the steady state intracellular ionic concentrations and consequently, in the restitution curve change. Previously, several publications [2,5] utilized restitution property for GA-based cardiac model optimization. For example, Syed et. al. [2] have used atrial AP waveform at several PCLs as input for GA and paced every organism for 10 seconds before fitness function evaluation. As we demonstrate below in the Final Algorithm subsection of the Results section, this approach results in a poor parameters convergence. We have identified two reasons behind this fact: firstly, intracellular concentrations require much more than 10 seconds to converge to a steady state; secondly, a model with the particular set of parameters may converge to different steady states depending on the initial conditions. In order to address these issues we implemented a modification of GA allowing one to perform parameters optimization and steady state search in the slow variables space simultaneously. After each short simulation variables are saved, modified by genetic operators and reused as initial states for a new generation.

Finally, we have verified the algorithm against the experimental optical mapping AP recordings from the human heart. Since we could not measure ionic channel conductivities directly, we used the following assumption instead (Fig. 1B): these conductivities should be proportional to corresponding proteins mRNA level of expression as measured by Cap Analysis of Gene Expression (CAGE) [21] or RNA-seq. Thus, given that GA output model parameters represent actual ionic channels conductivities, the model rescaled in correspondence with mRNA expression profile differences between two patients, would reproduce AP restitution properties of both patients. Moreover, we have to note that this approach (i.e. combining GA with transcription profile) could be regarded as another technique of model personalization. As we show below, GA signal-to-noise ratio (SNR) requirements are rather strict and hard to accomplish in a clinical setting, while mRNA expression profile is possible to measure from tissue biopsy.”

Instead, a potential point of novelty is the consideration of steady state of the action potential model. The methodology in this paper has been specifically adjusted to recover the steady state as it goes, something which I suspect is novel. So then the questions are, is this indeed novel? If so, the authors should highlight it in the introduction, by pointing out other genetic algorithm applications in cardiac electrophysiology have not considered it (or considered it simply by incurring great computational cost). Additionally, if steady state is so important, the authors should highlight it. Figure 1 shows the effects of choosing different initial conditions, but that is a different issue - if anything, re-using the final state of an organism as the starting point for its offspring seems like it exacerbates this issue (more comments on this below).

Thank you, and yes – this is novel. We have modified the figure (Fig 2 in the revision) and corresponding text according to your recommendation. See text in lines 456-462 of the revised text:

“Why would this technique eventually result in concentrations converging to a steady state? The rationale is the following: if given parameters vector results in an acceptable AP waveform (i.e. close to the input AP), but far from steady state, then this solution is going to be discarded eventually, since waveform is going to change after few beats. For example, if AP of particular organism on generation N minimizes RMSE, but corresponds to one of the dotted lines in Fig. 2B, then RMSE is going to be large for this particular organism on generation N+1 (one of the dashed lines). “

Modifications are made to the genetic algorithm to improve its performance, but again it is not at all clear whether these are only novel in the field of cardiac electrophysiology, or novel ideas of the authors in the GA context. For example, using Cauchy distribution jumps is something I've seen in other optimisation (not GA) contexts (e.g. "Levy-flight" jumps in bio-inspired optimisation algorithms), and I assume has also been used in the GA context already. The authors should check up on this if they haven't, and provide an appropriate reference. Similarly, random vector jumps might indeed not be the norm in GA, but I'm sure they must've been used (many optimisation algorithms, including many that use multiple search agents as GA does, will be doing random multivariate Gaussian jumps, for example). Additionally, if these two modifications were so necessary to get good results here, one wonders how some of the other published GA papers in cardiac electrophysiology got away with not using them (or if they did use them, this should definitely be mentioned and emphasis on those aspects of the paper reduced!).

Thank you for this important point, which we missed! A brief discussion of the subject is now provided in the Discussion section of the revised manuscript as follows:

“It is worth noting that all these modifications to GA essentially make the algorithm similar to particle-based algorithms. For example, multivariate jumps are used by Particle Swarm [38], Covariance Matrix Adaptation Evolution Strategy [39] and Cuckoo Flight [40] algorithms. The last one uses “pathological” distribution similar to Cauchy mutation for Levy-flight walk. Although it would be very interesting to compare these algorithms performance with GA modification proposed in this study, some custom modifications to the aforementioned algorithms are still required: as was noted above same AP waveform could be reproduced by different parameter sets [8]. In order to narrow down the output parameters range, not steady state solutions should be somehow automatically discarded by the algorithm. This kind of extensive benchmarking is beyond the scope of the current study.”

Instead, one major novelty of this work seems to be the success of a GA-fitted model to be used across different members of a population, simply by measuring differences in their relative gene expression levels. Although I don't have a great sense for how easy/hard this is to actually do clinically and so cannot comment too much on the impact level of this finding, this was certainly a unique (and interesting!) aspect of the paper... and one that isn't even mentioned in the introduction! Hence my comment about clumsy writing. As I state many times in this review, I feel this part of the work needs to be highlighted much more.

Thank you very much for this recommendation! We agree completely! The reason we have not emphasized this point earlier was that previous data was not extensive enough to provide well-grounded discussion of the subject. Now during revisions, we were able to collect additional data in address reviewers’ concerns. Please see new Figs 11-12 and corresponding sections of the text. Introduction, Methods, Results and Discussion were all extensively revised to cover this subject.

I also re-iterate that although the English is mostly understandable, there are many grammatical errors and that the manuscript could be improved significantly in this regard. I have highlighted some of the most glaring issues below.

Thank you very much for your numerous helpful recommendations that allowed us to improve the paper significantly, both scientifically and stylistically/grammatically.

-----------------------------------

Specific Issues:

Abstract: It feels a bit number-heavy, and the numbers quoted are not average (with standard deviation) error, but maximum error, which I don't think is typical. It also talks about things like "13mV accuracy" which doesn't make sense to me. Is that a maximum deviance of 13mV at any point along the action potential? A root mean square error of the voltage along the whole action potential? This is not clear. In general, I personally think the "story" of the paper could be much better told than just listing error numbers.

Thank you, we have modified the abstract accordingly.

Line 50: There's no guarantee the parameters found by GA (with the benefit of multiple pacing frequencies) will be unique, but indeed use of multiple pacing frequencies helps a lot. If this wasn't explicitly considered, I recommend softening this language (similar to how it's expressed on line 97).

Thank you, we have modified the text as you recommend. New lines 50-51 read:

“In order to find the set of model parameters we use steady-state action potential waveform dependence on heart rate, known as restitution property.”

Line 98: Again, the intro doesn't mention use of CAGE at all. This seems like a huge mistake!

Thank you, we have provided the discussion of the subject in the lines 122-133 of the revised text:

“Finally, we have verified the algorithm against the experimental optical mapping AP recordings from the human heart. Since we could not measure ionic channel conductivities directly, we used the following assumption instead (Fig. 1B): these conductivities should be proportional to corresponding proteins mRNA level of expression as measured by Cap Analysis of Gene Expression (CAGE) [21] or RNA-seq. Thus, given that GA output model parameters represent actual ionic channels conductivities, the model rescaled in correspondence with mRNA expression profile differences between two patients, would reproduce AP restitution properties of both patients. Moreover, we have to note that this approach (i.e. combining GA with transcription profile) could be regarded as another technique of model personalization. As we show below, GA signal-to-noise ratio (SNR) requirements are rather strict and hard to accomplish in a clinical setting, while mRNA expression profile is possible to measure from tissue biopsy.”

Line 110: It's nice to point out the issue with a sensitivity to initial condition in cardiac AP models. I believe this is an issue that has been commented in the literature, although I cannot think of a reference off the top of my head. One would be nice here (see also comment on Line 120, though).

Thank you, although this issue is rarely considered, we have provided a reference to an important Rudy lab’s paper addressing the subject. Please, see lines 445-446 of the revised text:

“We have to note here that, while sensitivity of cardiac models to initial conditions is rarely considered, is was discussed previously [30].”

Line 114: "Saving organisms state vectors" is not a good figure title, in my opinion. Most of the figure is discussing the behaviour of the action potential model. Speaking of, I feel the caption should mention the model used (O'Hara-Rudy), given the text hasn't yet, and the reader might get curious. I know I did.

Thank you for noticing this glaring omission on our part! The figure caption was modified accordingly.

Line 120: My issue here is, I don't see how the methodology introduced here actually helps with the issue just discussed, of different initial conditions giving different steady state behaviour. If anything, using the final state of an organism as the initial condition of its offspring seems like it will only cause the method to hone in on a certain behaviour, and fail to explore the range of possibilities represented by different initial conditions. I do not consider this a major failing of the work, as I do not think there are many works at all in cardiac electrophysiology that even consider, let alone deal with this issue. Instead, what I think should happen is that the issue is given less focus in the manuscript (if even brought up), and instead, Fig. 1A-C should highlight the importance of steady state (for example, by plotting an action potential at steady state, and one during the transient phase). Assuming steady state even is important, but one is lead to believe it is in this paper, given the focus on achieving it whilst also avoiding excessive computational cost. This change would make Fig. 1 contribute a lot more to the story of the paper.

Thank you! Former Fig. 1 (Fig 2 in the revisions) was modified accordingly. New figure depicts initial and intermediate AP waveform in addition to the steady state. Revised text was also provided in the lines 449-456. Briefly, the offspring initial state is also inherited from the final state of the parent, in between the fitness function evaluations (model runs) the concentrations are modified by mutation and crossover operators, prohibiting certain behavior of the model. Revised text now reads:

“However, given that the set of parameters minimizing RMSE is itself a function of intracellular concentrations, this approach results in an optimizer solving essentially a new problem every generation. Moreover, as shown above (Fig. 2 B-D) “fixed” initial state for the organisms should result in a “fixed” steady state that might be different from input data. The solution that we propose in this study is to perform a simultaneous search in the parametric and slow variables space, i.e. the initial values of slow variables were added to parameters vector making them susceptible to mutation and crossover operators.”

Line 123: It was not originally clear to me how allowing intracellular ion concentrations to vary would speed up convergence. I now see that this is about convergence to steady state over the course of the genetic algorithm. Perhaps this could be reworded somehow to be more clear. Furthermore, I don't understand how [Na+]_i and [Ca2+]_nsr can be parameters chosen by the GA, given that they are variables in the model. Do the authors mean the *initial* value is allowed to vary within the GA? Or am I missing something? This needs to be made clear in the paper.

Thank you, we hope new lines 456-469 discuss the issue much more clearly. Revised text reads:

“Why would this technique eventually result in concentrations converging to a steady state? The rationale is the following: if given parameters vector results in an acceptable AP waveform (i.e. close to the input AP), but far from steady state, then this solution is going to be discarded eventually, since waveform is going to change after few beats. For example, if AP of particular organism on generation N minimizes RMSE, but corresponds to one of the dotted lines in Fig. 2B, then RMSE is going to be large for this particular organism on generation N+1 (one of the dashed lines).

In particular, we suggest including the initial intracellular sodium ([Na+]i) and calcium sarcoplasmic reticulum load ([Ca2+]NSR) at every PCL as model parameters. The [Ca2+]NSR concentration was chosen, because most of the calcium is stored within sarcoplasmic reticulum when the cell is at resting potential. We did not include potassium concentrations in the optimized parameters vector, since 10 mM [K+]i concentration changes results only in approximately 2% Nernst potential change, and thus a major concentration changes have a minor immediate effect on AP waveform.”

Line 125: "1.5mM precision" sounds not great for an intracellular ion concentration (well, maybe [K+]_i!). This seems to be the worst possible value, which is not a great way to report your error, in my opinion. Mean and standard deviation makes a lot more sense (and will look better!)

Thank you, we have modified maximum values to mean and standard deviation throughout the text.

Line 132: SD is not defined yet as far as I saw.

Thank you, the SD was changed to RMSE in new version of the text, while the methods section moved to initial part of the article.

Line 141: Patient 1 and patient 2 seem to be switched here (or in the figure). This was also an issue when the CAGE results were discussed later.

Thank you for pointing out this error! The Patient 1\\2 error was fixed in Fig.1 caption as well as 644-646 of the text.

Line 154: Again, these error values seem to be worst-case.

Thank you, the issue was fixed. Please see lines 506-508 of the manuscript:

“For example, the error of model parameters was 14±24% vs 17±75% for IKs, 11±25% vs 45 ±23% for Ito, 6±6% vs 13 ±15% for ICaL in GA runs with Cauchy and polynomial mutation correspondingly.”

Line 178: "Proper value" is mentioned here. Is that the value for synthetic data? I think this needs to be clearer.

Thank you, the issue was fixed. See lines 504-506 of the text.

“Consequently, when synthetic AP was used as input data, we observed more robust algorithm convergence in case of Cauchy mutation (Fig. 6B).”

Line 181: It's interesting that the best model organism not changing (implying stagnation of the algorithm) is here pointed out as a good thing. Can't argue with the results, though!

Actually, the stagnation of the algorithm itself was not pointed out as a good thing. Instead, we were drawing attention of the reader to the fact that in this example, it is clear that mutation did not spoil the convergence to the steady state. However, we have ran additional tests and noticed that although Cauchy mutation in general resulted in an output state closer to steady-state, the difference was not statistically significant at the higher numbers of generations. Therefore, Cauchy mutation section was modified substantially in the new version.

Line 215: The point here about "fine tuning" needs to be expressed better. I think I understand it now, but it took me a few reads. The idea is that with a small cluster, and offspring generated from the current population of organisms, the algorithm will only search locally and thus waste less time, right?

Yes, this is exactly what we meant; we have modified the text to express it more clearly. See lines lines 524-528 and 537-540 of the Results section, and lines 784-797 of the Discussion section:

“While wide exploration of parametric space is required at the initial stage of algorithm convergence, it is more effective to exploit the global minimum once it was localized (see [31] for discussion on exploration and exploitation). This could be achieved by a large number of elite organisms passing their parameters via crossover operator to siblings after clustering around the same minimum.”

“In other words, large number of elite organisms result in two-stage optimization: initially the whole parametric space is explored, but eventually a number of elite organisms converge to the same minimum attracting the whole population to its vicinity and resulting in effective local optimization.”

“Albeit premature convergence to local minimum of RMSE hinders the solution of the optimization problem, it is preferable to exploit the vicinity of previously visited points once global minimum is localized. This is accomplished in this study by a large number of elite organisms. When elite organisms fall in some vicinity of RMSE minimum two scenarios are possible. If given solution is far from steady state they are going to be either discarded because of major variation of AP waveform in few generations (since these organisms are not susceptible to modification by mutation). If slow variables are close to steady state, elite organisms start to attract the population to the same vicinity. In the latter case, the whole population is going to exploit the vicinity of the optimization problem solution, thus working as a local optimizer. As seen from the comparison between S2 Fig. C-D and Fig. 5B-C intracellular concentrations and parametric convergence share similar dynamics when ratio of elite organisms is high. On the other hand, it might be more effective to use either classic gradient-based methods or Covariance Matrix Adaptation Evolution Strategy [39] once global minimum is localized (i.e. when a number of elite organisms converged to the same vicinity).”

Line 304: This is not a clear way to discuss error, as commented on previously.

Thank you, the issue was fixed.

Discussion: I guess it is up to the editor and journal as to whether this numbered list of highlight points is suitable format for a discussion. It's not my personal preference.

We have modified the discussion, accordingly.

Line 330/343/350: I'm not sure that the modifications to the GA are worth three separate discussion points. In particular, discussion points #3 and #4 should probably be combined and cut down a little, in my opinion.

Thank you. We have removed some parts of the text from discussion, however, after considering reviewers questions, we have decided that some points on algorithm modification have to be discussed more clearly. Thus, the discussion was cut down and extended once again. Please see lines 758-815 of the text.

Line 358: "algorithms" should be "organisms". There are many more grammatical issues, but this one is one that changes the meaning so I have commented on it specifically.

This part of the text was removed according to previous critiques.

Line 376: Tacking on the CAGE results at the end here with a casual mention is not really doing them justice. Those results seem to me like a much bigger deal than the authors make them sound!

Thank you, as mentioned above, we have performed additional experiments and extended corresponding sections.

Line 466: This notation was unclear to me. Combining two different multiplication symbols leaves me guessing which is which. If I guessed right, I think you could just say m x np?

Thank you for the suggestion, we have fixed the text accordingly. Please, see line 289 of the new version of the manuscript.

We would like to thank Reviewer #2 for a thorough and friendly review, which served us a welcome guide during revisions. We hope you find it improved.

5 Feb 2020

PONE-D-19-25321R1

Genetic algorithm-based personalized models of human cardiac action potential.

PLOS ONE

Dear Dr. Syunyaev,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit and has improved significantly, but still does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please take the opportunity to further improve the manuscript, following the constructive feedback from Reviewer 2.

We would appreciate receiving your revised manuscript by Mar 21 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

B. Rodríguez

PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewer's Responses to Questions

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

6. Review Comments to the Author

Reviewer #1: I'd like to congratulate the authors on a much improved manuscript and a very interesting study.

A few minor concerns remain, which I am confident the authors can address:

- The caption to figure 2 has not been updated. (B,C) should now be (C,D), (D,E) should become (E,F), and the new B panel should be described.

- The first and last sentence of lines 220-229 appear almost contradictory, please rewrite.

- The statement on line 625-626 needs clarification

Finally, time permitting, the authors might consider adding a comparison of GA-obtained parameters with measured RNA levels (e.g. show if/how they correlate).

But as I didn't mention this in the initial review I'm happy to let this go!

Reviewer #2: Firstly I thank the authors for their very thorough response to my comments, and I believe that they have addressed the points I raised very well. The introduction reads a lot better now, in particular, and the discussion is also much improved. The improvements made to figures really help, also!

However, the paper has now also added a lot more results and I feel this has introduced some new issues into the work. I actually wonder if there was a misunderstanding of my (and the other reviewer's) comments - from what I see both of us asked for more emphasis to be placed on the CAGE results, but this meant (certainly in my case anyway) to give these results a proper discussion, mention them in the introduction, and so on. Instead, the authors have now included many more results further testing out the idea of using CAGE and also RNA-seq data to attempt to generate personalised models. This is great! But still actually feels underemphasised in terms of the text (things like discussion of the implications is a bit lacking). I would like to iterate how impressive this result is! I feel the impact of the paper could be greatly improved if more of a focus was given to those results. That said, PLoS One is about publishing correct science without all the emphasis on impact, and so I am not considering this a negative so much as a comment as to how I think the paper could potentially be further improved.

I have selected a recommendation of Minor Revisions. Although I have perhaps a lot of comments below, I would like to express the fact that I think the paper is much improved and with some cleaning up absolutely fit for publication.

I have some general comments, and then specific comments that I have sorted into important, and less important.

----------------------------------------------------------------------------------------------------

* I think the referencing of the supplementary figures is out of order. Not a huge deal, but suboptimal. Additionally, the style of referencing figures is not consistent throughout the paper. This would need to be fixed up in review or proof stage.

* The paper was originally written to have the methods at the back, and this was now changed for this version. However, some of the text in the Results feels like it's there because it's a holdover from the previous revision (or reworked/improved text from the previous version). This has resulted in text in the Results that feels more like justification for choices made in Methods and I think it would improve the paper to move it back into Methods, now that Methods comes before Results.

* The English is on the whole understandable and again I think the writing was improved, things I picked up on last time were given a much better explanation. That said, I still believe there is considerable room to improve the grammar, which would help in making the paper more readable, but also in selling the results and conclusions to the reader. I still consider it the journal's call as to what standard of grammar is acceptable.

For now, I have marked the criteria for "presented in an intelligible fashion and written in standard English" as a No. This is less because of the grammar and more because there are some specific issues (covered in my comments) that I think need to be fixed first. However, I do think the paper is understandable on the whole and I would mark it "Yes" if not for those specific issues.

* My other major comment relates to the arbitrarity of the results that are/are not displayed for the CAGE and RNA-seq model personalisation sections. Some of this might have been me missing something, but why is only Patient 2 compared to data, and not Patients 3-7? Was analysis run using one of the other patients as the baseline (as done for the RNA-seq section?). If so, transparency as to how that went would be good, and if not, this should be stated so that the reader isn't left wondering if Patient 1 was cherrypicked as the best.

Similarly, in the RNA-seq section, why are only patients 8, 9 and 11 used as baselines to create the other models? I appreciate that too many lines in Figure 11 would make it unreadable, but did the authors try using the other patients as baselines also? It would be good to have this at least commented on. Whether a sentence admitting that the three shown were the best three, or that "performance using the other patients as baselines was mixed, and not as good as patients 8 and 9", or whatever the exact case is here. If the analysis wasn't run for those, then say that, so that it doesn't just look like cherrypicking.

* Figure 11 is super low quality in the version I have here. Not sure if it was corrupted as part of the submission process, or if the figure just needs to be created in higher quality for upload. As it stands it's very hard to visually make out some of what it shows (for example, + and x are indistinguishable, and the lines of different colours are hard to tell apart, all can sometimes just appear grey due to image artifacts).

* Table 5 and surrounding discussion need work. See the relevant specific comments below.

-----------------------------------------------------------------------------------------------------

Line 39: This final sentence could be written in a far more impactful way. It also still mentions two patients, even though far more were used in the current results.

Line 65: It is simply not true that single-cell models "remain non-personalized". Line 71 is reviewing other papers who have used GA to recapture an experimental AP.

Line 91: Weird phrasing. The goal of the study is not to "develop a GA implementation suitable for cardiac models", because those already exist and are referenced. I just think this wording is a little strong and also not very specific to the very real benefits your GA approach brings.

Line 171: (B,C) in caption should be (C,D) I think

Line 172: (D,E) in caption should be (E,F) I think

Line 175: This mentions green-tinted boxes which were in Fig. 1A, not 2A. Fix numbering.

Line 221: Use something like "infeasible" instead of "impossible". Something is not "computationally impossible" just because it takes a long time.

Line 425: I was first left wondering why there were essentially two groups, patients 1-7 and then 8-14. I eventually realised that this was because one group was using CAGE, the other RNA-seq. This could really have been expressed much more cleanly at some point around here.

Line 445: I appreciate adding in a reference discussing the issue of sensitivity to initial conditions as I suggested. However the sentence that does so reads very awkwardly and reads very much like one of those "here because reviewer wanted it" sentences. Why not delete this sentence, and instead just add this reference on to the end of the sentence on line 438-439 where you're actually naturally bringing up this point about initial conditions in the first place? Something like:

"Moreover, given different initial states, intracellular concentrations can converge to different steady state values [30], although this issue is very often neglected in cardiac electrophysiology studies."

Line 462: From what I saw in the figure, moving from dot to dashed line is not "one generation" as the text implies here.

Line 463: "We suggest" isn't great wording. The section later goes on to say things like "we did not include potassium concentrations..." so why not stick to that sort of language? "We included the initial intracellular sodium..."

Line 463: The whole first half of this paragraph feels far more like methods than results. The authors talk about an extra modification made to the GA (intracellular ion concentration initial conditions also allowed to vary as part of the GA), but results without this modification are not shown. If results without were shown, and then commented upon, informing this modification that then led to improved results, I could understand putting this content in the Results section. As it is though, the easiest fix seems to be just to move it into the Methods (which also avoids having to generate any new Results).

Line 504: "Consequently, when synthetic AP was used as input data..." This makes it sound like the authors were using something else previously, then switched to synthetic AP data. I know that's not the case, but I just mean this wording is very awkward. Rephrase as something like:

"We observed more robust algorithm convergence in the case of Cauchy mutation, for model fitting to synthetic AP data (Fig. 6B)"

Line 521: "a" and "spoil" are in the wrong order.

(Already commented above as a general point there, but kept here for posterity.)

Line 666: Why are the other patients, 3-7 only having their APs displayed to indicate the extent and character of the variability, as opposed to patient 2 which features explicit comparison between the data and the CAGE-weighted prediction? This feels very arbitrary, essentially like the results worked well for patient 2 and not for any of the other patients. Honestly including patients 3-7 but not properly including them only makes this whole part feel weaker. Does the concept of adjusting the GA-tuned parameters according to CAGE fail for those patients, or was it just not performed for some reason? Is it that APs were not actually measured, only CAGE data, so this is the best that can be done? If so, make this super clear! If it's in here already and I just missed it (possible!), emphasise it, or otherwise, make sure to mention this! Otherwise, the extra patient data here just fills my mind with more questions, as opposed to strengthening the result.

(Already commented above as a general point there, but kept here for posterity.)

Line 685: Not sure if it's just the version I have on .pdf as a reviewer, but Figure 11 (one of the most important figures!) does not look great. There are serious image artifacts and it's actually a struggle for me to make out the lines plotting the AP curves against the data in some cases. The red and blue lines often blur in with one another and I do not have colourblindness, it is simply a result of the artifacting, plus their thinness. Much of this might just be limited to the version I'm looking at, but please see if this can be improved.

Line 705: In contrast to the comment for line 666 the treatment of patients 8-14 is far more transparent. The successes and failures of the approach are shown clearly, there's no hiding. This makes me feel a lot more comfortable, and the fact that the method works at all (even if not always) is still really nice. Seeing changes in DNA map directly through to changes in parameters that actually do a decent job of predicting the patient's AP is a HUGE point in favour of the underlying AP model doing the right thing (something which is always a little in question because there's many competing models that all give different predictions). "which indirectly indicates GA-output parameters precision for this particular case" is seriously understating what is actually a really cool result! I think this part needs way more/better discussion, including both a better emphasis on the good result,and some more insight into why the authors believe other cases may have failed. I know that some thoughts along these lines are given for why the patient 11-based models weren't great, but another point that could be made is that the models always seem to overestimate the APD when they fail to fit the data super well. If the authors have any thinking as to why, that'd be nice to add. Also, the models based on patient 8 and patient 9 seem very similar for most of the patients, but as pointed out, the parameter values themselves are very different. Worth commenting on that everpresent issue of multiple parameter sets giving similar behaviours, and that actually you've found two sets of parameters that also behave pretty much the same across different pacing frequencies.

Line 738: "The difference in APD80 is below 15ms for CL 1000ms". So what? Why pick out one biomarker, at one pacing frequency, and only talk about it? Is this supposed to be picking out the biggest point of discrepancy between the two curves, because otherwise their agreement is very good? If so, please communicate this (and also feel to use stronger words than "relatively close" on line 736, because although it's hard to make out on the low-quality figure, I'd call them a lot more than "relatively close" if I'm seeing them correctly!)

Line 745: Table 5 was an enigma to me before multiple re-reads. Are these multipliers? Compared to what baseline? The text says that RyR is 1.5-3.5 times higher for the Patient 8 based model for Patient 9 (as compared to the direct GA fit to patient 9 data) and yet the RyR values are 0.690 and 0.825, respectively, so this is not true! The caption for Table 5 really needs more information, and/or the text does. A reword would help make it so much more understable, too.

For example, if I'm indeed understanding it right, something like:

"Table 5: Parameter values (relative to baseline) selected for models for Patient 9 derived via GA, or via comparison of mRNA levels to reference patient"

And then the headings are essentially what you have, but make it clear that those are the reference patient. As it is, it's too easy for me to glance at the table and think the first column is results for Patient 8, and the last column for Patient 10. When it's not, all columns show results for Patient 9, but with models derived in different ways (as the caption tries to say, but I feel my wording just provided achieves this much better).

Line 748: The opening sentence of the discussion is very disagreeable to me. The paper hasn't really introduced a GA-based technique that allows personalization of AP models using their dependence on the pacing rate, despite claiming this. Refs [2,5] in the paper are already credited earlier in the paper as doing that. What this paper *does* do is bring improvements to the methodology. And then the CAGE and RNA-seq stuff, which is really striking. The discussion should tie these results together. "Look, our GA fits using our new improved methodology are so good, that we can even go and scale them based on DNA/RNA readings and go and recapture intersubject variability!" Perhaps that claim is a little strong, as poorer fits from older methods might still achieve much the same and this wasn't explored, but I think that is the direction you should be phrasing things in.

Line 770: "This approach has two disadvantages:" is not the way this should be worded. Why not instead something like "There remained two issues to be addressed:". The two things that are discussed are not so much disadvantages of your approach, as just challenges for any method trying to find steady state quickly insides of a GA (or other optimisation method). So my wording here takes out the unnecessary badmouthing of your method.

Line 869: Presuming I'm interpreting the figure right (low quality makes it hard to see), I think the Patient 9-based model did a much better job than the authors give it credit for, here. Patient 8 did the best, but Patient 9 actually fits well to more patient+pacing rate combinations than it fails to fit to. This is pretty good performance for a parameter tuning that is coming from indirect measurements, and as I mentioned earlier, actually has implications regarding how much confidence to give our AP model predictions (in a good way!). I don't think it should just be lumped in with the Patient 11-based model as "failing to fit".

Specific Comments (Less Important / Positives)

-----------------------------------------------------------------------------------------------------

Lines 93-133: This was a big improvement over the previous version. Well written.

Line 168: Figure 2 is much improved!

Line 181: The explanation starting here was a bit hard to understand. Not a major point but I think it could be expressed more clearly.

Line 268: Maybe put the i's and NSR's on these concentrations, though I appreciate it's uglier when you also want to use the subscript to mark the pacing frequency. Not a major issue to me either way.

Line 286: I agree with the other reviewer's first round comments, I think that the principal components are a much nicer way to visualise this!

Line 302: The point is labelled here R, and another later A, but then R and A are not referred to again from what I saw. Labelling them like this just made it more confusing to me, (x0,y0) and (xc,yc) would be fine.

Line 316: Figure 5 could probably be moved to the supplement, in my opinion. The point that large numbers of elite particles performs better is already made in Figure 4.

Line 456: This is a nice explanation of why the method presented in the paper could be expected to work. I feel like it fits better where the method is presented, instead of in results, but this isn't major.

Line 584: The section "Input data requirements" talks first about the amount of restitution information required for different parameters to be correctly identified, and then the requirements in terms of noise. I would actually just break this up into two separate sections, with more informative headings on each. "Input data requirements" doesn't immediately make me think of restitution/requirements on number of pacing frequencies. Also, it seems to me like other target values for the parameters could likely change how many different frequencies are required to identify those different parameters. So talking about this like a requirement on the input data has been identified feels a little wrong.

Line 604: On the other hand, while it might be true that different APs also showed a different sensitivity to noise, I'm far more comfortable with this section being expressed in terms of the "required signal-to-noise ratio". I also appreciate its inclusion, it was interesting and I think it's valuable.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: Yes: Michael Clerx

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

20 Mar 2020

For the convenience of reviewers the text below is attached as a separate file.

We are grateful to the editor and reviewers for the opportunity to revise and resubmit our manuscript. In order to expedite the review process we restated reviewers’ questions in green font and our provided our answers in blue font.

Reviewer #1: I'd like to congratulate the authors on a much improved manuscript and a very interesting study.

We thank the Reviewer very much for the positive assessment of the revised manuscript.

A few minor concerns remain, which I am confident the authors can address:

- The caption to figure 2 has not been updated. (B,C) should now be (C,D), (D,E) should become (E,F), and the new B panel should be described.

Thank you, we have fixed the error in figure caption.

- The first and last sentence of lines 218-228 appear almost contradictory, please rewrite.

Thank you, we believe that we have expressed the idea more clearly in the new revision, new text reads as follows:

“Computational cost of pacing every organism at every generation until reaching steady state during a GA run is prohibitively high. Thus, we saved each AP after a short simulation… As a result, each organism approaches closer to steady state variables with every generation.”

- The statement on line 634-636 needs clarification

Thank you, we have included a sentence explaining the statement:

“One possible explanation of this heterogeneity is uneven perfusion of the sample resulting in a mild ischemia that shortens AP duration because of activation of the ATP-dependent potassium channels IK,ATP.”

Finally, time permitting, the authors might consider adding a comparison of GA-obtained parameters with measured RNA levels (e.g. show if/how they correlate).

But as I didn't mention this in the initial review I'm happy to let this go!

Indeed, we have considered the correlation between output parameters and mRNA levels. However, given the limited amount of experimental data, we concluded that correlation is not too meaningful. As we mentioned several times throughout the text, we believe that at least in some cases experimental tissue was exposed to ischemic conditions. We also demonstrate in the article that Patients 9 and 11 models performance are not perfect: the former underestimated the sodium current, while the latter overestimated the APD. Moreover, as we demonstrate in Fig. 6, the precision of some parameters, such as low-amplitude ionic currents, was relatively imprecise. We thus concluded that, given the limited amount of data, model rescaling is the best way to verify the algorithm precision. For example, low-amplitude currents would spoil the correlation, but are less likely to affect AP waveform after the parameters rescaling. However, we will address this important issue in the future studies.

Reviewer #2: Firstly I thank the authors for their very thorough response to my comments, and I believe that they have addressed the points I raised very well. The introduction reads a lot better now, in particular, and the discussion is also much improved. The improvements made to figures really help, also!

However, the paper has now also added a lot more results and I feel this has introduced some new issues into the work. I actually wonder if there was a misunderstanding of my (and the other reviewer's) comments - from what I see both of us asked for more emphasis to be placed on the CAGE results, but this meant (certainly in my case anyway) to give these results a proper discussion, mention them in the introduction, and so on. Instead, the authors have now included many more results further testing out the idea of using CAGE and also RNA-seq data to attempt to generate personalised models. This is great! But still actually feels underemphasised in terms of the text (things like discussion of the implications is a bit lacking). I would like to iterate how impressive this result is! I feel the impact of the paper could be greatly improved if more of a focus was given to those results. That said, PLoS One is about publishing correct science without all the emphasis on impact, and so I am not considering this a negative so much as a comment as to how I think the paper could potentially be further improved.

I have selected a recommendation of Minor Revisions. Although I have perhaps a lot of comments below, I would like to express the fact that I think the paper is much improved and with some cleaning up absolutely fit for publication.

We thank the reviewer for this positive evaluation of the new revision and new valuable comments and recommendations. We have carefully considered the reviewer’s recommendations and modified the manuscript accordingly.

I have some general comments, and then specific comments that I have sorted into important, and less important.

----------------------------------------------------------------------------------------------------

* I think the referencing of the supplementary figures is out of order. Not a huge deal, but suboptimal. Additionally, the style of referencing figures is not consistent throughout the paper. This would need to be fixed up in review or proof stage.

Thank you, we have fixed the order and formatting of the references.

* The paper was originally written to have the methods at the back, and this was now changed for this version. However, some of the text in the Results feels like it's there because it's a holdover from the previous revision (or reworked/improved text from the previous version). This has resulted in text in the Results that feels more like justification for choices made in Methods and I think it would improve the paper to move it back into Methods, now that Methods comes before Results.

Thank you, we have moved some portions of the text to methods, in particular the description of the slow variables mutations was shifted to the “Methods” section as recommended by the reviewer. While we agree that on many occasions “Results” section repeats some of the information already mentioned in the “Methods” section, we have to note, that half of the figures is indeed the validation of the GA modifications. Although, mRNA-related study is probably much more impressive, we believe that these modifications were crucial to the successful model rescaling, and thus this validation was necessary. In our opinion, in most of the cases repeating some information from the “Methods” was necessary to spare the reader from flipping pages back and forth.

* The English is on the whole understandable and again I think the writing was improved, things I picked up on last time were given a much better explanation. That said, I still believe there is considerable room to improve the grammar, which would help in making the paper more readable, but also in selling the results and conclusions to the reader. I still consider it the journal's call as to what standard of grammar is acceptable.

For now, I have marked the criteria for "presented in an intelligible fashion and written in standard English" as a No. This is less because of the grammar and more because there are some specific issues (covered in my comments) that I think need to be fixed first. However, I do think the paper is understandable on the whole and I would mark it "Yes" if not for those specific issues.

* My other major comment relates to the arbitrarity of the results that are/are not displayed for the CAGE and RNA-seq model personalisation sections. Some of this might have been me missing something, but why is only Patient 2 compared to data, and not Patients 3-7? Was analysis run using one of the other patients as the baseline (as done for the RNA-seq section?). If so, transparency as to how that went would be good, and if not, this should be stated so that the reader isn't left wondering if Patient 1 was cherrypicked as the best.

Similarly, in the RNA-seq section, why are only patients 8, 9 and 11 used as baselines to create the other models? I appreciate that too many lines in Figure 11 would make it unreadable, but did the authors try using the other patients as baselines also? It would be good to have this at least commented on. Whether a sentence admitting that the three shown were the best three, or that "performance using the other patients as baselines was mixed, and not as good as patients 8 and 9", or whatever the exact case is here. If the analysis wasn't run for those, then say that, so that it doesn't just look like cherrypicking.

We thank the reviewer for pointing this omission on our part; we have revised the Methods section to avoid misinterpretation of the actual results, new text in the lines 411-413 and 421-427 reads:

“In the case of CAGE group of patients, functional data was not available for Patients 3-7, thus only Patients 1 and 2 were used for algorithm verification.”

“Due to computational limitations, after preliminary analysis several patients available were excluded and not used as input to GA. In particular, Patient 10 was excluded because of a very long depolarization time (see “Experimental data” subsection of the “Results” section for a brief discussion of optical mapping artefacts affecting depolarization phase). Patients 12 and 13 were excluded because of the very short APD (below 300 ms, which indicated ischemia of the preparation). Patients 12 and 14 were excluded due to low signal-to-noise ratio, while at high frequencies alternans was also observed for Patient 14.”

* Figure 11 is super low quality in the version I have here. Not sure if it was corrupted as part of the submission process, or if the figure just needs to be created in higher quality for upload. As it stands it's very hard to visually make out some of what it shows (for example, + and x are indistinguishable, and the lines of different colours are hard to tell apart, all can sometimes just appear grey due to image artifacts).

We are sorry, but actually, we believe the Figure 11 (Figure 10 in the new revision) resolution to be very high. Although we cannot control the conversion from TIFF to the PDF on the PLOS One site, we suggest downloading the high definition figure via the link available from the PDF above the figure. However, we are not sure if the link works for reviewers.

* Table 5 and surrounding discussion need work. See the relevant specific comments below.

Thank you we have modified the table and the text according to suggestion.

-----------------------------------------------------------------------------------------------------

Line 39: This final sentence could be written in a far more impactful way. It also still mentions two patients, even though far more were used in the current results.

We have modified the abstract according to suggestion, new version reads:

“We have demonstrated that mRNA-based models predict the AP waveform restitution with high precision. The latter also provides a novel technique of model personalization that makes it possible to map gene expression profile to cardiac function.”

Line 65: It is simply not true that single-cell models "remain non-personalized". Line 71 is reviewing other papers who have used GA to recapture an experimental AP.

Thank you for pointing out this inaccuracy, we have removed the sentence from the text.

Line 91: Weird phrasing. The goal of the study is not to "develop a GA implementation suitable for cardiac models", because those already exist and are referenced. I just think this wording is a little strong and also not very specific to the very real benefits your GA approach brings.

Thank you we have changed the text in the lines 89-90:

“One of the goals of the current study was to develop robust GA implementation making it possible to find the set of cardiac electrophysiology model parameters without premature convergence to sub-optimal solution.”

Line 171: (B,C) in caption should be (C,D) I think

Line 172: (D,E) in caption should be (E,F) I think

Thank you! The error in the figure caption was corrected.

Line 175: This mentions green-tinted boxes which were in Fig. 1A, not 2A. Fix numbering.

Thank you, the numbering was fixed.

Line 221: Use something like "infeasible" instead of "impossible". Something is not "computationally impossible" just because it takes a long time.

The text was fixed according to suggestion. Please, see line 216 in the new revision.

Line 425: I was first left wondering why there were essentially two groups, patients 1-7 and then 8-14. I eventually realised that this was because one group was using CAGE, the other RNA-seq. This could really have been expressed much more cleanly at some point around here.

Thank you, the clarification was added to the methods section, lines 402-404:

“Genome-wide transcription profile was measured via CAGE (for Patients 1-7) or RNA-Seq (for Patients 8-14) techniques as described above.”

Line 445: I appreciate adding in a reference discussing the issue of sensitivity to initial conditions as I suggested. However the sentence that does so reads very awkwardly and reads very much like one of those "here because reviewer wanted it" sentences. Why not delete this sentence, and instead just add this reference on to the end of the sentence on line 438-439 where you're actually naturally bringing up this point about initial conditions in the first place? Something like:

"Moreover, given different initial states, intracellular concentrations can converge to different steady state values [30], although this issue is very often neglected in cardiac electrophysiology studies."

Thank you, we have changed the text according to your recommendation.

Line 462: From what I saw in the figure, moving from dot to dashed line is not "one generation" as the text implies here.

Actually, since 9 stimuli were used for every generation, moving from dotted to dashed line is a single generation indeed. We have added a brief explanation that we hope will clarify this issue (line 460):

“For example, if AP of particular organism on generation N minimizes RMSE, but corresponds to one of the dotted lines in Fig. 2B, then RMSE is going to be large for this particular organism on generation N+1 (i.e. after 9 beats, one of the dashed lines).”

The text above (lines 445-446) reads

“Instead of the direct approach, we have evaluated the fitness function after a short run (9 stimulations at each PCL).”

Line 463: "We suggest" isn't great wording. The section later goes on to say things like "we did not include potassium concentrations..." so why not stick to that sort of language? "We included the initial intracellular sodium..."

The corresponding text was removed from the new version of the manuscript. Thank you.

Line 463: The whole first half of this paragraph feels far more like methods than results. The authors talk about an extra modification made to the GA (intracellular ion concentration initial conditions also allowed to vary as part of the GA), but results without this modification are not shown. If results without were shown, and then commented upon, informing this modification that then led to improved results, I could understand putting this content in the Results section. As it is though, the easiest fix seems to be just to move it into the Methods (which also avoids having to generate any new Results).

Thank you, the text was moved to the lines 258-263.

Line 504: "Consequently, when synthetic AP was used as input data..." This makes it sound like the authors were using something else previously, then switched to synthetic AP data. I know that's not the case, but I just mean this wording is very awkward. Rephrase as something like:

"We observed more robust algorithm convergence in the case of Cauchy mutation, for model fitting to synthetic AP data (Fig. 6B)"

Thank you, the text was modified according to the recommendations. Please see lines 496-497 in the new revision.

Line 521: "a" and "spoil" are in the wrong order.

Thank you, the error was fixed.

(Already commented above as a general point there, but kept here for posterity.)

Line 666: Why are the other patients, 3-7 only having their APs displayed to indicate the extent and character of the variability, as opposed to patient 2 which features explicit comparison between the data and the CAGE-weighted prediction? This feels very arbitrary, essentially like the results worked well for patient 2 and not for any of the other patients. Honestly including patients 3-7 but not properly including them only makes this whole part feel weaker. Does the concept of adjusting the GA-tuned parameters according to CAGE fail for those patients, or was it just not performed for some reason? Is it that APs were not actually measured, only CAGE data, so this is the best that can be done? If so, make this super clear! If it's in here already and I just missed it (possible!), emphasise it, or otherwise, make sure to mention this! Otherwise, the extra patient data here just fills my mind with more questions, as opposed to strengthening the result.

We thank the reviewer for pointing this out, as mentioned above, the Methods section was changed accordingly to clarify this issue. Moreover, short explanation was added to the lines 664-666:

“As noted above, functional data was not available for these particular patients, but the variability between the models is within physiological range [11].”

(Already commented above as a general point there, but kept here for posterity.)

Line 685: Not sure if it's just the version I have on .pdf as a reviewer, but Figure 11 (one of the most important figures!) does not look great. There are serious image artifacts and it's actually a struggle for me to make out the lines plotting the AP curves against the data in some cases. The red and blue lines often blur in with one another and I do not have colourblindness, it is simply a result of the artifacting, plus their thinness. Much of this might just be limited to the version I'm looking at, but please see if this can be improved.

As we answered above, we think that the only solution of the problem is downloading high resolution figures using the link, but we are not sure if the link from PDF document works for reviewers. We have double-checked that the resolution of the figure is high and the high resolution images are available for download to us.

Line 705: In contrast to the comment for line 666 the treatment of patients 8-14 is far more transparent. The successes and failures of the approach are shown clearly, there's no hiding. This makes me feel a lot more comfortable, and the fact that the method works at all (even if not always) is still really nice. Seeing changes in DNA map directly through to changes in parameters that actually do a decent job of predicting the patient's AP is a HUGE point in favour of the underlying AP model doing the right thing (something which is always a little in question because there's many competing models that all give different predictions). "which indirectly indicates GA-output parameters precision for this particular case" is seriously understating what is actually a really cool result! I think this part needs way more/better discussion, including both a better emphasis on the good result,and some more insight into why the authors believe other cases may have failed. I know that some thoughts along these lines are given for why the patient 11-based models weren't great, but another point that could be made is that the models always seem to overestimate the APD when they fail to fit the data super well. If the authors have any thinking as to why, that'd be nice to add. Also, the models based on patient 8 and patient 9 seem very similar for most of the patients, but as pointed out, the parameter values themselves are very different. Worth commenting on that everpresent issue of multiple parameter sets giving similar behaviours, and that actually you've found two sets of parameters that also behave pretty much the same across different pacing frequencies.

We thank the reviewer for positive evaluation of our results. We do hope that the new discussion is more comprehensive and impactful. Several clarifications were added to the Results section, but, most importantly, new lines 865-889 read:

“This heterogeneity also explains why models shown in Fig. 10 tend to overestimate the APD. In order to exclude possible effect of ATP-dependent shortening of AP, we have manually chosen the recording with the longest AP as GA-input. On the other hand, probable local tissue damage at the site of recording might result in an opposite effect, i.e. downregulation of K-channels [48] prolonging the APD.

In the Fig. 10 and Table 5 we also demonstrate that minor changes in AP waveform may result in major changes of parameters values. Despite the differences in Patients 9-11-based models, in some cases the AP waveforms and restitution were very similar. Consequently, this imposes a limitation on the GA-output model: it has to follow experimental data very closely and perturbations of experimental AP waveform are likely to spoil algorithm performance. In our opinion, this fact also implies that the modifications we introduced to GA were crucial to the successful gene expression-based prediction of AP waveform.

As was noted previously [49] modern cardiac models coalesce multiple studies performed on different species, using different experimental conditions. Moreover, given the complexity of modern models, it is possible to describe particular dataset using different parameters [8-10] (see also Table 5 and Fig. 10). These facts leave us with questions: even if the model describes the particular dataset underlying it, what is the predictive capability of computer simulations? Is it possible to extrapolate model predictions to different clinical or experimental conditions? For example, it is possible that the model description of a particular ionic current is imprecise, but other model components are instead tuned to counterbalance this imprecision. In this case, the model components rescaling would most probably upset the balance and result in model failure to predict AP waveform. In this study, we have shown that it is possible to use computational model to map expression profile to cardiac function and predict actual AP waveform (Figs 9-11). This fact makes a point not only in favor of particular GA-output parameters set, but also indicates the underlying computational model precision. ”

Lines 895-899:

“Another possible implication of the provided technique is drug effects investigation. For example, ionic channel blockers often affect several ionic currents. Using GA to measure the effects of a drug provides a cheaper way to recover several ionic currents dose-response curve simultaneously then the patch-clamp technique.”

Lines 747-752 in the result section:

“It is also interesting that despite major differences in parameters, models behavior was surprisingly similar not only in the case of Patient 9, but also in a number of other cases. At least partially, this could be explained by similarities in expression profiles: for example, differences between Patient 11 and Patient 13 genes expression corresponding to IKr, IKs, Ito, ICaL, RyR and CaMKII were less than 9 % (see Table C in S1 text). ”

Lines 908-912 in the Limitations section:

“Patient 11-model appears to underestimate the sodium current and consequently Patient 11-based models failed to depolarize in half of the cases. Patient 9-based model seem to overestimate APD and, consequently, in half of the cases it was impossible to pace the model at the fastest frequency (Fig. 10). This implies the imprecision of Patients 9 and 11 GA-output parameter values.”

Line 738: "The difference in APD80 is below 15ms for CL 1000ms". So what? Why pick out one biomarker, at one pacing frequency, and only talk about it? Is this supposed to be picking out the biggest point of discrepancy between the two curves, because otherwise their agreement is very good? If so, please communicate this (and also feel to use stronger words than "relatively close" on line 736, because although it's hard to make out on the low-quality figure, I'd call them a lot more than "relatively close" if I'm seeing them correctly!)

Thank you, we have removed the sentence and instead inserted a portion of text that, we believe, communicates the idea better:

“Surprisingly, in this case, the GA-output model actually performed worse than the mRNA-based model, the difference is most prominent at 250 ms PCL: in the former case (Patient 9-based model) APD80 error was 26 ms, in the latter (Patient 8-based model) APD80 error was 3 ms. On the other hand, this difference resulted only in slight RMSE250ms difference: 5.7 vs 5.0 mV correspondingly. This implies high sensitivity of GA to AP perturbations that was shown above by noise sensitivity analysis: GA-output model should follow experimental AP waveform closely and minor experimental artefacts might cause major change in parameter values”

Line 745: Table 5 was an enigma to me before multiple re-reads. Are these multipliers? Compared to what baseline? The text says that RyR is 1.5-3.5 times higher for the Patient 8 based model for Patient 9 (as compared to the direct GA fit to patient 9 data) and yet the RyR values are 0.690 and 0.825, respectively, so this is not true! The caption for Table 5 really needs more information, and/or the text does. A reword would help make it so much more understable, too.

For example, if I'm indeed understanding it right, something like: "Table 5: Parameter values (relative to baseline) selected for models for Patient 9 derived via GA, or via comparison of mRNA levels to reference patient". And then the headings are essentially what you have, but make it clear that those are the reference patient. As it is, it's too easy for me to glance at the table and think the first column is results for Patient 8, and the last column for Patient 10. When it's not, all columns show results for Patient 9, but with models derived in different ways (as the caption tries to say, but I feel my wording just provided achieves this much better).

We thank the reviewer for pointing the issue out; the Table 5 was very hard to understand indeed. Regarding the RyR, we apologize, because it is an error on our side. Explanation in the text was modified. Lines 728-730: “Table 5 lists the parameter values for three variants of Patient 9 model: GA-output model, and two transcription profile-based models.”

Table 5 caption:

“Table 5: Parameter values (relative to baseline O’Hara-Rudy model [15]) of Patient 9 models derived via GA or via comparison of mRNA levels to reference patient.”

The headings also mention “Reference patient” now.

Line 748: The opening sentence of the discussion is very disagreeable to me. The paper hasn't really introduced a GA-based technique that allows personalization of AP models using their dependence on the pacing rate, despite claiming this. Refs [2,5] in the paper are already credited earlier in the paper as doing that. What this paper *does* do is bring improvements to the methodology. And then the CAGE and RNA-seq stuff, which is really striking. The discussion should tie these results together. "Look, our GA fits using our new improved methodology are so good, that we can even go and scale them based on DNA/RNA readings and go and recapture intersubject variability!" Perhaps that claim is a little strong, as poorer fits from older methods might still achieve much the same and this wasn't explored, but I think that is the direction you should be phrasing things in.

Thank you. We agree. We have revised the opening sentence and added a few sentences here:

“In this study, firstly, we have introduced a novel GA modification allowing one to personalize cardiac electrophysiology models using steady-state AP recordings dependence on PCL as input data. The algorithm modification is based on the idea that parametric optimization and slow variables steady state search should be performed simultaneously for effective convergence. Secondly, we have tested the modifications we introduced on synthetic action potentials proving our modifications to be advantageous for overall algorithm performance. Finally, we have tested the algorithm performance against cardiac optical mapping experimental data and mRNA expression profile. The output parameters precision was confirmed by the observation that mRNA-based models predict patients AP waveform and restitution. Essentially, a combination of GA with the mRNA expression measurements provides a novel technique for model personalization.”

Line 770: "This approach has two disadvantages:" is not the way this should be worded. Why not instead something like "There remained two issues to be addressed:". The two things that are discussed are not so much disadvantages of your approach, as just challenges for any method trying to find steady state quickly insides of a GA (or other optimization method). So my wording here takes out the unnecessary badmouthing of your method.

Thank you for pointing out this unclear text in our manuscript. Actually, we intended, unsuccessfully, to discuss and justify the mutation of slow variables. We revised the text to express it hopefully more clearly. Please see lines 782-788:

“Without further modifications this approach has two limitations:… In order to address these issues, two slow variables: intracellular sodium concentration and sarcoplasmic reticulum calcium load are treated as model parameters by the algorithm”.

Line 869: Presuming I'm interpreting the figure right (low quality makes it hard to see), I think the Patient 9-based model did a much better job than the authors give it credit for, here. Patient 8 did the best, but Patient 9 actually fits well to more patient+pacing rate combinations than it fails to fit to. This is pretty good performance for a parameter tuning that is coming from indirect measurements, and as I mentioned earlier, actually has implications regarding how much confidence to give our AP model predictions (in a good way!). I don't think it should just be lumped in with the Patient 11-based model as "failing to fit".

We thank the reviewer for this suggestion. However, we do believe Patient 9 and Patient 11 models to be relatively imprecise. The text was modified to communicate this message:

“Patient 11-model appears to underestimate the sodium current and consequently Patient 11-based models failed to depolarize in half of the cases. Patient 9-based model seem to overestimate APD and, consequently, in half of the cases it was impossible to pace the model at the fastest frequency (Fig. 10). This implies the imprecision of Patients 9 and 11 GA-output parameter values.”

Specific Comments (Less Important / Positives)

-----------------------------------------------------------------------------------------------------

Lines 93-133: This was a big improvement over the previous version. Well written.

Line 168: Figure 2 is much improved!

Thank you!

Line 181: The explanation starting here was a bit hard to understand. Not a major point but I think it could be expressed more clearly.

We hope that new text is clearer:

“The algorithm is optimized for optical AP recordings, where absolute transmembrane potential (TP) values are not known, and thus input data (both synthetic and experimental) is renormalized by the algorithm prior to every fitness function calculation. The following technique was used for renormalization. Firstly, input AP waveform is shifted along the time axis in order to superimpose compared waveforms; in particular, half-maximum depolarization of the waveforms to be compared is used as a reference point. After that, input AP is rescaled: , where and coefficients are determined by the least-squares technique to minimize the deviation between input and output AP.”

Line 268: Maybe put the i's and NSR's on these concentrations, though I appreciate it's uglier when you also want to use the subscript to mark the pacing frequency. Not a major issue to me either way.

Line 286: I agree with the other reviewer's first round comments, I think that the principal components are a much nicer way to visualise this!

Thank you!

Line 302: The point is labelled here R, and another later A, but then R and A are not referred to again from what I saw. Labelling them like this just made it more confusing to me, (x0,y0) and (xc,yc) would be fine.

R and A were removed from the text.

Line 316: Figure 5 could probably be moved to the supplement, in my opinion. The point that large numbers of elite particles performs better is already made in Figure 4.

We have shifted the figure to the supplement.

Line 456: This is a nice explanation of why the method presented in the paper could be expected to work. I feel like it fits better where the method is presented, instead of in results, but this isn't major.

Thank you, we have shifted the text partially, but the text below also provides the proof that output variables are steady-state indeed, thus we believe that the manuscript is more comprehensible with the explanation in the results section.

Line 584: The section "Input data requirements" talks first about the amount of restitution information required for different parameters to be correctly identified, and then the requirements in terms of noise. I would actually just break this up into two separate sections, with more informative headings on each. "Input data requirements" doesn't immediately make me think of restitution/requirements on number of pacing frequencies. Also, it seems to me like other target values for the parameters could likely change how many different frequencies are required to identify those different parameters. So talking about this like a requirement on the input data has been identified feels a little wrong.

Line 604: On the other hand, while it might be true that different APs also showed a different sensitivity to noise, I'm far more comfortable with this section being expressed in terms of the "required signal-to-noise ratio". I also appreciate its inclusion, it was interesting and I think it's valuable.

We have divided the text in two subsections (“Input data requirements: restitution information.” and “Input data requirements: signal-to-noise ratio.”). Indeed changing the balance between ionic currents considerably (for example increasing IKs, while decreasing IKr) would affect the algorithm sensitivity to the amount of restitution information. We have acknowledged it in the former subsection, lines 587-590:

“While using considerably different target parameter values would affect the results shown in this section, it provides an estimate of the amount of restitution information required to determine identifiable parameters. Consequently, we have recorded the AP waveform at 4 PCLs in the experiments described below.”

31 Mar 2020

Genetic algorithm-based personalized models of human cardiac action potential.

PONE-D-19-25321R2

Dear Dr. Syunyaev,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

B. Rodríguez

PLOS ONE

Please revise grammar and typos throughout the manuscript. There are still minor language errors.

21 Apr 2020

PONE-D-19-25321R2

Genetic algorithm-based personalized models of human cardiac action potential.

Dear Dr. Syunyaev:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof B. Rodríguez