PLoS ONE
Public Library of Science
Low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia
Volume: 15, Issue: 5
DOI 10.1371/journal.pone.0232520
Abstract

Early T-cell precursor (ETP) is the only subtype of acute T-cell lymphoblastic leukemia (T-ALL) listed in the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia. Patients with ETP tend to have worse disease outcomes. ETP is defined by a series of immune markers, but the diagnosis can be ambiguous because of the limitations of the current measurement. In this study, we performed unsupervised clustering and supervised prediction to investigate whether a molecular biomarker can be used to identify ETP status in order to stratify risk groups. We found that ETP status can be predicted by the expression level of lymphoid enhancer binding factor 1 (LEF1) with high accuracy (AUC of ROC = 0.957 and 0.933 in two T-ALL cohorts). Patients with the ETP subtype have a lower level of LEF1 than those without ETP. We suggest that combining the biomarker LEF1 with traditional immune-phenotyping will improve the diagnosis of ETP.

Wang, Zhang, and Bertolini: Low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia

Introduction

Early T-cell precursor (ETP) is the only subtype of acute T-cell lymphoblastic leukemia (T-ALL) listed in the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia [1]. It is defined by immune-phenotyping as lack of CD1a and CD8, weak expression of CD5, and positivity for one or more of the myeloid/stem cell markers [2]. This phenotype reflects the characteristics of immature T-cells and corresponds to an early stage of normal T-cell development. Compared with patients with “matured” T-ALLs, patients with ETP have worse disease outcomes [2–5]. Most T-ALLs can now be treated; however, relapse rates of up to 40% have been observed [6]. Targeting the specific ETP sub-group, which has a higher relapse rate, is therefore crucial for treating T-ALL.

Expression of the relevant cell surface markers is routinely measured by flow cytometry [7]. However, the diagnosis can vary between pathologists and with instrument parameters such as the voltage setting. To improve the reliability of the diagnosis, Khogeer et al. developed a scoring system based on 11 surface markers to re-define the ETP subgroup [7]. Still, that system relies solely on flow cytometry measurements. In this study, we investigate whether a molecular biomarker can be used to identify ETP status in order to stratify risk groups.

Results and discussion

Inconsistent classification of ETP status yielded by different classifiers

Currently, the most widely accepted classification of ETP status is the immune phenotype proposed by Coustan-Smith et al. in 2009 [2]. In their study, the sub-group of patients was discovered by unsupervised clustering using 35 genes differentially expressed in mouse thymic early T-cell precursors. Patients in that group showed a distinct molecular profile. Since then, this immune phenotype has been accepted as the definition of ETP [8]. Because the ETP subtype was discovered in this way, classification by immune-phenotyping and clustering by gene expression profile would be expected to agree. To evaluate whether, and to what extent, the previously reported clustering methods can identify ETP status, we performed unsupervised clustering in two T-ALL datasets (TARGET T-ALL [9] and GSE42328 [10]). The characteristics of the two datasets are described in Materials and methods below.

Clustering by the whole transcriptome and by the 35-gene ETP signature was conducted on the two T-ALL cohorts (TARGET and GSE42328). Clustering on the whole transcriptome aims to investigate whether molecularly distinct subtypes exist, whereas clustering on the 35 ETP signature genes aims to identify the ETP group among patients [2]. The subjects were divided into two clusters, but the different approaches yielded inconsistent classifications (Fig 1). Approximately 10–20% of T-ALLs were classified as ETP (n = 19, 10.05% in TARGET; n = 10, 18.87% in GSE42328) based on the immune-phenotype. However, more than half of the samples were predicted as ETP (n = 100, 60.61% in TARGET; n = 32, 60.38% in GSE42328) when clustering on the whole transcriptome, while 62 patients (37.58%) in TARGET and 34 (64.15%) in GSE42328 were predicted as ETP when clustering on the 35-gene panel. Thus, the classification of ETP by immune-phenotyping is inconsistent with that by unsupervised clustering of gene expression profiles, which deviates from the original finding [8]. This indicates that ETP status cannot be correctly classified by unsupervised clustering using either the whole transcriptome or the 35-gene ETP signature.

Fig 1. Classifications of ETP by immune-phenotype, transcriptome, and 35-gene ETP signature genes in TARGET T-ALL (A) and GSE42328 (B) datasets.
Each column is a patient and each row represents a classification method. Patients were stratified to ETP (dark color) or nonETP (light color) groups based on immune phenotyping (first row), the whole transcriptome clustering (second row) or the 35-gene ETP signature clustering (third row), respectively.

A potential biomarker of ETP status—LEF1

We then developed a lasso regression model to evaluate whether ETP status could be predicted correctly from transcriptomic data. The model predicted ETP status with high accuracy: the area under the curve (AUC) of the receiver operating characteristic (ROC) was 0.957 (confidence interval (CI): 0.946–0.969, Fig 2A). We conclude that the immune-phenotype-defined ETP status can be predicted from the gene expression profile.

Fig 2. (A) AUC of ROC, predictions based on transcriptome against ETP status in the TARGET T-ALL dataset; (B) The top 10 predictors with non-zero coefficients in the 100 rounds of outer CV; (C) The coefficients of the top 10 predictors in the 100 rounds of outer CV. The y-axis is the absolute value of coefficient; (D) AUC of ROC, individual-gene expression against ETP status in the TARGET T-ALL dataset; (E) AUC of ROC, LEF1 expression against T-ALL subtypes (early immature, cortical/mature) in the GSE42328 dataset; (F) the expression level of LEF1 in T-ALL subtypes in the GSE42328 dataset. AUC, the area under the curve; ROC, the receiver operating characteristic; ETP, early T-cell precursor.

To investigate which genes contribute to the prediction model, and to what extent, we examined the features selected in the 100 rounds of outer cross-validation. The top 10 predictors are listed in Fig 2B. The coefficients of LEF1, CD5, HOXC10, and RSPO4 were high compared with those of the other top predictors (Fig 2C). When ETP status was predicted from the expression level of individual genes, LEF1, CD5 and OGN achieved high accuracy in terms of AUC of ROC (Fig 2D).

Lymphoid enhancer binding factor 1 (LEF1) was the top predictor, selected in 94 of the 100 rounds (Fig 2B), and was among the variables that contributed most to the model. Its coefficient ranged from –0.826 to –0.012 across those 94 rounds (Fig 2C). This gene is an important transcription factor in T-cell development and malignancy [11, 12]. It is also one of the most differentially expressed genes (DEGs) between ETP and nonETP in the TARGET T-ALL dataset (S1 Table): 8,761 of 33,038 genes were differentially expressed, and LEF1 ranked 12th among the DEGs. Patients with ETP have lower expression of LEF1 than patients with “matured” T-cells.

We further tested whether LEF1 could predict ETP status in the GSE42328 dataset. The AUC for predicting ETP status from LEF1 expression level reached 0.933 (CI: 0.858–1, Fig 2E). The expression level of LEF1 in patients with the early immature subtype is significantly lower than in the cortical/mature group (Fig 2F). LEF1 was also found to be differentially expressed between ETP and nonETP subtypes in two previous microarray datasets (GSE28703 and GSE8879) [2, 13].

To quantify the prediction accuracy in terms of sensitivity and specificity, we considered the identification of ETP by immune-phenotyping as the gold standard. Taken together, the sensitivity (0.9 in TARGET, 1 in GSE42328) and specificity (1 in TARGET, 0.8 in GSE42328) of the prediction by LEF1 expression level compare favorably with those of whole-transcriptome clustering and 35-gene signature clustering in both TARGET T-ALL and GSE42328 (Table 1).

Table 1. The accuracy (sensitivity and specificity) of predicting the ETP status (defined by immune markers).
Dataset     Method                        Sensitivity   Specificity
TARGET      Whole-transcriptome cluster   1             0.45
TARGET      35-gene cluster               1             0.71
TARGET      LEF1                          0.9           1
GSE42328    Whole-transcriptome cluster   0.6           0.4
GSE42328    35-gene cluster               0.3           0.6
GSE42328    LEF1                          1             0.8

LEF1 is known to be essential for T- and B-cell differentiation and lineage determination [14–16]. Yu et al. demonstrated that the Tcf1 and Lef1 transcription factors are intrinsically required for leukemic stem cell (LSC) self-renewal [17]. This indicates that LEF1 plays a key role in T-ALL tumorigenesis and provides important biological evidence for our hypothesis that the expression level of LEF1 can predict ETP status. In this study, the high-risk subtype of T-ALL showed lower expression of LEF1. Jia et al. also found that LEF1 expression is a favorable prognostic factor in a pediatric cohort of 94 B-ALLs and 28 T-ALLs [18]. In contrast, the expression of LEF1 in solid tumors is higher than in the corresponding normal tissue: overexpression of LEF1 was observed in most solid tumors (15 of 24 types in the TCGA Pancancer database, S2 Table). Moreover, overexpression of LEF1 was associated with worse overall survival in adrenocortical cancer, kidney clear cell carcinoma, kidney papillary cell carcinoma, rectum adenocarcinoma, and uveal melanoma (S3 Table). In thymoma, however, a negative association between LEF1 expression level and overall survival rate was observed (S3 Table). This is interesting because the thymus is the organ where T-cell development takes place. Giambra et al. reported that the activity of leukemia stem cells is dependent on Lef1 in the Wnt-active subpopulation of T-ALL [19]. In addition, low expression of LEF1 has been associated with worse survival in acute myeloid leukemia [20]. Overall, these findings suggest that LEF1 plays different roles in solid tumorigenesis and in lymphoid malignancies. The underlying mechanism needs further investigation.

Conclusions

In this study, we investigated the molecular characteristics of ETP, the high-risk subtype of T-ALL. The current routine diagnosis of ETP can be ambiguous. We found that ETP status can be predicted by the expression level of LEF1 with high accuracy. We propose that combining the biomarker LEF1 with traditional immune-phenotyping will improve the diagnosis of ETP.

Materials and methods

Study subjects

The TARGET T-ALL dataset (n = 165, children and adolescents) was used as the training set to develop the prediction model. Gene expression and clinical data were retrieved from Therapeutically Applicable Research To Generate Effective Treatments (TARGET, https://ocg.cancer.gov/programs/target, dbGaP ID: phs000464). The complete TARGET T-ALL data in the original study comprised 264 children and young adults diagnosed with T-ALL [9]. Of these, 190 cases had immune phenotypes in terms of ETP status; 25 of them were labelled as "nearETP" and excluded because of the ambiguous definition, leaving 165 cases for this study. We compared the demographic and clinical characteristics of the selected samples with those in the original publication (Table 2). Our dataset contains 33,038 genes in total. The gene expression data were normalized using Trimmed Mean of M-values (TMM) [21] and log2 transformed.

Table 2. The demographic and basic clinical characteristics of the samples selected from TARGET T-ALL.
Characteristic                           Original study (n = 265)   Used in this study (n = 165)
Age (median, range)                      9 (1–29)                   9 (1–22)
Gender (male, n, %)                      202 (76.23%)               122 (73.94%)
WBC (mean)                               167.54                     163.3
Bone marrow blasts at diagnosis (mean)   91.56                      91.51
ETP status
 ETP (n)                                 19                         19
 nonETP (n)                              146                        146
 nearETP (n)                             25                         NA
 Unknown (n)                             75                         NA

The GSE42328 dataset of adult T-ALLs (n = 53) was the validation dataset. The clinical characteristics of the samples are described in the original study [10]. Data were downloaded from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). In total, 47,212 probes were measured, representing 31,324 genes. The gene expression values provided were quantile normalized and log2 transformed.

The TCGA pan-cancer dataset was used to investigate the association of LEF1 expression level with survival in 32 tumor types. Data were downloaded from the Pan-Cancer Atlas Hub on the UCSC Xena platform (https://pancanatlas.xenahubs.net) [22]. In total, 9,110 samples with sample type labeled as primary tumor were selected. The RNA sequencing data were normalized for batch effects and log2 transformed.

All the data in this study are publicly available. Ethical approval is not applicable for this study.

Unsupervised clustering

Clustering by the whole transcriptome and by the 35-gene ETP signature was conducted separately on the two T-ALL cohorts (TARGET and GSE42328). For clustering by the whole transcriptome, the top 5,000 most variable genes were selected [23]. For clustering by the 35-gene ETP signature, only the 35 genes (or the corresponding probes) were selected [2]. Hierarchical clustering with Pearson’s correlation coefficient was performed 10,000 times to achieve a consensus [24]. The procedure was conducted with the R package ConsensusClusterPlus v1.48 [23].
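The consensus step can be illustrated with a minimal Python sketch. It is not the ConsensusClusterPlus implementation used in the study: the 80% subsampling fraction, the average linkage, the reduced number of repetitions, the function names and the toy expression profiles are all assumptions made for illustration. The sketch repeatedly subsamples the samples, splits each subsample into two clusters using 1 − Pearson correlation as the distance, and records how often each pair of samples falls in the same cluster.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_distance(c1, c2, dist):
    """Average-linkage distance between two clusters of sample indices."""
    return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def two_clusters(items, dist):
    """Agglomerative clustering: merge the closest pair until two clusters remain."""
    clusters = [[i] for i in items]
    while len(clusters) > 2:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_distance(clusters[p[0]], clusters[p[1]], dist),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


def consensus_matrix(profiles, reps=200, frac=0.8, seed=0):
    """Fraction of subsamples in which each pair of samples co-clusters."""
    n = len(profiles)
    dist = [[1 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(reps):
        subset = rng.sample(range(n), round(frac * n))
        for i, j in combinations(subset, 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
        for cluster in two_clusters(subset, dist):
            for i, j in combinations(cluster, 2):
                together[i][j] += 1
                together[j][i] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): three rising and three falling expression profiles.
profiles = [[1, 2, 3, 4], [2, 3, 4, 6], [1, 2, 4, 5],
            [4, 3, 2, 1], [6, 4, 3, 2], [5, 4, 2, 1]]
consensus = consensus_matrix(profiles)
print(consensus[0][1], consensus[0][3])
```

A consensus value near 1 means two samples are almost always grouped together across subsamples, and a value near 0 means they are almost never grouped together; the final two-group assignment is then read from this matrix.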

Supervised prediction

A regression model with L1-regularization (lasso) [25] was fitted to predict ETP status from transcriptomic gene expression data (RNA sequencing). Nested cross-validation (CV) was conducted to evaluate model performance. For the outer CV, 80% of the patients (132 of 165) were randomly assigned to the training set and the remaining 20% (33 of 165) to the test set; the outer CV was run 100 times. For the inner CV, 5-fold CV was performed to optimize the tuning parameter lambda. The TARGET T-ALL dataset (n = 165) was used to develop the prediction model, and the GSE42328 adult T-ALL dataset (n = 53) was the validation dataset. The prediction was conducted with the R package glmnet v2.0 [26].
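The resampling scheme of the nested CV can be sketched in a few lines of Python. This shows only how the indices are split (100 random 80/20 outer splits, each with a 5-fold partition of the training part); the lasso fitting itself was done with glmnet and is not reproduced here. The function name, the random seed and the way inner folds are formed are assumptions for illustration.

```python
import random


def nested_cv_splits(n_samples=165, n_outer=100, test_fraction=0.2,
                     n_inner_folds=5, seed=0):
    """Yield (train, test, inner_folds) index lists for each outer round."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)  # 33 of 165
    for _ in range(n_outer):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        # Partition the training indices into folds used to tune lambda.
        inner_folds = [train[k::n_inner_folds] for k in range(n_inner_folds)]
        yield train, test, inner_folds


splits = list(nested_cv_splits())
train, test, inner_folds = splits[0]
print(len(splits), len(train), len(test), [len(f) for f in inner_folds])
```

In each outer round, lambda would be chosen using only the inner folds, the model refitted on the 132 training samples, and performance measured on the 33 held-out samples, so that the test samples never influence the tuning.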

Receiver operating characteristic (ROC) curves of prediction against true ETP status were constructed. The area under the ROC curve (AUC) and 95% confidence interval (CI) were generated to compare model performance. The AUCs of ROC curves were compared by the DeLong test [27]. The decision boundary was determined at the point closest to the top-left part of the ROC curve using the R package pROC v1.15 [28]. Individuals were predicted as ETP if their predicted probability was larger than or equal to the cut-off point; otherwise, they were classified as nonETP.
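As an illustration of the two quantities described above, the following Python sketch computes the AUC as the probability that a randomly chosen ETP case scores higher than a randomly chosen nonETP case, and picks the cut-off whose ROC point minimizes the squared distance to the top-left corner, (1 − sensitivity)² + (1 − specificity)². It is a plain re-statement of these definitions, not the pROC code; the function names and the toy scores and labels are hypothetical. Because LEF1 is lower in ETP, a single-gene score would first need its sign reversed so that larger values indicate ETP.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct score.

    A sample is called positive (ETP) if its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def closest_top_left(points):
    """ROC point with the smallest squared distance to (sens=1, spec=1)."""
    return min(points, key=lambda t: (1 - t[1]) ** 2 + (1 - t[2]) ** 2)


# Toy predicted probabilities and true ETP status (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]

area = auc(scores, labels)
best = closest_top_left(roc_points(scores, labels))
print(area, best)
```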

Other statistics

Gene expression data (RNA sequencing) were compared between patients with and without ETP to determine differentially expressed genes (DEGs). Differential expression analysis was conducted with the R package DESeq2 v1.24 [29]. To test whether LEF1 was abnormally expressed in tumors, a nonparametric Wilcoxon two-group comparison was conducted in each tumor type in the TCGA Pan-Cancer dataset. Cox regression [30] was used to test the association between the expression level of LEF1 and overall survival in each tumor type.

To quantify the prediction accuracy in terms of sensitivity and specificity, we considered the identification of ETP by immune-phenotyping as the gold standard. Sensitivity is calculated as the proportion of immune-phenotype-positive (ETP) patients who are correctly identified by the other method (clustering or LEF1 expression level). Specificity is calculated as the proportion of immune-phenotype-negative (nonETP) patients who are correctly identified as such.
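These two definitions can be written out as a short Python sketch. The example labels are an illustrative reconstruction rather than data taken from the study: they assume that all 19 immune-phenotype-defined ETP cases in TARGET fall within the 100-sample whole-transcriptome cluster, which is consistent with the reported sensitivity of 1 and would leave 81 of the 146 nonETP cases misclassified. The function name is our own.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity); True means ETP in both sequences."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # ETP called ETP
    fn = sum(1 for t, p in pairs if t and not p)      # ETP called nonETP
    tn = sum(1 for t, p in pairs if not t and not p)  # nonETP called nonETP
    fp = sum(1 for t, p in pairs if not t and p)      # nonETP called ETP
    return tp / (tp + fn), tn / (tn + fp)


# Assumed reconstruction: 19 ETP and 146 nonETP cases by immune-phenotyping,
# with 100 samples (19 ETP + 81 nonETP) placed in the "ETP" cluster.
truth = [True] * 19 + [False] * 146
predicted = [True] * 19 + [True] * 81 + [False] * 65

sens, spec = sensitivity_specificity(truth, predicted)
print(round(sens, 2), round(spec, 2))
```

Under this assumption the specificity is 65/146, which rounds to the 0.45 reported for whole-transcriptome clustering in TARGET.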

References

1. DA Arber, A Orazi, R Hasserjian, J Thiele, MJ Borowitz, MM Le Beau, et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood. 2016;127(20):2391–405. Epub 2016/04/14. doi: 10.1182/blood-2016-03-643544.
2. E Coustan-Smith, CG Mullighan, M Onciu, FG Behm, SC Raimondi, D Pei, et al. Early T-cell precursor leukaemia: a subtype of very high-risk acute lymphoblastic leukaemia. Lancet Oncol. 2009;10(2):147–56. doi: 10.1016/S1470-2045(08)70314-0.
3. M Ma, X Wang, J Tang, H Xue, J Chen, C Pan, et al. Early T-cell precursor leukemia: a subtype of high risk childhood acute lymphoblastic leukemia. Front Med. 2012;6(4):416–20. Epub 2012/10/16. doi: 10.1007/s11684-012-0224-4.
4. J Bond, C Graux, L Lhermitte, D Lara, T Cluzeau, T Leguay, et al. Early Response-Based Therapy Stratification Improves Survival in Adult Early Thymic Precursor Acute Lymphoblastic Leukemia: A Group for Research on Adult Acute Lymphoblastic Leukemia Study. J Clin Oncol. 2017;35(23):2683–91. Epub 2017/06/13. doi: 10.1200/JCO.2016.71.8585.
5. N Jain, AV Lamb, S O’Brien, F Ravandi, M Konopleva, E Jabbour, et al. Early T-cell precursor acute lymphoblastic leukemia/lymphoma (ETP-ALL/LBL) in adolescents and adults: a high-risk subtype. Blood. 2016;127(15):1863–9. doi: 10.1182/blood-2015-08-661702.
6. CH Pui, LL Robison, AT Look. Acute lymphoblastic leukaemia. Lancet. 2008;371(9617):1030–43. Epub 2008/03/25. doi: 10.1016/S0140-6736(08)60457-2.
7. H Khogeer, H Rahman, N Jain, EA Angelova, H Yang, A Quesada, et al. Early T precursor acute lymphoblastic leukaemia/lymphoma shows differential immunophenotypic characteristics including frequent CD33 expression and in vitro response to targeted CD33 therapy. British Journal of Haematology. 2019;0(0). doi: 10.1111/bjh.15960.
8. JW Vardiman, J Thiele, DA Arber, RD Brunning, MJ Borowitz, A Porwit, et al. The 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes. Blood. 2009;114(5):937–51. Epub 2009/04/10. doi: 10.1182/blood-2009-03-209262.
9. Y Liu, J Easton, Y Shao, J Maciaszek, Z Wang, MR Wilkinson, et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat Genet. 2017;49(8):1211–8. Epub 2017/07/04. doi: 10.1038/ng.3909.
10. P Van Vlierberghe, A Ambesi-Impiombato, K De Keersmaecker, M Hadler, E Paietta, MS Tallman, et al. Prognostic relevance of integrated genetic profiling in adult T-cell acute lymphoblastic leukemia. Blood. 2013;122(1):74–82. Epub 2013/05/21. doi: 10.1182/blood-2013-03-491092.
11. S Yu, X Zhou, C Steinke Farrah, C Liu, S-C Chen, O Zagorodna, et al. The TCF-1 and LEF-1 Transcription Factors Have Cooperative and Opposing Roles in T Cell Development and Malignancy. Immunity. 2012;37(5):813–26. doi: 10.1016/j.immuni.2012.08.009.
12. FJT Staal, JM Sen. The canonical Wnt signaling pathway plays an important role in lymphopoiesis and hematopoiesis. European Journal of Immunology. 2008;38(7):1788–94. doi: 10.1002/eji.200738118.
13. A Gutierrez, A Kentsis, T Sanda, L Holmfeldt, S-C Chen, J Zhang, et al. The BCL11B tumor suppressor is mutated across the major molecular subtypes of T-cell acute lymphoblastic leukemia. Blood, The Journal of the American Society of Hematology. 2011;118(15):4169–73.
14. L Santiago, G Daniels, D Wang, FM Deng, P Lee. Wnt signaling pathway protein LEF1 in cancer, as a biomarker for prognosis and a target for treatment. Am J Cancer Res. 2017;7(6):1389–406. Epub 2017/07/04.
15. T Reya, M O’Riordan, R Okamura, E Devaney, K Willert, R Nusse, et al. Wnt signaling regulates B lymphocyte proliferation through a LEF-1 dependent mechanism. Immunity. 2000;13(1):15–24. doi: 10.1016/s1074-7613(00)00004-2.
16. BN Weber, AW-S Chi, A Chavez, Y Yashiro-Ohtani, Q Yang, O Shestova, et al. A critical role for TCF-1 in T-lineage specification and differentiation. Nature. 2011;476(7358):63–8. doi: 10.1038/nature10279.
17. S Yu, F Li, S Xing, T Zhao, W Peng, H-H Xue. Hematopoietic and leukemic stem cells have distinct dependence on Tcf1 and Lef1 transcription factors. Journal of Biological Chemistry. 2016;291(21):11148–60. doi: 10.1074/jbc.M116.717801.
18. M Jia, H-Z Zhao, H-P Shen, Y-P Cheng, Z-B Luo, S-S Li, et al. Overexpression of lymphoid enhancer-binding factor-1 (LEF1) is a novel favorable prognostic factor in childhood acute lymphoblastic leukemia. International journal of laboratory hematology. 2015;37(5):631–40. doi: 10.1111/ijlh.12375.
19. V Giambra, S Gusscott, D Gracias, R Song, AP Weng. Lef1 Is a Critical Mediator of Wnt/β-Catenin Signaling in T-Cell Acute Lymphoblastic Leukemia (T-ALL). Blood. 2016;128(22):5083.
20. Y Fu, H Zhu, W Wu, J Xu, T Chen, B Xu, et al. Clinical significance of lymphoid enhancer-binding factor 1 expression in acute myeloid leukemia. Leukemia & Lymphoma. 2014;55(2):371–7. doi: 10.3109/10428194.2013.805759.
21. MD Robinson, A Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25. doi: 10.1186/gb-2010-11-3-r25.
22. M Goldman, B Craft, M Hastie, K Repečka, F McDade, A Kamath, et al. The UCSC Xena platform for public and private cancer genomics data visualization and interpretation. bioRxiv. 2019:326470. doi: 10.1101/326470.
23. MD Wilkerson, DN Hayes. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3. doi: 10.1093/bioinformatics/btq170.
24. S Monti, P Tamayo, J Mesirov, T Golub. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn. 2003;52(1–2):91–118.
25. R Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267–88.
26. J Friedman, T Hastie, R Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1–22. Epub 2010/09/03.
27. ER DeLong, DM DeLong, DL Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45. Epub 1988/09/01.
28. X Robin, N Turck, A Hainard, N Tiberti, F Lisacek, JC Sanchez, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.
29. MI Love, W Huber, S Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8.
30. DR Cox. Regression models and life‐tables. Journal of the Royal Statistical Society: Series B (Methodological). 1972;34(2):187–202.


30 Jan 2020

PONE-D-20-01174

low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia

PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process by both Reviewers, experts in the field.

We would appreciate receiving your revised manuscript by Mar 14 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

    A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
    A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
    An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Francesco Bertolini, MD, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. At this time, we ask that you please provide the specific URL links of the public datasets, GSE42328 and TARGET T-ALL, used in your study.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Wang and Zhang report in silico data pointing towards the differential diagnostic potential of LEF1 expression to detect early precursor T-ALL, in which it is expected to be reduced.

First of all, the paper suffers from written English that is in many places poor and difficult to understand, and that needs editing.

Second, since the paper will primarily be read by MDs who are not that familiar with the biostatistical methods used in silico, proper citation of papers describing what exactly has been done and its feasibility should be considered.

I am not sure that statements such as the one given on lines 94-95 are per se deducible from the databases used and are justified.

Many additional statements are poorly supported by numeric data in the current paper version. Figure 1 does not convey much without more explanatory notes in the legend; I was not able to link the data on lines 108-112 to the figure.

Finally, the results need tissue validation on 5 biopsies of early precursor T-ALL in comparison with 10 more "maturated" cases by means of immunohistochemistry with the nicely working and several times reported antibody.

The discussion on the role of LEF1 in tumourigenesis does not add anything to the message of the paper and may be omitted.

Reviewer #2: In this research article, Wang and Zhang investigated, through a clustering approach, whether the different methods used to diagnose Early T-cell Precursor (ETP) in Acute T-Cell Lymphoblastic Leukemia (T-ALL) are consistent with one another. Moreover, they applied a prediction model to validate LEF1 as a molecular biomarker for detecting ETP status and therefore stratifying a risk group, and they used this approach to find other molecular biomarkers that might be useful to improve the diagnosis of ETP.

They found that the methods used to diagnose ETP do not always overlap and that the LEF1 marker has high accuracy in explaining ETP status; finally, they produced interesting results, comparable to those for LEF1, using other biomarkers.

The purposes of the study are interesting and well explained, but I was wondering if the authors could give a better explanation I) of the study subjects used, II) of the computational methods used, and III) of the results obtained. These three points are fundamental to make the study suitable for publication because they will demonstrate the accuracy of the methodology used and will give readers the tools to reproduce the analyses and to understand them.

For this reason I would not consider the paper in its present form suitable for publication, but it shows sufficient potential to be reconsidered if it is substantially revised in the major and minor points indicated below.

Major points:

I am suggesting a substantial extension of the method section with a better explanation of the parameters used and the reason why the authors opted for these specific unsupervised clustering and prediction model. I would also suggest to use, beside the one already used, another unsupervised clustering method to validate the results obtained.

Additionally I found some results not properly explained and improvable with some changes in the text and in the figures.

Specifically,

- In the study subjects section, the authors refer to a TARGET T-ALL dataset which includes 165 samples. I suggest to explain the characteristics of this dataset and why the authors preferred this data among the others available. In order to do this the author can report the website and the definition of the acronym already at the beginning of this section.

Additionally, looking at the TARGET publication guidelines of this dataset (https://ocg.cancer.gov/programs/target/target-publication-guidelines) the “TARGET data are available without restrictions on their use in publications or presentations, with the exception of the integrated Acute Lymphoblastic Leukemia (ALL) dataset.” and the “Investigators may only publish an ALL manuscript before the TARGET project team has published their global analysis on that tumor type IF the publication uses a very limited dataset (less than 5 genes) or the author has received written approval from the appropriate TARGET disease project team leaders.”. At this point, since this study is focused on the ALL dataset I was wondering if either this study used only 5 genes or the authors obtained the written approval to use this data. Unfortunately both of these information are missing in the current version of the manuscript.

Moreover, in this study 53 individuals has been taken from GEO with the ID GSE42328 but the reference from where these data have been generated which is “Van Vlierberghe P, Ambesi-Impiombato A, De Keersmaecker K, Hadler M et al. Prognostic relevance of integrated genetic profiling in adult T-cell acute lymphoblastic leukemia. Blood 2013 Jul 4;122(1):74-82. PMID: 23687089”, is not indicated, please cite it and add it to the bibliography.

Finally, please report the number of genes (total, median, mean of each of the dataset) and be more specific about the normalization and logarithmic transformation performed.

- In the unsupervised clustering method section it is not reported how the selection of the top 5,000 most variable genes occurred, please add also the reason why the author chose these number, a reference might help.

Additionally, please add a reference that justifies the use of Pearson’s correlation coefficient for “10,000 times to achieve a consensus” or extend this part in order to give the reader a better explanation of the methods employed. As indicated above I also suggest to apply another clustering method to confirm these results.

- In the prediction model section the authors report that the “80% of the patients were randomly assigned to the training set and rest 20% to the test set”, I find disagreement with the numbers indicated in the previous sections: 53 (test-set in the study subjects section) is not the 20% of 218 (165 of TARGET + 53 of GSE42328); moreover, even if it was, the 20% were not randomly chosen because they belong to the same group/experiment. Please, clarify this and include references and a better explanation of the parameters used in here.

- Line 124-125 I see from the plot that the two highest accuracy values in terms of AUC of ROC are LEF1 and OGN, not LEF1 and CD5 as indicated in the manuscript, please, clarify this part.

- Line 126-128 I noticed from Fig.2 that RSPO4 has higher median (the highest) than LEF1, please comment on this result go into more details.

- Lines 129-132 need a better explanation; is there a Supp. Fig or Table that explain the results of DEG? Please reformulate this sentence and, if it is the case, add the materials necessary to understand it.

- Line 133-136 the authors report that: “the prediction of subtype by LEF1 expression level reached 0.933 (CI: 0.858-1, Figure 2E).” what about the other prediction subtypes? A better explanation of this is required to understand this part.

Minor point:

- Line 60, please insert a reference for all the information reported.

- Lines 63-64, in the sentence “based on the old technology”, please indicate the technology the authors are referring and insert a reference.

- Lines 74-75, it is true that the data do not require Ethical approval but some of them are of public domain upon a written consent (see above).

- Please, reformulate the line 94-95 and indicate the figure to which the authors are referring, is it Fig 1?

- Lines 100-101 require a reference when is reported that the immune phenotype is accepted as definition of ETP.

- Lines 102-103 need to be reformulated it is not clear.

- Line 105, The sentence ”ETP signature were conducted respectively on the T-ALL cohort with 165 patients.” is incomplete, was the study also conducted using the test group of 53? Since this missing part the lines 105-107 are not clear.

- If the authors want to keep the text as in line 122-125 they have to invert the panels in Figure 2, the D has to be after the C.

- Line 138, the citation Zhao et al. is not present as a reference in the bibliography section.

- Line 146, please add a reference to the statement which says LEF1 influences the overall survival rate in Thymoma.

- Please extend Figure 1 caption with a better explanation of the plot and add a legend that explain the meaning of the shaded and not shaded colors.

- The colors in Figure 2C are not consistent with the ones of other panels of Figure 2.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.


20 Mar 2020

Reviewer #1: Wang and Zhang report in silico data pointing towards the differential diagnostic potential of LEF1 expression to detect early precursor T-ALL, in which it is expected to be reduced.

First of all, the paper suffers from written English that is in many places poor and difficult to understand, and that needs editing.

Thank you for the advice. The revised version has been thoroughly polished.

Second, since the paper will primarily be read by MDs who are not that familiar with the biostatistical methods used in silico, proper citation of papers describing what exactly has been done and its feasibility should be considered.

I am not sure that statements as this given on lines 94-95 are per se deducible from the databases used and are justified.

Thank you for the comments. The evidence that supports the statement has been extended and explained in more detail.

The two paragraphs following lines 94-95 provide the evidence that clarifies and supports the statement. The first paragraph introduces the ETP classification by immune phenotyping, which is widely used in clinical settings. The second paragraph describes the classification based on gene expression profiles. In the revision, we moved the statement after the two paragraphs.

To explain in more detail: as shown in Figure 1, the classification of ETP by immune-phenotyping (labeled “ETP-status” in Figure 1A and 1B) is inconsistent with that by unsupervised clustering of gene expression profiles (labeled “Cluster” and “Cluster-35genes” in Figure 1A and 1B). Each column is a patient, and each row represents a classification method. Dark colors (dark red, orange, and dark blue) mean the patient is classified into the ETP group. In contrast, light colors (pink, yellow, and light blue) mean the patient is classified into the non-ETP group. If the same patient shows dark colors in all three rows, the patient is consistently diagnosed as ETP-ALL. However, this is not the case for more than half of the patients.

If we consider classification by immune phenotyping as the gold standard, then the other two methods have failed in many cases to classify the patients into the correct group.

A more extensive explanation of Figure 1 is added to the legend.

Many additional statements are poorly supported by numeric data in the current paper version. Figure 1 is not telling a lot without more explanatory notes in the legend, actually I was not able to link the data on lines 108-112 to the optics of the figure.

We have extended the figure legend to explain Figure 1 in more details. We hope the reply above provides a better explanation to Figure 1.

Finally, the results need tissue validation on 5 biopsies of early precursor T-ALL in comparison with 10 more "maturated" cases by means of immunohistochemistry with the nicely working and several times reported antibody.

We appreciate the suggestion about wet-lab validation. Unfortunately, it is not feasible for this study and is beyond our scope as bioinformaticians. Moreover, the lower expression level of LEF1 in ETP compared with more mature cases has been confirmed in many microarray studies, including GSE28703 (Gutierrez et al. 2011) and GSE8879 (Coustan-Smith et al. 2009).

Coustan-Smith E, Mullighan CG, Onciu M, Behm FG, Raimondi SC, Pei D, et al. Early T-cell precursor leukaemia: a subtype of very high-risk acute lymphoblastic leukaemia. Lancet Oncol. 2009;10(2):147-56.

Gutierrez A, Kentsis A, Sanda T, Holmfeldt L, Chen S-C, Zhang J, et al. The BCL11B tumor suppressor is mutated across the major molecular subtypes of T-cell acute lymphoblastic leukemia. Blood. 2011;118(15):4169-73.

The discussion on the role of LEF1 in tumourigenesis does not add anything to the message of the paper and may be omitted.

In this study, we found that LEF1 could predict the ETP status of T-ALL. This finding indicates that LEF1 may play an important role in the early stage of T-ALL development. The discussion of the role of LEF1 provides important biological evidence to support the hypothesis derived from our bioinformatic analyses. Thus, we retained the discussion in this revision.

Reviewer #2: Wang and Zhang in this research article investigated, through a clustering approach, if different methods used to diagnose Early T-cell Precursor (ETP) on Acute T-Cell Lymphoblastic Leukemia (T-ALL) are consistent between them. Moreover, they applied a prediction model in order to validate the goodness of LEF1 as molecular biomarker in detecting the status of ETP and therefore stratifying a risk group, moreover they used this approach to find other molecular biomarkers that might be useful to improve the diagnosis of ETP.

They found that the methods used to diagnose ETP not always overlap between them and that LEF1 marker has high accuracy in explaining the ETP status, finally they produced interesting and comparable results to LEF1 using other biomarkers.

The purposes of the study are interesting and well explained, but I was wondering if the authors could give a better explanation I) of the study subject used II) of the computational methods used, and III) of the results obtained. These three points are fundamental to make the study suitable for publication because they will manifest the accuracy of the methodology used and they will give the tools to reproduce the analyses and to understand them.

We appreciate the reviewer’s input; the comments are valuable to our study. We have provided more explanation in this revision for the three aspects suggested. See the responses below.

For this reason I would not consider the paper in its present form suitable for publication, but it shows sufficient potential to be reconsidered if it will be substantial revised in the major and minor points as I indicated below.

Major points:

I am suggesting a substantial extension of the method section with a better explanation of the parameters used and the reason why the authors opted for these specific unsupervised clustering and prediction model. I would also suggest to use, beside the one already used, another unsupervised clustering method to validate the results obtained.

Additionally I found some results not properly explained and improvable with some changes in the text and in the figures.

Specifically,

- In the study subjects section, the authors refer to a TARGET T-ALL dataset which includes 165 samples. I suggest to explain the characteristics of this dataset and why the authors preferred this data among the others available. In order to do this the author can report the website and the definition of the acronym already at the beginning of this section.

Additionally, looking at the TARGET publication guidelines of this dataset (https://ocg.cancer.gov/programs/target/target-publication-guidelines) the “TARGET data are available without restrictions on their use in publications or presentations, with the exception of the integrated Acute Lymphoblastic Leukemia (ALL) dataset.” and the “Investigators may only publish an ALL manuscript before the TARGET project team has published their global analysis on that tumor type IF the publication uses a very limited dataset (less than 5 genes) or the author has received written approval from the appropriate TARGET disease project team leaders.”. At this point, since this study is focused on the ALL dataset I was wondering if either this study used only 5 genes or the authors obtained the written approval to use this data. Unfortunately both of these information are missing in the current version of the manuscript.

We agree with the concerns. More information about the datasets is added in section “Study subjects” in the revised version.

The original study on TARGET T-ALL has been published.

“Liu Y, Easton J, Shao Y, Maciaszek J, Wang Z, Wilkinson MR, McCastlain K, Edmonson M, Pounds SB, Shi L, Zhou X. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nature genetics. 2017 Aug;49(8):1211.”

Hence, the TARGET T-ALL data can be used without restriction in publications. We cited the original study in the revision.

Moreover, in this study 53 individuals has been taken from GEO with the ID GSE42328 but the reference from where these data have been generated which is “Van Vlierberghe P, Ambesi-Impiombato A, De Keersmaecker K, Hadler M et al. Prognostic relevance of integrated genetic profiling in adult T-cell acute lymphoblastic leukemia. Blood 2013 Jul 4;122(1):74-82. PMID: 23687089”, is not indicated, please cite it and add it to the bibliography.

Thanks for pointing that out. The reference is added in revision.

Finally, please report the number of genes (total, median, mean of each of the dataset) and be more specific about the normalization and logarithmic transformation performed.

Those are now reported in the section “Study subjects”, as suggested.

- In the unsupervised clustering method section it is not reported how the selection of the top 5,000 most variable genes occurred, please add also the reason why the author chose these number, a reference might help.

Additionally, please add a reference that justifies the use of Pearson’s correlation coefficient for “10,000 times to achieve a consensus” or extend this part in order to give the reader a better explanation of the methods employed. As indicated above I also suggest to apply another clustering method to confirm these results.

Selecting a subset of the most variable genes, ranging from 1,000 to 5,000, is common in many studies. In this study, 5,000 genes were selected, as recommended by the authors of the R package ConsensusClusterPlus. A citation is added.

We agree with the reviewer that unsupervised clustering can generate unstable results. The consensus clustering algorithm proposed by Monti et al., which is based on resampling, can assess the stability of the clusters and capture the consensus among several clustering runs. The algorithm subsamples a proportion of the samples and a proportion of the features (genes) each time. Hierarchical clustering is applied to each subsample. This process is repeated 10,000 times. Then, a final agglomerative hierarchical clustering is generated.
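To make the gene-selection step concrete, the following is a minimal sketch in Python, not the authors' R code. It ranks genes by their variance across samples and keeps the top k. The response does not state which variability measure was used, so ranking by variance (rather than, for example, median absolute deviation) is an assumption, as are the function name top_variable_genes and the toy values.

```python
from statistics import pvariance

def top_variable_genes(expr, k):
    """Return the k gene names with the largest variance across samples.

    expr maps a gene name to its list of expression values (one per sample).
    Variance is an assumed ranking criterion; the response does not name one.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:k]

# Hypothetical toy matrix with four samples per gene.
toy_expr = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],  # nearly constant
    "GENE_B": [0.0, 4.0, 0.5, 5.0],  # highly variable
    "GENE_C": [2.0, 3.0, 2.5, 3.5],  # moderately variable
}
selected = top_variable_genes(toy_expr, k=2)
```

In the study itself k would be 5,000 and the input would be the normalized, log-transformed expression matrix.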

Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572-3.

Monti S, Tamayo P, Mesirov J, Golub T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn. 2003;52(1-2):91-118.
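The resampling procedure described above can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the ConsensusClusterPlus implementation: the 80% subsampling fraction, average linkage, the helper names, and the toy data are all hypothetical, and 200 repetitions are used here instead of 10,000 to keep the example fast. Distance is taken as 1 minus Pearson's correlation, following the correlation measure mentioned in the reviewer's comment.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_linkage(dist, items, k):
    """Agglomerative clustering into k groups; dist is keyed by (i, j), i < j."""
    clusters = [[i] for i in items]

    def between(a, b):
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: between(clusters[pq[0]], clusters[pq[1]]))
        clusters[p].extend(clusters.pop(q))  # q > p, so cluster p is unaffected
    return clusters

def consensus_cluster(samples, k=2, reps=200, frac=0.8, seed=0):
    """samples: list of expression vectors, one per patient."""
    rng = random.Random(seed)
    n, m = len(samples), len(samples[0])
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = dict.fromkeys(together, 0)
    for _ in range(reps):
        # Subsample a proportion of patients and a proportion of genes.
        idx = sorted(rng.sample(range(n), max(k, int(frac * n))))
        genes = rng.sample(range(m), max(2, int(frac * m)))
        sub = {i: [samples[i][g] for g in genes] for i in idx}
        dist = {(i, j): 1 - pearson(sub[i], sub[j])
                for i, j in combinations(idx, 2)}
        for cluster in average_linkage(dist, idx, k):
            for pair in combinations(sorted(cluster), 2):
                together[pair] += 1
        for pair in combinations(idx, 2):
            sampled[pair] += 1
    # Consensus index: share of co-sampled runs in which a pair co-clustered.
    consensus = {p: together[p] / sampled[p] if sampled[p] else 0.0
                 for p in together}
    # Final agglomerative clustering on 1 - consensus.
    final_dist = {p: 1 - c for p, c in consensus.items()}
    return average_linkage(final_dist, list(range(n)), k), consensus

# Hypothetical toy data: patients 0-2 rise across genes, patients 3-5 fall.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.2, 1.9, 3.3, 3.8, 5.1, 6.4],
    [0.8, 2.3, 2.9, 4.2, 4.7, 5.9],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [6.3, 4.8, 4.1, 2.7, 2.2, 0.9],
    [5.8, 5.2, 3.7, 3.1, 1.8, 1.3],
]
clusters, consensus = consensus_cluster(toy_samples, k=2, seed=1)
```

A consensus index near 1 for a pair of patients means they were grouped together in almost every subsample, which is how the method expresses cluster stability.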

- In the prediction model section the authors report that the “80% of the patients were randomly assigned to the training set and rest 20% to the test set”, I find disagreement with the numbers indicated in the previous sections: 53 (test-set in the study subjects section) is not the 20% of 218 (165 of TARGET + 53 of GSE42328); moreover, even if it was, the 20% were not randomly chosen because they belong to the same group/experiment. Please, clarify this and include references and a better explanation of the parameters used in here.

We apologize for the confusing definitions of the training and test datasets.

Cross-validation (CV) was conducted only in the TARGET T-ALL dataset. The purpose of cross-validation is to tune the model parameters and estimate the prediction performance. In each round of CV, the TARGET T-ALL dataset was split into a training set (132, 80% of 165) and a test set (33, 20% of 165).

After the parameters were tuned, the whole TARGET T-ALL dataset was used to fit the final model. The model was then tested in GSE42328. That is why we called TARGET T-ALL the “training set” and GSE42328 the “test set” in the submitted version. To avoid this confusion, we defined the GSE42328 dataset as the “secondary/external validation dataset” in the revision.
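The split sizes described above can be checked with a short sketch. This is an illustration only, assuming a simple random shuffle per round; the patient identifiers, the fixed seed, and the function name split_80_20 are hypothetical, and the 100 rounds follow the number of CV rounds mentioned in the next response.

```python
import random

def split_80_20(patient_ids, rng):
    """Randomly assign 80% of patients to training and the rest to testing."""
    ids = list(patient_ids)
    rng.shuffle(ids)
    n_train = len(ids) * 4 // 5  # integer arithmetic: 165 -> 132
    return ids[:n_train], ids[n_train:]

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
target_ids = [f"TARGET_{i:03d}" for i in range(165)]  # placeholder IDs

# One independent random split per CV round.
rounds = [split_80_20(target_ids, rng) for _ in range(100)]
train_ids, test_ids = rounds[0]
```

The 53 GSE42328 patients never enter these splits; they are held out entirely and used once, after the final model has been fitted on all 165 TARGET patients.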

- Line 124-125 I see from the plot that the two highest accuracy values in terms of AUC of ROC are LEF1 and OGN, not LEF1 and CD5 as indicated in the manuscript, please, clarify this part.

Thanks for pointing that out. The two highest accuracy values should be LEF1 and OGN.

- Line 126-128 I noticed from Fig.2 that RSPO4 has higher median (the highest) than LEF1, please comment on this result go into more details.

LEF1 was selected in the model in 94 of the 100 rounds of CV. This is the most important evidence that LEF1 plays a key role in predicting the ETP status. In contrast, RSPO4 was selected in only 48 of the 100 rounds.

The median value of RSPO4’s coefficient across its 48 rounds of CV is slightly higher than that of LEF1 across its 94 rounds. This indicates that RSPO4 can also be a good candidate to predict ETP status, but not as good as LEF1.
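The two summaries compared here, how often a gene is selected and the median of its coefficient over the rounds in which it is selected, can be sketched as below. The sketch assumes each CV round yields a dictionary holding only the genes the model retained; that data layout, the function name, and the toy coefficients (including their signs) are hypothetical and are not taken from the manuscript.

```python
from statistics import median

def summarize_selection(cv_coefs):
    """Map each gene to (rounds selected, median coefficient when selected).

    cv_coefs holds one dict per CV round, mapping gene -> coefficient for the
    genes retained in that round (an assumed layout).
    """
    genes = {gene for coefs in cv_coefs for gene in coefs}
    summary = {}
    for gene in genes:
        values = [coefs[gene] for coefs in cv_coefs if gene in coefs]
        summary[gene] = (len(values), median(values))
    return summary

# Hypothetical coefficients from five CV rounds.
toy_rounds = [
    {"LEF1": -0.9, "RSPO4": 1.2},
    {"LEF1": -1.1},
    {"LEF1": -1.0, "RSPO4": 1.4},
    {"LEF1": -0.8},
    {"RSPO4": 1.3},
]
summary = summarize_selection(toy_rounds)
```

Because the median is taken only over rounds in which a gene was selected, a gene chosen in fewer rounds can still show a larger median coefficient, which is why selection frequency is reported alongside it.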

- Lines 129-132 need a better explanation; is there a Supp. Fig or Table that explain the results of DEG? Please reformulate this sentence and, if it is the case, add the materials necessary to understand it.

Supp. Table (Supplementary Table 1) and relevant description (Lines 208 to 212, and lines 338-340) on differential expression analysis are added.

- Line 133-136 the authors report that: “the prediction of subtype by LEF1 expression level reached 0.933 (CI: 0.858-1, Figure 2E).” what about the other prediction subtypes? A better explanation of this is required to understand this part.

Here, “subtype” referred to the ETP status. We revised the sentence to read “The prediction of ETP status by LEF1 expression level reached 0.933”.
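For readers less familiar with the metric, the AUC of a single marker can be computed directly from pairwise comparisons. The sketch below is a generic illustration, not the authors' code: it treats a lower expression value as more ETP-like (consistent with low LEF1 in ETP) and counts the fraction of ETP/non-ETP pairs that the marker orders correctly. The function name and the toy values are hypothetical.

```python
def auc_low_marker(expression, is_etp):
    """AUC for predicting ETP status from LOW expression of one marker.

    Equals the probability that a randomly chosen ETP patient has lower
    expression than a randomly chosen non-ETP patient (ties count as half).
    """
    etp = [x for x, flag in zip(expression, is_etp) if flag]
    non_etp = [x for x, flag in zip(expression, is_etp) if not flag]
    wins = sum((e < n) + 0.5 * (e == n) for e in etp for n in non_etp)
    return wins / (len(etp) * len(non_etp))

# Hypothetical expression values for seven patients.
toy_expression = [1.0, 1.5, 2.0, 4.0, 5.0, 1.8, 6.0]
toy_is_etp = [True, True, True, False, False, False, False]
auc = auc_low_marker(toy_expression, toy_is_etp)
```

In this toy example 11 of the 12 ETP/non-ETP pairs are ordered correctly, so the AUC is 11/12; a value of 0.5 would correspond to a marker with no discriminating power.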

Minor point:

- Line 60, please insert a reference for all the information reported.

A reference is added.

- Lines 63-64, in the sentence “based on the old technology”, please indicate the technology the authors are referring and insert a reference.

The manuscript is revised to state explicitly “based on the measurement of flow cytometry”.

- Lines 74-75, it is true that the data do not require Ethical approval but some of them are of public domain upon a written consent (see above).

As explained above, data used in this study do not require either ethical approval or written consent.

- Please, reformulate the line 94-95 and indicate the figure to which the authors are referring, is it Fig 1?

Thank you for the comment. Taking the two reviewers’ suggestions together, we have rewritten this section.

- Lines 100-101 require a reference when is reported that the immune phenotype is accepted as definition of ETP.

A reference is added.

- Lines 102-103 need to be reformulated it is not clear.

This section has been re-written.

- Line 105, The sentence ”ETP signature were conducted respectively on the T-ALL cohort with 165 patients.” is incomplete, was the study also conducted using the test group of 53? Since this missing part the lines 105-107 are not clear.

This section has been re-written.

- If the authors want to keep the text as in line 122-125 they have to invert the panels in Figure 2, the D has to be after the C.

The two figures are switched.

- Line 138, the citation Zhao et al. is not present as a reference in the bibliography section.

The reference is added.

- Line 146, please add a reference to the statement which says LEF1 influences the overall survival rate in Thymoma.

Supporting information is provided in Supplementary Table 2, which has been added.

- Please extend Figure 1 caption with a better explanation of the plot and add a legend that explain the meaning of the shaded and not shaded colors.

Figure legend and an extended explanation of Figure 1 are added.

- The colors in Figure 2C are not consistent with the ones of other panels of Figure 2.

The colors are adjusted to be consistent with Figure 2B and 2C.

Submitted filename: Response_to_Reviewers.docx

3 Apr 2020

PONE-D-20-01174R1

low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia

PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process by both Reviewers.

We would appreciate receiving your revised manuscript by May 17 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

    A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
    A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
    An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Francesco Bertolini, MD, PhD

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors considerably improved their paper. I am not competent enough to address the biostatistical issues related to the manuscript; here I fully rely on the opinion of referee 2. I still think that the discussion on the role of LEF1 in tumorigenesis, which is textbook knowledge, does not add anything to the message of the paper and should be either shortened or omitted.

Reviewer #2: Wang and Zhang in their revised research article entitled: “low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia” have addressed most of the suggestions raised. However, I think that many minor changes are required to make the manuscript suitable for the publication and I encourage the author to do them. I appreciated the availability of the scripts used.

Minor points above:

In the title please indicate the first letter in upper case.

Line 68 reformulate the sentence, indeed “whole transcriptome and by the 35-gene ETP signature were conducted respectively on...” may be understood as one analysis was applied on TARGET while the other on GSE42328 instead both the analyses were applied to both the data sets.

Lines 109-110 explain the concept of specificity and sensitivity and how you calculated them adding few sentences on the methods, maybe in the “Other Statistic section”.

Lines 154-163 please substitute the word “study” with synonyms.

If, from TCGA dataset, the authors used RNA sequencing data, it would be useful to add the number of genes (if specified) for completeness.

Unsupervised clustering section: the authors have deleted the selection of the 5,000 genes which was reported in the previous version of the manuscript; in my opinion it is important to add it back in order to let the reader fully understand the method used to select these genes.

Supervised prediction section, please add a reference in the first two sentences of this paragraph. Moreover, please provide the numbers and parameters used as pointed out in the answer to the reviewer. Finally I do not see along the text that the GSE42328 data set is called neither “external” nor “secondary” (as indicated in the answer to reviewers) but only “validation”, is it intended?

Please, add in the m&m, under the section “Other Statistics”, the acronym DEGs when referring to the explanation of this methodology. Moreover, please add a reference and the method used to do the Cox regression and add the “survival” R library (with its citation) if it was used to calculate the survival rate.

Supplementary Tables and Materials need captions (legend) to explain what they show. I suggest merging all the tables in a unique .xlsx file with different sheets and then add a caption that explains all the sheets (Tables).

Additionally, Supplementary Table 2 has the name of the sheet which is "Table_S1", please correct it and also indicate which threshold you used when the Wilcoxon test was performed and what the data on the columns mean.

The same for Supplementary Table 3, in which the name sheet is mislabelled “Table_S2_LEF1_TCGA_survival” and there is no caption.

Please adjust the references according to the PLOS ONE format.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No


5 Apr 2020

Thanks for the reviewers’ advice. The manuscript was edited accordingly.

Reviewer #1: The authors considerably improved their paper. I am not competent enough to address the biostatistical issues related to the manuscript; here I fully rely on the opinion of referee 2. I still think that the discussion on the role of LEF1 in tumorigenesis, which is textbook knowledge, does not add anything to the message of the paper and should be either shortened or omitted.

Thank you for reviewing our manuscript. Following the reviewer's suggestion, we tried to shorten the discussion of LEF1's biological function, but found that each sentence supports our hypothesis and point of view. In addition, the description and discussion based on re-analyzing the TCGA Pancancer data is our original work and is not taken from any textbook or other publication. We would prefer to keep this material. If the reviewer has a more specific suggestion, e.g. removing a particular sentence, we can revise this section again.

Reviewer #2: Wang and Zhang, in their revised research article entitled “low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia”, have addressed most of the suggestions raised. However, I think that many minor changes are required to make the manuscript suitable for publication, and I encourage the authors to make them. I appreciated the availability of the scripts used.

Minor points:

In the title please indicate the first letter in upper case.

Corrected.

Line 68: please reformulate the sentence. “whole transcriptome and by the 35-gene ETP signature were conducted respectively on...” may be understood to mean that one analysis was applied to TARGET and the other to GSE42328, whereas both analyses were applied to both data sets.

“respectively” is deleted.

Lines 109-110: please explain the concepts of specificity and sensitivity and how you calculated them by adding a few sentences to the methods, perhaps in the “Other Statistic section”.

Explained as suggested.

Lines 154-163 please substitute the word “study” with synonyms.

“study” is replaced with “publication”.

If, from TCGA dataset, the authors used RNA sequencing data, it would be useful to add the number of genes (if specified) for completeness.

We only used the expression level of LEF1 gene from the RNA sequencing dataset.

Unsupervised clustering section: the authors have deleted the selection of the 5,000 genes which was reported in the previous version of the manuscript; in my opinion it is important to add it back in order to let the reader fully understand the method used to select these genes.

Thank you for the reminder. The sentence was removed by mistake and has been added back in this revision.

Supervised prediction section: please add a reference in the first two sentences of this paragraph. Moreover, please provide the numbers and parameters used, as pointed out in the answer to the reviewer. Finally, I do not see anywhere in the text that the GSE42328 data set is called either “external” or “secondary” (as indicated in the answer to reviewers), only “validation”; is this intended?

Reference is added and numbers are provided.

To be consistent and concise, we decided to call the GSE42328 the validation dataset.

Please add in the M&M, under the section “Other Statistics”, the acronym DEGs when referring to the explanation of this methodology. Moreover, please add a reference and the method used to do the Cox regression, and add the “survival” R library (with its citation) if it was used to calculate the survival rate.

Added as suggested.

Supplementary Tables and Materials need captions (legends) to explain what they show. I suggest merging all the tables into a single .xlsx file with different sheets and then adding a caption that explains all the sheets (Tables).

Additionally, the sheet for Supplementary Table 2 is named "Table_S1"; please correct it, and also indicate which threshold you used when the Wilcoxon test was performed and what the data in the columns mean.

The same applies to Supplementary Table 3, in which the sheet name is mislabelled “Table_S2_LEF1_TCGA_survival” and there is no caption.

Corrected as suggested.

Please adjust the references according to the PLOS ONE format.

Done.

Submitted filename: Response_to_Reviewers.docx

17 Apr 2020

low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia

PONE-D-20-01174R2

Dear Dr. Wang,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Francesco Bertolini, MD, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I have no additional comments. The authors have addressed my points and I accept their arguments with respect to keeping the discussion on the biological functions of LEF1.

Reviewer #2: Wang and Zhang revised and improved their study according to the suggestions of the reviewers. The paper, in my opinion, is now suitable for publication after the correction of some typos, such as:

-Numbers with more than 3 digits should be written as XX,XXX

-Line 202: please correct the name of the test to “non-parametric Wilcoxon”

-Please adjust the references according to the PLOS ONE format. I see more than one doi in some of the references, and reference number 19 is missing the last page (“2016;128(22):5083-.”); if the article is only one page long, I think it should be indicated as 5083.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No


1 May 2020

PONE-D-20-01174R2

low LEF1 expression is a biomarker of early T-cell precursor, an aggressive subtype of T-cell lymphoblastic leukemia

Dear Dr. Wang:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Francesco Bertolini

Academic Editor

PLOS ONE
