A validated strategy to infer protein biomarkers from RNA-Seq by combining multiple mRNA splice variants and time-delay

Background: Profiling of mRNA expression is an important method to identify biomarkers, but it is complicated by limited correlations between mRNA expression and protein abundance. We hypothesised that these correlations could be improved by mathematical models based on measuring splice variants and time delay in protein translation.

Methods: We characterised time-series of primary human naïve CD4+ T cells during early T-helper type 1 differentiation with RNA-sequencing and mass-spectrometry proteomics. We then performed computational time-series analysis in this system and in two other key human and murine immune cell types. Linear mixed time-delayed splice variant models were used to predict protein abundances, and the models were validated using out-of-sample predictions. Lastly, we re-analysed RNA-Seq datasets to evaluate biomarker discovery in five T-cell-associated diseases, validating the findings for multiple sclerosis (MS) and asthma.

Results: The new models demonstrated median correlations of mRNA-to-protein abundance of 0.79-0.94, significantly out-performing, in cross-validation tests, models that did not use multiple splice variants and time delays. Our mathematical models provided more differentially expressed proteins between patients and controls in all five diseases. Moreover, analysis of these proteins in asthma and MS supported their relevance. One marker, sCD27, was clinically validated in MS using two independent cohorts, for treatment response and prognosis.

Conclusion: Our splice variant and time-delay models substantially improved the prediction of protein abundance from mRNA data in three immune cell types. The models provided valuable biomarker candidates, which were validated in clinical studies of MS and asthma. We propose that our strategy is generally applicable for biomarker discovery.
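The modelling idea in the Methods (a linear combination of several splice variants, shifted by a time delay and judged by out-of-sample prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, the linear interpolation used to apply the delay, and the leave-one-out scheme are all assumptions chosen for illustration.

```python
import numpy as np

def delayed_design(times, mrna, delay):
    """Design matrix of splice-variant levels evaluated at (t - delay).

    Values are linearly interpolated and clamped to the sampled time range
    (an assumption of this sketch). A constant column adds an intercept.
    """
    shifted = np.clip(times - delay, times[0], times[-1])
    cols = [np.interp(shifted, times, mrna[:, k]) for k in range(mrna.shape[1])]
    return np.column_stack(cols + [np.ones_like(times)])

def loo_error(times, mrna, protein, delay):
    """Mean squared leave-one-out prediction error for one candidate delay."""
    X = delayed_design(times, mrna, delay)
    n = len(times)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        # Ordinary least squares fit on all time points except i
        coef, *_ = np.linalg.lstsq(X[keep], protein[keep], rcond=None)
        errs.append((X[i] @ coef - protein[i]) ** 2)
    return float(np.mean(errs))

def select_delay(times, mrna, protein, candidate_delays):
    """Return the delay with the lowest out-of-sample error, plus all errors."""
    errors = {d: loo_error(times, mrna, protein, d) for d in candidate_delays}
    return min(errors, key=errors.get), errors

# Synthetic example (hypothetical data): two splice variants, true delay 1.0
times = np.linspace(0.0, 10.0, 21)
mrna = np.column_stack([np.sin(0.6 * times) + 2.0, np.cos(0.4 * times) + 2.0])
protein = delayed_design(times, mrna, 1.0) @ np.array([1.5, 0.5, 0.3])

best, errors = select_delay(times, mrna, protein, [0.0, 0.5, 1.0, 1.5, 2.0])
print(best)
```

Selecting the delay by held-out error rather than in-sample fit mirrors the abstract's emphasis on out-of-sample validation. A real analysis would also need to handle measurement noise and variant selection, which this sketch ignores.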

Magnusson, R., Rundquist, O., Kim, M. J., Hellberg, S., Na, C. H., Benson, M., … Gustafsson, M. (2019). A validated strategy to infer protein biomarkers from RNA-Seq by combining multiple mRNA splice variants and time-delay. doi:10.1101/599373

This work was supported by the Swedish Cancer Society (grant CAN 2017/625), East Gothia Regional Funding, the Åke Wiberg Foundation, Neuro Sweden, the Swedish Research Council (grants 2015-02575, 2015-03495, 2015-03807, 2016-07108 and 2018-02776), the National Research Foundation of Korea, and the Swedish Foundation for Strategic Research.

