In the last ten years, the amount of experimental data acquired by high-throughput technologies such as microarrays and RNA sequencing (RNAseq) has increased exponentially, resulting in expression matrices that can reach gigabyte size. It is not uncommon for the researcher to be faced with tables of 20000 rows (transcripts, genes) and 2000 columns (samples), necessitating mathematical, computational and visual approaches that are specifically tailored to these high-dimensional datasets. Frequently, the wet lab scientist “outsources” these analyses to an associated bioinformatics department, getting in return a sophisticated but often black-box analysis on which to rely. It is therefore important to establish common ground on the existing approaches for analyzing this kind of data. In my talk, I will give a concise yet comprehensive overview of existing methods to analyze large-scale gene expression data. Without going into deep mathematical details (these can be obtained from the literature), I will outline the important aspects and idiosyncrasies of current methodology, based largely on 2D and 3D visual depictions of data. Starting from very basic topics such as data cleaning, normalization and scaling, I will emphasize efforts to uncover the intrinsic signature of the data (without imposing any presumptions), based on unsupervised clustering methods such as hierarchical clustering and dimension reduction methods such as PCA (linear) or the more recent t-SNE approach (non-linear). I will demonstrate that in published datasets, the intrinsic structure of the data can differ substantially from the one assumed or defined by the experimental setup (for example, because of batch effects).
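As a rough illustration of the unsupervised, linear dimension reduction mentioned above, the following sketch log-transforms a genes x samples matrix, centers each gene and projects the samples onto the first principal components via a singular value decomposition. The function name `pca_samples`, the simulated matrix and the choice of a log2(x + 1) transform are assumptions made for this example only; they are not taken from the talk.

```python
import numpy as np

def pca_samples(expr, n_components=2):
    """Project samples onto principal components.

    expr: genes x samples matrix of non-negative expression values.
    Returns (scores, explained) where scores is samples x n_components
    and explained holds the fraction of variance per component.
    """
    logged = np.log2(expr + 1.0)                       # assumed variance-stabilizing transform
    centered = logged - logged.mean(axis=1, keepdims=True)  # center each gene across samples
    # SVD of the samples x genes matrix; rows of u correspond to samples
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated data (hypothetical): 500 genes, 12 samples, two groups of 6
rng = np.random.default_rng(0)
expr = rng.gamma(shape=2.0, scale=50.0, size=(500, 12))
expr[:100, 6:] *= 8.0   # genes 0-99 are higher in the second group

scores, explained = pca_samples(expr)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by a known annotation (batch, treatment, date of processing) is one simple way to check whether the dominant structure matches the experimental design.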
Next, I will summarize how to filter signatures that discriminate between different cellular states and how to use computationally expensive methods (bootstrapping, cross-validation) to avoid extracting signatures that perform well on the training set but poorly on independent data (overfitting). Along these lines, I will give a short introduction to recent machine learning approaches such as random forests, neural networks and gradient boosting, and to their advantages in finding predictive biomarkers and reduced discriminator sets through feature selection. For all the discussed approaches, I will also highlight the different pitfalls, for instance when to correct for multiple testing, why one should never perform a statistical test before clustering, and (quite crucially) how to identify apparent differential expression that is merely mimicked by shifts in cellular proportions.
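The overfitting problem described above can be made concrete with a small simulation. The sketch below uses pure noise data (no real signal), a simple mean-difference feature ranking and a nearest-centroid classifier; all of these, along with the function names, are assumptions chosen for brevity and are not the methods of the talk. It compares cross-validation in which the features are selected once on all samples (which leaks information from the test folds) with cross-validation in which the selection is repeated inside each training fold.

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the largest scaled class-mean difference."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    s = X.std(axis=0) + 1e-12
    return np.argsort(-np.abs(m1 - m0) / s)[:k]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closer training centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, k, n_folds, select_inside_fold, seed):
    """k-fold cross-validated accuracy with or without leakage-free selection."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    global_feats = top_k_features(X, y, k)   # uses ALL samples: leaks test labels
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        if select_inside_fold:
            feats = top_k_features(X[train_idx], y[train_idx], k)
        else:
            feats = global_feats
        pred = nearest_centroid_predict(
            X[train_idx][:, feats], y[train_idx], X[test_idx][:, feats]
        )
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)

# Pure noise: 40 samples, 2000 features, labels carry no information
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2000))
y = np.repeat([0, 1], 20)

leaky = cv_accuracy(X, y, k=20, n_folds=5, select_inside_fold=False, seed=2)
honest = cv_accuracy(X, y, k=20, n_folds=5, select_inside_fold=True, seed=2)
print(f"selection outside CV: {leaky:.2f}, selection inside CV: {honest:.2f}")
```

Because the data contain no signal, any accuracy well above chance in the first setting comes solely from selecting features with knowledge of the test labels.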
“Standardization of sample preparation” is our core mission with a focus on clinical and pharmaceutical samples. As pre-analytical processes are increasingly recognized as the limiting factors for sensitivity and specificity of biomarker detection, this is especially relevant for highly advanced analytical methods such as Next Generation Sequencing or Mass Spectrometry. The AFA® (Adaptive Focused Acoustics®) process is isothermal and non-contact, providing precise process control, which is beneficial to a number of scientific disciplines in both advanced biological and chemical applications. Its high level of experimental condition control enables processes to be developed or improved upon very quickly, easily, and reproducibly. Covaris Focused-ultrasonicators may be programmed for intensity, duration, and duty factor, supporting a wide variety of applications, from gentle mixing to extreme compound reformatting and dissolution. This talk will present some of the major applications driven by AFA (e.g. DNA and chromatin shearing, cfDNA isolation, nucleic acid and protein extraction from FFPE). Many of these were launched recently, including a series of kits in the truChIP/truXTRAC product line. We will also discuss insights into current developments in automation and robotization, introducing the first focused-ultrasonicator integrated on a liquid handler deck with precise energy control and a proprietary scanning process. This instrument provides increased workflow efficiency, full automation, and high-throughput sample prep workflows.
One major obstacle in current medicine and drug development is inherent in the way we define and approach diseases. Here, we will discuss the diagnostic and prognostic value of (multi-)omics panels in general. We will have a closer look at breast cancer subtyping and treatment outcome as a case example, using gene expression panels, and we will discuss the current “best practice” in the light of critical statistical considerations. Afterwards, we will introduce computational approaches for network-based medicine. We will discuss novel developments in graph-based machine learning using examples ranging from Huntington’s disease mechanisms via lung cancer drug target discovery back to where we started, i.e. breast cancer subtyping and treatment optimization, but now from a systems medicine point of view. We conclude that systems medicine and modern artificial intelligence open new avenues to shape future medicine.
Related paper: De novo pathway-based biomarker identification.
Non-cellular blood circulating microRNAs (plasma miRNAs) represent a promising source for the development of prognostic and diagnostic tools owing to their minimally invasive sampling, high stability, and simple quantification by standard techniques such as RT-qPCR. In this talk, I’ll briefly present projects investigating the potential of plasma miRNAs both in a population-based cohort study and in patient cohorts for specific diseases.
We profiled circulating miRNAs in the population-based cohort study SHIP and investigated associations with age, sex, and BMI. After regressing out technical parameters and adjusting for the respective other two phenotypes, 7, 15, and 35 plasma miRNAs were significantly (q < 0.05) associated with age, BMI, and sex, respectively. Adjustment for blood cell parameters slightly increased the number of age- or BMI-associated miRNAs but drastically reduced the number of sex-associated miRNAs. These findings emphasize that circulating miRNAs are strongly impacted by age, BMI, and sex. These parameters should be considered as covariates in association studies based on plasma miRNA levels. The established experimental and computational workflow can be used in future screening studies to determine associations of plasma miRNAs with defined disease phenotypes.
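The abstract reports significance as q < 0.05 but does not state which multiple-testing procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg adjustment, per-miRNA p-values could be converted to q-values as follows; the function name `bh_qvalues` and the example p-values are hypothetical.

```python
import numpy as np

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                              # ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)        # p * n / rank
    # enforce monotonicity: each q is the minimum over all larger ranks
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q

# Hypothetical p-values for four miRNAs
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
print(example_q)   # q-values 0.02, 0.04, 0.04, 0.02
```

An miRNA would then be called associated with a phenotype when its q-value falls below the chosen threshold, here 0.05.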
In a multicentre, prospective ACS cohort, 1002 out of 2168 patients presented with ST-segment elevation myocardial infarction (STEMI). Sixty-three STEMI patients experienced an adjudicated major cardiovascular event (MACE, defined as cardiac death or recurrent myocardial infarction) within 1 year of follow-up. From miRNA profiling in a matched derivation case–control cohort, 14 miRNAs were selected for validation. Comparing 63 cases vs. 126 controls, miR-26b-5p levels (P=0.038) were decreased, whereas miR-320a (P=0.047) and miR-660-5p (P=0.01) levels were increased in MACE patients. MiR-26b-5p has been suggested to prevent adverse cardiomyocyte hypertrophy, whereas miR-320a promotes cardiomyocyte death and apoptosis, and miR-660-5p has been related to active platelet production. This suggests that miR-26b-5p, miR-320a, and miR-660-5p may reflect alterations of different pathophysiological pathways involved in clinical outcome after ACS. These three miRNAs also discriminated cases from controls in age- and sex-adjusted Cox regression (AUC=0.718). Adding the three miRNAs to either the Global Registry of Acute Coronary Events (GRACE) score or a clinical model led to a net reclassification improvement of 0.20 in both cases.
Harald H.H.W. Schmidt
Department of Pharmacology and Personalised Medicine, Faculty of Health, Medicine and Life Sciences, Maastricht University, The Netherlands
Following the IT revolution, the next socio-economic revolution appears to be a complete redefinition of health and disease: how we define them, how we handle them and how we finance them. Such revolutions follow upon a major crisis, and medicine is in a crisis. Existing drugs fail to provide benefit for most patients. The efficacy of drug discovery is in constant decline, and big pharma is about to disappear in its current form by the end of the 2020s. Biomedical research has a poor translational success rate due to false incentives, lack of quality/reproducibility and publication bias. The most important reason and need for change, however, is our current concept of disease, which is mostly derived from the 19th and 20th centuries and based on organs or symptoms, but hardly ever on mechanisms. Without a disease mechanism, however, no curative therapy is possible. Enabled by big data and interdisciplinary research with applied bioinformaticians, the new Systems Medicine will lead to a mechanism-based redefinition of disease, to precision diagnosis and therapy that eliminate the need for drug discovery, and to a complete reorganization of how we teach, train and practice medicine.