RDML Consortium Meeting

Andreas Untergasser1,2
1University of Heidelberg, Germany; 
2On behalf of the RDML consortium

RDML development is coordinated by the RDML consortium, a group of scientists, software developers and instrument manufacturers (http://www.rdml.org). Their joint efforts resulted in the improved versions 1.1 and 1.2. The consortium is not limited to its current members; all interested parties are invited to join the effort by attending this RDML Consortium Meeting.

Back to qPCR BioStatistics & BioInformatics

RDML qPCR Data Format – Ready For The Next Level?

Andreas Untergasser1, Steve Lefever2, Jasper Anckaert2, Jan M Ruijter3, Jan Hellemans4, Jo Vandesompele2,4
1University of Heidelberg, Heidelberg, Germany; 
2Ghent University, Ghent, Belgium; 
3Academic Medical Center, Amsterdam, The Netherlands; 
4Biogazelle, Zwijnaarde, Belgium

Quantitative PCR (qPCR) is the gold standard method for accurate and sensitive nucleic acid quantification. To improve the quality and transparency of experiment design, data-analysis and reporting of results, the MIQE guidelines were established in 2009 (Bustin et al., Clinical Chemistry). The Real-time PCR Data Markup Language (RDML) was designed to establish a vendor independent, freely available XML based file format to store and exchange qPCR data (Lefever et al., NAR). RDML stores the raw data acquired by the machine as well as the information required for its interpretation, such as sample annotation, primer and probe sequences and cycling protocol.
Today, several instrument manufacturers have recognized its potential and implemented functionality to export data in RDML format. Third-party software (LinRegPCR and qbasePLUS) uses this information for advanced data analysis. Because RDML is flexible, most current software uses only parts of the format. Furthermore, with different RDML versions available, the need to convert between versions became obvious. The open source editor RDML-Ninja was designed to edit RDML files and convert between versions (sourceforge.net/projects/qpcr-ninja/). It should serve as a reference implementation of the RDML format and assist researchers, reviewers and software developers by offering access to all data in an RDML file.
Ultimately, RDML could be extended to store all information required by MIQE. Currently, the amount of information required by MIQE seems overwhelming to a researcher, but RDML offers an easy way out. All the information would be entered only once and stored in a basic RDML file. Researchers would not have to re-enter this information with every qPCR run, but would import from this RDML file only the parts needed for the current run. Furthermore, integrating MIQE into RDML and RDML-Ninja would make it possible to check to what extent MIQE information is provided, by calculating the checklist completeness from a given RDML file. We would like to discuss this vision, its chances and its applicability.
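
The checklist-completeness idea described above can be illustrated with a minimal sketch. It assumes a simplified XML stand-in rather than schema-valid RDML, and the checklist items, element paths and the function name `checklist_completeness` are hypothetical choices made for this example; they are not taken from MIQE, the RDML schema or RDML-Ninja.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the XML stored in an RDML file.
# ASSUMPTION: this is NOT schema-valid RDML; it only mimics the general shape.
DOC = """
<rdml>
  <sample id="liver_1">
    <description>mouse liver cDNA</description>
  </sample>
  <target id="Gapdh">
    <sequences>
      <forwardPrimer><sequence>ACGT</sequence></forwardPrimer>
      <reversePrimer><sequence></sequence></reversePrimer>
    </sequences>
  </target>
</rdml>
"""

# Hypothetical checklist: human-readable item -> element path to look for.
CHECKLIST = {
    "sample description": "sample/description",
    "forward primer sequence": "target/sequences/forwardPrimer/sequence",
    "reverse primer sequence": "target/sequences/reversePrimer/sequence",
    "cycling protocol": "thermalCyclingConditions/step",
}

def checklist_completeness(xml_text, checklist):
    """Return (fraction of items present, list of missing items)."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in checklist.items():
        node = root.find(path)
        # An item counts as provided only if the element exists and has text.
        if node is None or not (node.text or "").strip():
            missing.append(label)
    provided = len(checklist) - len(missing)
    return provided / len(checklist), missing

score, missing = checklist_completeness(DOC, CHECKLIST)
print(f"completeness: {score:.0%}, missing: {missing}")
```

A real implementation would have to follow the RDML schema (including its XML namespace) and the actual MIQE checklist; the sketch only shows how a completeness score could be derived from the presence of non-empty elements.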

Impact of Smoothing on Parameter Estimation in Quantitative DNA Amplification Experiments

Stefan Rödiger1, Andrej-Nikolai Spiess2, Michał Burdukiewicz3
1BTU Cottbus – Senftenberg, Senftenberg, Germany; 
2University Medical Center Hamburg-Eppendorf, Hamburg, Germany;
3University of Wroclaw, Wroclaw, Poland

Quantitative real-time polymerase chain reaction (qPCR) is one of the most precise DNA quantification methods. The parameters quantification cycle (Cq) and amplification efficiency (AE) are commonly calculated from distinct location indices of the amplification curve (threshold fluorescence, first- or second-derivative maxima) to quantify qPCR reactions. Consequently, precise curve analysis is a prerequisite for quantifying the copy number in samples [1]. Several smoother and filter methods for minimizing inherent noise in qPCR data have been proposed in the peer-reviewed literature. Smoothing steps are frequently employed during amplification curve analysis and generally taken for granted, which raises the question of whether any of these methods should be used without paying attention to their possible implications.
The smoothers and filters we compared in our investigation are widely used to compensate for noisy data. We found no fundamental controversy in the scientific community about the smoothers and filters used in our study. All of them are thoroughly tested, peer-reviewed and well accepted. In our study we specifically addressed the question of which smoother is appropriate for amplification curve data acquired by isothermal amplification or qPCR.
Due to the lack of comprehensive models, we chose an empirical approach combined with amplification curve simulation to evaluate the smoother and filter functions in a testable scenario. For this purpose, we analyzed the impact of the smoothing methods on real-world data from different thermocyclers (low-throughput and high-throughput) as well as different amplification methods. We also included “user-controlled” noise structures based on Monte Carlo simulations in our analysis.
Our results indicate that selected smoothing algorithms affect the estimation of Cq and AE considerably. The commonly employed moving average filter performed worst in all qPCR scenarios. The least bias was observed for the Savitzky-Golay smoother, Cubic Splines and the Whittaker smoother. These smoothers generally showed a low sensitivity to differences in AE, whereas other smoothers such as Running Mean introduced a significant AE-dependent bias. We developed open source software packages to facilitate the selection of smoothing algorithms that can be incorporated in an analysis pipeline of qPCR experiments. The findings of our study were implemented in the R packages chipPCR and qpcR [2,3], freely available from “The Comprehensive R Archive Network”. We anticipate that our findings will serve as guidelines for the selection of an appropriate smoothing algorithm in diagnostic qPCR applications. However, the general feasibility of qPCR data smoothing remains to be demonstrated.
[1] Pabinger and Rödiger et al., Biomolecular Detection and Quantification (2014), 1/1, 23-33. [2] Spiess AN et al., Clinical Chemistry (2015), preprint. [3] Rödiger et al. (under revision), Bioinformatics (Oxf.)
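
The effect of a smoother on a threshold-based Cq can be illustrated with a minimal, noise-free sketch. It is not the code of chipPCR or qpcR: the logistic curve parameters, the threshold of 0.1 and all function names are assumptions made for this example, and only a 5-point moving average and a 5-point quadratic Savitzky-Golay filter (fixed coefficients -3, 12, 17, 12, -3 over 35) are compared. Because no noise is added, the sketch shows only the systematic shift a smoother introduces, not its noise suppression.

```python
import math

def logistic_curve(n_cycles=40, plateau=1.0, midpoint=20.0, slope=1.5):
    """Noise-free sigmoid fluorescence values for cycles 1..n_cycles (assumed model)."""
    return [plateau / (1.0 + math.exp(-(c - midpoint) / slope))
            for c in range(1, n_cycles + 1)]

def smooth5(y, weights):
    """Apply a symmetric 5-point filter; the two points at each edge stay raw."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + j - 2] for j, w in enumerate(weights))
    return out

def threshold_cq(y, threshold):
    """Fractional cycle at which y first crosses the threshold (linear interpolation)."""
    for i in range(len(y) - 1):
        if y[i] < threshold <= y[i + 1]:
            # Index i holds cycle i + 1.
            return (i + 1) + (threshold - y[i]) / (y[i + 1] - y[i])
    return None

MOVING_AVERAGE = [1 / 5] * 5
SAVITZKY_GOLAY = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]  # 5-point, quadratic

raw = logistic_curve()
cq_raw = threshold_cq(raw, 0.1)
cq_ma = threshold_cq(smooth5(raw, MOVING_AVERAGE), 0.1)
cq_sg = threshold_cq(smooth5(raw, SAVITZKY_GOLAY), 0.1)
print(f"raw {cq_raw:.2f}  moving average {cq_ma:.2f}  Savitzky-Golay {cq_sg:.2f}")
```

In this setting the moving average raises the convex early part of the curve, so the threshold is crossed noticeably earlier than in the raw data, while the Savitzky-Golay estimate stays close to the raw one. How the filters behave on noisy data depends on the noise structure and window size.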

Occurrence of unexpected PCR artefacts warrants thorough quality control

Adrián Ruiz-Villalba1, Bep van Pelt-Verkuil2, Quinn Gunst1, Maurice van den Hoff1, Jan Ruijter1
1Department of Anatomy, Embryology and Physiology, Academic Medical Centre (AMC), Amsterdam, The Netherlands; 
2Department of Innovative Molecular Diagnostics, University of Applied Sciences, Leiden, the Netherlands

A recent comparison of qPCR data analysis methods showed that some amplification curve analysis methods perform better than the classic standard curve and Cq approach on indicators such as variability and sensitivity. In that comparison, the possibility to characterize the amplification curve and thus assess its quality received little attention. To address this, we compared different datasets. Our results show that for a significant fraction of the genes, low initial target concentrations lead to the amplification of artefacts, independently of primer specificity. These non-specific amplification curves are indistinguishable from those resulting in the correct products; they show similar baseline, PCR efficiency and plateau fluorescence behaviours. Validating specific amplification curves requires a quality control that combines the plate design, a melting curve analysis (MCA) and gel electrophoresis. In addition, our data suggest that the relative concentrations of the template in the cDNA input and of the primers determine the appearance of the PCR artefacts. Unexpectedly, the presence of non-template foreign cDNA seems to be an essential requirement for the amplification of the correct specific qPCR target.

The PrimerBank database: an analysis of high-throughput primer validation

Athanasia Spandidos1,2,3, Xiaowei Wang1,2,4, Huajun Wang1,2, Brian Seed1,2
1Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA; 
2Department of Genetics, Harvard Medical School; 
3Current address: 1st Department of Pathology, National and Kapodistrian University of Athens, Athens, Greece; 
4Current address: Division of Bioinformatics and Outcomes Research, Department of Radiation Oncology, Washington University School of Medicine, St Louis, MO.

qPCR remains the gold standard for validating gene expression measurements from high-throughput methods such as DNA microarrays; however, non-specific amplification is frequently an issue. To overcome this, we developed the PrimerBank database, a public resource containing primers that can be used under stringent and allele-invariant amplification conditions. PrimerBank can be used for the retrieval of human and mouse primer pairs for gene expression analysis by PCR and RT-qPCR. Currently, the database contains 497,156 primers, which cover 17,076 human and 18,086 mouse genes, corresponding to around 94% of all known protein-coding gene sequences. PrimerBank also contains information on these primers such as Tm, location on the transcript and expected PCR product size. Primer pairs covering most known mouse genes have been experimentally validated by amplification plot, gel electrophoresis, DNA sequence and thermal denaturation profile analysis, and all the experimental validation information together with primer information can be freely retrieved from the PrimerBank website (http://pga.mgh.harvard.edu/primerbank/). The database can be searched using various search terms. One advantage of PrimerBank primers is that they have been designed to work under a common PCR thermal profile, so they can be used for high-throughput RT-qPCR in parallel or genome-wide RT-qPCR. The expression profiles of thousands of genes can be determined simultaneously using the available high-throughput platforms, making PrimerBank primers useful for gene expression analysis on a genome-wide scale.

Unexpected System-specific Periodicity In Quantitative Real-Time Polymerase Chain Reaction Data And Its Impact On Quantification

Andrej-Nikolai Spiess1, Stefan Rödiger2, Thomas Volksdorf3, Joel Tellinghuisen4
1Department of Andrology, University Hospital Hamburg-Eppendorf, Germany; 
2Faculty of Natural Sciences, BTU Cottbus – Senftenberg, Cottbus, Germany; 
3Department of Dermatology, University Hospital Hamburg-Eppendorf, Germany; 
4Department of Chemistry, Vanderbilt University, Nashville, Tennessee, USA

The “baseline noise” of quantitative real-time PCR (qPCR) data is a feature of every qPCR curve and has a substantial impact on quantitation. In principle, two different forms of baseline noise can be encountered: (i) the dispersion of fluorescence values in the first few cycles of a qPCR curve around their mean (within-sample noise) and (ii) the dispersion of fluorescence values between different qPCR curves at the same cycle (between-sample noise). The predominant effect that results in between-sample noise is an overall shift of the qPCR curve on the y-axis (“baseline shift”), which is frequently compensated by “baselining” qPCR data. Common approaches are to subtract an averaged (Lievens et al., 2012; Rutledge, 2011; Goll et al., 2006), iteratively estimated (Ramakers et al., 2003; Ruijter et al., 2009) or lower asymptote derived (Tichopad et al., 2003; Peirson et al., 2003; Spiess et al., 2008) baseline value from all fluorescence values prior to quantitation (compare Table 1 in Ruijter et al., 2013).
Recently, we showed preliminary results on a published large scale technical replicate dataset (Ruijter et al., 2013) that indicated between-sample periodicity for fluorescence values at early and late cycle numbers (Tellinghuisen & Spiess, 2014). A more detailed interrogation of the between-run noise periodicity revealed that this effect occurs at all cycle numbers and constitutes a second and completely independent noise component that adds to the overall baseline shift. Most importantly, periodic noise persists even after classical “baselining” and results in a propagation of periodicity into estimated Cq values when using fixed threshold methods (LinReg, FPKM, DART, FPLM), hence resulting in periodic Cq values. In contrast, Cq values obtained from variable threshold methods based on first- or second-derivative maxima (Cy0, Miner, 5PSM) or from normalization of fluorescence data are completely devoid of periodic noise, corroborating the feasibility of these approaches.
The origin of periodic noise in qPCR data remains elusive. Using a larger cohort of published and self-generated high-replicate qPCR data from different platforms, we applied classical algorithms of time series and signal analysis (e.g. autocorrelation analysis) to characterize the periodicity in more detail. Interestingly, we generally observed a periodicity of 24 and 12 for 384-well and 96-well plate systems, respectively. These findings strongly suggest an effect of uneven temperature profiles in Peltier block systems or of variable liquid deposition by manual or automated multichannel pipetting systems, manifesting itself as periodic qPCR data. We will present ways to eliminate periodic noise from qPCR data, resulting in a more reliable estimation of Cq values.
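
The autocorrelation analysis mentioned above can be sketched on synthetic data. The example assumes 96 technical replicates read row by row from a plate with 12 columns, a made-up fixed offset per column plus small Gaussian noise, and a biased sample autocorrelation estimator; none of these values or names come from the study itself.

```python
import random

def autocorrelation(values, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [sum(centered[i] * centered[i + k] for i in range(n - k)) / denom
            for k in range(max_lag + 1)]

# ASSUMPTION: hypothetical per-column Cq offsets (e.g. a pipetting or block effect).
COLUMN_OFFSETS = [0.00, 0.12, -0.05, 0.08, -0.10, 0.03,
                  0.15, -0.08, 0.02, -0.12, 0.06, -0.11]

rng = random.Random(42)  # fixed seed so the synthetic data are reproducible
cq_values = [25.0 + COLUMN_OFFSETS[well % 12] + rng.gauss(0.0, 0.02)
             for well in range(96)]

acf = autocorrelation(cq_values, max_lag=30)
# Lag 0 is always 1, so the search for the dominant period starts at lag 1.
best_lag = max(range(1, len(acf)), key=lambda k: acf[k])
print("lag with the highest autocorrelation:", best_lag)
```

With a column-wise pattern like this one, the autocorrelation peaks at the number of columns per row, which is how a plate-geometry-dependent period would show up in well-ordered replicate data.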

Removal of Between-Plate Variation in qPCR with Factor Correction: Completion of the Analysis Pipeline Supported by RDML

Jan Ruijter1, Jan Hellemans2, Adrian Ruiz-Villalba1, Maurice Van Den Hoff1, Andreas Untergasser3
1Academic Medical Center, the Netherlands; 
2Biogazelle, Belgium; 
3Heidelberg University, Heidelberg, Germany

Quantitative PCR is the method of choice in gene expression analysis. However, the number of experimental conditions, target genes and technical replicates quickly exceeds the capacity of the qPCR machines. Statistical analysis of the resulting data then requires the correction of between-plate variation. The use of calibrator samples, with replicate measurements distributed over the plates, assumes a multiplicative difference between plates. However, random and technical errors in these calibrators will propagate to all samples on the plate. To avoid this effect, the systematic bias is better corrected using Factor Correction when there is maximal overlap between plates [Ruijter et al. Retrovirology, 2006]. The original Factor Correction program is based on Excel input and calculates corrected target quantities. To implement this correction in the analysis pipeline from raw data through LinRegPCR into qbase-plus, a new version of the program was created to handle RDML files. This version saves the corrected N0 values as efficiency-corrected Cq values to be used in further calculations. This program thus completes the analysis pipeline of qPCR data supported by RDML.
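
The idea of removing a multiplicative between-plate difference using all conditions shared between plates can be sketched as follows. This is a simplified reading of the principle, not the published Factor Correction algorithm or program: the estimator (geometric means over shared conditions), the data layout and the names `factor_correct` and `geometric_mean` are assumptions made for this example.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def factor_correct(plates):
    """Remove a multiplicative plate effect.

    plates: {plate name: {condition: N0 value}}
    Returns (corrected values in the same layout, estimated factor per plate).
    """
    # Only conditions measured on every plate inform the plate factors.
    shared = sorted(set.intersection(*(set(vals) for vals in plates.values())))
    if not shared:
        raise ValueError("plates share no conditions; factors cannot be estimated")
    # Reference level per condition: geometric mean over all plates.
    reference = {c: geometric_mean([plates[p][c] for p in plates]) for c in shared}
    # Plate factor: typical ratio of the plate's values to the reference levels.
    factors = {p: geometric_mean([vals[c] / reference[c] for c in shared])
               for p, vals in plates.items()}
    corrected = {p: {c: v / factors[p] for c, v in vals.items()}
                 for p, vals in plates.items()}
    return corrected, factors

# Made-up N0 values: plate B reads 2.5 times higher than plate A.
plates = {
    "A": {"control": 100.0, "treated": 300.0},
    "B": {"control": 250.0, "treated": 750.0, "extra": 50.0},
}
corrected, factors = factor_correct(plates)
print(factors)
print(corrected)
```

Because every shared condition contributes to the factor, an error in a single measurement has less influence than it would with a single calibrator sample. Conditions measured on only one plate (here "extra") are still corrected with that plate's factor.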
