cDNA yield after C1 runs

Summary

This page shows where to retrieve the data files containing the DNA concentrations measured in the 96-well plates after the cDNAs were collected from the capture array. It then presents the R commands that import these data and combine them into a consolidated table for the five runs, which is saved for later integrated analysis. Finally, some quality controls show a strong variation between runs in the DNA yield after on-chip PCR. An inspection of the standard curves rules out a simple measurement problem as the cause.

Method

After a C1 run, the products are recovered from the outlets and transferred to a 96-well plate. Following the standard procedure (PN 100-5950 A1, page 25), 2 μL of each product are quantified on a 384-well fluorometer using the PicoGreen dye. See also PicoGreen Quantitation of DNA: Effective Evaluation of Samples Pre- or Post-PCR. Susan J. Ahn, José Costa and Janet Rettig Emanuel, Nucl. Acids Res. (1996) 24(13):2623-2625.

The measured fluorescence intensities are copied into an Excel sheet provided by Fluidigm (PN 100-6160), which converts them to DNA concentrations by linear regression against a standard curve.
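The exact regression performed by the Fluidigm sheet is not described on this page. As a minimal sketch, assuming an ordinary least-squares fit of fluorescence on DNA concentration, the conversion amounts to fitting a line to the standards and inverting it. The helpers `fit_standard_curve()` and `fluorescence_to_concentration()` are hypothetical, and the standards are those of run 1772-062-248, printed further down this page.

```r
# Assumption: plain least-squares fit, fluorescence = a + b * dna.
# The Fluidigm sheet may fit the standards differently.
fit_standard_curve <- function(std) {
  lm(fluorescence ~ dna, data = std)
}

# Invert the fitted line to turn fluorescence readings into ng/μL
fluorescence_to_concentration <- function(fluorescence, fit) {
  a <- coef(fit)[["(Intercept)"]]
  b <- coef(fit)[["dna"]]
  (fluorescence - a) / b
}

# Standards of run 1772-062-248 (2 ng/μL, then two-fold dilutions)
std <- data.frame(
  dna = 2 / 2^(0:9),
  fluorescence = c(560415.0, 332109.0, 144873.0, 80629.5, 38842.0,
                   19577.0, 9952.0, 4536.5, 2437.0, 1204.0)
)
fit <- fit_standard_curve(std)
fluorescence_to_concentration(c(100000, 250000), fit)
```

An unweighted fit like this one is dominated by the highest standards, so relative errors at the low end of the curve weigh very little in the regression.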

The R commands below load the Excel sheets, produce one consolidated table for all runs, and run a few quality controls on it.

Datasets

Each file is named after the serial ID of the C1 chip from which the DNA was collected.

The files are available from GitHub and will later be deposited in a more permanent location. They must be downloaded before running this knitr script; their integrity can be checked with shasum:

shasum -c << __CHECKSUMS__
6d58020277bd2eb42f5702eabe17194671d9e87f  1772-062-248.picogreen.xlsx
83844dd8e077d54a619e6bff396992b23d8a26e8  1772-062-249.picogreen.xlsx
656fb4bd93466cd4f005881a046f3f17c17f4a78  1772-064-103.picogreen.xlsx
f4e2064d7c09826a8ad14ed2a95a66255018da39  1772-067-038.picogreen.xlsx
c9a1ece68f037589d40b382f59d202dbb90425de  1772-067-039.picogreen.xlsx
__CHECKSUMS__
## 1772-062-248.picogreen.xlsx: OK
## 1772-062-249.picogreen.xlsx: OK
## 1772-064-103.picogreen.xlsx: OK
## 1772-067-038.picogreen.xlsx: OK
## 1772-067-039.picogreen.xlsx: OK
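If shasum is not at hand, a quick check from R that the five files are present in the working directory could look like the sketch below. `missing_picogreen_files()` is a hypothetical helper; it only tests whether the files exist, not whether their content is intact.

```r
# Hypothetical helper: return the expected PicoGreen files (run ID
# followed by ".picogreen.xlsx") that are missing from a directory.
missing_picogreen_files <- function(runs, dir = ".") {
  files <- paste(runs, "picogreen.xlsx", sep = ".")
  files[!file.exists(file.path(dir, files))]
}

runs <- c("1772-062-248", "1772-062-249", "1772-064-103",
          "1772-067-038", "1772-067-039")
missing_picogreen_files(runs)  # character(0) once all five files are present
```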

Functions to load the data in R

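library(gdata)    # read.xls(), to read Excel files (requires Perl)
library(ggplot2)  # plots
library(reshape)  # melt(), to convert wide tables to long format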

In these Excel files, the final results are in the third sheet, in rows 42 to 49.

First, extract the concentrations as an 8 × 12 table (plus a first column holding the row letters). An example is shown for run 1772-062-248.

# Read the 8 x 12 concentration grid from sheet 3 (rows 42 to 49)
readConcentrationTable <- function (FILE) {
  picogreen <- read.xls( FILE
                       , sheet  =  3
                       , skip   = 41     # skip the first 41 lines
                       , nrows  =  8     # plate rows A to H
                       , header = FALSE
                       , blank.lines.skip = FALSE )[,1:13]  # row label + 12 columns
  colnames(picogreen) <- c('Row', sprintf('%02d', 1:12))   # 'Row', '01' ... '12'
  picogreen
}
readConcentrationTable('1772-062-248.picogreen.xlsx')
##   Row    01    02    03    04    05    06    07    08    09    10    11    12
## 1   A 3.743 0.173 2.832 3.739 4.200 4.485 2.444 0.091 4.404 0.046 1.672 4.781
## 2   B 3.099 3.935 3.927 2.192 3.958 0.101 0.243 2.229 0.099 4.557 3.287 4.028
## 3   C 4.190 0.095 2.545 4.009 3.756 3.064 3.155 0.116 4.217 2.505 0.138 3.847
## 4   D 4.760 4.299 3.198 3.174 3.756 2.657 0.108 3.170 4.209 3.512 0.116 2.908
## 5   E 4.742 3.243 2.339 0.120 2.254 3.498 3.248 3.178 3.168 4.259 3.808 3.854
## 6   F 4.244 3.083 3.152 0.122 2.352 3.339 3.384 3.242 3.475 2.524 2.958 2.430
## 7   G 3.408 3.632 2.969 3.526 2.978 3.613 2.921 3.599 3.392 3.799 2.321 3.361
## 8   H 3.370 3.506 3.152 3.402 3.604 0.115 3.995 4.319 3.568 3.079 2.375 3.622

Then, reshape this table to the "long" format: one measurement per line, together with the run ID, the well name, and the row and column names. This is the typical format for plotting data with ggplot2.

# Convert one wide concentration table to long format, one line per well
meltConcentrationTable <- function (RUN, TABLE) {
  picogreen <- melt(TABLE, id.vars='Row')   # columns: Row, variable, value
  colnames(picogreen) <- c('Row', 'Column', 'Concentration')
  picogreen[,"Run"] <- RUN
  picogreen[,"Well"] <- paste(picogreen$Row, picogreen$Column, sep='')   # e.g. A01
  picogreen <- picogreen[, c('Run', 'Well', 'Row', 'Column', 'Concentration')]
  picogreen
}
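For readers without the reshape package, the same wide-to-long conversion can be sketched in base R. The helper `to_long()` and the 2 × 2 toy table are hypothetical; the stacking order (column by column) is meant to mirror the A01, B01, C01, ... order that melt() produces.

```r
# Hypothetical toy plate in the wide layout of readConcentrationTable()
# (values made up for illustration)
wide <- data.frame(Row  = c("A", "B"),
                   "01" = c(1.1, 2.2),
                   "02" = c(3.3, 4.4),
                   check.names = FALSE, stringsAsFactors = FALSE)

# Base-R sketch of melt(): stack the plate columns one after the other,
# giving one line per well (A01, B01, A02, B02)
to_long <- function(run, wide) {
  values <- wide[, -1, drop = FALSE]
  long <- data.frame(
    Run           = run,
    Row           = rep(wide$Row, times = ncol(values)),
    Column        = rep(colnames(values), each = nrow(values)),
    Concentration = unlist(values, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long$Well <- paste0(long$Row, long$Column)
  long[, c("Run", "Well", "Row", "Column", "Concentration")]
}

to_long("toy-run", wide)
```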

The function below returns the data for one run, provided that a properly named file (the run ID followed by .picogreen.xlsx) is present in the working directory.

read_pg <- function(RUN) {
  FILE <- paste(RUN, 'picogreen.xlsx', sep='.')
  picogreen <- readConcentrationTable(FILE)
  picogreen <- meltConcentrationTable(RUN, picogreen)
  picogreen
}
head(read_pg('1772-062-248'))
##            Run Well Row Column Concentration
## 1 1772-062-248  A01   A     01         3.743
## 2 1772-062-248  B01   B     01         3.099
## 3 1772-062-248  C01   C     01         4.190
## 4 1772-062-248  D01   D     01         4.760
## 5 1772-062-248  E01   E     01         4.742
## 6 1772-062-248  F01   F     01         4.244

The standard curve is also in sheet 3, from row 2 to 12. The function below reads the standard curve of one run (the fluorescence values are already background-subtracted). Its output is self-explanatory.

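# Read the standard curve of one run: DNA concentration of each standard
# (column 2, ng/μL) and its background-subtracted fluorescence (column 5)
read_sc <- function(RUN) {
  FILE <- paste(RUN, "picogreen.xlsx", sep = ".")
  sc <- read.xls(FILE, sheet=3, skip=2, nrows=10, header=FALSE)[,c(2,5)]
  sc <- cbind(RUN, sc)
  colnames(sc) <- c('Run', 'dna', 'fluorescence')
  sc$fluorescence <- as.numeric(as.character(sc$fluorescence))
  return(sc)
}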
read_sc('1772-062-248')
##             Run        dna fluorescence
## 1  1772-062-248 2.00000000     560415.0
## 2  1772-062-248 1.00000000     332109.0
## 3  1772-062-248 0.50000000     144873.0
## 4  1772-062-248 0.25000000      80629.5
## 5  1772-062-248 0.12500000      38842.0
## 6  1772-062-248 0.06250000      19577.0
## 7  1772-062-248 0.03125000       9952.0
## 8  1772-062-248 0.01562500       4536.5
## 9  1772-062-248 0.00781250       2437.0
## 10 1772-062-248 0.00390625       1204.0
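One way to quantify the linearity of a standard curve is a straight-line fit on the log-log scale, where a fluorescence proportional to the DNA concentration gives a slope of 1. A minimal sketch, using the values printed above and a hypothetical helper `loglog_check()`:

```r
# Standard curve of run 1772-062-248, copied from the output above
sc1 <- data.frame(
  dna = 2 / 2^(0:9),
  fluorescence = c(560415.0, 332109.0, 144873.0, 80629.5, 38842.0,
                   19577.0, 9952.0, 4536.5, 2437.0, 1204.0)
)

# Hypothetical helper: fit log10(fluorescence) against log10(dna) and
# report the slope (1 for a proportional response) and the R^2
loglog_check <- function(sc) {
  fit <- lm(log10(fluorescence) ~ log10(dna), data = sc)
  c(slope     = unname(coef(fit)[2]),
    r.squared = summary(fit)$r.squared)
}

loglog_check(sc1)
```

For this run, both the slope and the R² come out close to 1; a slope far from 1 would indicate a response that is not proportional over the range of the standards.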

Consolidated file (format)

The file cDNA_concentration.csv is built in R from the files above and has the following columns.

Run

The serial ID of the C1 capture array for a given run. Example: 1772-062-248.

Well

The coordinates of the well in the 96-well plate to which the cDNAs were transferred at the end of the C1 run. Examples: A01, F08, C12. Combined with the run ID, this uniquely identifies a cell.

Row

The row coordinate in the 96-well plate to which the cDNAs were transferred at the end of the run. Possible values: A, B, C, D, E, F, G and H.

Column

The column coordinate in the 96-well plate to which the cDNAs were transferred at the end of the run. Possible values: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11 and 12.

Concentration

The DNA concentration in ng/μL.
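The Well column is the concatenation of Row and Column. As a sketch, the 96 well IDs of a plate can be generated in the column-by-column order of the melted tables; `plate_wells()` is a hypothetical helper, not part of the original script.

```r
# Hypothetical helper: the 96 well IDs of a plate, column by column
# (A01, B01, ..., H01, A02, ...), as in the melted tables
plate_wells <- function() {
  rows    <- LETTERS[1:8]
  columns <- sprintf("%02d", 1:12)
  paste0(rep(rows, times = length(columns)),
         rep(columns, each = length(rows)))
}

wells <- plate_wells()
head(wells)  # "A01" "B01" "C01" "D01" "E01" "F01"
```

Since each run contributes one line per well, a complete consolidated table should contain exactly 96 lines per run.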

Consolidated file (preparation)

Each Excel file is loaded into R, its DNA concentrations are extracted, and they are appended to a table saved as cDNA_concentration.csv.

The cDNA_concentration.csv file contains the data for the following runs.

RUNS <- c('1772-062-248', '1772-062-249', '1772-064-103', '1772-067-038', '1772-067-039')

Read the data of every run, stack them into a single picogreen table, and save it.

# Read each run and stack the results into one table (avoids relying on
# exists(), which would append to a stale table when the script is re-run)
picogreen <- do.call(rbind, lapply(RUNS, read_pg))

summary(picogreen)
##      Run                Well                Row          Column    Concentration   
##  Length:480         Length:480         A      : 60   01     : 40   Min.   :0.0040  
##  Class :character   Class :character   B      : 60   02     : 40   1st Qu.:0.3125  
##  Mode  :character   Mode  :character   C      : 60   03     : 40   Median :1.0120  
##                                        D      : 60   04     : 40   Mean   :1.4048  
##                                        E      : 60   05     : 40   3rd Qu.:2.3750  
##                                        F      : 60   06     : 40   Max.   :4.7810  
##                                        (Other):120   (Other):240
write.csv(picogreen, file='cDNA_concentration.csv', row.names=FALSE)

Quality control

The DNA yield varies strongly between runs.

# One box plot per run (the original qplot() call had a stray comma)
ggplot(picogreen, aes(x = Run, y = Concentration, colour = Run)) +
  geom_boxplot() +
  coord_flip()

Figure: box plot of the DNA concentration in each run (chunk cDNA_concentration_boxplot).

Still, low-concentration outliers can be detected in most runs (but not in 1772-064-103); they probably correspond to chambers that contained no cell. Note that each histogram has its own scale.

# One histogram per run, each with its own scales
ggplot(picogreen, aes(x = Concentration, colour = Run)) +
  geom_histogram() +
  facet_wrap(~ Run, scales = 'free')

Figure: histograms of the DNA concentration, one panel per run (chunk cDNA_concentration_histogram).
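The low-concentration outliers could also be flagged automatically. The sketch below marks wells whose concentration is below a fraction of their run's median; the rule, the helper `flag_low_yield()` and its threshold (`frac = 0.1`) are arbitrary assumptions for illustration, not a criterion used in this analysis. It uses row A of run 1772-062-248 from the table printed earlier.

```r
# Hypothetical rule: flag wells below `frac` times the median of their run,
# as candidate chambers without a cell
flag_low_yield <- function(df, frac = 0.1) {
  run_median <- ave(df$Concentration, df$Run, FUN = median)
  df$LowYield <- df$Concentration < frac * run_median
  df
}

# Row A of run 1772-062-248
rowA <- data.frame(
  Run  = "1772-062-248",
  Well = paste0("A", sprintf("%02d", 1:12)),
  Concentration = c(3.743, 0.173, 2.832, 3.739, 4.200, 4.485,
                    2.444, 0.091, 4.404, 0.046, 1.672, 4.781),
  stringsAsFactors = FALSE
)
flagged <- flag_low_yield(rowA)
flagged$Well[flagged$LowYield]  # "A02" "A08" "A10"
```

Because the median is computed separately for each run, the same call on the full picogreen table would adapt the threshold to each run's overall yield; the value of `frac` would still need to be checked against the histograms.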

Comparison between standard curves

Load the standard curves of all runs.

# Read the standard curve of each run and stack them into one table
sc <- do.call(rbind, lapply(RUNS, read_sc))

The shifted standard curve of run 1772-062-248 suggests that its DNA concentrations might have been overestimated. Overall, however, the calibration of the fluorometer is stable and does not explain the strong variation in DNA yield between runs.

ggplot(sc, aes(x = fluorescence, y = dna, colour = Run)) +
  geom_point() +
  geom_line() +
  scale_x_log10('Average fluorescence (background subtracted)') +
  scale_y_log10('DNA concentration (ng/μL)')
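The stability of the calibration could also be summarised with one number per run, for example the median ratio of fluorescence to DNA concentration across the standards. This is a hypothetical summary (`response_factor()` is not part of the original script), shown here on the standard curve of run 1772-062-248; applied to the `sc` table, it would return one value per run, and a run with a shifted curve would stand out.

```r
# Hypothetical summary: median fluorescence per ng/μL of standard, per run,
# for a table with columns Run, dna and fluorescence
response_factor <- function(sc) {
  ratio <- sc$fluorescence / sc$dna
  sapply(split(ratio, sc$Run), median)
}

# Standard curve of run 1772-062-248, as printed earlier on this page
sc248 <- data.frame(
  Run = "1772-062-248",
  dna = 2 / 2^(0:9),
  fluorescence = c(560415.0, 332109.0, 144873.0, 80629.5, 38842.0,
                   19577.0, 9952.0, 4536.5, 2437.0, 1204.0),
  stringsAsFactors = FALSE
)
response_factor(sc248)
```

The median is used rather than the mean so that a single aberrant standard does not dominate the summary.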