Objectives

Pseudo algorithm (Data simulation)

  1. Draw expression values for 20,000 genes from an exponential distribution (\(\lambda = 1/250\), i.e. mean 250). (See Q1 in the Appendix.)
  2. Specify the percentage of differentially expressed genes (DEGs)
  3. For the DEGs, draw a \(\log_2\) fold change from a normal distribution, \(N(\mu = 0, \sigma = 0.7)\)
  4. Move up and down from the base value (drawn from \(Exp(\lambda = 1/250)\)) by the log fold change (drawn from \(N(\mu = 0, \sigma = 0.7)\)) to get the expression levels for the two conditions
  5. Specify a value or a function for the dispersion parameter (the shape parameter of the gamma mixing distribution).
  6. With the expression levels (\(\mu\)) and the dispersion values (\(r\)), two transcriptomes can be simulated
  7. Repeat steps 1-6 20,000 times.
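
The steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the definitive implementation: the function name `simulate_transcriptomes`, the DEG fraction of 10%, the constant dispersion `r = 5`, and the three replicates per condition are all hypothetical choices. Step 4 is read here as splitting the log2 fold change symmetrically (half down in one condition, half up in the other), which is only one possible interpretation of "up and down". Counts are drawn from a negative binomial with mean \(\mu\) and size \(r\), which is equivalent to a Poisson with a gamma-distributed rate of shape \(r\).

```python
import numpy as np


def simulate_transcriptomes(n_genes=20_000, deg_fraction=0.1, r=5.0,
                            n_reps=3, seed=0):
    """Sketch of steps 1-6: returns counts for two conditions and the DEG mask."""
    rng = np.random.default_rng(seed)

    # Step 1: base expression from an exponential with rate 1/250 (mean 250).
    base = rng.exponential(scale=250.0, size=n_genes)

    # Step 2: mark a fixed fraction of genes as DEGs (fraction is an assumption).
    n_deg = int(round(deg_fraction * n_genes))
    is_deg = np.zeros(n_genes, dtype=bool)
    is_deg[rng.choice(n_genes, size=n_deg, replace=False)] = True

    # Step 3: log2 fold change from N(0, 0.7) for DEGs, 0 for all other genes.
    lfc = np.where(is_deg, rng.normal(0.0, 0.7, size=n_genes), 0.0)

    # Step 4 (assumed symmetric split): half the fold change down, half up.
    mu_a = base * 2.0 ** (-lfc / 2.0)
    mu_b = base * 2.0 ** (lfc / 2.0)

    # Steps 5-6: constant dispersion r; negative binomial with mean mu, size r.
    def draw_counts(mu):
        p = r / (r + mu)  # NumPy parameterisation: mean = r * (1 - p) / p = mu
        return rng.negative_binomial(r, p[:, None], size=(n_genes, n_reps))

    return draw_counts(mu_a), draw_counts(mu_b), is_deg


counts_a, counts_b, is_deg = simulate_transcriptomes(n_genes=2_000, seed=1)
print(counts_a.shape, counts_b.shape, int(is_deg.sum()))
```

Step 7 would then correspond to calling the function repeatedly with a different `seed` for each repetition.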

Simulation setting

Evaluation steps

  1. Compare the estimated dispersion value, \(\hat{r}\), or dispersion function with the true value or function.
  2. Evaluate type I error rate and power at different simulation settings.
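
As an illustration of the second evaluation step, the sketch below computes the empirical type I error rate and power from a list of per-gene p-values and the known DEG labels of a simulated data set. The function name `error_rate_and_power` and the significance level of 0.05 are assumptions made for this example; the p-values would come from whichever differential expression test is being evaluated.

```python
def error_rate_and_power(p_values, is_deg, alpha=0.05):
    """Empirical type I error rate (among true nulls) and power (among true DEGs)."""
    null_p = [p for p, d in zip(p_values, is_deg) if not d]
    deg_p = [p for p, d in zip(p_values, is_deg) if d]

    # Fraction of non-DEGs that were (wrongly) called significant.
    type1 = sum(p < alpha for p in null_p) / len(null_p) if null_p else float("nan")
    # Fraction of true DEGs that were (correctly) called significant.
    power = sum(p < alpha for p in deg_p) / len(deg_p) if deg_p else float("nan")
    return type1, power


# Toy input: two true DEGs followed by four non-DEGs.
p_values = [0.01, 0.20, 0.03, 0.60, 0.04, 0.50]
is_deg = [True, True, False, False, False, False]
print(error_rate_and_power(p_values, is_deg))  # → (0.5, 0.5)
```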

Open questions

  1. Which is better in the simulation: randomly drawn or fixed values of the simulation parameters? And in general?

Appendix

Questions

Q1: Is this a reasonable distribution to use?

Back

Q2: Is it acceptable to use these values without justification?

Back

Q3: Are there other reasonable functions?

Back

Q4: Should the size go even lower or higher? Coarser granularity?

Back


Supp. plots

Exponential distribution (\(\lambda = 1/250\))

Back

Gene fold changes

Back

Dispersion parameter function

Back


Supp. proof

If we want a precise estimate of, for example, the type I error rate, a sufficient number of simulation repetitions is needed. Assume each simulation repetition is a Bernoulli trial with rate \(p\), and there are \(n\) trials corresponding to \(n\) simulation repetitions. In the end, we count the number of false positives, and this count follows a binomial distribution, \(Binom(n,p)\).

\begin{align*}
\hat{p} &= \frac{\sum X_i}{n}\\
CI_{\hat{p}} &= \hat{p} \pm Z_{\alpha/2}SE(\hat{p})\\
CI_{\hat{p}} &= \hat{p} \pm 1.96 \sqrt{\frac{p(1-p)}{n}}
\end{align*}

If we want a 95% confidence interval with a width smaller than 1% (that is, a half-width below 0.005), then:

\begin{align*}
1.96 \sqrt{ \frac{p(1-p)}{n} } &< 0.005 \\
n &> \left(\frac{1.96}{0.005}\right)^2 p(1-p) \\
n &\geq 7{,}300 \text{, if } p = 0.05 \\
n &\geq 24{,}587 \text{, if } p = 0.8
\end{align*}
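
The bound can be checked numerically. The sketch below solves the inequality \(1.96\sqrt{p(1-p)/n} < 0.005\) for the smallest integer \(n\); the function name `min_repetitions` is a hypothetical helper introduced for this example, and the normal approximation to the binomial is assumed, as in the derivation.

```python
import math


def min_repetitions(p, half_width=0.005, z=1.96):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) < half_width."""
    bound = (z / half_width) ** 2 * p * (1 - p)
    # Strict inequality n > bound, so take the next integer above the bound.
    return math.floor(bound) + 1


print(min_repetitions(0.05))  # → 7300
print(min_repetitions(0.8))   # → 24587
```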

Back