![]() |
Detecting periodic patterns
in unevenly spaced gene expression time series using Lomb-Scargle periodograms |
Analysis Programs in |
Download
LombScargle.zip for Windows, or
LombScargle.tar.gz for Linux
(download files include Bozdech data in CSV format -- see notes)
Getting Started The following assumes you have R installed on Windows or Linux and gives the instructions for extracting the needed R and data files to reproduce figures from the paper. Windows
Linux
Web Testing (R versions 2.0.1 and earlier will give slightly different p.adjust values because of how missing values were treated. The use of R 2.1.0 or later is recommended for the improved treatment of missing values in p.adjust.) The files below reproduce the figures from the paper. To work with your dataset, see Step-by-Step Instructions: How to Apply Lomb-Scargle Analysis to Your Set of Expression Data. |
||
R File |
Requires |
Comments |
cosine.R | - | This file is perhaps a good starting point since it is a self-contained example of a Lomb-Scargle periodogram analysis of a 20-point cosine curve with even spacing over a 120 minute period. Run the script from a local file: source("cosine.R") or directly from this page (copy and paste this line into R): source("http://research.stowers-institute.org/efg/2005/LombScargle/R/cosine.R") Output consists of a single screen of four graphics showing:
|
R-Mnemonics.R | - | "R" symbolic constants, mostly for graphics. This list is far from complete. The R community doesn't seem to want to adopt a good programming practice of using mnemonic constants in programs, but rather uses arbitrary numeric values. For example, for "line type" R uses "1" for a solid line instead of a mnemonic like LTY_SOLID suggested in this file See Chapter 11, "The Power of Variable Names," in Code Complete 2 by Steve McConnell for why such mnemonics should adopted by the R community. |
R-efg.R |
R-Mnemonics.R | "WriteStamped..." routines to create time-stamped graphics files. |
LombScargle.R | R-Mnemonics.R | Library of Lomb-Scargle periodogram computations. Most used functions are: To use these functions in R directly from this page (copy and paste this line into R): source("http://research.stowers-institute.org/efg/2005/LombScargle/R/LombScargle.R") or using a local copy: source("LombScargle.R") The following is a brief example using an "Expression" profile at time values that are unevenly spaced (cut and paste these commands into R after running one of the above "source" statements) : unit <<- "hour" # hourly data # Use "random" times for uneven spacing Expression <- cos(2*pi*time/24 + pi/3) # 24 hour period M <- 4*length(time) # often 2 or 4 # Use test frequencies corresponding to # Use Horne & Baliunas' estimate of independent frequencies ComputeAndPlotLombScargle(time, Expression, Output consists of a single figure with four plots showing:
|
cosine48.R | LombScargleLibrary.R | Used for experiments with cosine curves. See below. Examples periodograms from cosine curves with periods of 48 and 24 hours: or copy and paste this line into R: source("http://research.stowers-institute.org/efg/2005/LombScargle/R/cosine48.R") and then par(mfrow=c(2,2)) Output from each DemoLombScargleCosine call results in a single screen with three graphics (the histogram is suppressed):
Alternatively, in R (under Windows) select History | Recording in graphics window and don't bother with the mfrow setting above. You'll see three individual graphics that way using the PageUp/PageDown keys to view them. |
SinglePeriod.R | cosine48.R | Show Lomb-Scargle periodograms for "ideal gene" with single dominant period ranging from 4 to 72 hours with even and uneven time spacing, and with and without noise. |
Nyquist.R | cosine48.R | Show spurious spikes seen in periodogram above Nyquist limit. Run script: source("Nyquist.R") Output is written to file Nyquist.pdf (6 pages) . See A-2 in Numerical Experiments for discussion. |
DutyCycle.R | LombScargleLibrary.R | "Duty cycle" represents fraction of expression cycle that gene is on or off. See TwoPeriods.R. Example of gene on 2/3rds of time and off 1/3 of time: source("DutyCycle.R") Output is a single screen of three graphics (as with cosine48.R). See A-3.2 in Numerical Experiments for discussion. |
TwoPeriods.R |
cosine48.R DutyCycle.R |
Two periodicities can occur from two separate independent periods, or from an asymmetric "duty cycle." Run script: source("TwoPeriods.R") Output is written to file TwoPeriods.pdf (16 pages) and DutyCycle.pdf (4 pages). See A-3 in Numerical Experiments for discussion. |
ThreePeriods.R | cosine48.R | Three periodicities from three separate independent periods. Run script: source("ThreePeriods.R") Output is written to file ThreePeriods.pdf (20 pages). See A-4 in Numerical Experiments for discussion. |
Ndetermination.R | cosine48.R | "N" determination for single gene "p" value for cosine curve using linear least squares curve fit. Run script: source("Ndetermination.R") Output is written to file Ndetermination.pdf (21 pages) with least squares results displayed on the screen. See A-5 in Numerical Experiments for discussion. |
ProcessLombScargle Dataset.R |
R-Mnemonics.R R-efg.R LombScargle.R |
Common code for processing a set of expression profiles, whether from a dataset like from Bozdech, or one from numerical experiments. Normally not used directly. |
NoiseExperiment Setup.R |
ProcessLombScargle DataSet.R |
Code for defining"Noise" and "Mixture" experiments (see below). Normally not used directly. |
NoiseExperiment Setup.R |
Experiment with "ideal gene" and noise: cosine curve plus known amounts of noise. Run scripts in this order: 1) source("Noise-1.R")
2) source("Noise-2.R")
3) source("Noise-3.R")
See A-6 in Numerical Experiments for discussion. |
|
Mixture-1.R Mixture-2.R |
NoiseExperiment Setup.R |
Histograms of "p" values for various mixtures. Run scripts in this order: 1) source("Mixture-1.R")
2. source("Mixture-2.R")
See A-7 in Numerical Experiments for discussion. |
FindExtremes.R | R-efg.R R-Mnemonics.R |
Find extreme values in raw Bozdech dataset and plot charts of "extreme" values. |
Bozdech03.R | ProcessLombScargle File: Complete_Dataset.csv |
Routines to process Bozdech 2003 Plasmodium dataset. Process complete dataset and create charts shown on Exploratory analysis page, sections B-1, B-3, B-4, B-5, B-6, and Results page, sections C-1 and C-2, (note the chart in B-7 is only created if certain Bioconductor packages are available.): source("Bozdech03.R") ProcessAll creates a "Complete" directory with three CSV files (Dominant.CSV, Periodogram.CSV, pValue.CSV) and 8 PDF files with figures for B-1 through B-7. Analyze a single probe at any time: source("Bozdech03.R") ShowSingleProbeProfile(d, 2000, "Row 2000") Any filename can be specified with ProcessAll or LoadPlasmodiumData but the default filename is Complete_Dataset.csv. |
Analyze1.R | R-efg.R File: Complete_Dataset.csv |
Post-process data created by ProcessAll function above. Create figures C-3 ("p" histogram), C-4 (period histogram) and C-5 (multiple testing methods), and file Complete/Dominant-p-adjust.CSV Run script: The script will prompt for a filename: Complete/Dominant.csv. This code should execute in both R 2.0.1 or R 2.1.0 or later. NOTE: Results will vary slightly by version of R. The output file Target.txt will have 4358 probes if run in R 2.0.1 or earlier and 4355 probes if run in R 2.1.0 or later. Output files: |
StatsAndSets.R | Files: Complete_Dataset.csv, QC_Dataset.csv, Overview_Dataset.csv, Target.txt |
Perform a variety of set operations to get the cardinality of the various sets in the Venn diagram (Fig 5. in the paper). Compare Lomb-Scargle "p" value to Bozdech's "Max Power" heuristic, as well as phase shifts. See Results, Sections C-6a - C-6d. Run script: source("StatsAndSets.R") Output files include: |
Analyze2.R | Bozdech03.R Files: |
Create "Small N" and "Long Period" charts in PDF files shown in C-7. Run script: source("Analyze2.R") Output files: SetLongPeriod.pdf and SetSmallN.pdf |
Analyze3.R | Bozdech03.R Files: |
Create Periodogram Map (C-8) and Phaseogram (C-9) in Results. Run script: source("Analyze3.R") Output files:
|
loess-imputation.R |
- | Demonstrate that loess could be used to impute pseudodata on large gap. Imputing data points for such a wide gap would usually not be considered appropriate. Run script: source("loess-imputation.R") Output: loess-imputation.pdf |
MissingTests.R | - | Process "Complete" Bozdech dataset for an ever-wider, hypothetical "gap" in the data. Run script: source("MissingTests.R") This script calls the ProcessAll() function, which was described above, and takes considerable time -- about an 1-2 hours, depending on processor speed. [Each ProcessAll() could be run in parallel under Linux for a good speedup.] Several subdirectories are created: Missing7, Missing11, Missing15, Missing19, Missing25. These directories contain the same output as in the "Complete" directory when ProcessAll() is run. Run this script to determine the number of significant periodic probes: The script will prompt for a filename: <directory>/Dominant.csv. Here is a output from Analyze1 for the MissingTests output.. |
Getting Started With Your Own Data |
||
SetupLombScargle Analyze |
- | See Step-by-Step Instructions: How to Apply Lomb-Scargle Analysis to Your Set of Expression Data. source("SetupLombScargleExampleSet.R") |
For paper and web site |
||
CreateFigures.R | cosine48.R File: Complete_Dataset.csv |
Create the computer-generated figures used in the paper. Run script: source("CreateFigures.R") Output files:
Figure1.pdf Figure2.pdf Figure3.pdf Figure4.pdf |
Logo.R | - | Create "logo" of two out-of-phase sine curves with uneven time-spacing representing periodic genes where red is "up regulated" and green is "down regulated." Run this script to reproduce the logo: source("http://research.stowers-institute.org/efg/2005/LombScargle/R/Logo.R") |
imputation-GIF.R | - | Use R to create 8 graphs as .png files to be frames in an animated GIF of missing data points. If on Linux, ImageMagick's convert is used to create the animated GIF from the PNGs. See "Missing Value Tolerance" (C-10). Run script: source("imputation-GIF.R") Output: missing*.png files and missing.gif |
|
Updated
22 Sept 2005