Title: | FISH Based Normalization and Copy Number Inference of SNP Microarray Data |
---|---|
Description: | Normalizes the data from a file containing the raw values of the SNP probes of microarray data by using the FISH probes and their corresponding copy number. |
Authors: | Adrian Andronache <[email protected]>, Luca Agnelli <[email protected]> |
Maintainer: | Luca Agnelli <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5.2 |
Built: | 2024-11-04 05:34:16 UTC |
Source: | https://github.com/cran/FBN |
Normalizes the data from a file containing the raw values of the SNP probes of microarray data by using the FISH probes and their corresponding CNs.
Package: | FBN |
Type: | Package |
Version: | 1.5.1 |
Date: | 2012-03-26 |
License: | GPL (>=2) |
LazyLoad: | yes |
To start using the FBN package, call library(FBN).
Adrian Andronache [email protected]
Luca Agnelli [email protected]
Agnelli L et al. (2009), "A SNP Microarray and FISH-Based Procedure to Detect Allelic Imbalances in Multiple Myeloma: an Integrated Genomics Approach Reveals a Wide Gene Dosage Effect", Genes Chrom Cancer
Finds at most 6 of the local maxima of the histogram of inputData, which represents the SNP microarray data.
The function first finds all local maxima of the histogram and then removes those that are closer together than minSpan.
The histogram is estimated with equally spaced breaks (also the default), defined by breaksData (see the documentation of hist for more details).
FBN.histogramMaxima(inputData, minSpan, breaksData)
FBN.histogramMaxima(inputData, minSpan = 0.2, breaksData = NULL)
inputData |
A vector of SNP microarray values for which the local maxima of the histogram are desired |
minSpan |
The minimum distance separating consecutive local maxima. If |
breaksData |
One of:
|
This function has been designed around SNP microarray data and FISH resolution. Because high numbers of signals in FISH analyses do not allow discrete CNs to be identified correctly, an empirical limit of 6 local maxima is imposed. These maxima are used to initialize the k-means algorithm that determines the CN clusters.
Returns a vector containing at most 6 values of inputData at which the histogram shows local maxima.
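The two-step procedure described above (find all local maxima, then drop those closer than minSpan, keeping at most 6) can be illustrated with a minimal base-R sketch. This is not the FBN implementation: the name sketchHistogramMaxima, its breaks and maxPeaks arguments, the tallest-first rule for deciding which of two close maxima survives, and the choice to return bin midpoints are all assumptions made for illustration.

```r
# Hypothetical helper, not the FBN implementation.
sketchHistogramMaxima <- function(inputData, minSpan = 0.2, breaks = "Sturges",
                                  maxPeaks = 6) {
  h <- hist(inputData, breaks = breaks, plot = FALSE)
  # Collapse runs of equal counts so that a flat-topped peak is counted once.
  r <- rle(h$counts)
  k <- length(r$values)
  padded <- c(-Inf, r$values, -Inf)
  isPeak <- padded[2:(k + 1)] > padded[1:k] & padded[2:(k + 1)] > padded[3:(k + 2)]
  # Position of each run: midpoint between its first and last bin.
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  position <- ((h$mids[starts] + h$mids[ends]) / 2)[isPeak]
  height <- r$values[isPeak]
  # Assumption: visit peaks from tallest to lowest and keep a peak only if it
  # lies at least minSpan away from every peak kept so far.
  kept <- numeric(0)
  for (p in position[order(height, decreasing = TRUE)]) {
    if (length(kept) < maxPeaks && all(abs(p - kept) >= minSpan)) {
      kept <- c(kept, p)
    }
  }
  sort(kept)
}

# Deterministic toy data with modes at 1 and 2.
x <- rep(c(0.9, 1.0, 1.1, 1.9, 2.0, 2.1), times = c(20, 50, 20, 15, 40, 15))
peaks <- sketchHistogramMaxima(x, minSpan = 0.2, breaks = seq(0.45, 2.55, by = 0.1))
print(peaks)
```

With a minSpan larger than the distance between the two modes, only the taller one would be kept under this sketch's tallest-first rule.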
Adrian Andronache [email protected]
Luca Agnelli [email protected]
require(stats)
require(graphics)

# Two modes, around 1 and 2
x = c(rnorm(1000, 1, .2), rnorm(1000, 2, .2))
y = FBN.histogramMaxima(x, minSpan = .1)
h = hist(x)
# Overlay the detected maxima (red points at height 0) on the histogram
par(new = TRUE)
plot(y, vector(mode = mode(y), length = length(y)),
     xlim = c(min(h$breaks), max(h$breaks)), ylim = c(0, max(h$counts)),
     xlab = NA, ylab = NA, col = 'red')

# Small discrete example with an explicit number of breaks
x = c(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 10, 10, 10, 11)
y = FBN.histogramMaxima(x, minSpan = 3, breaksData = 100)
h = hist(x, 100)
par(new = TRUE)
plot(y, vector(mode = mode(y), length = length(y)),
     xlim = c(min(h$breaks), max(h$breaks)), ylim = c(0, max(h$counts)),
     xlab = NA, ylab = NA, col = 'red')
Performs a k-means clustering of SNP microarray data. Returns clusters of values putatively characterized by different CNs.
FBN.kmeans(inputData, minSpan, breaksData)
FBN.kmeans(inputData = NULL, minSpan = 0.2, breaksData = NULL)
inputData |
A vector of values containing the SNP microarray data |
minSpan |
The minimum distance separating consecutive local maxima that are to be detected on
the histogram of the |
breaksData |
One of:
|
This function takes as input the vector of raw SNP microarray values and performs a k-means clustering that tries to identify the groups of raw values characterized by different CNs. The clustering process is initialized with the local maxima detected on the histogram of the input data (see the documentation of FBN.histogramMaxima).
To increase the robustness of the clustering process and to remove possible small or noisy clusters, a double filtering is applied: first, clusters containing fewer than 1% of the values in inputData are removed; then, because histograms may be noisy, clusters whose centers are closer than 0.2 in nominal value are merged.
An object of class kmeans
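The initialization and double filtering described above can be sketched with stats::kmeans. This is an illustration under stated assumptions, not the FBN code: the function sketchCnKmeans and its arguments are hypothetical, the initial centers are passed in by hand instead of being taken from the histogram maxima, close centers are merged by averaging them, and a final refit from the surviving centers is assumed. Only the 1% and 0.2 thresholds come from the description.

```r
# Hypothetical helper, not the FBN implementation.
sketchCnKmeans <- function(inputData, initCenters, minFraction = 0.01, minGap = 0.2) {
  # Initial clustering, seeded with the supplied centers (e.g. histogram maxima).
  fit <- kmeans(inputData, centers = matrix(initCenters, ncol = 1))
  # Filter 1: drop clusters holding fewer than minFraction of all values.
  keep <- fit$size >= minFraction * length(inputData)
  centers <- sort(fit$centers[keep, 1])
  # Filter 2: merge neighbouring centers closer than minGap. Consecutive close
  # centers are chained into one group and replaced by their mean (assumption).
  group <- cumsum(c(TRUE, diff(centers) >= minGap))
  centers <- as.numeric(tapply(centers, group, mean))
  # Assumption: refit from the surviving centers to obtain the final object.
  if (length(centers) < 2) {
    return(kmeans(inputData, centers = 1))
  }
  kmeans(inputData, centers = matrix(centers, ncol = 1))
}

# Two real groups around 1 and 2, plus three outliers at 5 (under 1% of the data).
x <- c(rep(c(0.9, 1.0, 1.1), times = c(100, 300, 100)),
       rep(c(1.9, 2.0, 2.1), times = c(100, 300, 100)),
       rep(5, 3))
# The seeds 1 and 1.1 deliberately split the first group, to exercise the merge.
fit <- sketchCnKmeans(x, initCenters = c(1, 1.1, 2, 5))
print(sort(as.numeric(fit$centers)))
```

In this toy run the outlier cluster falls under the 1% threshold and the two seeds near 1 end up closer than 0.2, so two clusters remain.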
Adrian Andronache [email protected]
Luca Agnelli [email protected]
require(stats)
require(graphics)

x = c(rnorm(1000, 1, .2), rnorm(1000, 2, .2))
y = FBN.kmeans(x, minSpan = .001)
h = hist(x)
# Overlay the cluster centers (red points at height 0) on the histogram
par(new = TRUE)
plot(y$centers, vector(mode = mode(y$centers), length = length(y$centers)),
     xlim = c(min(h$breaks), max(h$breaks)), ylim = c(0, max(h$counts)),
     xlab = NA, ylab = NA, col = 'red')
Normalizes the raw SNP microarray values by multiplying all of them by the normalization factor (on linear scale) or by adding it to them (on log scale).
The normalization factor is estimated such that it brings the normalizingValue of the raw SNP values onto the nominalValueCN.
FBN.valueCenter(inputData, normalizingValue, nominalValueCN, logScale)
FBN.valueCenter(inputData = NULL, normalizingValue = NULL, nominalValueCN = 2, logScale = FALSE)
inputData |
The vector of raw SNP values, as they come out from, e.g. Circular Binary Segmentation in |
normalizingValue |
The value representing the center of the cluster identified as having a certain CN |
nominalValueCN |
The nominal value representing a certain CN on which the |
logScale |
A logical value, specifying whether the data is on linear ( |
The nominalValueCN is a real value representing the CN, e.g. CN = 2 has a nominalValueCN of 2, but all other CN (!= 2) will have a nominalValueCN different from . Such nominalValueCN is identified by the FBN.kmeans function.
Returns a vector containing the normalized values of the inputData
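As a worked illustration of the normalization described above, the factor that brings normalizingValue onto nominalValueCN can be taken as their ratio on linear scale and their difference on log scale. The sketch below assumes exactly that, and further assumes that on log scale both values are already expressed in log units; sketchValueCenter is a hypothetical name, not the package function.

```r
# Hypothetical helper, not the FBN implementation.
sketchValueCenter <- function(inputData, normalizingValue, nominalValueCN = 2,
                              logScale = FALSE) {
  if (logScale) {
    # Additive factor: shifts normalizingValue onto nominalValueCN.
    inputData + (nominalValueCN - normalizingValue)
  } else {
    # Multiplicative factor: scales normalizingValue onto nominalValueCN.
    inputData * (nominalValueCN / normalizingValue)
  }
}

# Linear scale: the cluster centered at 1 is moved to 2.
linearResult <- sketchValueCenter(c(0.9, 1.0, 1.1, 1.5), normalizingValue = 1)
print(linearResult)  # 1.8 2.0 2.2 3.0

# Log scale: the cluster centered at 0 is moved to 1.
logResult <- sketchValueCenter(c(-1, 0, 0.5), normalizingValue = 0,
                               nominalValueCN = 1, logScale = TRUE)
print(logResult)  # 0.0 1.0 1.5
```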
Adrian Andronache [email protected]
Luca Agnelli [email protected]
require(stats)
require(graphics)

x = c(rnorm(1000, 1, .1), rnorm(1000, 1.5, .1))
y = FBN.valueCenter(x, normalizingValue = 1, nominalValueCN = 2, logScale = FALSE)

# Top panel: raw data, with the normalizing value (1) marked in red
par(mfrow = c(2, 1), new = FALSE)
h = hist(x)
par(new = TRUE)
plot(1, 0, col = 'red', xlim = c(min(h$breaks), max(h$breaks)),
     ylim = c(0, max(h$counts)), xlab = NA, ylab = NA)

# Bottom panel: normalized data, with the nominal value (2) marked in red
par(new = FALSE)
h = hist(y)
par(new = TRUE)
plot(2, 0, col = 'red', xlim = c(min(h$breaks), max(h$breaks)),
     ylim = c(0, max(h$counts)), xlab = NA, ylab = NA)
Normalizes the data from a file containing the raw values of the SNP probes of microarray data by using the FISH probes and their corresponding CNs.
FBNormalization(rawDataFileName, fishProbesFileName, normDataFileName, debugFlag, plotFlag, plotAndSaveFlag)
FBNormalization(rawDataFileName = NULL, fishProbesFileName = NULL, normDataFileName = NULL, debugFlag = FALSE, plotFlag = FALSE, plotAndSaveFlag = FALSE)
rawDataFileName |
The file containing the raw values of the SNP probes. It should be in .txt format, tab delimited, with the SNP probe data of each sample in a separate column. The structure of the input file should be as follows:
|
fishProbesFileName |
The file containing the FISH probe information. It should be in .txt format, tab delimited, with a separate row for each available FISH probe and, in each column, the CN data revealed by FISH for the different samples. The structure of the input file should be as follows:
|
normDataFileName |
The name of the output file where the normalized data will be written. Also, the function writes
a file starting with the same name as |
debugFlag |
Logical value, specifying whether the function should run in debug mode. If |
plotFlag |
Logical value, specifying whether the function should plot the histograms of the individual data. If |
plotAndSaveFlag |
Logical value, specifying whether the function should plot and save the various plots of the histograms. |
For further details, see Supporting Information in Agnelli L et al. (2009), "A SNP Microarray and FISH-Based Procedure to Detect Allelic Imbalances in Multiple Myeloma: an Integrated Genomics Approach Reveals a Wide Gene Dosage Effect", Genes Chrom Cancer
No value is returned. The function writes the output files to disk as previously described.
Adrian Andronache [email protected]
Luca Agnelli [email protected]
## set path to FBN package data directory
rawDataFileName = './../data/hmcls.txt'
fishProbesFileName = './../data/FISHprobes.txt'
normDataFileName = 'hmcls_NORM.txt'
FBNormalization(rawDataFileName, fishProbesFileName, normDataFileName, debugFlag = FALSE)
Information on four FISH probes for the two sample Human Myeloma Cell Lines (AMO1 and NCI-H929).
tab delimited .txt file
Agnelli L et al. (2009), "A SNP Microarray and FISH-Based Procedure to Detect Allelic Imbalances in Multiple Myeloma: an Integrated Genomics Approach Reveals a Wide Gene Dosage Effect", Genes Chrom Cancer
Two Human Myeloma Cell Lines (AMO1 and NCI-H929) profiled on GeneChip Human Mapping 250K NspI arrays.
image data .rda
Agnelli L et al. (2009), "A SNP Microarray and FISH-Based Procedure to Detect Allelic Imbalances in Multiple Myeloma: an Integrated Genomics Approach Reveals a Wide Gene Dosage Effect", Genes Chrom Cancer
1-dimensional mean filter of the inputData with a specified windowSize.
meanFilter(inputData, windowSize)
meanFilter(inputData = NULL, windowSize = 3)
inputData |
The vector of values to be filtered |
windowSize |
The half-size of the filtering window (default |
Classical implementation of a mean filter, using a sliding window. The half-size of the sliding window defaults to 3.
The output data has the same size as the input data. If the window half-size is smaller than or equal to 1, the input data is passed directly to the output data.
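A sliding-window mean filter of this kind can be sketched in a few lines of base R. The sketch below is not the package code: sketchMeanFilter is a hypothetical name, the window is assumed to span windowSize values on each side of the current position, and near the ends of the vector the window is simply truncated, which is one of several possible edge conventions.

```r
# Hypothetical helper, not the FBN implementation.
sketchMeanFilter <- function(inputData, windowSize = 3) {
  # As described above, a half-size of 1 or less leaves the data unchanged.
  if (windowSize <= 1) {
    return(inputData)
  }
  n <- length(inputData)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - windowSize)  # truncate the window at the left edge
    hi <- min(n, i + windowSize)  # truncate the window at the right edge
    mean(inputData[lo:hi])
  }, numeric(1))
}

spike <- c(0, 0, 0, 7, 0, 0, 0)
smoothed <- sketchMeanFilter(spike, windowSize = 3)
# The central window covers all 7 values, so the spike of 7 is averaged to 1.
print(smoothed[4])
```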
Adrian Andronache [email protected]
Luca Agnelli [email protected]
x <- meanFilter(c(0, 0, 0, 1, 1, 1, 0, 0, 1, 0))
x <- meanFilter(c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), windowSize = 5)
1-dimensional median filter of the inputData with a specified windowSize.
medianFilter(inputData, windowSize)
medianFilter(inputData = NULL, windowSize = 3)
inputData |
The vector of values to be filtered |
windowSize |
The half-size of the filtering window (default |
Classical implementation of a median filter, using a sliding window. The half-size of the sliding window defaults to 3.
The output data has the same size as the input data. If the window half-size is smaller than or equal to 1, the input data is passed directly to the output data.
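To see what a sliding-window median does to isolated spikes, base R's stats::runmed can serve as a stand-in. It is not the package function and may treat the ends of the vector differently. The sketch assumes that a half-size w corresponds to a total window of k = 2 * w + 1 values; the variable names are illustrative.

```r
halfSize <- 2
# Two isolated spikes, kept away from the ends of the vector.
noisy <- c(0, 0, 0, 9, 0, 0, 0, 9, 0, 0, 0)

# runmed() takes the total (odd) window width; endrule = "keep" leaves the
# first and last halfSize values untouched.
filtered <- runmed(noisy, k = 2 * halfSize + 1, endrule = "keep")

# Each 5-value window contains at most one spike, so every median is 0.
print(as.numeric(filtered))
```

A mean filter over the same windows would instead spread each spike over its neighbours, which is the usual reason to prefer the median when the noise consists of isolated outliers.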
Adrian Andronache [email protected]
Luca Agnelli [email protected]
x <- medianFilter(c(0, 0, 0, 1, 1, 1, 0, 0, 1, 0))
x <- medianFilter(c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), windowSize = 5)