Documentation

SAMPLER v0.1

Copyright (c) 2012 Dario Ghersi


Thank you for your interest in SAMPLER. More information about the methodology can be found in Ghersi D. and Singh M., "Disentangling Function from Topology to Infer the Network Properties of Disease Genes", submitted.

If you use the program, please cite the paper.

The program is free software distributed under the GNU General Public License (GPL) (see http://www.gnu.org/copyleft/gpl.html).




1. INSTALLING THE PROGRAM


SAMPLER has been tested on Linux and Mac platforms, but it should compile virtually everywhere. Executable files for i386 Macintosh and PC/Linux platforms are provided in the bin directory.

The program depends on the GNU Scientific Library, which must be installed first if one wants to compile the program. In Ubuntu Linux, it should suffice to install the following two packages:

- libgsl0-dev
- libgsl0ldbl

Alternatively, the library is available at http://www.gnu.org/software/gsl.


To compile SAMPLER:

> gcc -O2 -o sampler sampler.c -lgsl -lgslcblas -lm




2. INPUT


Example of use:

> sampler -d target_dist.txt -a annotations.txt -g control_genes.txt -s 620 -m 1E5 -t 1 -n 100 -o samples.txt [-c exp_coefficient]


The program requires the following input parameters:


target_dist.txt

A text file with the target distribution, organized as follows:


GO:0022613 3

GO:0007268 30

GO:0000165 13

...


with the first field (at most 20 characters) specifying the name of the GO term and the second field containing its desired frequency in the samples.
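
The format above can be checked before running SAMPLER with a short script. The following is a minimal sketch in Python, not part of SAMPLER itself; it assumes that the fields are separated by whitespace and that the frequency is an integer, and the function name parse_target_dist is hypothetical.

```python
import io


def parse_target_dist(handle):
    """Read 'GO_term frequency' pairs, one per line, into a dict.

    Assumptions (not confirmed by the SAMPLER documentation):
    whitespace-separated fields and integer frequencies.
    """
    target = {}
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        term, freq = fields
        if len(term) > 20:
            # the documentation limits term names to 20 characters
            raise ValueError(f"line {line_no}: term name exceeds 20 characters")
        target[term] = int(freq)
    return target


# Example with the three lines shown above
example = io.StringIO("GO:0022613 3\nGO:0007268 30\nGO:0000165 13\n")
target = parse_target_dist(example)
print(target)
```

To check a real file, pass an open file handle, e.g. parse_target_dist(open("target_dist.txt")).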


annotations.txt

A text file with the annotations for the genes, arranged as follows:


ENSG00000137574 GO:0022613 GO:0000398 GO:0034660

ENSG00000161970 GO:0022613 GO:0034660 GO:0048610 GO:0019058

ENSG00000138442 GO:0022613 GO:0034660 GO:0008283

...


with the first field containing the gene identifier, and the remaining fields containing the annotations.
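
As an illustration of this layout, the annotations can be loaded into a mapping from gene identifier to its set of GO terms. This is a minimal Python sketch, not part of SAMPLER; it assumes whitespace-separated fields, and the function name parse_annotations is hypothetical.

```python
import io


def parse_annotations(handle):
    """Map each gene identifier to the set of GO terms on its line.

    Assumption (not confirmed by the SAMPLER documentation):
    fields are separated by whitespace.
    """
    annotations = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gene, terms = fields[0], fields[1:]
        # merge terms if the same gene appears on more than one line
        annotations.setdefault(gene, set()).update(terms)
    return annotations


# Example with the three lines shown above
example = io.StringIO(
    "ENSG00000137574 GO:0022613 GO:0000398 GO:0034660\n"
    "ENSG00000161970 GO:0022613 GO:0034660 GO:0048610 GO:0019058\n"
    "ENSG00000138442 GO:0022613 GO:0034660 GO:0008283\n"
)
annotations = parse_annotations(example)
for gene, terms in sorted(annotations.items()):
    print(gene, sorted(terms))
```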


control_genes.txt

A text file with the genes to sample from.


size_sample

The desired sample size.


max_iter

The maximum number of iterations (scientific notation is allowed, e.g. 1E5 steps).


tolerance

The tolerated squared Euclidean distance between the distribution in the sample and the target distribution.
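
To make the tolerance concrete, the sketch below computes a squared Euclidean distance between the GO term counts of a sample and a target distribution. It is written in Python and is only an illustration: it assumes that the distance is taken over raw counts of the terms listed in the target distribution, which this documentation does not state, and the function names term_counts and squared_distance are hypothetical.

```python
from collections import Counter


def term_counts(sample, annotations, target):
    """Count how often each target GO term occurs among the sampled genes."""
    counts = Counter()
    for gene in sample:
        for term in annotations.get(gene, ()):
            if term in target:
                counts[term] += 1
    return counts


def squared_distance(counts, target):
    """Sum of squared differences between sample counts and target frequencies.

    Assumption: raw counts are compared; SAMPLER may normalise differently.
    """
    return sum((counts.get(term, 0) - freq) ** 2 for term, freq in target.items())


# Toy data: three genes, each annotated with both target terms
annotations = {
    "ENSG00000137574": {"GO:0022613", "GO:0000398", "GO:0034660"},
    "ENSG00000161970": {"GO:0022613", "GO:0034660", "GO:0048610"},
    "ENSG00000138442": {"GO:0022613", "GO:0034660", "GO:0008283"},
}
target = {"GO:0022613": 3, "GO:0034660": 2}
sample = list(annotations)

counts = term_counts(sample, annotations, target)
distance = squared_distance(counts, target)
print(distance)  # (3 - 3)^2 + (3 - 2)^2 = 1
```

Under these assumptions, a tolerance of 1 would accept this sample, while a tolerance of 0 would not.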


num_samples

The desired number of samples.


output

The name of the output file where the samples will be stored.


exp_coefficient

This is an optional parameter. If supplied, the sampling algorithm uses a Metropolis-Hastings criterion to accept the moves. The 'exp_coefficient' parameter controls how likely it is that a "bad" move will be accepted (the larger the value, the lower the probability). Please consult the Supplementary Material of the paper for more information.
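
The effect of this parameter can be illustrated with a generic Metropolis-style acceptance rule. The Python sketch below is an assumption, not the rule implemented in SAMPLER (which is described in the Supplementary Material): it accepts any move that does not increase the distance, and accepts a "bad" move with probability exp(-exp_coefficient * increase). The function names are hypothetical.

```python
import math
import random


def acceptance_probability(old_dist, new_dist, exp_coefficient):
    """Probability of accepting a move from old_dist to new_dist.

    Assumed form: 1 for moves that do not increase the distance,
    exp(-exp_coefficient * increase) otherwise.
    """
    if new_dist <= old_dist:
        return 1.0
    return math.exp(-exp_coefficient * (new_dist - old_dist))


def accept_move(old_dist, new_dist, exp_coefficient, rng=random):
    """Draw a uniform random number and decide whether to accept the move."""
    return rng.random() < acceptance_probability(old_dist, new_dist, exp_coefficient)


# A larger coefficient makes the same "bad" move less likely to be accepted
for coeff in (0.5, 1.0, 2.0):
    print(coeff, acceptance_probability(4.0, 5.0, coeff))
```

With this form, a very large coefficient approaches a purely greedy search, whereas a small one lets the sampler escape local minima more easily.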




3. OUTPUT


The program creates a text file ('samples.txt' in the example above) where each line contains one functionally constrained sample. In addition, the program writes to standard output the final squared Euclidean distance between the distribution in the sample and the target distribution, and the number of steps.

