Using BioTile

Binding Intensity Only Tile array analysis, or "BioTile", is an algorithm written in Perl for identifying differentially enriched regions (DERs) in tiling array data.
BioTile requires both the data file and the Annotation file to be correctly formatted before it will run. Follow the steps below to use BioTile:

Step 1.) Properly format your data file

Data should be tab delimited and adhere to the following column order:

Data Columns:
1.) chromosome
2.) chromosomal coordinate*
3.) ID column
4 and on.) data (one column per sample)
Note: * Chromosomal coordinates must be in ascending order within each chromosome, but the chromosomes themselves need not be in order.

Example:


CHR	START	UniqueID	case1	case2	case3	case4	case5	con1	con2....
chr1	3260443	chr1-3260443	5.756197581	6.073657606	5.072443511	4.566772967	4.887691227	5.361445566	4.908736039
chr1	3260484	chr1-3260484	10.37899959	2.812171903	7.484823045	3.51284835	2.914004916	9.009255134	4.744176783
chr1	3260518	chr1-3260518	8.330929208	5.948986357	7.055280556	7.780053389	8.27809947	7.947379971	7.709037061
chr1	3260549	chr1-3260549	5.321461034	5.410797637	4.457120359	5.090445555	5.276087083	3.718002551	4.473440208
chr1	3260581	chr1-3260581	9.844482722	9.524443421	10.08292572	9.659773611	10.43945409	9.989464018	9.223844286
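
The formatting rules above can be checked before running BioTile. The following is a minimal sketch in Python, not part of BioTile itself: the function name check_data_file and the has_header flag are hypothetical, and it assumes the file starts with a header line as in the example. It reports rows with the wrong number of tab-delimited columns and coordinates that decrease within a chromosome.

```python
def check_data_file(handle, has_header=True):
    """Return a list of formatting problems found in a BioTile-style data file."""
    problems = []
    last_pos = {}   # chromosome -> last coordinate seen for that chromosome
    n_cols = None   # column count taken from the first non-blank line
    for line_no, line in enumerate(handle, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if fields == [""]:
            continue  # skip blank lines
        if n_cols is None:
            n_cols = len(fields)
            if n_cols < 4:
                problems.append(
                    f"line {line_no}: expected at least 4 tab-delimited "
                    f"columns, found {n_cols}"
                )
                return problems
            if has_header:
                continue  # assumption: the first line holds column names
        if len(fields) != n_cols:
            problems.append(
                f"line {line_no}: {len(fields)} columns, expected {n_cols}"
            )
            continue
        chrom, pos_text = fields[0], fields[1]
        try:
            pos = int(pos_text)
        except ValueError:
            problems.append(f"line {line_no}: coordinate {pos_text!r} is not an integer")
            continue
        # Coordinates only need to ascend within each chromosome.
        if chrom in last_pos and pos < last_pos[chrom]:
            problems.append(f"line {line_no}: {chrom} coordinate {pos} is not ascending")
        last_pos[chrom] = pos
    return problems
```

To use it, open the data file and pass the handle, for example check_data_file(open("Example_Data_Set.txt")). An empty list means no problems were found by these particular checks.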

Step 2.) Properly format your Annotation file

The layout of the Annotation file must be kept as provided, and the sample table must contain one row (with its order and diag values) for each data column in the data file.

1.) Datafile: Enter the name of your data file. The data file should be in the same working directory as BioTile.pl.
2.) Outfile: Enter the name of the output file where the results will be saved.
3.) Minimum Probes/DMR: Enter the minimum number of probes required to call a DMR. The lowest allowed value is 3, which is recommended
	for finding small differences.
4.) Iterations for P value: Enter the number of iterations used to generate the P value. The default and recommended number is 1000.
	Changing this value strongly affects processing time.
5.) Spacing between probes (bp): Enter the spacing between probes, in base pairs, on the microarray platform used to generate the data.
	This value ensures that no large gaps are included in identified DMRs.
6.) Adjust for Covariates: Specify y/n to adjust for covariates prior to DMR identification and statistical testing.
7.) Independent Variable Continuous: Specify y/n to indicate whether the independent variable in the "diag" column is continuous and should
	be tested with a linear model. Specifying "y" with a binary classifier returns the same results as "n" but may increase processing time.
8.) Specify ID, independent variable (diag), order, and any optional covariates in tab delimited table format below as per the example:

Example:


##Please Enter Specifics Below
Datafile=	Example_Data_Set.txt
Outfile=	Analysis_Output.txt
Minimum Probes/DMR=	3
Iterations for P value=	1000
Spacing between probes (bp)=	35
Adjust for Covariates=	n
Independent Variable Continuous=	n
ID	diag	order	Covariates 1	Covariates2	Covariates3	Covariates4
1	1	1	1
2	1	2	1
3	1	3	2
4	1	4	2
5	1	5	1
6	2	6	2
7	2	7	1
8	2	8	2
9	2	9	2
10	2	10	1
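
Before running BioTile it can help to confirm that the sample table has one row per data column. The following is a minimal sketch in Python, not part of BioTile: the function names parse_annotation and count_data_columns are hypothetical, and the parsing rules (lines starting with "#" are comments, settings are "name=" followed by a tab and a value, and the first remaining line is the table header) are assumptions inferred from the example above.

```python
def parse_annotation(text):
    """Split an annotation file into its settings and its sample table."""
    settings = {}
    header = None
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no data
        fields = [f.strip() for f in line.split("\t")]
        if header is None and fields[0].endswith("="):
            # Setting line such as "Outfile=<TAB>Analysis_Output.txt"
            value = fields[1] if len(fields) > 1 else ""
            settings[fields[0][:-1]] = value
        elif header is None:
            header = fields  # "ID", "diag", "order", covariate names
        else:
            # zip stops at the shorter sequence, so unused covariate
            # columns in the header are simply left out of the row.
            samples.append(dict(zip(header, fields)))
    return settings, samples


def count_data_columns(data_header_line):
    """Number of data columns, given the first line of the data file."""
    # The first three columns are chromosome, coordinate, and ID.
    return len(data_header_line.rstrip("\r\n").split("\t")) - 3
```

Comparing len(samples) with count_data_columns(first_line_of_data_file) shows whether the two files agree. For the example above, the table has 10 rows, so the data file should have 10 data columns.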

Step 3.) Run BioTile

From the command line interface, call: perl BioTile.pl


Note: The example dataset provided contains the top 5000 loci, with values for 5 randomly selected cases and controls, from the simulated dataset interrogated in the published paper.

BioTile will return the following columns:

1.) The chromosome of the identified DMR
2.) The start position of the identified DMR
3.) The end position of the identified DMR
4.) The number of probes in the identified DMR
5.) The mean effect size of the identified DMR *
6.) The maximum effect size within the identified DMR
7.) The position of the probe with the maximum effect size
8.) The p value obtained from permutation testing
9.) The Q statistic of the meta-analysis for the identified DMR. If several DMRs have a P value of 0,
	 those with higher Q statistics most likely represent larger differences over longer DMRs.

*Note: The mean effect size is reported as the samples coded 1 in the independent variable minus the samples coded 2.
	 If continuous was specified for the independent variable, the mean effect is the mean slope
	 of the linear models across probes at the identified DMR.
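
As an illustration of the sign convention only, the following Python sketch shows one plausible reading of the binary case: for each probe, take the mean of the samples coded 1 minus the mean of the samples coded 2, then average those differences across the probes in the region. This is an assumption for illustration; BioTile's actual computation may differ, and the function name mean_effect and the numbers in the usage note are made up.

```python
from statistics import mean


def mean_effect(probe_rows, diag):
    """Average over probes of (mean of group 1) - (mean of group 2).

    probe_rows: one list of sample values per probe in the region.
    diag: group code (1 or 2) for each sample, in column order.
    """
    diffs = []
    for values in probe_rows:
        group1 = [v for v, d in zip(values, diag) if d == 1]
        group2 = [v for v, d in zip(values, diag) if d == 2]
        diffs.append(mean(group1) - mean(group2))
    return mean(diffs)
```

With two probes, values [2.0, 4.0, 1.0, 1.0] and [5.0, 5.0, 4.0, 2.0], and diag codes [1, 1, 2, 2], both per-probe differences are 2.0. A positive value therefore means the samples coded 1 are higher than those coded 2 under this reading.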

Example Output:


Chr	Start	End	#Probes	Mean Effect (1-2)	Max Effect	Max Pos	P value	Q statistic
chr1	3260655	3260722	3	0.206738228866666	0.3090028726	3260655	0.889110889110889	0.307765637851882
chr1	3658926	3658992	3	0.115758363	0.1658440818	3658926	0.942057942057942	0.116155659174058
chr1	3662743	3662820	3	0.387891732133333	0.8852050682	3662820	0.891108891108891	0.273096675378496
chr1	3662968	3663088	4	0.38327091275	0.9043733208	3663047	0.137862137862138	6.14582222894925
chr1	3665310	3665418	4	0.50777318565	1.132457159	3665382	0.378621378621379	3.39035486298906
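
The output can be post-processed to rank regions by P value, breaking ties by the Q statistic as suggested in the column description. The following is a minimal sketch in Python, not part of BioTile: it assumes the output file is tab-delimited with one header row and the nine columns in the order listed above, and the function name rank_regions and the alpha cutoff are hypothetical.

```python
import csv


def rank_regions(handle, alpha=0.05):
    """Return regions with P value <= alpha, best first."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    regions = []
    for row in reader:
        row = [field.strip() for field in row]
        if len(row) < 9:
            continue  # skip blank or incomplete lines
        regions.append({
            "chr": row[0],
            "start": int(row[1]),
            "end": int(row[2]),
            "probes": int(row[3]),
            "mean_effect": float(row[4]),
            "max_effect": float(row[5]),
            "max_pos": int(row[6]),
            "p": float(row[7]),
            "q": float(row[8]),
        })
    kept = [r for r in regions if r["p"] <= alpha]
    # Lowest P value first; among equal P values, highest Q statistic first.
    kept.sort(key=lambda r: (r["p"], -r["q"]))
    return kept
```

To use it, pass an open handle on the output file, for example rank_regions(open("Analysis_Output.txt")). The cutoff of 0.05 is only a placeholder; choose a threshold appropriate to your analysis.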

Note: After a few minutes, BioTile also generates a file called "Estimated Run Time" containing an estimate of the time remaining until the program completes.