Getting Started
Installation
qcf can be installed by cloning the code from GitHub using the following steps:
git clone https://github.com/pblischak/QCF.git # 1. Clone the repo from GitHub
cd QCF/ # 2. cd into the QCF/ folder
make # 3. compile the qcf executable
make test # 4. test that the executable works
sudo make install # 5. copy executable to /usr/local/bin
The sudo make install step also copies all of the files in the scripts/ folder to /usr/local/bin.
Stable versions of QCF are also available on the Releases page.
Input Files
Phylip Files
Sequence data for each gene should be in its own file in Phylip format. The setup should be the same as if you were planning to run RAxML on each gene individually.
Example:
16 500
sp1_1 AGTACAAGGTAGACAGTAGACG...
sp1_2 AGTACAAGGTAGACAGTAGACG...
sp2_1 AGTACAAGGTAGACAGTAGACG...
.
.
.
spN_3 AGTACAAGGTAGACAGTAGACG...
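A quick sanity check before an analysis is to confirm that the header line of each Phylip file (number of sequences, then number of sites) matches its contents. The sketch below is not part of QCF; check_phylip is a hypothetical helper, and it assumes the simple layout shown above, with one name and one sequence per line.

```python
def check_phylip(text):
    """Check a sequential Phylip string and return (n_taxa, n_sites).

    Hypothetical helper, not part of QCF. Assumes a header line followed by
    one "name sequence" pair per line, as in the example above.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split())

    records = [ln.split(None, 1) for ln in lines[1:]]
    if len(records) != n_taxa:
        raise ValueError(
            f"header says {n_taxa} sequences, found {len(records)}"
        )

    for rec in records:
        if len(rec) != 2:
            raise ValueError(f"missing sequence for '{rec[0]}'")
        name, seq = rec
        seq = "".join(seq.split())  # drop any whitespace inside the sequence
        if len(seq) != n_sites:
            raise ValueError(
                f"'{name}' has {len(seq)} sites, header says {n_sites}"
            )
    return n_taxa, n_sites
```

To check a file, read it first, for example check_phylip(open("gene1.phy").read()). A ValueError points to the first haplotype whose length disagrees with the header.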
Gene List File
The gene list file is a simple text file that lists the name of each Phylip file to be included in an analysis, one per line.
Example:
gene1.phy
gene2.phy
gene3.phy
.
.
.
geneL.phy
If this file and the gene sequence files are not in the same directory, add
the relevant path to each Phylip file name so that the program can still
find the files (e.g., path/to/geneL.phy).
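With many genes, the list can be generated rather than typed by hand. The following is a minimal sketch, not a QCF utility: write_gene_list is a hypothetical helper, and the .phy suffix and the example paths are assumptions. It writes each file name together with its directory, so the list also works when it is stored somewhere other than alongside the sequence files.

```python
from pathlib import Path


def write_gene_list(phylip_dir, list_path, suffix=".phy"):
    """Write one Phylip file path per line and return the paths written.

    Hypothetical helper, not part of QCF. Paths include phylip_dir, so
    they are resolved relative to wherever phylip_dir is resolved.
    """
    files = sorted(Path(phylip_dir).glob(f"*{suffix}"))
    if not files:
        raise FileNotFoundError(f"no *{suffix} files in {phylip_dir}")

    with open(list_path, "w") as out:
        for f in files:
            out.write(f"{f}\n")
    return [str(f) for f in files]
```

For example, write_gene_list("path/to", "genes.txt") would produce lines of the form path/to/gene1.phy.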
Map File
The mapping file maps haplotypes to sampled taxa.
The easiest way to do this is to sequentially number the haplotypes
for each gene (e.g., SpeciesName_1, SpeciesName_2, etc.).
Genes are treated as independent, so they can reuse the same
haplotype names. Also, not all genes need to have all haplotypes.
For each taxon, start with its name, followed by a colon (:), then the
names of the haplotypes that are present in the Phylip files containing the
sequence data, each separated by a comma (,). There should be no spaces.
This format is the same as the one used by ASTRAL.
Example:
sp1:sp1_1,sp1_2,sp1_3
sp2:sp2_1,sp2_2
.
.
.
spN:spN_1,spN_2,spN_3,spN_4
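If the haplotypes follow the SpeciesName_1, SpeciesName_2 convention described above, the map file can be derived from the haplotype names themselves. The sketch below assumes that convention (the taxon is everything before the last underscore); build_map and format_map are hypothetical helpers, not part of QCF.

```python
def build_map(haplotype_names):
    """Group haplotype names by taxon, keeping first-seen order.

    Hypothetical helper, not part of QCF. Assumes names look like
    "SpeciesName_1", so the taxon is the text before the last underscore.
    """
    mapping = {}
    for hap in haplotype_names:
        taxon, sep, _ = hap.rpartition("_")
        if not sep or not taxon:
            raise ValueError(f"cannot infer taxon from '{hap}'")
        haps = mapping.setdefault(taxon, [])
        if hap not in haps:  # genes may reuse the same haplotype names
            haps.append(hap)
    return mapping


def format_map(mapping):
    """Render the mapping as 'taxon:hap1,hap2' lines with no spaces."""
    return "".join(
        f"{taxon}:{','.join(haps)}\n" for taxon, haps in mapping.items()
    )
```

Collecting the haplotype names from every Phylip file and passing them through build_map gives the union across genes, which suits the case where not all genes contain all haplotypes.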
Output Files
By default, qcf writes the estimated quartet concordance factors to a file
called out-qcf.CFs.csv. If you also print the raw quartet scores, the program
writes a second file called out-raw.csv. This file contains the raw scores
for all haplotypes for each species quartet (it is not intended to be human
readable). The out-raw.csv file can be passed to the qcf_boot.py Python script
to conduct bootstrap resampling for confidence interval estimation.