
Takes paths to an eigenstrat trio (ind, snp and geno files) and returns the pairwise mismatch rate (PMR) for all pairs of individuals on a thinned set of SNPs. Options include setting the thinning distance, subsetting by population name, and filtering out SNPs at which deamination is possible.

Usage

processEigenstrat(
  indfile,
  genofile,
  snpfile,
  filter_length = NULL,
  pop_pattern = NULL,
  filter_deam = FALSE,
  outfile = NULL,
  chromosomes = NULL,
  verbose = TRUE
)

Arguments

indfile

path to eigenstrat ind file.

genofile

path to eigenstrat geno file.

snpfile

path to eigenstrat snp file.

filter_length

the minimum distance between sites to be compared (to reduce the effect of LD).

pop_pattern

a character vector of population names used to filter the ind file if only some populations are to be compared.

filter_deam

a TRUE/FALSE indicating whether C->T and G->A sites should be ignored.

outfile

(OPTIONAL) a path and filename to which the output of the function is saved as a TSV. If NULL, no backup is saved and a tibble is returned.

chromosomes

the chromosome(s) on which to filter the data.

verbose

a TRUE/FALSE controlling whether messages are printed to the console.
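To make filter_length and filter_deam more concrete, here is a minimal base R sketch of what thinning and deamination filtering could look like on a toy SNP table. It is not BREADR's implementation. The helpers drop_deam() and thin_sites() are hypothetical, the table values are invented, and the sketch assumes a single chromosome with positions sorted in increasing order, a greedy thinning rule, and that C/T and G/A allele pairs are dropped in either orientation.

```r
# Toy SNP table laid out like the six columns of an eigenstrat .snp file
# (SNP id, chromosome, genetic position, physical position, ref, alt).
# All values are invented for illustration.
snps <- data.frame(
  snp  = paste0("rs", 1:6),
  chr  = 1,
  gpos = 0,
  pos  = c(1000, 50000, 120000, 130000, 180000, 400000),
  ref  = c("A", "C", "G", "T", "C", "A"),
  alt  = c("C", "T", "A", "G", "G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop sites whose two alleles are C/T or G/A
# (assumption: either orientation counts as a possible deamination site).
drop_deam <- function(d) {
  is_ct <- (d$ref == "C" & d$alt == "T") | (d$ref == "T" & d$alt == "C")
  is_ga <- (d$ref == "G" & d$alt == "A") | (d$ref == "A" & d$alt == "G")
  d[!(is_ct | is_ga), , drop = FALSE]
}

# Hypothetical helper: greedy thinning. Keep a site only if it lies at
# least min_dist away from the previously kept site.
thin_sites <- function(pos, min_dist) {
  keep <- logical(length(pos))
  last <- -Inf
  for (i in seq_along(pos)) {
    if (pos[i] - last >= min_dist) {
      keep[i] <- TRUE
      last <- pos[i]
    }
  }
  keep
}

filtered <- drop_deam(snps)
thinned  <- filtered[thin_sites(filtered$pos, 1e5), , drop = FALSE]
print(thinned$snp)
```

Larger values of min_dist keep fewer sites, trading the number of SNPs compared per pair for weaker linkage between the retained sites.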

Value

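out_tibble: A tibble containing four columns:

pair: the pair of individuals compared.

nsnps: the number of SNPs compared for the pair.

mismatch: the number of compared SNPs at which the pair mismatch.

pmr: the pairwise mismatch rate (mismatch / nsnps; NA when nsnps is 0).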

Examples

# Use internal files to the package as an example
indfile <- system.file("extdata", "example.ind.txt", package = "BREADR")
genofile <- system.file("extdata", "example.geno.txt", package = "BREADR")
snpfile <- system.file("extdata", "example.snp.txt", package = "BREADR")
processEigenstrat(
  indfile, genofile, snpfile,
  filter_length = 1e5,
  pop_pattern = NULL,
  filter_deam = FALSE
)
#> Reading in SNP data.
#> Analysing chromosomes:
#> 1
#> Starting to read in genotype data.
#> 
#> Complete.
#> 
#> Starting to compare genotypes and calculate PMR.
#> 
#> Complete.
#> 
#> # A tibble: 15 × 4
#>    pair        nsnps mismatch    pmr
#>    <chr>       <dbl>    <dbl>  <dbl>
#>  1 Ind1 - Ind2     2        1  0.5  
#>  2 Ind1 - Ind3    14        2  0.143
#>  3 Ind1 - Ind4    14        3  0.214
#>  4 Ind1 - Ind5     1        0  0    
#>  5 Ind1 - Ind6     4        0  0    
#>  6 Ind2 - Ind3    13        3  0.231
#>  7 Ind2 - Ind4    11        2  0.182
#>  8 Ind2 - Ind5     0        0 NA    
#>  9 Ind2 - Ind6     4        2  0.5  
#> 10 Ind3 - Ind4    35        5  0.143
#> 11 Ind3 - Ind5    10        0  0    
#> 12 Ind3 - Ind6    19        2  0.105
#> 13 Ind4 - Ind5    10        1  0.1  
#> 14 Ind4 - Ind6    20        7  0.35 
#> 15 Ind5 - Ind6     2        1  0.5
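
Pairs with very few compared SNPs (for example Ind1 - Ind5, with a single SNP) give unstable rates, so it can help to screen the result before interpreting it. The sketch below uses a plain data frame with a few rows copied from the output above rather than calling the package. The cutoff min_snps is a hypothetical threshold chosen for illustration, and the pmr formula is inferred from the printed values, which are consistent with mismatch / nsnps.

```r
# A few rows copied from the example output (pair, nsnps, mismatch).
pmr_tab <- data.frame(
  pair     = c("Ind1 - Ind2", "Ind2 - Ind5", "Ind3 - Ind4", "Ind4 - Ind6"),
  nsnps    = c(2, 0, 35, 20),
  mismatch = c(1, 0, 5, 7),
  stringsAsFactors = FALSE
)

# Recompute the rate. 0 / 0 would give NaN in R, so pairs with no
# compared SNPs are set to NA explicitly.
pmr_tab$pmr <- ifelse(
  pmr_tab$nsnps > 0,
  pmr_tab$mismatch / pmr_tab$nsnps,
  NA_real_
)

# Hypothetical cutoff: keep only pairs with at least min_snps compared sites.
min_snps <- 10
reliable <- pmr_tab[pmr_tab$nsnps >= min_snps, , drop = FALSE]
print(reliable)
```

With these rows, only Ind3 - Ind4 and Ind4 - Ind6 pass the cutoff. An appropriate threshold for real data depends on coverage and is not specified here.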