Title: | Comparative Analyses of Phylogenetics and Evolution in R |
---|---|
Description: | Functions for performing phylogenetic comparative analyses. |
Authors: | David Orme [aut, cre] , Rob Freckleton [aut], Gavin Thomas [aut], Thomas Petzoldt [aut], Susanne Fritz [aut], Nick Isaac [aut], Will Pearse [aut] |
Maintainer: | David Orme <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.3 |
Built: | 2024-10-16 06:18:24 UTC |
Source: | https://github.com/davidorme/caper |
The caper package provides a set of functions to conduct comparative analyses using both independent contrasts and phylogenetic generalised least squares methods. It also provides functions to combine phylogeny objects with data frames into simple comparative datasets and some utility functions for manipulating phylogenies.
In addition to linear models correcting for phylogeny, the package also provides functions to test tree imbalance, calculate various measures of phylogenetic diversity and simulate phylogenies and traits.
David Orme Maintainer: David Orme <[email protected]>
These functions provide ANOVA tables and model comparison using ANOVA and AIC, along with standard model diagnostic plots and accessor functions for phylogenetic independent contrast objects.
## S3 method for class 'caic' anova(object, ...) ## S3 method for class 'caiclist' anova(object, ..., scale=0, test='F') ## S3 method for class 'caic' logLik(object, ...) ## S3 method for class 'caic' predict(object, ...) ## S3 method for class 'caic' residuals(object, ...) ## S3 method for class 'caic' coef(object, ...) ## S3 method for class 'caic' plot(x, ...)
## S3 method for class 'caic' anova(object, ...) ## S3 method for class 'caiclist' anova(object, ..., scale=0, test='F') ## S3 method for class 'caic' logLik(object, ...) ## S3 method for class 'caic' predict(object, ...) ## S3 method for class 'caic' residuals(object, ...) ## S3 method for class 'caic' coef(object, ...) ## S3 method for class 'caic' plot(x, ...)
object |
An object of class 'caic'. |
scale |
A character string specifying the test statistic to be used. Can be one of "F", "Chisq" or "Cp", with partial matching allowed, or NULL for no test. |
test |
numeric. An estimate of the noise variance sigma^2. If zero this will be estimated from the largest model considered. |
x |
An object of class 'caic'. |
... |
Further argument to be passed to methods. |
The 'anova' method provides access to single anova tables for a model and to comparison of lists of models. The 'logLik' method provides access to the log likelihood of the 'caic' model and hence to AIC comparison of models.
The 'plot' method uses the standard set of model diagnostic plots for linear models. It is also wise to check the evolutionary assumptions of independent contrast models using the 'caic' specific diagnostic plots. The 'predict' and 'residuals' functions provide access to these parts of the 'caic' object.
David Orme
crunch
, brunch
,macrocaic
,caic.diagnostics
data(shorebird) shorebird.data$lgEgg.Mass <- log(shorebird.data$Egg.Mass) shorebird.data$lgM.Mass <- log(shorebird.data$M.Mass) shorebird.data$lgF.Mass <- log(shorebird.data$F.Mass) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) cMod1 <- crunch(lgEgg.Mass ~ lgM.Mass * lgF.Mass, data=shorebird) cMod2 <- crunch(lgEgg.Mass ~ lgM.Mass + lgF.Mass, data=shorebird) cMod3 <- crunch(lgEgg.Mass ~ lgM.Mass , data=shorebird) anova(cMod1, cMod2, cMod3) AIC(cMod1, cMod2, cMod3) plot(cMod3)
data(shorebird) shorebird.data$lgEgg.Mass <- log(shorebird.data$Egg.Mass) shorebird.data$lgM.Mass <- log(shorebird.data$M.Mass) shorebird.data$lgF.Mass <- log(shorebird.data$F.Mass) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) cMod1 <- crunch(lgEgg.Mass ~ lgM.Mass * lgF.Mass, data=shorebird) cMod2 <- crunch(lgEgg.Mass ~ lgM.Mass + lgF.Mass, data=shorebird) cMod3 <- crunch(lgEgg.Mass ~ lgM.Mass , data=shorebird) anova(cMod1, cMod2, cMod3) AIC(cMod1, cMod2, cMod3) plot(cMod3)
The 'anova' function creates ANOVA tables for a 'pgls' models using sequential sums of squares.
## S3 method for class 'pgls' anova(object, ...) ## S3 method for class 'pglslist' anova(object, ..., scale = 0, test = "F")
## S3 method for class 'pgls' anova(object, ...) ## S3 method for class 'pglslist' anova(object, ..., scale = 0, test = "F")
object |
A 'pgls' model object. |
... |
Additional 'pgls' models. |
scale |
A character string specifying the test statistic to be used. Can be one of "F", "Chisq" or "Cp", with partial matching allowed, or NULL for no test. |
test |
numeric. An estimate of the noise variance sigma^2. If zero this will be estimated from the largest model considered. |
The sequential sums of squares are calculated by refitting the model in the order of the terms of the formula and so can take a little time to calculate. Branch length transformations are held at the values of the initial object. The 'logLik.pgls' provides a simple accessor function that allows the use of AIC model comparisons. Note that the generic AIC methods do no checking to ensure that sensible models are being compared.
A table of class 'anova' and 'data.frame' that employs the generic plot methods for 'anova' tables.
The functions build heavily on the generic methods 'anova.lm' and 'anova.lmlist'.
Rob Freckleton, David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) anova(mod1) mod2 <- pgls(log(Egg.Mass) ~ log(M.Mass) + log(F.Mass), shorebird) mod3 <- pgls(log(Egg.Mass) ~ log(M.Mass) , shorebird) mod4 <- pgls(log(Egg.Mass) ~ 1, shorebird) anova(mod1, mod2, mod3, mod4) AIC(mod1, mod2, mod3, mod4)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) anova(mod1) mod2 <- pgls(log(Egg.Mass) ~ log(M.Mass) + log(F.Mass), shorebird) mod3 <- pgls(log(Egg.Mass) ~ log(M.Mass) , shorebird) mod4 <- pgls(log(Egg.Mass) ~ 1, shorebird) anova(mod1, mod2, mod3, mod4) AIC(mod1, mod2, mod3, mod4)
The dataset contains a molecular phylogeny of 249 species of British birds and a data frame containing information on the conservation status of 181 of those species. The dataset is taken from Thomas (2008)
data(BritishBirds)
data(BritishBirds)
The data frame contains 26 variables:
binomial common_name Red_list Amber_list Green_list Red_amber_list IUCN Red_HD Amber_HD Red_amber_HD Red_list_BDp amb_BDp Red_amb_BDp Red_list_BDr Red_list_WDp amb_BDr amb_WDMp amb_spec amb_BR amb_BL amb_WL amb_BI amb_WI Red_amb_BDr pop_size range_size
Thomas, G. H. (2008). Phylogenetic distributions of british birds of conservation concern. Proceedings of the Royal Society B-Biological Sciences, 275(1647):2077-2083.
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial)
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial)
Calculate a linear model using the brunch algorithm.
brunch(formula, data, phy, names.col, stand.contr = TRUE, robust = Inf, ref.var=NULL, node.depth=NULL, equal.branch.length=FALSE)
brunch(formula, data, phy, names.col, stand.contr = TRUE, robust = Inf, ref.var=NULL, node.depth=NULL, equal.branch.length=FALSE)
formula |
A model formula. |
data |
An 'comparative.data' object. Alternatively, a data frame. |
phy |
An object of class 'phylo', required when data is not a 'comparative.data' object. |
names.col |
A name specifying the column in 'data' that matches rows to tips in 'phy', required when data is not a 'comparative.data' object. |
stand.contr |
A logical flag indicating whether or not to standardize contrasts |
.
robust |
A threshold value of studentized residuals to exclude from the model. |
ref.var |
A reference variable present in the model that is used to specify the direction of calculation of contrasts. If null, this is assumed to be the first explanatory variable. |
node.depth |
A positive integer greater than 1 used to restrict the model to contrasts with a node depth less than or equal to the specified depth. Tips have a depth of 1. |
equal.branch.length |
If set to 'TRUE' then all branch lengths are set to 2. |
This function implements the 'brunch' algorithm for modelling the relationship between variables that are phylogenetically non-independent. This method was described and previously implemented in the Mac Classic computer programs CAIC, written by Purvis and Rambaut (1995) and updated by Nick Isaac and Paul-Michael Agapow.
The 'brunch' algorithm calculates contrasts for models that include binary categorical variables. Contrasts are identified and calculated for all variables in the model for a set of nodes where each side can be unequivocally attributed to one or other of the categories. Unlike 'crunch', nested contrasts are not calculated and each row of data at the tips is used only once. This follows Burt (1989): contrasts whose paths do not meet or cross at any point will be phylogenetically independent.
Factors with more than two levels are supported but *must* be ordered to allow sensible contrasts to be drawn. In addition, there is no single best compromise set of contrasts with non-binary factors and implementations may differ in the set chosen.
The user provides a comparative dataset. The formula specifies the model to be fitted and contrasts are calculated in those variables. The specified reference variable is used to ensure that contrasts for multivariate models are calculated in a consistent direction at each node. The function brunch
acts as a data preparation wrapper for the function contrCalc
, which is not intended to be directly called by users. Missing data can be present in the explanatory variables: the algorithm makes use of the complete data available at each node as was the case with CAIC.
Polytomies - more detail here The Mac Classic program CAIC used 1 for both 'Brunch' and 'Crunch' analyses and this the default.
A object of class 'caic'.
David Orme
Purvis, A. and Rambaut, A. (1995) Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Computer Appl. Biosciences 11, 247-251.
Burt, A. (1989). Comparative methods using phylogenetically independent contrasts. Oxford Surveys in Evolutionary Biology, 6:33-53.
caic-class
for 'caic' object structure and methods.
data(perissodactyla) perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial) brunchMod <- brunch(log.female.wt ~ Territoriality, data=perisso) summary(brunchMod) # plot the contrasts brunchTab <- caic.table(brunchMod) plot(log.female.wt ~ Territoriality, brunchTab) # for the actual model diagnostics par(mfrow=c(3,1)) caic.diagnostics(brunchMod)
data(perissodactyla) perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial) brunchMod <- brunch(log.female.wt ~ Territoriality, data=perisso) summary(brunchMod) # plot the contrasts brunchTab <- caic.table(brunchMod) plot(log.female.wt ~ Territoriality, brunchTab) # for the actual model diagnostics par(mfrow=c(3,1)) caic.diagnostics(brunchMod)
The functions 'crunch', 'brunch', 'macrocaic' and 'piclm' all return an object containing an independent contrast model. The structure of the object and the available methods are described here.
A 'caic' object is a list containing the following:
A list of the following:
A list containing matrices of the contrasts in the response
variable (contr$
response) and explanatory variables
(contr$
explanatory).
A list containing matrices of the nodal values in the
response variable (contr$
response) and explanatory variables
(contr$
explanatory).
A numeric vector of the expected variance for each contrast.
A vector showing the number of nodes descending from each internal node
A vector showing the maximum number of nodes between each internal node and the tips of the phylogeny (including both the node in question and the tip and hence always >=2)
A logical vector showing which internal nodes on the tree have valid contrasts, given the available data and any user constraints.
A 'comparative.data' object containing the phylogeny used to calculate contrasts and the original data.
An 'lm' object containing a regression model through the origin for the calculated contrast
In addition, the object may have the following attributes:
One of 'crunch', 'brunch' or 'piclm'.
Either 'RRD' or 'PDI' if the response contrasts are calculated as species richness contrasts using macrocaic
A logical value showing whether the contrasts in the object have been standardized.
These functions are a set of diagnostic tools to assess whether a particular contrast model is adequate.
caic.table(caicObj, validNodes=TRUE, nodalValues=FALSE, ultrametric.tol=0.0001, CAIC.codes=FALSE, style="CAIC") caic.diagnostics(caicObj, which.terms=NULL, which.tests=c("NV","SD","AGE"), plot=TRUE, outlier=3, ultrametric.tol=0.0001, plot.signif=plot, alpha=0.05, cex.id=0.7, ...) ## S3 method for class 'caic.diagnostics' print(x, ...) caic.robust(caicObj, robust=3) caic.label(phy, charset=NULL, action="insert", style="CAIC", tips=FALSE)
caic.table(caicObj, validNodes=TRUE, nodalValues=FALSE, ultrametric.tol=0.0001, CAIC.codes=FALSE, style="CAIC") caic.diagnostics(caicObj, which.terms=NULL, which.tests=c("NV","SD","AGE"), plot=TRUE, outlier=3, ultrametric.tol=0.0001, plot.signif=plot, alpha=0.05, cex.id=0.7, ...) ## S3 method for class 'caic.diagnostics' print(x, ...) caic.robust(caicObj, robust=3) caic.label(phy, charset=NULL, action="insert", style="CAIC", tips=FALSE)
caicObj |
An object of class 'caic' |
validNodes |
A logical indicating whether to return only the nodes at which valid contrasts have been drawn or all nodes in the phylogeny. |
nodalValues |
A logical indicating whether to include the estimated nodal values in the contrasts table. |
ultrametric.tol |
A value in branch length units indicating how large a departure from ultrametric path lengths is permitted before calculating node ages is blocked. |
CAIC.codes |
Should CAIC style node labels be added to the contrast table? |
style |
Either 'CAIC', which is the default, or 'RLE' for run-length encoded codes. |
which.terms |
A character vector of the terms in the linear model for which diagnostic checks should be made. The default is to use all terms. |
which.tests |
A character vector of the tests to conduct. Any of 'NV', 'SD' and 'Age' may be selected and the default is to use all three. |
plot |
A logical indicating whether or not to create plots of the diagnostic tests. |
outlier |
A value indicating the size of absolute studentized residuals above which a contrast will be highlighted in diagnostic plots |
robust |
A value indicating the size of absolute studentized residuals above which a contrast will be removed from the contrast model. |
plot.signif |
By default, significant relationships are indicated on the plots by showing the model line. |
alpha |
This value sets the significance at which model predictions are added to the diagnostic plots. |
cex.id |
Adjust the size of labels added to outlier points. |
x |
An object of class 'caic diagnostics' to be printed out |
... |
Generic arguments to |
phy |
A 'phylo' object. |
charset |
A vector of characters to be used to construct CAIC codes. |
action |
One of 'replace', 'append' or 'insert'. |
tips |
A logical indicating whether to modify the tip labels of the phylogeny in addition to the internal node labels. |
The most general function (caic.table
) extracts a data frame from a 'caic' object that contains the contrasts, expected variance ('contrVar'), node depth, number of descendant lineages and studentized residuals. It can also optionally include the nodal values estimated at internal nodes and CAIC style node labels. This data frame is very similar to the output files generated by CAIC and MacroCAIC. By default, only the valid nodes contributing contrasts to the model are shown but other internal nodes can also be included ('validNodes=FALSE').
The caic.diagnostics
function carries out a set of regression tests on the absolute values of contrasts in a model. These include regression against estimates of the nodal values ('NV), standard deviation at each node ('SD') and, where the underlying phylogeny is ultrametric, the log age at each node ('AGE'). Significant regressions indicate problems with the distribution of the contrast values. The user can select which of these plots are to be generated and also which terms in the contrasts model will be plotted. The function can also plot these relationships and show the slope of problematic tests (slope significance < alpha).
Plots from the caic.diagnostics
function will plot outliers in red and label these points. Outliers are identified as points with absolute studentized residuals greater than the cutoff specified in 'outlier'. The function caic.robust
is a simple method to obtain a refitted 'caic' object model from which these outlying contrasts have been excluded.
The caic.label
function provides a method to label nodes (and optionally tips) with CAIC phylogeny codes. These are simple alphabetic sequences that indicate the branching structure of the tree from the root and are unique for each node. With deeply nested trees, these codes can be very long and so the option 'RLE' is provided to reduce the length of codes for plotting onto trees: for example, the code AABAAAC would be converted to 2AB3AC. If action
is 'replace', then the phylogeny tip and node labels are replaced with the CAIC codes. If action
is 'append', then the codes are appended onto the end of existing labels, with the internal node numbers used if no node labels exist. Finally, if action
is 'insert', then a named character vector called 'edge.caic.code' is inserted into the 'phylo' object list.
caic.table
A data frame of contrasts and other nodal values.
caic.diagnostics
An array of slope coefficients for each test (rows) and for each term (depth).
caic.robust
A 'caic' object with a model omitting outliers.
caic.labels
A 'phylo' object with modified labels.
David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(log(Egg.Mass) ~ log(M.Mass) + log(F.Mass), data=shorebird) caic.diagnostics(crunchMod)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(log(Egg.Mass) ~ log(M.Mass) + log(F.Mass), data=shorebird) caic.diagnostics(crunchMod)
These data files and model objects contain input datasets for benchmarking the caper package against other implementations and the results returned by those other implementations. For more details, see the benchmark vignette for caper.
vignette()
Takes a phylogeny in the 'ape' package format and converts it into a binary matrix showing which tips (matrix columns) subtend from each node in the phylogeny (matrix rows). This is a useful format for quickly calculating branch length information for subsets of the phylogeny.
clade.matrix(phy)
clade.matrix(phy)
phy |
A object of class 'phylo' |
The clade matrix shows the tips from a phylogeny that subtend from each internal and external node. Each tip is represented as column showing the nodes of which it is a member and hence each row shows the tips that are members of a given node. Dropping columns gives a quick and easy way to find out which edges are retained in a particular subset of the tree and this structure is used for quickly calculating branch lengths calculations or clade statistics.
A list of class 'clade.matrix' containing the following components:
clade.matrix |
A binary m x n matrix, where m is the total number of nodes in the phylogeny and n is the number of tips. An element is 1 if tip $n_i$ subtends from a node $m_j$. |
edge.length |
A numeric vector of length m showing the edge length leading to each node in the phylogeny and named with the node number. |
tip.label |
A character vector of length n giving the labels assigned to the tips of the phylogeny. |
edge |
The edge matrix from the original phylogeny. |
David Orme
data(perissodactyla) clade.matrix(perissodactyla.tree)
data(perissodactyla) clade.matrix(perissodactyla.tree)
Obtains a vector of the tips subtending from either one node or all nodes in a phylogeny.
clade.members(x, phy, tip.labels = FALSE, include.nodes=FALSE) clade.members.list(phy, tips = FALSE, tip.labels = FALSE, include.nodes=FALSE)
clade.members(x, phy, tip.labels = FALSE, include.nodes=FALSE) clade.members.list(phy, tips = FALSE, tip.labels = FALSE, include.nodes=FALSE)
x |
A integer identifying the node for which a list of tips is required. |
phy |
An object of class 'phylo'. |
tips |
A logical indicating whether to include external node membership in the list. |
tip.labels |
A logical flag indicating whether to return the node numbers of the tips or their tip labels. |
include.nodes |
A logical flag indicating whether to return the node number of descendent internal nodes |
The function clade.members.list
runs clade.members
over each node in the phylogeny, possibly including the external nodes as indicated by the tips argument, and returns a list of vectors showing the members of the clade defined by each node.
A numeric vector of external node (i.e. tip) numbers or a character vector of tip labels for a single internal node or, for clade.members.list
, a list of such vector for all nodes in the phylogeny. If include.nodes
is TRUE
then clade.members
returns a list of length two containing a vector of the descendent tips and a vector of the descendent internal nodes - clade.members.list
then contains a list of such lists.
David Orme, Lynsey McInnes
data(perissodactyla) # use comparative.data to add node labels perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial, na.omit=FALSE) plot(perisso$phy, show.node.label=TRUE) clade.members(22, perisso$phy, tip.labels=TRUE) clade.members.list(perisso$phy, tip.labels=FALSE)
data(perissodactyla) # use comparative.data to add node labels perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial, na.omit=FALSE) plot(perisso$phy, show.node.label=TRUE) clade.members(22, perisso$phy, tip.labels=TRUE) clade.members.list(perisso$phy, tip.labels=FALSE)
A simple tool to combine phylogenies with datasets and ensure consistent structure and ordering for use in functions.
comparative.data(phy, data, names.col, vcv=FALSE, vcv.dim=2, na.omit=TRUE, force.root=FALSE, warn.dropped=FALSE, scope=NULL) ## S3 method for class 'comparative.data' print(x, ...) ## S3 method for class 'comparative.data' na.omit(object, scope=NULL, ...) ## S3 method for class 'comparative.data' subset(x, subset, select, ...) ## S3 method for class 'comparative.data' reorder(x, order, ...) ## S3 method for class 'comparative.data' x[i, j] as.comparative.data(x, ...) caicStyleArgs(phy, data, names.col, warn.dropped=FALSE)
comparative.data(phy, data, names.col, vcv=FALSE, vcv.dim=2, na.omit=TRUE, force.root=FALSE, warn.dropped=FALSE, scope=NULL) ## S3 method for class 'comparative.data' print(x, ...) ## S3 method for class 'comparative.data' na.omit(object, scope=NULL, ...) ## S3 method for class 'comparative.data' subset(x, subset, select, ...) ## S3 method for class 'comparative.data' reorder(x, order, ...) ## S3 method for class 'comparative.data' x[i, j] as.comparative.data(x, ...) caicStyleArgs(phy, data, names.col, warn.dropped=FALSE)
data |
A data frame containing variables that can be attributed to the taxa at the tips of a phylogeny. |
phy |
A phylogeny (class 'phylo') to be matched to the data above. |
names.col |
The name of a column in the provided data frame that will be used to match data rows to phylogeny tips. |
vcv |
A logical value indicating whether to include a variance covariance array representing the phylogeny within the comparative dataset. |
vcv.dim |
Either 2 (a standard VCV matrix) or 3 (an array retaining the individual branches contributing to the standard values). The array form is of use for optimising some branch length transformations. |
na.omit |
A logical value indicating whether to reduce the comparative dataset to those tips for which all selected variables are complete. Note that some functions cannot handle missing data and will return an error. |
force.root |
Many functions consider a basal polytomy to indicate an unrooted tree. Using force.root=TRUE will set an arbitrary root edge below this polytomy. |
warn.dropped |
A logical value indicating whether to warn the user when data or tips are dropped in creating the comparative data object. |
scope |
A model formula, used to indicate which variables to consider when omitting row containing NA values. |
x |
An object of class 'comparative.data'. |
object |
An object of class 'comparative.data'. |
subset |
A logical expression indicating rows of data to keep: missing values are taken as false. |
select |
An expression, indicating columns to select from the data frame. |
order |
One of 'cladewise' or 'pruningwise'. See |
i , j
|
Indices specifying tips or data columns to extract. See details. |
... |
Further arguments to functions. |
The function matches rows in a data frame to tips on a phylogeny and ensures correct ordering of the data with respect to the tips. It also can add a variance covariance representation of the phylogeny. Mismatched rows and tips are removed and the taxon labels of these are stored in the 'dropped' slot of the 'comparative.data' object. The 'print' method displays a brief summary of the dataset contents and the names of the original 'phylo' and 'data.frame' objects. If any rows or tips were dropped, 'print' will also show a venn diagram of the data shared and dropped from each source. Node labels are preserved but must be unique - unlabelled nodes will be assigned numeric codes.
The 'na.omit' and 'subset' methods provide simple ways to clean up and extract parts of the comparative dataset. In particular, 'subset' acts exclusively with the data component of the object and, like subset on a data frame, expects the subset argument to produce a logical vector of data rows to include. The 'reorder' method is use to restructure all the components with the 'comparative.data' object into either pruningwise or cladewise order. This uses code from the 'ape' library: see reorder.phylo
.
The '[' method allows subsets to be taken of the data. There are no replace methods ('[<-'). If only one index is specified (e.g. x[2]), then this is interpreted as extracting data columns from the object. Otherwise (e.g. x[2,], x[1,1]), the first index will specify tips to extract and the second index will specify columns. Indices for tips are permitted to be numeric, logical or character vectors or empty (missing) or NULL. Numeric values are coerced to integer as by as.integer (and hence truncated towards zero). Character vectors will be matched to the names of the object (or for matrices/arrays, the dimnames): see 'Character indices' below for further details.
The function 'caicStyleArgs' handles turning 'phy', 'data' and 'names.col' arguments into a 'comparative.data' object when they are provided separately to a function. This argument structure was used in older versions of many functions.
All of these functions are in part a substitute for the considerably more sophisticated handling of such data in the package 'phylobase', which will be integrated into later releases.
A list of class 'comparative.data':
phy |
An object of class 'phylo' |
data |
A data frame of matched data |
data.name |
The original object name of the data |
phy.name |
The original object name of the phylogeny |
dropped |
A list of taxon names dropped from the dataset:
|
And optionally:
vcv |
A variance covariance matrix of the phylogeny |
vcv.dim |
The dimension of the VCV - 2 for a standard VCV matrix and 3 for an expanded array retaining individual branch lengths |
David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, 'Species') print(shorebird) subset(shorebird, subset=Mat.syst == 'MO') sandpipers <- grep('Calidris', shorebird$phy$tip.label) shorebird[-sandpipers, ] sandpipers <- grep('Calidris', shorebird$phy$tip.label, value=TRUE) shorebird[sandpipers, ] shorebird[] shorebird[,] shorebird[2:3] shorebird[, 2:3] shorebird[1:15, ] shorebird[1:15, 2:3]
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, 'Species') print(shorebird) subset(shorebird, subset=Mat.syst == 'MO') sandpipers <- grep('Calidris', shorebird$phy$tip.label) shorebird[-sandpipers, ] sandpipers <- grep('Calidris', shorebird$phy$tip.label, value=TRUE) shorebird[sandpipers, ] shorebird[] shorebird[,] shorebird[2:3] shorebird[, 2:3] shorebird[1:15, ] shorebird[1:15, 2:3]
Calculate a linear model using the crunch algorithm.
crunch(formula, data, phy, names.col, stand.contr = TRUE, robust=Inf, ref.var=NULL, node.depth=NULL, polytomy.brlen=0, equal.branch.length=FALSE, factor.action="abort")
crunch(formula, data, phy, names.col, stand.contr = TRUE, robust=Inf, ref.var=NULL, node.depth=NULL, polytomy.brlen=0, equal.branch.length=FALSE, factor.action="abort")
formula |
A model formula. |
data |
An 'comparative.data' object. Alternatively, a data frame. |
phy |
An object of class 'phylo', required when data is not a 'comparative.data' object. |
names.col |
A name specifying the column in 'data' that matches rows to tips in 'phy', required when data is not a 'comparative.data' object. |
stand.contr |
A logical flag indicating whether or not to standardize contrasts |
.
robust |
A threshold value of studentized residuals to exclude from the model. |
ref.var |
A reference variable present in the model that is used to specify the direction of calculation of contrasts. If null, this is assumed to be the first explanatory variable. |
node.depth |
A positive integer greater than 1 used to restrict the model to contrasts with a node depth less than or equal to the specified depth. Tips have a depth of 1. |
polytomy.brlen |
The internal branch length used for calculating contrasts at a polytomy, following Pagel's (1992) method. |
equal.branch.length |
If set to 'TRUE' then all branch lengths are set to 2. |
factor.action |
One of "abort", "warn" or "allow", describing whether to stop if the formula contains a factor ("abort"), or continue after converting the factor to a numeric variable, either with ("warn") or without ("allow") a warning. |
This function implements the 'crunch' algorithm for modelling the relationship between variables that are phylogenetically non-independent. The method was first described by Felsenstein (1985) and subsequently extended to permit the use of phylogenies with polytomies by Pagel (1992). This method was previously implemented in the Mac Classic computer programs CAIC, written by Andy Purvis, Andy Rambaut (Purvis and Rambaut, 1995) and updated by Nick Isaac and Paul-Michael Agapow.
The user provides a comparative dataset. The formula specifies the model to be fitted and contrasts are calculated in those variables. The specified reference variable is used to ensure that contrasts for multivariate models are calculated in a consistent direction at each node. The function crunch()
acts as a data preparation wrapper for the function contrCalc()
, which is not intended to be directly called by users. Missing data can be present in the explanatory variables: the algorithm makes use of the complete data available at each node as was the case with CAIC.
The resulting table of contrasts is then used to fit the specified model - note that the intercept is automatically dropped from the model if present, following REF HERE.
Contrasts at polytomies are calculated following Pagel (1992). The descendants from the node are split into two groups based on whether they are above or below the group mean in the reference variable. If there is no variation in the reference variable, then a 1:(N-1) split is used. Weighted means in the variables are then calculated for each subgroup and a contrast is calculated between these values using an arbitrary internal branch length.
A object of class 'caic'.
At a polytomy, subtracting the internal branch length from the real branch lengths can lead to negative branch lengths. CAIC used a hard-coded internal branch length of 1 for calculating crunch contrasts at polytomies. From version 2.6.9, CAIC issued a warning if this lead to negative branch lengths but allowed contrast calculation to continue. In contrast, the implementation in crunch()
uses a default internal branch length (polytomy.brlen
) of 0 and also treats a negative branch length in a polytomy calculation as an error. In either case, contrast calculation on negative branch lengths is not a desirable outcome. Duplication of CAIC results therefore requires polytomy.brlen
to be set to 1 and an analyis cannot be duplicated precisely if the phylogeny contains polytomies with descending branches shorter than 1. The method used by pic.lm
to handle polytomies avoids such problems.
David Orme
Felsenstein, J. (1985). Phylogenies and the comparative method. Am. Nat. 125, 1-15
Pagel, M. D. (1992). A method for the analysis of comparative data. J. theor. Biol. 156, 431-442.
Purvis, A. and Rambaut, A. (1995) Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Computer Appl. Biosciences 11, 247-251.
caic-class
for 'caic' object structure and methods.
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(Egg.Mass ~ F.Mass + M.Mass, data=shorebird) summary(crunchMod) # plot the contrasts crunchTab <- caic.table(crunchMod) plot(Egg.Mass ~ F.Mass, crunchTab) # for the actual model diagnostics par(mfrow=c(3,2)) caic.diagnostics(crunchMod)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(Egg.Mass ~ F.Mass + M.Mass, data=shorebird) summary(crunchMod) # plot the contrasts crunchTab <- caic.table(crunchMod) plot(Egg.Mass ~ F.Mass, crunchTab) # for the actual model diagnostics par(mfrow=c(3,2)) caic.diagnostics(crunchMod)
Fusco and Cronk (1995) described a method for testing the imbalance of phylogenetic trees based on looking at the distribution of I. I is calculated using the number of tips descending from each side of a bifurcating node using the formula I = (B-m)/(M-m) and is bounded between 0 (a perfectly balanced node) and 1 (maximum imbalance). B is the larger number of tips descending from each branch, M is the maximum size of this larger group (i.e. a 1 : (S-1) split, where S is the total number of descendent tips), and m is the minimum size of the larger group (ceiling of S/2). The method can cope with small proportions of polytomies in the phylogeny and these are not used in calculating balance statistics. It can also incorporate information about species richness at the tips of the phylogeny and can therefore be used to distinguish between an unbalanced topology and the unbalanced distribution of diversity at the tips of a phylogeny.
Purvis et al. (2002) demonstrated that I is not independent of the node size S, resulting in a bias to the expected median of 0.5. They proposed a modification (I') that corrects this to give a statistic with an expected median of 0.5 regardless of node size. The defaults in this function perform testing of imbalance using I', but it is also possible to use the original measure proposed by Fusco and Cronk (1995).
fusco.test(phy, data , names.col, rich, tipsAsSpecies=FALSE, randomise.Iprime=TRUE, reps=1000, conf.int=0.95) ## S3 method for class 'fusco' print(x, ...) ## S3 method for class 'fusco' summary(object, ...) ## S3 method for class 'fusco' plot(x, correction=TRUE, nBins=10, right=FALSE, I.prime=TRUE, plot=TRUE, ...)
fusco.test(phy, data , names.col, rich, tipsAsSpecies=FALSE, randomise.Iprime=TRUE, reps=1000, conf.int=0.95) ## S3 method for class 'fusco' print(x, ...) ## S3 method for class 'fusco' summary(object, ...) ## S3 method for class 'fusco' plot(x, correction=TRUE, nBins=10, right=FALSE, I.prime=TRUE, plot=TRUE, ...)
phy |
An object of class 'comparative.data' or of class 'phylo'. |
data |
A data frame containing species richness values. Not required if phy is a 'comparative.data' object. |
names.col |
A variable in |
rich |
A variable identifying species richness. |
tipsAsSpecies |
A logical value. If TRUE, a species richness column need not be specified and the tips will be treated as species. |
randomise.Iprime |
Use a randomization test on calculated I' to generate confidence intervals. |
reps |
Number of replicates to use in simulation or randomization. |
conf.int |
Width of confidence intervals required. |
x , object
|
An object of class 'fusco'. |
correction |
Apply the correction described in Appendix A of Fusco and Cronk (1995) to the histogram of nodal imbalance. |
nBins |
The number of bins to be used in the histogram of nodal imbalance. |
right |
Use right or left open intervals in plotting the distribution and calculating the correction |
I.prime |
Plot distribution of I' or I. |
plot |
If changed to FALSE, then the plot method does not plot the frequency histogram. Because the method invisibly returns a table of histogram bins along with the observed and corrected frequencies, this isn't as dim an option as it sounds. |
... |
Further arguments to generic methods |
I is calculated only at bifurcating nodes giving rise to more than 3 tips (or more than 3 species at the tips): nodes with three or fewer descendants have no variation in I and are not informative in assessing imbalance. The expected distribution of the nodal imbalance values between 0 and 1 is theoretically uniform under a Markov null model. However, the range of possible I values at a node is constrained by the number of descendent species. For example, for a node with 8 species, only the values 0, 0.33, 0.66, 1 are possible, corresponding to 4:4, 5:3, 6:2 and 7:1 splits (Fusco and Cronk, 1995). As node size increases, this departure from a uniform distribution decreases. The plot method incorporates a correction, described by Fusco and Cronk (1995), that uses the distribution of all possible splits at each node to characterize and correct for the departure from uniformity.
The randomization option generates confidence intervals around the mean I'.
The function fusco.test
produces an object of class 'fusco' containing:
observed |
A data frame of informative nodes showing nodal imbalance statistics. If the phylogeny has labelled nodes, then the node names are also returned in this data frame. |
median |
The median value of I. |
qd |
The quartile deviation of I. |
tipsAsSpecies |
A logical indicating whether the tips of the trees were treated as species or higher taxa. |
nInformative |
The number of informative nodes. |
nSpecies |
The number of species distributed across the tips. |
nTips |
The number of tips. |
reps |
The number of replicates used in randomization. |
conf.int |
The confidence levels used in randomization. |
If randomise.Iprime
is TRUE, or the user calls fusco.randomize
on a 'fusco' object, then the following are also present.
randomised |
A data frame of mean I' from the randomized observed values. |
rand.mean |
A vector of length 2 giving confidence intervals in mean I'. |
David Orme, Andy Purvis
Fusco, G. & Cronk, Q.C.B. (1995) A New Method for Evaluating the Shape of Large Phylogenies. J. theor. Biol. 175, 235-243
Purvis A., Katzourakis A. & Agapow, P-M (2002) Evaluating Phylogenetic Tree Shape: Two Modifications to Fusco & Cronk's Method. J. theor. Biol. 214, 93-103.
data(syrphidae) syrphidae <- comparative.data(phy=syrphidaeTree, dat=syrphidaeRich, names.col=genus) summary(fusco.test(syrphidae, rich=nSpp)) summary(fusco.test(syrphidae, tipsAsSpecies=TRUE)) plot(fusco.test(syrphidae, rich=nSpp))
data(syrphidae) syrphidae <- comparative.data(phy=syrphidaeTree, dat=syrphidaeRich, names.col=genus) summary(fusco.test(syrphidae, rich=nSpp)) summary(fusco.test(syrphidae, tipsAsSpecies=TRUE)) plot(fusco.test(syrphidae, rich=nSpp))
This dataset contains the phylogeny of bird families and species richness originally included with the FUSCO imbalance calculation programs.
data(fuscoData)
data(fuscoData)
The dataset provides a phylogeny (fuscoBirdTree
) and a data frame (fuscoBirdData
). The phylogeny is a 137 tip tree containing one polytomy and the dataframe provides tip labels and species richness for each bird family. This dataset was provided with the original DOS implementation of the test and the families were unlabelled in this original file.
fusco.test
This function provides a very general environment in which to simulate trees. The basic philosophy is that the user provides a series of expressions that define speciation rates, extinction rates and trait evolution. These expressions can make use of information about the internal state of the tree, allowing for very flexible definitions of rules for tree growth.
growTree(b = 1, d = 0, halt = 20, grain = 0.1, linObj = NULL, ct.start = NULL, ct.change = NULL, ct.var = NULL, dt.rates = NULL, inheritance = NULL, trace.events = FALSE, trace.cladesize = FALSE, output.lineages = FALSE, neg.rates = "abort", inf.rates = "abort", stall.time = 10, extend.proportion=0) ## S3 method for class 'growTree' as.comparative.data(x, ...)
growTree(b = 1, d = 0, halt = 20, grain = 0.1, linObj = NULL, ct.start = NULL, ct.change = NULL, ct.var = NULL, dt.rates = NULL, inheritance = NULL, trace.events = FALSE, trace.cladesize = FALSE, output.lineages = FALSE, neg.rates = "abort", inf.rates = "abort", stall.time = 10, extend.proportion=0) ## S3 method for class 'growTree' as.comparative.data(x, ...)
b |
A speciation rate. This can be a numeric constant, as in the default, which specifies a single speciation rate for the simulation. Alternatively, this can be an expression, or a list of expressions which define speciation rate in terms of the properties of the tree. See details for discussion of those properties. |
d |
An extinction rate, described as above. |
halt |
A rule use to halt the simulation. The default is the number of tips in the simulation, specified as a single integer, but this can also be an expression or list of expressions on the properties of the tree. The simulation is halted when any of these expressions becomes true. |
grain |
Where rates depend on time or trait values, it becomes necessary to allow time to pass discretely in order to re-evaluate waiting times under the changing values. This sets the amount of time that is allowed to pass before re-evaluation. If rates do not depend on such changing parameters, it is sensible to set this to infinity - this will ensure that the flow of the simulation is not slowed by checking. |
linObj |
This can be used to supply an existing simulation object, which will then continue to grow under the provided rules. This allows the user to simulate trees with different sets of rules operating in different epochs. The function |
ct.start |
A numeric vector specifying the starting values for continuous traits. If unnamed these will be sequentially named as 'ct1', 'ct2' etc. The names of traits may be used in expressions governing tree growth rules. |
ct.change |
A numeric vector describing the mean change per unit time in continuous trait values, used to simulate a directional bias in character evolution. If ct.change is NULL, then this is assumed to be zero for each species. |
ct.var |
Either a vector of variances for each trait or a square matrix describing the variances and covariances amongst the continuous traits. If this is NULL, then uncorrelated traits with a variance of 1 are assumed. |
dt.rates |
A list of matrices describing the rate of transition between discrete character traits. Each matrix defines a trait and, as with ct.start, the list names are used to identify the traits in the simulation and default to 'dt1', 'dt2', etc. The dimnames of the matrix are used to identify the states of the trait and default to 'st1', 'st2', etc. The matrix need not be symmetrical: the rates are defined from the states in the columns to the states in the rows, hence the diagonal should probably be zero. Each trait is assumed to start the simulation in the first state in the matrix. |
inheritance |
A list of rules that are applied after a speciation and can be used to modify trait values for the descendent lineages. The names of the list specify which traits are to be modified and, for each trait specified, should return a vector of length two which replaces the existing values. |
trace.events |
A logical flag, indicating whether or not report speciation, extinction and discrete character evolution events. |
trace.cladesize |
A positive integer giving an increment size for the simulation to report clade size if required. |
output.lineages |
A logical flag indicating whether to return the internal lineages object. |
neg.rates |
One of 'abort', 'warn' or 'quiet', defining the behaviour when a rate calculation produces a negative number. With 'warn' and 'quiet', negative rates are set to zero and the simulation continues. |
inf.rates |
One of 'abort', 'warn' or 'quiet', as for |
stall.time |
If the all rates within the simulation are zero then only this length of time is allowed to pass before the simulation exits with a 'stalled' status. If |
extend.proportion |
This option allows the simulation to continue running for a given proportion of the time to the next speciation. This makes sense when growing a clade to a given number of extant taxa; with the default setting of zero, the resulting tree ends at a bifurcation with zero branch lengths and this option allows the tree to grow (and taxa to go extinct and traits to evolve). |
x |
A lineage table output from growTree |
... |
Further arguments to as. |
The main idea behind this function (which is still in development) is to provide a flexible framework for simulating tree growth and trait evolution. The user provides expressions for the main arguments (b
, d
and halt
) which act as rules defining speciation and extinction and the ending of the simulation. These can be simple constants, but can also make use of the properties of the environment of the evolving tree. This includes both lineage specific properties (as described in the lineages
section of the returned value) or properties of the clade as a whole (as described in the clade
section of the returned value). For example, a extinction rate might increase with lineage age (d=0.01*lin.age
) or a speciation rate might decrease according to a density dependent process (b=1 - (nExtantTip/500)
). Halt expressions will typically use clade properties (halt=clade.age >= 5
or nTips >= 50
) but could use lineage properties, for example stopping when a trait value hits a certain value (halt=any(ct1 >= 10)
). It is not permitted to use '==' in a halt function of clade.age because it will allow the simulation to run away if the actual value steps over the test value.
Discrete traits are defined using a matrices of rates for transitions between states for each trait. At present, these are fixed for the duration of a simulation epoch and cannot be set as expressions of tree variables.
Continuous trait evolution currently employs a simple Brownian model, given a starting value and variance per unit time. The traits can have defined co-variance (the simulation uses mvrnorm at present) and can also have a defined mean change, allowing for a directional walk in the trait values. At present, it is not possible for the trait variance to vary according to the internal state of the tree; continuous characters retain the same variance and covariance for the whole of the simulation epoch.
Whilst none of the halt
rules are TRUE, then the function evaluates the birth, death and discrete trait rates and converts these to waiting times using random variates from a exponential distribution with the calculated rates. These competing waiting times are compared both to each other and the grain
of the simulation, the shortest waiting time is found and the relevant event is then triggered. The winning event is identified in the character vector winnerName
in order to allow inheritance rules to differentiate events.
Depending on the value of output.phylo
, either an object of class 'phylo' or an object of class 'growTree' with the following structure:
lineages |
A data frame with a row for each lineage in the tree. Each row identifies the |
clade |
A list containing: |
rules |
A list reporting the birth ( |
ct.set |
If continuous characters were simulated, a list of the |
dt.rates |
If discrete characters were simulated, a list containing the |
David Orme, drawing heavily on discussions with Paul-Michael Agapow.
## see the package vignette for a much fuller discussion of examples. # A basic 200 tip tree, output as a 'comparative.data' object tree <- growTree(halt=200, grain=Inf) plot(tree$phy) # A basic tree of age 4 time units, output as a 'comparative.data' object tree <- growTree(halt=expression(clade.age >= 4), grain=Inf) plot(tree$phy)
## see the package vignette for a much fuller discussion of examples. # A basic 200 tip tree, output as a 'comparative.data' object tree <- growTree(halt=200, grain=Inf) plot(tree$phy) # A basic tree of age 4 time units, output as a 'comparative.data' object tree <- growTree(halt=expression(clade.age >= 4), grain=Inf) plot(tree$phy)
This data set contains four species-level comparative datasets used in Isaac et al (2005)
data(IsaacEtAl)
data(IsaacEtAl)
The datafile contains species level phylogenies and accompanying data frames of nine variables for each of four mammalian orders (Primates, Carnivora, Chiroptera and Marsupialia). The data were published in supplementary material for Isaac et al. (2005) as CAIC format files and text data files and have been converted for use in 'caper'. The data files are incomplete, with some variables having little or no data for some orders.
The variables (all saved as natural log values) are:
Species richness at the tips - all are set to 1 for use in macrocaic
The average body mass in grams
Age at sexual maturity in months
Gestation length in days
Interbirth interval in months
The average number of offspring in a litter
Population density
Number of individuals in a typical group
Male mass /female mass
Male length / female length
Isaac, N., Jones, K., Gittleman, J., and Purvis, A. (2005). Correlates of species richness in mammals: Body size, life history, and ecology. American Naturalist, 165(5):600-607.
caic, pgls
data(IsaacEtAl) chiroptera <- comparative.data(chiroptera.tree, chiroptera.data, 'binomial', na.omit=FALSE) carnivora <- comparative.data(carnivora.tree, carnivora.data, 'binomial', na.omit=FALSE) primates <- comparative.data(primates.tree, primates.data, 'binomial', na.omit=FALSE) marsupialia <- comparative.data(marsupialia.tree, marsupialia.data, 'binomial', na.omit=FALSE)
data(IsaacEtAl) chiroptera <- comparative.data(chiroptera.tree, chiroptera.data, 'binomial', na.omit=FALSE) carnivora <- comparative.data(carnivora.tree, carnivora.data, 'binomial', na.omit=FALSE) primates <- comparative.data(primates.tree, primates.data, 'binomial', na.omit=FALSE) marsupialia <- comparative.data(marsupialia.tree, marsupialia.data, 'binomial', na.omit=FALSE)
Macroevolutionary hypotheses about correlates of species richness require testing in a phylogenetic framework in order to avoid phylogenetic autocorrelation. Independent contrasts as described by Felsenstein (1985) are appropriate for explanatory variables in such models but not for species richness as the response variable. This function implements two methods for calculating species richness constrasts described by Agapow and Isaac (2002) and originally implemented in the program MacroCAIC.
macrocaic(formula, data, phy, names.col, macroMethod = "RRD", stand.contr = TRUE, robust=Inf, ref.var = NULL, node.depth = NULL, macroMinSize = 3, equal.branch.length = FALSE)
macrocaic(formula, data, phy, names.col, macroMethod = "RRD", stand.contr = TRUE, robust=Inf, ref.var = NULL, node.depth = NULL, macroMinSize = 3, equal.branch.length = FALSE)
formula |
A formula describing a linear model predicting species richness. |
data |
A data frame containing the variables to be used in the model. |
phy |
An object of class 'phylo'. |
names.col |
A name identifying a column in |
macroMethod |
One of either "RRD" or "PDI" (see Details). |
stand.contr |
A logical flag indicating whether to standardize the contrasts. |
robust |
A threshold value of studentized residuals to exclude from the model. |
ref.var |
Identifies a predictor variable used for determining the direction of contrasts. |
node.depth |
A positive integer greater than 1 used to restrict the model to contrasts with a node depth less than or equal to the specified depth. Tips have a depth of 1. |
macroMinSize |
A positive integer giving the minimum species richness at a node for contrasts to be included in the model. |
equal.branch.length |
If set to 'TRUE' then all branch lengths are set to 2. |
The 'macrocaic' function fits a regression to the formula provided using 'crunch' contrasts for continuous explanatory variables and species richness contrasts for the response. The species richness contrasts are either the relative rate difference (RRD) or proportion dominance index (PDI):
RRD = ln(N_1/N_2)
PDI = (N_1/(N_1 + N_2))-0.5
The values and
are the species richness of the two daughter nodes and
is the species richness of the clade with the larger value of the reference variable. Species richness contrasts are not calculated at polytomies. Nodal values for species richness are calculated as the sum of the richness of the daughter nodes.
A object of class 'caic'.
David Orme
Felsenstein, J. (1985). Phylogenies and the comparative method. Am. Nat. 125, 1-15 Agapow, P.-M. and Isaac, N. J. B. (2002) MacroCAIC: correlates of species richness. Diversity & Distributions, 8, 41-43 Isaac, N., Agapow, P., Harvey, P., and Purvis, A. (2003). Phylogenetically nested com- parisons for testing correlates of species richness: A simulation study of continuous variables. Evolution, 57(1):18-26.
caic-class
for 'caic' object structure and methods.
data(IsaacEtAl) primates <- comparative.data(primates.tree, primates.data, binomial, na.omit=FALSE) primatesBodySize <- macrocaic(species.rich ~ body.mass, data=primates) summary(primatesBodySize)
data(IsaacEtAl) primates <- comparative.data(primates.tree, primates.data, binomial, na.omit=FALSE) primatesBodySize <- macrocaic(species.rich ~ body.mass, data=primates) summary(primatesBodySize)
These functions calculate various phylogenetic diversity measures for either a given set of nodes on a tree or for a randomly chosen set of nodes of a given size. The ed.calc function calculates a related species-level measurement of evolutionary distinctness.
pd.calc(cm, tip.subset = NULL, method = "TBL", root.edge=FALSE) pd.bootstrap(cm, ntips, reps = 1000, method = "TBL", tip.weights = NULL) ed.calc(cm, polytomy.cf=c("isaac","mooers","none"))
pd.calc(cm, tip.subset = NULL, method = "TBL", root.edge=FALSE) pd.bootstrap(cm, ntips, reps = 1000, method = "TBL", tip.weights = NULL) ed.calc(cm, polytomy.cf=c("isaac","mooers","none"))
cm |
A object of class 'clade matrix'. Alternatively an object of class 'phylo', which will be converted to a clade.matrix. |
tip.subset |
An optional vector identifying the subset of tips to use for PD calculations. If no tip.subset is provided the method is applied to the complete phylogeny [Hmm.. this might be undesirable]. Can either be a character vector, in which case the elements are matched against tip labels, or a vector of positive integers in the range 1 to the number of tips, in which case the tips with those numbers are used. |
method |
One of 'TBL', 'MST', 'UEH', 'SBL', defaulting to 'TBL'. See details. |
root.edge |
Logical indicating whether to include the root edge length in calculations, defaulting to FALSE. |
ntips |
A single integer giving the number of tips to be selected. |
reps |
The number of replicate values to calculate. |
tip.weights |
A numeric vector containing weights for all the tips in the phylogeny. Each element must be named in order to match weights to the tips. |
polytomy.cf |
Which correction factor to use for calculating ED at polytomies. One of 'isaac', 'mooers' or 'none'. |
There are five implemented PD measures:
The sum of all the edge lengths in the subtree given by the tip subset. This measure can be partitioned into the two next measures.
The sum of all edges in the subtree that are shared by more than one tip.
The sum of the edge lengths that give rise to only one tip in the subtree.
Length of tip branch lengths (TIPS)
Unlike UEH, this measure does not use the unique paths to each tips on the subtree and instead gives the sum of the unique branches leading to the tips on the complete tree.
The sum of the lengths of the edges for the smallest tree that links the subset tips, excluding any edges below the node of the most recent common ancestor.
These options are illustrated in the caper package vignette. The pd.calc function returns the PD value for a given set of tips, whereas the pd.bootstrap function returns a vector of PD values for randomly selected sets of tips of a given size.
The ed.calc function returns the evolutionary distinctness (ED) metric (Isaac et al, 2007) for the tips of a given phylogeny. The idea behind the ED measure is that the evolutionary history of each branch is shared equally between all tips descending from that branch. Each branch therefore has a per-tip values of the branch length divided by the number of descendants and the ED value for a tip is the sum of those per-tip contributions over the path to the root of the phylogeny. Polytomies inflate apparent ED since the branches of a properly resolved polytomy must be shorter than the branch lengths on the unresolved polytomy. The function provides two correction factors for this: 'isaac' uses a correction factor calibrated from simulations and 'mooers' uses empirical predictions from a pure birth model.
Both pd.calc and pd.bootstrap return a vector containing either a single value for the phylogenetic diversity of a given set of tips or a vector of length 'nrep' containing the pd values for a random set of tips of a given size. The method used is stored in the 'pd.method' attribute of the vector.
The ed.calc function returns a list containing:
A data frame of the ED contributions arising from each branch.
A data frame of the summed ED contributions for each species.
David Orme, Gavin Thomas, Nick Isaac
Faith, DP, Isaac, N. J. B., Turvey, S. T., Collen, B., Waterman, C., and Baillie, J. E. M. (2007). Mammals on the edge: Conservation priorities based on threat and phylogeny. Plos One, 2(3):e296
treeString <- paste('((((A:1,B:1):1.5,C:2.5):0.5,(D:0.6,E:0.6):2.4):0.5,', '((F:1.9,G:1.9):0.8,(H:1.6,I:1.6):1.1):0.8):0.2;' ,sep='') tre <- read.tree(text=treeString) clmat <- clade.matrix(tre) tips <- c("A","C","D","E","G","H") pd.calc(clmat, tip.subset=tips) pd.calc(clmat, tip.subset=c(1,3,4,5,7,8)) pd.calc(clmat, tip.subset=tips, root.edge=TRUE) pd.bootstrap(clmat, ntips=6, reps=1000, method='TBL') data(IsaacEtAl) primatesCM <- clade.matrix(primates.tree) primatesED <- ed.calc(primatesCM)
treeString <- paste('((((A:1,B:1):1.5,C:2.5):0.5,(D:0.6,E:0.6):2.4):0.5,', '((F:1.9,G:1.9):0.8,(H:1.6,I:1.6):1.1):0.8):0.2;' ,sep='') tre <- read.tree(text=treeString) clmat <- clade.matrix(tre) tips <- c("A","C","D","E","G","H") pd.calc(clmat, tip.subset=tips) pd.calc(clmat, tip.subset=c(1,3,4,5,7,8)) pd.calc(clmat, tip.subset=tips, root.edge=TRUE) pd.bootstrap(clmat, ntips=6, reps=1000, method='TBL') data(IsaacEtAl) primatesCM <- clade.matrix(primates.tree) primatesED <- ed.calc(primatesCM)
This is a comparative dataset on Perissodactyla taken from the examples include with the original CAIC program.
data(shorebird)
data(shorebird)
The datafile contains a phylogeny (perissodactyla.tree
) of 18 perissodactyl species as a 'phylo' object from the ape
library. The tip names are the binomial names of the species. The file also contains a data frame (perissodactyla.data
) of variables 5 variables for 13 of those species:
The species binomial name.
Log female weight
Log gestation length
Log neonatal weight
A factor indicating whether or not the species displays territorial behaviour.
The dataset is incomplete - it does not include data for each species in the phylogeny and contains missing values. See the examples for the behaviour of the 'comparative.data' function in handling missing data.
Purvis, A. and Rambaut, A. (1995). Comparative Analysis by Independent Contrasts (CAIC) User's Guide.
caic, pgls
data(perissodactyla) # default behaviour is to omit incomplete data rows (perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial)) # but this can be turned off (perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial, na.omit=FALSE)) na.omit(perisso)
data(perissodactyla) # default behaviour is to omit incomplete data rows (perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial)) # but this can be turned off (perisso <- comparative.data(perissodactyla.tree, perissodactyla.data, Binomial, na.omit=FALSE)) na.omit(perisso)
Fits a linear model, taking into account phylogenetic non-independence between data points. The strength and type of the phylogenetic signal in the data matrix can also be accounted for by adjusting branch length transformations (lambda, delta and kappa). These transformations can also be optimised to find the maximum likelihood transformation given the data and the model.
pgls(formula, data, lambda = 1.0, kappa = 1.0, delta= 1.0, param.CI = 0.95, control = list(fnscale=-1), bounds = NULL) pgls.likelihood(optimPar, fixedPar, y, x, V, optim.output=TRUE, names.optim=NULL) pgls.blenTransform(V, fixedPar)
pgls(formula, data, lambda = 1.0, kappa = 1.0, delta= 1.0, param.CI = 0.95, control = list(fnscale=-1), bounds = NULL) pgls.likelihood(optimPar, fixedPar, y, x, V, optim.output=TRUE, names.optim=NULL) pgls.blenTransform(V, fixedPar)
formula |
A model formula |
data |
A 'comparative.data' object containing the covariance matrix and data to be used in the model. |
lambda |
A value for the lambda transformation. |
kappa |
A value for the kappa transformation. |
delta |
A value for the delta transformation. |
param.CI |
A p value used to calculate confidence intervals. |
control |
A list of control parameters for the optim function. |
bounds |
A list of bounds to use for branch length transformations (see Details). |
optimPar |
A named vector of branch length parameters to be optimised to find the maximum likelihood value. |
fixedPar |
A named vector of fixed values for branch length parameters. |
y |
A column matrix of the model response. |
x |
The design matrix of the model. |
V |
A phylogenetic covariance matrix. |
optim.output |
A logical value. If true then 'pgls.likelihood' returns only the likelihood value for use in the 'optim' function. |
names.optim |
The name of a single parameter being optimised. This is only required for estimating parameter confidence intervals, where the function 'uniroot' strips names from vectors. |
This function fits a linear model controlling for the non-independence between cases resulting from phylogenetic structure in the data. The stucture of the phylogenetic signal can be controlled by altering the parameters lambda, delta and kappa (see the 'caper' vignette for details). The implementation of the method is currently as described in Freckleton et al (2002).
The branch length transformations can be optimised between bounds using maximum likelihood by setting the value for a transformation to 'ML'. The default bounds are: lambda = c(1e-6,1), kappa = c(1e-6,3) and delta=c(1e-6,3). These defaults may be overridden by passing a named list with new elements to the bounds argument - only the bounds to be changed need to be provided (e.g. bounds=list(lambda=c(0,3))).
The 'pgls.likelihood' and 'pgls.blenTransform' methods are not primarily intended to be called by users. The 'pgls.likelihood' function provides a general method to calculate the likelihood of a model, given the covariance matrix, response, design matrix and branch length parameters.
The 'pgls' function returns an object of class pgls
containing the following:
"na.action" "param.CI"
call |
The original call to the 'pgls' function |
model |
A summary of the fitted model containing: |
formula |
The model formula supplied. |
data |
The comparative data object provided. |
dname |
The name of the comparative data object. |
logLikY |
The log likelihood of the response variable given the model. |
RMS |
The residual mean square variance in the model. |
RSSQ |
The residual sum of squares from the model. |
NMS |
The null mean square variance for the model. |
NSSQ |
The null sum of squares for the response. |
aic |
The AIC score of the model |
aicc |
The AICc score of the model, correcting for the number of cases and parameters estimated |
n |
The number of rows of data used in fitting the model |
k |
The number of parameter estimates |
sterr |
The standard errors of the parameter estimates |
Vt |
The phylogenetic covariance matrix used in the model, with branch length transformations applied. |
fitted |
The predicted values |
residuals |
The non-phylogenetic residuals |
phyres |
The phylogenetic residuals |
x |
The design matrix of the model |
varNames |
The variables include in the model. |
y |
The response of the model. |
namey |
The name of the response variable. |
param |
A named numeric vector of length three giving the branch length transformations used in the model. |
mlVals |
A named logical vector of length three indicating which branch length values in 'param' are maximum likelihood estimates. |
bounds |
The bounds on branch length parameter estimates used in the model. |
param.CI |
A named list of length three giving confidence intervals and the p values at the parameter bounds for optimised branch length transformations. Fixed parameters will have a NULL entry in this list. |
na.action |
A named vector identifying any rows of missing data excluded from the model. |
The model is fitted using a data frame reduced to complete row cases to eliminate missing values. In order to ensure that the models fitted using different subsets of the data are comparable, the whole data frame data
is reduced to complete cases. In the future, a scope argument may be provided to control this but at present the data frame should be reduced to only those variables used in the maximal model in order to avoid prevent redundant variables causing rows to be dropped unnecessarily.
Rob Freckleton; David Orme
R. P. Freckleton, P. H. Harvey, and M. Pagel. Phylogenetic analysis and comparative data: A test and review of evidence. American Naturalist, 160:712-726, 2002.
pgls.profile
, anova.pgls
, summary.pgls
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird, lambda='ML') mod2 <- pgls(log(Egg.Mass) ~ log(M.Mass), data=shorebird, lambda='ML', delta='ML')
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird, lambda='ML') mod2 <- pgls(log(Egg.Mass) ~ log(M.Mass), data=shorebird, lambda='ML', delta='ML')
These are simple summary methods, accessor functions and summary and print methods for 'pgls' models.
## S3 method for class 'pgls' coef(object, ...) ## S3 method for class 'pgls' residuals(object, phylo = FALSE, ...) ## S3 method for class 'pgls' fitted(object, ...) ## S3 method for class 'pgls' predict(object, newdata, ...) ## S3 method for class 'pgls' summary(object, ...) ## S3 method for class 'pgls' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.pgls' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'pgls' nobs(object, ...)
## S3 method for class 'pgls' coef(object, ...) ## S3 method for class 'pgls' residuals(object, phylo = FALSE, ...) ## S3 method for class 'pgls' fitted(object, ...) ## S3 method for class 'pgls' predict(object, newdata, ...) ## S3 method for class 'pgls' summary(object, ...) ## S3 method for class 'pgls' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'summary.pgls' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'pgls' nobs(object, ...)
object |
An object of class 'pgls'. |
x |
An object of class 'pgls'. |
phylo |
Return phylogenetically corrected residuals or ordinary residuals (see details). |
newdata |
Alternative data for predicting from 'pgls' models. |
digits |
Number of digits to show in summary methods. |
... |
Further arguments to methods. |
Phylogenetically corrected residuals from 'pgls' models [TODO].
Note that the r^2 values reported by summary.pgls
have a specific interpretation. pgls
fits the intercept-only model for the data using _exactly_ the same covariance matrix (phylogeny plugged through any branch length transformations) as the fitted model to get a null model. The r-squared and adjusted r-squared that are reported therefore hold the covariance matrix constant, so show percentage of variance explained between a null model and the actual model given that precise model of trait change.
The actual ML null model for the data (optimising the BL transformation independently) might be different from this - but then the r squared values confound change in explanatory power from changing the model parameters and from changing the trait model.
The 'summary' method returns an object of class 'summary.pgls' containing:
call |
The original function call creating the model. |
df |
A vector of the degrees of freedom used to estimate parameters and the residual degrees of freedom. |
sigma |
The square root of the estimated variance of the random error. |
residuals |
The phylogenetically corrected residuals. |
coefficients |
A table of model coefficient, standard errors and t values. |
param |
A vector of branch length parameters used in the model. |
mlVals |
A vector showing which branch length parameters have been optimised. |
param.CI |
A list of length three containing confidence intervals and p values on parameter bounds for each parameter. |
fstatistic |
A vector of the F value, numerator and denominator degrees of freedom for the model. |
r.squared |
The r^2 for the model. |
adj.r.squared |
The adjusted r^2 for the model. |
Rob Freckleton, David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) print(mod1) mod1.sum <- summary(mod1) print(mod1.sum)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) print(mod1) mod1.sum <- summary(mod1) print(mod1.sum)
These functions create likelihood profiles for branch length transformations in phylogenetic generalised least squares models and fit confidence intervals to estimated branch length parameters.
pgls.profile(pgls, which = c("lambda", "kappa", "delta"), N = 50, param.CI = NULL) pgls.confint(pgls, which=c('lambda','kappa','delta'), param.CI=0.95) ## S3 method for class 'pgls.profile' plot(x, ...)
pgls.profile(pgls, which = c("lambda", "kappa", "delta"), N = 50, param.CI = NULL) pgls.confint(pgls, which=c('lambda','kappa','delta'), param.CI=0.95) ## S3 method for class 'pgls.profile' plot(x, ...)
pgls |
A |
which |
A choice of which branch length transformation ('lambda', 'kappa' or 'delta') to use. |
N |
The number of points used to profile the likelihood |
param.CI |
A p value used to add confidence intervals to a likelihood profile for a parameter. |
x |
A 'pgls.profile' object to plot. |
... |
Further arguments to plot functions. |
The 'pgls.profile' function calculates the likelihood of a 'pgls' model under different values of branch length transformations. A single parameter is chosen from 'lambda', 'kappa' or 'delta' to be profiled and the model likelihood is calculated at 'N' equally spaced points between the parameter bounds used in the model. If the model contains a maximum likelihood estimate of the parameter (or if param.CI is not null) then the resulting 'pgls.profile' object will contain estimated confidence intervals.
Only one parameter is profiled at a time and the other branch length parameters will be held at the fixed or ML estimates used to fit the model. The 'pgls.confint' function is used by either 'pgls' or 'pgls.profile' to find confidence intervals around a maximum likelihood estimate of a given branch length. The model must contain an ML estimate of the parameter for confidence intervals to be calculated.
The plot method simply draws an annotated profile plot, showing the location of the ML estimate and confidence intervals if present.
The 'pgls.profile' function returns a list containing:
x |
Parameter values at which the likelihood has been calculated. |
logLik |
The likelihood value at each value. |
which |
The parameter being profiled. |
pars |
The value of the other fixed parameters. |
dname |
The name of the 'comparative.data' object used to fit the model. |
formula |
The formula of the model being profiled |
If the model contains an ML estimate of the parameter being profiled, then the 'pgls.profile' object will also contain the output of 'pgls.confint':
opt |
The maximum likelihood value of the parameter. |
bounds.val |
The values of the bounds on the parameter. |
bounds.p |
The p value of the likelihood at the bounds, given the ML value. |
ci.val |
The values of the parameter at the confidence intervals. |
ci |
The confidence interval value used. |
David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod <- pgls(log(Egg.Mass) ~ log(M.Mass), shorebird, lambda='ML') mod.l <- pgls.profile(mod, 'lambda') plot(mod.l) pgls.confint(mod, 'lambda')
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod <- pgls(log(Egg.Mass) ~ log(M.Mass), shorebird, lambda='ML') mod.l <- pgls.profile(mod, 'lambda') plot(mod.l) pgls.confint(mod, 'lambda')
Calculates the D value, a measure of phylogenetic signal in a binary trait, and tests the estimated D value for significant departure from both random association and the clumping expected under a Brownian evolution threshold model.
phylo.d(data, phy, names.col, binvar, permut = 1000, rnd.bias=NULL) ## S3 method for class 'phylo.d' print(x, ...) ## S3 method for class 'phylo.d' summary(object, ...) ## S3 method for class 'phylo.d' plot(x, bw=0.02, ...)
phylo.d(data, phy, names.col, binvar, permut = 1000, rnd.bias=NULL) ## S3 method for class 'phylo.d' print(x, ...) ## S3 method for class 'phylo.d' summary(object, ...) ## S3 method for class 'phylo.d' plot(x, bw=0.02, ...)
data |
A 'comparative.data' or 'data.frame' object. |
phy |
An object of class 'phylo', required when data is not a 'comparative.data' object. |
names.col |
A name specifying the column in 'data' that matches rows to tips in 'phy', required when data is not a 'comparative.data' object. |
binvar |
The name of the variable in |
permut |
Number of permutations to be used in the randomisation test. |
rnd.bias |
An optional name of a variable in |
x |
An object of class 'phylo.d' |
object |
An object of class 'phylo.d' |
bw |
The bandwidth to be used for the density plots |
... |
Further arguments to print and summary methods |
The sum of changes in estimated nodal values of a binary trait along edges in a phylogeny (D) provides a measure of the phylogenetic signal in that trait (Fritz and Purvis, 2010). If a trait is highly conserved, with only a basal division between two clades expressing either trait value, then the only change will be along the two daughters at the root. This will give a summed value of 1: the two differences between the root nodal value of 0.5 and the ancestors of the 1 and 0 clades. In contrast, if the trait is labile, more differences will be observed and the sum will be higher.
This function calculates the observed D for a binary trait on a tree and compares this to the value of D found using an equal number of simulations under each of two models:
Trait values are randomly shuffled relative to the tips of the phylogeny and D is calculated.
A continuous trait is evolved along the phylogeny under a Brownian process and then converted to a binary trait using a threshold that reproduces the relative prevalence of the observed trait.
The value of D depends on phylogeny size - more sister clades yield higher sums - and so the means of the two sets of simulated data are used as calibrations to scale both observed and simulated values of D to set points of 0 (as phylogenetically conserved as expected under a Brownian threshold model) and 1 (random). The value of D can be both smaller than 0 (highly conserved) and greater than 1 (overdispersed) and the distributions of scaled D from the simulations are used to assess the significance of the observed scaled D. The plot
method generates density plots of the distributions of the two simulations relative to the observed D value.
rnd.bias
is passed to sample
as the prob
argument to weight the random shuffles of the observed trait. The weights are not checked for validity.
Returns an object of class 'phylo.d', which is a list of the following:
DEstimate |
The estimated D value |
Pval1 |
A p value, giving the result of testing whether D is significantly different from one |
Pval0 |
A p value, giving the result of testing whether D is significantly different from zero |
Parameters |
A list of the Observed, MeanRandom and MeanBrownian sums of sister-clade differences |
Permutations |
A list with elements random and brownian, containing the sums of sister-clade differences from random permutations and simulations of Brownian evolution under a threshold model |
NodalVals |
A list with the elements observed, random and brownian, containing the nodal values estimated for the observed trait and permutations. The values are as matrices with rows labelled by the node names in the comparative data object. |
binvar |
The binary variable used |
phyName |
The name of the phylogeny object used |
dsName |
The name of the dataframe used |
nPermut |
The number of permutations used |
rnd.bias |
If a bias was introduced to the calculation of the random distribution, the bias used, else |
Susanne Fritz <[email protected]> and David Orme
Fritz, S. A. and Purvis, A. (2010). Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits. Conservation Biology, 24(4):1042-1051.
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial) redPhyloD <- phylo.d(BritishBirds, binvar=Red_list) print(redPhyloD) plot(redPhyloD)
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial) redPhyloD <- phylo.d(BritishBirds, binvar=Red_list) print(redPhyloD) plot(redPhyloD)
Calculates the D value, a measure of phylogenetic signal in a binary trait, and tests the estimated D value for significant departure from both random association and the clumping expected under a Brownian evolution threshold model. Does this across clades within a phylogeny.
phylo.d.subset(data, phy, names.col, binvar, permut = 1000, rnd.bias=NULL, min.tips=1, max.tips=length(data$phy$tip.label), min.nodes=1, max.nodes=data$phy$Nnode, verbose=FALSE) ## S3 method for class 'phylo.d.subset' print(x, ...) ## S3 method for class 'phylo.d.subset' summary(object, ...)
phylo.d.subset(data, phy, names.col, binvar, permut = 1000, rnd.bias=NULL, min.tips=1, max.tips=length(data$phy$tip.label), min.nodes=1, max.nodes=data$phy$Nnode, verbose=FALSE) ## S3 method for class 'phylo.d.subset' print(x, ...) ## S3 method for class 'phylo.d.subset' summary(object, ...)
data |
A 'comparative.data' or 'data.frame' object. |
phy |
An object of class 'phylo', required when data is not a 'comparative.data' object. |
names.col |
A name specifying the column in 'data' that matches rows to tips in 'phy', required when data is not a 'comparative.data' object. |
binvar |
The name of the variable in |
permut |
Number of permutations to be used in the randomisation test. |
rnd.bias |
An optional name of a variable in |
verbose |
Logical; do you want to know how many clades are being assessed, and see when each is being assessed? |
min.tips |
The minimum number of tips a clade should have for it to have a D value calculated. Defaults to 1 (i.e. no limit). |
max.tips |
The maximum number of species a clade should have for it to have a D value calculated. Defaults to the number of species in the whole phylogeny (i.e. no limit). |
min.nodes |
The minimum number of nodes a clade should have for it to have a D value calculated. Defaults to 1 (i.e. no limit). |
max.nodes |
The maximum number of nodes a clade should have for it to have a D value calculated. Defaults to the number of nodes in the whole phylogeny (i.e. no limit). |
x |
An object of class 'phylo.d.subset' |
object |
An object of class 'phylo.d.subset' |
... |
Further arguments to print and summary methods |
A wrapper function for phylo.d
, calculating D values for clades within a given dataset. These clades can be filtered according to the number of species and nodes using the arguments above. See phylo.d
for more details on the method itself.
Any clades for which there is no variation in the binary variable have NA
values for all of the below slots.
Returns an object of class 'phylo.d.subset', which is a list of the following:
raw |
A list of the raw output from |
DEstimate |
A vector of the estimated D values |
Pval1 |
A vector of p values, giving the result of testing whether D is significantly different from one, for each clade |
Pval0 |
A vector of p values, giving the result of testing whether D is significantly different from zero, for each clade |
phy.depth |
A numeric vector giving the age of the clade for which each value was calculated |
Susanne Fritz ([email protected]), Will Pearse and David Orme
Fritz, S. A. and Purvis, A. (2010). Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits. Conservation Biology, 24(4):1042-1051.
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial) # Look at big clades only ## Not run: bigClades <- phylo.d.subset(BritishBirds, binvar=Red_list, verbose=TRUE, min.tips=10, min.nodes=5) print(bigClades) ## End(Not run)
data(BritishBirds) BritishBirds <- comparative.data(BritishBirds.tree, BritishBirds.data, binomial) # Look at big clades only ## Not run: bigClades <- phylo.d.subset(BritishBirds, binvar=Red_list, verbose=TRUE, min.tips=10, min.nodes=5) print(bigClades) ## End(Not run)
The function generates four diagnostics plots for 'pgls' models.
## S3 method for class 'pgls' plot(x, ...)
## S3 method for class 'pgls' plot(x, ...)
x |
An object of class 'pgls'. |
... |
Additional arguments to plot functions. |
The first two plots show the fit of the phylogenetic residuals from the model to a normal distribution: a density plot of the residuals and a normal Q-Q plot. The second two plots scatterplots show pattern in the distribution of the fitted values against the observed and residual values.
Rob Freckleton, David Orme
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) par(mfrow=c(2,2)) plot(mod1)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv=TRUE, vcv.dim=3) mod1 <- pgls(log(Egg.Mass) ~ log(M.Mass) * log(F.Mass), shorebird) par(mfrow=c(2,2)) plot(mod1)
This is a comparative dataset on the evolution of shorebird egg size taken from Lislevand and Thomas (2006).
data(shorebird)
data(shorebird)
The datafile contains a phylogeny (shorebird.tree
) of 71 shorebird species as a 'phylo' object from the ape
library. The tip names are the binomial names of the species. The file also contains a data frame (shorebird.data
) of 71 complete cases for those species. The data frame contains six variables:
The species binomial name.
The adult male mass body mass in grams.
The adult female mass body mass in grams.
The fresh egg mass in grams.
The mean clutch size
The mating system, as a three level factor: monogamous (MO), polygynous (PO) or polyandrous (PA).
Lislevand, T and Thomas, G. H. (2006) Limited male incubation ability and the evolution of egg size in shorebirds. Biology Letters 2, 206 - 208
crunch, pgls
The summary method simply returns the linear model summary from the 'caic' object. The print method prints some basic information about the analysis followed by the model summary.
## S3 method for class 'caic' summary(object, ...) ## S3 method for class 'caic' print(x,...)
## S3 method for class 'caic' summary(object, ...) ## S3 method for class 'caic' print(x,...)
object |
An object of class 'caic'. |
x |
An object of class 'caic'. |
... |
Arguments to be passed to 'summary.lm'. |
The summary method returns an object of class 'summary.lm'.
David Orme
crunch
,brunch
, link{macrocaic}
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(Egg.Mass ~ F.Mass + M.Mass, data=shorebird) print(crunchMod) summary(crunchMod)
data(shorebird) shorebird <- comparative.data(shorebird.tree, shorebird.data, Species) crunchMod <- crunch(Egg.Mass ~ F.Mass + M.Mass, data=shorebird) print(crunchMod) summary(crunchMod)
A genus level phylogeny of the Syrphidae (sawflies) along with data on the species richness of each genus.
data(syrphidae)
data(syrphidae)
A 'phylo' object (syrphidaeTree
) containing a phylogeny of 204 genera of sawflies and a data frame (syrphidaeRich
) of the species richness of each genus.
Katzourakis, A., Purvis, A., Azmeh, S., Rotheray, G., and Gilbert, F. (2001). Macroevolution of hoverflies (Diptera : Syrphidae): the effect of using higher-level taxa in studies of biodiversity, and correlates of species richness. Journal of Evolutionary Biology, 14:219-227.
fusco.test
The function turns a phylogeny into a variance-covariance matrix, as in the function vcv.phylo
in the 'ape' package but can also return a 3D array retaining the individual branch lengths contributing to the shared branch lengths. This is useful for handling some branch length transformations, such as kappa, and has a lower overhead than repeatedly calling vcv.phylo
on a phylogeny after transforming the vector of edge lengths.
VCV.array(phy, dim=2, compact=TRUE)
VCV.array(phy, dim=2, compact=TRUE)
phy |
An object of class 'phylo'. |
dim |
Either 2, for a standard VCV matrix, or 3, for an array of branch lengths. |
compact |
A logical vector indicating the form to use for a 3D array. |
The compact form of the 3D array uses a shortened third dimension, which is only long enough to hold the maximum number of shared branches between root and tip for each pair of tips. Zeros are used to pad out this depth vector for tip pairs with shorter paths. The non-compact form returns 3D array showing, for each pair of tips and each node in the tree, either 0 if the node is not shared or the appropriate edge length if the node is shared. Note that, for maximally unbalanced trees, the size of the two forms will be identical.
The algorithm for the noncompact form is faster than for the compact form but it has very high memory overheads on big trees. The 2 dimensional algorithm is at least twice as fast as vcv.phylo
on trees up to 2500 tips.
The apply
function can be easily used to collapse the array down to a standard VCV matrix, as in the example.
When dim = 2, a variance covariance matrix of class 'VCV.array' of dimension nTips by nTips with dimnames set from the tip labels of the phylogeny.
When dim = 3, a 3 dimensional array of class 'VCV.array' with dimensions of the number of taxa in the phylogeny for the rows and columns and either the maximum number of branches on the root to tip path or the number of internal nodes as the depth, depending on the setting of compact
. The rows and columns are named using the tip labels from the phylogeny and the depth only named with node numbers if compact
is TRUE.
David Orme
tree <- rcoal(8) tree.VCV <- vcv.phylo(tree) tree.VCVA <- VCV.array(tree) # reconstruct a simple VCV array tree.VCVA.reduced <- apply(tree.VCVA, c(1,2), sum, na.rm=TRUE) # minimal differences between the two all((tree.VCVA.reduced - tree.VCV) < 1e-10) # a kappa transformation of 0.5 apply(tree.VCVA ^ 0.5, c(1,2), sum, na.rm=TRUE)
tree <- rcoal(8) tree.VCV <- vcv.phylo(tree) tree.VCVA <- VCV.array(tree) # reconstruct a simple VCV array tree.VCVA.reduced <- apply(tree.VCVA, c(1,2), sum, na.rm=TRUE) # minimal differences between the two all((tree.VCVA.reduced - tree.VCV) < 1e-10) # a kappa transformation of 0.5 apply(tree.VCVA ^ 0.5, c(1,2), sum, na.rm=TRUE)