Flags outliers based on the Mahalanobis distance matrix for all records.
Source: R/outliermethods.R
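For reference, the classical squared Mahalanobis distance of record i from the centre of the data, together with the usual chi-square rule for flagging it, can be written as below. This assumes the standard formulation: p is taken to be the number of variables left after exclude is applied, and pdf is the probability argument documented under Arguments. In the classical case, mu and Sigma are the sample mean vector and covariance matrix; a robust variant replaces them with robust estimates.

```latex
D_i^2 = (\mathbf{x}_i - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu}),
\qquad
\text{record } i \text{ is flagged if } D_i^2 > \chi^2_{p}(\mathrm{pdf})
```

Here \chi^2_{p}(\mathrm{pdf}) is the pdf quantile of a chi-square distribution with p degrees of freedom. For example, with p = 2 and pdf = 0.95 the cutoff is about 5.99.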
Usage
mahal(
data,
exclude = NULL,
output = "outlier",
mode = "soft",
pdf = 0.95,
tol = 1e-20,
pc = FALSE,
boot = FALSE,
var,
pcvar = NULL
)
Arguments
- data
dataframe
Data frame to check for outliers or from which to extract the clean data.
- exclude
vector or string
Variables to leave out when computing the Mahalanobis distances. These can be coordinates such as latitude/longitude or any other column the user does not want to consider.
- output
string
Either "clean" to return the data set with the outliers removed, or "outlier" to return a data frame of the outliers.
- mode
string
Either "soft" (the default) or "robust". The "robust" mode uses the auto estimator instead of the mean.
- pdf
numeric
Chi-square probability value used to set the cutoff for flagging outliers (Leys et al. 2018). Default is 0.95.
- tol
numeric
Tolerance value used when the values in the inverse calculation are too small. Default is 1e-20.
- pc
Whether principal component analysis will be computed. Default is FALSE.
- boot
Whether bootstrapping will be computed. Default is FALSE.
- var
The variable of concern, which is required by the univariate outlier detection methods.
- pcvar
Principal component to be used for outlier detection after PCA. Default is PC1.
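The following is a minimal base-R sketch of what the default "soft" mode plausibly computes from the arguments above. It assumes that the mode uses the column means and the sample covariance matrix. mahal_sketch is a hypothetical helper and is not part of specleanr. It covers only exclude, output, pdf and tol, and ignores pc, boot, var and pcvar.

```r
# Hypothetical sketch of a classical ("soft") Mahalanobis outlier flag.
# ASSUMPTION: illustrates the documented arguments; not specleanr's code.
mahal_sketch <- function(data, exclude = NULL, output = "outlier",
                         pdf = 0.95, tol = 1e-20) {
  output <- match.arg(output, c("outlier", "clean"))

  # Drop excluded columns (e.g. coordinates) and keep numeric columns only
  x <- data[, setdiff(names(data), exclude), drop = FALSE]
  x <- as.matrix(x[, vapply(x, is.numeric, logical(1)), drop = FALSE])

  # Classical estimates of centre and scatter
  center <- colMeans(x)
  covmat <- stats::cov(x)

  # Squared Mahalanobis distances; `tol` is passed on to solve()
  d2 <- stats::mahalanobis(x, center, covmat, tol = tol)

  # Chi-square cutoff with one degree of freedom per variable used
  is_out <- d2 > stats::qchisq(pdf, df = ncol(x))

  if (output == "clean") data[!is_out, , drop = FALSE] else data[is_out, , drop = FALSE]
}
```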
References
Leys C, Klein O, Dominicy Y, Ley C. 2018. Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance. Journal of Experimental Social Psychology 74:150-156.
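Leys et al. (2018) argue for a robust variant of the Mahalanobis distance based on the Minimum Covariance Determinant (MCD). The sketch below substitutes MCD estimates of the centre and covariance, computed with MASS::cov.mcd from the MASS package that ships with standard R installations. This substitution is an assumption for illustration only, since the argument description states only that the robust mode uses the auto estimator. robust_flags is a hypothetical helper and is not part of specleanr.

```r
# Hypothetical sketch of a robust Mahalanobis flag using MCD estimates.
# ASSUMPTION: MCD is swapped in for illustration; specleanr's "robust"
# mode may use a different estimator.
robust_flags <- function(x, pdf = 0.95) {
  x <- as.matrix(x)
  # Robust centre and covariance; cov.mcd may draw random subsets,
  # so call set.seed() beforehand for reproducible results
  fit <- MASS::cov.mcd(x)
  d2 <- stats::mahalanobis(x, fit$center, fit$cov)
  # TRUE marks a record whose robust distance exceeds the chi-square cutoff
  d2 > stats::qchisq(pdf, df = ncol(x))
}
```

Because the extreme record does not influence the MCD fit, its robust distance stays large instead of being masked by an inflated covariance matrix.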
Examples
if (FALSE) { # \dontrun{
data("efidata")
gbd <- check_names(data = efidata, colsp='scientificName', pct=90, merge=TRUE)
danube <- system.file('extdata/danube.shp.zip', package='specleanr')
db <- sf::st_read(danube, quiet=TRUE)
wcd <- terra::rast(system.file('extdata/worldclim.tiff', package='specleanr'))
refdata <- pred_extract(data = gbd, raster = wcd,
                        lat = 'decimalLatitude',
                        lon = 'decimalLongitude',
                        colsp = 'speciescheck',
                        bbox = db,
                        minpts = 10)
# Flag outliers in the Salmo trutta records, excluding the x and y coordinate columns
outliers <- mahal(data = refdata[['Salmo trutta']], exclude = c("x", "y"),
                  output = 'outlier')
} # }