Identify the best outlier detection method using the simple matching coefficient.

Usage

smc(x, sp = NULL, threshold = NULL, warn = FALSE, autothreshold = FALSE)

Arguments

x

datacleaner. Object of class datacleaner returned by the multidetect function, holding the outliers flagged by each method.

sp

string or numeric. Species name or index, used when multiple species are considered during outlier detection.

threshold

numeric. Cutoff for denoting a record an absolute outlier, expressed as the proportion of methods that flag it. The threshold ranges from 0, which indicates that no outlier detection method has flagged the record, to 1, which means the record is an absolute (true) outlier because every method has flagged it. Both extremes have drawbacks. At low thresholds, many records are classified as absolute outliers, which may reflect the weaknesses or strengths of individual methods and the data distribution. At high thresholds, true outliers may be retained: for example, if ten methods are considered and nine flag a record as an outlier, a cutoff of 1 retains that record. Therefore, the default cutoff is 0.6, but autothreshold can be used to select an appropriate threshold.

warn

logical. If TRUE, a warning indicates whether absolute outliers were obtained at a low threshold. Default FALSE.

autothreshold

logical. If TRUE, identifies the threshold that yields the mean number of absolute outliers. The search is limited to the range 0.51 to 1, since thresholds below 0.51 are deemed inappropriate for identifying absolute outliers. The autothreshold is used when threshold is set to NULL.
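
The meaning of threshold and autothreshold can be illustrated with a small base R sketch. It is not the package's implementation: the flag matrix, the helper `absolute_at`, the rule that a record is an absolute outlier when its flagged proportion is at least the threshold, the 0.01 search step, and the "closest to the mean count" selection are all assumptions made for illustration.

```r
# Hypothetical flag matrix: rows are records, columns are detection methods.
# TRUE means the method flagged the record as an outlier.
flags <- matrix(FALSE, nrow = 4, ncol = 10,
                dimnames = list(paste0("rec", 1:4), paste0("m", 1:10)))
flags["rec2", 1:3]  <- TRUE   # flagged by 3 of 10 methods
flags["rec3", 1:9]  <- TRUE   # flagged by 9 of 10 methods
flags["rec4", 1:10] <- TRUE   # flagged by all 10 methods

# Proportion of methods flagging each record (0 = none, 1 = all)
prop_flagged <- rowMeans(flags)

# Assumed rule: absolute outlier when the proportion reaches the threshold
absolute_at <- function(threshold) {
  names(prop_flagged)[prop_flagged >= threshold]
}

absolute_at(0.6)  # "rec3" "rec4"
absolute_at(1)    # "rec4"; rec3 (9 of 10 methods) is retained

# Assumed autothreshold search: count absolute outliers over 0.51 to 1 and
# keep the first threshold whose count is closest to the mean count.
candidates <- seq(0.51, 1, by = 0.01)
counts <- vapply(candidates, function(t) length(absolute_at(t)), integer(1))
auto <- candidates[which.min(abs(counts - mean(counts)))]
```

With a cutoff of 1, the record flagged by nine of ten methods is not treated as an absolute outlier, which is the case the threshold description warns about.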

Value

The best method for identifying outliers, based on the simple matching coefficient.
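
For two binary vectors, the simple matching coefficient is the share of positions where they agree (both flagged or both unflagged). The sketch below shows one plausible way such a coefficient could rank methods; the helper `smc_pair`, the example flags, and the choice to compare each method against absolute outliers at a 0.6 threshold are assumptions for illustration, not the package's internals.

```r
# Simple matching coefficient between two logical vectors of equal length:
# (agreements on TRUE + agreements on FALSE) / number of records
smc_pair <- function(a, b) {
  stopifnot(length(a) == length(b))
  mean(a == b)
}

# Hypothetical flags from three methods over six records
flags <- cbind(
  iqr        = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  mahal      = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE),
  logboxplot = c(TRUE,  FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical reference: records flagged by at least 60% of the methods
absolute <- rowMeans(flags) >= 0.6

# Score each method against the reference and keep the highest scorer
scores <- apply(flags, 2, smc_pair, b = absolute)
best <- names(which.max(scores))
best  # "iqr"
```

Here `iqr` matches the reference on all six records, `mahal` on five, and `logboxplot` on four, so `iqr` scores highest.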

Examples


if (FALSE) { # \dontrun{
data(efidata)

db <- sf::read_sf(system.file('extdata/danube/basinfinal.shp', package = "specleanr"), quiet = TRUE)

wcd <- terra::rast(system.file('extdata/worldclim.tiff', package = "specleanr"))

checkname <- check_names(data = efidata, colsp = 'scientificName', pct = 90, merge = TRUE)

extdf <- pred_extract(data = checkname, raster = wcd,
                      lat = 'decimalLatitude', lon = 'decimalLongitude',
                      colsp = 'speciescheck',
                      list = TRUE, verbose = FALSE,
                      minpts = 6, merge = FALSE) # basin removed

# outlier detection

outliersdf <- multidetect(data = extdf, output = 'outlier', var = 'bio6',
                          exclude = c('x', 'y'), multiple = TRUE,
                          methods = c('mixediqr', 'iqr', 'mahal', 'logboxplot'),
                          showErrors = FALSE, warn = TRUE, verbose = FALSE, sdm = TRUE)

smcout <- smc(x = outliersdf, sp = 8, threshold = 0.2)


} # }