Identifies the best method for outlier detection for a single species.
Source: R/bestmethod.R
bestmethod.Rd
Usage
bestmethod(
  x,
  sp = NULL,
  threshold = NULL,
  autothreshold = FALSE,
  warn = FALSE,
  verbose = FALSE
)
Arguments
- x
List of data frames, one for each method used to identify outliers in the multidetect function.
- sp
Species name or index, if multiple species are considered during outlier detection.
- threshold
Maximum value to denote an absolute outlier. The threshold ranges from 0, which indicates that a point has not been flagged as an outlier by any outlier detection method, to 1, which means the record is an absolute or true outlier since it has been identified by all methods. Both extremes have drawbacks. At low threshold values, many records are classified as outliers, which may be due to the weaknesses or strengths of individual methods and to the data distribution. At higher threshold values, true outliers are retained in the data. For example, if 10 methods are considered and 9 of them flag a record as an outlier, a cutoff of 1 retains that record. Therefore, the default cutoff is 0.6, but autothreshold can be used to select the appropriate threshold.
- autothreshold
Identifies the threshold with the mean number of absolute outliers. The search is limited to the range 0.51 to 1, since lower thresholds are deemed inappropriate for identifying absolute outliers. The autothreshold is used when threshold is set to NULL.
- warn
If TRUE, a warning is issued when absolute outliers are obtained at a low threshold. Default FALSE.
- verbose
If TRUE, messages and warnings are produced. Default FALSE.
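The threshold argument can be read as the proportion of detection methods that must flag a record before it counts as an absolute outlier. The base R sketch below illustrates that idea on made-up data. The object names (n_flagged, flags, prop_flagged) and the helper absolute_outliers() are hypothetical and not part of the package, and the use of >= for the comparison is an assumption, so this is not necessarily how bestmethod computes its result.

```r
# Hypothetical data: how many of 10 detection methods flagged each record.
n_methods <- 10
n_flagged <- c(rec1 = 10, rec2 = 9, rec3 = 7, rec4 = 2, rec5 = 0)

# Logical matrix of flags: rows are records, columns are methods.
flags <- t(sapply(n_flagged, function(k) seq_len(n_methods) <= k))

# Proportion of methods that flagged each record (0 = none, 1 = all).
prop_flagged <- rowMeans(flags)

# Hypothetical helper (assumption: a record is an absolute outlier when
# its proportion is greater than or equal to the threshold).
absolute_outliers <- function(prop, threshold) {
  names(prop)[prop >= threshold]
}

at_06 <- absolute_outliers(prop_flagged, 0.6)  # "rec1" "rec2" "rec3"
at_1  <- absolute_outliers(prop_flagged, 1)    # "rec1"

# Number of absolute outliers at a few candidate thresholds.
counts <- sapply(c(0.6, 0.8, 1), function(t) sum(prop_flagged >= t))
```

With a cutoff of 1, rec2 is not treated as an absolute outlier even though 9 of the 10 methods flagged it, which is the trade-off described for high thresholds. Lower cutoffs classify more records, as the counts vector shows.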
Examples
if (FALSE) { # \dontrun{
data("jdsdata")

matchdata <- match_datasets(datasets = list(jds = jdsdata, efi = efidata),
                            lats = 'lat',
                            lons = 'lon',
                            species = c('speciesname', 'scientificName'),
                            date = c('Date', 'sampling_date'),
                            country = c('JDS4_site_ID'))

datacheck <- check_names(matchdata, colsp = 'species', pct = 90, merge = TRUE)

db <- sf::st_read(system.file('extdata/danube/basinfinal.shp', package = 'specleanr'), quiet = TRUE)

worldclim <- terra::rast(system.file('extdata/worldclim.tiff', package = 'specleanr'))

rdata <- pred_extract(data = datacheck,
                      raster = worldclim,
                      lat = 'decimalLatitude',
                      lon = 'decimalLongitude',
                      colsp = 'speciescheck',
                      bbox = db,
                      multiple = TRUE,
                      minpts = 10,
                      list = TRUE,
                      merge = FALSE)

out_df <- multidetect(data = rdata, multiple = TRUE,
                      var = 'bio6',
                      output = 'outlier',
                      exclude = c('x', 'y'),
                      methods = c('zscore', 'adjbox', 'iqr', 'semiqr', 'hampel', 'kmeans',
                                  'logboxplot', 'lof', 'iforest', 'mahal', 'seqfences'))

bmout <- bestmethod(x = out_df, sp = 1, threshold = 0.2)
} # }