Flags outliers using the kmeans clustering method
Usage
xkmeans(
  data,
  k,
  exclude = NULL,
  output,
  mode = "soft",
  method = "silhouette",
  seed = 1135,
  verbose = FALSE,
  pc = FALSE,
  boot = FALSE,
  var,
  pcvar = NULL
)
Arguments
- data
Dataframe to check for outliers
- k
The number of clusters to be used for optimization. It should be greater than 1. For many species, k should be greater than 10 so that the search for the optimal k, using the different optimization methods in kmethod, can cater for each species.
- exclude
Variables that should not be considered when fitting the model, for example x and y columns, latitude/longitude, or any other column that the user does not want to consider.
- output
Either clean, to return a data set with the outliers removed, or outlier, to return a data frame of the outliers.
- mode
Either robust, which uses the median instead of the mean and the median absolute deviation from the median, or soft (the default).
- method
The method to be used for the kmeans clustering. Default is silhouette. The elbow method can also be used, but it requires user input and therefore cannot be used when running multiple outlier detection methods.
- seed
An integer seed that keeps the iterations reproducible during the kmeans method optimisation.
- verbose
Whether to print messages. Default FALSE
- pc
Whether principal component analysis will be computed. Default FALSE
- boot
Whether bootstrapping will be computed. Default FALSE
- var
The variable of concern, which is vital for univariate outlier detection methods
- pcvar
The principal component to be used for outlier detection after PCA. Default PC1
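The k, method and seed arguments describe a search for the number of clusters using the silhouette criterion. The sketch below illustrates that general idea with base R's kmeans and the cluster package on made-up data. It is not the code inside xkmeans; the toy data, the candidate range 2:6 and the nstart value are assumptions chosen only for illustration.

```r
# Sketch: choose the number of clusters by average silhouette width.
# Uses stats::kmeans and the 'cluster' package; it does not call specleanr.
library(cluster)

set.seed(1135)  # plays the same role as `seed`: reproducible kmeans starts

# Hypothetical toy data: two well-separated groups measured on two variables
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.3), ncol = 2)
)

d <- dist(toy)      # pairwise Euclidean distances, needed by silhouette()
candidates <- 2:6   # silhouette width is only defined for 2 or more clusters

# Average silhouette width for each candidate number of clusters
avg_sil <- sapply(candidates, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 10)
  mean(silhouette(fit$cluster, d)[, "sil_width"])
})

# The candidate with the largest average silhouette width is preferred
best_k <- candidates[which.max(avg_sil)]
best_k
```

Because the criterion is computed automatically for every candidate, no user input is needed, which is consistent with silhouette being the default over the elbow method.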
Examples
if (FALSE) { # \dontrun{
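# Load the example data set used in this example
data("efidata")
# Check the species names in the scientificName column
gbd <- check_names(data = efidata, colsp = 'scientificName', pct = 90, merge = TRUE)
# Read the Danube shapefile used as the bounding box
danube <- system.file('extdata/danube.shp.zip', package = 'specleanr')
db <- sf::st_read(danube, quiet = TRUE)
# Load the WorldClim raster shipped with the package
wcd <- terra::rast(system.file('extdata/worldclim.tiff', package = 'specleanr'))
# Extract the raster values for the records; the result is indexed by species name
refdata <- pred_extract(data = gbd, raster = wcd,
                        lat = 'decimalLatitude',
                        lon = 'decimalLongitude',
                        colsp = 'speciescheck',
                        bbox = db,
                        minpts = 10)
# Flag outliers for one species, ignoring the coordinate columns
kmeansout <- xkmeans(data = refdata[['Salmo trutta']],
                     output = 'outlier', exclude = c('x', 'y'), mode = 'soft', k = 3)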
} # }
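The mode argument contrasts a robust rule, based on the median and the median absolute deviation, with the default soft rule. The following is a minimal sketch of how the two kinds of cut-off can differ when applied to distances from kmeans centroids. It makes several assumptions that the documentation above does not confirm: that outliers are scored by distance to the assigned centroid, that soft means mean plus three standard deviations, and that the multiplier is 3. The toy data and the planted outlier are hypothetical.

```r
# Sketch: flag points far from their kmeans centroid using two cut-off rules.
# Assumption: "soft" is read here as mean + 3 * SD; the source does not define it.
set.seed(1135)

# Hypothetical toy data: two groups plus one planted outlier in row 101
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.5), ncol = 2),
  c(3, 12)
)

fit <- kmeans(toy, centers = 2, nstart = 10)

# Euclidean distance of every row to the centre of its assigned cluster
dist_to_centre <- sqrt(rowSums((toy - fit$centers[fit$cluster, ])^2))

# Mean/SD cut-off (assumed reading of "soft")
soft_cut <- mean(dist_to_centre) + 3 * sd(dist_to_centre)

# Median/MAD cut-off (the "robust" idea); mad() scales by 1.4826 by default
robust_cut <- median(dist_to_centre) + 3 * mad(dist_to_centre)

flag_soft <- dist_to_centre > soft_cut
flag_robust <- dist_to_centre > robust_cut

which(flag_soft)    # rows flagged by the mean/SD rule
which(flag_robust)  # rows flagged by the median/MAD rule
```

The mean and standard deviation are themselves inflated by extreme values, so the mean/SD cut-off tends to sit higher than the median/MAD cut-off on the same data, and the robust rule is usually the stricter of the two.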