Extract final clean data using outliers generated by either the absolute or the best-method approach.
Source: R/classify_data.R (classify_data.Rd)
Usage
classify_data(
refdata,
outliers,
var_col = NULL,
threshold = 0.1,
warn = FALSE,
verbose = TRUE,
classify = "med",
EIF = FALSE
)
Arguments
- refdata
  dataframe. The reference data for the species used in outlier detection.
- outliers
  string. Output from the outlier detection process.
- var_col
  string. Used when the data is a data frame and the user must indicate the column holding the species names.
- threshold
  numeric. Value used to decide whether an outlier is an absolute outlier or not. Default 0.1.
- warn
  logical. Whether to warn when absolute outliers are obtained at a low threshold. Default FALSE.
- verbose
  logical. Whether to produce messages. Default TRUE.
- classify
  string. Categorizes the data following the correlation-coefficient classification in Akoglu 2018. Default "med". See the Details section for more information.
- EIF
  logical. Whether to calculate the empirical influence function for each value. Default FALSE.
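To make the role of threshold more concrete, the following is a minimal sketch and not the package's implementation. It assumes that a record's outlier score is the proportion of detection methods that flagged it, and that a record counts as an absolute outlier when that proportion is at least the threshold. The flags matrix, the value 0.6, and the `>=` comparison are all illustrative assumptions.

```r
# Toy flags (hypothetical data): rows are records, columns are detection
# methods, TRUE means the method flagged the record as an outlier.
flags <- cbind(
  zscore = c(TRUE, FALSE, FALSE, TRUE),
  iqr    = c(TRUE, FALSE, TRUE,  TRUE),
  hampel = c(TRUE, FALSE, FALSE, FALSE)
)

# Illustrative threshold; the function's documented default is 0.1.
threshold <- 0.6

# Proportion of methods that flagged each record (assumed scoring rule).
prop_flagged <- rowMeans(flags)

# Assumed decision rule: absolute outlier if the proportion reaches the threshold.
absolute <- prop_flagged >= threshold

print(data.frame(prop_flagged = round(prop_flagged, 2), absolute = absolute))
```

Under these assumptions, lowering the threshold marks more records as absolute outliers, which is presumably why a warning about low thresholds is offered through warn.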
Details
Outlier cluster weights are based on the statistical classification of coefficients, mostly correlation coefficients, described in Akoglu 2018.
The coefficients are classified under three naming standards: Dancey & Reidy (Psychology), Quinnipiac University (Politics), and Chan YH (Medicine).
All three classifications are available in the function, and each affects the resulting data clusters. The default (classify = "med") is Chan YH (Medicine).
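As an illustration of what such a classification looks like, the sketch below bins coefficients in [0, 1] into the Chan YH (Medicine) categories. The breakpoints are taken from a reading of the comparison table in Akoglu 2018 and should be checked against the paper. The helper classify_chan is hypothetical and not part of the package, the binning of values between the tabulated points is an assumption, and how the package turns these categories into cluster weights is not shown here.

```r
# Hypothetical helper, not part of specleanr: label coefficients in [0, 1]
# using the Chan YH (Medicine) categories as read from Akoglu 2018.
classify_chan <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0 & x <= 1))
  cut(
    x,
    # Left-closed bins: [0, 0.1), [0.1, 0.3), [0.3, 0.6), [0.6, 0.8), [0.8, 1), 1
    breaks = c(-Inf, 0.1, 0.3, 0.6, 0.8, 1, Inf),
    labels = c("none", "poor", "fair", "moderate", "very strong", "perfect"),
    right = FALSE
  )
}

coefs <- c(0, 0.2, 0.5, 0.7, 0.9, 1)
labels <- classify_chan(coefs)
print(data.frame(coefs, labels))
```

The other two standards would use the same pattern with different breakpoints and labels.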
Examples
if (FALSE) { # \dontrun{
data(jdsdata)
data(efidata)
matchdata <- match_datasets(datasets = list(jds = jdsdata, efi = efidata),
                            lats = 'lat',
                            lons = 'lon',
                            species = c('speciesname', 'scientificName'),
                            country = c('JDS4_site_ID'),
                            date = c('sampling_date', 'Date'))
datacheck <- check_names(matchdata, var_col = 'species', pct = 90, merge = TRUE)
db <- sf::st_read(system.file('extdata/danube/basinfinal.shp', package = 'specleanr'), quiet = TRUE)
worldclim <- terra::rast(system.file('extdata/worldclim.tiff', package = 'specleanr'))
rdata <- pred_extract(data = datacheck,
                      raster = worldclim,
                      lat = 'decimalLatitude',
                      lon = 'decimalLongitude',
                      var_col = 'speciescheck',
                      bbox = db,
                      multiple = TRUE,
                      minpts = 10,
                      list = TRUE,
                      merge = FALSE)
out_df <- multidetect(data = rdata, multiple = TRUE,
                      var = 'bio6',
                      output = 'outlier',
                      exclude = c('x', 'y'),
                      methods = c('zscore', 'adjbox', 'iqr', 'semiqr', 'hampel'))
# Extract the clean data using the absolute method for one species
extractabs <- classify_data(refdata = rdata, outliers = out_df)
} # }