
Extract the final clean data using outliers generated by either the absolute method or the best method.

Usage

classify_data(
  refdata,
  outliers,
  var_col = NULL,
  threshold = 0.1,
  warn = FALSE,
  verbose = TRUE,
  classify = "med",
  EIF = FALSE
)

Arguments

refdata

dataframe. The reference data for the species used in outlier detection.

outliers

string. Output from the outlier detection process.

var_col

string. The name of the column holding the species names. Required when the data is a single data frame containing several species.

threshold

numeric. Value used to decide whether an outlier is an absolute outlier. Default 0.1.

warn

logical. If TRUE, a warning is issued when absolute outliers are obtained at a low threshold. Default FALSE.

verbose

logical. Whether to print messages. Default TRUE.

classify

string. The scheme used to categorize the data, following the correlation coefficient classifications in Akoglu 2018. Default "med". For more information, see the Details section.

EIF

logical. Whether to calculate the empirical influence function for each value. Default FALSE.
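
The role of the threshold argument can be illustrated with a minimal base R sketch. It assumes that a record's outlier weight is the proportion of detection methods that flagged it, and that a record counts as an absolute outlier when this proportion reaches the threshold. Both the flags matrix and the `>=` comparison are illustrative assumptions, not the internal code of classify_data.

```r
# Hypothetical flags (not specleanr output): rows are records,
# columns are outlier detection methods, TRUE means "flagged".
flags <- matrix(
  c(TRUE,  TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("rec1", "rec2", "rec3"),
                  c("zscore", "adjbox", "iqr", "hampel"))
)

# Proportion of methods that flagged each record.
prop_flagged <- rowMeans(flags)

# Assumed rule: a record is an absolute outlier when the proportion
# of flagging methods reaches the threshold.
is_absolute <- function(prop, threshold) prop >= threshold

strict  <- is_absolute(prop_flagged, threshold = 0.8)
lenient <- is_absolute(prop_flagged, threshold = 0.1)
```

Under these assumptions, a low threshold such as the default 0.1 also treats rec2 as an absolute outlier even though only one of four methods flagged it, which is the situation the warn argument is meant to highlight.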

Value

Either a list or dataframe of cleaned records for multiple species.

Details

Outlier cluster weights are based on the statistical classification of coefficients, mostly correlation coefficients, described in Akoglu 2018. The weights are classified according to three naming standards: Dancey & Reidy (Psychology), Quinnipiac University (Politics), and Chan YH (Medicine). All three classifications are available in the function, and the choice affects how the data are clustered. The default is Chan YH (Medicine).
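
As a rough illustration of how such a naming standard turns a coefficient into a category, the sketch below bands values between 0 and 1 with base R's cut(). The helper band_chan is hypothetical and not part of specleanr. The break points follow the Chan YH (Medicine) column of Akoglu 2018 as recalled here, and folding the end points 0 and 1 into the neighbouring bands is a simplifying assumption.

```r
# Hypothetical helper, NOT a specleanr function: label coefficients
# using bands approximating the Chan YH (Medicine) standard.
band_chan <- function(x) {
  cut(abs(x),
      breaks = c(0, 0.2, 0.5, 0.7, 1),   # assumed band edges
      labels = c("poor", "fair", "moderate", "very strong"),
      include.lowest = TRUE,             # 0 falls into "poor"
      right = TRUE)                      # intervals are (a, b]
}

weights <- c(0.1, 0.4, 0.65, 0.9)
bands <- band_chan(weights)
as.character(bands)
```

Swapping in the Dancey & Reidy or Quinnipiac labels would only change the breaks and labels vectors, which is why the choice of standard can move a record from one cluster to another.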

References

Akoglu, H. 2018. User’s guide to correlation coefficients. - Turk J Emerg Med 18: 91–93.

Examples


if (FALSE) { # \dontrun{

data(jdsdata)
data(efidata)
matchdata <- match_datasets(datasets = list(jds = jdsdata, efi = efidata),
                            lats = 'lat',
                            lons = 'lon',
                            species = c('speciesname', 'scientificName'),
                            country = c('JDS4_site_ID'),
                            date = c('sampling_date', 'Date'))

datacheck <- check_names(matchdata, var_col = 'species', pct = 90, merge = TRUE)


db <- sf::st_read(system.file('extdata/danube/basinfinal.shp', package='specleanr'), quiet=TRUE)


worldclim <- terra::rast(system.file('extdata/worldclim.tiff', package='specleanr'))

rdata <- pred_extract(data = datacheck,
                      raster = worldclim,
                      lat = 'decimalLatitude',
                      lon = 'decimalLongitude',
                      var_col = 'speciescheck',
                      bbox = db,
                      multiple = TRUE,
                      minpts = 10,
                      list = TRUE,
                      merge = FALSE)


out_df <- multidetect(data = rdata, multiple = TRUE,
                      var = 'bio6',
                      output = 'outlier',
                      exclude = c('x','y'),
                      methods = c('zscore', 'adjbox','iqr', 'semiqr','hampel'))

# Extract clean data using the absolute method for one species

extractabs <- classify_data(refdata = rdata, outliers = out_df)
} # }