Extract the final clean data using either absolute outliers or outliers from the best method.
Source: R/extract_clean_data.R
extract_clean_data.Rd
Usage
extract_clean_data(
refdata,
outliers,
mode = "abs",
var_col = NULL,
threshold = NULL,
warn = FALSE,
verbose = FALSE,
autothreshold = FALSE,
pabs = 0.1,
loess = FALSE,
outlier_to_NA = FALSE
)
Arguments
- refdata
  dataframe. The reference data for the species used in outlier detection.
- outliers
  string. Output from the outlier detection process.
- mode
  character. Either abs, to filter the data using absolute outliers, or best, to use outliers from the best method. Default abs.
- var_col
  string. Used when the data is a data frame and the user must indicate the column with species names.
- threshold
  numeric. Cutoff used to decide whether an outlier is an absolute outlier.
- warn
  logical. Whether to warn when absolute outliers are obtained at a low threshold. Default FALSE.
- verbose
  logical. Whether to print messages. Default FALSE.
- autothreshold
  logical. Identifies the threshold with the mean number of absolute outliers. The search is limited to the range 0.51 to 1, since thresholds below 0.51 are deemed inappropriate for identifying absolute outliers. The autothreshold is used when threshold is set to NULL.
- pabs
  numeric. Percentage of outliers allowed to be extracted from the data. If best is used to extract outliers and pabs is exceeded, the absolute outliers are removed instead. This is because some records in the best method are repeated, so removing them is likely to discard true values as outliers.
- loess
  logical. Set to TRUE to use loess threshold optimization to extract clean data.
- outlier_to_NA
  logical. If TRUE, the clean dataset will have outliers replaced with NAs. This parameter is experimental and is intended to output a data frame when multiple variables of concern are considered during outlier detection.
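To make the threshold and outlier_to_NA arguments concrete, here is a minimal base R sketch. It assumes that an absolute outlier is a record flagged by at least a `threshold` share of the detection methods. That definition, the toy `flags` matrix, and the helper `clean_sketch()` are illustrative assumptions, not the internals of specleanr.

```r
# Toy data: five records of one variable (values are made up for illustration).
refdata <- data.frame(id = 1:5, bio6 = c(2.1, 2.4, 9.8, 2.2, -7.5))

# Hypothetical method-by-record flags: TRUE means the method flagged the record.
flags <- rbind(
  zscore = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  iqr    = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  hampel = c(FALSE, TRUE,  TRUE, FALSE, FALSE)
)

# Hypothetical helper, not part of specleanr.
clean_sketch <- function(data, flags, var, threshold = 0.6,
                         outlier_to_NA = FALSE) {
  share  <- colMeans(flags)       # share of methods flagging each record
  is_abs <- share >= threshold    # assumed definition of an absolute outlier
  if (outlier_to_NA) {
    data[[var]][is_abs] <- NA     # keep every row, blank the outlying values
    data
  } else {
    data[!is_abs, , drop = FALSE] # drop the outlying rows
  }
}

removed <- clean_sketch(refdata, flags, var = "bio6", threshold = 0.6)
masked  <- clean_sketch(refdata, flags, var = "bio6", threshold = 0.6,
                        outlier_to_NA = TRUE)
```

In this sketch, records 3 and 5 are flagged by at least 60% of the methods, so `removed` keeps three rows while `masked` keeps all five rows with `NA` in the two outlying positions.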
Examples
if (FALSE) { # \dontrun{
data(jdsdata)
data(efidata)
matchdata <- match_datasets(datasets = list(jds = jdsdata, efi = efidata),
                            lats = 'lat',
                            lons = 'lon',
                            species = c('speciesname', 'scientificName'),
                            country = c('JDS4_site_ID'),
                            date = c('sampling_date', 'Date'))

datacheck <- check_names(matchdata, colsp = 'species', pct = 90, merge = TRUE)

danube <- system.file('extdata/danube.shp.zip', package = 'specleanr')
db <- sf::st_read(danube, quiet = TRUE)

worldclim <- terra::rast(system.file('extdata/worldclim.tiff', package = 'specleanr'))

rdata <- pred_extract(data = datacheck,
                      raster = worldclim,
                      lat = 'decimalLatitude',
                      lon = 'decimalLongitude',
                      colsp = 'speciescheck',
                      bbox = db,
                      minpts = 10,
                      list = TRUE,
                      merge = FALSE)

# Detect outliers in bio6 using several methods
out_df <- multidetect(data = rdata, multiple = TRUE,
                      var = 'bio6',
                      output = 'outlier',
                      exclude = c('x', 'y'),
                      methods = c('zscore', 'adjbox', 'iqr', 'semiqr', 'hampel'))

# Extract clean data using the absolute outliers
extractabs <- extract_clean_data(refdata = rdata, outliers = out_df,
                                 mode = 'abs', threshold = 0.6,
                                 autothreshold = FALSE)

# Extract clean data using outliers from the best method
bestmout_bm <- extract_clean_data(refdata = rdata, outliers = out_df,
                                  mode = 'best', threshold = 0.6,
                                  autothreshold = FALSE)
} # }
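The pabs fallback described under Arguments can be sketched in base R as follows. This is only an illustration of the documented rule (switch from best-method outliers to absolute outliers when the allowed share is exceeded). The helper `choose_outliers()`, the made-up record indices, and the strict `>` comparison are assumptions, not the package's actual implementation.

```r
# Hypothetical helper, not part of specleanr: picks which outlier set to remove.
choose_outliers <- function(best_idx, abs_idx, n_records,
                            mode = "best", pabs = 0.1) {
  if (mode == "abs") return(abs_idx)
  # Share of records that the best method would remove (duplicates counted once)
  share_best <- length(unique(best_idx)) / n_records
  if (share_best > pabs) abs_idx else unique(best_idx)
}

best_idx <- c(3, 5, 8, 12)  # records flagged by the best method (made up)
abs_idx  <- c(3)            # absolute outliers (made up)

# 4 of 20 records exceeds pabs = 0.1, so the absolute outliers are used instead
small <- choose_outliers(best_idx, abs_idx, n_records = 20)

# 4 of 100 records is within pabs = 0.1, so the best-method outliers are used
large <- choose_outliers(best_idx, abs_idx, n_records = 100)
```

The point of the fallback in this sketch is to cap how much data a single method can remove, which matches the stated concern that best-method outliers may include true values.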