Log boxplot-based outlier detection.
Arguments
- data
Data frame or vector to check for outliers.
- var
Variable to be used for outlier detection if data is not in a vector format.
- output
Either clean, to return the data without outliers, or outlier, to return the outliers as a data frame or vector.
- x
The constant used to create the lower and upper fences. The default is 1.5; use 3 for extreme outliers.
- pc
Whether principal component analysis will be computed. Default FALSE.
- pcvar
Principal component to be used for outlier detection after PCA. Default PC1.
- boot
Whether bootstrapping will be computed. Default FALSE.
Value
Data frame with or without outliers, depending on the output.
- clean
Data without outliers.
- outlier
Data with outliers.
Details
The log boxplot method for outlier detection (Barbato et al., 2011) modifies the interquartile range method by taking the sample size into account when setting the lower and upper fences.
$$ \mathrm{lowerfence} = Q1 - 1.5 \times IQR \times [1 + 0.1 \times \log(n/10)] $$
$$ \mathrm{upperfence} = Q3 + 1.5 \times IQR \times [1 + 0.1 \times \log(n/10)] $$
Where Q1 is the lower quartile, Q3 is the upper quartile, IQR = Q3 - Q1 is the interquartile range, and n is the sample size. By accounting for the sample size when setting the fences, the method addresses a weakness of the interquartile range method (Tukey, 1977). However, like the IQR method for flagging outliers, the log boxplot modification is affected by data skewness, which can be addressed using distboxplot, seqfences, mixediqr, and semiIQR.
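The fence formulas above can be sketched in a few lines of base R. This is only an illustrative sketch, not the specleanr implementation: the helper name `log_fences` is hypothetical, missing values are simply dropped, quartiles come from `stats::quantile()` with its default settings, and `log()` is assumed to be the natural logarithm.

```r
# Hypothetical helper (not part of specleanr): computes the log boxplot fences
# for a numeric vector v, using the constant x (1.5 by default).
log_fences <- function(v, x = 1.5) {
  v <- v[!is.na(v)]                      # drop missing values (assumption)
  n <- length(v)                         # sample size
  q <- stats::quantile(v, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]                     # interquartile range, Q3 - Q1
  adj <- 1 + 0.1 * log(n / 10)           # sample-size adjustment; natural log assumed
  c(lower = q[1] - x * iqr * adj,
    upper = q[2] + x * iqr * adj)
}

# Small illustrative vector with one obviously large value
vals <- c(1:20, 100)
f <- log_fences(vals)

# Values outside the fences are flagged as outliers
outliers <- vals[vals < f["lower"] | vals > f["upper"]]
```

With n = 10 the adjustment term equals 1, so the fences coincide with the usual interquartile range fences; for larger samples the fences widen slowly with log(n/10).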
References
Barbato G, Barini EM, Genta G, Levi R. 2011. Features and performance of some outlier detection methods. Journal of Applied Statistics 38:2133-2149
Examples
if (FALSE) { # \dontrun{
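# Load the example occurrence records
data("efidata")
# Check the species names in the scientificName column
gbd <- check_names(data = efidata, colsp = 'scientificName', pct = 90, merge = TRUE)
# Read the Danube shapefile and the WorldClim raster bundled with specleanr
danube <- system.file('extdata/danube.shp.zip', package = 'specleanr')
db <- sf::st_read(danube, quiet = TRUE)
wcd <- terra::rast(system.file('extdata/worldclim.tiff', package = 'specleanr'))
# Extract the raster values at the record coordinates
refdata <- pred_extract(data = gbd, raster = wcd, lat = 'decimalLatitude',
                        lon = 'decimalLongitude', colsp = 'speciescheck',
                        bbox = db, minpts = 10)
# Return the bio6 outliers for Salmo trutta using the log boxplot method
logout <- logboxplot(data = refdata[['Salmo trutta']], var = 'bio6', output = 'outlier')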
} # }