Title: | Reconstruct a Distribution from a Collection of Quantiles |
---|---|
Description: | Given a set of predictive quantiles from a distribution, estimate the distribution and create `d`, `p`, `q`, and `r` functions to evaluate its density function, distribution function, and quantile function, and generate random samples. On the interior of the provided quantiles, an interpolation method such as a monotonic cubic spline is used; the tails are approximated by a location-scale family. |
Authors: | Evan Ray [aut, cre], Aaron Gerding [aut], Li Shandross [ctb], Nick Reich [ctb] |
Maintainer: | Evan Ray <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.4 |
Built: | 2024-11-13 05:45:09 UTC |
Source: | https://github.com/cran/distfromq |
Identify duplicated values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.
duplicated_tol(x, tol = 1e-06, incl_first = FALSE)
x | a numeric vector in which to identify duplicates |
tol | numeric tolerance for identifying duplicates |
incl_first | boolean indicator of whether or not the first entry in a run of duplicates should be indicated as a duplicate. |
a boolean vector of the same length as x
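The run-based rule described above can be illustrated with a minimal base R sketch. The helper name `dup_tol_sketch` is hypothetical and this is not the package's implementation; it assumes `x` is sorted and that "closer together than the tolerance" means a strict `<` comparison.

```r
# Hypothetical helper illustrating tolerance-based duplicate detection.
# Assumes `x` is sorted in non-decreasing order.
dup_tol_sketch <- function(x, tol = 1e-6, incl_first = FALSE) {
  # TRUE where an element lies within `tol` of its predecessor
  close_to_prev <- c(FALSE, diff(x) < tol)[seq_along(x)]
  if (!incl_first) {
    return(close_to_prev)
  }
  # Also flag an element whose successor lies within `tol` of it,
  # so that the first entry of each run is marked as well
  close_to_next <- c(close_to_prev[-1], FALSE)[seq_along(x)]
  close_to_prev | close_to_next
}

# The first and third values differ by more than `tol`, but each
# consecutive pair is within `tol`, so they form a single run.
x <- c(1, 1 + 6e-7, 1 + 1.2e-6, 2, 3)
dup_tol_sketch(x)
dup_tol_sketch(x, incl_first = TRUE)
```

Chaining on consecutive differences is what lets a run span more than `tol` in total, as the description notes.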
Get indices of starts and ends of runs of duplicate values
get_dup_run_inds(dups)
dups | a boolean vector that would result from calling |
named list with entries `starts`, giving indices of the first element in each sequence of runs of duplicate values, and `ends`, giving indices of the last element in each sequence of runs of duplicate values.
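As a rough illustration of extracting run boundaries from a logical vector, here is a base R sketch built on `rle()`. The helper `run_inds_sketch` is hypothetical, and it assumes that every member of a run is flagged `TRUE` (including the first); the flagging convention that `get_dup_run_inds()` itself expects is not stated in this text, so the package function may behave differently.

```r
# Hypothetical helper: start and end indices of each run of TRUE values.
# Assumes every member of a run is flagged TRUE.
run_inds_sketch <- function(flags) {
  r <- rle(flags)
  ends <- cumsum(r$lengths)        # last index of every run (TRUE or FALSE)
  starts <- ends - r$lengths + 1L  # first index of every run
  list(starts = starts[r$values], ends = ends[r$values])
}

flags <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
res <- run_inds_sketch(flags)
res$starts
res$ends
```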
Creates a function that evaluates the probability density function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
make_d_fn( ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12 )
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
The default `interior_method`, `"spline_cdf"`, represents the distribution as a sum of a discrete component at any points where there are duplicated `qs` for multiple different `ps` and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided `(q, p)` pairs as an estimate of the CDF. The density function is then obtained by differentiating this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments `ps`, `qs`, `tail_dist`, and `fn_type` (which will be either `"d"`, `"p"`, or `"q"`), and optionally additional named arguments to be specified via `interior_args`. This function should return a function with arguments `x`, `log` that evaluates the pdf or its logarithm.
a function with arguments `x` and `log` that can be used to evaluate the approximate density function (or its `log`) at the points `x`.
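The idea of interpolating the `(q, p)` pairs with a monotone spline and differentiating the result can be sketched in base R. This sketch uses `stats::splinefun(method = "hyman")` as a stand-in for the package's own spline code, covers only the interior of the supplied quantiles, and assumes a purely continuous distribution (no duplicated `qs`); the names `cdf_hat` and `pdf_hat` are illustrative.

```r
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps, mean = 1, sd = 2)  # example quantiles of a N(1, 2^2)

# Monotone cubic spline through the (q, p) pairs as a CDF estimate
cdf_hat <- stats::splinefun(qs, ps, method = "hyman")

# Density estimate: first derivative of the spline CDF
pdf_hat <- function(x, log = FALSE) {
  dens <- cdf_hat(x, deriv = 1)
  if (log) log(dens) else dens
}

x_mid <- seq(min(qs), max(qs), length.out = 5)
cdf_hat(x_mid)
pdf_hat(x_mid)
```

Outside the range of `qs` this sketch is not meaningful; the package handles the tails separately through `tail_dist`.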
Creates a function that evaluates the cumulative distribution function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
make_p_fn( ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12 )
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
The default `interior_method`, `"spline_cdf"`, represents the distribution as a sum of a discrete component at any points where there are duplicated `qs` for multiple different `ps` and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided `(q, p)` pairs as an estimate of the CDF.
Optionally, the user may provide another function that accepts arguments `ps`, `qs`, `tail_dist`, and `fn_type` (which will be either `"d"`, `"p"`, or `"q"`), and optionally additional named arguments to be specified via `interior_args`. This function should return a function with arguments `x`, `log` that evaluates the pdf or its logarithm.
a function with arguments `q` and `log.p` that can be used to evaluate the approximate cumulative distribution function (or its `log`) at the points `q`.
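For the tails, one natural way to pin down a location-scale family such as `tail_dist = "norm"` is to match two of the supplied quantiles. The sketch below does this for the lower tail using the two smallest quantiles. Which quantiles the package actually uses is an assumption here, and `fit_norm_tail` is a hypothetical helper.

```r
# Hypothetical helper: find the normal distribution whose quantiles at
# levels p_pair[1] and p_pair[2] equal q_pair[1] and q_pair[2].
fit_norm_tail <- function(p_pair, q_pair) {
  z <- qnorm(p_pair)               # standard normal quantiles
  sigma <- diff(q_pair) / diff(z)  # scale from the spacing of the quantiles
  mu <- q_pair[1] - sigma * z[1]   # location from either quantile
  list(mean = mu, sd = sigma)
}

ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps, mean = 1, sd = 2)

# Lower tail: match the two smallest quantiles (an assumption)
lower <- fit_norm_tail(ps[1:2], qs[1:2])
lower_cdf <- function(q, log.p = FALSE) {
  pnorm(q, mean = lower$mean, sd = lower$sd, log.p = log.p)
}
lower_cdf(qs[1])
```

Because the fitted tail passes through the outermost supplied quantile, the tail CDF and the interior CDF agree at that point.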
Creates a function that evaluates the quantile function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
make_q_fn( ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12 )
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
The default `interior_method`, `"spline_cdf"`, represents the distribution as a sum of a discrete component at any points where there are duplicated `qs` for multiple different `ps` and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided `(q, p)` pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments `ps`, `qs`, `tail_dist`, and `fn_type` (which will be either `"d"`, `"p"`, or `"q"`), and optionally additional named arguments to be specified via `interior_args`. This function should return a function with argument `p` that evaluates the quantile function.
a function with argument `p` that can be used to calculate quantiles of the approximated distribution at the probability levels `p`.
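A short usage sketch of the density, distribution, and quantile function factories, based only on the signatures and return values documented here. It assumes the distfromq package is installed and is guarded so that it is skipped otherwise; the example quantiles are illustrative.

```r
if (requireNamespace("distfromq", quietly = TRUE)) {
  # Example input: five quantiles of a N(10, 3^2) distribution
  ps <- c(0.05, 0.25, 0.5, 0.75, 0.95)
  qs <- qnorm(ps, mean = 10, sd = 3)

  # Build the functions with default settings
  d_fn <- distfromq::make_d_fn(ps, qs)
  p_fn <- distfromq::make_p_fn(ps, qs)
  q_fn <- distfromq::make_q_fn(ps, qs)

  x <- c(4, 10, 16)
  print(d_fn(x))                   # approximate density at x
  print(p_fn(x))                   # approximate CDF at x
  print(q_fn(c(0.01, 0.5, 0.99)))  # approximate quantiles
}
```

Probability levels outside the range of `ps`, such as 0.01 and 0.99 here, fall in the tails and are handled by `tail_dist`.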
Creates a function that generates random deviates from an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
make_r_fn( ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12 )
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
The default `interior_method`, `"spline_cdf"`, represents the distribution as a sum of a discrete component at any points where there are duplicated `qs` for multiple different `ps` and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided `(q, p)` pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments `ps`, `qs`, `tail_dist`, and `fn_type` (which will be either `"d"`, `"p"`, or `"q"`), and optionally additional named arguments to be specified via `interior_args`. This function should return a function with argument `p` that evaluates the quantile function.
a function with argument `n` that can be used to generate a sample of size `n` from the approximated distribution.
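Given a quantile function, random deviates can be generated by inverse transform sampling: evaluate the quantile function at uniform random numbers. The sketch below shows that principle with a stand-in normal quantile function. Whether `make_r_fn()` samples in exactly this way is an assumption, and `make_r_sketch` is a hypothetical name.

```r
# Hypothetical helper: turn any quantile function into a sampler
make_r_sketch <- function(q_fn) {
  function(n) q_fn(stats::runif(n))
}

# Stand-in quantile function; in practice this could be the output
# of a quantile-function factory such as make_q_fn()
q_fn <- function(p) qnorm(p, mean = 2, sd = 3)
r_fn <- make_r_sketch(q_fn)

set.seed(42)
draws <- r_fn(10000)
c(mean = mean(draws), sd = sd(draws))
```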
Create a polySpline object representing a monotonic Hermite spline interpolating a given set of points.
mono_Hermite_spline(x, y, m)
x | vector giving the x coordinates of the points to be interpolated. |
y | vector giving the y coordinates of the points to be interpolated. Must be increasing or decreasing for 'method = "hyman"'. |
m | (for 'splinefunH()') vector of slopes |
This function essentially reproduces `stats::splinefunH`, but it returns a polynomial spline object as used in the `splines` package rather than a function that evaluates the spline, and potentially makes adjustments to the input slopes `m` to enforce monotonicity.
An object of class `polySpline` with the spline object, suitable for use with other functionality from the `splines` package.
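To show why slope adjustments can matter, the following base R sketch builds a cubic Hermite interpolant with `stats::splinefunH()` before and after clamping the slopes. The clamping rule (each slope limited to between 0 and 3 times the smaller adjacent secant slope, for increasing data) is a standard sufficient condition for monotonicity and is used here as an assumption; it is not necessarily the adjustment that `mono_Hermite_spline()` applies. `clamp_slopes` is a hypothetical helper.

```r
# Hypothetical helper: clamp Hermite slopes so the interpolant of
# increasing data is monotone (a sufficient, not necessary, condition).
clamp_slopes <- function(x, y, m) {
  delta <- diff(y) / diff(x)  # secant slopes, assumed non-negative
  n <- length(x)
  # Smaller of the secant slopes on either side of each knot
  nearest <- pmin(c(delta[1], delta), c(delta, delta[n - 1]))
  pmin(pmax(m, 0), 3 * nearest)
}

x <- c(0, 1, 2, 3)
y <- c(0, 0.1, 0.9, 1)
m <- c(0, 2, 2, 0)  # slopes that are too steep near the flat ends

f_raw <- stats::splinefunH(x, y, m)
f_adj <- stats::splinefunH(x, y, clamp_slopes(x, y, m))

grid <- seq(0, 3, length.out = 301)
any(diff(f_raw(grid)) < 0)  # the raw interpolant decreases somewhere
any(diff(f_adj(grid)) < 0)  # the clamped interpolant does not
```

Both interpolants pass through the same points; only the slopes at the knots differ.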
Approximate density function, CDF, or quantile function on the interior of provided quantiles by representing the distribution as a sum of a discrete part at any duplicated `qs` and a continuous part for which the CDF is estimated using a monotonic Hermite spline. See details for more.
spline_cdf(ps, qs, tail_dist, fn_type = c("d", "p", "q"), n_grid = 20)
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
tail_dist | name of parametric distribution for the tails |
fn_type | the type of function that is requested: |
n_grid | grid size to use when augmenting the input |
The CDF of the continuous part of the distribution is estimated using a monotonic degree 3 Hermite spline that interpolates the quantiles after subtracting the discrete distribution and renormalizing. In theory, an estimate of the quantile function could be obtained by directly inverting this spline. However, in practice, we have observed that this can suffer from numerical problems. Therefore, the default behavior of this function is to evaluate the "stage 1" CDF estimate corresponding to discrete point masses plus monotonic spline at a fine grid of points, and use the "stage 2" CDF estimate that linearly interpolates these points with steps at any duplicated q values. The quantile function estimate is obtained by inverting this "stage 2" CDF estimate. When the distribution is continuous, we can obtain an estimate of the PDF by differentiating the CDF estimate, resulting in a discontinuous "histogram density". The size of the grid can be specified with the `n_grid` argument. In settings where it is desirable to obtain a continuous density function, the "stage 1" CDF estimate can be used by setting `n_grid = NULL`.
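The two-stage construction described above can be sketched in base R for a purely continuous distribution. The sketch uses `stats::splinefun(method = "hyman")` as a stand-in for the stage 1 spline and places `n_grid` sub-intervals between each pair of adjacent quantiles; how the package lays out its grid is an assumption, and all function names are illustrative.

```r
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps)
n_grid <- 20

# "Stage 1": smooth monotone CDF estimate through the (q, p) pairs
stage1_cdf <- stats::splinefun(qs, ps, method = "hyman")

# Grid with n_grid sub-intervals between each pair of adjacent quantiles
grid_q <- unique(unlist(lapply(
  seq_len(length(qs) - 1),
  function(i) seq(qs[i], qs[i + 1], length.out = n_grid + 1)
)))
grid_p <- stage1_cdf(grid_q)

# "Stage 2": piecewise-linear CDF, inverted by swapping the roles
# of the grid coordinates
stage2_cdf <- stats::approxfun(grid_q, grid_p)
stage2_qf <- stats::approxfun(grid_p, grid_q)

# Derivative of a piecewise-linear CDF: a piecewise-constant
# "histogram density" (NA outside the interior)
slopes <- diff(grid_p) / diff(grid_q)
stage2_pdf <- function(x) {
  idx <- findInterval(x, grid_q, rightmost.closed = TRUE)
  idx[idx < 1 | idx > length(slopes)] <- NA
  slopes[idx]
}

stage2_qf(c(0.2, 0.5, 0.8))
stage2_pdf(c(-1, 0, 1))
```

Because the stage 2 CDF is piecewise linear, its inverse is available in closed form on each segment, which avoids numerically inverting the cubic spline.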
a function to evaluate the PDF, CDF, or quantile function.
Split ps and qs into those corresponding to discrete and continuous parts of a distribution.
split_disc_cont_ps_qs( ps, qs, dup_tol = 1e-06, zero_tol = 1e-12, is_hurdle = FALSE )
ps | vector of probability levels |
qs | vector of quantile values corresponding to `ps` |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
is_hurdle | boolean indicating whether or not this is a hurdle model. If so, `qs` of zero always indicate the presence of a point mass at 0. In this case, 0 is not included among the returned |
named list with the following entries:
`disc_weight`: estimated numeric weight of the discrete part of the distribution.
`disc_ps`: estimated probabilities of discrete components. May be `numeric(0)` if there are no estimated discrete components.
`disc_qs`: locations of discrete components, corresponding to duplicated values in the input `qs`. May be `numeric(0)` if there are no discrete components.
`cont_ps`: probability levels for the continuous part of the distribution.
`cont_qs`: quantile values for the continuous part of the distribution.
`disc_ps_range`: a list of length equal to the number of point masses in the discrete distribution. Each entry is a numeric vector of length two with the value of the CDF approaching the point mass from the left and from the right.
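To see why duplicated `qs` signal a point mass, note that if the same value `q` is the quantile at both a lower and a higher probability level, the CDF must jump at `q` by at least the difference between those levels. The base R sketch below computes that lower bound. It groups exactly equal `qs` rather than using a tolerance, and the package's own estimates of the discrete weights may differ.

```r
ps <- c(0.1, 0.2, 0.3, 0.5, 0.9)
qs <- c(0, 0, 0, 1.2, 3)  # the value 0 is repeated at three levels

# Smallest and largest probability level attached to each distinct q
p_lo <- tapply(ps, qs, min)
p_hi <- tapply(ps, qs, max)

# Lower bound on the size of the jump in the CDF at each distinct q
mass_lower_bound <- p_hi - p_lo

# Locations with a strictly positive bound have a point mass
disc_qs_sketch <- as.numeric(names(mass_lower_bound)[mass_lower_bound > 1e-12])
disc_qs_sketch
mass_lower_bound
```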
A factory that returns a function that performs linear interpolation, allowing for "steps" or discontinuities.
step_interp_factory(x, y, cont_dir = c("right", "left"), increasing = TRUE)
x | numeric vector with the "horizontal axis" coordinates of the points to interpolate. |
y | numeric vector with the "vertical axis" coordinates of the points to interpolate. |
cont_dir | at steps or discontinuities, the direction from which the function is continuous. This will be "right" for a CDF or "left" for a QF. |
increasing | boolean indicating whether the function is increasing or decreasing. Only used in the degenerate case where there is only one unique value of |
a function with argument `x` that performs linear approximation of the input data points.
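Linear interpolation with steps can be sketched in base R by encoding a jump as a repeated `x` value with two different `y` values and choosing the interpolation segment with `findInterval()`. This is an illustration under stated assumptions rather than the package's implementation: `step_interp_sketch` is a hypothetical name, `x` is assumed sorted, points outside the range of `x` return `NA`, and the `increasing` argument is not modeled.

```r
# Hypothetical factory: linear interpolation that is right- or
# left-continuous at repeated x values (steps).
step_interp_sketch <- function(x, y, cont_dir = c("right", "left")) {
  cont_dir <- match.arg(cont_dir)
  n <- length(x)
  function(x_new) {
    # "right": index of the last knot <= x_new
    # "left":  index of the last knot <  x_new
    i <- findInterval(x_new, x, left.open = (cont_dir == "left"))
    out <- rep(NA_real_, length(x_new))
    inside <- i >= 1 & i < n
    lo <- i[inside]
    w <- (x_new[inside] - x[lo]) / (x[lo + 1] - x[lo])
    out[inside] <- y[lo] + w * (y[lo + 1] - y[lo])
    # Endpoints that fall outside the half-open segments above
    out[i == n & x_new == x[n]] <- y[n]
    out[i == 0 & x_new == x[1]] <- y[1]
    out
  }
}

# A CDF-like curve with a jump from 0.1 to 0.3 at x = 0
x <- c(-1, 0, 0, 1)
y <- c(0, 0.1, 0.3, 0.5)
f_right <- step_interp_sketch(x, y, cont_dir = "right")
f_left <- step_interp_sketch(x, y, cont_dir = "left")
f_right(c(-0.5, 0, 0.5))
f_left(0)
```

At the step, the right-continuous version takes the upper value and the left-continuous version takes the lower one, matching the convention for a CDF and a quantile function respectively.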
Get unique values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as corresponding to a single unique value even if not all values in the run are within the tolerance.
unique_tol(x, tol = 1e-06, ties = mean)
x | a numeric vector in which to identify duplicates |
tol | numeric tolerance for identifying duplicates |
ties | a function that is used to summarize groups of values that fall within the tolerance |
a numeric vector of the unique values in x
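A minimal base R sketch of the grouping rule described above: consecutive values closer together than `tol` are chained into one group, and each group is summarized with `ties`. The helper `unique_tol_sketch` is hypothetical, assumes `x` is sorted, and is not the package's implementation.

```r
# Hypothetical helper: collapse runs of nearly equal sorted values
unique_tol_sketch <- function(x, tol = 1e-6, ties = mean) {
  if (length(x) == 0) {
    return(numeric(0))
  }
  # Start a new group whenever the gap to the previous value is >= tol
  group <- cumsum(c(TRUE, diff(x) >= tol))
  unname(vapply(split(x, group), ties, numeric(1)))
}

# The first three values chain into a single group even though the
# first and third differ by more than `tol`
x <- c(1, 1 + 6e-7, 1 + 1.2e-6, 2)
unique_tol_sketch(x)
unique_tol_sketch(x, ties = min)
```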