moremata
/
mf_mm_ecdf.hlp

{smcl}
{* 23oct2020}{...}
{cmd:help mata mm_ecdf()}
{hline}

{title:Title}

{p 4 17 2}
{bf:mm_ecdf() -- Cumulative distribution function}


{title:Syntax}

{p 8 24 2}
{it:real matrix}{bind:    }
{cmd:mm_ecdf(}{it:X} [{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:,} {it:break}]{cmd:)}

{p 8 24 2}
{it:real colvector}{bind: }
{cmd:_mm_ecdf(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:,} {it:break}]{cmd:)}

{p 8 24 2}
{it:real matrix}{bind:    }
{cmd:mm_ecdf2(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}]{cmd:)}

{p 8 24 2}
{it:real matrix}{bind:    }
{cmd:_mm_ecdf2(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}]{cmd:)}

{p 4 8 2}
where

{p 14 18 2}{it:X}:  {it:real matrix} containing data (rows are observations, columns variables)
{p_end}
{p 14 18 2}{it:x}:  {it:real colvector} containing data (single variable)
{p_end}
{p 14 18 2}{it:w}:  {it:real colvector} containing weights
{p_end}
{p 12 18 2}{it:mid}:  {it:real scalar} requesting midpoint adjustment
{p_end}
{p 9 18 2}{it:nonorm}:  {it:real scalar} requesting the absolute distribution
{p_end}
{p 10 18 2}{it:break}:  {it:real scalar} requesting that ties be broken


{title:Description}

{pstd}
{cmd:mm_ecdf()} returns the empirical cumulative distribution
function (e.c.d.f.) of each column of {it:X}. Observations with equal values
receive the same cumulative value, unless {it:break}!=0 is
specified. {cmd:mm_ecdf()} is implemented as a wrapper of {helpb mf_mm_ranks:mm_ranks()}.

{pstd}
Argument {it:w} specifies weights associated
with the observations (rows) in {it:X}. Omit {it:w} or specify {it:w} as 1 to
obtain unweighted results.

{pstd}
Argument {it:mid}!=0 applies midpoint adjustment. In this case, at each step in the
cumulative distribution, the value of the midpoint of the step is returned.

{pstd}
Argument {it:nonorm}!=0 returns the distribution in frequency units (absolute
cumulative distribution). The default is to normalize the distribution (i.e.,
to divide by the number of observations or sum of weights).

{pstd}
Argument {it:break}!=0 causes ties to be broken (in random order).

{pstd}
{cmd:_mm_ecdf()} is like {cmd:mm_ecdf()}, but assumes that the data has
already been sorted (and, consequently, only accepts a single column as data
input). If {it:break}!=0 is specified, ties will be split in order of their
appearance. That is, {cmd:_mm_ecdf()} takes the order of the data as given and
does not rerandomize the order of ties.

{pstd}
{cmd:mm_ecdf()} returns the value of the cumulative distribution at each
observation in the data. To obtain the cumulative distribution
at {it:unique} values of the data, use {cmd:mm_ecdf2()}. {cmd:mm_ecdf2()} will
return a matrix with the (sorted) unique values of {it:x} in the first column
and the corresponding values of the cumulative distribution in the second column.

{pstd}
{cmd:_mm_ecdf2()} is like {cmd:mm_ecdf2()}, but assumes that the data has
already been sorted.


{title:Examples}

    {com}: x = (2,1,3,2,2)'
    {res}
    {com}: w = 1
    {res}
    {com}: p = order(x,1)
    {res}
    {com}: // default vs. break!=0
    : x[p], mm_ecdf(x,w)[p], mm_ecdf(x,1,0,0,1)[p]
    {res}       {txt} 1    2    3
        {c TLC}{hline 16}{c TRC}
      1 {c |}  {res} 1   .2   .2{txt}  {c |}
      2 {c |}  {res} 2   .8   .6{txt}  {c |}
      3 {c |}  {res} 2   .8   .4{txt}  {c |}
      4 {c |}  {res} 2   .8   .8{txt}  {c |}
      5 {c |}  {res} 3    1    1{txt}  {c |}
        {c BLC}{hline 16}{c BRC}

    {com}: // default vs. mid!=0
    : x[p], mm_ecdf(x,w)[p], mm_ecdf(x,1,1)[p]
    {res}       {txt} 1    2    3
        {c TLC}{hline 16}{c TRC}
      1 {c |}  {res} 1   .2   .1{txt}  {c |}
      2 {c |}  {res} 2   .8   .5{txt}  {c |}
      3 {c |}  {res} 2   .8   .5{txt}  {c |}
      4 {c |}  {res} 2   .8   .5{txt}  {c |}
      5 {c |}  {res} 3    1   .9{txt}  {c |}
        {c BLC}{hline 16}{c BRC}

    {com}: // default vs. nonorm!=0
    : x[p], mm_ecdf(x,w)[p], mm_ecdf(x,1,0,1)[p]
    {res}       {txt} 1    2    3
        {c TLC}{hline 16}{c TRC}
      1 {c |}  {res} 1   .2    1{txt}  {c |}
      2 {c |}  {res} 2   .8    4{txt}  {c |}
      3 {c |}  {res} 2   .8    4{txt}  {c |}
      4 {c |}  {res} 2   .8    4{txt}  {c |}
      5 {c |}  {res} 3    1    5{txt}  {c |}
        {c BLC}{hline 16}{c BRC}

    {com}: // CDF at unique values
    : mm_ecdf2(x,w)
    {res}       {txt} 1    2
        {c TLC}{hline 11}{c TRC}
      1 {c |}  {res} 1   .2{txt}  {c |}
      2 {c |}  {res} 2   .8{txt}  {c |}
      3 {c |}  {res} 3    1{txt}  {c |}
        {c BLC}{hline 11}{c BRC}{txt}


{title:Conformability}

    {cmd:mm_ecdf(}{it:X}{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:,} {it:break}{cmd:)}:
             {it:X}:  {it:r x c}
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:mid}:  1 {it:x} 1
        {it:nonorm}:  1 {it:x} 1
         {it:break}:  1 {it:x} 1
        {it:result}:  {it:r x c}

    {cmd:_mm_ecdf(}{it:x}{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:,} {it:break}{cmd:)}:
             {it:x}:  {it:r x} 1
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:mid}:  1 {it:x} 1
        {it:nonorm}:  1 {it:x} 1
         {it:break}:  1 {it:x} 1
        {it:result}:  {it:r x} 1

    {cmd:mm_ecdf2(}{it:x}{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:)}, {cmd:_mm_ecdf2(}{it:x}{cmd:,} {it:w}{cmd:,} {it:mid}{cmd:,} {it:nonorm}{cmd:)}
             {it:x}:  {it:r1 x} 1
             {it:w}:  {it:r1 x} 1 or 1 {it:x} 1
           {it:mid}:  1 {it:x} 1
        {it:nonorm}:  1 {it:x} 1
        {it:result}:  {it:r2 x} 2, {it:r2}<={it:r1}


{title:Diagnostics}

{pstd}
The functions return missing values for the CDF if the weights contain missing
values. Missing values in {it:X} are ordered last (i.e., receive highest CDF values).


{title:Source code}

{pstd}
{help moremata_source##mm_ecdf:mm_ecdf.mata}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@soz.unibe.ch


{title:Also see}

{p 4 13 2}
Online:  help for {helpb cumul},
{helpb mf_mm_ranks:mm_ranks()},
{helpb mf_mm_relrank:mm_relrank()},
{helpb moremata}