diff options
| author | Shivesh Mandalia <shivesh.mandalia@outlook.com> | 2020-03-03 02:36:22 +0000 |
|---|---|---|
| committer | Shivesh Mandalia <shivesh.mandalia@outlook.com> | 2020-03-03 02:36:22 +0000 |
| commit | f18364ec6522466afbdd64bdf563a54c9e6e4db9 (patch) | |
| tree | 9bee7890ee256e999327813c00f0d80cd73b2c11 /docs/source | |
| parent | faadd4e4eb901f7a49128b3b7e6a0a3ffbcd4664 (diff) | |
| download | GolemFlavor-f18364ec6522466afbdd64bdf563a54c9e6e4db9.tar.gz GolemFlavor-f18364ec6522466afbdd64bdf563a54c9e6e4db9.zip | |
documentation
Diffstat (limited to 'docs/source')
| -rw-r--r-- | docs/source/_static/cascade.jpg | bin | 0 -> 67923 bytes | |||
| -rw-r--r-- | docs/source/_static/fr.png | bin | 0 -> 126787 bytes | |||
| -rw-r--r-- | docs/source/_static/icecube.png | bin | 0 -> 215856 bytes | |||
| -rw-r--r-- | docs/source/_static/jeffreys.png | bin | 0 -> 62949 bytes | |||
| -rw-r--r-- | docs/source/_static/mh.png | bin | 0 -> 32955 bytes | |||
| -rw-r--r-- | docs/source/_static/mixing.png | bin | 0 -> 17166 bytes | |||
| -rw-r--r-- | docs/source/_static/nu_messengers.png | bin | 0 -> 131814 bytes | |||
| -rw-r--r-- | docs/source/_static/osc_params.png | bin | 0 -> 39083 bytes | |||
| -rw-r--r-- | docs/source/_static/track.jpg | bin | 0 -> 66642 bytes | |||
| -rw-r--r-- | docs/source/_templates/footer.html | 53 | ||||
| -rw-r--r-- | docs/source/api.rst | 46 | ||||
| -rw-r--r-- | docs/source/examples.rst | 9 | ||||
| -rw-r--r-- | docs/source/index.rst | 2 | ||||
| -rw-r--r-- | docs/source/installation.rst | 48 | ||||
| l--------- | docs/source/notebooks/inference.ipynb | 1 | ||||
| l--------- | docs/source/notebooks/tutorial.ipynb | 1 | ||||
| -rw-r--r-- | docs/source/overview.rst | 34 | ||||
| -rw-r--r-- | docs/source/physics.rst | 437 | ||||
| -rw-r--r-- | docs/source/statistics.rst | 423 |
19 files changed, 1050 insertions, 4 deletions
diff --git a/docs/source/_static/cascade.jpg b/docs/source/_static/cascade.jpg Binary files differ new file mode 100644 index 0000000..e33b574 --- /dev/null +++ b/docs/source/_static/cascade.jpg diff --git a/docs/source/_static/fr.png b/docs/source/_static/fr.png Binary files differ new file mode 100644 index 0000000..dada2d9 --- /dev/null +++ b/docs/source/_static/fr.png diff --git a/docs/source/_static/icecube.png b/docs/source/_static/icecube.png Binary files differ new file mode 100644 index 0000000..07bf278 --- /dev/null +++ b/docs/source/_static/icecube.png diff --git a/docs/source/_static/jeffreys.png b/docs/source/_static/jeffreys.png Binary files differ new file mode 100644 index 0000000..a0cea17 --- /dev/null +++ b/docs/source/_static/jeffreys.png diff --git a/docs/source/_static/mh.png b/docs/source/_static/mh.png Binary files differ new file mode 100644 index 0000000..e715f47 --- /dev/null +++ b/docs/source/_static/mh.png diff --git a/docs/source/_static/mixing.png b/docs/source/_static/mixing.png Binary files differ new file mode 100644 index 0000000..cc9a9a4 --- /dev/null +++ b/docs/source/_static/mixing.png diff --git a/docs/source/_static/nu_messengers.png b/docs/source/_static/nu_messengers.png Binary files differ new file mode 100644 index 0000000..c32cfa1 --- /dev/null +++ b/docs/source/_static/nu_messengers.png diff --git a/docs/source/_static/osc_params.png b/docs/source/_static/osc_params.png Binary files differ new file mode 100644 index 0000000..94b35f1 --- /dev/null +++ b/docs/source/_static/osc_params.png diff --git a/docs/source/_static/track.jpg b/docs/source/_static/track.jpg Binary files differ new file mode 100644 index 0000000..6ac5f74 --- /dev/null +++ b/docs/source/_static/track.jpg diff --git a/docs/source/_templates/footer.html b/docs/source/_templates/footer.html new file mode 100644 index 0000000..554e08e --- /dev/null +++ b/docs/source/_templates/footer.html @@ -0,0 +1,53 @@ +<footer> + {% if (theme_prev_next_buttons_location == 
'bottom' or theme_prev_next_buttons_location == 'both') and (next or prev) %} + <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation"> + {% if next %} + <a href="{{ next.link|e }}" class="btn btn-neutral float-right" title="{{ next.title|striptags|e }}" accesskey="n" rel="next">{{ _('Next') }} <span class="fa fa-arrow-circle-right"></span></a> + {% endif %} + {% if prev %} + <a href="{{ prev.link|e }}" class="btn btn-neutral float-left" title="{{ prev.title|striptags|e }}" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> {{ _('Previous') }}</a> + {% endif %} + </div> + {% endif %} + + <hr/> + + <div role="contentinfo"> + <p> + {%- if show_copyright %} + {%- if hasdoc('copyright') %} + {% trans path=pathto('copyright'), copyright=copyright|e %}© <a href="{{ path }}">Copyright</a> {{ copyright }} <a href="https://shivesh.org">https://shivesh.org</a>{% endtrans %} + {%- else %} + {% trans copyright=copyright|e %}© Copyright {{ copyright }} <a href="https://shivesh.org">https://shivesh.org</a>{% endtrans %} + {%- endif %} + {%- endif %} + + {%- if build_id and build_url %} + {% trans build_url=build_url, build_id=build_id %} + <span class="build"> + Build + <a href="{{ build_url }}">{{ build_id }}</a>. + </span> + {% endtrans %} + {%- elif commit %} + {% trans commit=commit %} + <span class="commit"> + Revision <code>{{ commit }}</code>. + </span> + {% endtrans %} + {%- elif last_updated %} + <span class="lastupdated"> + {% trans last_updated=last_updated|e %}Last updated on {{ last_updated }}.{% endtrans %} + </span> + {%- endif %} + + </p> + </div> + + {%- if show_sphinx %} + {% trans %}Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>{% endtrans %}. 
+ {%- endif %} + + {%- block extrafooter %} {% endblock %} + +</footer> diff --git a/docs/source/api.rst b/docs/source/api.rst index c9a5f05..94a6b9e 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -2,6 +2,46 @@ :github_url: https://github.com/ShiveshM/GolemFlavor -*** -API -*** +*************** +GolemFlavor API +*************** + +Emcee hooks +----------- +.. automodule:: golemflavor.mcmc + :members: + +Enumerations +------------ +.. automodule:: golemflavor.enums + :members: + +Flavor Functions +---------------- +.. automodule:: golemflavor.fr + :members: + +GolemFit Hooks +-------------- +.. automodule:: golemflavor.gf + :members: + +Likelihood +---------- +.. automodule:: golemflavor.llh + :members: + +Miscellaneous +------------- +.. automodule:: golemflavor.misc + :members: + +Parameter class +--------------- +.. automodule:: golemflavor.param + :members: + +Visualization +------------- +.. automodule:: golemflavor.plot + :members: diff --git a/docs/source/examples.rst b/docs/source/examples.rst index df8023f..3ae9cef 100644 --- a/docs/source/examples.rst +++ b/docs/source/examples.rst @@ -5,3 +5,12 @@ ******** Examples ******** + +.. toctree:: + :maxdepth: 1 + + Basic Tutorial <notebooks/tutorial.ipynb> + Bayesian Inference <notebooks/inference.ipynb> + + +The above links lead to example notebooks. diff --git a/docs/source/index.rst b/docs/source/index.rst index 84454ff..39a478d 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -10,7 +10,7 @@ GolemFlavor GolemFlavor is a Python package for running a Bayesian inference analysis pipeline using Astrophysical Flavor data taken at the `IceCube Neutrino -Obervatory <https://icecube.wisc.edu/>`_ +Obervatory <https://icecube.wisc.edu/>`_. .. 
toctree:: :maxdepth: 1 diff --git a/docs/source/installation.rst b/docs/source/installation.rst index 51a9e01..99841fa 100644 --- a/docs/source/installation.rst +++ b/docs/source/installation.rst @@ -5,3 +5,51 @@ ************ Installation ************ + +--- +Pip +--- + +GolemFlavor can be installed using ``pip``: + +.. code-block:: bash + + pip install git+https://github.com/ShiveshM/GolemFlavor.git + +This installs GolemFlavor, along with all the necessary dependencies such as +NumPy and SciPy. + +GolemFlavor uses the IceCube software `GolemFit: The HESE +fitter <https://github.com/IceCubeOpenSource/GolemFit>`_ to fit with IceCube +Astrophysical Flavor data. This software is proprietary and so access is +currently limited to IceCube collaborators. A simple Gaussian likelihood can be +used as a substitute for test purposes if this requirement is not found. + +------------ +Dependencies +------------ + +GolemFlavor has the following dependencies: + +- `Python <https://www.python.org/>`_ >= 2.7 or >= 3.4 +- `NumPy <http://www.numpy.org/>`_ +- `SciPy <https://www.scipy.org/>`_ +- `Six <https://six.readthedocs.io/>`_ +- `mpmath <http://mpmath.org/>`_ +- `emcee <https://emcee.readthedocs.io/en/stable/>`_ +- `PyMultiNest <https://johannesbuchner.github.io/PyMultiNest/>`_ +- `tqdm <https://tqdm.github.io/>`_ +- `Shapely <https://shapely.readthedocs.io/en/latest/manual.html>`_ +- `Matplotlib <https://matplotlib.org/>`_ +- `python-ternary <https://github.com/marcharper/python-ternary>`_ +- `GetDist <https://getdist.readthedocs.io/en/latest/>`_ + +You can use ``pip`` to install the above automatically. Note that ``PyMultiNest`` +requires the ``MultiNest`` Bayesian inference library; see the `PyMultiNest +documentation +<https://johannesbuchner.github.io/PyMultiNest/install.html#prerequisites-for-building-the-libraries>`_ +for install instructions. 
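Since GolemFit is proprietary, scripts built on GolemFlavor may want to detect whether it is importable and fall back to the Gaussian-likelihood substitute mentioned above. A minimal sketch of such a guard; the ``golemflavor.gf`` module path follows the API listing in these docs, but the flag and function names here are purely illustrative, not part of the package:

```python
# Hypothetical guard: fall back to a simple Gaussian likelihood when the
# proprietary GolemFit hooks are not importable.  The module path
# `golemflavor.gf` follows the API listing; everything else is illustrative.
try:
    import golemflavor.gf  # noqa: F401  (requires proprietary GolemFit)
    HAVE_GOLEMFIT = True
except ImportError:
    HAVE_GOLEMFIT = False

def choose_likelihood():
    """Return a label for the likelihood backend to use."""
    return "golemfit" if HAVE_GOLEMFIT else "gaussian"
```

On a system without GolemFit this selects the Gaussian substitute rather than failing at import time.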
+ +Additional dependencies: + +- `GolemFit <https://github.com/IceCubeOpenSource/GolemFit>`_ diff --git a/docs/source/notebooks/inference.ipynb b/docs/source/notebooks/inference.ipynb new file mode 120000 index 0000000..3caf6c6 --- /dev/null +++ b/docs/source/notebooks/inference.ipynb @@ -0,0 +1 @@ +../../../examples/inference.ipynb
\ No newline at end of file diff --git a/docs/source/notebooks/tutorial.ipynb b/docs/source/notebooks/tutorial.ipynb new file mode 120000 index 0000000..1a9beb5 --- /dev/null +++ b/docs/source/notebooks/tutorial.ipynb @@ -0,0 +1 @@ +../../../examples/tutorial.ipynb
\ No newline at end of file diff --git a/docs/source/overview.rst b/docs/source/overview.rst index e09628e..65e1871 100644 --- a/docs/source/overview.rst +++ b/docs/source/overview.rst @@ -5,3 +5,37 @@ ******** Overview ******** + +---------------------------------- +What is Astrophysical Flavor data? +---------------------------------- + +This is data of the *flavor* of a neutrino taken at the `IceCube neutrino +observatory <https://icecube.wisc.edu/>`_, which is a cubic kilometer array of +optical sensors embedded in the glacial ice at the South Pole. In particular, +*astrophysical* neutrinos are ones that are very-high-energy and come from +astrophysical origins such as `active galactic nuclei +<https://doi.org/10.1126/science.aat2890>`_. For more on the physics behind +neutrinos see the :doc:`physics` section. + +------------------------------------- +What does the GolemFlavor package do? +------------------------------------- + +This package provides utilities for astrophysical neutrino propagation and +Bayesian statistical modeling focused on advanced Markov Chain Monte Carlo +(MCMC) algorithms. It has been used to make constraints on New Physics models +in the Astrophysical Flavor, as motivated by the paper `*New Physics in +Astrophysical Neutrino Flavor* +<https://doi.org/10.1103/PhysRevLett.115.161303>`_. For more information on +the statistical modeling see the :doc:`statistics` section. + +-------- +Features +-------- + +- **Portable Flavor Functions**: A set of useful functions for calculating measured flavor compositions given a source composition and a mixing matrix. +- **MCMC Algorithms**: Affine invariant and nested sampling algorithms provided by `emcee <https://emcee.readthedocs.io/>`_ and `MultiNest <https://doi.org/10.1111/j.1365-2966.2009.14548.x>`_. 
+- **Anarchic Sampling**: Sampling of the neutrino mixing matrix is done under the `*neutrino mixing anarchy* <https://doi.org/10.1016/j.physletb.2003.08.045>`_ hypothesis to ensure an unbiased prior. +- **Distributed and parallel computing**: Scripts included to manage the workload across a CPU cluster using `HTCondor <https://research.cs.wisc.edu/htcondor/>`_. +- **Visualization**: Produce ternary plots of the flavour composition using the `python-ternary <https://zenodo.org/badge/latestdoi/19505/marcharper/python-ternary>`_ package and joint posterior plots for analyzing MCMC chains using the `getdist <https://getdist.readthedocs.io/en/latest/>`_ package. diff --git a/docs/source/physics.rst b/docs/source/physics.rst index 528215d..ed603f2 100644 --- a/docs/source/physics.rst +++ b/docs/source/physics.rst @@ -5,3 +5,440 @@ ******* Physics ******* + +Neutrinos +========= + +Introduction +------------ +The name *neutrino* was coined to describe a then hypothetical particle +suggested by Wolfgang Pauli in the 1930s [1]_. Pauli's idea was that of a +neutral, weakly interacting particle having a very small mass. He proposed that this +particle could be used to explain the spectrum of electrons emitted by +:math:`\beta`-decays of atomic nuclei, which was a matter of considerable controversy at +the time. The observed spectrum of these electrons is continuous [2]_, which is +incompatible with the two-body decay description successfully used to describe +discrete spectral lines in :math:`\alpha`- and :math:`\gamma`-decay of atomic +nuclei. As Pauli observed, by describing :math:`\beta`-decay as a three-body +decay instead, releasing both an electron and his proposed particle, one can +explain the continuous :math:`\beta`-decay spectrum. 
Soon after the discovery of +the neutron [3]_, Enrico Fermi used Pauli's light neutral particle as an +essential ingredient in his successful theory of :math:`\beta`-decay, giving +the neutrino its name as a play on words of *little neutron* in Italian [4]_. + +It was not until some 20 years later that the neutrino was finally +detected. It was eventually understood that neutrinos came in three distinct +*flavours* :math:`\left (\nu_e,\nu_\mu,\nu_\tau\right )` along with their +associated antiparticles :math:`\left +(\bar{\nu}_e,\bar{\nu}_\mu,\bar{\nu}_\tau\right)`. + +Neutrino Mixing +--------------- +For the three massive neutrinos, the flavour eigenstates of the neutrino +:math:`\mid{\nu_\alpha}>`, :math:`\alpha\in\{e,\mu,\tau\}`, are related to the +mass eigenstates :math:`\mid{\nu_i}>`, :math:`i\in\{1,2,3\}` via a unitary +mixing matrix :math:`U_{\alpha i}` known as the PMNS matrix [5]_, [6]_: + +.. math:: + + \mid{\nu_\alpha}>=\sum^3_{i=1}U^*_{\alpha i}\mid{\nu_i}> + +This relationship is illustrated in the figure below: + +.. figure:: _static/mixing.png + :width: 500px + :align: center + + Graphical representation of the relationship between the neutrino flavour and + mass eigenstates. The three mass eigenstates are depicted as three boxes, + coloured such that the relative area gives the probability of finding the + corresponding flavour neutrino in that given mass state. + +The time evolution of the flavour eigenstate as the neutrino propagates is +given by: + +.. math:: + + \mid{\nu_\alpha\left(t\right)}>= + \sum^3_{i=1}U^*_{\alpha i}\mid{\nu_i\left(t\right)}> + +The oscillation probability gives the probability that a neutrino produced in a +flavour state :math:`\alpha` is then detected in a flavour state :math:`\beta` +after a propagation distance :math:`L`: + +.. 
math:: + + \begin{align} + P_{\nu_\alpha\rightarrow\nu_\beta}\left(L\right) &= + \mid{<{\nu_\beta\left(L\right)}|{\nu_\alpha\left(0\right)}>}\mid^2\\ + &=\mid{\sum_{i=1}^3\sum_{j=1}^3<{\nu_j\left(L\right)}| + U_{\beta j}U_{\alpha i}^*|{\nu_i\left(0\right)}>}\mid^2\\ + &=\mid{\sum_{i=1}^3\sum_{j=1}^3U_{\beta j}U_{\alpha i}^* + <{\nu_j\left(0\right)}|e^{-i\frac{m_j^2L}{2E}} + e^{-i\frac{m_i^20}{2E}}|{\nu_i\left(0\right)}>}\mid^2\\ + &=\mid{\sum_{i=1}^3U_{\beta i}U_{\alpha i}^* + e^{-i\frac{m_i^2L}{2E}}}\mid^2 + \end{align} + +where the relation :math:`<{\nu_i}|{\nu_j}>=\delta_{ij}` was used. Expanding +this expression gives [7]_: + +.. math:: + + \begin{align} + P_{\nu_\alpha\rightarrow\nu_\beta}\left(L, E\right) = + \delta_{\alpha\beta}&-4\sum_{i>j}\text{Re}\left(U_{\alpha i}^*U_{\beta i} + U_{\alpha j}U_{\beta j}^*\right)\sin^2 + \left(\frac{\Delta m^2_{ij}L}{4E}\right)\\ + &+2\sum_{i>j}\text{Im}\left(U_{\alpha i}^*U_{\beta i}U_{\alpha j} + U_{\beta j}^*\right)\sin\left(\frac{\Delta m^2_{ij}L}{2E}\right) + \end{align} + +where :math:`\Delta m_{ij}^2=m_i^2-m_j^2`. Note that for neutrino oscillations +to occur, there must be at least one non-zero :math:`\Delta m_{ij}^2` and +therefore there must exist at least one non-zero neutrino mass state. + +The mixing matrix can be parameterised using the standard factorisation [8]_: + +.. math:: + + \begin{align} + U= + \begin{pmatrix} + 1 & 0 & 0 \\ + 0 & c_{23} & s_{23} \\ + 0 & -s_{23} & c_{23} \\ + \end{pmatrix} + \begin{pmatrix} + c_{13} & 0 & s_{13}e^{-i\delta} \\ + 0 & 1 & 0 \\ + -s_{13}e^{i\delta} & 0 &c_{13} \\ + \end{pmatrix} + \begin{pmatrix} + c_{12} & s_{12} & 0 \\ + -s_{12} & c_{12} & 0 \\ + 0 & 0 & 1 \\ + \end{pmatrix} + \end{align} + +where :math:`s_{ij}\equiv\sin\theta_{ij}`, :math:`c_{ij}\equiv\cos\theta_{ij}`, +:math:`\theta_{ij}` are the three mixing angles and :math:`\delta` is the CP +violating phase. 
Overall phases in the mixing matrix do not affect neutrino +oscillations, which only depend on quartic products, and so they have been +omitted. Therefore, this gives a total of six independent free parameters +describing neutrino oscillations for three neutrino flavours in a vacuum. This +table outlines the current knowledge of these parameters determined by a fit to +global data [9]_: + +.. figure:: _static/osc_params.png + :width: 500px + :align: center + + Three neutrino flavour oscillation parameters from a fit to global data + [9]_. + +This table shows two columns of values, *normal ordering* and *inverted +ordering*, corresponding to the cases where the mass of :math:`\nu_3` is greater +than the mass of :math:`\nu_1` and vice versa, respectively. The experimental +determination of this mass ordering is ongoing. + +Astrophysical Neutrinos +----------------------- +The origin and acceleration mechanism of ultra-high-energy cosmic rays is still +unknown. The difficulty comes from the fact that the cosmic rays are bent by +interstellar magnetic fields, and so their arrival direction on Earth does not +point back to their sources. The observation of these ultra-high-energy cosmic +rays supports the existence of neutrino production in a similar energy range at +their sources: an astrophysical neutrino flux. Neutrinos are electrically +neutral, so are not perturbed by interstellar magnetic fields, and they also +have a small enough interaction cross-section to escape from dense regions. +This makes them ideal messengers to help identify the sources of cosmic rays: + +.. figure:: _static/nu_messengers.png + :align: center + + Neutrinos as messengers of astrophysical objects. Exotic astrophysical + objects produce high-energy cosmic rays, photons and neutrinos, which can be + detected on Earth. Credit: IceCube, NSF. 
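The standard factorisation of the mixing matrix given earlier can be written out directly in code. Below is a minimal pure-Python sketch that builds the matrix from the three mixing angles and the CP-violating phase and verifies its unitarity; the angle values used here are rough placeholders for illustration, not the global-fit results:

```python
import cmath
import math

def pmns_matrix(theta12, theta13, theta23, delta):
    """Build the 3x3 PMNS matrix from the standard factorisation
    U = R23 * U13(delta) * R12, with angles in radians."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    s23, c23 = math.sin(theta23), math.cos(theta23)

    r23 = [[1, 0, 0], [0, c23, s23], [0, -s23, c23]]
    u13 = [[c13, 0, s13 * cmath.exp(-1j * delta)],
           [0, 1, 0],
           [-s13 * cmath.exp(1j * delta), 0, c13]]
    r12 = [[c12, s12, 0], [-s12, c12, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3))
                 for j in range(3)] for i in range(3)]

    return matmul(matmul(r23, u13), r12)

# Placeholder angle values (radians), chosen only for illustration.
U = pmns_matrix(0.59, 0.15, 0.84, 3.86)

# Unitarity check: U U^dagger should be the identity matrix.
for i in range(3):
    for j in range(3):
        entry = sum(U[i][k] * U[j][k].conjugate() for k in range(3))
        assert abs(entry - (1 if i == j else 0)) < 1e-12
```

Since each factor is a unitary rotation, the product is unitary by construction, which the final loop confirms numerically.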
+ +Ultra-high-energy cosmic rays detected on Earth manifestly succeed in +escaping their sources; these sources must therefore be optically thin +compared to the Earth's atmosphere. Thus, the following interactions of the +accelerated protons are expected to be more important than lengthy shower +processes. High-energy protons can interact with photons as follows: + +.. math:: + + p+\gamma\rightarrow\Delta^+\rightarrow + \begin{cases} + p+\pi^0\\ + n+\pi^+ + \end{cases} + + +They can also interact with other hadrons: + +.. math:: + + p+p\rightarrow + \begin{cases} + p+p+\pi^0\\ + p+n+\pi^+ + \end{cases}\\ + p+n\rightarrow + \begin{cases} + p+n+\pi^0\\ + p+p+\pi^- + \end{cases} + +Importantly, the final states here tend to produce pions, which, if neutral, +decay into photons, :math:`\pi^0\rightarrow\gamma\gamma`, and, if charged, +decay into charged leptons and neutrinos. The neutral and charged pions are +produced in similar amounts, meaning that the neutrino and photon fluxes are +related. Indeed, the diffuse astrophysical neutrino flux can be estimated +through :math:`\gamma`-ray astronomy [10]_. + +Point source searches of neutrinos are also being pursued. In 2017, a +multi-messenger approach, which searched for :math:`\gamma`-ray observations in +coincidence with neutrinos coming from a particular source, successfully +identified, for the very first time, a source of high-energy +astrophysical neutrinos [11]_, [12]_. + +Of particular interest is the composition of flavours produced at the source. +In the simple pion decay model described above, the *neutrino flavour +composition* (sometimes referred to as the *neutrino flavour ratio*) +produced at the source is: + +.. 
math:: + + \pi\text{ decay}\rightarrow + \left(f_e:f_\mu:f_\tau\right)_\text{S}=\left(1:2:0\right)_\text{S} + +For all discussions on the astrophysical neutrino flavour composition, the +neutrino and antineutrino fluxes will be summed over as it is not yet +experimentally possible to distinguish between the two. In the case that the +muon interacts in the source before it has a chance to decay, e.g. losing +energy rapidly in strong magnetic fields or being absorbed in matter, only the +:math:`\nu_\mu` from the initial pion decay escapes and so the source flavour +composition is simply: + +.. math:: + \mu\text{ suppressed }\rightarrow + \left(f_e:f_\mu:f_\tau\right)_\text{S}=\left(0:1:0\right)_\text{S} + +Another popular model is one in which the produced flux is dominated by neutron +decay, :math:`n\rightarrow p+e^-+\bar{\nu}_e`, which gives rise to a purely +:math:`\nu_e` component: + +.. math:: + + n\text{ decay}\rightarrow + \left(f_e:f_\mu:f_\tau\right)_\text{S}=\left(1:0:0\right)_\text{S} + +Production of :math:`\nu_\tau` at the source is not expected in standard +astrophysics models. However, even in the standard construction, the +composition could vary between any of the three idealised models above, which +can be represented as a source flavour composition of :math:`(x:1-x:0)`, where +:math:`x` is the fraction of :math:`\nu_e` and can vary from +:math:`0` to :math:`1`. + +Once the neutrinos escape the source, they are free to propagate in the vacuum. +As discussed above, neutrinos can transform from one flavour to another. +Astrophysical neutrinos have :math:`\mathcal{O}(\text{Mpc})` or higher +baselines, large enough that the mass eigenstates completely decouple. The +astrophysical neutrinos detected on Earth are decoherent and are propagating in +pure mass eigenstates. 
Taking this assumption greatly simplifies the transition +probability as all the interference terms between the three mass eigenstates +can be dropped, and all that is left is to convert from the propagating mass +state to the flavour states: + +.. math:: + + \phi_{i,\oplus}&=\sum_\alpha\phi_{\alpha,\text{S}}\mid{U_{\alpha i}}\mid^2\\ + \phi_{\alpha,\oplus}&=\sum_{i,\beta} + \mid{U_{\alpha i}}\mid^2\mid{U_{\beta i}}\mid^2\phi_{\beta,\text{S}} + +where :math:`\phi_\alpha` is the flux for a neutrino flavour :math:`\nu_\alpha` +and :math:`\phi_i` is the flux for a neutrino mass state :math:`\nu_i`. The +subscript :math:`\text{S}` denotes the source and :math:`\oplus` denotes as +measured on Earth. The same result can be obtained in the plane wave picture of +the neutrino mixing equations above by taking the limit +:math:`L\rightarrow\infty`; thus this type of decoherent mixing is also known +as oscillation-averaged neutrino mixing. From this, the flavour composition on +Earth is defined as +:math:`f_{\alpha,\oplus}=\phi_{\alpha,\oplus}/\sum_\alpha\phi_{\alpha,\oplus}` +and this can be calculated using the mixing matrix parameters in the table above. +For the three source models discussed above: + +.. math:: + + \begin{align} + \left(1:2:0\right)_\text{S}&\rightarrow\left(0.31:0.35:0.34\right)_\oplus\\ + \left(0:1:0\right)_\text{S}&\rightarrow\left(0.18:0.44:0.38\right)_\oplus\\ + \left(1:0:0\right)_\text{S}&\rightarrow\left(0.55:0.18:0.27\right)_\oplus + \end{align} + +This can be visualised in a ternary plot, which you can make yourself by +checking out the :doc:`examples` section! The axes here are the fraction of +each neutrino flavour as shown below. The coloured circle, square and triangle +show the source flavour compositions. The arrows show the effect of neutrino +mixing on the flavour composition. The unfilled circle, square and triangle +show the corresponding measured flavour composition. 
Neutrino mixing during +propagation has the effect of averaging out the flavour contributions, which is +why the arrows point towards the centre of the triangle. This effect is more +pronounced for :math:`\nu_\mu\leftrightarrow\nu_\tau` due to their larger +mixing. Also shown on this figure, in the hatched *Standard Model* area, is the +region of measured flavour compositions containing all source models of +:math:`\left(x:1-x:0\right)`, using Gaussian priors on the standard mixing +angles. Therefore, this hatched area is the region in which all standard +astrophysical models live. + +.. figure:: _static/fr.png + :width: 700px + :align: center + + Astrophysical neutrino flavour composition ternary plot. Axes show the + fraction of each neutrino flavour. Coloured shapes show three models for the + source flavour composition. The arrows indicate the effect of neutrino mixing + during propagation and the unfilled shapes show the corresponding measured + flavour compositions. The hatched area shows the region in measured flavour + space in which all standard astrophysical models live. + +IceCube +======= + +Introduction +------------ +The `IceCube Neutrino Observatory <https://icecube.wisc.edu/>`_ is a cubic +kilometre photomultiplier array embedded in the extremely thick and clear +glacial ice located near the geographic South Pole in Antarctica. The IceCube +array is made up of thousands of purpose-built *Digital Optical Modules* (DOMs) +deployed deep within the ice. The +interaction of a neutrino releases a burst of light in the detector, which is +detected by this array of DOMs. The timing and intensity of these photons form +the raw data set at IceCube. This data is analysed so that we can learn more +about the properties of neutrinos. You can check out some cool animations of what +an event looks like in IceCube on `this website +<https://www.nsf.gov/news/mmg/mmg_disp.jsp?med_id=184062>`_. 
+ +A schematic layout of IceCube is shown below. The IceCube *In-Ice* array is +made up of 5160 purpose-built *Digital Optical Modules* (DOMs) which are +deployed on 86 *strings* (or cables) between 1450 and 2450 m below the ground. +The inner string separation is 125 m with a vertical DOM separation of 17 m. +Eight of the centrally located strings make up the subarray *DeepCore*, which +is sensitive to lower-energy neutrinos. It achieves this through denser +instrumentation, having an inner string separation of 60 m and a vertical DOM +separation of 7 m. An air shower array, IceTop, is instrumented on the +surface and consists of a set of frozen water tanks which act as a veto against +background cosmic rays. + +.. figure:: _static/icecube.png + :width: 600px + :align: center + + The IceCube neutrino observatory with the In-Ice array, its subarray DeepCore + and the cosmic ray shower array IceTop. + +Event Signatures +---------------- +Cherenkov telescope arrays such as IceCube are able to classify the properties +of a neutrino event by looking at the morphology of photon hits across its PMT +array. There are two main types of neutrino event signatures at IceCube: +*tracks* and *cascades*. + +Tracks are predominantly made by muons, which are directly produced by +neutrinos in the *charged current* :math:`\nu_\mu` interaction channel. Muons +have a long lifetime, :math:`\sim` 2 :math:`\mu\text{s}` at rest, and in ice they +have relatively low energy losses. Therefore, as shown in the figure below, +a high-energy muon travelling through the IceCube array will leave a long trail +of hits. These features, along with the timing information of hits across the +DOMs, help in determining the directionality of the muon, giving an angular +resolution typically around :math:`0.5-1^\circ` [13]_. At energies of concern +here, there is little deviation between the direction of the neutrino and the +induced muon as they are both heavily boosted. 
Therefore, this pointing ability +of tracks makes them the most attractive events to use for point source +searches. Energy reconstruction is more complicated, however. At the lower +energies (:math:`\lesssim100` GeV), the muon's range is short enough that it is +able to deposit all its energy inside the detector. This is the ideal situation +for a good energy reconstruction as the IceCube array acts as a calorimeter, so +the total deposited charge is proportional to the energy of the muon. At higher +energies, the range of the muon is typically greater than the length of the +detector. Therefore the energy of the muon must be extrapolated from the portion of +energy deposited inside the detector. This is particularly challenging for +muons which are not produced inside the detector, for which only a lower bound +can be set. The typical approach taken to reconstruct the muon is to segment +the reconstruction along the track. In this way, biases from stochastic +variations of the energy loss can be minimised by applying some averaging over +each segment. In each segment, the mean :math:`\text{d} E/\text{d} x` is +determined, which is then roughly proportional to the muon momentum. The energy +resolution improves with the muon energy up to an uncertainty of a factor of 2 +[14]_. For more details on this see the IceCube energy reconstruction methods +publication [15]_. + +.. figure:: _static/track.jpg + :width: 700px + :align: center + + A track event initiated by a CC muon neutrino interaction in the detector. + The muon deposits 74 TeV before escaping. + +Cascades are created by hadronic and/or electromagnetic (EM) showers. +*Neutral current* interactions and CC :math:`\nu_e` interactions are the +channels in which a pure cascade is created, and an example of one is shown in +the figure below. 
However, this does not mean that neutrino events produce +exclusively one type of signature; in fact, all high-energy neutrino-nucleon +events produce at least a hadronic cascade at the interaction vertex. +Characteristic of a cascade is the isotropic deposition of energy in a +localised region near the neutrino vertex. Contrary to tracks, cascade events +have much shorter typical lengths and so the entire energy deposition is easily +contained within the detector array. This is ideal for energy reconstruction +giving a deposited energy resolution of :math:`\sim` 15% at neutrino energies +above 10 TeV [15]_. Inferring the true neutrino energy is more difficult, +however, as IceCube is not capable of resolving the difference between EM +showers and hadronic showers, which potentially have a large amount of missing +energy, leading to :math:`\sim` 15% lower light yield compared to an equivalent +EM shower. Deposited energy is reconstructed using an EM shower hypothesis, and +therefore this quantity gives a lower limit on the neutrino energy. +Directional reconstruction is more challenging than for tracks and is done by +looking for timing/light intensity anisotropies around the interaction vertex. +The deviations are small, but it is expected that the light deposition in the +forward direction is greater. Typical angular resolutions are +:math:`10-15^\circ` [16]_. + +.. figure:: _static/cascade.jpg + :width: 700px + :align: center + + A cascade event initiated by a neutrino interaction in the detector. The + cascade deposits 1070 TeV in the detector. + +Not mentioned so far are the charged current :math:`\nu_\tau` interactions, +which, for energies :math:`\gtrsim` 1 PeV, can produce a :math:`\tau` that +travels a detectable distance before decaying. This provides a unique signature +for such events. 
The initial :math:`\nu_\tau` interaction produces a hadronic cascade, followed
by a track from the :math:`\tau` itself, in turn followed by either a track
from the :math:`\tau`'s muonic decay
(:math:`\tau^-\rightarrow\mu^-\bar{\nu}_\mu\nu_\tau` with branching ratio
:math:`\sim` 17%) or a cascade from its other decays. Because of their
distinctive signatures, such events are called *double bangs* or *double
cascades*. See [17]_, [18]_ for more details.

.. [1] Pauli, W. *Letter to Tübingen conference participants*. Web document. 1930.
.. [2] Chadwick, J. Intensitätsverteilung im magnetischen Spektrum der :math:`\beta`-Strahlen von Radium B + C. Verhandl. Dtsc. Phys. Ges. 16, 383 (1914).
.. [3] Chadwick, J. Possible Existence of a Neutron. Nature 129, 312 (1932).
.. [4] Fermi, E. An attempt of a theory of beta radiation. 1. Z. Phys. 88, 161–177 (1934).
.. [5] Pontecorvo, B. Neutrino Experiments and the Problem of Conservation of Leptonic Charge. Sov. Phys. JETP 26 [Zh. Eksp. Teor. Fiz. 53, 1717 (1967)], 984–988 (1968).
.. [6] Maki, Z., Nakagawa, M. & Sakata, S. Remarks on the unified model of elementary particles. Prog. Theor. Phys. 28, 870–880 (1962).
.. [7] Giunti, C. & Kim, C. W. Fundamentals of Neutrino Physics and Astrophysics. isbn: 9780198508717 (2007).
.. [8] Beringer, J. et al. Review of Particle Physics (RPP). Phys. Rev. D86, 010001 (2012).
.. [9] Esteban, I., Gonzalez-Garcia, M. C., Maltoni, M., Martinez-Soler, I. & Schwetz, T. Updated fit to three neutrino mixing: exploring the accelerator-reactor complementarity. JHEP 01, 087 (2017).
.. [10] Kappes, A., Hinton, J., Stegmann, C. & Aharonian, F. A. Potential Neutrino Signals from Galactic Gamma-Ray Sources. Astrophys. J. 656 [Erratum: Astrophys. J. 661, 1348 (2007)], 870–896 (2007).
.. [11] Aartsen, M. G. et al. Multimessenger observations of a flaring blazar coincident with high-energy neutrino IceCube-170922A. Science 361, eaat1378 (2018).
.. [12] Aartsen, M. G. et al. Neutrino emission from the direction of the blazar TXS 0506+056 prior to the IceCube-170922A alert. Science 361, 147–151 (2018).
.. [13] Aartsen, M. G. et al. All-sky Search for Time-integrated Neutrino Emission from Astrophysical Sources with 7 yr of IceCube Data. Astrophys. J. 835, 151 (2017).
.. [14] Weaver, C. Evidence for Astrophysical Muon Neutrinos from the Northern Sky. PhD thesis (Wisconsin U., Madison, 2015). https://docushare.icecube.wisc.edu/dsweb/Get/Document-73829/.
.. [15] Aartsen, M. G. et al. Energy Reconstruction Methods in the IceCube Neutrino Telescope. JINST 9, P03009 (2014).
.. [16] Aartsen, M. G. et al. Evidence for High-Energy Extraterrestrial Neutrinos at the IceCube Detector. Science 342, 1242856 (2013).
.. [17] Hallen, P. On the Measurement of High-Energy Tau Neutrinos with IceCube. PhD thesis (RWTH Aachen University, 2013). https://www.institut3b.physik.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaaapwhzq.
.. [18] Xu, D. L. Search for astrophysical tau neutrinos in three years of IceCube data. PhD thesis (The University of Alabama, 2015). http://acumen.lib.ua.edu/content/u0015/0000001/0001906/u0015_0000001_0001906.pdf

**********
Statistics
**********

Data is stochastic in nature, so an experimenter uses statistical methods on a
data sample :math:`\textbf{x}` to make inferences about the unknown parameters
:math:`\mathbf{\theta}` of a probabilistic model of the data,
:math:`f\left(\textbf{x}\middle|\mathbf{\theta}\right)`. The *likelihood
principle* describes a function of the parameters :math:`\mathbf{\theta}`,
determined by the observed sample, that contains all the information about
:math:`\mathbf{\theta}` that is available from the sample.
Given that :math:`\textbf{x}` is observed and is distributed according to a
joint *probability density function* (PDF),
:math:`f\left(\textbf{x}\middle|\mathbf{\theta}\right)`, the function of
:math:`\mathbf{\theta}` defined by

.. math::

   L\left(\mathbf{\theta}\middle|\textbf{x}\right)\equiv
   f\left(\textbf{x}\middle|\mathbf{\theta}\right)

is called the *likelihood function* [1]_. If the data :math:`\textbf{x}`
consist of independent and identically distributed values, then:

.. math::

   L\left(\mathbf{\theta}\middle|\textbf{x}\right)=
   L\left(\theta_1,\dots,\theta_k\middle|x_1,\dots,x_n\right)=
   \prod_{i=1}^{n}f\left(x_i\middle|\theta_1,\dots,\theta_k\right)

Classical Statistics
--------------------
First we take a look at the classical approach to statistics, so that we can
later compare it against the Bayesian approach. The *maximum likelihood
estimators* (MLEs) of the parameters :math:`\mathbf{\theta}`, denoted by
:math:`\mathbf{\hat\theta}(\textbf{x})`, are attained by maximising
:math:`L\left(\mathbf{\theta}\middle|\textbf{x}\right)`.

Often, not all the model parameters are of direct inferential interest;
however, they are included as they may reduce the effect of systematic bias.
These are called the nuisance parameters, :math:`\mathbf{\theta}_\nu`. To
reduce the impact of the nuisance parameters, prior knowledge based on past
measurements :math:`\textbf{y}` may be included in the likelihood:

.. math::

   L\left(\mathbf{\theta}_I,\mathbf{\theta}_\nu\middle|\textbf{x},\textbf{y}\right)=
   \prod_{i=1}^Nf_I\left(x_i\middle|\mathbf{\theta}_I\right)\cdot
   f_\nu\left(\textbf{y}\middle|\mathbf{\theta}_\nu\right)

By finding the MLE for the nuisance parameters,
:math:`\mathbf{\hat\theta}_{\nu}(\textbf{y})`, the *profile likelihood* can be
written in terms independent of :math:`\mathbf{\theta}_\nu`:

.. math::

   L_P\left(\mathbf{\theta}_I\middle|\textbf{x},\textbf{y}\right)=
   L\left(\mathbf{\theta}_I,\mathbf{\hat\theta}_{\nu}\left(\textbf{y}\right)\middle|\textbf{x},\textbf{y}\right)

The key method of inference in physics is *hypothesis testing*. The two
complementary hypotheses in a hypothesis testing problem are called the *null
hypothesis* and the *alternative hypothesis*. The goal of a hypothesis test is
to decide, based on a sample of data, which of the two complementary
hypotheses is preferred. The general form of the null and alternative
hypotheses is :math:`H_0:\mathbf{\theta}\in\Theta_0` and
:math:`H_1:\mathbf{\theta}\in\Theta_0^c`, where :math:`\Theta_0` is some
subset of the parameter space and :math:`\Theta_0^c` is its complement.
Typically, a hypothesis test is specified in terms of a *test statistic* (TS),
which is used to define the *rejection region*: the region in which the null
hypothesis is rejected. The Neyman-Pearson lemma [2]_ then states that the
*likelihood ratio test* (LRT) statistic is the most powerful test statistic:

.. math::

   \lambda\left(\textbf{x}\right)=\frac{L\left(\mathbf{\hat\theta}_{0}\middle|
   \textbf{x}\right)}{L\left(\mathbf{\hat\theta}\middle|\textbf{x}\right)}

where :math:`\mathbf{\hat\theta}` is an MLE of :math:`\mathbf{\theta}` and
:math:`\mathbf{\hat\theta}_{0}` is an MLE of :math:`\mathbf{\theta}` obtained
by a restricted maximisation, taking :math:`\Theta_0` as the parameter space.
The effect of nuisance parameters can be included by substituting the
likelihood with the profile likelihood :math:`L\rightarrow L_P`.
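As a concrete illustration of profiling, consider a toy Gaussian model in
which the mean is the parameter of interest and the width is a nuisance
parameter with a closed-form restricted MLE; the LRT statistic is then a ratio
of two profiled fits. This sketch is purely illustrative and not part of the
GolemFlavor code:

```python
import math
import random

def log_lik(xs, mu, sigma):
    """Gaussian log-likelihood log L(mu, sigma | x)."""
    return (-len(xs) * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2))

def profile_log_lik(xs, mu):
    """Profile out the nuisance sigma by plugging in its restricted MLE."""
    sigma_hat = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return log_lik(xs, mu, sigma_hat)

random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(200)]  # toy data sample

mu_hat = sum(xs) / len(xs)                     # unrestricted MLE of the mean
lam = math.exp(profile_log_lik(xs, 0.0)        # restricted fit under H0: mu = 0
               - profile_log_lik(xs, mu_hat))  # global best fit
print(0.0 <= lam <= 1.0)  # the LRT statistic always lies in [0, 1]
```

Because the global maximum can never be smaller than the restricted one,
:math:`\lambda\leq1`; values far below 1 disfavour the null hypothesis.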
The rejection region then has the form
:math:`R\left(\mathbf{\theta}_0\right)=\{\textbf{x}:\lambda\left(\textbf{x}\right)
\leq{c}\left(\alpha\right)\}`, such that
:math:`P_{\mathbf{\theta}_0}\left(\textbf{x}\in
R\left(\mathbf{\theta}_0\right)\right)\leq\alpha`, where :math:`\alpha`
satisfies :math:`0\leq{\alpha}\leq 1`, is called the *size* or *significance
level* of the test, and is specified prior to the experiment. Typically in
high-energy physics, the level of significance at which an effect is said to
qualify as a discovery is the :math:`5\sigma` level, corresponding to an
:math:`\alpha` of :math:`2.87\times10^{-7}`. In a binned analysis, the
distribution of :math:`\lambda\left(\textbf{n}\right)` can be approximated by
generating mock data, or *realisations*, of the null hypothesis. However,
this can be computationally expensive. Instead, according to Wilks' theorem
[3]_, for a sufficiently large sample :math:`\textbf{x}` and provided certain
regularity conditions are met (the MLE exists and is unique),
:math:`-2\ln\lambda\left(\textbf{x}\right)` approximately follows a
:math:`\chi^2` distribution. The :math:`\chi^2` distribution is parameterised
by :math:`k`, the *number of degrees of freedom*, defined as the number of
independent normally distributed variables that were summed together. When
the profile likelihood is used to account for :math:`n` nuisance parameters,
the effective number of degrees of freedom is reduced to :math:`k-n`, due to
the additional constraints placed via profiling. From this,
:math:`c\left(\alpha\right)` can easily be obtained from :math:`\chi^2`
*cumulative distribution function* (CDF) lookup tables.

As shown so far, the LRT informs on the best-fit point of the parameters
:math:`\mathbf{\theta}`. In addition to this point estimator, some form of
*interval* estimator is also useful to report, to reflect the statistical
precision.
Interval estimators together with a measure of confidence are also known as
*confidence intervals*, which have the important property of *coverage*. The
purpose of using an interval estimator is to have some guarantee of capturing
the parameter of interest, quantified by the coverage probability
:math:`1-\alpha`. In this classical approach, one needs to keep in mind that
the interval is the random quantity, not the parameter, which is fixed. Since
we do not know the value of :math:`\theta`, we can only guarantee a coverage
probability equal to the *confidence coefficient*. There is a strong
correspondence between hypothesis testing and interval estimation. A
hypothesis test fixes the parameter and asks what sample values (the
acceptance region) are consistent with that fixed value. Interval estimation
fixes the sample value and asks what parameter values (the confidence
interval) make this sample value most plausible. This can be written in terms
of the LRT statistic:

.. math::

   \lambda\left(\mathbf{\theta}\right)=\frac{L\left(\mathbf{\theta}\middle|\textbf{x}
   \right)}{L\left(\mathbf{\hat\theta}\middle|\textbf{x}\right)}

The confidence interval is then formed as
:math:`C\left(\textbf{x}\right)=\{\mathbf{
\theta}:\lambda\left(\mathbf{\theta}\right)\geq c\left(1-\alpha\right)\}`, such
that :math:`P_{\mathbf{\theta}}\left(\mathbf{\theta}\in
C\left(\textbf{x}\right)\right)\geq1-\alpha`. Assuming Wilks' theorem holds,
this can be written more specifically as
:math:`C\left(\textbf{x}\right)=\{\mathbf{\theta}:-2\ln\lambda\left(
\mathbf{\theta}\right)\leq\text{CDF}_{\chi^2}^{-1}\left(k,1-\alpha\right)\}`,
where :math:`\text{CDF}_{\chi^2}^{-1}` is the inverse CDF of a :math:`\chi^2`
distribution. Once again, the effect of nuisance parameters can be accounted
for by using the profile likelihood, analogously to hypothesis testing.
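The interval construction can be sketched numerically for a toy Gaussian-mean
measurement with known width, assuming Wilks' theorem with :math:`k=1`: the
90% confidence interval collects every grid value of the parameter whose
:math:`-2\ln\lambda` falls below the :math:`\chi^2` quantile. The model,
seed and numbers below are illustrative assumptions:

```python
import math
import random

def chi2_cdf_1dof(x):
    """chi-squared CDF for k = 1: CDF(x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0))

def chi2_ppf_1dof(p):
    """Invert the k = 1 chi-squared CDF by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_cdf_1dof(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)

random.seed(1)
sigma, n = 2.0, 100
xs = [random.gauss(0.5, sigma) for _ in range(n)]   # toy data
mu_hat = sum(xs) / n                                # MLE of the mean

def ts(mu):
    """-2 ln lambda(mu) for a Gaussian mean with known sigma."""
    return n * (mu_hat - mu) ** 2 / sigma ** 2

c = chi2_ppf_1dof(0.90)   # c(1 - alpha) for a 90% confidence level, k = 1
grid = [i / 1000.0 for i in range(-1000, 2001)]
interval = [mu for mu in grid if ts(mu) <= c]
print(interval[0] < mu_hat < interval[-1])  # the MLE is always inside
```

For this Gaussian case the scan reproduces the familiar
:math:`\hat\mu\pm\sqrt{c}\,\sigma/\sqrt{n}` interval.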

Bayesian Statistics
-------------------
The Bayesian approach to statistics is fundamentally different from the
classical (*frequentist*) approach taken so far. In the classical approach
the parameter :math:`\theta` is thought of as an unknown but fixed quantity.
In the Bayesian approach, :math:`\theta` is considered to be a quantity whose
variation can be described by a probability distribution (called the *prior
distribution*), which is subjective and based on the experimenter's belief.
The Bayesian approach also differs in terms of computation: whereas the
classical approach consists of optimisation problems (finding maxima), the
Bayesian approach often results in integration problems.

In this approach, performing an experiment updates the prior distribution
with new information. The updated prior is called the *posterior
distribution* and is computed through the use of Bayes' theorem [4]_:

.. math::

   \pi\left(\mathbf{\theta}\middle|\textbf{x}\right)=
   \frac{f\left(\textbf{x}\middle|\mathbf{\theta}\right)
   \pi\left(\mathbf{\theta}\right)}{m\left(\textbf{x}\right)}

where :math:`\pi\left(\mathbf{\theta}\middle|\textbf{x}\right)` is the
posterior distribution,
:math:`f\left(\textbf{x}\middle|\mathbf{\theta}\right)\equiv
L\left(\mathbf{\theta}\middle|\textbf{x}\right)` is the likelihood,
:math:`\pi\left(\mathbf{\theta}\right)` is the prior distribution and
:math:`m\left(\textbf{x}\right)` is the *marginal distribution*,

.. math::

   m\left(\textbf{x}\right)=\int f\left(\textbf{x}\middle|\mathbf{\theta}\right)\pi\left(
   \mathbf{\theta}\right)\text{d}\mathbf{\theta}

The *maximum a posteriori probability* (MAP) estimate of the parameters
:math:`\mathbf{\theta}`, denoted by
:math:`\mathbf{\hat\theta}_\text{MAP}\left(\textbf{x}\right)`, is attained by
maximising :math:`\pi\left(\mathbf{\theta}\middle|\textbf{x}\right)`.
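The mechanics of this update can be sketched numerically on a grid, using a
toy coin-flip model with a flat prior: the posterior is the likelihood times
the prior, normalised by the marginal :math:`m\left(\textbf{x}\right)`. All
names and numbers here are illustrative assumptions:

```python
import math

thetas = [i / 1000.0 for i in range(1, 1000)]   # grid over theta in (0, 1)
k_heads, n_tosses = 7, 10                       # observed data x

def likelihood(theta):
    """Binomial likelihood f(x | theta)."""
    return (math.comb(n_tosses, k_heads)
            * theta ** k_heads * (1.0 - theta) ** (n_tosses - k_heads))

prior = {t: 1.0 for t in thetas}                # flat prior pi(theta)

# m(x): the marginal distribution, approximated as a grid average
m_x = sum(likelihood(t) * prior[t] for t in thetas) / len(thetas)

# posterior pi(theta | x) = f(x | theta) pi(theta) / m(x)
posterior = {t: likelihood(t) * prior[t] / m_x for t in thetas}

theta_map = max(posterior, key=posterior.get)   # MAP estimate
print(theta_map)  # -> 0.7, coinciding with the MLE under a flat prior
```

The same normalising constant :math:`m\left(\textbf{x}\right)` reappears
below as the evidence used in model selection.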
This estimator can be interpreted as an analogue of the MLE for Bayesian
estimation, where the distribution being maximised has become the posterior.
An important distinction from the classical approach is in the treatment of
the nuisance parameters. The priors on :math:`\theta_\nu` are naturally
included as part of the prior distribution,
:math:`\pi\left(\mathbf{\theta}_\nu\right)=
\pi\left(\mathbf{\theta}_\nu\middle|\textbf{y}\right)`, based on a past
measurement :math:`\textbf{y}`. The posterior distribution can then be
obtained for :math:`\theta_I` alone by *marginalising* over the nuisance
parameters:

.. math::

   \pi\left(\mathbf{\theta}_I\middle|\textbf{x}\right)=\int \pi\left(\mathbf{\theta}_I,\mathbf{\theta}_\nu\middle|\textbf{x}\right)\text{d}\mathbf{\theta}_\nu

In contrast to the classical approach, where an interval is said to have
coverage of the parameter :math:`\theta`, the Bayesian approach allows one to
say that :math:`\theta` is inside the interval with a probability
:math:`1-\beta`. This distinction is emphasised by referring to Bayesian
intervals as *credible intervals*, and :math:`\beta` is referred to here as
the *credible coefficient*. Both the interpretation and the construction are
therefore more straightforward than in the classical approach. The credible
interval is formed as:

.. math::

   \pi\left(\mathbf{\theta}\in A\middle|\textbf{x}\right)=\int_A\pi\left(
   \mathbf{\theta}\middle|\textbf{x}\right)\text{d}\mathbf{\theta}\geq1-\beta

However, this does not uniquely specify the credible interval. The most
useful convention when working in high dimensions is the *Highest Posterior
Density* (HPD) credible interval. Here the interval is constructed such that
the posterior meets a minimum threshold,
:math:`A\left(\textbf{x}\right)=\{\mathbf{
\theta}:\pi\left(\mathbf{\theta}\middle|\textbf{x}\right)\geq
t\left(1-\beta\right)\}`.
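For a single parameter, the HPD construction can be sketched by ranking grid
points by posterior density and accumulating mass until the requested
credibility :math:`1-\beta` is enclosed. The grid posterior below is a
hypothetical binomial toy, not an analysis posterior:

```python
thetas = [i / 1000.0 for i in range(1, 1000)]        # grid over theta
post = [t ** 7 * (1.0 - t) ** 3 for t in thetas]     # unnormalised posterior
norm = sum(post)
post = [p / norm for p in post]                      # grid probability masses

def hpd_interval(thetas, post, credibility):
    """Highest Posterior Density region: take grid points in order of
    decreasing posterior density until the requested mass is enclosed."""
    order = sorted(range(len(post)), key=lambda i: -post[i])
    mass, chosen = 0.0, []
    for i in order:
        chosen.append(thetas[i])
        mass += post[i]
        if mass >= credibility:
            break
    return min(chosen), max(chosen)

lo, hi = hpd_interval(thetas, post, 0.9)   # 90% HPD credible interval
print(lo < 0.7 < hi)  # the MAP (theta = 0.7) lies inside the region
```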
This can be seen as an interval starting at the MAP and growing to include a
region whose integrated probability is equal to :math:`1-\beta`, where all
points inside the interval have a higher posterior value than all points
outside the interval.

In the Bayesian approach, hypothesis testing can be generalised further to
*model selection*, in which possible models (or hypotheses)
:math:`\mathcal{M}_0,\mathcal{M}_1,\dots,\mathcal{M}_k` of the data
:math:`\textbf{x}` can be compared. This is again done through Bayes'
theorem, where the posterior probability that :math:`\textbf{x}` originates
from a model :math:`\mathcal{M}_j` is

.. math::

   \pi\left(\mathcal{M}_j\middle|\textbf{x}\right)=\frac{
   f\left(\textbf{x}\middle|\mathcal{M}_j\right)\pi\left(\mathcal{M}_j\right)}{
   M\left(\textbf{x}\right)}\quad\text{where}\quad
   M\left(\textbf{x}\right)=\sum_{i=0}^kf\left(\textbf{x}\middle|\mathcal{M}_i\right)
   \pi\left(\mathcal{M}_i\right)

The likelihood of the model
:math:`f\left(\textbf{x}\middle|\mathcal{M}_j\right)` can then be written as
the marginal distribution over the model parameters
:math:`\mathbf{\theta}_j`, also referred to as the *evidence* of a particular
model:

.. math::

   \mathcal{Z}_j\left(\textbf{x}\right)=f\left(\textbf{x}\middle|\mathcal{M}_j\right)=
   \int f_j\left(\textbf{x}\middle|\mathbf{\theta}_j\right)
   \pi_j\left(\mathbf{\theta}_j\right)\text{d}\mathbf{\theta}_j

This quantity was seen above as just a normalisation constant; however, it is
central to Bayesian model selection, which for two models
:math:`\mathcal{M}_0` and :math:`\mathcal{M}_1` is realised through the ratio
of the posteriors:

.. math::

   \frac{\pi\left(\mathcal{M}_0\middle|\textbf{x}\right)}
   {\pi\left(\mathcal{M}_1\middle|\textbf{x}\right)}=B_{\,0/1}\left(\textbf{x}\right)
   \frac{\pi\left(\mathcal{M}_0\right)}{\pi\left(\mathcal{M}_1\right)}\quad
   \text{where}\quad B_{\,0/1}\left(\textbf{x}\right)=
   \frac{\mathcal{Z}_0\left(\textbf{x}\right)}{\mathcal{Z}_1\left(\textbf{x}\right)}

Here the quantity :math:`B_{\,0/1}\left(\textbf{x}\right)` is the *Bayes
factor*, which measures the relative evidence of each model and can be
reported independently of the priors on the models. It can be interpreted as
the *strength of evidence* for a particular model, and the convention
formulated by Jeffreys [5]_ is one way to interpret this inference, as shown
here:

.. figure:: _static/jeffreys.png
   :width: 500px
   :align: center

   List of Bayes factors and their inference convention according to
   Jeffreys' scale [5]_.

Markov Chain Monte Carlo
------------------------
The goal of Bayesian inference is to obtain the full posterior probability
distribution. The models in question may contain a large number of
parameters, and such a large dimensionality makes analytical evaluation and
integration of the posterior distribution infeasible. Instead, the posterior
distribution can be approximated using *Markov Chain Monte Carlo* (MCMC)
sampling algorithms. As the name suggests, these are based on *Markov
chains*, which describe a sequence of random variables
:math:`\Theta_0,\Theta_1,\dots` that can be thought of as evolving over time,
with the probability of transition depending only on the immediate past
variable: :math:`P\left(\Theta_{k+1}\in
A\middle|\theta_0,\theta_1,\dots,\theta_k\right)=P\left(\Theta_{k+1}\in
A\middle|\theta_k\right)` [6]_.
In an MCMC algorithm, Markov chains are generated by a simulation of
*walkers* which randomly walk the parameter space in such a way that the
distribution of the chains asymptotically reaches a *target density*
:math:`f\left(\theta\right)` that is unique and stationary (i.e. no longer
evolving). This technique is particularly useful because the chains generated
for the posterior distribution automatically provide the chains of any
marginalised distribution, e.g. a marginalisation over any or all of the
nuisance parameters.

The *Metropolis-Hastings* (MH) algorithm [7]_ is the best-known MCMC
algorithm and is shown below:

.. figure:: _static/mh.png
   :width: 700px
   :align: center

   Metropolis-Hastings Algorithm [6]_, [7]_

An initialisation state (seed), :math:`\theta^{(t)}`, is first chosen. A
candidate state :math:`\theta'_t` to transition to is then drawn from the
*proposal distribution* :math:`q\left(\theta'\middle|\theta\right)`; commonly
a Gaussian distribution is used here. The MH *acceptance probability*,
:math:`\rho\left(\theta,\theta'\right)`, then defines the probability of
either accepting the candidate state :math:`\theta'_t` or repeating the state
:math:`\theta^{(t)}` for the next step in the Markov chain. The candidate
state :math:`\theta'_t` is always accepted when the ratio
:math:`f\left(\theta'_t\right)/\,q\left(\theta'_t\middle|\theta^{(t)}\right)`
has increased compared with the previous state's
:math:`f\left(\theta^{(t)}\right)/\,q\left(\theta^{(t)}\middle|\theta'_t\right)`,
where :math:`f` is the *target density*.
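A minimal random-walk Metropolis sketch of the update just described, using a
symmetric Gaussian proposal so that the acceptance probability reduces to
:math:`\min\left(1,f\left(\theta'\right)/f\left(\theta\right)\right)`. The
target here is a hypothetical standard normal density, not the analysis
posterior:

```python
import math
import random

def metropolis(log_target, theta0, n_steps, step=1.0, seed=42):
    """Random-walk Metropolis: symmetric Gaussian proposal, so the
    acceptance probability is min(1, f(theta') / f(theta))."""
    rng = random.Random(seed)
    chain, theta = [], theta0
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)   # candidate state
        accept_prob = math.exp(min(0.0, log_target(proposal) - log_target(theta)))
        if rng.random() < accept_prob:
            theta = proposal                      # accept the candidate
        chain.append(theta)                       # else repeat the state
    return chain

# Target density f: a standard normal, specified through its log.
chain = metropolis(lambda t: -0.5 * t * t, theta0=3.0, n_steps=20000)
burned = chain[2000:]                 # discard an initial burn-in period
mean = sum(burned) / len(burned)
print(abs(mean) < 0.2)                # the chain relaxes towards f's mean
```

Working with the log of the target keeps the acceptance step numerically
stable for very small densities.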
Importantly, in the case that the target density is the posterior
distribution,
:math:`f\left(\theta\right)=\pi\left(\theta\middle|\textbf{x}\right)`, the
acceptance probability depends only on the ratio of posteriors
:math:`\pi\left(\theta'\middle|\textbf{x}\right)/\pi\left(\theta\middle|\textbf{x}\right)`,
crucially cancelling out the difficult-to-calculate marginal distribution
:math:`m\left(\textbf{x}\right)`. As the chain evolves, it relaxes to a
stationary state of samples which, in our case, can be used to map out the
posterior distribution.

To reduce the impact of any biases arising from the choice of the
initialisation state, the first samples in the chain, taken during a
*burn-in* period, are typically discarded; after this period the chains
should better approximate the target density :math:`f`. While the MH
algorithm forms the foundation of all MCMC algorithms, its direct application
has two disadvantages: first, the proposal distribution
:math:`q\left(\theta'\middle|\theta\right)` must be chosen carefully, and
second, the chains can get caught in local modes of the target density. More
sophisticated implementations of the MH algorithm exist, such as *affine
invariant* algorithms [8]_, which use already-drawn samples to define the
proposal distribution, or *nested sampling* algorithms [9]_, designed to
compute the evidence :math:`\mathcal{Z}`. Both of these types will be used in
the Bayesian approach of this analysis. The topic of MCMC algorithms is
discussed in more depth in Robert and Casella [6]_ or Collin [10]_.

Anarchic Sampling
-----------------
As with any Bayesian-based inference, the prior distribution used for a given
parameter needs to be chosen carefully. In this analysis, a further technique
needs to be introduced in order to ensure that the prior distribution is not
biasing any drawn inferences. More specifically, the priors on the standard
model mixing parameters are of concern.
These parameters are defined in the :doc:`physics` section as a
representation of the :math:`3\times3` unitary mixing matrix :math:`U`, in
such a way that any valid combination of the mixing angles can be mapped into
a unitary matrix. The ideal and most ignorant choice of prior here is one in
which there is no distinction among the three neutrino flavours, compatible
with the hypothesis of *neutrino mixing anarchy*: the hypothesis that
:math:`U` can be described as the result of random draws from an unbiased
distribution of unitary :math:`3\times3` matrices [11]_, [12]_, [13]_,
[14]_. However, simply using a flat prior on the mixing angles does not mean
that the prior on the elements of :math:`U` is also flat. Indeed, doing this
would introduce a significant bias in the elements of :math:`U`. Statistical
techniques used in the study of neutrino mixing anarchy will be borrowed in
order to ensure that :math:`U` is sampled in an unbiased way. Here, the
anarchy hypothesis requires the probability measure of the neutrino mixing
matrix to be invariant under changes of basis for the three generations. This
is the central assumption of *basis independence*, and from this,
distributions over the mixing angles are determined by the integration
invariant *Haar measure* [13]_. For the group :math:`U(3)` the Haar measure
is given by the volume element :math:`\text{d} U`, which can be written using
the mixing angles parameterisation:

.. math::

   \text{d} U=\text{d}\left(\sin^2\theta_{12}\right)\wedge\,
   \text{d}\left(\cos^4\theta_{13}\right)\wedge\,
   \text{d}\left(\sin^2\theta_{23}\right)\wedge\,\text{d}\delta

which says that the Haar measure for the group :math:`U(3)` is flat in
:math:`\sin^2\theta_{12}`, :math:`\cos^4\theta_{13}`,
:math:`\sin^2\theta_{23}` and :math:`\delta`.
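Sampling the mixing angles flat in these Haar variables is straightforward:
draw each transformed variable uniformly and invert the transformation. This
is a small sketch; the function and variable names are illustrative and
unrelated to the GolemFlavor API:

```python
import math
import random

def sample_mixing_angles(rng):
    """Draw (theta12, theta13, theta23, delta) from the U(3) Haar measure,
    i.e. flat in sin^2(theta12), cos^4(theta13), sin^2(theta23), delta."""
    s12_sq = rng.random()                    # sin^2 theta_12 ~ U(0, 1)
    c13_qu = rng.random()                    # cos^4 theta_13 ~ U(0, 1)
    s23_sq = rng.random()                    # sin^2 theta_23 ~ U(0, 1)
    delta = rng.uniform(0.0, 2.0 * math.pi)  # delta ~ U(0, 2 pi)
    return (math.asin(math.sqrt(s12_sq)),
            math.acos(c13_qu ** 0.25),
            math.asin(math.sqrt(s23_sq)),
            delta)

rng = random.Random(7)
samples = [sample_mixing_angles(rng) for _ in range(100000)]

# Under the Haar measure, sin^2 theta_12 is uniform, so its mean is ~0.5
mean_s12 = sum(math.sin(t12) ** 2 for t12, _, _, _ in samples) / len(samples)
print(abs(mean_s12 - 0.5) < 0.01)
```

A flat prior placed directly on the angles themselves would fail this check,
which is exactly the bias discussed above.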
Therefore, in order to ensure that the distribution over the mixing matrix
:math:`U` is unbiased, the prior on the mixing angles must be chosen
according to this Haar measure, i.e. flat in :math:`\sin^2\theta_{12}`,
:math:`\cos^4\theta_{13}`, :math:`\sin^2\theta_{23}` and :math:`\delta`. An
example of this in action can be found in the :doc:`examples` notebooks.

This also needs to be considered in the case of a flavour composition
measurement using sampling techniques in a Bayesian approach. In this case,
the posterior of the measured composition :math:`f_{\alpha,\oplus}` is
sampled over as the parameters of interest. Here, the effective number of
parameters can be reduced from three to two due to the requirement
:math:`\sum_\alpha f_{\alpha,\oplus}=1`. Therefore, the prior on these two
parameters must be determined by the Haar measure of the flavour composition
volume element, :math:`\text{d} f_{e,\oplus}\wedge\text{d}
f_{\mu,\oplus}\wedge\text{d} f_{\tau,\oplus}`. The following *flavour angles*
parameterisation is found to be sufficient:

.. math::

   \begin{align}
   f_{\alpha,\oplus}=
   \begin{pmatrix}
   f_{e,\oplus} \\ f_{\mu,\oplus} \\ f_{\tau,\oplus}
   \end{pmatrix}=
   \begin{pmatrix}
   \sin^2\phi_\oplus\,\cos^2\psi_\oplus \\
   \sin^2\phi_\oplus\,\sin^2\psi_\oplus \\
   \cos^2\phi_\oplus
   \end{pmatrix} \\
   \text{d} f_{e,\oplus}\wedge\text{d} f_{\mu,\oplus}\wedge\text{d} f_{\tau,\oplus}=
   \text{d}\left(\sin^4\phi_\oplus\right)\wedge
   \text{d}\left(\cos\left(2\psi_\oplus\right)\right)
   \end{align}

.. [1] Casella, G. & Berger, R. Statistical Inference. isbn: 9780534243128. https://books.google.co.uk/books?id=0x_vAAAAMAAJ (Thomson Learning, 2002).
.. [2] Neyman, J. & Pearson, E. S. On the Problem of the Most Efficient Tests of Statistical Hypotheses. Phil. Trans. Roy. Soc. Lond. A231, 289–337 (1933).
.. [3] Wilks, S. S. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Annals Math. Statist. 9, 60–62 (1938).
.. [4] Bayes, T. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. Lond. 53, 370–418 (1764).
.. [5] Jeffreys, H. The Theory of Probability. isbn: 9780198503682. https://global.oup.com/academic/product/the-theory-of-probability-9780198503682?cc=au&lang=en (1939).
.. [6] Robert, C. & Casella, G. Monte Carlo Statistical Methods. isbn: 9781475730715. https://books.google.co.uk/books?id=lrvfBwAAQBAJ (Springer New York, 2013).
.. [7] Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970).
.. [8] Goodman, J. & Weare, J. Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science 5, 65–80 (2010).
.. [9] Skilling, J. Nested Sampling. AIP Conference Proceedings 735, 395–405 (2004).
.. [10] Collin, G. L. Neutrinos, neurons and neutron stars: applications of new statistical and analysis techniques to particle and astrophysics. PhD thesis (Massachusetts Institute of Technology, 2018). http://hdl.handle.net/1721.1/118817.
.. [11] De Gouvea, A. & Murayama, H. Neutrino Mixing Anarchy: Alive and Kicking. Phys. Lett. B747, 479–483 (2015).
.. [12] Hall, L. J., Murayama, H. & Weiner, N. Neutrino mass anarchy. Phys. Rev. Lett. 84, 2572–2575 (2000).
.. [13] Haba, N. & Murayama, H. Anarchy and hierarchy. Phys. Rev. D63, 053010 (2001).
.. [14] De Gouvea, A. & Murayama, H. Statistical test of anarchy. Phys. Lett. B573, 94–100 (2003).
