Histograms

Practical Bayesian applications: Histograms

How many bins should I use in a histogram? Although it is typically not necessary to bin the data before estimating model parameters, there are a number of somewhat principled ways of deciding on the bin size (other than choosing something that "makes it look good").

Scott's rule suggests a bin width

$\Delta_b = {3.5 \sigma \over N^{1/3}}$

where $\sigma$ is the sample standard deviation and $N$ is the sample size. This choice minimizes the mean integrated square error, assuming the underlying distribution is Gaussian.
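As a minimal sketch of how the rule might be applied, the snippet below computes the Scott bin width for a synthetic Gaussian sample and converts it to a bin count. The helper name `scott_bin_width`, the random sample, and the rounding-up of the bin count are illustrative assumptions, not part of the original notes.

```python
import numpy as np


def scott_bin_width(x):
    """Bin width from Scott's rule: 3.5 * sigma / N**(1/3)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = np.std(x, ddof=1)  # sample standard deviation
    return 3.5 * sigma / n ** (1.0 / 3.0)


# Synthetic data (an assumption for illustration): 1000 draws from N(0, 1).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

width = scott_bin_width(sample)

# Turn the width into a whole number of bins spanning the data range.
n_bins = int(np.ceil((sample.max() - sample.min()) / width))
counts, edges = np.histogram(sample, bins=n_bins)

print(f"bin width = {width:.3f}, number of bins = {n_bins}")
```

Because the rule assumes a Gaussian distribution, the resulting width may be too coarse for strongly skewed or multimodal data.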

Classification

In density estimation we estimate joint probability distributions from multivariate data sets to identify the inherent clustering. This is essentially unsupervised classification.

If we have labels for some of these data points (e.g., an object is tall, short, red, or blue), we can develop a relationship between the label and the properties of a source. This is supervised classification.

Regression

The definition of regression

Often we think about regression from the perspective of maximum likelihood (or least squares). If we consider it from the Bayesian perspective, we can get a more physical intuition for how to undertake regression in the presence of errors in, and limits on, the data.

Dimensionality Reduction

Fitting and overfitting get worse with the "curse of dimensionality" (Bellman 1961).

Think about a hypersphere. Its volume is given by

\begin{equation} V_D(r) = \frac{2r^D\pi^{D/2}}{D\ \Gamma(D/2)}, \end{equation}

where $\Gamma(z)$ is the complete gamma function, $D$ is the dimension, and $r$ the radius of the sphere.
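One way to see why this matters is to compare the volume of a unit hypersphere with the hypercube of side 2 that encloses it. The sketch below evaluates the formula above and that ratio for a few dimensions; the function names and the choice of comparison are illustrative assumptions rather than something stated in the notes.

```python
import math


def hypersphere_volume(dim, radius=1.0):
    """Volume of a dim-dimensional sphere: 2 r^D pi^(D/2) / (D Gamma(D/2))."""
    return (2.0 * radius**dim * math.pi ** (dim / 2.0)
            / (dim * math.gamma(dim / 2.0)))


def fraction_inside_unit_sphere(dim):
    """Ratio of the unit-sphere volume to the enclosing cube of side 2."""
    return hypersphere_volume(dim, 1.0) / 2.0**dim


# The fraction of the cube occupied by the sphere falls quickly with dimension.
for dim in (1, 2, 3, 5, 10):
    print(dim, fraction_inside_unit_sphere(dim))
```

As the dimension grows, almost all of the cube's volume lies outside the inscribed sphere, so uniformly scattered points are rarely close to the center and ever more data are needed to sample the space evenly.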

Time Series

Time Series Data

There is a broad range of variability signatures that we need to be sensitive to, from transient events such as GRBs to periodic variables. Analysis methods are related to the parameter estimation and model selection problems used in regression (the time variable $t$ replaces $x$). In many astronomical cases, characterizing the underlying physical processes that produced the data is key (e.g., searches for pulsating vs. eclipsing variable stars).
