AstroML Cambridge

May 1st - May 22nd 2015

An introduction to AstroML

This course provides a quick-fire introduction to a number of machine learning techniques used in astrophysics. We will make use of the textbook Statistics, Data Mining, and Machine Learning in Astronomy by Ivezić, Connolly, VanderPlas, and Gray. You are not required to have a copy, but I encourage you to bring one if you do (there is one in the IoA library).

The objective of this course is to provide a hands-on introduction to machine learning that uses Python and astronomical data sets to illustrate the techniques. All figures and examples in the class will be available for download. The lectures will be presented as IPython notebooks that can be either viewed in a browser or executed as part of the class (they are also available for download). To access the notebooks, simply go to the link at the top of the page. From these pages you will be able to access the IPython source or submit comments and questions.

Schedule

For each class the notes will be provided in the form of IPython notebooks, which will download and install the required data sets as well as execute the machine learning applications. Some of these data sets are quite large, so you are encouraged to try out each notebook ahead of time (the data are cached locally on your machine the first time the notebook is run). Instructions for installing IPython and the associated software (e.g. astroML) are given in the installation link at the top.

The following is a draft of the schedule; there will likely be modifications depending on how quickly we cover topics in the class and on interest in diving into different machine learning techniques.

Friday, May 1st, 2.00-3.00pm (Ryle)

Classifying your data and understanding how successful you were

Friday, May 8th, 2.00-3.00pm (Ryle)

How to fit models to data (including data with errors)

Friday, May 15th, 2.00-3.00pm (Ryle)

Reducing high dimensional data to the most significant directions

Friday, May 22nd, 2.00-3.00pm (Ryle)

Variable sources and how to analyze time series data

Setting Up Your Computer

It is important that each student has a laptop or other computer to use during the course. Though it is possible to use a Windows machine, a Unix-based architecture (such as Mac OSX or Linux) will make your life much easier, both for this course and in the long run.

There are two primary software requirements for this course: Python and Git, which is a very useful version control system. Details on installation can be found below.

Another possibility is to do all work online, using a Wakari account. This will allow you to work with Python entirely in your browser, with commands executed in the cloud, skipping the installation steps outlined below.

Installing Python

The most important piece of software to have is Python. In this course we will be using Python 2.7.5, though any version 2.6.x-2.7.x should work. Python 3.x has slightly different syntax but should work with all of the required packages.
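
As a quick check of which interpreter you are running, here is a minimal sketch. The function name `describe_python` and its messages are illustrative only, not part of the course material; the version ranges simply restate the paragraph above.

```python
import sys

def describe_python(version_info=sys.version_info):
    """Return a short note on how this interpreter relates to the course setup."""
    major, minor = version_info[0], version_info[1]
    if major == 2 and minor in (6, 7):
        return "Python %d.%d: matches the course setup" % (major, minor)
    if major >= 3:
        return "Python %d.%d: Python 3, expect small syntax differences" % (major, minor)
    return "Python %d.%d: older than 2.6, consider upgrading" % (major, minor)

print(describe_python())
```

The sketch avoids syntax that differs between Python 2 and Python 3, so it should run under either.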

Packages

In addition to the Python interpreter, a number of scientific packages will be required:

  • NumPy version 1.5+: efficient array operations
  • SciPy version 0.11+: scientific computing tools
  • matplotlib version 1.0+: plotting and visualization
  • IPython version 1.0+: interactive computing
  • Scikit-learn version 0.12+: machine learning
  • astroML: an astronomical machine learning toolkit
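
One way to check the list above against your own installation is sketched below. It assumes each package exposes a `__version__` attribute and that scikit-learn is imported under the name `sklearn`. The helper names `version_tuple` and `check_package` are hypothetical, and the version comparison is deliberately simple (leading numeric components only).

```python
import importlib
import re

# Minimum versions copied from the list above; astroML has no stated minimum.
REQUIRED = [
    ("numpy", "1.5"),
    ("scipy", "0.11"),
    ("matplotlib", "1.0"),
    ("IPython", "1.0"),
    ("sklearn", "0.12"),  # scikit-learn is imported as "sklearn"
    ("astroML", None),
]

def version_tuple(text):
    """Turn '1.10.2rc1' into (1, 10, 2) so versions compare numerically."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def check_package(name, minimum=None):
    """Return 'missing', 'too old', or 'ok' for one importable module name."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # Any failure to import (absent or broken install) is reported as missing.
        return "missing"
    installed = getattr(module, "__version__", None)
    if minimum is None or installed is None:
        return "ok"
    if version_tuple(installed) < version_tuple(minimum):
        return "too old"
    return "ok"

for name, minimum in REQUIRED:
    print("%-12s %s" % (name, check_package(name, minimum)))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9" sorts after "0.11", which would give the wrong answer.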

Installation

For installation from scratch, I highly recommend the Anaconda Installer, a free product offered by ContinuumIO. It gives you a fast local Python installation with up-to-date packages.

  1. Download and install Anaconda on your system by going to the above link and downloading the appropriate package. For Mac OSX you can download the dmg and follow the instructions here.
  2. Open a new terminal window, and make sure your $PATH variable points to the Anaconda installation. You can do this by typing
    [~]$ which python
    The result should show the path to the newly-installed anaconda folder. If not, you must modify your $PATH variable to point to the anaconda directory.
  3. Update your Anaconda distribution by typing
    [~]$ conda update conda
    Conda is the package management system that comes with anaconda.
  4. Update your IPython installation to version 3.0 using
    [~]$ conda update ipython
  5. Check whether your IPython notebook is working correctly: type
    [~]$ ipython notebook
    and a browser window should open to the notebook dashboard.
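
The `which python` check in step 2 can also be made from inside Python. The sketch below is only a heuristic: it assumes the Anaconda installation lives in a folder whose name contains "conda" (for example `anaconda`), which is the usual default but not guaranteed. The function name `looks_like_conda` is hypothetical.

```python
import sys

def looks_like_conda(executable):
    """Heuristic only: True if the interpreter path mentions a conda-style folder."""
    return "conda" in (executable or "").lower()

print("Interpreter: %s" % sys.executable)
if looks_like_conda(sys.executable):
    print("This looks like an Anaconda Python.")
else:
    print("This does not look like an Anaconda Python; check your $PATH.")
```

If you installed Anaconda into a custom directory, rely on the printed interpreter path rather than the heuristic.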

More detailed system-specific help can be found on the Anaconda installation page.

Even if you have an existing Python installation, I'd still recommend installing Anaconda for this course. It will make your life a lot easier.

Installing astroML

The astroML package and its add-ons can be installed using pip:

[~]$ pip install astroML
then
[~]$ pip install astroML_addons
The most up-to-date code can be downloaded from GitHub by following these instructions.

Installing Git

Git can be installed rather easily on OSX or Linux. Detailed instructions can be found here.

Linux

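If you are on Linux, you can use your distribution's package management system. On Debian-based systems such as Ubuntu:

[~]$ sudo apt-get install git
or, on yum-based systems such as Fedora or CentOS (root privileges are typically required):
[~]$ sudo yum install git-core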

Mac OSX

On Mac OSX, you can either use the Mac installer at

http://code.google.com/p/git-osx-installer
or, if you use MacPorts, type
[~]$ sudo port install git-core
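
Whichever route you take, you can confirm that Git ended up on your `$PATH` with a short check such as the one below. This is a minimal sketch: the helper name `tool_version` is hypothetical, and it simply runs `git --version` and reports the first line of output.

```python
import subprocess

def tool_version(command):
    """Return the first line printed by `<command> --version`, or None if it cannot run."""
    try:
        output = subprocess.check_output(
            [command, "--version"], stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: command not found; CalledProcessError: command exited with an error.
        return None
    lines = output.decode("utf-8", "replace").splitlines()
    return lines[0] if lines else None

version = tool_version("git")
print(version if version else "git not found on PATH")
```

The same helper works for any command-line tool that accepts `--version`.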