Saturday, July 11, 2015

Torch 7 Deep Learning Installation (Ubuntu 14.04)


As an alternative to the Caffe framework, Torch 7, which is maintained by Facebook Research, provides a more flexible framework for machine learning algorithms, especially for the currently popular deep learning methods. Unlike Caffe, which is developed in C++ & CUDA (see here for the source code) and wrapped with multiple scripting languages such as Python and MATLAB, Torch 7 is developed in C & CUDA under the hood (see here for the source code), while its interface is provided through LuaJIT, an easy and efficient scripting language. In this post, I install the Torch 7 framework on my Ubuntu 14.04 laptop and run the basic test programs on it, using only the CPU.

To find out what Lua is, you can refer to this article (http://cellux.github.io/articles/introduction-to-luajit-part-1/).


Torch 7:

Website:
http://torch.ch/
Github:
https://github.com/torch/torch7


Installation Instruction:
http://torch.ch/docs/getting-started.html#_

According to this instruction:

# in a terminal, run the commands
curl -sk https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash

The first command installs the dependencies required by Torch 7.
If you look into this bash script (https://raw.githubusercontent.com/torch/ezinstall/master/install-deps), you can see that the dependencies include (partial list):
OpenBLAS: an optimized BLAS library. BLAS (Basic Linear Algebra Subprograms) provides standard building blocks for performing basic vector and matrix operations.
gcc/g++, cmake, git, gnuplot, python, etc.

This will take a while depending on your computer (about 20 minutes in my case).

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh
The second and third commands install LuaJIT and LuaRocks, and then use LuaRocks to install Torch and other packages. You can see some of these packages in the GitHub repository (https://github.com/torch/distro). Torch is installed on your computer under the path "~/torch/".

This also takes several minutes (about 2 minutes to clone from git and about 15 minutes to install in my case).


Nicely Done! 

After the installation finishes, you can just type "th" in a terminal to run Torch.

Note that if you see "th: command not found", refresh your bashrc file with the command:
"source ~/.bashrc"


Then command "th" will give you this:





Test Torch 7:
Install torch/demos from GitHub with:

"git clone https://github.com/torch/demos"

Then run one of the demos from the directory where you cloned the repository, for example:
"th demos/train-on-cifar/train-on-cifar.lua"

Done! You will see the demo train different models on the CIFAR dataset.
















Saturday, January 3, 2015

Caffe Deep Learning Framework Installation (Ubuntu 12.04, CPU only)

Caffe Deep Learning Framework Installation

Generally, installing the popular open-source deep learning framework "Caffe" is pretty straightforward on Ubuntu. If you are using a recent version of Ubuntu (e.g., 12.04/12.10) and are familiar with basic package installation, installing "Caffe" is usually not hard.

According to the installation instructions, the general steps are:
  1. Install the prerequisite packages.
  2. Download the "Caffe" source code
  3. Modify the makefile.config file in the Caffe folder
  4. Make and install "Caffe"


Install the prerequisite packages

(1) Atlas:  

This is used for basic matrix and vector computation
(http://math-atlas.sourceforge.net/)
(http://www.netlib.org/blas/)

Ubuntu: sudo apt-get install libatlas-base-dev
Installs to: /usr/include/atlas

(2) Boost:

This is a collection of C++ libraries. (http://www.boost.org/)

Ubuntu: sudo apt-get install libboost-all-dev
Installs into  "/usr/include/boost"

(3) OpenCV (skip this if you have already installed OpenCV):

Ubuntu: sudo apt-get install libopencv-dev

(4) hdf5:

This is a popular data format for scientific computing (http://www.hdfgroup.org/HDF5/)

Ubuntu: sudo apt-get install libhdf5-serial-dev
Installs into "/usr/lib/"

(5) Python (if you don't have it and want to use the python wrapper):

Ubuntu: sudo apt-get install python-dev


If you are using Ubuntu 12 or 14, the above packages can be installed at once in the terminal (this command also pulls in protobuf, LevelDB, and Snappy, which Caffe requires as well):

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev

(6) glog:

This is a logging library for C++ (http://code.google.com/p/google-glog/)
To install on Ubuntu 12:


  • Download the package: 
    • wget https://google-glog.googlecode.com/files/glog-0.3.3.tar.gz 
  • Unzip the package: 
    • tar zxvf glog-0.3.3.tar.gz
  • Install the package:
    • cd glog-0.3.3
    • ./configure
    • make
    • sudo make install

(7) gflags


      A command-line flags processing library (https://code.google.com/p/gflags/?redir=1)

  • Download the package:
    • wget https://github.com/schuhschuh/gflags/archive/master.zip
  • Unzip the package:
    • unzip master.zip
  • Install the package:
    • cd gflags-master
    • mkdir build && cd build
    • export CXXFLAGS="-fPIC" && cmake .. && make VERBOSE=1
    • make
    • sudo make install

I ran into some errors when running "make" for glog (in its unit tests), with messages like:

In file included from ../src/logging_unittest.cc:58:0:
../src/googletest.h:177:35: error: expected ‘;’ before ‘fs’
     static void Run() { FlagSaver fs; RunTest(); }
../src/logging_unittest.cc:866:1: note: in expansion of macro ‘TEST’
 TEST(SafeFNMatch, logging) {
../src/logging_unittest.cc: In static member function ‘static void Test_Strerror_logging::Run()’:
../src/googletest.h:177:25: error: ‘FlagSaver’ was not declared in this scope
     static void Run() { FlagSaver fs; RunTest(); }

The solution to this error is to add:

#ifdef HAVE_LIB_GFLAGS
   using namespace gflags;
#endif

in the following files:
glog-0.3.3/src/demangle_unittest.cc
glog-0.3.3/src/logging_unittest.cc
glog-0.3.3/src/signalhandler_unittest.cc
glog-0.3.3/src/symbolize_unittest.cc
glog-0.3.3/src/utilities_unittest.cc

Insert these lines just below the existing "using namespace ..." statements in those source files.

(8) lmdb

A memory-mapped database library (http://symas.com/mdb/)

  • Clone the package from git (git must be installed first):
    • git clone git://gitorious.org/mdb/mdb.git
  • Install the package:
    • cd mdb/libraries/liblmdb
    • make
    • sudo make install


If you are using Ubuntu 14, it is much easier to install these packages with:
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler


After installing all the required packages, don't forget to run "ldconfig"; otherwise you may get "library cannot be found" errors when building Caffe.




Download the "Caffe" source code

You can either run "git clone https://github.com/BVLC/caffe.git" or just click the "Download ZIP" button on the GitHub page (https://github.com/BVLC/caffe) to download the Caffe source code.




Modify the Makefile.config file in the Caffe folder

After unzipping the downloaded Caffe package, go into the folder and copy the file "Makefile.config.example" to "Makefile.config".

Open the Makefile.config file and modify:


  • Uncomment the CPU_ONLY flag (in my case I only use the CPU for now)

                  # CPU-only switch (uncomment to build without GPU support).
                     CPU_ONLY := 1



  • Check the MATLAB or Python paths if you want to compile either interface
                  # This is required only if you will compile the matlab interface.
                  # MATLAB directory should contain the mex binary in /bin.
                  # MATLAB_DIR := /usr/local
                  # MATLAB_DIR := /Applications/MATLAB_R2012b.app

                  # NOTE: this is required only if you will compile the python interface.
                  # We need to be able to find Python.h and numpy/arrayobject.h.
                  PYTHON_INCLUDE := /usr/include/python2.7 \
                      /usr/lib/python2.7/dist-packages/numpy/core/include
                  # Anaconda Python distribution is quite popular. Include path:
                  # PYTHON_INCLUDE := $(HOME)/anaconda/include \
                  #     $(HOME)/anaconda/include/python2.7 \
                  #     $(HOME)/anaconda/lib/python2.7/site-packages/numpy/core/include
  • Save the config file.

 

Make and install "Caffe"

Just type the following commands to build Caffe:

make all
make test
make runtest

If everything goes well, "make runtest" finishes with all tests passing.
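
If you also plan to use the Python wrapper, Caffe additionally needs "make pycaffe" and the caffe/python folder on your PYTHONPATH; assuming that setup (not covered in detail in this post), a minimal sanity check might look like:

# Minimal sanity check for the Caffe Python wrapper.
# Assumes `make pycaffe` has been run and <caffe_root>/python is on PYTHONPATH.
import caffe

caffe.set_mode_cpu()  # match the CPU_ONLY build above
print("Caffe Python wrapper imported successfully (CPU mode).")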






Thursday, November 13, 2014

Tutorial and Illustration: Gaussian Mixture Model

Gaussian Mixture Model


Recently, I started to learn some basic concepts and implementations of the famous "Gaussian Mixture Model" (GMM). In this post, I'd like to share some key notes about this technique.

1. Clustering


Let's start with the problem of clustering.

Goal: represent a data set in terms of K clusters each of which is summarized by a prototype \(\mu_{k}\).

One method to find these K cluster centers is to use the K-means technique.
In general, there are three major steps in its simplest version:

  1. Initialize prototypes (cluster centers): randomly select K cluster centers.
  2. Assign each data point to its nearest prototype.
  3. Update prototypes to be the cluster means.

Repeat steps 2 and 3 until the algorithm converges (i.e., the cluster centers no longer change).
Note that this simplest version is based on the Euclidean distance.
Figure 1. K-means clustering. (from http://simplystatistics.org/)

To easily understand the general idea of clustering, consider the case where we have a dataset \(D=\{x_{1},x_{2},\cdots,x_{N}\}\), and each data point \(x_i\) is a vector of length d, where d is the feature dimension. If d = 2, the dataset D can be drawn in a 2D plane, with each axis representing one of the two features (see Figure 1). For an unsupervised learning system, if we know there are 3 different kinds of data (3 class labels), then clustering can be used to classify the data points. The underlying assumption is that data points belonging to the same class have similar patterns and therefore lie close to each other, while data points from different classes have different patterns and lie farther apart. So clustering tries to find the center of each class: the closer a data point lies to a center, the higher the probability that it belongs to that class.
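
As an illustration of the three steps above, here is a minimal NumPy sketch of K-means (the helper name kmeans, the synthetic data, and the seed are just for this example):

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # Simplest K-means: random init, assign to nearest center, update means.
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # step 1: random prototypes
    for _ in range(n_iter):
        # step 2: assign each point to its nearest prototype (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: update prototypes to be the cluster means (keep old center if a cluster is empty)
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):  # converged: centers no longer change
            break
        centers = new_centers
    return centers, labels

X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 6], [-5, 5])])
centers, labels = kmeans(X, K=3)
print(centers)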

2. Gaussian Distribution


    The Gaussian distribution in its one-dimensional form can be written as:
\[f(x, \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}e^{-\dfrac{(x-\mu)^2}{2\sigma^2}}\]
where \(x\) is the data point, \(\mu\) is the mean, and \(\sigma\) is the standard deviation.
In the multivariate (d-dimensional) case:
\[f(x,\mu,\Sigma)=\dfrac{1}{(2\pi)^{d/2}\sqrt{|\Sigma|}}\exp\left(-\dfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)\]
where \(\Sigma\) is now the covariance matrix of the Gaussian.

Intuitively, \(\mu\) controls the "center position" of the Gaussian, and \(\Sigma\) controls its "shape".
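
As a quick numerical check of the 1D formula, it can be evaluated directly and compared with scipy.stats.norm.pdf (SciPy is assumed to be installed; the numbers are arbitrary):

import numpy as np
from scipy.stats import norm

def gauss1d(x, mu, sigma):
    # 1D Gaussian density: (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
    return 1.0 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

x, mu, sigma = 1.0, 0.0, 2.0
print(gauss1d(x, mu, sigma))             # ~0.1760
print(norm.pdf(x, loc=mu, scale=sigma))  # same value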


Figure 2. Gaussian distribution (from en.wikipedia.org)
In many cases, when we want to model random data (or, more formally, random variables), the Gaussian distribution is a very common choice. Other commonly used distributions include the Bernoulli, discrete uniform, and Poisson distributions.

3. Maximum likelihood Estimation


   Given a dataset \(X=\{x_{n}\},\ n=1,\dots,N\), drawn from an unknown distribution, if we assume the data follow a single Gaussian distribution, we can model the data using a Gaussian function.

Assuming the observed data points are generated independently, the likelihood of the data can be written as:
\[p(X|\mu,\Sigma)=\prod_{n=1}^{N}p(x_{n}|\mu,\Sigma)\]
where \(p(x|\mu,\Sigma)\) is the Gaussian density. This is called the likelihood function.

The goal is to find the parameters \(\mu\) and \(\Sigma\) for which the Gaussian distribution best fits the data; in other words, the Gaussian that is most likely to be the "true" distribution of the data.

One solution to this problem is maximum likelihood estimation, i.e., maximizing \(p(D|\mu,\Sigma)\). This can also be viewed as an optimization problem:
\[\theta^{*}=\arg\max_{\theta}p(D|\theta)=\arg\max_{\theta}\prod_{i=1}^{N}p(x_{i}|\theta)\]
where \(\theta=\{\mu,\Sigma\}\) and \(D\) is the dataset.

To solve the optimization problem, there are many approaches. E.g.,
  1. Maximize the log likelihood.  Take the "log" of the likelihood function, compute the derivative, set the gradient to zero, and solve the resulting equations for the parameter values.
  2. Expectation-Maximization (EM) approach. The general idea of this approach lies in two alternating steps:
    • E-Step: Estimate the distribution of the hidden variable given the data and the current value of the parameters.
    • M-Step: Maximize the joint distribution of the data and the hidden variable.
This post will not cover all the math behind the approach, but intuitively you can think of it like this: Step 1, compute some quantity from the data using the current parameters; Step 2, update the parameters to maximize that quantity. Repeat these two steps until the algorithm converges. See the next section for more details.
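
For a single Gaussian, the first approach has a closed-form answer: setting the gradient of the log likelihood to zero yields the sample mean and the sample covariance (divided by N). A minimal NumPy check on synthetic data (the parameters here are chosen arbitrarily for this sketch) might look like:

import numpy as np

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.3], [0.3, 0.5]], size=5000)

# Maximum likelihood estimates for a single Gaussian fit to X:
mu_hat = X.mean(axis=0)                              # sample mean
Sigma_hat = (X - mu_hat).T.dot(X - mu_hat) / len(X)  # sample covariance (divide by N, not N-1)

print(mu_hat)     # close to [1.0, -2.0]
print(Sigma_hat)  # close to [[2.0, 0.3], [0.3, 0.5]]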

4. Gaussian Mixture Model


In order to model more complex data distributions, instead of using a single Gaussian function, a linear combination of several Gaussian functions can be used. This is the basic concept of the Gaussian Mixture Model (GMM). Specifically,
\[p(x)=\sum_{k=1}^{K}\pi_{k}\mathcal{N}(x|\mu_{k},\Sigma_{k})\]
where \(\forall k:\pi_{k}\geq0,\ \sum_{k=1}^{K}\pi_{k}=1\).

Recalling the definition of the Gaussian, for a D-dimensional feature vector \(x\), each mixture component \(p_i\) is parameterized by a D×1 mean vector and a D×D covariance matrix:
\[p_{i}(x)=\dfrac{1}{(2\pi)^{D/2}|\Sigma_{i}|^{1/2}}\exp\left\{ -\dfrac{1}{2}(x-\mu_{i})'\Sigma_{i}^{-1}(x-\mu_{i})\right\} \]

Fitting the Gaussian mixture model means estimating the parameters \(\lambda=\{\pi_{i},\mu_{i},\Sigma_{i}\}\). In other words, given the dataset, we want to find the corresponding parameters (a small numerical sketch of the mixture density is given below):
  • Mixing parameters \(\pi\)
  • Means \(\mu\)
  • Covariances \(\Sigma\)
Figure 3 Gaussian Mixture Model
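
As a small numerical sketch of the mixture density defined above (the two components and their parameters are arbitrary; SciPy is used for the component densities):

import numpy as np
from scipy.stats import multivariate_normal

# A toy 2-component GMM in 2D: p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)
pis    = [0.3, 0.7]
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]

def gmm_density(x):
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))

print(gmm_density(np.array([1.0, 1.0])))  # mixture density evaluated at (1, 1)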

One method to fit the GMM to the data is the EM algorithm, which iteratively refines the GMM parameters to increase the likelihood of the estimated model.

The hidden variable introduced by the EM method for GMMs is the "label" of each data point, i.e., which component generated it. In other words, if we knew which Gaussian generated each point, maximizing the likelihood of the data would directly yield the model parameters.
So the E-step of EM: for each point, estimate the probability that each Gaussian generated it.
The M-step of EM: update the parameters according to these estimates to maximize the likelihood of the data (and the hidden variables).
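
To make the two steps concrete, here is a minimal NumPy sketch of one EM iteration for a GMM with full covariances; it is only an illustration of the update equations under the notation above, not a production implementation (SciPy provides the component densities):

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    # One EM iteration. X: (N, D); pis: (K,); mus: (K, D); Sigmas: (K, D, D).
    N, K = len(X), len(pis)

    # E-step: responsibility r[n, k] = P(component k generated point x_n)
    r = np.zeros((N, K))
    for k in range(K):
        r[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, and covariances from the responsibilities
    Nk = r.sum(axis=0)                     # effective number of points per component
    new_pis = Nk / N
    new_mus = r.T.dot(X) / Nk[:, None]
    new_Sigmas = np.zeros((K, X.shape[1], X.shape[1]))
    for k in range(K):
        d = X - new_mus[k]
        new_Sigmas[k] = (r[:, k, None] * d).T.dot(d) / Nk[k]
    return new_pis, new_mus, new_Sigmas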

5. Code & Example

In this section, we use the Python library scikit-learn to show how to use a GMM.
First of all, import the related Python modules:
import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt

Then generate some random data for the demo (four 2D Gaussian blobs):
#Generate data
def gendata():
    obs = np.concatenate((1.6 * np.random.randn(300, 2),
                          6 + 1.3 * np.random.randn(300, 2),
                          np.array([-5, 5]) + 1.3 * np.random.randn(200, 2),
                          np.array([2, 7]) + 1.1 * np.random.randn(200, 2)))
    return obs

Also define a 2D Gaussian function for plotting (axis-aligned, i.e., diagonal covariance):
def gaussian_2d(x, y, x0, y0, xsig, ysig):
    # Axis-aligned 2D Gaussian density; xsig and ysig are the spreads along x and y.
    return 1/(2*np.pi*xsig*ysig) * np.exp(-0.5*(((x-x0) / xsig)**2 + ((y-y0) / ysig)**2))

To generate the GMM model:
#Generate GMM model and fit the data
def gengmm(nc=4, n_iter = 2):
    g = mixture.GMM(n_components=nc)  # number of components
    g.init_params = ""  # No initialization
    g.n_iter = n_iter   # iteration of EM method
    return g
To fit the GMM to the data, just call the .fit method; the fitted parameters are then available as g.weights_, g.means_, and g.covars_.

To plot the GMM model:
def plotGMM(g, n, pt):
    delta = 0.025
    x = np.arange(-10, 10, delta)
    y = np.arange(-6, 12, delta)
    X, Y = np.meshgrid(x, y)

    if pt == 1:
        for i in xrange(n):
            Z1 = gaussian_2d(X, Y, g.means_[i, 0], g.means_[i, 1], g.covars_[i, 0], g.covars_[i, 1])
            plt.contour(X, Y, Z1, linewidths=0.5)

    #print g.means_
    # plot the mean of each component as a cross marker
    for i in xrange(n):
        plt.plot(g.means_[i][0], g.means_[i][1], '+', markersize=13, mew=3)
    
    #plot the GMM with mixing parameters (weights)
    #i=0
    #Z2= g.weights_[i]*gaussian_2d(X, Y, g.means_[i, 0], g.means_[i, 1], g.covars_[i, 0], g.covars_[i, 1])
    #for i in xrange(1,n):
    #    Z2 = Z2+ g.weights_[i]*gaussian_2d(X, Y, g.means_[i, 0], g.means_[i, 1], g.covars_[i, 0], g.covars_[i, 1])
    #plt.contour(X, Y, Z2)


The main function:
obs = gendata()
fig = plt.figure(1)
g = gengmm(4, 100)
g.fit(obs)
plt.plot(obs[:, 0], obs[:, 1], '.', markersize=3)
plotGMM(g, 4, 0)
plt.title('Gaussian Mixture Model')
plt.show()


To intuitively show how the EM approach works, the Gaussian mixtures fitted with different numbers of EM iterations can be plotted:
g = gengmm(4, 1)
g.fit(obs)
plt.plot(obs[:, 0], obs[:, 1], '.', markersize=3)
plotGMM(g, 4, 1)
plt.title('Gaussian Models (Iter = 1)')
plt.show()

g = gengmm(4, 5)
g.fit(obs)
plt.plot(obs[:, 0], obs[:, 1], '.', markersize=3)
plotGMM(g, 4, 1)
plt.title('Gaussian Models (Iter = 5)')
plt.show()

g = gengmm(4, 20)
g.fit(obs)
plt.plot(obs[:, 0], obs[:, 1], '.', markersize=3)
plotGMM(g, 4, 1)
plt.title('Gaussian Models (Iter = 20)')
plt.show()

g = gengmm(4, 100)
g.fit(obs)
plt.plot(obs[:, 0], obs[:, 1], '.', markersize=3)
plotGMM(g, 4, 1)
plt.title('Gaussian Models (Iter = 100)')
plt.show()

The plots for different numbers of EM iterations are shown below. One can see that the Gaussian components are incrementally updated and refined to fit the data: both the means (cross markers in the figures) and the covariances (contour curves) are updated from iteration to iteration, until the approach converges.




                        Figure 4 Gaussian Mixture Model fitted using different EM iterations. 

If we plot the GMM including the weights (mixing parameters \(\pi\)), the result looks like the following figure:


                 Figure 5 Gaussian Mixture Model after iterations of EM approach (draw with weights)