Very efficient linear classification and regression

LIBLINEAR also provides a Python interface.

http://www.csie.ntu.edu.tw/~cjlin/liblinear/


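By default, sort on Linux follows the current locale's collation rules, and in many locales (for example en_US.UTF-8) the resulting order is largely case-insensitive. To turn this off and sort by raw byte values, run export LC_ALL=C first and then sort.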
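To see the difference between the two orderings without touching the shell, here is a minimal Python sketch. It is only an approximation: Python's sorted() compares code points (which matches LC_ALL=C for ASCII text), and str.casefold stands in for locale collation, which it does not fully reproduce. The word list is made up for illustration.

```python
# Hypothetical sample input, one word per "line".
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Code-point order: for ASCII this is what `LC_ALL=C sort` produces
# (all uppercase letters sort before all lowercase letters).
byte_order = sorted(words)

# Case-folded order: a rough stand-in for case-insensitive collation.
# Python's sort is stable, so ties keep their input order.
folded_order = sorted(words, key=str.casefold)

print(byte_order)
print(folded_order)
```

In the first list "Banana" comes before "apple"; in the second, the two spellings of each word end up next to each other.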

http://www.skorks.com/2010/05/sort-files-like-a-master-with-the-linux-sort-command-bash/

rsync -Wav --progress <dir> <target_host>:<target_dir>

cd source_dir
find . -depth -type d | cpio -dumpl destination_dir

Matlab Embedding for visualization toolbox

Awesome.

t-SNE

An effective nonlinear embedding for visualization.

Dimension reduction and Low-dimensional embedding

Good slides explaining PCA, MDS, ISOMAP, and LLE.

PCA: Preserving variance.

MDS: Preserving pairwise distances.

ISOMAP: Nonlinear embedding that preserves geodesic distances measured along a neighborhood graph.

LLE: Each local neighborhood is treated as linear, while the overall embedding is nonlinear.
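As a quick illustration of the "PCA: Preserving variance" line, here is a minimal sketch using NumPy's SVD. The data points are invented for the example (roughly on the line y = 2x), and this covers only PCA, not the other three methods.

```python
import numpy as np

# Toy 2-D data lying close to the line y = 2x (values chosen for illustration).
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]])

# Center each feature; PCA is defined on mean-centered data.
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal directions; s holds the singular values.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each component.
explained = s**2 / np.sum(s**2)

# 1-D embedding: project the centered data onto the first principal direction.
Z = Xc @ Vt[0]

print(explained)
```

Because the points are almost collinear, nearly all of the variance survives the projection to one dimension.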

Latent Factor Models (LFMs) and LDA

A simple explanation of LFMs.

Latent Factor Models are also called Latent Variable Models or Factor Analysis. http://www.cs.sjtu.edu.cn/~liwujun/paper/PhdThesis.pdf

LSA figure

A cool figure that clearly illustrates the SVD step in LSA.

LSA cannot address the polysemy problem, but it can address the synonymy problem.
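A small sketch of why the truncated SVD helps with synonyms, assuming a made-up term-document count matrix in which "car" and "automobile" never share a document but both co-occur with "engine". The terms, documents, and the choice of rank 2 are all assumptions for illustration.

```python
import numpy as np

terms = ["car", "automobile", "engine", "python", "code"]
# Columns are three hypothetical documents:
# d1 = "car engine", d2 = "automobile engine", d3 = "python code"
A = np.array([
    [1.0, 0.0, 0.0],   # car
    [0.0, 1.0, 0.0],   # automobile
    [1.0, 1.0, 0.0],   # engine
    [0.0, 0.0, 1.0],   # python
    [0.0, 0.0, 1.0],   # code
])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Full SVD, then keep only the k largest singular values (the LSA step).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # each row is a term in the latent space

car, auto, py = (terms.index(t) for t in ("car", "automobile", "python"))

raw_sim = cosine(A[car], A[auto])                   # similarity before LSA
lsa_sim = cosine(term_vecs[car], term_vecs[auto])   # similarity after LSA
unrelated_sim = cosine(term_vecs[car], term_vecs[py])

print(raw_sim, lsa_sim, unrelated_sim)
```

In the raw matrix the two synonyms look unrelated, while in the rank-2 space they point in the same direction. Each term still gets a single vector, which is why a word with several meanings cannot be separated this way.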

LSI, LSA, pLSA, LDA, etc

•What is the difference between LSI and LSA?
–LSI refers to using this technique for indexing, or information retrieval.
–LSA refers to using it for everything else.
–It’s the same technique, just different applications.

 

•Two problems that arise using the vector space model:
–synonymy: many ways to refer to the same object, e.g. car and automobile
•leads to poor recall
–polysemy: most words have more than one distinct meaning, e.g. model, python, chip
•leads to poor precision
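The effect of both problems on recall and precision can be sketched with a toy exact-term-match retrieval. The documents, queries, and relevance judgments below are invented for illustration, and the matcher is a deliberately naive stand-in for the vector space model.

```python
# Hypothetical mini corpus.
docs = {
    "d1": "the car needs a new engine",
    "d2": "the automobile needs a new engine",
    "d3": "python is a programming language",
    "d4": "the python swallowed a mouse",
}

def retrieve(query):
    """Return ids of documents that share at least one term with the query."""
    terms = set(query.split())
    return {doc_id for doc_id, text in docs.items() if terms & set(text.split())}

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Synonymy: the user wants documents about cars, but d2 says "automobile".
car_p, car_r = precision_recall(retrieve("car"), {"d1", "d2"})

# Polysemy: the user wants the programming language, but d4 is about a snake.
py_p, py_r = precision_recall(retrieve("python"), {"d3"})

print(car_p, car_r)
print(py_p, py_r)
```

The "car" query keeps perfect precision but finds only half of the relevant documents, while the "python" query finds everything relevant but half of what it returns is off-topic.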