Tuesday, December 1, 2009

Human Action Recognition and Localization

Semi-supervised Human Action Recognition and Localization using Spatially and Temporally Integrated Local Features

Joint work by Tuan Hue THI, Jian Zhang (NICTA-Sydney), Li Cheng (TTI-Chicago), Li Wang (SEU-China), and Shin'ichi Satoh (NII-Tokyo)


This paper presents a novel framework for recognizing and localizing human actions in video sequences using a weakly supervised approach. Local space-time features are detected from video shots and represented as histogram vectors of oriented gradients and flows. A sparse Bayesian kernel classification model is built to capture the compact characteristics of the supervised data while remaining adaptive to unknown data; its purpose is to label each local feature with the relevant action class. Group constraints among local features and Markov Chain Monte Carlo sampling are incorporated into the model via data association to improve both accuracy and processing time. The labeling results are first passed into a non-linear Support Vector Machine to decide the action class of the whole video shot. The same label values are then fed into a Conditional Random Field that propagates label information among neighboring regions and thereby accurately locates the event areas. Testing this weakly trained model on the classical KTH dataset, the realistic Hollywood Human Action dataset, and the challenging TRECVID event detection dataset has yielded results comparable to most state-of-the-art fully supervised techniques.

Local Feature Description as Video Representation

The Space-Time Interest Point detection approach developed by Laptev [2] is used to detect and describe the local features of a video shot, and hence to represent it, as seen in the following snapshots from the TRECVID dataset.
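As a minimal sketch of how such a local descriptor can be built (assuming grayscale frames in a NumPy array), the following computes a simplified HoG-style orientation histogram over a space-time patch. This is an illustration, not Laptev's actual implementation; the flow half of the HoG/HoF descriptor is analogous, binning optical-flow directions instead of gradient directions.

```python
import numpy as np

def hog_descriptor(cube, n_bins=8):
    """Histogram of oriented gradients over a space-time cube (T, H, W)
    of grayscale intensities -- a simplified stand-in for the HoG part
    of the HoG/HoF descriptor."""
    gy, gx = np.gradient(cube.astype(float), axis=(1, 2))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)           # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist       # L1-normalised histogram

# Toy usage: an 8-frame 16x16 patch around a detected interest point.
rng = np.random.default_rng(0)
patch = rng.random((8, 16, 16))
desc = hog_descriptor(patch)
```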


Sparse Bayesian Kernel Machine for Labeling task

Video action labeling is a tedious and time-consuming task. The vast amount of growing video has also created the need for a technique that can learn from a small amount of supervised data and still catch similar motion patterns in completely unknown environments. Among the many well-known classification techniques, the Bayesian learning approach fits our semi-supervised learning task best, since it is more flexible in representing the divergence between the training and testing data sources, and it explicitly links each hypothesis to its computed score. The core idea of the Bayesian approach is to approximate the posterior distribution from multiple trained hypotheses. We extend the Bayesian object recognition framework of Carbonetto et al. [1] from images to human action recognition in video, with additional constraints on the structure among interest points in both space and time. Each action class has one classifier trained from its small supervised set; the negative samples are randomly drawn from the pool of all other classes.

The following directed graph visualizes the semi-supervised sparse Bayesian kernel classification framework augmented with Group Statistics constraints.


In this design, x is the observed interest point descriptor and y is the class label, with binary values of -1 and 1, indicating whether the interest point belongs to the action class; shaded nodes are supervised, blank nodes are unknown, and squared nodes are fixed hyperparameters. z is a low-dimensional latent variable introduced to simplify the model computation. Gamma and Beta are the parameters of the probit link model introduced by Tham et al. [3]: Gamma is the feature selection parameter, following a Bernoulli distribution with success rate T, whose value is in turn drawn from a Beta distribution with parameters a and b; Beta is the regression coefficient vector, regularized by the term Delta squared, which is in turn assigned an inverse-Gamma prior with hyperparameters Mu and Nu.
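The hierarchy of priors can be illustrated by drawing once from each level; the hyperparameter values below are purely illustrative, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                      # number of kernel features (illustrative)
a, b = 1.0, 1.0             # Beta hyperparameters for the selection rate
mu, nu = 2.0, 1.0           # inverse-Gamma hyperparameters for delta^2

tau = rng.beta(a, b)                      # success rate of feature selection
gamma = rng.binomial(1, tau, size=d)      # Bernoulli selection indicators
delta_sq = 1.0 / rng.gamma(mu, 1.0 / nu)  # inverse-Gamma draw for the regulariser
beta = np.where(gamma == 1,
                rng.normal(0.0, np.sqrt(delta_sq), size=d),
                0.0)                      # regression coefficients: zero where deselected
```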

Computation is done using Markov Chain Monte Carlo sampling combined with a blocked Gibbs sampler. The outputs of the interest point labeling task are shown in the following figure, with green marking interest points that belong to the action class PersonRuns and yellow those from the noisy background.
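As a sketch of the sampling machinery, the following implements a plain Gibbs sampler for a Bayesian probit classifier using the Albert-Chib latent-variable augmentation. It is a simplified stand-in for the blocked sampler above: no feature selection step, a fixed prior variance tau, and no group constraints.

```python
import numpy as np

def probit_gibbs(X, y, n_iter=200, tau=10.0, seed=0):
    """Gibbs sampler for Bayesian probit classification via latent-variable
    augmentation. y is in {-1, +1}; returns the posterior-mean coefficients."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = np.linalg.inv(X.T @ X + np.eye(d) / tau)   # posterior covariance of beta
    L = np.linalg.cholesky(B)
    beta = np.zeros(d)
    beta_sum = np.zeros(d)
    for it in range(n_iter):
        m = X @ beta
        # Latent z_i ~ N(m_i, 1) truncated to the half-line matching y_i,
        # drawn here by simple rejection sampling (fine for toy data).
        z = np.empty(n)
        for i in range(n):
            draw = rng.normal(m[i], 1.0)
            while draw * y[i] <= 0:
                draw = rng.normal(m[i], 1.0)
            z[i] = draw
        mean = B @ (X.T @ z)
        beta = mean + L @ rng.normal(size=d)        # beta | z ~ N(mean, B)
        if it >= n_iter // 2:
            beta_sum += beta                        # average after burn-in
    return beta_sum / (n_iter - n_iter // 2)

# Toy usage: two Gaussian clusters labelled -1 / +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
beta_hat = probit_gibbs(X, y)
pred = np.sign(X @ beta_hat)
```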


Action Classification using Support Vector Machine

The positive labels are then passed into a non-linear Support Vector Machine for the classification task, weighted against the probabilistic output to find the most probable action class; results are shown for the action class ObjectPut in the following figure.
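A hedged sketch of this stage, using scikit-learn's SVC with Platt-scaled probability outputs; the shot-level features here (per-class positive-label counts) are invented for illustration and are not the paper's actual feature construction.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical shot-level features: counts of positively labelled
# interest points per action class, two classes for simplicity.
rng = np.random.default_rng(0)
X_run = rng.normal([8, 1], 1.0, size=(30, 2))    # shots of class "PersonRuns"
X_put = rng.normal([1, 8], 1.0, size=(30, 2))    # shots of class "ObjectPut"
X = np.vstack([X_run, X_put])
y = np.array([0] * 30 + [1] * 30)

# Non-linear (RBF) SVM with probabilistic output via Platt scaling.
clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y)
probs = clf.predict_proba([[0.5, 8.5]])[0]       # probabilistic class scores
pred = clf.classes_[probs.argmax()]              # most probable action class
```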


Action Localization using Conditional Random Fields Propagation

In the video processing research domain, the concept of a human activity or human event is rather abstract and loosely defined, especially for videos obtained from the web or from real-world surveillance scenarios, so the automatic retrieval of event regions is essential and helpful for the activity analysis community. We tackle this challenging task by extending the work of Carbonetto et al. [1] from the image processing domain to video segmentation, and we also introduce a new way to define an event region in a video shot, which can later be used for many other analysis purposes, such as 3D modeling of the objects involved.

We combine the space and time correlations of all interest point regions into a Conditional Random Field model by extracting a cube of proportional size around each interest point center. In this way we represent not only the relationships between points within a frame, but also between points lying in adjacent frames. Two kinds of CRF potentials are defined: one for each interest point region, and one for the relative position of each pair of regions.
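A toy sketch of inference over such potentials, using iterated conditional modes rather than full CRF training; the unary scores and the Potts-style pairwise bonus w are assumptions for illustration, not the paper's learned potentials.

```python
import numpy as np

def icm_smooth(unary, edges, w=0.8, n_sweeps=5):
    """Iterated-conditional-modes smoothing of binary region labels.
    `unary[i, l]` scores label l in {0, 1} for region i; `edges` links
    spatially/temporally adjacent regions with an agreement bonus `w`."""
    labels = unary.argmax(axis=1)
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_sweeps):
        for i in range(len(unary)):
            score = unary[i].copy()
            for l in (0, 1):
                # Pairwise potential: reward agreeing with each neighbour.
                score[l] += w * sum(labels[j] == l for j in nbrs[i])
            labels[i] = score.argmax()
    return labels

# Toy usage: five regions along a track; the middle one is noisily
# scored as background (label 0) but its neighbours pull it back.
unary = np.array([[0.1, 0.9], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = icm_smooth(unary, edges)
```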

The following figure shows the positively labeled interest points with the CRF score of each region.


We use an Event Block definition to represent each event region in the video shot; it maintains the absolute and relative relationships of all interest points around what we call the integral volume of all highly weighted regions.
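One plausible reading of the integral volume, sketched below, is the bounding space-time cube of all interest-point regions whose weight passes a threshold; the threshold and the point data are illustrative assumptions.

```python
import numpy as np

def integral_volume(points, weights, thresh=0.5):
    """Bounding space-time cube (min/max of t, y, x) of all interest
    points whose CRF weight exceeds `thresh`."""
    pts = points[weights > thresh]
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    return lo, hi

# Toy usage: four interest points (t, y, x); the low-weight outlier
# in the noisy background is excluded from the event block.
points = np.array([[3, 40, 60], [5, 44, 66], [7, 42, 63], [50, 200, 10]])
weights = np.array([0.9, 0.8, 0.7, 0.1])
lo, hi = integral_volume(points, weights)
```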

Experimental Results

Testing with 5% supervision on the three datasets yields overall rates of 82.17% on KTH, 25.63% on HOHA, and 12.88% on TRECVID, which are highly comparable with other supervised systems.

The overall semi-supervised result on KTH is compared with other state-of-the-art fully supervised systems in the following table.

All steps of the recognition and localization of the PersonRuns action are demonstrated below.


Our system framework for semi-supervised human action recognition and localization using spatially and temporally integrated local features.




Testing demonstrations for our system of semi-supervised human action recognition and localization using spatially and temporally integrated local features. Eight action models from the TRECVID dataset are trained and tested on each unknown video sequence; the actions considered are: CellToEar, Embrace, ObjectPut, OpposingFlow, PersonMeet, PersonSplitUp, PersonRuns, and Pointing.




Action CellToEar on CAM1




Action CellToEar on CAM2




Action CellToEar on CAM3




Action CellToEar on CAM5




References

[1] P. Carbonetto, G. Dorko, C. Schmid, H. Kuck, and N. de Freitas. Learning to recognize objects with little supervision. Intl. Journal of Computer Vision, 77(1-3):219–237, May 2008.

[2] I. Laptev. On space-time interest points. Intl. Journal of Computer Vision, 64(2-3):107–123, September 2005.

[3] S.-S. Tham, A. Doucet, and K. Ramamohanarao. Sparse Bayesian learning for regression and classification using Markov Chain Monte Carlo. In Proc. Intl. Conf. Machine Learning, pages 634–641, San Francisco, CA, USA, 2002.

Monday, October 26, 2009

Matlab installation and SSH connection

OK, here I am blogging once again.
Pretty much nothing fancy this time around.
I've been trying to install Matlab, and here are a few notes on the license and activation setup.
Using the username created at MathWorks, together with the activation file obtained from the CSE website, we can go into the account page to do the download and activation; for each machine, there needs to be one license.dat file plus the installation file. Use them to install; the current version is 2009b.

For the SSH connection, I want to use the server at NII for faster processing, but since it is a secure server that can only be accessed through an intermediate server, all connections to NII must go via that machine. More tragically, that machine requires key authentication instead of a normal login. So the process first requires generating a public/private key pair. Initially I was working on a Windows PC, so I used PuTTY and its companion tools to do the job; I needed to ask the admin of the intermediate server to add my public key to the access list, and then, using the Auth/SSH option in PuTTY, I could successfully log in to that machine. Today I had to use Linux to connect, and after a few unsuccessful tries with the private key PuTTY gave me, I realised the private key format was incompatible, so I went back and used PuTTYgen to export the private key to OpenSSH format, and that did the trick.
Another thing, regarding file transfer: in order to copy files from the NII server to my client machine, I need to go through the intermediate machine, and I still haven't found a way to do this automatically. Use scp -i privatekeyfile source destination on Unix, or pscp -i privatekeyfile source destination on Windows; pscp is provided on the same page as PuTTY.
Also, to keep the connection alive without being kicked due to long idle time, we can specify ssh -o ServerAliveInterval=30.

About life, nothing much except Counter-Strike all night; too many conference deadlines are due soon, and yeah, research on video is a hard thing, since everything is just too s...l....o.....w, and without results you can't do any writing. Well, gods bless me, I've always been a good boy.

Thursday, September 24, 2009

Great Site

for learning Chinese via songs

in reference to:

"故事" (story)
- Tong Hua, Guang Liang - 童话 (光良) - Chinese song

SRM

The higher the VC dimension, the more likely the empirical error (EE) is to be low; the confidence term of the risk bound, however, grows with it, and that is the trade-off SRM balances.
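Vapnik's SRM bound makes this concrete: with probability 1 - eta, the true risk is bounded by the empirical error plus a VC confidence term that grows with the VC dimension h. A small sketch of that term (eta = 0.05 and the sample sizes are illustrative):

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Vapnik's VC confidence term: with probability 1 - eta, the true
    risk is at most the empirical risk plus this quantity. Richer
    hypothesis classes (larger h) let the empirical error shrink but
    inflate this penalty."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

# With n = 10000 samples, larger VC dimensions pay a larger penalty:
terms = [vc_confidence(h, 10000) for h in (10, 100, 1000)]
```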

in reference to: Structural risk minimization - Wikipedia, the free encyclopedia

Monday, April 27, 2009

Hán Sở tranh hùng (The Chu-Han Contention)



In this life, each person is sometimes born into a particular position, like a piece on a chessboard;
depending on where you stand, you must cleverly work your way up.
In some places you can develop right from the start, but you must know how to hold your ground;
in other places conditions are unfavorable, and if you are not flexible and forbearing, you will be eliminated early.
In the story there is a saying, "Nhân Giả Vô Địch" (the benevolent man has no rival).
Hạng Võ (Xiang Yu) was a valiant general; one could say that wherever he fought he won, without equal, but he was only a Cường Giả (man of strength).
Lưu Bang (Liu Bang) was different: although at first sight he seemed an ordinary fellow, thanks to good friends and a knack for winning people's hearts he commanded great loyalty, and so he is called a Nhân Giả (man of benevolence).
To seize the world, perhaps a Cường Giả is needed,
but to stand at the head of the world, only a Nhân Giả can do it.
When the feudal lords were oppressed by the Qin state (unified by Tần Thủy Hoàng, i.e. Qin Shi Huang),
all of them rose up, but only Hạng Võ could defeat it;
yet later, because he failed to win people's hearts, everything gradually fell into Lưu Bang's hands.

Even Hàn Tín (Han Xin),
a talented general,
serving under Hạng Võ,
was never trusted with responsibility
and had to accept his lowly lot and wait for his chance;
then, when the opportunity came, he joined Lưu Bang, and together they founded the great Han dynasty.
But in this whole series,
the one I admire most is still Trương Lương (Zhang Liang).
He was from the state of Hàn, which also lost its country to the Qin;
after his nation fell, he wandered everywhere looking for a way to destroy Qin.

First he joined Hạng Võ's camp and was given responsibility, but he realized that Hạng Võ was merely a valiant general who did not know how to use talented people, so he left.
After a while, he saw that Lưu Bang, though outwardly unremarkable, was obeyed by everyone beneath him, so he came to serve under him.
Nearly all of Lưu Bang's stratagems came from Trương Lương.

Trương Lương is much like Gia Cát Lượng (Zhuge Liang) of the Three Kingdoms era.
Life today is sometimes the same:
know yourself and know your opponent, know when to yield and when to stand firm.

And above all, win people's hearts, for that is the way to win the world.

One more thing:
empire and beauty.

As people often say,
behind every successful man stands a woman who knows how to conduct herself.

Sometimes I think about the trade-off and the sacrifice:
if you want the empire, you cannot keep the beauty at your side.
"Beauty" here means a soulmate, or anything that influences your thinking too much.
Take Hạng Võ:
partly because he loved Ngu Cơ (Consort Yu) too much,
at times he handled certain affairs like a woman would.
It was not really Hạng Võ's fault;
it was simply fate, and anyone in that position would probably have done exactly the same.

Acting for the one you love can never be called wrong.
By contrast,
behind Lưu Bang
stood Lã Hậu (Empress Lü),
a woman one could call resourceful and quick-witted, and extremely loyal.

Lã Hậu was not especially graceful, nor did she know how to pamper her husband like the other pretty young girls,
but in return she was the most resourceful and cunning of all the female characters in the story.

She endured many hardships, yet always held great ambitions, constantly helping Lưu Bang unify the world so that she could become Empress.

There were times when, betrayed by Lưu Bang, she suffered greatly, and in those moments there were people who tried to tempt her; watching the series, you might expect Lã Hậu to betray him in revenge, but surprisingly she stood very firm. In her heart she had accepted that truth, and her determination to succeed in the future kept her from thinking of anything else.

All in all, Lã Hậu truly is the model wife for a successful man.
As I said at the beginning,
at first glance everything looks chaotic:
each person with their own circumstances and character, plot lines interleaving.

But seen as a whole, everything seems to have been arranged in advance, each person in their place;
through each stage the situation is updated, gradually converging on the main storyline.

Friday, April 24, 2009

Happy birthday to me


Elementary Space Time Stage (ESTS): this concept can be used for classifying different events in video.

Ru Tình (Lullaby of Love) - Trịnh Công Sơn
I lull you at the wind's first breath, as you dry your hair by the lake,
When the pink lotus newly blooms, the bud of life, oh so fragrant.
I lull you, love, in longing; I lull you, love, when far away,
I lull the little leaves to fall and fill an autumn.

I lull you when the rainy season comes, lull you to love forever,
I lull you, forever young and innocent, a soul fragrant with fruit.
I lull you, waiting for you to speak, as love is born on your lips,
I lull you sitting still right there, lulling love, à... ơi.

I lull the one who sits forever with me,
I lull the one who sits forever with me.

I lull your velvet-brocade slippers, lull your pink lotus heels,
I lull your wide dress-flaps to fly, soaring past my love on wings.
I lull the road you come along, astir with birdsong,
I lull you, a swallow's wing, lips sweet with seeds of kindness.

I lull you, love like leaves, still returning after a hundred years,
Your lips are a spark of fire; life never knew it so.
May you remain somewhere out there, so I still have a lullaby,
I lull you sitting still right there, while I seek a love for you.

Monday, April 20, 2009

Phải luôn luôn cố gắng (I must always keep trying)


Today I am writing my blog again, this time in Vietnamese.
On 21-04-09, I, Thi Huê Tuấn, promise to keep trying, or at least, to learn how to try. "This life shall pass, but it was never meant to be wasted." I do not yet know what I will strive for, but first of all I will begin with my research at NII for TRECVID 2009. May all the gods and saints bless me with the passion and strength to realize my dream. Hardships lie ahead, and I will face them bravely, because I know I was born into this world to love people and to love my country.
I will keep trying.

Tuesday, March 31, 2009

End of March


So, I'm here in Tokyo for half a month. Tokyo appears to be much more modern than I expected: clean, civilized, and peaceful. The first impression of everything is that it is all engineered and optimized, somehow reflecting Japanese tradition.

Nevertheless, I'm always locked in the cubicle of the lab, whether it is in Sydney or Tokyo, finding it even harder with research and future plans. Well, probably we will never know until we're ready to know.

Monday, January 12, 2009

OK, first blog post for 2009.
2008 was a tough year, and this year is expected to bring even more challenges, so the first piece of advice is: stay alert.
Getting closer to 25, it's almost 1/3 of a lifetime and 1/2 of the total working time; it's now or never to claim your stance.
A new president for the USA, economic crises across the world, companies changing CEOs, blah blah; hopefully all mankind can work together, live in peace, and there'll be no more war.
Personally, I'll try to hang in here in this unexpected and imperfect world: stay focused, stay young and foolish.
Who knows what's gonna come next