ACASVA Actions Dataset (public)
- Summary
HOG3D vectors extracted on the bounding boxes of players in videos of tennis and badminton
- License
- ODbL
- Dependencies
- Tags
- action-recognition computer-vision HOG3D transfer-learning
- Attribute Types
- Download
# Instances: 11003 / # Attributes: 960
tgz (61.1 MB)
- Original Data Format
- tgz
- Name
- Version mldata
- Comment
- Names
- Data (first 10 data points)
(Zipped) TAR archive ACASVA_actions, ACASVA_actions/BMSB08_300.zip, ACASVA_actions/TWSJ09_960.zip, ACASVA_actions/TMSA03_960.zip, ACASVA_actions/TWSJ09_300.zip, ACASVA_actions/TWDU06_960.tgz, ACASVA_actions/TWDA09_960.zip, ACASVA_actions/BMSB08_960.zip, ACASVA_actions/TWDU06_300.tgz, ACASVA_actions/TWSA03_300.zip, ACASVA_actions/TWDA09_300.zip, ACASVA_actions/ACASVA_actions.html, ACASVA_actions/TMSA03_300.zip, ACASVA_actions/TWSA03_960.zip
- Description
Following [deCampos et al, WACV2011], we used HOG3D descriptors extracted on player bounding boxes.
Two different sets of feature extraction parameters were used: the 960D parameters (4x4x3x20) optimised for the KTH dataset and the 300D parameters (2x2x5x5x3) optimised for the Hollywood dataset (see Alexander Klaser's page for details). In our preliminary experiments, we found that the KTH parameters (960D) give better results for the tennis dataset.
- labels.txt: contains action labels: Non-Hit (0), Hit (1) and Serve (2);
- frames.txt: for each sample, gives the time stamp of the original video at which the features were extracted - note that multiple players are visible in each frame, so consecutive lines can share the same frame number;
- teams.txt: labels each player as Far player (0) or Near player (1), decided by the position of the player's feet relative to the court's mid-line;
- features.txt: contains the HOG3D feature vectors, each either 300- or 960-dimensional; each line holds the feature vector for one action sample, and the first element of each line indicates the dimensionality (a loading sketch follows this list).
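As a convenience, here is a minimal sketch of how one extracted split might be loaded. It assumes whitespace-separated values with one sample per line, as the description above suggests; the directory layout and delimiter should be verified against the actual extracted files.

```python
import numpy as np

def load_split(directory):
    """Load one ACASVA split (e.g. an extracted TWSJ09_960 archive).

    Assumes whitespace-separated values, one sample per line; verify
    against the actual files before relying on this.
    """
    labels = np.loadtxt(f"{directory}/labels.txt", dtype=int)  # 0=Non-Hit, 1=Hit, 2=Serve
    frames = np.loadtxt(f"{directory}/frames.txt", dtype=int)  # frame number of each sample
    teams = np.loadtxt(f"{directory}/teams.txt", dtype=int)    # 0=Far player, 1=Near player
    features = []
    with open(f"{directory}/features.txt") as fh:
        for line in fh:
            values = line.split()
            dim = int(values[0])  # first element gives the dimensionality (300 or 960)
            features.append([float(v) for v in values[1:1 + dim]])
    features = np.asarray(features)  # shape: (n_samples, 300 or 960)
    assert len(features) == len(labels) == len(frames) == len(teams)
    return features, labels, frames, teams
```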
- URLs
- http://kahlan.eps.surrey.ac.uk/acasva/Downloads.html
- Publications
T. deCampos, M. Barnard, K. Mikolajczyk, J. Kittler, F. Yan, W. Christmas and D. Windridge. "An evaluation of bags-of-words and spatio-temporal shapes for action recognition". In IEEE Workshop on Applications of Computer Vision (WACV), 2011.
Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two very popular approaches for action recognition from video. The former (BoW) is an un-structured global representation of videos which is built using a large set of local features. The latter (STS) uses a single feature located on a region of interest (where the actor is) in the video. Despite the popularity of these methods, no comparison between them has been done. Also, given that BoW and STS differ intrinsically in terms of context inclusion and globality/locality of operation, an appropriate evaluation framework has to be designed carefully. This paper compares these two approaches using four different datasets with varied degree of space-time specificity of the actions and varied relevance of the contextual background. We use the same local feature extraction method and the same classifier for both approaches. Further to BoW and STS, we also evaluated novel variations of BoW constrained in time or space. We observe that the STS approach leads to better results in all datasets whose background is of little relevance to action classification.
N. FarajiDavar, T. deCampos, D. Windridge, J. Kittler and W. Christmas. "Domain Adaptation in the Context of Sport Video Action Recognition". In Domain Adaptation Workshop, in conjunction with NIPS, Sierra Nevada, Spain 2011.
We apply domain adaptation to the problem of recognizing common actions between differing court-game sport videos (in particular tennis and badminton games). Actions are characterized in terms of HOG3D features extracted at the bounding box of each detected player, and thus have large intrinsic dimensionality. The techniques evaluated here for domain adaptation are based on estimating linear transformations to adapt the source domain features in order to maximize the similarity between posterior PDFs for each class in the source domain and the expected posterior PDF for each class in the target domain. As such, the problem scales linearly with feature dimensionality, making the video-environment domain adaptation problem tractable on reasonable time scales and resilient to over-fitting. We thus demonstrate that significant performance improvement can be achieved by applying domain adaptation in this context.
N. FarajiDavar, T. deCampos, J. Kittler and F. Yan. "Transductive Transfer Learning for Action Recognition in Tennis Games". In 3rd International Workshop on Video Event Categorization, Tagging and Retrieval for Real-World Applications (VECTaR), in conjunction with 13th International Conference on Computer Vision (ICCV), Barcelona, Spain 2011.
This paper investigates the application of transductive transfer learning methods for action classification. The application scenario is that of off-line video annotation for retrieval. We show that if a classification system can analyze the unlabeled test data in order to adapt its models, a significant performance improvement can be achieved. We applied this to action classification in tennis games where the training and test videos are of a different nature. Actions are described using HOG3D features, and for transfer we used a method based on feature re-weighting and a novel method based on feature translation and scaling.
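For intuition only, here is a minimal sketch of one simple instantiation of feature translation and scaling: per-dimension mean/variance alignment of source features onto the target statistics. This is an illustrative assumption, not the class-wise transform actually estimated in the papers above.

```python
import numpy as np

def translate_and_scale(source, target):
    """Align source features to target statistics, per dimension.

    A simple global mean/std alignment, used purely for illustration;
    the published method estimates its transform per class and is not
    reproduced here.
    """
    eps = 1e-8  # avoid division by zero on constant dimensions
    src_mean, src_std = source.mean(axis=0), source.std(axis=0) + eps
    tgt_mean, tgt_std = target.mean(axis=0), target.std(axis=0) + eps
    # Translate to zero mean, scale to unit variance, then map onto target stats.
    return (source - src_mean) / src_std * tgt_std + tgt_mean
```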
- Data Source
- HOG3D feature extraction method [Klaser et al, BMVC2008] applied to the space-time bounding box of players in videos of tennis and badminton.
- Measurement Details
Performance is evaluated by using data from one of the sports videos for training and another for testing, i.e., a whole file is used for either training, validation or testing; we discourage the use of N-fold cross-validation. We encourage users to report results in terms of average accuracy, but it may also be relevant to report the True Positive, True Negative and False Positive rates for each of the classes. Area under the ROC curve has also been used.
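As an illustration of this protocol, the sketch below trains on one split and tests on another, reporting overall accuracy and per-class rates. The linear SVM and scikit-learn are illustrative choices, not mandated by the dataset, and load_split() is the hypothetical helper sketched earlier.

```python
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, confusion_matrix

def cross_video_evaluation(train_dir, test_dir):
    """Train on one video's features and test on another (no N-fold CV),
    following the measurement protocol described above."""
    X_train, y_train, _, _ = load_split(train_dir)
    X_test, y_test, _, _ = load_split(test_dir)
    clf = LinearSVC().fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))
    cm = confusion_matrix(y_test, y_pred)
    for c, name in enumerate(["Non-Hit", "Hit", "Serve"]):
        tp = cm[c, c]
        fn = cm[c].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = cm.sum() - tp - fn - fp
        print(f"{name}: TPR={tp / (tp + fn):.3f}, "
              f"TNR={tn / (tn + fp):.3f}, FPR={fp / (fp + tn):.3f}")

# e.g. train on tennis, test on badminton (hypothetical extracted paths):
# cross_video_evaluation("ACASVA_actions/TWSJ09_960", "ACASVA_actions/BMSB08_960")
```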
- Usage Scenario
Transductive transfer learning
- Revisions
- 18 revisions by teo, from 2012-03-07 to 2012-09-24
This item was downloaded 2985 times and viewed 24056 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.