View Yahoo! Web Directory Topics (public)
- Summary
Contains parsed webpages along with their topics, extracted from the Yahoo! web directory.
- License
- unknown
- Dependencies
- Tags
- bag-of-words Classification multi-class text web-pages Yahoo!
- Attribute Types
- Download
# Instances: 2212 / # Attributes: 10630
Available formats: HDF5 (3.6 MB), XML, CSV, ARFF, LibSVM, Matlab, Octave
- Original Data Format
- libsvm
- Name
- yahoo-web-directory-topics
- Version mldata
- 0
- Comment
LibSVM
- Names
- Data (first 10 data points)
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... ... ... ... ... ... ... ... ... ... ... ...
- Description
Extracted from the Yahoo! web directory, this data set contains parsed webpages along with their topics. Each line is the bag-of-words representation of a web page whose label is its first-level topic in the Yahoo! directory hierarchy. The topic ids correspond to the following topics:
1 Arts
2 Business and Economy
3 Education
4 Entertainment
The data is in libsvm format, with zero-valued entries omitted. For more information, please contact me at: jean . faddoul (at) gmail . com http://www.grappa.univ-lille3.fr/~faddoul/
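As an illustration of the format described above, here is a minimal Python sketch that parses one libsvm-style line into a label and a sparse feature dictionary, then maps the label to its topic name. The function names, the sample line, and the use of 1-based feature indices (the usual libsvm convention) are assumptions made for this example; they are not taken from the data file itself.

```python
# Topic ids as listed in the description.
TOPICS = {1: "Arts", 2: "Business and Economy", 3: "Education", 4: "Entertainment"}


def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = int(float(parts[0]))
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features, n_attributes):
    """Expand a sparse feature dict into a dense list.

    Assumes 1-based feature indices; omitted entries are filled with 0.0.
    """
    row = [0.0] * n_attributes
    for index, value in features.items():
        row[index - 1] = value
    return row


if __name__ == "__main__":
    # Hypothetical sample line, not copied from the actual data set.
    sample = "4 12:1 873:2"
    label, features = parse_libsvm_line(sample)
    print(TOPICS[label], features)
```

The caller supplies `n_attributes` because the listing does not say whether the attribute count of 10630 includes the label column.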
- URLs
- http://dir.yahoo.com/
- Publications
- Data Source
- Yahoo! web Directory
- Measurement Details
- Usage Scenario
- revision 1
- by jeanbaptiste on 2012-03-05 14:30
- revision 2
- by jeanbaptiste on 2012-03-13 15:16
- revision 3
- by jeanbaptiste on 2012-03-13 15:16
This item was downloaded 13261 times and viewed 2333 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.