DMOZ Web Directory Topics (public)

- Summary
Contains parsed web pages along with their topics, extracted from the DMOZ web directory.
- License
- unknown
- Dependencies
- Tags
- bag-of-words Classification DMOZ libsvm multi-class text web-pages
- Attribute Types
- Download
- # Instances: 2658 / # Attributes: 10630
- Formats: HDF5 (4.1 MB), XML, CSV, ARFF, LibSVM, Matlab, Octave. Files are converted on demand, and the process can take up to a minute.
- Original Data Format
- libsvm
- Name
- dmoz-web-directory-topics
- Version mldata
- 0
- Comment
LibSVM
- Names
- Data (first 10 data points)
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... ... ... ... ... ... ... ... ... ... ... ...
- Description
Extracted from the DMOZ (Open Directory Project) web directory. "The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors."
This data set contains parsed web pages along with their topics. Each line is the bag-of-words representation of a web page, and its label is the page's first-level topic in the DMOZ directory hierarchy. The topic ids correspond to the following semantic topics:
1 Arts
2 Games
3 Kids and Teens
4 Shopping
5 Society
The data is in libsvm format, with zero-valued entries omitted.
For more information, please don't hesitate to contact me at: jean . faddoul (at) gmail . com , http://www.grappa.univ-lille3.fr/~faddoul/
NB: A similar dataset built from the Yahoo Directory is available at: https://mldata.org/repository/tags/data/Yahoo!/
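To make the sparse format concrete, here is a minimal Python sketch of reading one libsvm line and expanding it into a dense row like those in the data preview. It assumes the common libsvm convention of `label index:value ...` with 1-based indices, and it assumes the attribute count of 10630 refers to feature columns; neither is confirmed by this page. The function names and the sample line are hypothetical, not part of the dataset. Labels outside the topic list above map to "unknown".

```python
from typing import Dict, List, Tuple

# Topic ids as listed in the description.
TOPICS = {1: "Arts", 2: "Games", 3: "Kids and Teens", 4: "Shopping", 5: "Society"}


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one 'label index:value index:value ...' line into (label, features)."""
    parts = line.split()
    label = int(float(parts[0]))
    features: Dict[int, float] = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features: Dict[int, float], n_attributes: int) -> List[float]:
    """Expand a sparse feature dict into a dense row (assumes 1-based indices)."""
    row = [0.0] * n_attributes
    for index, value in features.items():
        row[index - 1] = value
    return row


# Hypothetical sample line, not taken from the dataset.
label, features = parse_libsvm_line("5 3:1 17:2 10630:1")
dense = to_dense(features, n_attributes=10630)
print(TOPICS.get(label, "unknown"), len(features), len(dense))
```

Keeping the features as a dictionary until a dense row is actually needed avoids materialising all 2658 x 10630 entries, most of which are zero.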
- URLs
- http://www.dmoz.org/
- Publications
- Data Source
- Measurement Details
- Usage Scenario
- revision 1
- by jeanbaptiste on 2012-03-13 15:10
- revision 2
- by jeanbaptiste on 2012-03-13 15:11
- revision 3
- by jeanbaptiste on 2012-03-13 15:16
- revision 4
- by jeanbaptiste on 2012-03-13 15:17
- revision 5
- by jeanbaptiste on 2012-03-29 16:45
- revision 6
- by jeanbaptiste on 2012-03-29 16:47
This item was downloaded 11487 times and viewed 6484 times.
Disclaimer
We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.