View uci-20070111 soybean (public)

2011-09-14 15:57 by mldata | Version 1
Summary

(No information yet)

License
unknown (from Weka repository)
Dependencies
Tags
arff slurped Weka
Attribute Types
String
Download
# Instances: 683 / # Attributes: 36
HDF5 (1.0 MB) XML CSV ARFF LibSVM Matlab Octave

Original Data Format
arff
Name
soybean
Version mldata
0
Comment

Notes: The large soybean database (soybean-large-data.arff) and its corresponding test database (soybean-large-test.arff) are combined into a single file (soybean-large.arff).

  1. Title: Large Soybean Database

  2. Sources:
     (a) R.S. Michalski and R.L. Chilausky, "Learning by Being Told and Learning from Examples: An Experimental Comparison of the Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis", International Journal of Policy Analysis and Information Systems, Vol. 4, No. 2, 1980.
     (b) Donor: Ming Tan & Jeff Schlimmer (Jeff.Schlimmer%cs.cmu.edu)
     (c) Date: 11 July 1988

  3. Past Usage:

    1. See above.
    2. Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning (pp. 121-134). Ann Arbor, Michigan: Morgan Kaufmann. -- IWN recorded a 97.1% classification accuracy -- 290 training and 340 test instances
    3. Fisher, D.H. & Schlimmer, J.C. (1988). Concept Simplification and Predictive Accuracy. Proceedings of the Fifth International Conference on Machine Learning (pp. 22-28). Ann Arbor, Michigan: Morgan Kaufmann. -- Notes why this database is highly predictable

  4. Relevant Information Paragraph: There are 19 classes, only the first 15 of which have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data since they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value "dna" means "does not apply". The values for attributes are encoded numerically, with the first value encoded as "0", the second as "1", and so forth. An unknown value is encoded as "?".

  5. Number of Instances: 683

  6. Number of Attributes: 35 (all have been nominalized)

  7. Attribute Information: -- 19 Classes diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury.

    1. date: april,may,june,july,august,september,october,?.
    2. plant-stand: normal,lt-normal,?.
    3. precip: lt-norm,norm,gt-norm,?.
    4. temp: lt-norm,norm,gt-norm,?.
    5. hail: yes,no,?.
    6. crop-hist: diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,?.
    7. area-damaged: scattered,low-areas,upper-areas,whole-field,?.
    8. severity: minor,pot-severe,severe,?.
    9. seed-tmt: none,fungicide,other,?.
    10. germination: '90-100%','80-89%','lt-80%',?.
    11. plant-growth: norm,abnorm,?.
    12. leaves: norm,abnorm.
    13. leafspots-halo: absent,yellow-halos,no-yellow-halos,?.
    14. leafspots-marg: w-s-marg,no-w-s-marg,dna,?.
    15. leafspot-size: lt-1/8,gt-1/8,dna,?.
    16. leaf-shread: absent,present,?.
    17. leaf-malf: absent,present,?.
    18. leaf-mild: absent,upper-surf,lower-surf,?.
    19. stem: norm,abnorm,?.
    20. lodging: yes,no,?.
    21. stem-cankers: absent,below-soil,above-soil,above-sec-nde,?.
    22. canker-lesion: dna,brown,dk-brown-blk,tan,?.
    23. fruiting-bodies: absent,present,?.
    24. external decay: absent,firm-and-dry,watery,?.
    25. mycelium: absent,present,?.
    26. int-discolor: none,brown,black,?.
    27. sclerotia: absent,present,?.
    28. fruit-pods: norm,diseased,few-present,dna,?.
    29. fruit spots: absent,colored,brown-w/blk-specks,distort,dna,?.
    30. seed: norm,abnorm,?.
    31. mold-growth: absent,present,?.
    32. seed-discolor: absent,present,?.
    33. seed-size: norm,lt-norm,?.
    34. shriveling: absent,present,?.
    35. roots: norm,rotted,galls-cysts,?.
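
The notes above say that "?" marks an unknown value, that "dna" is a genuine category, and that prior work used only the first 15 classes. A minimal Python sketch of that preprocessing follows. The helper names (`clean_row`, `keep_common_classes`), the assumption that the class label sits in the last column, and the sample rows are all hypothetical and not taken from the real file.

```python
# The last four classes in the class list above, which prior work left out.
RARE_CLASSES = {
    "diaporthe-pod-&-stem-blight",
    "cyst-nematode",
    "2-4-d-injury",
    "herbicide-injury",
}


def clean_row(values):
    """Map the '?' placeholder to None; 'dna' is kept because it is a real category."""
    return [None if v == "?" else v for v in values]


def keep_common_classes(rows, class_index=-1):
    """Drop rows whose class label is one of the four rare classes.

    class_index=-1 assumes the class is the last column; adjust if it is not.
    """
    return [r for r in rows if r[class_index] not in RARE_CLASSES]


# Hypothetical rows, shortened to three attributes plus the class label.
sample = [
    ["october", "normal", "?", "diaporthe-stem-canker"],
    ["june", "?", "dna", "cyst-nematode"],
]
cleaned = keep_common_classes([clean_row(r) for r in sample])
```

Keeping "dna" distinct from "?" matters here: the first says the attribute does not apply to the plant, the second says the value was not recorded.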

Names
date,plant-stand,precip,temp,hail,crop-hist,area-damaged,severity,seed-tmt,germination,
Types
  1. nominal:april,may,june,july,august,september,october
  2. nominal:normal,lt-normal
  3. nominal:lt-norm,norm,gt-norm
  4. nominal:lt-norm,norm,gt-norm
  5. nominal:yes,no
  6. nominal:diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs
  7. nominal:scattered,low-areas,upper-areas,whole-field
  8. nominal:minor,pot-severe,severe
  9. nominal:none,fungicide,other
  10. nominal:90-100,80-89,lt-80
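
The comment section states that attribute values were originally encoded numerically by position (first value 0, second 1, and so on). As a rough sketch, the value lists under Types can be used to recover such codes in Python. The dictionary `ATTRIBUTE_VALUES` and the function `encode` are illustrative names, and only three of the attributes are included.

```python
# Value lists copied from the Types entries, in their listed order.
ATTRIBUTE_VALUES = {
    "precip": ["lt-norm", "norm", "gt-norm"],
    "severity": ["minor", "pot-severe", "severe"],
    "germination": ["90-100", "80-89", "lt-80"],
}


def encode(attribute, value):
    """Return the 0-based position of value in the attribute's list, or None for '?'."""
    if value == "?":
        return None
    return ATTRIBUTE_VALUES[attribute].index(value)


codes = [
    encode("precip", "gt-norm"),
    encode("severity", "pot-severe"),
    encode("germination", "?"),
]
print(codes)  # [2, 1, None]
```

Positional codes are only meaningful as magnitudes for the ordered attributes; for purely nominal ones they are just labels.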
Data (first 10 data points)
    date plan... precip temp hail crop... area... seve... seed... germ... ...
    octo... normal gt-n... norm yes same... low-... pot-... none 90-100 ...
    august normal gt-n... norm yes same... scat... severe fung... 80-89 ...
    july normal gt-n... norm yes same... scat... severe fung... lt-80 ...
    july normal gt-n... norm yes same... scat... severe none 80-89 ...
    octo... normal gt-n... norm yes same... scat... pot-... none lt-80 ...
    sept... normal gt-n... norm yes same... scat... pot-... none 80-89 ...
    sept... normal gt-n... norm yes same... scat... pot-... fung... 90-100 ...
    august normal gt-n... norm no same... scat... pot-... none lt-80 ...
    octo... normal gt-n... norm yes same... scat... pot-... fung... 80-89 ...
    august normal gt-n... norm yes same... scat... severe none lt-80 ...
    ... ... ... ... ... ... ... ... ... ... ...
Description

A gzip'ed tar containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes)

URLs
(No information yet)
Publications
    Data Source
    http://www.ics.uci.edu/~mlearn/MLRepository.html http://kdd.ics.uci.edu/
    Measurement Details
    Usage Scenario

    This item was downloaded 4534 times and viewed 3246 times.


    Acknowledgements

    This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
    http://www.pascal-network.org/.