News Article Data Sets [closed]

2

6

I am working on a news classification project. The system will classify news articles by predefined topic (e.g. sports, politics, international). To build it, I need free data sets for training.

So far, after a few hours of googling and following links from here, the only suitable data set I could find is this. While it will hopefully be enough, I would like to find more.

The data sets I want should:

  1. Contain full news articles, not just titles
  2. Be in English
  3. Be in .txt format, not XML or a database

Can anybody help me?

Israelite answered 18/11, 2011 at 14:48 Comment(0)
1

Have you tried Reuters-21578? It is one of the most commonly used data sets for text classification. It is formatted in SGML, but it is quite simple to parse and convert to .txt format.
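
A minimal sketch of that conversion in Python, assuming the distribution's usual layout where each article sits between `<REUTERS ...>` and `</REUTERS>`, topics are listed as `<D>` entries inside `<TOPICS>`, and the article text is in `<BODY>`. The function names and the output file naming are made up for illustration, and regular expressions are used instead of a real SGML parser to keep it short:

```python
import re
from pathlib import Path

# Each article sits between <REUTERS ...> and </REUTERS> (assumed layout).
ARTICLE_RE = re.compile(r"<REUTERS.*?</REUTERS>", re.DOTALL)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
TOPIC_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def parse_reuters(sgml_text):
    """Return a list of (topics, body) pairs for articles that have a body."""
    articles = []
    for match in ARTICLE_RE.finditer(sgml_text):
        chunk = match.group(0)
        body = BODY_RE.search(chunk)
        if body is None:
            continue  # skip entries without a <BODY> element
        topics_block = TOPICS_RE.search(chunk)
        topics = TOPIC_RE.findall(topics_block.group(1)) if topics_block else []
        articles.append((topics, body.group(1).strip()))
    return articles


def write_txt(articles, out_dir):
    """Write one .txt file per article; the first line lists the topics."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (topics, body) in enumerate(articles):
        text = ",".join(topics) + "\n" + body + "\n"
        (out / f"article_{i:05d}.txt").write_text(text, encoding="utf-8")
```

To use it, read one of the `.sgm` files into a string (for example `Path("reut2-000.sgm").read_text(encoding="latin-1")`, where both the file name and the encoding are assumptions about your copy of the data set), pass it to `parse_reuters`, and hand the result to `write_txt`.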

Consume answered 21/5, 2013 at 13:14 Comment(0)
0

You can build one yourself: write a Python/Perl/PHP script that runs a search and then, for each result, isolates the attributes you need with regular expressions. I think that is the best option. It is not easy, but it should be fun, and in the end you can share the data set with us.
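
As a rough sketch of the extraction step only, here is one way to pull the paragraph text out of an article page that has already been downloaded. It uses Python's standard `html.parser` instead of regex, since regex tends to break on nested tags. The class and function names are made up for illustration, and it assumes the article text lives in `<p>` elements, which will not hold for every site. Fetching pages and assigning topic labels are not shown:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> is kept as well.
        if self._in_p:
            self._buf.append(data)


def extract_article_text(html):
    """Return the page's paragraph text, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

The returned string can be written straight to a .txt file. Check a site's terms of use before collecting its articles this way.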

Bugbee answered 20/11, 2011 at 16:1 Comment(1)
Yeah, I am looking for an existing data set because I will be busy with the project, so I am trying to reduce the amount of work. Furthermore, I do not know how to write a script in Python/Perl/PHP. - Israelite
