The TTC-3600 data set is a collection of Turkish news and articles comprising 3,600 categorized documents from 6 well-known portals in Turkey. The portals are as follows:
1. http://dosyalar.hurriyet.com.tr/rss
2. http://www.posta.com.tr/rss
3. http://www.iha.com.tr/rss.html
4. http://www.haberturk.com/rss
5. http://www.radikal.com.tr/rss/
6. http://www.zaman.com.tr/rss_rssMainPage.action?sectionId=341
This data set was created to support text mining on Turkish and to make experimental results reproducible. The TTC-3600 data set is provided in 4 different forms that differ in pre-processing.
Each form of the TTC-3600 data set includes two types of files. The first file, with the ".txt" extension, contains the names and contents of the features, whereas the second file is in Weka's ARFF (Attribute-Relation File Format) and describes a list of instances sharing a set of features.
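To illustrate how an ARFF file lists instances that share a set of features, the following is a minimal Python sketch of a reader. It is not the tooling used to build TTC-3600. The function name `read_arff` and the toy data are hypothetical, and the exact layout of the TTC-3600 ARFF files (dense or sparse rows, attribute naming) is an assumption. The sketch only handles attribute names without whitespace and values without embedded commas, and it keeps every value as a string.

```python
import io


def read_arff(stream):
    """Parse a simple ARFF stream into (relation, attributes, rows).

    Assumptions (not confirmed for TTC-3600): attribute names contain no
    whitespace, values contain no commas, and rows are either dense
    ("v1,v2,...") or sparse ("{index value, index value}").
    """
    relation = None
    attributes = []  # (name, type_spec) tuples in declaration order
    rows = []        # one list of string values per instance
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@relation"):
                relation = line.split(None, 1)[1].strip("'\"")
            elif lower.startswith("@attribute"):
                _, name, type_spec = line.split(None, 2)
                attributes.append((name.strip("'\""), type_spec))
            elif lower.startswith("@data"):
                in_data = True
            continue
        if line.startswith("{"):
            # Sparse row: positions that are not listed default to "0".
            row = ["0"] * len(attributes)
            for pair in line.strip("{}").split(","):
                if pair.strip():
                    index, value = pair.split(None, 1)
                    row[int(index)] = value.strip()
        else:
            # Dense row: one value per declared attribute.
            row = [value.strip() for value in line.split(",")]
        rows.append(row)
    return relation, attributes, rows


# Hypothetical toy file, not taken from TTC-3600.
sample = """\
% toy example with two term features and a class label
@relation toy_news
@attribute futbol numeric
@attribute borsa numeric
@attribute class {ekonomi,spor}
@data
2,0,spor
{1 3, 2 spor}
"""

relation, attributes, rows = read_arff(io.StringIO(sample))
print(relation)
print([name for name, _ in attributes])
print(rows)
```

For a real file, the same function could be called on an opened file object, for example `read_arff(open(path, encoding="utf-8"))`, where `path` and the encoding are assumptions to be checked against the downloaded files. For production use, an established ARFF reader is likely a better choice than this sketch.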