nlp_chinese_corpus download

nlp_chinese_corpus is a large-scale Chinese corpus collection for natural language processing research and development. It was created to make high-volume Chinese text data easier to access for students, researchers, and practitioners. The repository gathers several major datasets, including Chinese Wikipedia entries, news articles, encyclopedia-style question answering data, community question answering data, and Chinese-English translation sentence pairs. Each dataset includes descriptions, download links, structure notes, and examples to help users understand how the data is formatted. The corpora can support tasks such as language model pretraining, word vector training, question answering, title generation, keyword generation, translation, and sentence representation learning. Overall, it is a practical resource hub for building or testing Chinese NLP models with larger and more varied datasets.
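Several of the tasks listed above, such as title generation and translation, start from simple (source, target) pairs taken out of each JSON record. The sketch below streams records one per line (JSON Lines) and yields such pairs. It makes two assumptions that you should check against each dataset's own structure notes:

- The files store one JSON object per line.
- The news data uses fields named `content` and `title`, and the translation data uses `english` and `chinese`.

The function name `extract_pairs` is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def extract_pairs(lines: Iterable[str],
                  source_field: str = "content",
                  target_field: str = "title") -> Iterator[Tuple[str, str]]:
    """Yield (source, target) text pairs from JSON Lines records.

    The default field names are an ASSUMPTION about the news dataset schema;
    pass other names (e.g. "english", "chinese") for other datasets.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # ignore lines that are valid JSON but not objects
        # Treat missing or null fields as empty strings
        source = (record.get(source_field) or "").strip()
        target = (record.get(target_field) or "").strip()
        if source and target:  # drop records missing either side
            yield source, target
```

To use it, open a local file (the path is hypothetical) with `open("news.json", encoding="utf-8")` and pass the file object in as `lines`. Reading line by line keeps memory use flat, which matters for multi-gigabyte dumps. Calling `extract_pairs(f, "english", "chinese")` would yield translation pairs under the same schema assumption.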

Features

Large-scale Chinese NLP corpus collection
Wikipedia, news, encyclopedia QA, community QA, and translation datasets
JSON-formatted data structures for easier processing
Dataset descriptions, schemas, examples, and usage notes
Download links through Google Drive and Baidu Netdisk
Useful for pretraining, word vectors, QA, summarization, translation, and representation learning
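Because the data is stored as JSON, a common first step before pretraining or word-vector training is to scan a corpus and build a vocabulary. The sketch below counts non-whitespace characters in one text field of each JSON Lines record. It skips malformed lines so that a long scan is not aborted by a single bad record. Some of it is assumed rather than taken from the project:

- The field name `text` (as in a Wikipedia-style dump) is an assumption.
- The function name `char_vocab` is hypothetical.
- Character-level counting is only one possible choice; a word segmenter could be swapped in.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def char_vocab(lines: Iterable[str],
               text_field: str = "text",
               min_count: int = 1) -> Dict[str, int]:
    """Count characters in `text_field` across JSON Lines records.

    The field name "text" is an ASSUMPTION; adjust it per dataset.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of stopping the scan
        if not isinstance(record, dict):
            continue
        text = record.get(text_field) or ""
        # Whitespace is not useful as a vocabulary entry here
        counts.update(ch for ch in text if not ch.isspace())
    # Keep characters seen at least min_count times, most frequent first
    return {ch: n for ch, n in counts.most_common() if n >= min_count}
```

Raising `min_count` is a simple way to trim rare characters before building embedding tables.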

License

MIT License

Additional Project Details

Registered

2 days ago
