Cyberspace of Shujun LI
General AI
XAI:
AI Art Generators:
OpenAI's DALL·E 2
Bing Image Creator
Stability AI (Stable Diffusion, Visual ChatGPT)
Midjourney
Dream by WOMBO
Natural Language Processing and Computational Linguistics
General Tools:
NLTK (Natural Language Toolkit)
spaCy

Natural
CogCompNLP
Hugging Face (datasets; Write With Transformer)
Talk to Transformer (InferKit online demo)
quanteda: Quantitative Analysis of Textual Data in R (GitHub)
gensim – Topic Modelling in Python
Transformer-XL
bert-as-service
BERTweet: A pre-trained language model for English Tweets (EMNLP 2020)
RNNTagger
TreeTagger
Python Word Segmentation
Word Ninja
SymSpell (Python port: symspellpy)
Language Style Transfer (NIPS 2017)
GeoTxt (Transactions in GIS 2019)
Edinburgh Geoparser
GeoPy
XAI for Natural Language Processing (AACL-IJCNLP 2020)
DetectGPT (2023)
BERTective (EACL 2021)
mauve-experiments (NeurIPS 2021)
Pretrained Models:
Pretrained model repository (预训练模型仓库)
OpenAI's ChatGPT
Google's BERT


(WuDaoCorpora; GitHub, GLM, CLM; BMInf)
Chinese NLP Resources:
Baidu ERNIE
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab (Pengcheng PanGu-α / 鹏程.盘古α)
awesome-chinese-nlp (Guan Wang)
jieba (“结巴”) Chinese word segmentation
THUAIPoet (Jiuge / 九歌) research group (Jiuge V2.0; BERT-CCPoem, MixPoet @ AAAI 2020, Stylistic Poetry @ EMNLP 2018, WMPoetry @ IJCAI 2018; CCPM = Chinese Classical Poetry Matching Dataset, Other datasets)
Girl Poet Xiaoice (少女诗人小冰)
tensorflow_poems / LiBai AI Composer / automatic classical Chinese poetry composing robot
Small Chinese corpus datasets (中文语料小数据)
Datasets:
Nicolas Iderhoff's nlp-datasets
WordNet
Wikimedia Downloads

(Frequency lists)
Amazon MASSIVE dataset
WebNLG Challenge
Wiktextract
(data @ kaikki.org)
Use of corpora in translation studies @ Centre for Translation Studies, University of Leeds
OpenLexicon
Lexique (WorldLex: Blog, Twitter and Newspapers Word Frequencies for 66 languages)
Datasets of Automatic Keyphrase Extraction @ LIAAD, INESC TEC
KPTimes Corpus @ INLG 2019
dewiki-wordrank
OAGSX Title Generation Dataset
OAGKX Keyword Generation Dataset
GeoNames
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection (2023)
Privacy-related resources:

(PrivaSeer Corpus @ ACL 2021, PrivBERT @ ACL 2021)
Federated Learning
General Resources:
Awesome-Federated-Learning
The Federated Learning Portal
Open-source Tools:
TensorFlow Federated (TFF) (GitHub)
NVIDIA Clara
FedML: A Research Library and Benchmark for Federated Machine Learning


(Federated Learning Research at Webank AI)
Commercial Solutions: