# Transfer Learning in Natural Language Processing

## General Language Modeling

### ASGD Weight-Dropped LSTM (AWD-LSTM)

Dropout network

DropConnect network

`dropouts = np.array([0.4, 0.5, 0.05, 0.3, 0.1]) * 0.5`
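The two regularizers named above differ in what they mask: dropout zeroes activations, while DropConnect zeroes individual weights, which is how AWD-LSTM regularizes its hidden-to-hidden weight matrices. The NumPy sketch below illustrates the difference on a single linear layer. The function names, the inverted `1/(1-p)` rescaling, and the layer sizes are assumptions chosen for illustration; this is not code from the AWD-LSTM implementation.

```python
import numpy as np

def dropout(activations, p, rng):
    """Zero each activation with probability p, rescaling survivors by 1/(1-p)."""
    keep = rng.random(activations.shape) >= p
    return activations * keep / (1.0 - p)

def dropconnect_linear(x, W, p, rng):
    """Linear layer whose individual weights are zeroed with probability p."""
    keep = rng.random(W.shape) >= p
    return x @ (W * keep / (1.0 - p)).T

rng = np.random.default_rng(0)
x = np.ones((2, 4))   # batch of 2 inputs with 4 features (illustrative sizes)
W = np.ones((3, 4))   # weights of a hypothetical 4 -> 3 linear layer

h_dropout = dropout(x @ W.T, p=0.5, rng=rng)              # mask applied to the outputs
h_dropconnect = dropconnect_linear(x, W, p=0.5, rng=rng)  # mask applied to the weights
```

In this sketch the DropConnect mask is sampled once per call and shared by every row of the batch, so both rows of `h_dropconnect` are identical, whereas the dropout mask differs per activation.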

## Universal Language Model

### Slanted Triangular Learning Rates (STLR)

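Cyclical learning rates, proposed by Leslie Smith, address this problem. Using a cyclical learning rate (CLR) improved accuracy (CMC) by 10%. For more information, see the paper "Cyclical Learning Rates for Training Neural Networks".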
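To make the two schedules concrete, the sketch below implements the triangular cyclical policy from Smith's paper and a slanted triangular schedule of the kind this section's heading refers to. The function and argument names are hypothetical, and the defaults (`lr_max=0.01`, `cut_frac=0.1`, `ratio=32`) are assumed values; treat them as starting points rather than settings confirmed by this article.

```python
import math

def triangular_clr(it, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate: linear ramp up, then linear ramp down."""
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def slanted_triangular_lr(t, total_iters, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular schedule: short linear warm-up, then long linear decay."""
    cut = math.floor(total_iters * cut_frac)   # iteration at which the rate peaks
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Learning rates over 1000 iterations for each schedule (illustrative sizes).
clr_schedule = [triangular_clr(i, step_size=250, base_lr=1e-4, max_lr=1e-2) for i in range(1000)]
stlr_schedule = [slanted_triangular_lr(i, total_iters=1000) for i in range(1000)]
```

With these assumed settings the cyclical schedule peaks every 500 iterations, while the slanted triangular schedule peaks once, after the first 10% of training, and then decays for the remaining 90%.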

### Classifier Fine-Tuning for Task-Specific Weights

```python
import numpy as np

# Assumes the legacy (pre-1.0) fastai text API is imported, and that trn_clas, val_clas,
# trn_labels, val_labels, bs, bptt, c, vs, em_sz, nh, nl and PATH are defined earlier.
trn_ds = TextDataset(trn_clas, trn_labels)
val_ds = TextDataset(val_clas, val_labels)
# Training batches are roughly length-sorted with some randomness; validation is fully sorted.
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))
trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=trn_samp)
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
md = ModelData(PATH, trn_dl, val_dl)

# Base dropout rates, halved for the classifier.
dropouts = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * 0.5
m = get_rnn_classifer(bptt, 20*70, c, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
                      layers=[em_sz*3, 50, c], drop=[dropouts[4], 0.1],
                      dropouti=dropouts[0], wdrop=dropouts[1], dropoute=dropouts[2], dropouth=dropouts[3])
```

### Concat Pooling

```python
trn_dl = DataLoader(trn_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=trn_samp)
val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp)
md = ModelData(PATH, trn_dl, val_dl)
```
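Concat pooling feeds the classifier the last hidden state concatenated with the max-pooled and mean-pooled hidden states over all time steps, which is consistent with the first classifier layer in the earlier code taking `em_sz*3` inputs. The NumPy sketch below shows the operation on a toy tensor; the function name `concat_pool` and the `(seq_len, batch, hidden)` layout are assumptions for illustration, not the library's implementation.

```python
import numpy as np

def concat_pool(hidden_states):
    """Concatenate last, max-pooled and mean-pooled hidden states.

    hidden_states: array of shape (seq_len, batch, hidden).
    Returns an array of shape (batch, 3 * hidden).
    """
    last = hidden_states[-1]              # hidden state at the final time step
    max_pool = hidden_states.max(axis=0)  # element-wise max over time
    avg_pool = hidden_states.mean(axis=0) # element-wise mean over time
    return np.concatenate([last, max_pool, avg_pool], axis=1)

# Toy input: 4 time steps, batch of 2, hidden size 3 (illustrative sizes).
H = np.arange(24, dtype=float).reshape(4, 2, 3)
pooled = concat_pool(H)   # shape (2, 9)
```

Pooling over every time step lets the classifier use information from anywhere in the document rather than relying only on the final state.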

### Backpropagation Through Time (BPTT) for Text Classification

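```python
import numpy as np
from torch.utils.data.sampler import Sampler

class SortishSampler(Sampler):
    """Yield indices roughly sorted by key (e.g. text length), with randomness between batches."""
    def __init__(self, data_source, key, bs):
        self.data_source, self.key, self.bs = data_source, key, bs

    def __len__(self):
        return len(self.data_source)

    def __iter__(self):
        idxs = np.random.permutation(len(self.data_source))
        # Split the shuffled indices into large chunks and sort each chunk, largest key first.
        sz = self.bs * 50
        ck_idx = [idxs[i:i+sz] for i in range(0, len(idxs), sz)]
        sort_idx = np.concatenate([sorted(s, key=self.key, reverse=True) for s in ck_idx])
        # Regroup the sorted indices into batch-sized chunks.
        sz = self.bs
        ck_idx = [sort_idx[i:i+sz] for i in range(0, len(sort_idx), sz)]
        max_ck = np.argmax([self.key(ck[0]) for ck in ck_idx])  # find the chunk with the largest key,
        ck_idx[0], ck_idx[max_ck] = ck_idx[max_ck], ck_idx[0]   # then make sure it goes first.
        # Shuffle the order of the remaining chunks. Permuting chunk positions rather than the
        # chunks themselves also works when the last chunk is shorter than the others.
        order = np.random.permutation(np.arange(1, len(ck_idx)))
        sort_idx = np.concatenate([ck_idx[0]] + [ck_idx[i] for i in order])
        return iter(sort_idx)
```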
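For long documents, BPTT for text classification processes the token sequence in fixed-length chunks and carries the hidden state from one chunk to the next. The sketch below shows only the chunking step, in plain Python. `chunk_tokens` is a hypothetical helper, and the chunk length of 70 is an assumption suggested by the `20*70` argument in the classifier code earlier in this article.

```python
def chunk_tokens(tokens, bptt):
    """Split a token list into consecutive chunks of at most bptt tokens."""
    if bptt <= 0:
        raise ValueError("bptt must be positive")
    return [tokens[i:i + bptt] for i in range(0, len(tokens), bptt)]

doc = list(range(150))                     # stand-in for a tokenized document of 150 token ids
chunks = chunk_tokens(doc, bptt=70)
chunk_lengths = [len(c) for c in chunks]   # the final chunk holds the leftover tokens
```

A model would then run over `chunks` in order, starting each chunk from the hidden state left by the previous one.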

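| Epoch | | | |
|-------|----------|-----------|----------|
| 1 | 0.210046 | 0.202856 | 0.942858 |
| 2 | 0.212139 | 0.149009 | 0.943746 |
| 3 | 0.21163 | 0.186739 | 0.946553 |
| 4 | 0.186233 | 0.1508 | 0.945218 |
| 5 | 0.176255 | 0.1504472 | 0.947985 |
| 6 | 0.198024 | 0.146215 | 0.948345 |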