After successfully training a fine-tuned model of ZhipuAI/ChatGLM-6B with flextrain, I followed the example and used snapshot_download to fetch the state dict. The layer names inside look like:
transformer.layers.<layer>.attention.query_key_value.lora_A
transformer.layers.<layer>.attention.query_key_value.lora_B
Then, when I call model.load_state_dict(state_dict), all of these keys end up in unexpected_keys.
I looked at the layer names in the swift model, and it only has:
transformer.layers.<layer>.attention.query_key_value.loramodule_default.lora_A
transformer.layers.<layer>.attention.query_key_value.loramodule_default.lora_B
I don't understand why the layer names in the state dict produced by FlexTrain differ from those in the swift model.
Any pointers would be much appreciated!
State dict produced by FlexTrain:
OrderedDict([('transformer.layers.0.attention.query_key_value.lora_A',
tensor([[ 0.0092, -0.0156, -0.0047, ..., -0.0121, 0.0107, 0.0067],
[ 0.0104, -0.0084, -0.0010, ..., 0.0071, 0.0146, 0.0155],
[-0.0013, -0.0006, -0.0031, ..., -0.0021, 0.0052, 0.0039],
...,
[ 0.0041, 0.0074, 0.0130, ..., -0.0078, 0.0101, -0.0084],
[ 0.0047, 0.0113, -0.0137, ..., 0.0008, -0.0052, -0.0006],
[ 0.0027, 0.0006, 0.0005, ..., -0.0086, -0.0156, -0.0050]],
dtype=torch.bfloat16)),
('transformer.layers.0.attention.query_key_value.lora_B',
tensor([[ 1.5640e-04, 7.3624e-04, -6.0654e-04, ..., 3.2043e-04,
1.4722e-05, -8.9169e-05],
[ 3.1281e-04, 5.0306e-05, 6.7711e-05, ..., 6.2943e-05,
2.6703e-04, 1.7643e-04],
[ 1.5163e-04, -1.7738e-04, 1.5831e-04, ..., 1.8692e-04,
-9.1076e-05, 6.0654e-04],
...,
[ 3.8910e-04, -5.9891e-04, -6.7902e-04, ..., -3.2187e-05,
1.1215e-03, -8.9169e-05],
[-8.7357e-04, -8.4305e-04, -1.0300e-03, ..., 8.8120e-04,
-1.0605e-03, 1.5335e-03],
[ 1.6479e-03, 4.8256e-04, -9.7275e-04, ..., -3.4904e-04,
-8.6594e-04, 1.8616e-03]], dtype=torch.bfloat16)),
('transformer.layers.1.attention.query_key_value.lora_A',
tensor([[-0.0060, 0.0018, -0.0060, ..., -0.0096, -0.0156, -0.0123],
[ 0.0013, 0.0107, -0.0028, ..., 0.0117, 0.0077, 0.0033],
[ 0.0131, 0.0135, 0.0126, ..., -0.0156, 0.0048, -0.0113],
...,
[-0.0084, -0.0086, -0.0120, ..., -0.0117, 0.0142, -0.0137],
[-0.0078, 0.0025, 0.0005, ..., 0.0056, -0.0059, 0.0016],
[-0.0140, -0.0038, -0.0036, ..., 0.0034, 0.0011, -0.0085]],
dtype=torch.bfloat16)),
('transformer.layers.1.attention.query_key_value.lora_B',
tensor([[ 1.3733e-04, -3.5858e-04, 8.4305e-04, ..., -6.2943e-04,
9.1171e-04, 1.0529e-03],
[-1.0986e-03, -7.2861e-04, -1.8539e-03, ..., 7.1335e-04,
1.9836e-04, -1.7853e-03],
[-4.5013e-04, -2.0599e-04, 1.4496e-04, ..., 6.2585e-06,
4.6790e-06, -6.4087e-04],
...,
[-6.7139e-04, -1.0986e-03, -9.3079e-04, ..., 1.0147e-03,
-1.3351e-03, -1.0910e-03],
[ 1.2665e-03, 1.6251e-03, 1.5564e-03, ..., -7.4387e-04,
1.4191e-03, 1.5488e-03],
[-7.1335e-04, -5.4169e-04, -5.1498e-04, ..., 1.1139e-03,
-1.4267e-03, -2.5940e-04]], dtype=torch.bfloat16)),
('transformer.layers.2.attention.query_key_value.lora_A',
tensor([[ 0.0055, 0.0134, 0.0111, ..., -0.0116, 0.0038, -0.0154],
[ 0.0085, 0.0151, 0.0085, ..., 0.0070, -0.0028, 0.0041],
[ 0.0153, 0.0117, 0.0115, ..., 0.0045, 0.0131, -0.0135],
...,
[-0.0135, 0.0061, -0.0079, ..., 0.0051, -0.0134, 0.0042],
[ 0.0052, -0.0025, -0.0104, ..., -0.0006, 0.0084, -0.0112],
[-0.0083, 0.0001, -0.0029, ..., -0.0125, 0.0049, 0.0138]],
dtype=torch.bfloat16)),
('transformer.layers.2.attention.query_key_value.lora_B',
tensor([[-1.4038e-03, -7.3242e-04, -1.5259e-04, ..., 1.2817e-03,
8.3923e-04, -1.0605e-03],
[-1.7853e-03, -6.5613e-04, -5.3024e-04, ..., 6.0654e-04,
1.5030e-03, -7.9155e-05],
[-1.6117e-04, 6.7902e-04, 8.5831e-04, ..., -8.5068e-04,
-2.8038e-04, 4.8256e-04],
...,
[-5.6744e-05, 5.4932e-04, 2.2507e-04, ..., -3.9864e-04,
1.0395e-04, -4.4346e-05],
[-4.3297e-04, 5.1117e-04, 3.8910e-04, ..., -3.9339e-05,
1.6975e-04, -1.4114e-03],
[-5.3048e-06, 1.0681e-03, 7.0572e-04, ..., -5.0354e-04,
-7.1335e-04, -6.8665e-05]], dtype=torch.bfloat16)),
('transformer.layers.3.attention.query_key_value.lora_A',
tensor([[-0.0084, -0.0140, 0.0123, ..., 0.0004, -0.0038, 0.0125],
[ 0.0063, 0.0002, 0.0019, ..., -0.0028, 0.0078, 0.0131],
[ 0.0033, -0.0151, 0.0019, ..., 0.0065, 0.0092, 0.0042],
...,
[ 0.0123, -0.0045, 0.0078, ..., 0.0078, 0.0066, -0.0042],
[-0.0092, 0.0098, 0.0095, ..., 0.0140, -0.0049, 0.0137],
[-0.0082, -0.0003, -0.0142, ..., -0.0089, 0.0156, 0.0022]],
dtype=torch.bfloat16)),
('transformer.layers.3.attention.query_key_value.lora_B',
tensor([[-2.0504e-04, -5.4932e-04, -2.5368e-04, ..., -7.2861e-04,
5.3787e-04, 4.2915e-05],
[-1.5354e-04, -1.1444e-03, -5.5075e-05, ..., 3.3951e-04,
-5.3406e-04, -9.1171e-04],
[-5.5313e-04, 2.3460e-04, -3.9864e-04, ..., -6.2180e-04,
-2.5868e-05, 2.8610e-04],
...,
[-4.0817e-04, 1.2970e-03, -6.4468e-04, ..., -5.1117e-04,
8.2397e-04, 1.0681e-03],
[-5.4932e-04, 3.9482e-04, -5.0354e-04, ..., -5.9891e-04,
5.4169e-04, 4.0054e-04],
[ 1.9684e-03, -2.6245e-03, 2.4719e-03, ..., 2.0599e-03,
-2.0752e-03, -2.7466e-03]], dtype=torch.bfloat16)),
('transformer.layers.4.attention.query_key_value.lora_A',
tensor([[ 6.8359e-03, 1.6556e-03, 1.1841e-02, ..., -6.2943e-04,
1.1139e-03, 9.7656e-03],
[-1.6708e-03, 1.1841e-02, -6.5002e-03, ..., -1.1658e-02,
6.8970e-03, 1.3367e-02],
[-1.2634e-02, 3.9101e-05, -7.8125e-03, ..., -1.3794e-02,
-6.4087e-04, -9.9487e-03],
...,
[-4.7913e-03, -3.3722e-03, -5.2795e-03, ..., 8.1177e-03,
1.1169e-02, 2.4261e-03],
[-7.5073e-03, -5.7373e-03, 9.7656e-03, ..., 9.0332e-03,
-1.5259e-02, -8.1787e-03],
[-2.3460e-04, -3.4637e-03, -3.3569e-03, ..., -1.1292e-02,
1.5625e-02, -1.4709e-02]], dtype=torch.bfloat16)),
('transformer.layers.4.attention.query_key_value.lora_B',
tensor([[ 4.5586e-04, 2.5368e-04, 5.9509e-04, ..., -8.6975e-04,
-2.3270e-04, 2.3174e-04],
[ 5.8365e-04, 4.8828e-04, 5.4169e-04, ..., -5.6458e-04,
1.7881e-05, 1.0986e-03],
[ 4.1962e-05, 2.8229e-04, -2.0027e-04, ..., 6.6376e-04,
-9.3937e-05, -2.0218e-04],
...,
[ 1.4954e-03, 1.0376e-03, 1.9836e-03, ..., -1.2207e-03,
-9.9945e-04, 1.7014e-03],
[ 1.0071e-03, 9.7275e-04, 1.1902e-03, ..., -1.5411e-03,
-9.2316e-04, 2.4719e-03],
[ 1.4648e-03, 1.1139e-03, 1.4801e-03, ..., -1.4343e-03,
-9.3079e-04, 1.2512e-03]], dtype=torch.bfloat16)),
('transformer.layers.5.attention.query_key_value.lora_A',
tensor([[ 0.0124, 0.0027, -0.0061, ..., -0.0030, 0.0148, 0.0051],
[ 0.0115, 0.0142, -0.0137, ..., 0.0074, -0.0112, 0.0127],
[ 0.0079, 0.0091, -0.0126, ..., -0.0050, 0.0067, -0.0140],
...,
[ 0.0019, 0.0114, 0.0125, ..., 0.0137, 0.0016, -0.0034],
[-0.0124, 0.0002, 0.0078, ..., -0.0105, -0.0108, -0.0082],
[ 0.0095, 0.0156, 0.0038, ..., -0.0008, -0.0106, -0.0122]],
dtype=torch.bfloat16)),
('transformer.layers.5.attention.query_key_value.lora_B',
tensor([[-1.3351e-03, 2.9564e-04, -1.2741e-03, ..., 8.9645e-04,
6.8283e-04, -1.3351e-03],
[ 1.1921e-04, 2.6321e-04, -3.1281e-04, ..., 1.8692e-04,
2.0885e-04, 6.7520e-04],
[-4.3297e-04, 9.6130e-04, -5.0354e-04, ..., 7.5912e-04,
1.6880e-04, 1.5163e-04],
...,
[ 7.1716e-04, -3.3569e-04, 5.4932e-04, ..., -4.5598e-06,
-2.9755e-04, 1.1597e-03],
[-1.4725e-03, 1.4267e-03, -1.7090e-03, ..., 1.3580e-03,
1.7090e-03, -1.4038e-03],
[ 1.8539e-03, -2.0142e-03, 2.4261e-03, ..., -1.8387e-03,
-1.9302e-03, 2.0294e-03]], dtype=torch.bfloat16)),
('transformer.layers.6.attention.query_key_value.lora_A',
tensor([[-8.0566e-03, 9.6436e-03, 1.4877e-03, ..., 6.2866e-03,
1.1963e-02, 4.9114e-05],
[ 2.9907e-03, -5.1498e-04, -1.5259e-03, ..., 1.3489e-02,
8.1787e-03, -9.6436e-03],
[ 1.5320e-02, -1.7242e-03, -7.8125e-03, ..., -1.4832e-02,
7.1411e-03, 2.0752e-03],
...,
[ 1.2573e-02, 7.2327e-03, 1.3916e-02, ..., -1.5625e-02,
1.0315e-02, 1.0315e-02],
[ 9.0942e-03, 1.4526e-02, -6.7139e-03, ..., 8.0566e-03,
-8.2016e-05, -4.5776e-03],
[ 1.3306e-02, -1.3428e-02, -1.6861e-03, ..., -7.0801e-03,
1.0010e-02, 1.5137e-02]], dtype=torch.bfloat16)),
('transformer.layers.6.attention.query_key_value.lora_B',
tensor([[ 4.6539e-04, -7.6294e-04, 2.0599e-04, ..., 4.8828e-04,
-5.1880e-04, 9.9659e-05],
[-5.9509e-04, -6.3324e-04, -7.6675e-04, ..., -1.2493e-04,
-2.0504e-04, 5.0068e-05],
[ 3.6621e-04, 6.1035e-05, 3.2616e-04, ..., 1.3924e-04,
-9.5367e-06, 2.7847e-04],
...,
[-2.4109e-03, -1.8845e-03, -1.8005e-03, ..., -1.8082e-03,
-1.8387e-03, -1.7776e-03],
[-7.7438e-04, -3.1853e-04, -7.2861e-04, ..., -7.6294e-04,
-4.1580e-04, -7.3242e-04],
[-2.3041e-03, -1.8921e-03, -1.5488e-03, ..., -1.7853e-03,
-1.6022e-03, -1.8158e-03]], dtype=torch.bfloat16)),
('transformer.layers.7.attention.query_key_value.lora_A',
tensor([[-0.0120, -0.0084, 0.0154, ..., -0.0010, -0.0074, -0.0088],
[-0.0050, 0.0088, -0.0112, ..., 0.0009, 0.0071, -0.0134],
[ 0.0041, 0.0079, -0.0110, ..., 0.0106, -0.0025, -0.0052],
...,
[ 0.0054, -0.0008, 0.0140, ..., -0.0049, -0.0021, -0.0132],
[ 0.0006, -0.0012, 0.0030, ..., -0.0047, -0.0054, -0.0036],
[-0.0061, -0.0014, 0.0089, ..., 0.0049, -0.0146, -0.0108]],
dtype=torch.bfloat16)),
('transformer.layers.7.attention.query_key_value.lora_B',
tensor([[ 1.0071e-03, 1.9455e-04, -1.7548e-04, ..., -2.7466e-04,
1.7166e-04, -9.5367e-05],
[-6.1417e-04, -7.9727e-04, 7.6675e-04, ..., 1.9646e-04,
-1.2360e-03, 6.2180e-04],
[-1.4305e-04, 9.1553e-04, 1.9550e-05, ..., 8.3542e-04,
-3.4714e-04, -6.2561e-04],
...,
[ 1.4420e-03, -1.6632e-03, -1.2283e-03, ..., -1.6403e-03,
6.4468e-04, 1.4191e-03],
[-9.1934e-04, 1.9455e-04, 6.3324e-04, ..., 1.3809e-03,
-1.7834e-04, -7.7820e-04],
[ 1.5488e-03, -1.0910e-03, -9.6130e-04, ..., -1.6403e-03,
9.2697e-04, 9.8419e-04]], dtype=torch.bfloat16)),
('transformer.layers.8.attention.query_key_value.lora_A',
tensor([[-0.0038, -0.0039, 0.0078, ..., -0.0093, -0.0099, -0.0001],
[ 0.0028, 0.0123, -0.0042, ..., 0.0156, 0.0042, -0.0104],
[ 0.0040, 0.0023, 0.0073, ..., -0.0038, -0.0147, -0.0114],
...,
[-0.0046, -0.0068, -0.0050, ..., 0.0009, 0.0020, 0.0019],
[ 0.0128, -0.0089, 0.0096, ..., 0.0078, 0.0032, 0.0068],
[ 0.0007, 0.0001, 0.0008, ..., 0.0009, -0.0004, 0.0037]],
dtype=torch.bfloat16)),
('transformer.layers.8.attention.query_key_value.lora_B',
tensor([[ 1.4246e-05, 4.7493e-04, 2.8801e-04, ..., -2.4605e-04,
-6.9046e-04, -5.0354e-04],
[-1.1978e-03, 1.9932e-04, -6.1798e-04, ..., 4.5013e-04,
-5.9128e-04, -7.9155e-05],
[-1.9455e-04, -8.1539e-05, 1.5545e-04, ..., -5.2261e-04,
7.4768e-04, 3.4714e-04],
...,
[ 1.0910e-03, 7.2861e-04, 9.0408e-04, ..., -1.2131e-03,
8.8882e-04, -6.3705e-04],
[-1.5945e-03, -1.7700e-03, -1.5793e-03, ..., 1.6556e-03,
-1.6251e-03, 1.5488e-03],
[ 2.3956e-03, 2.7466e-03, 2.6550e-03, ..., -1.7548e-03,
2.4414e-03, -2.2736e-03]], dtype=torch.bfloat16)),
('transformer.layers.9.attention.query_key_value.lora_A',
tensor([[ 0.0012, 0.0132, -0.0034, ..., -0.0156, 0.0049, -0.0025],
[-0.0127, 0.0038, 0.0093, ..., 0.0124, 0.0007, -0.0129],
[-0.0073, 0.0112, 0.0119, ..., -0.0106, 0.0156, -0.0109],
...,
[ 0.0015, 0.0094, 0.0103, ..., 0.0031, 0.0129, -0.0048],
[ 0.0154, 0.0018, -0.0140, ..., 0.0156, -0.0066, -0.0127],
[-0.0123, 0.0007, -0.0019, ..., 0.0132, 0.0132, -0.0101]],
dtype=torch.bfloat16)),
('transformer.layers.9.attention.query_key_value.lora_B',
tensor([[ 2.0862e-05, -3.0279e-05, -1.3351e-04, ..., 3.1090e-04,
-9.4891e-05, -7.2098e-04],
[-3.7193e-04, -1.4019e-04, 5.6839e-04, ..., 1.3256e-04,
7.8964e-04, -4.1771e-04],
[-1.6327e-03, -4.8447e-04, -5.6839e-04, ..., 9.1171e-04,
1.1597e-03, -4.4060e-04],
...,
[ 2.8610e-04, -1.8311e-04, 3.0327e-04, ..., -5.0735e-04,
-4.5395e-04, 3.1281e-04],
[-1.2283e-03, 1.6251e-03, -1.3542e-04, ..., 1.1292e-03,
1.0681e-03, -1.0757e-03],
[-1.8403e-06, 3.7384e-04, -6.8283e-04, ..., 4.4441e-04,
5.6839e-04, -8.0872e-04]], dtype=torch.bfloat16)),
('transformer.layers.10.attention.query_key_value.lora_A',
tensor([[ 4.5776e-03, -9.1553e-03, -2.4567e-03, ..., 3.5553e-03,
-5.3406e-03, -8.8501e-03],
[-1.3184e-02, 8.9111e-03, 1.4893e-02, ..., 4.3640e-03,
4.5776e-03, -7.7724e-05],
[-9.8267e-03, 1.6937e-03, -1.6098e-03, ..., -3.0060e-03,
-7.5378e-03, -4.5471e-03],
...,
[-1.3275e-03, 8.6670e-03, -6.0120e-03, ..., -1.0193e-02,
-1.4343e-02, 1.3123e-02],
[ 4.2419e-03, -1.5259e-02, 6.6223e-03, ..., 1.5793e-03,
-1.3062e-02, -7.6294e-03],
[-1.2146e-02, -6.4087e-03, 5.4626e-03, ..., -9.6436e-03,
7.8125e-03, -8.8501e-03]], dtype=torch.bfloat16)),
('transformer.layers.10.attention.query_key_value.lora_B',
tensor([[-1.1902e-03, -1.5182e-03, 8.2779e-04, ..., 1.6708e-03,
-9.9945e-04, -2.3193e-03],
[-1.4191e-03, -1.7242e-03, 1.6861e-03, ..., 1.9455e-03,
-1.4191e-03, -1.6785e-03],
[-3.3760e-04, 3.7909e-05, 5.1260e-05, ..., -9.6512e-04,
4.1389e-04, 4.8256e-04],
...,
[ 1.1749e-03, 1.1826e-03, -1.4496e-03, ..., -1.2436e-03,
1.5335e-03, 1.0071e-03],
[-3.3569e-04, 6.0272e-04, -1.4114e-03, ..., 9.2506e-05,
5.4932e-04, -2.3365e-04],
[-1.0223e-03, -1.7776e-03, 1.8005e-03, ..., 1.0910e-03,
-1.4343e-03, -1.0834e-03]], dtype=torch.bfloat16)),
('transformer.layers.11.attention.query_key_value.lora_A',
tensor([[-0.0117, -0.0148, 0.0129, ..., -0.0087, 0.0009, -0.0032],
[ 0.0114, 0.0078, 0.0107, ..., -0.0045, -0.0008, 0.0152],
[ 0.0087, 0.0001, 0.0039, ..., 0.0005, -0.0008, -0.0069],
...,
[ 0.0035, -0.0154, 0.0120, ..., -0.0064, 0.0009, 0.0145],
[-0.0123, -0.0120, -0.0018, ..., 0.0134, -0.0017, 0.0103],
[ 0.0078, -0.0006, -0.0021, ..., 0.0052, 0.0056, 0.0021]],
dtype=torch.bfloat16)),
('transformer.layers.11.attention.query_key_value.lora_B',
tensor([[ 2.6822e-05, -1.4019e-04, -1.1539e-04, ..., 1.6117e-04,
4.1199e-04, 2.8419e-04],
[-1.3809e-03, -6.8665e-04, 1.1292e-03, ..., -1.2360e-03,
-9.1934e-04, -1.0452e-03],
[-5.8746e-04, -4.0627e-04, 8.0490e-04, ..., -5.5790e-05,
1.8024e-04, -4.9973e-04],
...,
[-7.7057e-04, -8.2016e-04, 8.5449e-04, ..., -8.7357e-04,
-1.0071e-03, -8.7357e-04],
[-6.9427e-04, -5.9128e-04, 6.3324e-04, ..., -8.1253e-04,
-7.8583e-04, -6.2561e-04],
[ 2.7084e-04, 4.8447e-04, -4.7302e-04, ..., 2.1267e-04,
4.1962e-04, 6.2561e-04]], dtype=torch.bfloat16)),
('transformer.layers.12.attention.query_key_value.lora_A',
tensor([[ 0.0134, 0.0112, 0.0050, ..., 0.0034, 0.0106, 0.0137],
[ 0.0013, -0.0124, 0.0073, ..., -0.0058, -0.0041, 0.0069],
[ 0.0095, -0.0090, 0.0054, ..., 0.0014, -0.0131, -0.0044],
...,
[ 0.0052, 0.0050, 0.0008, ..., 0.0118, -0.0074, -0.0021],
[ 0.0117, -0.0019, 0.0054, ..., 0.0078, -0.0101, 0.0030],
[ 0.0142, 0.0072, 0.0146, ..., 0.0124, -0.0058, -0.0093]],
dtype=torch.bfloat16)),
('transformer.layers.12.attention.query_key_value.lora_B',
tensor([[-9.1553e-04, -6.7139e-04, 5.4550e-04, ..., 6.6376e-04,
1.1749e-03, -5.9891e-04],
[ 8.8501e-04, 1.0300e-03, -6.9809e-04, ..., -1.0986e-03,
-8.5831e-04, 5.4550e-04],
[-4.2152e-04, -8.1062e-06, 5.0545e-05, ..., -9.9659e-05,
-1.7929e-04, -1.7524e-05],
...,
[ 7.9727e-04, 1.8311e-04, -2.7466e-04, ..., -3.7575e-04,
-3.0708e-04, 3.3379e-04],
[-1.1902e-03, -1.5335e-03, 1.5182e-03, ..., 1.3580e-03,
1.3046e-03, -1.3809e-03],
[ 4.6158e-04, 1.7643e-04, 2.1458e-04, ..., 2.8229e-04,
3.2997e-04, -3.6430e-04]], dtype=torch.bfloat16)),
('transformer.layers.13.attention.query_key_value.lora_A',
tensor([[ 0.0065, 0.0063, 0.0046, ..., 0.0105, 0.0103, -0.0099],
[ 0.0060, -0.0016, -0.0046, ..., 0.0097, -0.0104, 0.0105],
[-0.0143, -0.0151, 0.0029, ..., 0.0029, -0.0151, 0.0040],
...,
[ 0.0071, 0.0136, -0.0039, ..., 0.0103, 0.0150, 0.0140],
[ 0.0096, 0.0107, 0.0070, ..., 0.0036, -0.0101, 0.0067],
[-0.0010, -0.0139, 0.0156, ..., -0.0132, -0.0049, -0.0059]],
dtype=torch.bfloat16)),
('transformer.layers.13.attention.query_key_value.lora_B',
tensor([[ 7.9155e-05, 5.6076e-04, 9.1171e-04, ..., -8.4686e-04,
-1.5831e-04, 2.7657e-04],
[-4.2725e-04, -5.6505e-05, -9.6512e-04, ..., 4.5395e-04,
2.5368e-04, -1.2159e-04],
[-5.1498e-04, -2.7084e-04, -1.0347e-04, ..., 8.5354e-05,
-2.0313e-04, 1.3351e-04],
...,
[ 8.3160e-04, -1.1063e-03, -1.1597e-03, ..., 5.2643e-04,
1.0376e-03, -1.0910e-03],
[ 1.1749e-03, -1.2207e-03, -1.1597e-03, ..., 1.5945e-03,
1.2665e-03, -1.1902e-03],
[ 2.2430e-03, -1.7548e-03, -1.7929e-03, ..., 2.2430e-03,
1.8234e-03, -1.7548e-03]], dtype=torch.bfloat16)),
('transformer.layers.14.attention.query_key_value.lora_A',
tensor([[-0.0105, -0.0117, -0.0014, ..., -0.0092, -0.0117, -0.0059],
[ 0.0002, -0.0046, 0.0114, ..., -0.0079, 0.0029, 0.0136],
[-0.0106, 0.0086, -0.0156, ..., -0.0065, -0.0042, -0.0078],
...,
[-0.0013, 0.0084, -0.0063, ..., 0.0007, 0.0081, 0.0013],
[ 0.0130, 0.0018, -0.0079, ..., -0.0095, -0.0107, 0.0087],
[ 0.0103, 0.0031, -0.0028, ..., 0.0038, -0.0096, 0.0031]],
dtype=torch.bfloat16)),
('transformer.layers.14.attention.query_key_value.lora_B',
tensor([[-3.2425e-04, -5.3048e-06, 2.1696e-05, ..., -4.8256e-04,
-2.9564e-04, -7.9346e-04],
[ 1.1444e-03, -5.8746e-04, 4.5586e-04, ..., -5.8365e-04,
-1.1253e-04, -5.9509e-04],
[ 3.5524e-05, 4.5204e-04, -1.2159e-04, ..., -3.3140e-05,
-3.8624e-05, 1.6785e-04],
...,
[ 7.2479e-04, -5.9891e-04, 2.9373e-04, ..., 4.9591e-04,
-5.4932e-04, -6.7139e-04],
[ 1.5793e-03, -1.5488e-03, 1.7624e-03, ..., 1.6785e-03,
-1.6708e-03, -1.5564e-03],
[-9.8944e-06, 1.3733e-04, 9.8705e-05, ..., 9.3937e-05,
5.0783e-05, 3.0994e-05]], dtype=torch.bfloat16)),
('transformer.layers.15.attention.query_key_value.lora_A',
tensor([[-0.0069, 0.0100, -0.0074, ..., -0.0025, -0.0155, -0.0072],
[ 0.0090, -0.0096, 0.0125, ..., -0.0049, -0.0134, -0.0064],
[-0.0102, 0.0080, -0.0148, ..., 0.0045, 0.0008, 0.0068],
...,
[ 0.0087, -0.0107, -0.0015, ..., 0.0055, 0.0002, 0.0001],
[ 0.0138, -0.0068, 0.0135, ..., 0.0115, 0.0047, -0.0060],
[-0.0126, -0.0078, -0.0121, ..., -0.0137, -0.0139, -0.0032]],
dtype=torch.bfloat16)),
('transformer.layers.15.attention.query_key_value.lora_B',
tensor([[ 0.0008, 0.0014, -0.0019, ..., -0.0015, 0.0006, -0.0011],
[ 0.0010, 0.0010, -0.0003, ..., -0.0008, -0.0010, -0.0013],
[ 0.0007, 0.0005, 0.0011, ..., -0.0006, -0.0009, -0.0006],
...,
[-0.0011, -0.0010, 0.0012, ..., 0.0013, 0.0008, 0.0008],
[-0.0013, -0.0010, 0.0013, ..., 0.0014, 0.0012, 0.0010],
[ 0.0013, 0.0013, -0.0013, ..., -0.0013, -0.0013, -0.0013]],
dtype=torch.bfloat16)),
('transformer.layers.16.attention.query_key_value.lora_A',
tensor([[ 0.0149, -0.0030, -0.0094, ..., -0.0101, -0.0140, 0.0134],
[-0.0074, -0.0156, -0.0042, ..., 0.0146, 0.0118, -0.0060],
[-0.0132, -0.0056, -0.0081, ..., -0.0105, 0.0134, 0.0061],
...,
[ 0.0028, -0.0101, 0.0044, ..., 0.0041, 0.0083, -0.0101],
[-0.0049, -0.0078, 0.0078, ..., 0.0021, 0.0123, 0.0057],
[ 0.0037, -0.0135, 0.0092, ..., 0.0117, -0.0114, -0.0089]],
dtype=torch.bfloat16)),
('transformer.layers.16.attention.query_key_value.lora_B',
tensor([[ 1.4420e-03, -1.2817e-03, 1.5106e-03, ..., -1.5182e-03,
-1.5945e-03, 1.2207e-03],
[-4.6158e-04, -7.1716e-04, 3.7766e-04, ..., -4.0245e-04,
-2.8419e-04, 1.9741e-04],
[-9.3079e-04, 9.3842e-04, -1.1520e-03, ..., 1.0529e-03,
9.6512e-04, -4.2725e-04],
...,
[ 1.6861e-03, -1.8158e-03, 1.7166e-03, ..., -1.6403e-03,
-1.7853e-03, 1.6785e-03],
[ 2.1667e-03, -2.0599e-03, 2.2430e-03, ..., -2.0294e-03,
-2.1362e-03, 2.7161e-03],
[ 2.9945e-04, -4.7112e-04, 1.7929e-04, ..., -4.8828e-04,
-6.2180e-04, 4.7207e-05]], dtype=torch.bfloat16)),
('transformer.layers.17.attention.query_key_value.lora_A',
tensor([[ 0.0084, 0.0138, 0.0069, ..., -0.0107, -0.0043, 0.0067],
[ 0.0104, 0.0028, 0.0064, ..., -0.0112, 0.0014, -0.0021],
[-0.0021, -0.0136, 0.0028, ..., 0.0115, 0.0085, -0.0016],
...,
[ 0.0070, 0.0082, -0.0082, ..., -0.0019, -0.0118, -0.0078],
[ 0.0123, 0.0034, -0.0017, ..., -0.0084, -0.0078, -0.0006],
[-0.0129, 0.0113, 0.0091, ..., -0.0092, -0.0139, 0.0120]],
dtype=torch.bfloat16)),
('transformer.layers.17.attention.query_key_value.lora_B',
tensor([[-8.2016e-04, -5.7983e-04, -5.5313e-04, ..., 4.5586e-04,
4.3678e-04, -6.1417e-04],
[ 4.4632e-04, 5.4169e-04, 5.7602e-04, ..., -8.9645e-04,
-6.5994e-04, 6.1798e-04],
[-4.4060e-04, -2.2697e-04, 3.3855e-05, ..., 4.7112e-04,
-8.1635e-04, -4.1246e-05],
...,
[ 1.2207e-03, 1.1826e-03, 8.1635e-04, ..., -1.2589e-03,
-1.0071e-03, 1.1902e-03],
[ 3.6240e-04, 7.7057e-04, 9.1934e-04, ..., -6.4468e-04,
1.4305e-04, 7.4768e-04],
[ 1.2131e-03, 1.2970e-03, 2.9182e-04, ..., -9.4986e-04,
-1.3657e-03, 1.0605e-03]], dtype=torch.bfloat16)),
('transformer.layers.18.attention.query_key_value.lora_A',
tensor([[ 0.0110, -0.0063, -0.0134, ..., 0.0020, 0.0039, -0.0136],
[-0.0061, -0.0086, 0.0002, ..., 0.0004, 0.0089, -0.0022],
[ 0.0016, 0.0092, -0.0067, ..., -0.0054, -0.0138, -0.0084],
...,
[ 0.0019, 0.0098, 0.0030, ..., -0.0153, 0.0004, -0.0072],
[ 0.0152, -0.0099, 0.0004, ..., -0.0154, 0.0046, 0.0060],
[ 0.0116, -0.0078, -0.0028, ..., 0.0146, -0.0089, 0.0102]],
dtype=torch.bfloat16)),
('transformer.layers.18.attention.query_key_value.lora_B',
tensor([[-2.6584e-05, -1.3828e-04, 3.5286e-04, ..., -1.7166e-04,
-6.5804e-05, -3.3951e-04],
[-3.9864e-04, -6.9046e-04, 1.0834e-03, ..., 2.1577e-05,
6.7139e-04, -1.6975e-04],
[ 1.3580e-03, 1.1139e-03, -8.0109e-04, ..., -9.5367e-04,
-1.3733e-03, 9.6893e-04],
...,
[ 3.7575e-04, -7.4768e-04, 5.4550e-04, ..., 7.8964e-04,
2.0218e-04, 2.1935e-04],
[ 3.9101e-05, 4.8065e-04, 2.4414e-04, ..., -5.5313e-04,
-5.4169e-04, 1.1396e-04],
[-8.7738e-04, 4.2152e-04, 3.2997e-04, ..., -4.6158e-04,
-9.6321e-05, -1.5488e-03]], dtype=torch.bfloat16)),
('transformer.layers.19.attention.query_key_value.lora_A',
tensor([[ 0.0101, 0.0084, 0.0111, ..., 0.0066, 0.0100, 0.0089],
[-0.0007, 0.0071, 0.0153, ..., 0.0063, 0.0135, -0.0047],
[ 0.0048, -0.0098, 0.0060, ..., 0.0004, -0.0050, 0.0111],
...,
[ 0.0152, -0.0114, -0.0085, ..., 0.0137, 0.0144, -0.0024],
[-0.0070, -0.0111, 0.0111, ..., -0.0078, 0.0150, 0.0059],
[ 0.0106, -0.0131, -0.0105, ..., -0.0052, 0.0009, 0.0140]],
dtype=torch.bfloat16)),
('transformer.layers.19.attention.query_key_value.lora_B',
tensor([[-9.2316e-04, -8.5831e-04, -6.2561e-04, ..., 3.4332e-04,
6.3324e-04, -3.3379e-04],
[ 1.0071e-03, 7.0953e-04, -1.1587e-04, ..., -9.0408e-04,
-7.0953e-04, 9.9182e-04],
[ 1.6880e-04, 5.4240e-06, 8.9645e-04, ..., -4.8876e-05,
-3.6955e-05, -2.8491e-05],
...,
[-3.2997e-04, -3.1853e-04, 5.0354e-04, ..., 2.3079e-04,
1.3733e-04, 8.9645e-05],
[-2.2411e-04, -7.5340e-05, 1.2493e-04, ..., 2.8610e-05,
3.2425e-04, 6.5231e-04],
[ 1.0431e-05, 2.2030e-04, -2.0885e-04, ..., -6.2561e-04,
-3.6430e-04, 1.0681e-03]], dtype=torch.bfloat16)),
('transformer.layers.20.attention.query_key_value.lora_A',
tensor([[-1.2756e-02, 9.3994e-03, 1.4648e-02, ..., -6.0272e-04,
1.5503e-02, -1.4832e-02],
[-1.2329e-02, -3.3112e-03, -4.6253e-05, ..., 1.2878e-02,
-1.4038e-02, 1.3916e-02],
[-1.6479e-03, -5.6763e-03, -1.0559e-02, ..., -6.0730e-03,
-1.4709e-02, 8.7891e-03],
...,
[ 6.0425e-03, 1.3245e-02, -3.5858e-03, ..., -8.5068e-04,
1.3977e-02, 6.0730e-03],
[ 1.2024e-02, -6.1035e-03, 1.1475e-02, ..., -6.4392e-03,
7.9346e-03, 1.1292e-02],
[-4.5166e-03, 1.2024e-02, -2.4414e-03, ..., 1.4954e-02,
-1.4343e-02, -3.3760e-04]], dtype=torch.bfloat16)),
('transformer.layers.20.attention.query_key_value.lora_B',
tensor([[-7.2861e-04, -3.8147e-04, 3.8910e-04, ..., -3.5667e-04,
4.7112e-04, -3.6049e-04],
[ 1.0452e-03, 3.3760e-04, 1.6975e-04, ..., 6.0797e-05,
-4.2725e-04, -1.5831e-04],
[-7.2861e-04, -6.8283e-04, 4.1771e-04, ..., -5.0735e-04,
6.9046e-04, -3.5858e-04],
...,
[-2.5368e-04, 7.7057e-04, -7.2861e-04, ..., 5.0735e-04,
-2.9755e-04, 1.0834e-03],
[-1.6327e-03, -1.4954e-03, 1.5106e-03, ..., -7.3624e-04,
1.3657e-03, -6.8665e-04],
[-5.0068e-06, -5.3406e-04, 1.2665e-03, ..., 2.2030e-04,
-6.8283e-04, -4.9210e-04]], dtype=torch.bfloat16)),
('transformer.layers.21.attention.query_key_value.lora_A',
tensor([[-0.0049, 0.0082, -0.0139, ..., 0.0145, 0.0002, 0.0103],
[ 0.0045, 0.0132, -0.0051, ..., 0.0128, 0.0073, -0.0027],
[-0.0017, 0.0132, 0.0134, ..., 0.0017, 0.0002, 0.0154],
...,
[-0.0078, -0.0112, -0.0089, ..., 0.0035, 0.0156, 0.0077],
[ 0.0084, 0.0002, 0.0117, ..., 0.0140, -0.0007, 0.0143],
[ 0.0126, -0.0125, 0.0125, ..., -0.0087, -0.0129, -0.0030]],
dtype=torch.bfloat16)),
('transformer.layers.21.attention.query_key_value.lora_B',
tensor([[-1.2302e-04, -4.8256e-04, -1.1730e-04, ..., 5.3024e-04,
-3.0708e-04, -2.6512e-04],
[ 2.8801e-04, 2.3174e-04, 9.7275e-05, ..., 1.2302e-04,
-3.0100e-06, 2.1839e-04],
[ 5.0306e-05, -6.1417e-04, 2.4319e-04, ..., -3.8147e-04,
-3.0708e-04, -1.4019e-04],
...,
[-1.3199e-03, 1.7548e-04, -1.2817e-03, ..., 1.2970e-03,
-2.0504e-04, -1.1349e-04],
[ 6.8665e-05, 4.1008e-04, -1.8787e-04, ..., -4.6015e-05,
-3.8719e-04, 4.5013e-04],
[ 1.3351e-03, 4.6158e-04, 1.5259e-03, ..., -1.0910e-03,
-9.0790e-04, 1.0376e-03]], dtype=torch.bfloat16)),
('transformer.layers.22.attention.query_key_value.lora_A',
tensor([[ 0.0115, -0.0066, -0.0078, ..., -0.0079, -0.0078, -0.0034],
[ 0.0129, -0.0080, 0.0116, ..., 0.0079, -0.0079, 0.0101],
[-0.0078, 0.0070, -0.0103, ..., 0.0024, -0.0109, 0.0087],
...,
[-0.0131, -0.0045, -0.0035, ..., 0.0078, 0.0053, -0.0082],
[ 0.0150, -0.0156, -0.0156, ..., 0.0077, -0.0129, 0.0120],
[ 0.0033, -0.0027, 0.0026, ..., -0.0049, 0.0008, -0.0006]],
dtype=torch.bfloat16)),
('transformer.layers.22.attention.query_key_value.lora_B',
tensor([[ 3.6049e-04, 3.0899e-04, -3.0708e-04, ..., -1.5855e-05,
-1.5163e-04, -2.3651e-04],
[ 1.6594e-04, -1.3161e-04, 2.8610e-04, ..., -4.7684e-04,
9.9659e-05, -2.9182e-04],
[ 1.3733e-03, 1.2589e-03, 1.4496e-03, ..., 9.6893e-04,
-1.5640e-03, 1.3504e-03],
...,
[ 2.0905e-03, 2.0752e-03, 1.5335e-03, ..., 2.1362e-03,
-1.9684e-03, 2.1973e-03],
[ 8.6594e-04, 9.3460e-04, -1.3504e-03, ..., 7.9346e-04,
-8.2016e-04, 8.1253e-04],
[ 1.1597e-03, 8.8120e-04, 3.8910e-04, ..., 1.2894e-03,
-1.1749e-03, 1.0223e-03]], dtype=torch.bfloat16)),
('transformer.layers.23.attention.query_key_value.lora_A',
tensor([[ 0.0047, 0.0006, 0.0095, ..., -0.0130, -0.0152, 0.0071],
[-0.0021, -0.0138, 0.0151, ..., -0.0014, 0.0108, -0.0084],
[-0.0016, 0.0079, -0.0031, ..., 0.0044, 0.0014, 0.0142],
...,
[-0.0092, 0.0031, 0.0140, ..., 0.0060, 0.0098, -0.0103],
[ 0.0094, 0.0027, -0.0132, ..., 0.0057, -0.0054, 0.0129],
[ 0.0075, 0.0087, -0.0112, ..., 0.0003, -0.0083, 0.0118]],
dtype=torch.bfloat16)),
('transformer.layers.23.attention.query_key_value.lora_B',
tensor([[-6.5327e-05, 3.4142e-04, -5.2643e-04, ..., -1.7357e-04,
-1.7643e-04, -3.1662e-04],
[ 1.8311e-03, -1.5945e-03, 1.8311e-03, ..., 1.6556e-03,
1.1749e-03, 1.5869e-03],
[-7.9346e-04, 6.1035e-05, -8.3542e-04, ..., -3.3951e-04,
3.1090e-04, -8.2016e-04],
...,
[ 7.6294e-04, -9.4604e-04, -6.0499e-06, ..., 1.1444e-03,
3.3569e-04, -1.6556e-03],
[ 9.2697e-04, 2.9945e-04, 1.0376e-03, ..., -7.7438e-04,
1.1368e-03, 2.1210e-03],
[ 1.5974e-05, -8.8215e-06, -3.8719e-04, ..., -2.9564e-04,
-5.2643e-04, -5.5075e-05]], dtype=torch.bfloat16)),
('transformer.layers.24.attention.query_key_value.lora_A',
tensor([[-0.0080, -0.0103, -0.0128, ..., -0.0078, -0.0095, -0.0156],
[-0.0120, 0.0014, -0.0010, ..., 0.0032, 0.0050, 0.0021],
[ 0.0026, 0.0099, 0.0080, ..., 0.0040, -0.0139, 0.0037],
...,
[-0.0015, 0.0131, 0.0090, ..., -0.0044, -0.0121, 0.0035],
[ 0.0034, -0.0026, 0.0058, ..., -0.0109, -0.0052, -0.0084],
[-0.0079, 0.0082, -0.0089, ..., 0.0123, 0.0116, -0.0042]],
dtype=torch.bfloat16)),
('transformer.layers.24.attention.query_key_value.lora_B',
tensor([[-7.1049e-05, -6.2561e-04, 7.0572e-05, ..., -1.9836e-04,
4.8256e-04, -7.3624e-04],
[ 7.5817e-05, -4.1389e-04, -4.3869e-04, ..., -6.7902e-04,
5.4550e-04, -2.3746e-04],
[ 3.7193e-05, 4.7874e-04, 1.0300e-03, ..., 5.9509e-04,
-6.2943e-04, -1.1826e-03],
...,
[-2.6703e-03, -1.6327e-03, -1.4801e-03, ..., -1.7319e-03,
1.5793e-03, 2.1057e-03],
[-8.7357e-04, 1.4114e-03, 1.4801e-03, ..., 1.2665e-03,
-1.8692e-03, -1.9073e-03],
[-2.3346e-03, -1.9684e-03, -1.8692e-03, ..., -1.8768e-03,
1.8311e-03, 2.0447e-03]], dtype=torch.bfloat16)),
('transformer.layers.25.attention.query_key_value.lora_A',
tensor([[-0.0048, -0.0005, 0.0001, ..., -0.0117, 0.0122, -0.0090],
[ 0.0112, 0.0035, 0.0148, ..., 0.0095, 0.0110, 0.0037],
[-0.0085, 0.0007, 0.0074, ..., -0.0076, 0.0078, 0.0050],
...,
[-0.0038, 0.0011, 0.0150, ..., -0.0026, 0.0016, -0.0156],
[ 0.0013, 0.0148, -0.0107, ..., -0.0078, 0.0140, 0.0007],
[ 0.0027, 0.0121, -0.0115, ..., -0.0095, -0.0092, -0.0043]],
dtype=torch.bfloat16)),
('transformer.layers.25.attention.query_key_value.lora_B',
tensor([[ 5.8365e-04, 9.7656e-04, -2.5749e-04, ..., 4.3488e-04,
-8.8215e-05, 2.5177e-04],
[ 1.8539e-03, 1.3351e-03, -1.6708e-03, ..., 1.7776e-03,
1.8311e-03, 2.0142e-03],
[ 2.4109e-03, 2.3956e-03, -2.3956e-03, ..., 2.5787e-03,
2.4109e-03, 2.6550e-03],
...,
[ 6.9046e-04, -3.2997e-04, 7.5531e-04, ..., 7.6675e-04,
6.9809e-04, 6.7139e-04],
[ 1.3046e-03, 6.1035e-04, 1.3351e-03, ..., 1.3504e-03,
1.3275e-03, 9.8419e-04],
[-8.0109e-04, -2.1362e-03, 1.8921e-03, ..., -7.5817e-05,
-4.1199e-04, -2.0294e-03]], dtype=torch.bfloat16)),
('transformer.layers.26.attention.query_key_value.lora_A',
tensor([[-0.0004, 0.0095, 0.0145, ..., 0.0070, -0.0149, 0.0131],
[ 0.0023, 0.0054, -0.0156, ..., -0.0018, 0.0002, -0.0030],
[ 0.0116, -0.0014, 0.0027, ..., 0.0107, 0.0114, 0.0145],
...,
[ 0.0023, -0.0156, 0.0094, ..., 0.0147, -0.0014, 0.0056],
[ 0.0087, -0.0027, 0.0156, ..., 0.0094, 0.0022, -0.0072],
[-0.0023, 0.0080, 0.0020, ..., -0.0019, -0.0052, -0.0106]],
dtype=torch.bfloat16)),
('transformer.layers.26.attention.query_key_value.lora_B',
tensor([[-0.0008, -0.0021, 0.0011, ..., 0.0014, 0.0018, 0.0011],
[-0.0012, -0.0015, 0.0015, ..., 0.0012, 0.0009, 0.0014],
[ 0.0009, 0.0014, -0.0011, ..., -0.0010, -0.0016, -0.0011],
...,
[ 0.0010, 0.0026, 0.0014, ..., -0.0029, -0.0028, 0.0006],
[-0.0003, 0.0013, 0.0002, ..., -0.0009, -0.0010, 0.0002],
[-0.0025, -0.0016, 0.0024, ..., 0.0007, -0.0004, 0.0028]],
dtype=torch.bfloat16)),
('transformer.layers.27.attention.query_key_value.lora_A',
tensor([[ 0.0094, -0.0043, -0.0104, ..., -0.0098, 0.0050, 0.0079],
[ 0.0079, -0.0032, -0.0089, ..., 0.0106, 0.0081, 0.0093],
[ 0.0074, -0.0129, 0.0156, ..., -0.0090, -0.0151, 0.0056],
...,
[-0.0103, -0.0149, 0.0156, ..., -0.0110, 0.0004, 0.0029],
[ 0.0045, 0.0028, 0.0110, ..., 0.0033, 0.0052, -0.0103],
[-0.0146, -0.0019, -0.0031, ..., -0.0052, -0.0129, 0.0065]],
dtype=torch.bfloat16)),
('transformer.layers.27.attention.query_key_value.lora_B',
tensor([[-5.1117e-04, 1.0223e-03, -1.5926e-04, ..., 8.4686e-04,
-1.4496e-03, 7.0190e-04],
[ 4.9353e-05, 3.8338e-04, 2.1744e-04, ..., 1.3123e-03,
-1.2894e-03, 1.2398e-04],
[-7.9727e-04, 9.0027e-04, -1.8082e-03, ..., -1.8082e-03,
3.7956e-04, -1.3962e-03],
...,
[ 3.4943e-03, -4.9133e-03, 5.3406e-03, ..., 4.4861e-03,
2.1057e-03, 5.4321e-03],
[ 1.2054e-03, -1.1139e-03, 1.6327e-03, ..., 1.9836e-03,
-9.5367e-04, 3.2501e-03],
[ 1.3885e-03, -7.4387e-04, 1.4572e-03, ..., 2.0142e-03,
-9.4986e-04, 3.2806e-03]], dtype=torch.bfloat16))])
The ChatGLM-6B model:
SwiftModel(
(model): ChatGLMForConditionalGeneration(
(transformer): ChatGLMModel(
(word_embeddings): Embedding(130528, 4096)
(layers): ModuleList(
(0-27): 28 x GLMBlock(
(input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
(attention): SelfAttention(
(rotary_emb): RotaryEmbedding()
(query_key_value): Linear(
in_features=4096, out_features=12288, bias=True
(loramodule_default): Linear(
in_features=4096, out_features=12288, bias=True
(lora_dropout): Dropout(p=0.05, inplace=False)
)
)
(dense): Linear(in_features=4096, out_features=4096, bias=True)
)
(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
(mlp): GLU(
(dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
(dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
)
)
)
(final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=4096, out_features=130528, bias=False)
)
)
Judging from the description, the problem is likely that some layer names in the state_dict do not match those in the Swift model. Please verify that the model you are using corresponds exactly to your FlexTrain version, and that you are loading the correct state_dict file.
For example, check whether the keys in the state_dict have the form transformer.layers.<layer>.attention.query_key_value.lora_A / lora_B, while the model itself expects transformer.layers.<layer>.attention.query_key_value.loramodule_default.lora_A / lora_B.
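To confirm the mismatch before loading, here is a minimal, self-contained sketch (pure Python, with the key strings copied from the dumps above) that diffs the two key sets the same way load_state_dict does internally:

```python
# Checkpoint keys as saved by FlexTrain vs. the keys the SwiftModel expects.
ckpt_keys = {
    "transformer.layers.0.attention.query_key_value.lora_A",
    "transformer.layers.0.attention.query_key_value.lora_B",
}
model_keys = {
    "transformer.layers.0.attention.query_key_value.loramodule_default.lora_A",
    "transformer.layers.0.attention.query_key_value.loramodule_default.lora_B",
}

unexpected = sorted(ckpt_keys - model_keys)  # in the checkpoint, not in the model
missing = sorted(model_keys - ckpt_keys)     # in the model, not in the checkpoint
print(len(unexpected), len(missing))         # 2 2 -> nothing matches at all
```

In the real case you would build the sets from `state_dict.keys()` and `model.state_dict().keys()`; since every LoRA key differs only by the missing loramodule_default segment, the intersection is empty and everything is reported as unexpected.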
The "load_state_dict unexpected_keys" error means the state dict you are trying to load contains keys for layers that do not exist in the current model; here, those are the LoRA layers saved during fine-tuning.
To work around this, you can pass the strict=False argument to torch.nn.Module.load_state_dict(); this tells it to ignore keys that have no counterpart in the current model, for example: model.load_state_dict(state_dict, strict=False).
The "load_state_dict unexpected_keys" error usually appears because layer or module names changed between saving and loading, so some entries no longer match.
In that case you can use the strict parameter of load_state_dict. With strict=False, entries whose names do not match the model are ignored and only the matching ones are loaded; the model keeps its existing layers and modules instead of raising an error. Bear in mind that ignored entries are simply skipped, so with the naming mismatch above the LoRA weights would still not be applied.
Example code:
model.load_state_dict(state_dict, strict=False)
The "ZhipuAI/ChatGLM-6B" model you mention may be a fine-tuned derivative of the GLM-130B model trained jointly by Tsinghua University's KEG lab and Zhipu AI. Because of differences in model structure, the state dict may need some adjustments before it can be loaded correctly into the fine-tuned model.
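If the two naming schemes really do line up one-to-one, as the dumps above suggest, a pragmatic adjustment is to rewrite the checkpoint keys before loading. This is only a sketch under that assumption, and remap_lora_keys is a hypothetical helper name, not part of any library:

```python
def remap_lora_keys(state_dict):
    """Insert the missing 'loramodule_default' segment before lora_A/lora_B.

    Turns '...query_key_value.lora_A' into
    '...query_key_value.loramodule_default.lora_A', leaving other keys alone.
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.endswith(".lora_A") or key.endswith(".lora_B"):
            prefix, leaf = key.rsplit(".", 1)  # split off "lora_A" / "lora_B"
            key = f"{prefix}.loramodule_default.{leaf}"
        remapped[key] = value
    return remapped

# Then load the rewritten dict; strict=False keeps any remaining
# non-LoRA mismatches from raising:
# model.load_state_dict(remap_lora_keys(state_dict), strict=False)
```

Before relying on this, print a few rewritten keys and compare them against model.state_dict().keys() to confirm they now match exactly.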