flextrain fine-tuned model example code reports: load_state_dict unexpected_keys [Alibaba Cloud]

After successfully training a flextrain fine-tuned model on ZhipuAI/ChatGLM-6B, I followed the example code and used snapshot_download to pull down the state dict. The layer names in it look like:
transformer.layers.[layer_idx].attention.query_key_value.lora_A
transformer.layers.[layer_idx].attention.query_key_value.lora_B

Then, when I call model.load_state_dict(state_dict), every one of the keys above ends up in unexpected_keys.

Looking at the layer names in the swift model, it only contains:
transformer.layers.[layer_idx].attention.query_key_value.loramodule_default.lora_A
transformer.layers.[layer_idx].attention.query_key_value.loramodule_default.lora_B

I don't understand why the state-dict keys produced by FlexTrain differ from the swift model's layer names.

Any pointers would be much appreciated!
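In the meantime, I've been considering remapping the keys myself before loading. A minimal sketch, assuming the extra `loramodule_default.` segment shown above is the only difference between the two naming schemes (the helper name `remap_lora_keys` is just mine, not from the example code):

```python
from collections import OrderedDict

def remap_lora_keys(state_dict):
    """Insert the `loramodule_default.` segment that the swift model
    expects between `query_key_value` and `lora_A`/`lora_B`."""
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if key.endswith(("lora_A", "lora_B")):
            # split "....query_key_value.lora_A" at the last dot
            prefix, _, leaf = key.rpartition(".")
            key = f"{prefix}.loramodule_default.{leaf}"
        remapped[key] = value
    return remapped
```

and then load with `model.load_state_dict(remap_lora_keys(state_dict), strict=False)` — strict=False because the downloaded dict only contains the LoRA weights, not the full model. I haven't confirmed this is the intended fix, so corrections welcome.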

State dict produced by FlexTrain:

OrderedDict([('transformer.layers.0.attention.query_key_value.lora_A',
              tensor([[ 0.0092, -0.0156, -0.0047,  ..., -0.0121,  0.0107,  0.0067],
                      [ 0.0104, -0.0084, -0.0010,  ...,  0.0071,  0.0146,  0.0155],
                      [-0.0013, -0.0006, -0.0031,  ..., -0.0021,  0.0052,  0.0039],
                      ...,
                      [ 0.0041,  0.0074,  0.0130,  ..., -0.0078,  0.0101, -0.0084],
                      [ 0.0047,  0.0113, -0.0137,  ...,  0.0008, -0.0052, -0.0006],
                      [ 0.0027,  0.0006,  0.0005,  ..., -0.0086, -0.0156, -0.0050]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.0.attention.query_key_value.lora_B',
              tensor([[ 1.5640e-04,  7.3624e-04, -6.0654e-04,  ...,  3.2043e-04,
                        1.4722e-05, -8.9169e-05],
                      [ 3.1281e-04,  5.0306e-05,  6.7711e-05,  ...,  6.2943e-05,
                        2.6703e-04,  1.7643e-04],
                      [ 1.5163e-04, -1.7738e-04,  1.5831e-04,  ...,  1.8692e-04,
                       -9.1076e-05,  6.0654e-04],
                      ...,
                      [ 3.8910e-04, -5.9891e-04, -6.7902e-04,  ..., -3.2187e-05,
                        1.1215e-03, -8.9169e-05],
                      [-8.7357e-04, -8.4305e-04, -1.0300e-03,  ...,  8.8120e-04,
                       -1.0605e-03,  1.5335e-03],
                      [ 1.6479e-03,  4.8256e-04, -9.7275e-04,  ..., -3.4904e-04,
                       -8.6594e-04,  1.8616e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.1.attention.query_key_value.lora_A',
              tensor([[-0.0060,  0.0018, -0.0060,  ..., -0.0096, -0.0156, -0.0123],
                      [ 0.0013,  0.0107, -0.0028,  ...,  0.0117,  0.0077,  0.0033],
                      [ 0.0131,  0.0135,  0.0126,  ..., -0.0156,  0.0048, -0.0113],
                      ...,
                      [-0.0084, -0.0086, -0.0120,  ..., -0.0117,  0.0142, -0.0137],
                      [-0.0078,  0.0025,  0.0005,  ...,  0.0056, -0.0059,  0.0016],
                      [-0.0140, -0.0038, -0.0036,  ...,  0.0034,  0.0011, -0.0085]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.1.attention.query_key_value.lora_B',
              tensor([[ 1.3733e-04, -3.5858e-04,  8.4305e-04,  ..., -6.2943e-04,
                        9.1171e-04,  1.0529e-03],
                      [-1.0986e-03, -7.2861e-04, -1.8539e-03,  ...,  7.1335e-04,
                        1.9836e-04, -1.7853e-03],
                      [-4.5013e-04, -2.0599e-04,  1.4496e-04,  ...,  6.2585e-06,
                        4.6790e-06, -6.4087e-04],
                      ...,
                      [-6.7139e-04, -1.0986e-03, -9.3079e-04,  ...,  1.0147e-03,
                       -1.3351e-03, -1.0910e-03],
                      [ 1.2665e-03,  1.6251e-03,  1.5564e-03,  ..., -7.4387e-04,
                        1.4191e-03,  1.5488e-03],
                      [-7.1335e-04, -5.4169e-04, -5.1498e-04,  ...,  1.1139e-03,
                       -1.4267e-03, -2.5940e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.2.attention.query_key_value.lora_A',
              tensor([[ 0.0055,  0.0134,  0.0111,  ..., -0.0116,  0.0038, -0.0154],
                      [ 0.0085,  0.0151,  0.0085,  ...,  0.0070, -0.0028,  0.0041],
                      [ 0.0153,  0.0117,  0.0115,  ...,  0.0045,  0.0131, -0.0135],
                      ...,
                      [-0.0135,  0.0061, -0.0079,  ...,  0.0051, -0.0134,  0.0042],
                      [ 0.0052, -0.0025, -0.0104,  ..., -0.0006,  0.0084, -0.0112],
                      [-0.0083,  0.0001, -0.0029,  ..., -0.0125,  0.0049,  0.0138]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.2.attention.query_key_value.lora_B',
              tensor([[-1.4038e-03, -7.3242e-04, -1.5259e-04,  ...,  1.2817e-03,
                        8.3923e-04, -1.0605e-03],
                      [-1.7853e-03, -6.5613e-04, -5.3024e-04,  ...,  6.0654e-04,
                        1.5030e-03, -7.9155e-05],
                      [-1.6117e-04,  6.7902e-04,  8.5831e-04,  ..., -8.5068e-04,
                       -2.8038e-04,  4.8256e-04],
                      ...,
                      [-5.6744e-05,  5.4932e-04,  2.2507e-04,  ..., -3.9864e-04,
                        1.0395e-04, -4.4346e-05],
                      [-4.3297e-04,  5.1117e-04,  3.8910e-04,  ..., -3.9339e-05,
                        1.6975e-04, -1.4114e-03],
                      [-5.3048e-06,  1.0681e-03,  7.0572e-04,  ..., -5.0354e-04,
                       -7.1335e-04, -6.8665e-05]], dtype=torch.bfloat16)),
             ('transformer.layers.3.attention.query_key_value.lora_A',
              tensor([[-0.0084, -0.0140,  0.0123,  ...,  0.0004, -0.0038,  0.0125],
                      [ 0.0063,  0.0002,  0.0019,  ..., -0.0028,  0.0078,  0.0131],
                      [ 0.0033, -0.0151,  0.0019,  ...,  0.0065,  0.0092,  0.0042],
                      ...,
                      [ 0.0123, -0.0045,  0.0078,  ...,  0.0078,  0.0066, -0.0042],
                      [-0.0092,  0.0098,  0.0095,  ...,  0.0140, -0.0049,  0.0137],
                      [-0.0082, -0.0003, -0.0142,  ..., -0.0089,  0.0156,  0.0022]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.3.attention.query_key_value.lora_B',
              tensor([[-2.0504e-04, -5.4932e-04, -2.5368e-04,  ..., -7.2861e-04,
                        5.3787e-04,  4.2915e-05],
                      [-1.5354e-04, -1.1444e-03, -5.5075e-05,  ...,  3.3951e-04,
                       -5.3406e-04, -9.1171e-04],
                      [-5.5313e-04,  2.3460e-04, -3.9864e-04,  ..., -6.2180e-04,
                       -2.5868e-05,  2.8610e-04],
                      ...,
                      [-4.0817e-04,  1.2970e-03, -6.4468e-04,  ..., -5.1117e-04,
                        8.2397e-04,  1.0681e-03],
                      [-5.4932e-04,  3.9482e-04, -5.0354e-04,  ..., -5.9891e-04,
                        5.4169e-04,  4.0054e-04],
                      [ 1.9684e-03, -2.6245e-03,  2.4719e-03,  ...,  2.0599e-03,
                       -2.0752e-03, -2.7466e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.4.attention.query_key_value.lora_A',
              tensor([[ 6.8359e-03,  1.6556e-03,  1.1841e-02,  ..., -6.2943e-04,
                        1.1139e-03,  9.7656e-03],
                      [-1.6708e-03,  1.1841e-02, -6.5002e-03,  ..., -1.1658e-02,
                        6.8970e-03,  1.3367e-02],
                      [-1.2634e-02,  3.9101e-05, -7.8125e-03,  ..., -1.3794e-02,
                       -6.4087e-04, -9.9487e-03],
                      ...,
                      [-4.7913e-03, -3.3722e-03, -5.2795e-03,  ...,  8.1177e-03,
                        1.1169e-02,  2.4261e-03],
                      [-7.5073e-03, -5.7373e-03,  9.7656e-03,  ...,  9.0332e-03,
                       -1.5259e-02, -8.1787e-03],
                      [-2.3460e-04, -3.4637e-03, -3.3569e-03,  ..., -1.1292e-02,
                        1.5625e-02, -1.4709e-02]], dtype=torch.bfloat16)),
             ('transformer.layers.4.attention.query_key_value.lora_B',
              tensor([[ 4.5586e-04,  2.5368e-04,  5.9509e-04,  ..., -8.6975e-04,
                       -2.3270e-04,  2.3174e-04],
                      [ 5.8365e-04,  4.8828e-04,  5.4169e-04,  ..., -5.6458e-04,
                        1.7881e-05,  1.0986e-03],
                      [ 4.1962e-05,  2.8229e-04, -2.0027e-04,  ...,  6.6376e-04,
                       -9.3937e-05, -2.0218e-04],
                      ...,
                      [ 1.4954e-03,  1.0376e-03,  1.9836e-03,  ..., -1.2207e-03,
                       -9.9945e-04,  1.7014e-03],
                      [ 1.0071e-03,  9.7275e-04,  1.1902e-03,  ..., -1.5411e-03,
                       -9.2316e-04,  2.4719e-03],
                      [ 1.4648e-03,  1.1139e-03,  1.4801e-03,  ..., -1.4343e-03,
                       -9.3079e-04,  1.2512e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.5.attention.query_key_value.lora_A',
              tensor([[ 0.0124,  0.0027, -0.0061,  ..., -0.0030,  0.0148,  0.0051],
                      [ 0.0115,  0.0142, -0.0137,  ...,  0.0074, -0.0112,  0.0127],
                      [ 0.0079,  0.0091, -0.0126,  ..., -0.0050,  0.0067, -0.0140],
                      ...,
                      [ 0.0019,  0.0114,  0.0125,  ...,  0.0137,  0.0016, -0.0034],
                      [-0.0124,  0.0002,  0.0078,  ..., -0.0105, -0.0108, -0.0082],
                      [ 0.0095,  0.0156,  0.0038,  ..., -0.0008, -0.0106, -0.0122]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.5.attention.query_key_value.lora_B',
              tensor([[-1.3351e-03,  2.9564e-04, -1.2741e-03,  ...,  8.9645e-04,
                        6.8283e-04, -1.3351e-03],
                      [ 1.1921e-04,  2.6321e-04, -3.1281e-04,  ...,  1.8692e-04,
                        2.0885e-04,  6.7520e-04],
                      [-4.3297e-04,  9.6130e-04, -5.0354e-04,  ...,  7.5912e-04,
                        1.6880e-04,  1.5163e-04],
                      ...,
                      [ 7.1716e-04, -3.3569e-04,  5.4932e-04,  ..., -4.5598e-06,
                       -2.9755e-04,  1.1597e-03],
                      [-1.4725e-03,  1.4267e-03, -1.7090e-03,  ...,  1.3580e-03,
                        1.7090e-03, -1.4038e-03],
                      [ 1.8539e-03, -2.0142e-03,  2.4261e-03,  ..., -1.8387e-03,
                       -1.9302e-03,  2.0294e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.6.attention.query_key_value.lora_A',
              tensor([[-8.0566e-03,  9.6436e-03,  1.4877e-03,  ...,  6.2866e-03,
                        1.1963e-02,  4.9114e-05],
                      [ 2.9907e-03, -5.1498e-04, -1.5259e-03,  ...,  1.3489e-02,
                        8.1787e-03, -9.6436e-03],
                      [ 1.5320e-02, -1.7242e-03, -7.8125e-03,  ..., -1.4832e-02,
                        7.1411e-03,  2.0752e-03],
                      ...,
                      [ 1.2573e-02,  7.2327e-03,  1.3916e-02,  ..., -1.5625e-02,
                        1.0315e-02,  1.0315e-02],
                      [ 9.0942e-03,  1.4526e-02, -6.7139e-03,  ...,  8.0566e-03,
                       -8.2016e-05, -4.5776e-03],
                      [ 1.3306e-02, -1.3428e-02, -1.6861e-03,  ..., -7.0801e-03,
                        1.0010e-02,  1.5137e-02]], dtype=torch.bfloat16)),
             ('transformer.layers.6.attention.query_key_value.lora_B',
              tensor([[ 4.6539e-04, -7.6294e-04,  2.0599e-04,  ...,  4.8828e-04,
                       -5.1880e-04,  9.9659e-05],
                      [-5.9509e-04, -6.3324e-04, -7.6675e-04,  ..., -1.2493e-04,
                       -2.0504e-04,  5.0068e-05],
                      [ 3.6621e-04,  6.1035e-05,  3.2616e-04,  ...,  1.3924e-04,
                       -9.5367e-06,  2.7847e-04],
                      ...,
                      [-2.4109e-03, -1.8845e-03, -1.8005e-03,  ..., -1.8082e-03,
                       -1.8387e-03, -1.7776e-03],
                      [-7.7438e-04, -3.1853e-04, -7.2861e-04,  ..., -7.6294e-04,
                       -4.1580e-04, -7.3242e-04],
                      [-2.3041e-03, -1.8921e-03, -1.5488e-03,  ..., -1.7853e-03,
                       -1.6022e-03, -1.8158e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.7.attention.query_key_value.lora_A',
              tensor([[-0.0120, -0.0084,  0.0154,  ..., -0.0010, -0.0074, -0.0088],
                      [-0.0050,  0.0088, -0.0112,  ...,  0.0009,  0.0071, -0.0134],
                      [ 0.0041,  0.0079, -0.0110,  ...,  0.0106, -0.0025, -0.0052],
                      ...,
                      [ 0.0054, -0.0008,  0.0140,  ..., -0.0049, -0.0021, -0.0132],
                      [ 0.0006, -0.0012,  0.0030,  ..., -0.0047, -0.0054, -0.0036],
                      [-0.0061, -0.0014,  0.0089,  ...,  0.0049, -0.0146, -0.0108]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.7.attention.query_key_value.lora_B',
              tensor([[ 1.0071e-03,  1.9455e-04, -1.7548e-04,  ..., -2.7466e-04,
                        1.7166e-04, -9.5367e-05],
                      [-6.1417e-04, -7.9727e-04,  7.6675e-04,  ...,  1.9646e-04,
                       -1.2360e-03,  6.2180e-04],
                      [-1.4305e-04,  9.1553e-04,  1.9550e-05,  ...,  8.3542e-04,
                       -3.4714e-04, -6.2561e-04],
                      ...,
                      [ 1.4420e-03, -1.6632e-03, -1.2283e-03,  ..., -1.6403e-03,
                        6.4468e-04,  1.4191e-03],
                      [-9.1934e-04,  1.9455e-04,  6.3324e-04,  ...,  1.3809e-03,
                       -1.7834e-04, -7.7820e-04],
                      [ 1.5488e-03, -1.0910e-03, -9.6130e-04,  ..., -1.6403e-03,
                        9.2697e-04,  9.8419e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.8.attention.query_key_value.lora_A',
              tensor([[-0.0038, -0.0039,  0.0078,  ..., -0.0093, -0.0099, -0.0001],
                      [ 0.0028,  0.0123, -0.0042,  ...,  0.0156,  0.0042, -0.0104],
                      [ 0.0040,  0.0023,  0.0073,  ..., -0.0038, -0.0147, -0.0114],
                      ...,
                      [-0.0046, -0.0068, -0.0050,  ...,  0.0009,  0.0020,  0.0019],
                      [ 0.0128, -0.0089,  0.0096,  ...,  0.0078,  0.0032,  0.0068],
                      [ 0.0007,  0.0001,  0.0008,  ...,  0.0009, -0.0004,  0.0037]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.8.attention.query_key_value.lora_B',
              tensor([[ 1.4246e-05,  4.7493e-04,  2.8801e-04,  ..., -2.4605e-04,
                       -6.9046e-04, -5.0354e-04],
                      [-1.1978e-03,  1.9932e-04, -6.1798e-04,  ...,  4.5013e-04,
                       -5.9128e-04, -7.9155e-05],
                      [-1.9455e-04, -8.1539e-05,  1.5545e-04,  ..., -5.2261e-04,
                        7.4768e-04,  3.4714e-04],
                      ...,
                      [ 1.0910e-03,  7.2861e-04,  9.0408e-04,  ..., -1.2131e-03,
                        8.8882e-04, -6.3705e-04],
                      [-1.5945e-03, -1.7700e-03, -1.5793e-03,  ...,  1.6556e-03,
                       -1.6251e-03,  1.5488e-03],
                      [ 2.3956e-03,  2.7466e-03,  2.6550e-03,  ..., -1.7548e-03,
                        2.4414e-03, -2.2736e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.9.attention.query_key_value.lora_A',
              tensor([[ 0.0012,  0.0132, -0.0034,  ..., -0.0156,  0.0049, -0.0025],
                      [-0.0127,  0.0038,  0.0093,  ...,  0.0124,  0.0007, -0.0129],
                      [-0.0073,  0.0112,  0.0119,  ..., -0.0106,  0.0156, -0.0109],
                      ...,
                      [ 0.0015,  0.0094,  0.0103,  ...,  0.0031,  0.0129, -0.0048],
                      [ 0.0154,  0.0018, -0.0140,  ...,  0.0156, -0.0066, -0.0127],
                      [-0.0123,  0.0007, -0.0019,  ...,  0.0132,  0.0132, -0.0101]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.9.attention.query_key_value.lora_B',
              tensor([[ 2.0862e-05, -3.0279e-05, -1.3351e-04,  ...,  3.1090e-04,
                       -9.4891e-05, -7.2098e-04],
                      [-3.7193e-04, -1.4019e-04,  5.6839e-04,  ...,  1.3256e-04,
                        7.8964e-04, -4.1771e-04],
                      [-1.6327e-03, -4.8447e-04, -5.6839e-04,  ...,  9.1171e-04,
                        1.1597e-03, -4.4060e-04],
                      ...,
                      [ 2.8610e-04, -1.8311e-04,  3.0327e-04,  ..., -5.0735e-04,
                       -4.5395e-04,  3.1281e-04],
                      [-1.2283e-03,  1.6251e-03, -1.3542e-04,  ...,  1.1292e-03,
                        1.0681e-03, -1.0757e-03],
                      [-1.8403e-06,  3.7384e-04, -6.8283e-04,  ...,  4.4441e-04,
                        5.6839e-04, -8.0872e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.10.attention.query_key_value.lora_A',
              tensor([[ 4.5776e-03, -9.1553e-03, -2.4567e-03,  ...,  3.5553e-03,
                       -5.3406e-03, -8.8501e-03],
                      [-1.3184e-02,  8.9111e-03,  1.4893e-02,  ...,  4.3640e-03,
                        4.5776e-03, -7.7724e-05],
                      [-9.8267e-03,  1.6937e-03, -1.6098e-03,  ..., -3.0060e-03,
                       -7.5378e-03, -4.5471e-03],
                      ...,
                      [-1.3275e-03,  8.6670e-03, -6.0120e-03,  ..., -1.0193e-02,
                       -1.4343e-02,  1.3123e-02],
                      [ 4.2419e-03, -1.5259e-02,  6.6223e-03,  ...,  1.5793e-03,
                       -1.3062e-02, -7.6294e-03],
                      [-1.2146e-02, -6.4087e-03,  5.4626e-03,  ..., -9.6436e-03,
                        7.8125e-03, -8.8501e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.10.attention.query_key_value.lora_B',
              tensor([[-1.1902e-03, -1.5182e-03,  8.2779e-04,  ...,  1.6708e-03,
                       -9.9945e-04, -2.3193e-03],
                      [-1.4191e-03, -1.7242e-03,  1.6861e-03,  ...,  1.9455e-03,
                       -1.4191e-03, -1.6785e-03],
                      [-3.3760e-04,  3.7909e-05,  5.1260e-05,  ..., -9.6512e-04,
                        4.1389e-04,  4.8256e-04],
                      ...,
                      [ 1.1749e-03,  1.1826e-03, -1.4496e-03,  ..., -1.2436e-03,
                        1.5335e-03,  1.0071e-03],
                      [-3.3569e-04,  6.0272e-04, -1.4114e-03,  ...,  9.2506e-05,
                        5.4932e-04, -2.3365e-04],
                      [-1.0223e-03, -1.7776e-03,  1.8005e-03,  ...,  1.0910e-03,
                       -1.4343e-03, -1.0834e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.11.attention.query_key_value.lora_A',
              tensor([[-0.0117, -0.0148,  0.0129,  ..., -0.0087,  0.0009, -0.0032],
                      [ 0.0114,  0.0078,  0.0107,  ..., -0.0045, -0.0008,  0.0152],
                      [ 0.0087,  0.0001,  0.0039,  ...,  0.0005, -0.0008, -0.0069],
                      ...,
                      [ 0.0035, -0.0154,  0.0120,  ..., -0.0064,  0.0009,  0.0145],
                      [-0.0123, -0.0120, -0.0018,  ...,  0.0134, -0.0017,  0.0103],
                      [ 0.0078, -0.0006, -0.0021,  ...,  0.0052,  0.0056,  0.0021]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.11.attention.query_key_value.lora_B',
              tensor([[ 2.6822e-05, -1.4019e-04, -1.1539e-04,  ...,  1.6117e-04,
                        4.1199e-04,  2.8419e-04],
                      [-1.3809e-03, -6.8665e-04,  1.1292e-03,  ..., -1.2360e-03,
                       -9.1934e-04, -1.0452e-03],
                      [-5.8746e-04, -4.0627e-04,  8.0490e-04,  ..., -5.5790e-05,
                        1.8024e-04, -4.9973e-04],
                      ...,
                      [-7.7057e-04, -8.2016e-04,  8.5449e-04,  ..., -8.7357e-04,
                       -1.0071e-03, -8.7357e-04],
                      [-6.9427e-04, -5.9128e-04,  6.3324e-04,  ..., -8.1253e-04,
                       -7.8583e-04, -6.2561e-04],
                      [ 2.7084e-04,  4.8447e-04, -4.7302e-04,  ...,  2.1267e-04,
                        4.1962e-04,  6.2561e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.12.attention.query_key_value.lora_A',
              tensor([[ 0.0134,  0.0112,  0.0050,  ...,  0.0034,  0.0106,  0.0137],
                      [ 0.0013, -0.0124,  0.0073,  ..., -0.0058, -0.0041,  0.0069],
                      [ 0.0095, -0.0090,  0.0054,  ...,  0.0014, -0.0131, -0.0044],
                      ...,
                      [ 0.0052,  0.0050,  0.0008,  ...,  0.0118, -0.0074, -0.0021],
                      [ 0.0117, -0.0019,  0.0054,  ...,  0.0078, -0.0101,  0.0030],
                      [ 0.0142,  0.0072,  0.0146,  ...,  0.0124, -0.0058, -0.0093]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.12.attention.query_key_value.lora_B',
              tensor([[-9.1553e-04, -6.7139e-04,  5.4550e-04,  ...,  6.6376e-04,
                        1.1749e-03, -5.9891e-04],
                      [ 8.8501e-04,  1.0300e-03, -6.9809e-04,  ..., -1.0986e-03,
                       -8.5831e-04,  5.4550e-04],
                      [-4.2152e-04, -8.1062e-06,  5.0545e-05,  ..., -9.9659e-05,
                       -1.7929e-04, -1.7524e-05],
                      ...,
                      [ 7.9727e-04,  1.8311e-04, -2.7466e-04,  ..., -3.7575e-04,
                       -3.0708e-04,  3.3379e-04],
                      [-1.1902e-03, -1.5335e-03,  1.5182e-03,  ...,  1.3580e-03,
                        1.3046e-03, -1.3809e-03],
                      [ 4.6158e-04,  1.7643e-04,  2.1458e-04,  ...,  2.8229e-04,
                        3.2997e-04, -3.6430e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.13.attention.query_key_value.lora_A',
              tensor([[ 0.0065,  0.0063,  0.0046,  ...,  0.0105,  0.0103, -0.0099],
                      [ 0.0060, -0.0016, -0.0046,  ...,  0.0097, -0.0104,  0.0105],
                      [-0.0143, -0.0151,  0.0029,  ...,  0.0029, -0.0151,  0.0040],
                      ...,
                      [ 0.0071,  0.0136, -0.0039,  ...,  0.0103,  0.0150,  0.0140],
                      [ 0.0096,  0.0107,  0.0070,  ...,  0.0036, -0.0101,  0.0067],
                      [-0.0010, -0.0139,  0.0156,  ..., -0.0132, -0.0049, -0.0059]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.13.attention.query_key_value.lora_B',
              tensor([[ 7.9155e-05,  5.6076e-04,  9.1171e-04,  ..., -8.4686e-04,
                       -1.5831e-04,  2.7657e-04],
                      [-4.2725e-04, -5.6505e-05, -9.6512e-04,  ...,  4.5395e-04,
                        2.5368e-04, -1.2159e-04],
                      [-5.1498e-04, -2.7084e-04, -1.0347e-04,  ...,  8.5354e-05,
                       -2.0313e-04,  1.3351e-04],
                      ...,
                      [ 8.3160e-04, -1.1063e-03, -1.1597e-03,  ...,  5.2643e-04,
                        1.0376e-03, -1.0910e-03],
                      [ 1.1749e-03, -1.2207e-03, -1.1597e-03,  ...,  1.5945e-03,
                        1.2665e-03, -1.1902e-03],
                      [ 2.2430e-03, -1.7548e-03, -1.7929e-03,  ...,  2.2430e-03,
                        1.8234e-03, -1.7548e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.14.attention.query_key_value.lora_A',
              tensor([[-0.0105, -0.0117, -0.0014,  ..., -0.0092, -0.0117, -0.0059],
                      [ 0.0002, -0.0046,  0.0114,  ..., -0.0079,  0.0029,  0.0136],
                      [-0.0106,  0.0086, -0.0156,  ..., -0.0065, -0.0042, -0.0078],
                      ...,
                      [-0.0013,  0.0084, -0.0063,  ...,  0.0007,  0.0081,  0.0013],
                      [ 0.0130,  0.0018, -0.0079,  ..., -0.0095, -0.0107,  0.0087],
                      [ 0.0103,  0.0031, -0.0028,  ...,  0.0038, -0.0096,  0.0031]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.14.attention.query_key_value.lora_B',
              tensor([[-3.2425e-04, -5.3048e-06,  2.1696e-05,  ..., -4.8256e-04,
                       -2.9564e-04, -7.9346e-04],
                      [ 1.1444e-03, -5.8746e-04,  4.5586e-04,  ..., -5.8365e-04,
                       -1.1253e-04, -5.9509e-04],
                      [ 3.5524e-05,  4.5204e-04, -1.2159e-04,  ..., -3.3140e-05,
                       -3.8624e-05,  1.6785e-04],
                      ...,
                      [ 7.2479e-04, -5.9891e-04,  2.9373e-04,  ...,  4.9591e-04,
                       -5.4932e-04, -6.7139e-04],
                      [ 1.5793e-03, -1.5488e-03,  1.7624e-03,  ...,  1.6785e-03,
                       -1.6708e-03, -1.5564e-03],
                      [-9.8944e-06,  1.3733e-04,  9.8705e-05,  ...,  9.3937e-05,
                        5.0783e-05,  3.0994e-05]], dtype=torch.bfloat16)),
             ('transformer.layers.15.attention.query_key_value.lora_A',
              tensor([[-0.0069,  0.0100, -0.0074,  ..., -0.0025, -0.0155, -0.0072],
                      [ 0.0090, -0.0096,  0.0125,  ..., -0.0049, -0.0134, -0.0064],
                      [-0.0102,  0.0080, -0.0148,  ...,  0.0045,  0.0008,  0.0068],
                      ...,
                      [ 0.0087, -0.0107, -0.0015,  ...,  0.0055,  0.0002,  0.0001],
                      [ 0.0138, -0.0068,  0.0135,  ...,  0.0115,  0.0047, -0.0060],
                      [-0.0126, -0.0078, -0.0121,  ..., -0.0137, -0.0139, -0.0032]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.15.attention.query_key_value.lora_B',
              tensor([[ 0.0008,  0.0014, -0.0019,  ..., -0.0015,  0.0006, -0.0011],
                      [ 0.0010,  0.0010, -0.0003,  ..., -0.0008, -0.0010, -0.0013],
                      [ 0.0007,  0.0005,  0.0011,  ..., -0.0006, -0.0009, -0.0006],
                      ...,
                      [-0.0011, -0.0010,  0.0012,  ...,  0.0013,  0.0008,  0.0008],
                      [-0.0013, -0.0010,  0.0013,  ...,  0.0014,  0.0012,  0.0010],
                      [ 0.0013,  0.0013, -0.0013,  ..., -0.0013, -0.0013, -0.0013]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.16.attention.query_key_value.lora_A',
              tensor([[ 0.0149, -0.0030, -0.0094,  ..., -0.0101, -0.0140,  0.0134],
                      [-0.0074, -0.0156, -0.0042,  ...,  0.0146,  0.0118, -0.0060],
                      [-0.0132, -0.0056, -0.0081,  ..., -0.0105,  0.0134,  0.0061],
                      ...,
                      [ 0.0028, -0.0101,  0.0044,  ...,  0.0041,  0.0083, -0.0101],
                      [-0.0049, -0.0078,  0.0078,  ...,  0.0021,  0.0123,  0.0057],
                      [ 0.0037, -0.0135,  0.0092,  ...,  0.0117, -0.0114, -0.0089]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.16.attention.query_key_value.lora_B',
              tensor([[ 1.4420e-03, -1.2817e-03,  1.5106e-03,  ..., -1.5182e-03,
                       -1.5945e-03,  1.2207e-03],
                      [-4.6158e-04, -7.1716e-04,  3.7766e-04,  ..., -4.0245e-04,
                       -2.8419e-04,  1.9741e-04],
                      [-9.3079e-04,  9.3842e-04, -1.1520e-03,  ...,  1.0529e-03,
                        9.6512e-04, -4.2725e-04],
                      ...,
                      [ 1.6861e-03, -1.8158e-03,  1.7166e-03,  ..., -1.6403e-03,
                       -1.7853e-03,  1.6785e-03],
                      [ 2.1667e-03, -2.0599e-03,  2.2430e-03,  ..., -2.0294e-03,
                       -2.1362e-03,  2.7161e-03],
                      [ 2.9945e-04, -4.7112e-04,  1.7929e-04,  ..., -4.8828e-04,
                       -6.2180e-04,  4.7207e-05]], dtype=torch.bfloat16)),
             ('transformer.layers.17.attention.query_key_value.lora_A',
              tensor([[ 0.0084,  0.0138,  0.0069,  ..., -0.0107, -0.0043,  0.0067],
                      [ 0.0104,  0.0028,  0.0064,  ..., -0.0112,  0.0014, -0.0021],
                      [-0.0021, -0.0136,  0.0028,  ...,  0.0115,  0.0085, -0.0016],
                      ...,
                      [ 0.0070,  0.0082, -0.0082,  ..., -0.0019, -0.0118, -0.0078],
                      [ 0.0123,  0.0034, -0.0017,  ..., -0.0084, -0.0078, -0.0006],
                      [-0.0129,  0.0113,  0.0091,  ..., -0.0092, -0.0139,  0.0120]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.17.attention.query_key_value.lora_B',
              tensor([[-8.2016e-04, -5.7983e-04, -5.5313e-04,  ...,  4.5586e-04,
                        4.3678e-04, -6.1417e-04],
                      [ 4.4632e-04,  5.4169e-04,  5.7602e-04,  ..., -8.9645e-04,
                       -6.5994e-04,  6.1798e-04],
                      [-4.4060e-04, -2.2697e-04,  3.3855e-05,  ...,  4.7112e-04,
                       -8.1635e-04, -4.1246e-05],
                      ...,
                      [ 1.2207e-03,  1.1826e-03,  8.1635e-04,  ..., -1.2589e-03,
                       -1.0071e-03,  1.1902e-03],
                      [ 3.6240e-04,  7.7057e-04,  9.1934e-04,  ..., -6.4468e-04,
                        1.4305e-04,  7.4768e-04],
                      [ 1.2131e-03,  1.2970e-03,  2.9182e-04,  ..., -9.4986e-04,
                       -1.3657e-03,  1.0605e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.18.attention.query_key_value.lora_A',
              tensor([[ 0.0110, -0.0063, -0.0134,  ...,  0.0020,  0.0039, -0.0136],
                      [-0.0061, -0.0086,  0.0002,  ...,  0.0004,  0.0089, -0.0022],
                      [ 0.0016,  0.0092, -0.0067,  ..., -0.0054, -0.0138, -0.0084],
                      ...,
                      [ 0.0019,  0.0098,  0.0030,  ..., -0.0153,  0.0004, -0.0072],
                      [ 0.0152, -0.0099,  0.0004,  ..., -0.0154,  0.0046,  0.0060],
                      [ 0.0116, -0.0078, -0.0028,  ...,  0.0146, -0.0089,  0.0102]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.18.attention.query_key_value.lora_B',
              tensor([[-2.6584e-05, -1.3828e-04,  3.5286e-04,  ..., -1.7166e-04,
                       -6.5804e-05, -3.3951e-04],
                      [-3.9864e-04, -6.9046e-04,  1.0834e-03,  ...,  2.1577e-05,
                        6.7139e-04, -1.6975e-04],
                      [ 1.3580e-03,  1.1139e-03, -8.0109e-04,  ..., -9.5367e-04,
                       -1.3733e-03,  9.6893e-04],
                      ...,
                      [ 3.7575e-04, -7.4768e-04,  5.4550e-04,  ...,  7.8964e-04,
                        2.0218e-04,  2.1935e-04],
                      [ 3.9101e-05,  4.8065e-04,  2.4414e-04,  ..., -5.5313e-04,
                       -5.4169e-04,  1.1396e-04],
                      [-8.7738e-04,  4.2152e-04,  3.2997e-04,  ..., -4.6158e-04,
                       -9.6321e-05, -1.5488e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.19.attention.query_key_value.lora_A',
              tensor([[ 0.0101,  0.0084,  0.0111,  ...,  0.0066,  0.0100,  0.0089],
                      [-0.0007,  0.0071,  0.0153,  ...,  0.0063,  0.0135, -0.0047],
                      [ 0.0048, -0.0098,  0.0060,  ...,  0.0004, -0.0050,  0.0111],
                      ...,
                      [ 0.0152, -0.0114, -0.0085,  ...,  0.0137,  0.0144, -0.0024],
                      [-0.0070, -0.0111,  0.0111,  ..., -0.0078,  0.0150,  0.0059],
                      [ 0.0106, -0.0131, -0.0105,  ..., -0.0052,  0.0009,  0.0140]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.19.attention.query_key_value.lora_B',
              tensor([[-9.2316e-04, -8.5831e-04, -6.2561e-04,  ...,  3.4332e-04,
                        6.3324e-04, -3.3379e-04],
                      [ 1.0071e-03,  7.0953e-04, -1.1587e-04,  ..., -9.0408e-04,
                       -7.0953e-04,  9.9182e-04],
                      [ 1.6880e-04,  5.4240e-06,  8.9645e-04,  ..., -4.8876e-05,
                       -3.6955e-05, -2.8491e-05],
                      ...,
                      [-3.2997e-04, -3.1853e-04,  5.0354e-04,  ...,  2.3079e-04,
                        1.3733e-04,  8.9645e-05],
                      [-2.2411e-04, -7.5340e-05,  1.2493e-04,  ...,  2.8610e-05,
                        3.2425e-04,  6.5231e-04],
                      [ 1.0431e-05,  2.2030e-04, -2.0885e-04,  ..., -6.2561e-04,
                       -3.6430e-04,  1.0681e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.20.attention.query_key_value.lora_A',
              tensor([[-1.2756e-02,  9.3994e-03,  1.4648e-02,  ..., -6.0272e-04,
                        1.5503e-02, -1.4832e-02],
                      [-1.2329e-02, -3.3112e-03, -4.6253e-05,  ...,  1.2878e-02,
                       -1.4038e-02,  1.3916e-02],
                      [-1.6479e-03, -5.6763e-03, -1.0559e-02,  ..., -6.0730e-03,
                       -1.4709e-02,  8.7891e-03],
                      ...,
                      [ 6.0425e-03,  1.3245e-02, -3.5858e-03,  ..., -8.5068e-04,
                        1.3977e-02,  6.0730e-03],
                      [ 1.2024e-02, -6.1035e-03,  1.1475e-02,  ..., -6.4392e-03,
                        7.9346e-03,  1.1292e-02],
                      [-4.5166e-03,  1.2024e-02, -2.4414e-03,  ...,  1.4954e-02,
                       -1.4343e-02, -3.3760e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.20.attention.query_key_value.lora_B',
              tensor([[-7.2861e-04, -3.8147e-04,  3.8910e-04,  ..., -3.5667e-04,
                        4.7112e-04, -3.6049e-04],
                      [ 1.0452e-03,  3.3760e-04,  1.6975e-04,  ...,  6.0797e-05,
                       -4.2725e-04, -1.5831e-04],
                      [-7.2861e-04, -6.8283e-04,  4.1771e-04,  ..., -5.0735e-04,
                        6.9046e-04, -3.5858e-04],
                      ...,
                      [-2.5368e-04,  7.7057e-04, -7.2861e-04,  ...,  5.0735e-04,
                       -2.9755e-04,  1.0834e-03],
                      [-1.6327e-03, -1.4954e-03,  1.5106e-03,  ..., -7.3624e-04,
                        1.3657e-03, -6.8665e-04],
                      [-5.0068e-06, -5.3406e-04,  1.2665e-03,  ...,  2.2030e-04,
                       -6.8283e-04, -4.9210e-04]], dtype=torch.bfloat16)),
             ('transformer.layers.21.attention.query_key_value.lora_A',
              tensor([[-0.0049,  0.0082, -0.0139,  ...,  0.0145,  0.0002,  0.0103],
                      [ 0.0045,  0.0132, -0.0051,  ...,  0.0128,  0.0073, -0.0027],
                      [-0.0017,  0.0132,  0.0134,  ...,  0.0017,  0.0002,  0.0154],
                      ...,
                      [-0.0078, -0.0112, -0.0089,  ...,  0.0035,  0.0156,  0.0077],
                      [ 0.0084,  0.0002,  0.0117,  ...,  0.0140, -0.0007,  0.0143],
                      [ 0.0126, -0.0125,  0.0125,  ..., -0.0087, -0.0129, -0.0030]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.21.attention.query_key_value.lora_B',
              tensor([[-1.2302e-04, -4.8256e-04, -1.1730e-04,  ...,  5.3024e-04,
                       -3.0708e-04, -2.6512e-04],
                      [ 2.8801e-04,  2.3174e-04,  9.7275e-05,  ...,  1.2302e-04,
                       -3.0100e-06,  2.1839e-04],
                      [ 5.0306e-05, -6.1417e-04,  2.4319e-04,  ..., -3.8147e-04,
                       -3.0708e-04, -1.4019e-04],
                      ...,
                      [-1.3199e-03,  1.7548e-04, -1.2817e-03,  ...,  1.2970e-03,
                       -2.0504e-04, -1.1349e-04],
                      [ 6.8665e-05,  4.1008e-04, -1.8787e-04,  ..., -4.6015e-05,
                       -3.8719e-04,  4.5013e-04],
                      [ 1.3351e-03,  4.6158e-04,  1.5259e-03,  ..., -1.0910e-03,
                       -9.0790e-04,  1.0376e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.22.attention.query_key_value.lora_A',
              tensor([[ 0.0115, -0.0066, -0.0078,  ..., -0.0079, -0.0078, -0.0034],
                      [ 0.0129, -0.0080,  0.0116,  ...,  0.0079, -0.0079,  0.0101],
                      [-0.0078,  0.0070, -0.0103,  ...,  0.0024, -0.0109,  0.0087],
                      ...,
                      [-0.0131, -0.0045, -0.0035,  ...,  0.0078,  0.0053, -0.0082],
                      [ 0.0150, -0.0156, -0.0156,  ...,  0.0077, -0.0129,  0.0120],
                      [ 0.0033, -0.0027,  0.0026,  ..., -0.0049,  0.0008, -0.0006]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.22.attention.query_key_value.lora_B',
              tensor([[ 3.6049e-04,  3.0899e-04, -3.0708e-04,  ..., -1.5855e-05,
                       -1.5163e-04, -2.3651e-04],
                      [ 1.6594e-04, -1.3161e-04,  2.8610e-04,  ..., -4.7684e-04,
                        9.9659e-05, -2.9182e-04],
                      [ 1.3733e-03,  1.2589e-03,  1.4496e-03,  ...,  9.6893e-04,
                       -1.5640e-03,  1.3504e-03],
                      ...,
                      [ 2.0905e-03,  2.0752e-03,  1.5335e-03,  ...,  2.1362e-03,
                       -1.9684e-03,  2.1973e-03],
                      [ 8.6594e-04,  9.3460e-04, -1.3504e-03,  ...,  7.9346e-04,
                       -8.2016e-04,  8.1253e-04],
                      [ 1.1597e-03,  8.8120e-04,  3.8910e-04,  ...,  1.2894e-03,
                       -1.1749e-03,  1.0223e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.23.attention.query_key_value.lora_A',
              tensor([[ 0.0047,  0.0006,  0.0095,  ..., -0.0130, -0.0152,  0.0071],
                      [-0.0021, -0.0138,  0.0151,  ..., -0.0014,  0.0108, -0.0084],
                      [-0.0016,  0.0079, -0.0031,  ...,  0.0044,  0.0014,  0.0142],
                      ...,
                      [-0.0092,  0.0031,  0.0140,  ...,  0.0060,  0.0098, -0.0103],
                      [ 0.0094,  0.0027, -0.0132,  ...,  0.0057, -0.0054,  0.0129],
                      [ 0.0075,  0.0087, -0.0112,  ...,  0.0003, -0.0083,  0.0118]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.23.attention.query_key_value.lora_B',
              tensor([[-6.5327e-05,  3.4142e-04, -5.2643e-04,  ..., -1.7357e-04,
                       -1.7643e-04, -3.1662e-04],
                      [ 1.8311e-03, -1.5945e-03,  1.8311e-03,  ...,  1.6556e-03,
                        1.1749e-03,  1.5869e-03],
                      [-7.9346e-04,  6.1035e-05, -8.3542e-04,  ..., -3.3951e-04,
                        3.1090e-04, -8.2016e-04],
                      ...,
                      [ 7.6294e-04, -9.4604e-04, -6.0499e-06,  ...,  1.1444e-03,
                        3.3569e-04, -1.6556e-03],
                      [ 9.2697e-04,  2.9945e-04,  1.0376e-03,  ..., -7.7438e-04,
                        1.1368e-03,  2.1210e-03],
                      [ 1.5974e-05, -8.8215e-06, -3.8719e-04,  ..., -2.9564e-04,
                       -5.2643e-04, -5.5075e-05]], dtype=torch.bfloat16)),
             ('transformer.layers.24.attention.query_key_value.lora_A',
              tensor([[-0.0080, -0.0103, -0.0128,  ..., -0.0078, -0.0095, -0.0156],
                      [-0.0120,  0.0014, -0.0010,  ...,  0.0032,  0.0050,  0.0021],
                      [ 0.0026,  0.0099,  0.0080,  ...,  0.0040, -0.0139,  0.0037],
                      ...,
                      [-0.0015,  0.0131,  0.0090,  ..., -0.0044, -0.0121,  0.0035],
                      [ 0.0034, -0.0026,  0.0058,  ..., -0.0109, -0.0052, -0.0084],
                      [-0.0079,  0.0082, -0.0089,  ...,  0.0123,  0.0116, -0.0042]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.24.attention.query_key_value.lora_B',
              tensor([[-7.1049e-05, -6.2561e-04,  7.0572e-05,  ..., -1.9836e-04,
                        4.8256e-04, -7.3624e-04],
                      [ 7.5817e-05, -4.1389e-04, -4.3869e-04,  ..., -6.7902e-04,
                        5.4550e-04, -2.3746e-04],
                      [ 3.7193e-05,  4.7874e-04,  1.0300e-03,  ...,  5.9509e-04,
                       -6.2943e-04, -1.1826e-03],
                      ...,
                      [-2.6703e-03, -1.6327e-03, -1.4801e-03,  ..., -1.7319e-03,
                        1.5793e-03,  2.1057e-03],
                      [-8.7357e-04,  1.4114e-03,  1.4801e-03,  ...,  1.2665e-03,
                       -1.8692e-03, -1.9073e-03],
                      [-2.3346e-03, -1.9684e-03, -1.8692e-03,  ..., -1.8768e-03,
                        1.8311e-03,  2.0447e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.25.attention.query_key_value.lora_A',
              tensor([[-0.0048, -0.0005,  0.0001,  ..., -0.0117,  0.0122, -0.0090],
                      [ 0.0112,  0.0035,  0.0148,  ...,  0.0095,  0.0110,  0.0037],
                      [-0.0085,  0.0007,  0.0074,  ..., -0.0076,  0.0078,  0.0050],
                      ...,
                      [-0.0038,  0.0011,  0.0150,  ..., -0.0026,  0.0016, -0.0156],
                      [ 0.0013,  0.0148, -0.0107,  ..., -0.0078,  0.0140,  0.0007],
                      [ 0.0027,  0.0121, -0.0115,  ..., -0.0095, -0.0092, -0.0043]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.25.attention.query_key_value.lora_B',
              tensor([[ 5.8365e-04,  9.7656e-04, -2.5749e-04,  ...,  4.3488e-04,
                       -8.8215e-05,  2.5177e-04],
                      [ 1.8539e-03,  1.3351e-03, -1.6708e-03,  ...,  1.7776e-03,
                        1.8311e-03,  2.0142e-03],
                      [ 2.4109e-03,  2.3956e-03, -2.3956e-03,  ...,  2.5787e-03,
                        2.4109e-03,  2.6550e-03],
                      ...,
                      [ 6.9046e-04, -3.2997e-04,  7.5531e-04,  ...,  7.6675e-04,
                        6.9809e-04,  6.7139e-04],
                      [ 1.3046e-03,  6.1035e-04,  1.3351e-03,  ...,  1.3504e-03,
                        1.3275e-03,  9.8419e-04],
                      [-8.0109e-04, -2.1362e-03,  1.8921e-03,  ..., -7.5817e-05,
                       -4.1199e-04, -2.0294e-03]], dtype=torch.bfloat16)),
             ('transformer.layers.26.attention.query_key_value.lora_A',
              tensor([[-0.0004,  0.0095,  0.0145,  ...,  0.0070, -0.0149,  0.0131],
                      [ 0.0023,  0.0054, -0.0156,  ..., -0.0018,  0.0002, -0.0030],
                      [ 0.0116, -0.0014,  0.0027,  ...,  0.0107,  0.0114,  0.0145],
                      ...,
                      [ 0.0023, -0.0156,  0.0094,  ...,  0.0147, -0.0014,  0.0056],
                      [ 0.0087, -0.0027,  0.0156,  ...,  0.0094,  0.0022, -0.0072],
                      [-0.0023,  0.0080,  0.0020,  ..., -0.0019, -0.0052, -0.0106]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.26.attention.query_key_value.lora_B',
              tensor([[-0.0008, -0.0021,  0.0011,  ...,  0.0014,  0.0018,  0.0011],
                      [-0.0012, -0.0015,  0.0015,  ...,  0.0012,  0.0009,  0.0014],
                      [ 0.0009,  0.0014, -0.0011,  ..., -0.0010, -0.0016, -0.0011],
                      ...,
                      [ 0.0010,  0.0026,  0.0014,  ..., -0.0029, -0.0028,  0.0006],
                      [-0.0003,  0.0013,  0.0002,  ..., -0.0009, -0.0010,  0.0002],
                      [-0.0025, -0.0016,  0.0024,  ...,  0.0007, -0.0004,  0.0028]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.27.attention.query_key_value.lora_A',
              tensor([[ 0.0094, -0.0043, -0.0104,  ..., -0.0098,  0.0050,  0.0079],
                      [ 0.0079, -0.0032, -0.0089,  ...,  0.0106,  0.0081,  0.0093],
                      [ 0.0074, -0.0129,  0.0156,  ..., -0.0090, -0.0151,  0.0056],
                      ...,
                      [-0.0103, -0.0149,  0.0156,  ..., -0.0110,  0.0004,  0.0029],
                      [ 0.0045,  0.0028,  0.0110,  ...,  0.0033,  0.0052, -0.0103],
                      [-0.0146, -0.0019, -0.0031,  ..., -0.0052, -0.0129,  0.0065]],
                     dtype=torch.bfloat16)),
             ('transformer.layers.27.attention.query_key_value.lora_B',
              tensor([[-5.1117e-04,  1.0223e-03, -1.5926e-04,  ...,  8.4686e-04,
                       -1.4496e-03,  7.0190e-04],
                      [ 4.9353e-05,  3.8338e-04,  2.1744e-04,  ...,  1.3123e-03,
                       -1.2894e-03,  1.2398e-04],
                      [-7.9727e-04,  9.0027e-04, -1.8082e-03,  ..., -1.8082e-03,
                        3.7956e-04, -1.3962e-03],
                      ...,
                      [ 3.4943e-03, -4.9133e-03,  5.3406e-03,  ...,  4.4861e-03,
                        2.1057e-03,  5.4321e-03],
                      [ 1.2054e-03, -1.1139e-03,  1.6327e-03,  ...,  1.9836e-03,
                       -9.5367e-04,  3.2501e-03],
                      [ 1.3885e-03, -7.4387e-04,  1.4572e-03,  ...,  2.0142e-03,
                       -9.4986e-04,  3.2806e-03]], dtype=torch.bfloat16))])

ChatGLM-6B model (printed SwiftModel structure):


SwiftModel(
  (model): ChatGLMForConditionalGeneration(
    (transformer): ChatGLMModel(
      (word_embeddings): Embedding(130528, 4096)
      (layers): ModuleList(
        (0-27): 28 x GLMBlock(
          (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (attention): SelfAttention(
            (rotary_emb): RotaryEmbedding()
            (query_key_value): Linear(
              in_features=4096, out_features=12288, bias=True
              (loramodule_default): Linear(
                in_features=4096, out_features=12288, bias=True
                (lora_dropout): Dropout(p=0.05, inplace=False)
              )
            )
            (dense): Linear(in_features=4096, out_features=4096, bias=True)
          )
          (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (mlp): GLU(
            (dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
            (dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
          )
        )
      )
      (final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
    )
    (lm_head): Linear(in_features=4096, out_features=130528, bias=False)
  )
)

Replies:
  1. From the error description, the problem is most likely that some layer names in the state_dict do not match the Swift model. Please verify that the model you are using corresponds exactly to your FlexTrain version, and that you are loading the correct state_dict file.

    For example, check whether the keys in the state_dict look like transformer.layers.<层数>.attention.query_key_value.lora_A and transformer.layers.<层数>.attention.query_key_value.lora_B, while the model itself expects transformer.layers.<层数>.attention.query_key_value.loramodule_default.lora_A and transformer.layers.<层数>.attention.query_key_value.loramodule_default.lora_B.
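    A quick way to confirm the mismatch described above is to diff the two key sets directly. This is a minimal sketch; in practice the two lists would come from `model.state_dict().keys()` and the downloaded checkpoint's keys:

    ```python
    # Diff checkpoint keys against the model's own keys to see exactly
    # which segment differs (e.g. a missing "loramodule_default" level).
    def diff_keys(model_keys, ckpt_keys):
        model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
        return {
            "unexpected": sorted(ckpt_keys - model_keys),  # in checkpoint, not in model
            "missing": sorted(model_keys - ckpt_keys),     # in model, not in checkpoint
        }

    # Example with the two key shapes from this thread:
    report = diff_keys(
        ["transformer.layers.0.attention.query_key_value.loramodule_default.lora_A"],
        ["transformer.layers.0.attention.query_key_value.lora_A"],
    )
    ```

    Every key that shows up under "unexpected" here is exactly what load_state_dict reports as an unexpected key.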

  2. The "load_state_dict unexpected_keys" error occurs because the state dict you are trying to load contains keys for layers that do not exist in the current model; these layers may have been defined during the training run.
    To work around this, call torch.nn.Module.load_state_dict() with strict=False, which skips keys that have no counterpart in the current model instead of raising an error.
    For example:

    model.load_state_dict(state_dict, strict=False)
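    Note that strict=False only suppresses the error: keys that do not match are silently skipped, so in this case the LoRA weights would effectively not be loaded at all. load_state_dict returns a report of missing and unexpected keys that is worth inspecting. A minimal self-contained sketch, using a plain nn.Linear as a stand-in for the real model:

    ```python
    import torch
    import torch.nn as nn

    # Stand-in for the real model; any nn.Module behaves the same way.
    model = nn.Linear(2, 2)

    # Simulate a checkpoint that carries one key the model does not have.
    sd = model.state_dict()
    sd["extra.lora_A"] = torch.zeros(1)

    # With strict=False nothing is raised; the mismatches are returned instead.
    result = model.load_state_dict(sd, strict=False)
    print("missing:", result.missing_keys)        # keys the model wanted but the checkpoint lacks
    print("unexpected:", result.unexpected_keys)  # keys the checkpoint has but the model lacks
    ```

    If unexpected_keys contains all of your lora_A/lora_B entries, as in the question, the adapter weights were skipped and the keys need to be remapped before loading.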
  3. The "load_state_dict unexpected_keys" error usually appears when layer or module names have changed between saving and loading, so some entries in the state dict no longer match the model.
    In that case you can use the strict parameter of load_state_dict. With strict=False, keys in the state dict that do not match any layer or module name in the current model are ignored and only the matching entries are loaded; the model keeps its existing layers and modules instead of trying to create new ones.
    Example code:

    model.load_state_dict(state_dict, strict=False)

    The "ZhipuAI/ChatGLM-6B" model you mention may be fine-tuned on the basis of the GLM-130B model jointly trained by Tsinghua University's KEG lab and Zhipu AI. Because of the difference in model structure (here, the extra loramodule_default level in the Swift model), you may need to adjust the keys of the state dict so that it loads correctly into the fine-tuned model.
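    Given the key difference shown in the question (the checkpoint keys lack the loramodule_default segment that the SwiftModel expects), remapping the keys before calling load_state_dict is usually the actual fix, rather than strict=False alone. A minimal sketch; the exact target names should always be checked against model.state_dict().keys():

    ```python
    from collections import OrderedDict

    def remap_lora_keys(state_dict):
        """Insert the 'loramodule_default' segment the SwiftModel expects:
        ...query_key_value.lora_A -> ...query_key_value.loramodule_default.lora_A
        """
        remapped = OrderedDict()
        for key, value in state_dict.items():
            for suffix in ("lora_A", "lora_B"):
                if key.endswith("." + suffix):
                    key = key[: -len(suffix)] + "loramodule_default." + suffix
                    break
            remapped[key] = value
        return remapped

    # Then load the remapped dict; any remaining mismatch will show up
    # in the returned missing/unexpected key lists:
    # model.load_state_dict(remap_lora_keys(state_dict), strict=False)
    ```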