Machine Learning PAI: I made sure the data contains no null values, but the job still fails with this error:
FAILED: Failed 20231115073834570gt8kpa11w_db136f28_fa93_4d1d_9ad5_d5d6dd4f780e:ODPS-1202005:Algo Job Failed-User Error-Tensorflow script runs failed with exit code: 123, please see the details in logview.
The tail contents of the stderr file:
teratorGetNext[output_shapes=[[?,1], , , , , [?], [?], [?], [?], [?,1], , , [?]], output_types=DT_FLOAT, DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_STRING, DT_STRING, DT_STRING, DT_STRING, DT_FLOAT, DT_VARIANT, DT_VARIANT, DT_INT32], _device="/job:worker/replica:0/task:0/device:CPU:0"]]
[[{{node pc_log_times_diff_ss_raw_proj_id_weighted_by_pc_log_times_diff_ss_raw_proj_val_embedding/pc_log_times_diff_ss_raw_proj_id_weighted_by_pc_log_times_diff_ss_raw_proj_val_embedding_weights/embedding_lookup_sparse/Unique_S563}} = _Recvclient_terminated=false, recv_device="/job:ps/replica:0/task:0/device:CPU:0", send_device="/job:worker/replica:0/task:0/device:CPU:0", send_device_incarnation=-41487250130682641, tensor_name="edge_857_p…rse/Unique", tensor_type=DT_INT64, _device="/job:ps/replica:0/task:0/device:CPU:0"]]
Log:
http://logview.alibaba-inc.com/logview/?h=http://service.odps.aliyun-inc.com/api&p=b_risk_dev&i=20231115073834570gt8kpa11w_db136f28_fa93_4d1d_9ad5_d5d6dd4f780e&token=NWwrUnVPcjNRTUhaK0FCQlpaakpXVDFUMks0PSxPRFBTX09CTzoxNDk2MzI3NTcyMDcyNzY0LDE3MDI2MjU5MTUseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2Jfcmlza19kZXYvaW5zdGFuY2VzLzIwMjMxMTE1MDczODM0NTcwZ3Q4a3BhMTF3X2RiMTM2ZjI4X2ZhOTNfNGQxZF85YWQ1X2Q1ZDZkZDRmNzgwZSJdfV0sIlZlcnNpb24iOiIxIn0=
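Sequence features within the same group must have the same length; see https://easyrec.readthedocs.io/en/latest/feature/feature.html#sequencefeature
This answer was compiled from the DingTalk group "【EasyRec】推荐算法交流群" (EasyRec Recommendation Algorithm Discussion Group).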
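Having no null values does not guarantee that the two columns line up element by element. The failing node name suggests an id column weighted by a value column, so a quick offline check is to count the elements in each column per row. The following is only a minimal sketch: the "|" separator, the dict-per-row input, and the column names in the usage note are assumptions, not taken from the actual table or from EasyRec.

```python
def _split(value, sep):
    """Split a delimited string; treat None or an empty string as zero elements."""
    if value is None or value == "":
        return []
    return value.split(sep)


def find_length_mismatches(rows, id_col, val_col, sep="|"):
    """Return (row_index, n_ids, n_vals) for every row whose two
    sequence columns contain a different number of elements.

    rows    : iterable of dicts, one per sample (assumed input format)
    id_col  : name of the id sequence column (hypothetical)
    val_col : name of the weight/value sequence column (hypothetical)
    sep     : element separator (assumed to be "|")
    """
    mismatches = []
    for i, row in enumerate(rows):
        ids = _split(row.get(id_col), sep)
        vals = _split(row.get(val_col), sep)
        if len(ids) != len(vals):
            mismatches.append((i, len(ids), len(vals)))
    return mismatches
```

For example, calling find_length_mismatches(rows, "pc_log_times_diff_ss_raw_proj_id", "pc_log_times_diff_ss_raw_proj_val") on a sample of the training table would list the rows to inspect. Those two column names are guessed from the node name in the error and may differ from the real schema.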
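The TensorFlow script failed on PAI with error "ODPS-1202005". One possible cause is an undefined tensor or variable. In that case, check that the tensors and variables in the script are defined correctly and that their dimensions match.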
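First, use the node name and location in the error message to narrow down the problem, for example the embedding_lookup_sparse/Unique node reported in the stderr tail above.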
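Second, check how the tensors and variables are initialized, and confirm that each has the intended dimensions and is compatible with the tensors it is combined with.
In addition, check that the input and output data formats are correct, and that parameters such as batch_size and epochs are set properly.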
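The first step, locating the failing node, can be sketched as a small script that pulls node names out of the stderr text. This is an illustrative helper, not part of PAI or EasyRec: it assumes the "{{node ...}}" notation seen in the log above, and that the first path segment of a node name is the scope of the feature involved.

```python
import re

# TensorFlow error messages wrap node names as {{node <name>}}.
NODE_PATTERN = re.compile(r"\{\{node ([^}]+)\}\}")


def extract_node_names(stderr_text):
    """Return the distinct node names found in the text, in order of appearance."""
    seen = []
    for name in NODE_PATTERN.findall(stderr_text):
        if name not in seen:
            seen.append(name)
    return seen


def feature_scope(node_name):
    """Return the first path segment of a node name.

    Assumption: this segment is the name scope of the feature
    that the failing op belongs to.
    """
    return node_name.split("/", 1)[0]
```

Applied to the stderr tail in the question, this would point at the scope beginning with pc_log_times_diff_ss_raw_proj_id_weighted_by_, which is the feature whose input data is worth inspecting first.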