tongchenkeji

Posted: 2023-7-5 9:38:11 · 0 views

How can I use Java in Yunxiao to insert the content of a PDF document (containing text and images) into a Word document? [Alibaba Cloud Yunxiao]

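Has anyone used Java in Yunxiao to insert the content of a PDF document (containing text and images) into a Word document, specifically into a table in the Word document?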


6 replies (A = author, M = moderator)

wljslmzAM 2023-11-28 8:07:42 1

You can do this in Java with Apache PDFBox and Apache POI. First load the PDF file into memory and use Apache PDFBox to extract its content as text. Then open the Word file with POI and write the extracted text into it.


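To insert the content of a PDF document into a Word document with Java in Yunxiao, you can use two open-source libraries: Apache PDFBox and Apache POI.

First, add the following dependencies: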


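    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.26</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.0.0</version>
    </dependency>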

Then you can use the following code to insert the text of the PDF document into a table in a Word document:


    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;
    import org.apache.poi.xwpf.usermodel.XWPFTableRow;

    public class PDFToWord {
        public static void main(String[] args) {
            // try-with-resources closes both documents and the output stream
            try (PDDocument document = PDDocument.load(new File("input.pdf"));
                 XWPFDocument wordDocument = new XWPFDocument();
                 FileOutputStream out = new FileOutputStream("output.docx")) {
                // Extract the text of the PDF document (text only, images are not extracted)
                PDFTextStripper stripper = new PDFTextStripper();
                String pdfContent = stripper.getText(document);
                // Create a table; it starts with one row containing one cell
                XWPFTable table = wordDocument.createTable();
                // Insert the PDF text into the table, one line per row
                String[] lines = pdfContent.split("\\r?\\n");
                for (int i = 0; i < lines.length; i++) {
                    // Reuse the initial row for the first line, then append new rows
                    XWPFTableRow row = (i == 0) ? table.getRow(0) : table.createRow();
                    row.getCell(0).setText(lines[i]);
                }
                // Save the Word document
                wordDocument.write(out);
                System.out.println("Conversion complete!");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

In the code above, replace input.pdf with the actual path of your PDF file; the converted Word document is saved as output.docx.
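The example above turns each line of extracted text into a table row. PDF text extraction often yields blank lines and stray whitespace, so it may help to clean the lines before creating rows. The following is a minimal sketch using only the JDK; the class name PdfLines and the method toRows are hypothetical helpers for illustration and are not part of PDFBox or POI.

```java
import java.util.ArrayList;
import java.util.List;

final class PdfLines {
    private PdfLines() {}

    /** Splits text on \n or \r\n and returns the trimmed, non-blank lines. */
    static List<String> toRows(String text) {
        List<String> rows = new ArrayList<>();
        if (text == null) {
            return rows;
        }
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Skip lines that would produce empty table rows
            if (!trimmed.isEmpty()) {
                rows.add(trimmed);
            }
        }
        return rows;
    }
}
```

Each element of the returned list could then be written to one table row in place of the raw split result.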

Java SDK usage instructions: https://help.aliyun.com/document_detail/66496.html

Add the SDK with Maven:

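    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>aliyun-java-sdk-core</artifactId>
        <version>4.5.0</version>
    </dependency>
    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>aliyun-java-sdk-codeup</artifactId>
        <version>0.0.8</version>
    </dependency>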

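Calling the Codeup API
Using CreateRepository as an example:

To create an AccessKey ID (AK) and AccessKey Secret (SK), see https://usercenter.console.aliyun.com/manage/ak

To create a personal access token, see Personal Access Tokens.

You can call the API with either the Codeup Java SDK or the Alibaba Cloud OpenAPI SDK. The difference is that with the Alibaba Cloud OpenAPI SDK you must set the API details (domain, version, URI pattern) manually.

Using the Codeup Java SDK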

    package com.alibaba.openapitest.demo;

    import java.nio.charset.StandardCharsets;
    import com.aliyuncs.DefaultAcsClient;
    import com.aliyuncs.IAcsClient;
    import com.aliyuncs.codeup.model.v20200414.CreateRepositoryRequest;
    import com.aliyuncs.codeup.model.v20200414.CreateRepositoryResponse;
    import com.aliyuncs.exceptions.ClientException;
    import com.aliyuncs.exceptions.ServerException;
    import com.aliyuncs.http.FormatType;
    import com.aliyuncs.profile.DefaultProfile;

    public class CreateRepository {

        private String accessKeyId = "";
        private String accessSecret = "";
        /**
         * Personal access token; leave it unset when using AK & SK or STS temporary credentials
         */
        private String personalAccessToken = "";
        private String regionId = "cn-hangzhou";
        private String endPoint = "codeup.cn-hangzhou.aliyuncs.com";
        /**
         * Organization ID
         */
        private String organizationId = "";

        public void createRepository() {
            DefaultProfile profile = DefaultProfile.getProfile(regionId, accessKeyId, accessSecret);
            IAcsClient client = new DefaultAcsClient(profile);
            CreateRepositoryRequest request = new CreateRepositoryRequest();
            request.setEndpoint(endPoint);
            request.setOrganizationId(organizationId);
            request.setAccessToken(personalAccessToken);
            // Request body parameters; see the API documentation
            String body = "{\"name\": \"repoName\", \"path\": \"repoPath\", \"visibility_level\": 10, \"namespace_id\": 123}";
            // Send the JSON text as-is; serializing the String again would wrap it in a second layer of quotes
            request.setHttpContent(body.getBytes(StandardCharsets.UTF_8), "UTF-8", FormatType.JSON);
            try {
                CreateRepositoryResponse response = client.getAcsResponse(request);
                logInfo(String.valueOf(response.getResult().getId()));
            } catch (ServerException e) {
                logInfo(String.format("Fail. Something went wrong with your connection to Aliyun. ErrorCode: %s",
                        e.getErrCode()));
            } catch (ClientException e) {
                logInfo(String.format("Fail. Business error. ErrorCode: %s, RequestId: %s",
                        e.getErrCode(), e.getRequestId()));
            }
        }

        private static void logInfo(String message) {
            System.out.println(message);
        }

        public static void main(String[] args) {
            new CreateRepository().createRepository();
        }
    }

Using the Alibaba Cloud OpenAPI SDK

    import java.nio.charset.StandardCharsets;
    import com.aliyuncs.CommonRequest;
    import com.aliyuncs.CommonResponse;
    import com.aliyuncs.DefaultAcsClient;
    import com.aliyuncs.IAcsClient;
    import com.aliyuncs.exceptions.ClientException;
    import com.aliyuncs.exceptions.ServerException;
    import com.aliyuncs.http.FormatType;
    import com.aliyuncs.http.MethodType;
    import com.aliyuncs.http.ProtocolType;
    import com.aliyuncs.profile.DefaultProfile;

    /*
    pom.xml

    <dependency>
        <groupId>com.aliyun</groupId>
        <artifactId>aliyun-java-sdk-core</artifactId>
        <version>4.0.3</version>
    </dependency>
    */
    public class CreateRepository {
        public static void main(String[] args) {
            // Region, AccessKey ID, AccessKey Secret
            DefaultProfile profile = DefaultProfile.getProfile("cn-hangzhou", "", "");
            IAcsClient client = new DefaultAcsClient(profile);

            // With the generic request, the API details are set by hand
            CommonRequest request = new CommonRequest();
            request.setProtocol(ProtocolType.HTTPS);
            request.setMethod(MethodType.POST);
            request.setDomain("codeup.cn-hangzhou.aliyuncs.com");
            request.setVersion("2020-04-14");
            request.setUriPattern("/api/v3/projects");
            request.putQueryParameter("RegionId", "cn-hangzhou");
            request.putQueryParameter("OrganizationId", "");
            request.putHeadParameter("Content-Type", "application/json");
            // JSON request body; fill in name and path before use
            String requestBody = "{"
                    + "\"name\": \"\","
                    + "\"path\": \"\","
                    + "\"visibility_level\": 10,"
                    + "\"namespace_id\": 123"
                    + "}";
            request.setHttpContent(requestBody.getBytes(StandardCharsets.UTF_8), "utf-8", FormatType.JSON);
            try {
                CommonResponse response = client.getCommonResponse(request);
                System.out.println(response.getData());
            } catch (ServerException e) {
                e.printStackTrace();
            } catch (ClientException e) {
                e.printStackTrace();
            }
        }
    }
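Both SDK examples send the repository fields as a hand-written JSON string, which is easy to get wrong once the quotes have to be escaped inside a Java literal. The following is a minimal sketch of building that body from values with plain JDK code. The field names (name, path, visibility_level, namespace_id) come from the examples above; the class RepositoryBody and its methods are hypothetical helpers, and the escaping only covers quotes and backslashes, so a JSON library would likely be the more robust choice in real code.

```java
final class RepositoryBody {
    private RepositoryBody() {}

    /** Builds the CreateRepository JSON body used in the examples above. */
    static String toJson(String name, String path, int visibilityLevel, long namespaceId) {
        return "{\"name\": \"" + escape(name) + "\", "
                + "\"path\": \"" + escape(path) + "\", "
                + "\"visibility_level\": " + visibilityLevel + ", "
                + "\"namespace_id\": " + namespaceId + "}";
    }

    /** Escapes backslashes and double quotes; control characters are not handled. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The resulting string can be converted to bytes with getBytes(StandardCharsets.UTF_8) and passed to setHttpContent as in the examples.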

To insert the content of a PDF document into a Word document with Java in Yunxiao, you can use open-source Java libraries. The following simple example shows how to do this with Apache PDFBox and Apache POI:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;
    import org.apache.poi.xwpf.usermodel.XWPFTableRow;

    public class PdfToWordConverter {
        public static void main(String[] args) {
            // try-with-resources closes the documents and the stream even on failure
            try (PDDocument pdfDoc = PDDocument.load(new File("input.pdf"));
                 XWPFDocument wordDoc = new XWPFDocument();
                 FileOutputStream out = new FileOutputStream(new File("output.docx"))) {
                // Extract the text content of the PDF document
                PDFTextStripper stripper = new PDFTextStripper();
                String pdfContent = stripper.getText(pdfDoc);
                // Create the Word table; it starts with one row containing one cell
                XWPFTable table = wordDoc.createTable();
                // Add the PDF content to the Word table, one line per row
                String[] lines = pdfContent.split("\\r?\\n");
                for (int i = 0; i < lines.length; i++) {
                    XWPFTableRow row = (i == 0) ? table.getRow(0) : table.createRow();
                    row.getCell(0).setText(lines[i]);
                }
                // Save the Word document
                wordDoc.write(out);
                System.out.println("PDF document converted to Word document successfully!");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

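You can use Apache POI and iText to insert the content of a PDF document into a Word document. The steps are:

Use iText to read the PDF document and extract its text, images, and other content;
Use Apache POI to create a Word document and create a table in it;
Insert the extracted PDF content into the table.

Here is a simple Java example: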

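    import java.io.ByteArrayInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.util.Units;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;
    import org.apache.poi.xwpf.usermodel.XWPFTableCell;
    import com.itextpdf.text.pdf.PRStream;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfObject;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfImageObject;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public class PdfToWord {
        public static void main(String[] args) throws Exception {
            // Read the PDF document
            PdfReader reader = new PdfReader("input.pdf");
            // Create the Word document and table once, not once per image
            XWPFDocument doc = new XWPFDocument();
            XWPFTable table = doc.createTable();
            // Put the text of page 1 into the first cell
            String text = PdfTextExtractor.getTextFromPage(reader, 1);
            table.getRow(0).getCell(0).setText(text);
            // Look up the image XObjects of page 1
            PdfDictionary pageDict = reader.getPageN(1);
            PdfDictionary resourcesDict = pageDict.getAsDict(PdfName.RESOURCES);
            PdfDictionary xObjectDict = (resourcesDict == null) ? null : resourcesDict.getAsDict(PdfName.XOBJECT);
            if (xObjectDict != null) {
                for (PdfName name : xObjectDict.getKeys()) {
                    // Resolve the indirect reference to the actual stream object
                    PdfObject object = PdfReader.getPdfObject(xObjectDict.get(name));
                    if (!(object instanceof PRStream)) {
                        continue;
                    }
                    PRStream stream = (PRStream) object;
                    if (!PdfName.IMAGE.equals(stream.getAsName(PdfName.SUBTYPE))) {
                        continue;
                    }
                    int width = stream.getAsNumber(PdfName.WIDTH).intValue();
                    int height = stream.getAsNumber(PdfName.HEIGHT).intValue();
                    PdfImageObject image = new PdfImageObject(stream);
                    String fileType = image.getFileType();
                    int pictureType;
                    if ("png".equals(fileType)) {
                        pictureType = XWPFDocument.PICTURE_TYPE_PNG;
                    } else if ("jpg".equals(fileType)) {
                        pictureType = XWPFDocument.PICTURE_TYPE_JPEG;
                    } else {
                        continue; // other image formats are skipped in this example
                    }
                    // Insert the image into a new table row of the Word document
                    XWPFTableCell cell = table.createRow().getCell(0);
                    cell.addParagraph().createRun().addPicture(
                            new ByteArrayInputStream(image.getImageAsBytes()), pictureType,
                            "image." + fileType, Units.pixelToEMU(width), Units.pixelToEMU(height));
                }
            }
            // Save the Word document
            try (FileOutputStream out = new FileOutputStream("output.docx")) {
                doc.write(out);
            }
            doc.close();
            reader.close();
        }
    }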

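Note that the code above is only a simple example (it handles page 1 only); in practice you may need to modify and optimize it for your situation.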
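One detail that tends to need adjusting is the picture type: when an image is added to a Word document with POI, the declared type should match the actual image data. As a rough sketch, the type can be guessed from the leading bytes of the data (the PNG signature and the JPEG start-of-image marker). The class ImageSniffer below is a hypothetical helper written with the JDK only; it is not part of iText or POI and recognizes just these two formats.

```java
final class ImageSniffer {
    private ImageSniffer() {}

    /** Returns "png", "jpeg", or "unknown" based on the leading bytes. */
    static String detect(byte[] data) {
        if (data == null) {
            return "unknown";
        }
        // PNG files start with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        // JPEG files start with the SOI marker FF D8 followed by FF
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            // Mask to compare the signed byte as an unsigned value
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The result could be mapped to the matching POI picture type constant before calling addPicture, and images reported as "unknown" skipped or converted first.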

算精通AM 2023-11-28 8:07:42 6
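You can do this with Apache POI and iText. These libraries provide rich APIs for working with Word and PDF documents.

The general steps are:

Use iText to read the PDF document and extract its text, images, and other content.

Use Apache POI to create a Word document and insert a table into it.

Insert the text, images, and other content extracted from the PDF document into the table in the Word document.

The following example shows how to insert the content of a PDF document into a table in a Word document: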

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;
    import org.apache.poi.xwpf.usermodel.XWPFTableCell;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;
    import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
    import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

    public class PdfToWord {
        public static void main(String[] args) throws IOException {
            // Read the PDF file from a URL
            URL url = new URL("https://example.com/myfile.pdf");
            try (InputStream in = url.openStream();
                 XWPFDocument document = new XWPFDocument();
                 FileOutputStream out = new FileOutputStream("output.docx")) {
                PdfReader reader = new PdfReader(in);
                // Create the Word table (XWPF is POI's Word API; XSSF would produce an Excel workbook)
                XWPFTable table = document.createTable();
                XWPFTableCell cell = table.getRow(0).getCell(0);
                // Extract the text of every page of the PDF document
                StringBuilder sb = new StringBuilder();
                for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                    TextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                    sb.append(PdfTextExtractor.getTextFromPage(reader, i, strategy));
                }
                reader.close();
                // Insert the text into the table cell
                cell.setText(sb.toString());
                // Inserting images into the cell is not implemented here; adjust as needed
                // ...
                // Save the Word document
                document.write(out);
            }
        }
    }
Note that this example only provides a basic framework; the actual implementation may need to be adjusted to your specific requirements.
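The image step is left open in the example above. One piece it usually needs is sizing, since Word picture dimensions are given in EMU (English Metric Units, 914400 per inch) and a large image may not fit a table cell. The following sketch scales pixel dimensions down to a maximum width while keeping the aspect ratio, then converts pixels to EMU. It assumes 96 pixels per inch (9525 EMU per pixel); the class ImageSizing and its methods are hypothetical helpers using only the JDK.

```java
final class ImageSizing {
    // 914400 EMU per inch divided by an assumed 96 pixels per inch
    static final int EMU_PER_PIXEL = 9525;

    private ImageSizing() {}

    /** Returns {width, height} scaled down to fit maxWidth, keeping the aspect ratio. */
    static int[] fitWidth(int width, int height, int maxWidth) {
        if (width <= 0 || height <= 0 || maxWidth <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        if (width <= maxWidth) {
            return new int[] {width, height}; // already fits, never scale up
        }
        int scaledHeight = Math.max(1, (int) Math.round((double) height * maxWidth / width));
        return new int[] {maxWidth, scaledHeight};
    }

    /** Converts a pixel length to EMU, failing loudly on integer overflow. */
    static int toEmu(int pixels) {
        return Math.multiplyExact(pixels, EMU_PER_PIXEL);
    }
}
```

For instance, an 800 by 600 pixel image limited to 400 pixels wide becomes 400 by 300, and the EMU values of those two numbers would be the width and height arguments when adding the picture.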

https://help.aliyun.com/document_detail/107313.html?spm=a2c4g.460489.0.i7

You can use Apache PDFBox and Apache POI to insert the content of a PDF document into a Word document. The following simple example shows how to use these libraries to insert the text of a PDF document into a table in a Word document:

First, add the following dependencies to your project:

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.24</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.0.0</version>
    </dependency>

Then you can use the following code to insert the text of the PDF document into a table in a Word document:

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;
    import org.apache.poi.xwpf.usermodel.XWPFTableCell;
    import org.apache.poi.xwpf.usermodel.XWPFTableRow;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.util.Arrays;
    import java.util.List;

    public class PdfToWordTableExample {
        public static void main(String[] args) throws Exception {
            // Load the PDF document and extract text using PDFTextStripper
            PDDocument document = PDDocument.load(new File("input.pdf"));
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String text = pdfStripper.getText(document);
            document.close();
            // Create a new Word document and add a table to it
            XWPFDocument wordDocument = new XWPFDocument();
            XWPFTable table = wordDocument.createTable(3, 3); // Start with three rows and three columns
            // Split the text into lines using line breaks as the delimiter
            List<String> lines = Arrays.asList(text.split("\\r?\\n"));
            for (int i = 0; i < lines.size(); i++) {
                // Reuse an existing row if there is one; otherwise append a row,
                // because getRow(i) returns null beyond the initial three rows
                XWPFTableRow row = (i < table.getNumberOfRows()) ? table.getRow(i) : table.createRow();
                XWPFTableCell cell = row.getCell(0); // First cell in the row (column index starts from 0)
                cell.setText(lines.get(i)); // Set the cell text to the current line of text
            }
            // Save the Word document to disk
            FileOutputStream out = new FileOutputStream("output.docx");
            wordDocument.write(out);
            out.close();
            wordDocument.close();
        }
    }
