Puppeteer 生成图片生成 PDF

🕗 发布于 2024-07-08 15:57 puppeteer 前端 javascript

方案背景

22 年时，那时由于团队权限比较受限，在不开新页面，也不开新服务的情况下，同样是生成图片的需求，性能要求也比较高，当时采用的 Html2canvas 的前端生成方案，做了十几个核心业务模块，在 Web，H5，App，Electron PC App 下，速度 300ms~2500ms 内生成和下载图片

今年遇到同样需求，但由于业务内嵌入了几十张图表组成的可视化大屏 Iframe，Html2canvas 想解决 Iframe 情况有些吃力，也考虑团队权限能用服务端截屏的方案了，那就 Puppeteer 走起

生成 PDF的额外说明

用 puppeteer 生成 pdf 走不通，因此方案里采用了 sharp 手动的把 puppeteer 生成的图片裁切成了 pdf 页面大小的等比例尺寸，再用 pdfkit scale 等比把裁切出来的若干份图片塞进 pdf 里

代码

代码为 POC 方案，离生产可用还是要多做一些稳定性和可用性的支持工作，因此以下代码仅作学习参考

其中流程代码参考意义不大，核心代码里的裁切图片，自动滚动，等待指定 Iframe 加载和获取指定元素上的属性值，比较有参考意义

// 流程代码
import { ConsoleLogger } from '@nestjs/common';
import puppeteer, { Viewport } from 'puppeteer';
import { Browser, CookieParam } from 'puppeteer';
import * as helper from './helper';
const logger = new ConsoleLogger('WebViewer');

export enum TargetFileType {
  png = 'png',
  jpeg = 'jpeg',
  webp = 'webp',
  pdf = 'pdf',
}

export interface ShotParam {
  type: TargetFileType;
  url: string;
  targetFile?: string;
  cookies?: CookieParam[];
  viewport?: Viewport;
  timeout?: number;
}

export class WebViewer {
  private browser: Browser | null = null;
  private inited = false;

  static defaultInstance = new WebViewer();

  constructor(private readonly options?: WebViewerOptions) {
    this.options = options || {
      socketTimeout: 2 * 60 * 1000,
      generateTimeout: 0,
    };
  }

  async init(): Promise<void> {
    this.browser = await puppeteer.launch({
      // defaultViewport: { width: 1920, height: 1080 },
      // headless: 'shell',
      headless: false,
      pipe: true,
      args: [
        '--disable-gpu',
        '--disable-dev-shm-usage',
        '--disable-setuid-sandbox',
        '--no-first-run',
        '--no-sandbox',
        '--no-zygote',
        '--full-memory-crash-report',
        '--unlimited-storage',
      ],
    });
    this.inited = true;
  }

  async shot(param: ShotParam): Promise<Buffer> {
    if (!this.inited) {
      await this.init();
    }

    logger.log(
      `Start to shot url: ${param.url}, type: ${
        param.type
      }, viewport: { heigth:${param.viewport?.height || 0}, width:${
        param.viewport?.width || 0
      }} `,
    );
    const page = await this.browser.newPage();
    page.on('response', (response) => {
      logger.debug(response.url());
    });
    page.on('close', () => {
      logger.debug('Current page has been closed.');
    });
    if (param.cookies) {
      await page.setCookie(...param.cookies);
    }
    await page.goto(param.url, {
      timeout: param?.timeout || this.options.socketTimeout,
      waitUntil: 'networkidle0',
    });
    await helper.waitForFrame(page);
    const minWidth = 1920;
    let width = await page.$eval('.html-table', (el) => el.scrollWidth + 36);
    width = width > minWidth ? width : minWidth;
    param.viewport = { width, height: 1080 };
    await page.setViewport(param.viewport);
    await helper.getValueFromElementDataset(
      page,
      'html',
      'height',
      async (value: string) => !Number.isNaN(Number(value)),
    );
    const bodyHandle = await page.$('body');
    const { height: bodyHeight } = await bodyHandle.boundingBox();
    param.viewport = { width, height: Math.floor(bodyHeight) + 1 };
    await bodyHandle.dispose();
    await page.setViewport(param.viewport);
    await page.waitForSelector('#datart-rendered');
    await helper.sleep(300);

    let buffer = await page.screenshot();
    if (param.type === 'pdf') {
      buffer = await helper.generatePdf(buffer);
    }
    await helper.sleep(300);
    await page.close();
    return buffer;
  }

  async close(): Promise<void> {
    await this.browser?.close();
  }
}

export interface WebViewerOptions {
  socketTimeout: number;
  generateTimeout?: number;
}

// 核心代码
import { Page } from 'puppeteer';
import * as sharp from 'sharp';
import * as pdfkit from 'pdfkit';
import * as getStream from 'get-stream';

function waitForFrame(page: Page) {
  let fulfill;
  const promise = new Promise((resolve) => (fulfill = resolve));
  checkFrame();
  return promise;

  function checkFrame() {
    const frame = page.frames().find((f) => {
      console.log(f.name());
      return f.name() === 'datart';
    });
    if (frame) fulfill(frame);
    else page.once('frameattached', checkFrame);
  }
}

async function autoScroll(page: Page, selector: string) {
  return page.evaluate((selector) => {
    return new Promise((resolve) => {
      //滚动的总高度
      let totalHeight = 0;
      //每次向下滚动的高度 100 px
      const distance = 100;
      const timer = setInterval(() => {
        const dom = document.querySelector(selector);
        if (!dom) {
          return clearInterval(timer);
        }
        //页面的高度 包含滚动高度
        const scrollHeight = dom.scrollHeight;
        console.log(scrollHeight);
        //滚动条向下滚动 distance
        dom.scrollBy(0, distance);
        totalHeight += distance;
        //当滚动的总高度 大于 页面高度 说明滚到底了。也就是说到滚动条滚到底时，以上还会继续累加，直到超过页面高度
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve(true);
        }
      }, 100);
    });
  }, selector);
}

async function getValueFromElementDataset(
  page: Page,
  selector: string,
  key: string,
  checkValue: (value: string) => Promise<boolean>,
) {
  return new Promise((resolve) => {
    const interval = setInterval(async () => {
      const value = await page.$eval(
        selector,
        (el: HTMLElement, key: string) => {
          return el.dataset[key];
        },
        key,
      );
      if (checkValue && !(await checkValue(value))) {
        return;
      } else {
        clearInterval(interval);
        resolve(value);
      }
    }, 500);
  });
}

async function clipImage(
  pageWidth: number,
  pageHeight: number,
  buffer: Buffer,
): Promise<{ images: Buffer[]; scale: number }> {
  const imageOriginSharp = sharp(buffer);
  const imageSharp = sharp(buffer).resize(pageWidth);

  const imageBuffer = await imageSharp
    .withMetadata()
    .toBuffer({ resolveWithObject: true });

  const imageOriginBuffer = await imageOriginSharp
    .withMetadata()
    .toBuffer({ resolveWithObject: true });

  // const imageWidth = imageBuffer.info.width;
  const imageHeight = imageBuffer.info.height;
  const imageOriginWidth = imageOriginBuffer.info.width;
  const imageOriginHeight = imageOriginBuffer.info.height;

  const scale = imageOriginHeight / imageHeight;

  const images: Buffer[] = [];
  console.log({
    imageOriginWidth,
    imageOriginHeight,
    scale,
  });

  let startY = 0;
  while (startY < imageHeight) {
    const height = Math.min(pageHeight, imageHeight - startY);
    console.log(Math.ceil(height * scale), Math.ceil(startY * scale));
    const imageOriginSharp = sharp(buffer);
    const liteImage = await imageOriginSharp
      .extract({
        width: imageOriginWidth,
        height: Math.ceil(height * scale),
        left: 0,
        top: Math.floor(startY * scale),
      })
      .toBuffer();

    images.push(liteImage);
    startY += height;
  }

  return { images, scale };
}

const generatePdf = (imageBuffer: Buffer) => {
  return new Promise<Buffer>(async (resolve) => {
    const doc = new pdfkit();

    // 获取PDF页面的宽度和高度
    const pageWidth = doc.page.width;
    const pageHeight = doc.page.height;

    // 图片按 page 高度等比放缩然后裁切为多份，然后通过 doc addpage 以及 doc.image 把每张图放入 pdf
    const { images, scale } = await clipImage(
      pageWidth,
      pageHeight,
      imageBuffer,
    );
    let index = 1;

    for (const image of images) {
      doc.image(image, 0, 0, {
        width: pageWidth,
        scale: 1 / scale,
      });
      if (index < images.length) {
        doc.addPage();
        doc.switchToPage(index);
        index += 1;
      }
    }

    doc.end();
    const pdfBuffer = await getStream.buffer(doc);
    resolve(pdfBuffer);
  });
};

const sleep = (time: number) => new Promise((r) => setTimeout(r, time));

export {
  sleep,
  clipImage,
  autoScroll,
  generatePdf,
  waitForFrame,
  getValueFromElementDataset,
};

原文地址：https://blog.csdn.net/qq_28827635/article/details/140262865

免责声明：本站文章内容转载自网络资源，如本站内容侵犯了原著者的合法权益，可联系本站删除。更多内容请关注自学内容网（zxcms.com）！

上一篇：深入十余家老年鞋品牌，我们发现了193亿市场的最新趋势
下一篇：STM32+ESP8266连接阿里云

基于RK3588的移动充电机器人应用解决方案
伴随着国内新能源汽车保有量的持续增长，充电难题愈发凸显，配套充电设施的建设需求进一步扩大。可外接4G/5G、WiFi模块，用于传感数据、视频数据的上传，同时，通过网络的连接，能够远程监控机器人的状态和
阅读更多2024-11-16
开源，一天200star，解锁视频字幕生成新方式——一款轻量级开源字幕工具，免费，支持花字，剪映最新会员模式吃相太难看了
是一款基于开源的轻量级视频字幕生成工具，由开发者 @WEIFENG2333 精心打造。其主要目的是为用户提供一套免费、易用的字幕生成方案，支持各种视频格式，并结合现代化的自然语言处理技术，实现精准、高
阅读更多2024-11-16
docker：基于Dockerfile镜像制作完整案例
docker：基于Dockerfile镜像制作完整案例
阅读更多2024-11-16
【数据库系列】 Spring Boot 集成 Neo4j 的详细介绍
Spring Boot 提供了对 Neo4j 的良好支持，使得开发者可以更方便地使用图数据库。通过使用 Spring Data Neo4j，开发者可以轻松地进行数据访问、操作以及管理。本文将详细介绍如
阅读更多2024-11-16
微信小程序02-页面制作
微信小程序提供了丰富的组件，如view、image等，用于构建具有微信风格的UI界面。视口单位用于移动端页面适配，如vw和vh。表单组件用于收集用户信息，如form、button、input等。
阅读更多2024-11-16
2024 CCF中国开源大会“开源科学计算与系统建模openSCS”分论坛成功举办
目前MWORKS用户已遍及国内外295所高校，仅深圳就有41家应用验证企业，覆盖新能源、芯片、车辆、低空经济、医疗、制造、自动化、电力、航天、金融、船舶等优势行业单位，已经具备良好的生态基础。该委员会
阅读更多2024-11-16
C&C++内存管理
pChar3是指针变量，存在栈中，*pChar3是由const修饰的，pChar3指向的内容不能修改，内容具有常属性，因此pChae3存在代码段中，也就是常量区。内存泄漏并不是指内存在物理上的消失，而
阅读更多2024-11-16
单片机面试
答案：宏定义是在编译预处理阶段被处理的。预处理包含：头文件包含，宏替换，条件编译，去除注释，添加行号。
阅读更多2024-11-16
Ubuntu23.10下解决C语言调用mysql.h问题
在学习C语言和MySQL的调用的时候遇到包和版本的问题，由于使用的书很老（10年的），因此很多MySQL的包已经过时，在查找很多资料和询问gpt之后得到了解决方案
阅读更多2024-11-16
高级java每日一道面试题-2024年11月07日-Redis篇-Redis有哪些功能?
Redis 是一个功能强大且灵活的 NoSQL 数据库，广泛应用于缓存、消息队列、实时分析等多种场景。在高级 Java 面试中，了解 Redis 的数据类型、持久化、发布/订阅、事务、Lua 脚本、主
阅读更多2024-11-16

Puppeteer 生成图片 生成 PDF

方案背景

生成 PDF的额外说明

代码

相关文章

Puppeteer 生成图片生成 PDF