PDF → PowerPoint — Tables on slides
PowerPoint tables are first-class shapes: rows, columns, cells, each cell its own text box. They can be styled by the deck’s theme (header row, banded rows), edited like any other shape, and restyled in one operation.
Table detection itself is geometric: find the horizontal and vertical line segments on the page, build a grid from their intersections, and read each cell's content from the text bounded by them. What differs on a slide is the context. A slide imposes hard size limits a document does not, and the table is shown rather than read.
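As a rough illustration of the geometric step, the sketch below builds a grid from axis-aligned segments and maps a text position to a cell. The input format (tuples of page coordinates), the function names, and the 1.5-point tolerance are assumptions made for this example, not the behavior of any particular converter.

```python
from bisect import bisect_right

def snap(values, tol=1.5):
    """Collapse coordinates closer than `tol` into one representative value."""
    out = []
    for v in sorted(values):
        if not out or v - out[-1] > tol:
            out.append(v)
    return out

def grid_from_segments(h_segs, v_segs, tol=1.5):
    """h_segs: (y, x_start, x_end); v_segs: (x, y_start, y_end).

    Returns (xs, ys): sorted column and row boundaries taken from the
    points where a horizontal and a vertical segment cross.
    """
    xs, ys = set(), set()
    for y, x0, x1 in h_segs:
        for x, y0, y1 in v_segs:
            # Two axis-aligned segments cross when each one's fixed
            # coordinate lies within the other's extent.
            if x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol:
                xs.add(x)
                ys.add(y)
    return snap(xs, tol), snap(ys, tol)

def cell_index(px, py, xs, ys):
    """Return (row, col) of the cell containing point (px, py), or None."""
    col = bisect_right(xs, px) - 1
    row = bisect_right(ys, py) - 1
    if 0 <= row < len(ys) - 1 and 0 <= col < len(xs) - 1:
        return (row, col)
    return None

# A 2 x 2 table: three horizontal and three vertical rules.
h_segs = [(0, 0, 100), (20, 0, 100), (40, 0, 100)]
v_segs = [(0, 0, 40), (50, 0, 40), (100, 0, 40)]
xs, ys = grid_from_segments(h_segs, v_segs)
```

Each word or text run would then be assigned to a cell by passing its center point to `cell_index`; text that falls outside the grid returns `None` and stays outside the table.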
What changes when a table moves to a slide
Size
A Word table can occupy an entire page or run across several. A PowerPoint table has to fit in 13.333 × 7.5 inches (16:9) or 10 × 7.5 (4:3), and stay readable at projection distance. Large tables fail both constraints.
After conversion, an oversized PDF table fills the slide in a tiny font. Shrinking the font only goes so far: past a certain point the text is unreadable from any viewing distance. Splitting the table across slides is a separate problem (covered below).
How the user reads it
A document table sits in prose; the reader scrolls. A slide table is shown, from a projector, to an audience, for a few seconds. The constraints flip: large fonts, few rows, few columns. The converter has no license to redesign; it can only pass through what was in the PDF.
Styling
Word tables tend to be plain. PowerPoint tables are usually styled to match the deck: colored header, banded rows, highlighted totals. Most converters carry over fill colors and borders but stop there. The result is not promoted to a theme-aware table, so styling stays static when the user changes themes.
The conversion path
Per page:
- Run the standard line-detection / intersection / grid algorithm.
- For each table found, create a table-type shape with the matching row and column counts, fill the cells with text, and apply formatting.
- If the table is too large for one slide, decide how to split.
Strategies for oversized tables
Font scaling
Reduce the font in every cell until the table fits. Trivial to implement. Often produces text that is unreadable from a projector. A reasonable starting point only when the source was already close to fitting.
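A minimal sketch of font scaling with a readability floor. It assumes the table's width and height shrink roughly in proportion to the font size, which is only an approximation because text wrapping changes as the font changes. The margin and the 12 pt minimum are illustrative values, not recommendations from any specification.

```python
SLIDE_W_IN, SLIDE_H_IN = 13.333, 7.5  # 16:9 slide size in inches

def scaled_font_pt(table_w_in, table_h_in, font_pt,
                   margin_in=0.5, min_pt=12.0):
    """Return the font size at which the table fits the slide.

    Returns None when the required size falls below `min_pt`, which
    signals that another strategy (splitting, rasterizing) is needed.
    """
    avail_w = SLIDE_W_IN - 2 * margin_in
    avail_h = SLIDE_H_IN - 2 * margin_in
    # Never scale up; only shrink when the table exceeds the slide.
    scale = min(1.0, avail_w / table_w_in, avail_h / table_h_in)
    new_pt = font_pt * scale
    return new_pt if new_pt >= min_pt else None
```

Returning `None` instead of an ever-smaller size keeps the caller from producing text that cannot be read from a projector.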
Row split
Break the table into chunks by rows:
- Slide 1: rows 1–10.
- Slide 2: rows 11–20.
- Slide 3: rows 21–30.
Each chunk duplicates the header row. The strategy works for long lists. It requires recognizing which row is the header (a row whose styling differs from the rest).
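The row split can be sketched as follows, assuming the table is already a list of rows and the header has been identified as the first row. The function name and parameters are hypothetical.

```python
def split_rows(rows, body_rows_per_slide, has_header=True):
    """Split a table into per-slide chunks, repeating the header row."""
    if body_rows_per_slide < 1:
        raise ValueError("body_rows_per_slide must be at least 1")
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    chunks = [
        header + body[i:i + body_rows_per_slide]
        for i in range(0, len(body), body_rows_per_slide)
    ]
    # A header-only table still yields one chunk.
    return chunks or ([header] if header else [])

# One header row plus 25 body rows, 10 body rows per slide.
table = [["Name"]] + [[f"row {i}"] for i in range(1, 26)]
slides = split_rows(table, 10)
```

With 25 body rows this yields three chunks; the last one is shorter, and every chunk starts with the same header row.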
Column split
Same idea, on the other axis:
- Slide 1: columns 1–5.
- Slide 2: columns 6–10.
Each chunk duplicates the first column when it carries row labels.
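The column split follows the same pattern on the other axis. This sketch assumes a rectangular table (every row has the same number of cells) and a caller who already knows whether the first column carries row labels; the names are hypothetical.

```python
def split_columns(rows, data_cols_per_slide, has_label_column=True):
    """Split a table by columns, repeating the label column in each chunk."""
    if data_cols_per_slide < 1:
        raise ValueError("data_cols_per_slide must be at least 1")
    if not rows:
        return []
    n_cols = len(rows[0])
    first = 1 if has_label_column else 0
    chunks = []
    for start in range(first, n_cols, data_cols_per_slide):
        stop = min(start + data_cols_per_slide, n_cols)
        # Label column (if any) followed by this slide's data columns.
        chunks.append([row[:first] + row[start:stop] for row in rows])
    return chunks

table = [
    ["Region", "Q1", "Q2", "Q3", "Q4", "Total"],
    ["North", "1", "2", "3", "4", "10"],
]
slides = split_columns(table, 2)
```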
Rasterize
Render the table as PNG and insert it as a picture shape. Visual fidelity is preserved; editability is gone. For very large tables, this is often the only option that produces a readable slide.
Most converters skip splitting and either scale the font or let the table overflow the slide.
Merged cells
PowerPoint supports horizontal and vertical merging. The converter detects merges by finding cells where the expected interior border line is missing; everything spanned across that gap becomes one merged cell.
Horizontal merge:
<a:tc gridSpan="2">
  <a:txBody>...</a:txBody>
</a:tc>
Vertical merge spans multiple rows. The starting cell declares rowSpan and omits vMerge; the continuation cell in each following row declares vMerge="1":
<a:tc rowSpan="2">
  <a:txBody>...</a:txBody>
  <a:tcPr/>
</a:tc>
<a:tc vMerge="1">
  <a:txBody/>
  <a:tcPr/>
</a:tc>
When merges are missed, what should be one cell becomes a row of empty cells with the content scattered across them. The error compounds visually because every adjacent merge fails the same way.
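A sketch of horizontal merge detection for a single row, assuming the detector has already recorded, for each interior column boundary, whether a vertical line is present in that row. The dictionary output is a hypothetical intermediate form: a `gridSpan` entry marks the starting cell, and `hMerge` marks the continuation cells that DrawingML expects after it.

```python
def horizontal_spans(borders_present):
    """borders_present[c] is True when a line separates column c from c + 1.

    Returns a list of (start_column, span) pairs covering the row.
    """
    n_cols = len(borders_present) + 1
    spans, start = [], 0
    for c in range(n_cols - 1):
        if borders_present[c]:
            spans.append((start, c - start + 1))
            start = c + 1
    spans.append((start, n_cols - start))
    return spans

def merge_row(texts, borders_present):
    """Fold the text of spanned columns into one starting cell per span."""
    cells = []
    for start, span in horizontal_spans(borders_present):
        merged = " ".join(t for t in texts[start:start + span] if t)
        lead = {"text": merged}
        if span > 1:
            lead["gridSpan"] = span
        cells.append(lead)
        # Continuation cells stay in the row but carry no content.
        cells.extend({"text": "", "hMerge": True} for _ in range(span - 1))
    return cells

# Four columns; the line between columns 1 and 2 is missing.
row = merge_row(["A", "Total", "", "B"], [True, False, True])
```

Vertical merges can be found the same way by scanning each column for missing horizontal lines.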
Header style and theme binding
The right way to mark a header in a PPTX table is:
<a:tblPr firstRow="1" bandRow="1">
  <a:tableStyleId>{5940675A-B579-460E-94D1-54222C63F5DA}</a:tableStyleId>
</a:tblPr>
firstRow="1" enables header styling. bandRow="1" enables banded rows. Header and band colors live in tableStyles.xml and are referenced by tableStyleId. Theme changes that include a new table style flow through automatically.
The right behavior: detect the header row by its differing style, set firstRow and bandRow, and bind to a tableStyleId. Most converters skip this and copy the header’s fills and fonts as static formatting. The table looks identical at first; switching themes changes nothing.
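One way to sketch this behavior with Python's standard library. The header heuristic (the first row's style differs from every other row) and the representation of a row's style as a tuple are assumptions for illustration; the style ID must refer to a style that actually exists in the deck's tableStyles.xml.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A)

def looks_like_header(row_styles):
    """True when the first row's style differs from every other row's."""
    return len(row_styles) > 1 and all(
        style != row_styles[0] for style in row_styles[1:]
    )

def build_tbl_pr(row_styles, style_id, banded=False):
    """Build <a:tblPr> with flags and a style reference, not static fills."""
    tbl_pr = ET.Element(f"{{{A}}}tblPr")
    if looks_like_header(row_styles):
        tbl_pr.set("firstRow", "1")
    if banded:
        tbl_pr.set("bandRow", "1")
    ET.SubElement(tbl_pr, f"{{{A}}}tableStyleId").text = style_id
    return tbl_pr

# Hypothetical per-row styles: (fill color, bold flag).
styles = [("blue", True), ("white", False), ("white", False)]
tbl_pr = build_tbl_pr(
    styles, "{5940675A-B579-460E-94D1-54222C63F5DA}", banded=True
)
xml_text = ET.tostring(tbl_pr, encoding="unicode")
```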
When the table isn’t recognized at all
Borderless tables, partially ruled tables, and tables with rendering artifacts defeat line detection. The contents don’t disappear; they come through as independent text boxes, positioned exactly where they sat in the PDF.
The slide looks like a table at a glance. It is not one. Cells aren’t editable as cells, changes don’t reflow rows, and importing the slide into Excel produces unstructured text. The defense, when you control the source, is to give every table explicit dividing lines.
XML structure
PPT tables use the a: namespace where Word uses w:, but the structure is parallel:
- <a:tbl>: the table element.
- <a:tblGrid>: column-width definitions.
- <a:tr>: rows.
- <a:tc>: cells, each containing <a:txBody> with paragraphs and runs.
- <a:tableStyleId>: reference to a theme style.
The tableStyleId reference makes a table theme-aware. Without it, the table is just a grid of formatted text.
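To make the element hierarchy concrete, the sketch below assembles a bare table with the standard library. It is not a complete slide: the surrounding graphic frame is omitted, the default row height is an arbitrary choice, and the function name is hypothetical. Widths and heights are converted to EMUs (914,400 per inch), the unit these attributes use.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A)
EMU_PER_INCH = 914400

def q(tag):
    """Qualify a tag name with the DrawingML namespace."""
    return f"{{{A}}}{tag}"

def build_table(rows, col_widths_in, row_height_in=0.4):
    """Build <a:tbl> with a grid, rows, and one text run per cell."""
    tbl = ET.Element(q("tbl"))
    ET.SubElement(tbl, q("tblPr"))
    grid = ET.SubElement(tbl, q("tblGrid"))
    for width in col_widths_in:
        ET.SubElement(grid, q("gridCol"), w=str(round(width * EMU_PER_INCH)))
    for row in rows:
        tr = ET.SubElement(tbl, q("tr"), h=str(round(row_height_in * EMU_PER_INCH)))
        for text in row:
            tc = ET.SubElement(tr, q("tc"))
            body = ET.SubElement(tc, q("txBody"))
            ET.SubElement(body, q("bodyPr"))
            ET.SubElement(body, q("lstStyle"))
            run = ET.SubElement(ET.SubElement(body, q("p")), q("r"))
            ET.SubElement(run, q("t")).text = text
            ET.SubElement(tc, q("tcPr"))
    return tbl

tbl = build_table([["Region", "Sales"], ["North", "10"]], [2.0, 1.5])
```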
Where tables fail
Four points of fragility:
- Borderless tables go undetected and become loose text.
- Large tables become unreadable on a slide.
- Partially ruled tables and rendering artifacts defeat detection the same way, leaving a collection of text boxes.
- Theme styling is not inherited, even when detection succeeds.
If preserving the table accurately matters, keep the original .pptx or the source data (Excel, CSV) and work from that, not from a PDF.