Develop a repeatable workflow to extract structured evidence tables from ARS (American Radium Society) and similar clinical guideline PDFs, outputting the data to Excel format.
| File | Size | Pages | Description |
|---|---|---|---|
2021_ARS_ReRT_Full_2021_09_1.pdf |
1.3 MB | 64 | Full ARS guideline on head/neck cancer re-irradiation |
ReRT_Evidence_Table.pdf |
121 KB | 7 | Extracted "Supplemental Table 1: Evidence Table" |
The evidence table contains clinical study data with the following columns:
| Column | Data Type | Notes |
|---|---|---|
| Reference | Text | Author, Year + superscript reference number |
| Study Type | Text | RCT, MA, SR, SAT, RMI, RSI |
| Topic/Objective | Text | Multi-line possible |
| Disease | Text | Cancer type/site |
| Arm(s)/Cohort(s) | Text | Treatment descriptions |
| N | Text/Number | Sample size, sometimes "X studies, Y patients" |
| Median FU (Mo.) | Text | Follow-up duration |
| Results | Text | Outcomes - longest field |
| Study Quality | Number | 1-4 scale |
Target: Excel (.xlsx)
Rationale:
| Model | Input | Output | Best For |
|---|---|---|---|
| Haiku 3 | $0.25 | $1.25 | Simple tasks, high volume |
| Sonnet 4.5 | $3.00 | $15.00 | Balanced, good for parsing |
| Opus 4.5 | $5.00 | $25.00 | Most capable |
Based on the sample ARS document (~146,000 characters ≈ 37,000 tokens input):
| Model | Input Cost | Output Cost | Total Per Document |
|---|---|---|---|
| Haiku 3 | $0.01 | $0.006 | ~$0.02 |
| Sonnet 4.5 | $0.11 | $0.08 | ~$0.19 |
| Opus 4.5 | $0.19 | $0.13 | ~$0.32 |
| Volume | Haiku | Sonnet | Opus |
|---|---|---|---|
| 10 documents | $0.20 | $1.90 | $3.20 |
| 100 documents | $2.00 | $19.00 | $32.00 |
| 1,000 documents | $20.00 | $190.00 | $320.00 |
Recommendation: Sonnet 4.5 (~$0.19/document) offers the best balance of capability and cost for complex medical table parsing.
The Claude Max subscription ($100-200/month) does not include API access. The API requires a separate Console account with pay-per-token billing. These are independent systems:
Before building the extraction workflow, collect 50+ sample PDF files to:
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Source PDF │────▶│ Text Extraction │────▶│ AI Parsing │
│ (Evidence │ │ (Python/C#) │ │ (Claude API) │
│ Table) │ │ │ │ │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Validation │◀────│ Excel Output │◀────│ Structured │
│ & Review │ │ (.xlsx) │ │ JSON Data │
└─────────────────┘ └──────────────────┘ └─────────────────┘
Action Required: Collect 50+ sample PDFs, then resume design work.
When samples are ready, Claude can:
Document Created: December 20, 2025
Last Updated: December 20, 2025