Why I built it
Neurosurgery has a long tradition of mentorship — techniques, judgment, and instinct passed down through generations of training. That’s one of the best things about the field. But it also means some of what we practice reflects preference as much as proof. And honestly, telling the two apart is harder than it should be. The literature is huge, the evidence is often thin, and the distance between “proven” and “preferred” is wider than we like to admit.
I never wanted to just accept something because that’s what I was told. I wanted to understand how we got from there to here: what the data actually says, and where it runs out. Not because the people who taught me were wrong, but because I think the best of them would want someone who asks why.
The problem is that staying honest about evidence means staying current, and staying current in neurosurgery is its own full-time job. Tens of thousands of papers a year, dozens of journals, six subspecialties deep. You mean to keep up, and then a month goes by and you haven’t read anything outside what came up on rounds.
So I built the thing I couldn’t find. A weekly briefing that tells me what actually mattered this week, with evidence grading I can trust, across all the domains I should be tracking. Not a table of contents. Not AI-generated slop. Something with real methodology behind it — transparent and conservative enough that it would rather miss a paper than overstate one.
The whole pipeline is open source. You can see how articles are searched, scored, and summarized — and if you think something should work differently, I want to hear it. Feedback is the thing that turns a side project into something genuinely useful, and every suggestion makes the methodology sharper.
No paywall. No login. No ads. Just the research that matters, every Friday.
— Mike Longo
How it works
Every Friday, an automated pipeline searches, filters, scores, summarizes, and publishes a new digest. Here’s exactly what happens, step by step.
1. Source search
The pipeline queries PubMed using domain-specific searches built from MeSH descriptors, curated keywords, and journal filters — each of the six domains has its own set. Target journals include Journal of Neurosurgery, Neurosurgery, Spine, JAMA Neurology, Neuro-Oncology, Epilepsia, and others, depending on the domain. Queries run through a three-tier fallback: precise first, then broader, then widest — and if the initial 30-day window comes up empty, the search expands to 90 days, then 180. Separate searches pull preprints from bioRxiv and medRxiv, recruiting trials from ClinicalTrials.gov, and policy and conference updates from curated feeds.
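As a rough sketch, here is how that tiered fallback could be wired up against NCBI's real E-utilities `esearch` endpoint (the `reldate` and `datetype` parameters are real; the tier queries, field names, and `tiered_search` function are illustrative, not the pipeline's actual configuration):

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical query tiers for one domain: precise, broader, widest.
# The real MeSH terms and journal filters live in the pipeline's own config.
TIERS = [
    '("Neurosurgical Procedures"[MeSH]) AND ("Journal of neurosurgery"[Journal])',
    '"Neurosurgical Procedures"[MeSH]',
    'neurosurgery[Title/Abstract]',
]

def tiered_search(tiers, windows=(30, 90, 180)):
    """Try every tier within a window; widen the window only if all come up empty."""
    for days in windows:
        for term in tiers:
            params = {
                "db": "pubmed",
                "term": term,
                "reldate": days,     # only articles from the last N days
                "datetype": "pdat",  # filter on publication date
                "retmax": 100,
                "retmode": "json",
            }
            resp = requests.get(EUTILS, params=params, timeout=30)
            ids = resp.json()["esearchresult"]["idlist"]
            if ids:
                return ids
    return []
```

Putting the date window in the outer loop means every tier gets a chance at recent literature before the search reaches further back in time.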
2. Filtering
Raw results pass through eight sequential gates. Articles already featured in previous digests are excluded. Editorials, commentaries, and letters are removed. Abstracts too short to be real are rejected. A relevance score is calculated from neurosurgical procedure terms, decision-making language, and anatomy — articles heavy on pharmacology with no surgical context get penalized. Near-duplicate titles are caught and removed. Hundreds of candidates go in. Typically 8–15 come out.
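A minimal sketch of what a few of those gates might look like in Python. The field names, term lists, and thresholds here are assumptions for illustration; the real gates are defined in the open-source pipeline:

```python
import re

SURGICAL = re.compile(
    r"craniotomy|laminectomy|resection|decompression|fusion|shunt|clipping",
    re.IGNORECASE,
)
EXCLUDED_TYPES = {"Editorial", "Comment", "Letter"}

def passes_gates(article, featured_pmids, min_abstract_words=100):
    """A subset of the gates, applied in order; each can reject outright."""
    if article["pmid"] in featured_pmids:               # already featured
        return False
    if EXCLUDED_TYPES & set(article["publication_types"]):
        return False
    abstract = article.get("abstract", "")
    if len(abstract.split()) < min_abstract_words:      # too short to be real
        return False
    text = article.get("title", "") + " " + abstract
    relevance = len(SURGICAL.findall(text))
    if "pharmacokinetic" in text.lower():               # drug paper, no surgical context
        relevance -= 2
    return relevance >= 2

def dedupe_titles(articles):
    """Drop near-duplicate titles by normalized prefix (a crude stand-in)."""
    seen, kept = set(), []
    for a in articles:
        key = re.sub(r"\W+", "", a["title"].lower())[:60]
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept
```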
3. Scoring and ranking
Surviving candidates are ranked on a point-based rubric: journal tier, study type (RCTs and meta-analyses outrank case reports), recency, abstract quality, sample size extracted from the text, and markers of human-subjects research. Citation data from Semantic Scholar is folded in when available, but it's never a gating factor. The top-scoring article per domain is selected, with two backups held in reserve.
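In spirit, the rubric is additive points with citations as a bonus rather than a gate. A toy version, with made-up weights and field names:

```python
STUDY_POINTS = {
    "Meta-Analysis": 10,
    "Randomized Controlled Trial": 9,
    "Multicenter Study": 6,
    "Observational Study": 4,
    "Case Reports": 1,
}
TOP_JOURNALS = {"Journal of Neurosurgery", "Neurosurgery", "JAMA Neurology"}

def score(article):
    pts = 5 if article["journal"] in TOP_JOURNALS else 2           # journal tier
    pts += max((STUDY_POINTS.get(t, 0)                             # study type
                for t in article["publication_types"]), default=0)
    pts += max(0, 5 - article["days_old"] // 7)                    # recency decays weekly
    if article.get("sample_size", 0) >= 100:                       # n extracted from text
        pts += 3
    pts += min(article.get("citations", 0), 10) * 0.3              # bonus, never gating
    return pts

def select(candidates, n_backups=2):
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[0], ranked[1:1 + n_backups]    # winner plus two backups in reserve
```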
4. Evidence grading
Every article gets a CEBM evidence level — and this part is entirely deterministic. No LLM involved. The grade is derived from PubMed publication types and design markers detected in the text: randomized, double-blind, prospective, retrospective, multicenter. Level 1 for systematic reviews and high-quality RCTs, down through cohort studies, case series, and expert opinion at Level 5. It's the same grade a human reviewer would assign, just assigned consistently every time.
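Because the grading is deterministic, it reduces to a cascade of checks on publication types and design keywords. A simplified stand-in (the real decision table is finer-grained; the specific cutoffs here are assumptions):

```python
def cebm_level(pub_types: set[str], text: str) -> int:
    """Approximate CEBM grade from publication types and design markers."""
    t = text.lower()
    if {"Meta-Analysis", "Systematic Review"} & pub_types:
        return 1
    if "Randomized Controlled Trial" in pub_types:
        # High-quality RCTs (blinded, multicenter) grade at the top.
        return 1 if ("double-blind" in t and "multicenter" in t) else 2
    if "prospective" in t:
        return 2    # prospective cohort
    if "retrospective" in t or "Observational Study" in pub_types:
        return 3    # retrospective cohort / case-control
    if "Case Reports" in pub_types or "case series" in t:
        return 4
    return 5        # expert opinion or unclassifiable
```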
5. FactSheet extraction
Before any language model touches the article, a pattern-matching step pulls every verifiable number from the text — sample sizes, p-values, confidence intervals, effect sizes, percentages, follow-up durations, funding sources, trial registration IDs. This FactSheet becomes the ground truth. Everything downstream is measured against it.
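This step is plain regular expressions, so it's easy to sketch. The patterns below are illustrative approximations of the kinds of matches involved, not the pipeline's actual extraction rules:

```python
import re

FACT_PATTERNS = {
    "sample_size": re.compile(r"\b[nN]\s*=\s*\d[\d,]*"),
    "p_value":     re.compile(r"\bp\s*[<=>]\s*0?\.\d+", re.IGNORECASE),
    "ci_95":       re.compile(r"95%\s*CI[:,]?\s*-?[\d.]+\s*(?:to|-)\s*-?[\d.]+"),
    "percentage":  re.compile(r"\b\d{1,3}(?:\.\d+)?%"),
    "follow_up":   re.compile(r"\b\d+(?:\.\d+)?\s*(?:week|month|year)s?\b[^.]{0,20}follow-?up",
                              re.IGNORECASE),
    "trial_id":    re.compile(r"\bNCT\d{8}\b"),
}

def build_factsheet(abstract: str) -> dict[str, list[str]]:
    """Every verifiable number, keyed by kind; downstream text is checked against this."""
    return {name: pat.findall(abstract) for name, pat in FACT_PATTERNS.items()}

# Example:
# build_factsheet("In this RCT (n=212), fusion improved ODI by 14.2% (p<0.01).")
# -> {'sample_size': ['n=212'], 'p_value': ['p<0.01'], 'percentage': ['14.2%'], ...}
```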
6. Summarization
A large language model generates a structured summary for each article, but it operates under tight constraints. Language is calibrated to evidence level — Level 1–2 gets direct language like "shows" and "demonstrates," Level 3 gets "suggests" and "indicates," Level 4–5 gets "may" and "preliminary evidence." Words like "breakthrough" and "paradigm shift" are banned. Every statistic must trace back to the FactSheet. Each card follows a fixed structure: clinical bottom line, why it matters, study design, population, key findings, limitations, and a teaching pearl that has to be specific to the paper — not generic advice about the subspecialty.
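One way to picture the calibration constraint is as a lookup from evidence level to permitted verbs that gets injected into the summarization prompt. A hypothetical sketch, using only the verbs and banned phrases named above:

```python
CALIBRATED_LANGUAGE = {
    1: ("shows", "demonstrates"),
    2: ("shows", "demonstrates"),
    3: ("suggests", "indicates"),
    4: ("may", "preliminary evidence"),
    5: ("may", "preliminary evidence"),
}
BANNED_PHRASES = ("breakthrough", "paradigm shift")

def language_rules(evidence_level: int) -> str:
    """Build the per-card prompt constraint (hypothetical prompt wiring)."""
    strong, soft = CALIBRATED_LANGUAGE[evidence_level]
    return (
        f"Describe findings using '{strong}' or '{soft}' only. "
        f"Never use: {', '.join(BANNED_PHRASES)}. "
        "Every statistic must appear verbatim in the FactSheet."
    )
```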
7. Editor review
After summarization, every card goes through a second LLM pass that checks three things. First, claim accuracy — every number is cross-referenced against the FactSheet and source text, and anything unverifiable gets removed. Second, language calibration — if a retrospective case series says "demonstrates," the editor catches it and downgrades to "suggests." Third, writing quality — redundancy between fields is eliminated, wordy sentences are tightened, and generic teaching pearls are flagged. If the editor fails on a card, the original summary is kept. Nothing is lost to an editing error.
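The fail-safe at the end is the important design choice: the edited card replaces the original only if the pass succeeds and still verifies. A sketch, assuming a FactSheet of extracted strings and an `editor_llm` callable standing in for the second model pass:

```python
import re

def numbers_in(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:[.,]\d+)*%?", text))

def verified(card_text: str, factsheet: dict[str, list[str]]) -> bool:
    """Every number in the card must trace back to a FactSheet entry."""
    known = set()
    for matches in factsheet.values():
        for m in matches:
            known |= numbers_in(m)
    return numbers_in(card_text) <= known

def edit_card(card: str, factsheet: dict, editor_llm) -> str:
    """Second-pass editor; on any failure, the original summary is kept."""
    try:
        edited = editor_llm(card)       # hypothetical stand-in for the LLM call
        if verified(edited, factsheet):
            return edited
    except Exception:
        pass
    return card    # fail safe: nothing is lost to an editing error
```

A card with no numbers passes trivially, which is the right default: the gate exists to catch invented statistics, not to demand them.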
8. Card scoring and replacement
A separate LLM pass scores each finished card on a 1–5 scale for clinical relevance and specificity. Cards below a 3 are automatically swapped out for the next-ranked backup article from that domain and re-summarized from scratch. If all backups are exhausted, a template fallback ensures the domain is still represented.
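The replacement loop is simple because the backups were already ranked in step 3. A sketch with hypothetical callables for the summarizer, the scoring pass, and the template fallback:

```python
def finalize_domain(ranked, summarize, score_card, template_fallback):
    """Walk the ranked list (top pick plus backups) until a card scores high enough."""
    for article in ranked:                  # primary first, then the backups
        card = summarize(article)           # full re-summarization from scratch
        if score_card(card) >= 3:           # 1-5 LLM rubric; below 3 is rejected
            return card
    return template_fallback(ranked[0])     # domain is still represented
```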
9. Quality gate
Before anything publishes, every card passes a final check. The clinical bottom line has to be substantive. At-a-glance statistics have to include real data. Source URLs have to resolve. Practice-change claims have to use appropriately conservative language. Cards that fail are flagged — the digest still ships, but the issues are logged, not hidden.
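A sketch of what such a gate might check, with assumed card fields (`bottom_line`, `at_a_glance`, `source_url`); note that failures are collected and logged rather than blocking publication:

```python
import requests

def quality_gate(card: dict) -> list[str]:
    """Final pre-publish check; returns issues to log, never blocks the digest."""
    issues = []
    if len(card.get("bottom_line", "").split()) < 10:
        issues.append("clinical bottom line is not substantive")
    if not any(ch.isdigit() for ch in card.get("at_a_glance", "")):
        issues.append("at-a-glance statistics lack real data")
    try:
        status = requests.head(card["source_url"], timeout=10,
                               allow_redirects=True).status_code
        if status >= 400:
            issues.append(f"source URL returned {status}")
    except requests.RequestException:
        issues.append("source URL did not resolve")
    # Crude stand-in for the conservative-language check on practice-change claims.
    if "practice-changing" in card.get("bottom_line", "").lower():
        issues.append("practice-change claim stated without hedging")
    return issues
```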
This digest is for educational purposes only and does not constitute medical advice. Always consult the original publications and clinical guidelines for patient care decisions.