A weekly series from TaxGPT.ai
Question of the Week
Four AI models. One real tax question. The same retrieved IRS sources. Open evaluation by tax professionals — what each model got right, what it missed, and which one you would actually trust.
§ 1Issues
Each issue takes a real question from the TaxGPT.ai chat, runs all four models against the same retrieved IRS sources, and shows what they said.
About this series
Question of the Week is a transparency project. Tax software products generally show you one AI answer and ask you to trust it. We show you four — what GPT-5.4, Claude Sonnet 4.6, Claude Opus 4.7, and Gemini 3.1 Pro each say when given the same real user question and the same retrieved IRS sources — and we invite tax professionals to evaluate the outputs.
Each issue includes:
- A real user question from a TaxGPT.ai chat session
- The retrieved IRS sources (Code, Treasury Regulations, Publications, Rev. Procs.)
- Each model’s full answer with token counts, latency, and citation tracking
- A source attribution matrix showing which model cited which authority
- Editorial commentary on where the models diverged and why
- A "where the analysis is uncertain" section inviting professional critique
The goal is not to declare a winner. The goal is to help tax professionals understand how different AI models handle the same question, where current tax-AI products have gaps, and where human judgment still matters.
Subscribe to QotW
New issues published weekly. Get notified when each issue drops or when reader discussion updates a previous issue.
Have a tax question of your own?
Ask TaxGPT directly and see what the production model says, with full IRS source citations.
Ask TaxGPT →Frequently asked questions
Are these real tax questions?
Yes. Each issue starts with the actual user message from a real TaxGPT.ai chat session. When the original chat ended at the paywall, we re-run the conversation from scratch using a user simulator so all four models get a complete conversation. This is disclosed in each issue’s methodology footnote.
Are these meant as professional tax advice?
No. Question of the Week is a transparency exercise comparing how AI models handle tax questions. It is not professional tax advice. For specific tax situations, consult a CPA, EA, or tax attorney.
How are the models compared?
All four models receive identical conversation history, the same retrieved IRS source chunks from our Pinecone corpus, and the same system prompt used in production TaxGPT.ai chat. Only the underlying language model differs. Each issue captures token count, latency, citations, and a divergence analysis showing where each model differs from the production answer.
Can I contribute to the discussion?
Yes — each issue invites professional critique. Comment on the LinkedIn post that accompanies each issue, or use our contact page. Substantive contributions are credited and added to the issue’s "Updates from professional discussion" section seven days after publication.
What models are tested?
The current test set is OpenAI GPT-5.4 (which powers TaxGPT.ai chat in production), Anthropic Claude Sonnet 4.6, Anthropic Claude Opus 4.7, and Google Gemini 3.1 Pro Preview. The model lineup updates as new frontier models become available; coverage is disclosed per issue.