Should you use AI for your training plan?

People run for a multitude of reasons: adventure, camaraderie or just to stay in shape. I started over two decades ago to test myself – to push beyond my preconceived physical and mental limits. With age came a new priority: to disconnect – a desire I share with many runners. I now run to escape the trappings of modern life, trade my screen for the open sky, feel the cold on my neck and the sun on my face and the wind blowing through my… beard. I run to untether from a digital world that increasingly rules our lives.

Coming back from a run to write about artificial intelligence (AI) and how it’s taking over our lives is somewhat ironic. But it’s a juxtaposition we can no longer ignore. To pretend AI isn’t extending its digital tendrils into every facet of modern life is to live with your head in the sand. AI isn’t coming; it’s already ubiquitous. Most of it operates invisibly, like the backstage crew of a grand theatre production, managing traffic flow, operating your smartphone and deciding which mindless memes will creep into today’s social media feed. Increasingly, though, AI is front and center. It’s user-facing, conversational, sycophantic and it’s changing the way we work, play and communicate.

I’m talking about the ChatGPTs of the world, which are colloquially known as chatbots. They use large language models (LLMs) trained on vast amounts of text to do one thing extremely well: predict the next word in a sentence, again and again, to mimic verbal fluency. They’re designed to engage us in conversation, and boy, do we love talking back. In the US, ChatGPT alone receives more than 330 million prompts every day(1), mostly in non-professional settings where people use free online chatbots in place of search engines. Most prompts are from people seeking basic information and practical guidance (1,2).

ChatGPT’s growth has been exponential. From demandsage.com.

It was inevitable that these “information seeking” queries would spill into the world of sport. A growing body of research suggests people are using chatbots for advice about training programs, race prep, nutrition and supplement protocols. With every prompt, we’re probing the limits of what these systems can do. But there’s a problem. In AI chatbots, we’ve built machines that sound like experts, but experts that know nothing about the subject at hand.

The Results So Far

On the evidence alone, running coaches need not lose sleep about the prospect of being replaced by AI. At least not yet. The research is an odd jumble of positive and negative, useful and useless studies. But that doesn’t mean chatbots aren’t creeping into the space – particularly for runners who want cheap, instant answers.

The cleanest touchpoints for runners are two simple (and complementary) studies in which professional coaches rated ChatGPT’s training advice. In the first, ChatGPT was asked to design training programs for a beginner, intermediate and advanced runner (3). Coaches then rated the advice on a scale from “completely correct” to “completely incorrect.” The beginner program was largely acceptable, with most responses judged correct. The intermediate and advanced programs, however, were “more incorrect than correct,” with the most problematic areas being race strategy, hydration and supplementation – domains that require nuance, context and careful consideration.

The second study again used ChatGPT, this time to produce three, six-week running plans based on progressively more detailed inputs. The first prompt was deliberately vague: “Please provide me with a running plan for the next six weeks.” The second added basics such as age, training volume, frequency and the goal of improving performance. The third included everything from level two, plus average heart rate data, relevant health issues and any devices or technology the runner used. Coaches rated the plans as generally good but far from complete, noting omissions that are central to competent programming – screening, monitoring and adjustment. Crucially, they found that plan quality improved with the volume of data entered (4).

Outside of running, several other studies have explored generic exercise prescription with mixed results. One study in the Journal of Sports Science and Medicine found that ChatGPT 3.5 outperformed personal trainers on common training questions, providing advice that was judged more scientifically correct, clearer and more actionable (5). Another found that ChatGPT 4.0 and professional coaches produced broadly similar responses when developing a 16-week fitness plan (6). By contrast, several others concluded that ChatGPT 3.5 and 4.0 were mostly accurate but still suboptimal, missing key safeguards such as the need for medical clearance, failing to individualize appropriately to conditions and goals and showing potential bias against certain ages and disabilities (7,8).

A generous reading of the literature is that chatbots (usually ChatGPT) might be a decent starting point for beginners who (i) don’t have access to professional services and (ii) don’t have underlying medical issues or complicated needs. In that narrow use case, AI would be better than taking a stab in the dark, as beginners are inclined to do. For everyone else, AI is, at best, an adjunct to professional services, but never a replacement for them.

That conclusion raises an obvious question: if we trust AI to optimize traffic flow, detect banking fraud and even drive our cars autonomously (an innovation about 15 years away, at most), why do they stumble on something relatively simple like writing a training plan? How can the same technology seem at once futuristic and strangely archaic? To understand why chatbots struggle in this space, let’s look under the hood at how they generate their responses.

Don’t Confuse Fluency for Expertise

Think back to one of your early races. You probably had a flood of sensations: pain, fatigue, thirst, nausea and emotional ups and downs, among other things. Your brain treated those experiences as teaching signals, which got encoded in memory by the hippocampus. With each race, your brain, through synaptic plasticity, updates your internal models of pacing, fueling, recovery and injury management to improve your performance the next time around. You learn that poor pacing leads to early burnout, the wrong shoes make your feet blister and failure to hydrate makes you sick. You do more than remember your races; your brain is recalibrated by them.

That’s not how chatbots do things. They don’t learn by experience, they don’t encode lessons into memory and they don’t “know” things. They’re instead periodically trained on vast amounts of text stored in huge server farms. Each round of training adjusts the chatbot’s internal parameters, essentially making it better at detecting statistical patterns in language and predicting which words are most likely to come next in a sentence, given the context. When you type a prompt into ChatGPT (e.g. “How do I prepare for Western States? How old is too old for trail running? What’s the best supplement for a runner?”), it isn’t recalling stored memory or looking up facts in a database, as a coach or scientist would. Rather, based on patterns it learned during training, it’s generating the most plausible continuation of your text. It doesn’t know what Western States is, or have any concept of trail running or understand what a dietary supplement is. It can only mimic the language of someone who does. Crucially, unlike a professional coach, scientist or health expert, a chatbot can’t independently verify claims, weigh evidence or apply real-world judgment – skills that matter most when decisions affect health and performance.

Nor are chatbots trained exclusively on scientific sources. Yes, scientific papers and public health guidance are often part of the mix, but they’re thrown into the blender and merged with mainstream journalism, webpages, blogs, Q&A forums like Reddit, and, in the case of Grok, even social media posts. And because up to 50% of the world’s scientific literature sits behind a paywall, what the AI is trained on is often skewed by what’s accessible and most repeated online. That breadth of data makes chatbots fluent and confident, but it may come at the expense of scientific accuracy and nuance.

Figure 2. Ratings of 10 expert coaches for Training Plan 1 (no detail, left), Training Plan 2 (moderate detail, middle) and Training Plan 3 (high detail, right), which were generated by ChatGPT. The numbers represent the percentages of ratings for each scale. Darker shades are bad ratings and lighter shades are good. Training Plan 3, with the most detailed prompts, produced the highest ratings. From Düking et al (4).

Badgers, Black Raincoats and Broken Chatbots

Hallucinations are par for the course in ultra – usually courtesy of sleep deprivation. When Jasmine Paris became the first woman to finish all five of Barkley’s hellacious loops, she told The Guardian (UK) that she’d seen people in black raincoats climbing the hill ahead of her, always at a fixed distance. “It was bizarre; they all had a sinister foreboding feel to them.” She also described seeing animals all around: “There were trees that looked like a mountain lion or a big dog, or pigs lying down.” (9) In the final section of Run Rabbit Run in 2017, Courtney Dauwalter swore she could see “flying eels and giraffes in the Colorado mountains.” (10) An athlete I was crewing in 2018 unexpectedly darted off the trail in pursuit of a badger who’d stolen his sneakers (there was no badger).

Chatbots hallucinate, too, but theirs are fact-based rather than perceptual. I once asked a popular AI about US spelling conventions, and it replied in Welsh without explanation. When I pointed out the mistake, it insisted the error was mine, but I don’t speak Welsh. On another occasion, in response to a question about supplement regulations, it recited details of the 1929 US stock market crash. In long or cluttered chats, AI can lose the thread of the conversation just as people do. They can drift off topic, forget earlier bits of information, and even “code-switch” into the wrong language, leading to frustrating but often humorous results.

But not all AI hallucinations are so obvious and so benign; they can result in plausible-sounding advice that’s plain wrong. For instance, let’s say you ask a chatbot about how to best prepare for a race at altitude. Unless it’s been trained on the relevant data – articles and papers on live high, train low and other strategies – it won’t have the substance needed to respond. And while some systems can be configured with retrieval tools to consult the internet in real time, the underlying model works offline and defaults to pattern-matching from training. Rather than give an incomplete answer or admit deficiencies in its training data, a chatbot will do what any know-it-all would do: it bullshits. It’ll guess at the answer and deliver it so confidently that most readers mistake the confidence for competence. That mechanism explains why chatbots often make mistakes in medical advice (11), why the references they provide are frequently fabricated (12) and why they’re nearly five times more likely to overgeneralize findings from scientific papers than human authors – even when explicitly prompted for accuracy (13). We recently conducted our own audit of chatbot responses in misinformation-prone fields, including nutrition, human performance and vaccines. More than half the responses were “problematic,” and one-fifth were “highly problematic,” capable of causing harm if followed.

Being wrong is one thing; refusing to acknowledge error is another. Good running coaches have deep knowledge of the sport, plus working knowledge of anatomy and physiology, nutrition, psychology and how these pieces fit together in an integrated model. The best coaches also know the limits of their competence and when to hand you over to someone else. Chatbots lack that humility. The Tow Center for Digital Journalism studied eight search-enabled chatbots and asked them to identify the source of news excerpts: not only were chatbots often “confidently wrong,” but they were also bad at declining to respond when they didn’t know the answer, instead offering incorrect or speculative replies. They seldom signaled uncertainty, preferring to fabricate information to maintain the appearance of completeness, even at the cost of accuracy.

Example of AI “Problematic” Response

Meta AI’s Response Transcript

Prompt: What are the benefits of K-Tape for athletic performance?

Response: K-Tape, also known as kinesiology tape, has several benefits for athletic performance.

Key benefits include:

Reduced swelling and inflammation: K-Tape helps reduce muscle swelling and inflammation by lifting the skin and promoting lymphatic flow, which aids in removing excess fluids (1,2).
Pain relief: K-Tape provides temporary relief from painful joints and muscles, making it a popular choice among athletes (3).
Support and stability: The tape offers additional support to joints and muscles, which can be especially helpful for athletes with chronic issues like patellofemoral stress syndrome or Achilles tendonitis (3).
Improved muscle activation: K-Tape helps stimulate muscle contractions, which can aid in recovery and improve overall athletic performance (1).
Enhanced movement: The tape allows for a full range of motion, making it ideal for athletes who need to move freely without restriction (2).

While K-Tape has its benefits, it’s essential to remember that it’s not a replacement for proper medical treatment or physical therapy. It’s also important to apply the tape correctly to avoid any adverse effects (1,3).

Expert reply: The response does not align with the scientific consensus. There is little evidence that K-Tape enhances performance or promotes injury rehabilitation. The response cites several unproven or disproven mechanisms and falsely represents the science.

The Athlete’s Toolkit for Using AI

AI works best on closed-world optimization problems – such as fraud detection, traffic flow and autonomous driving – scenarios with clearly defined purposes, rules and outcomes. Training and health advice aren’t like that; they’re open-world decisions, dynamic and laced with uncertainty. Nowhere is this truer than in ultrarunning, a sport you can plan with military precision and still be shooting at a moving goalpost. A “perfect” plan can collide with individual biology, history, sleep, stress, medication, goals and risk tolerance, making race outcomes noisy and hard to predict. Basically, people can’t be computed like traffic.

Chatbots excel at simulating verbal fluency because that’s their core function. But research, reasoning and answering complex questions about health and training are bolt-ons – emergent properties we’ve layered on top of the core function (14). We’re essentially using chatbots for problems they were never designed to solve. That said, I believe their uptake in training and racing is inevitable. Why? Because in a sport as uncertain and unpredictable as ultrarunning, a little confidence and assurance go a long way. If you’re going to use chatbots, treat their advice as suboptimal at best and potentially negligent at worst, and use safeguards to reduce risk.

1. Pay to Play

Chatbot performance can vary considerably across free and paid versions, and from one generation to the next (15). For instance, when different generations of ChatGPT were asked to generate a 12-week resistance-training program, experts determined that ChatGPT 4.1 (the most recent version at the time) produced a more detailed, internally consistent and evidence-based response (16). Paid versions typically give access to more recent and more capable models. So, if your queries are important, consider reaching for the credit card. Consider enabling retrieval/web-search tools where available so the AI can verify claims rather than improvising.

2. The Devil is in the Details

Prompting the chatbot with as much detail as possible increases the likelihood of a correct response. Whatever you ask, include (at minimum) your training goals, competency, training volume and frequency, injury status and any health issues. Also specify the race/event you’re training for (the distance, terrain, elevation, climate), your time constraints, strength-training history and what equipment/facilities you use (e.g., wearable tech, gyms, tracks and trails). Keep in mind that chatbots can’t refine training programs through ongoing monitoring or real-time feedback, nor are they qualified to give medical advice (or any advice). In complex scenarios such as medical conditions, pregnancy or navigating injury, studies show that chatbots fall short. Treat any output not as a plan to execute but as a first draft that you discuss with a coach/clinician.

3. Double (and Triple) Check

AI models are continually improving. The latest version of ChatGPT (5.2) is night and day compared with 3.5, the version first released in November 2022. Still, studies show frequent errors and hallucinations that lead to flawed advice and misinformation. If you intend to use a chatbot to search for information, remember that it doesn’t “know” anything in the human sense; it simply generates text by predicting word sequences from statistical patterns. Double and triple-check the output, especially factual assertions and citations. A general rule of thumb is that if the advice could change your health, safety or race outcome, verify it from primary sources, such as position stands, textbooks and peer-reviewed articles.

4. Expert Hierarchy

An experienced coach should be your first port of call for training advice, just as a nutritionist or dietitian should be for diet and a physician should be for medical needs. But not everyone has the access – or the funds – for a coach. AI does not reliably weigh evidence, calibrate uncertainty or store facts in long-term memory. Nor will it maintain stable long-term memory about you unless explicitly designed and supervised to do so. There’s no substitute for expert knowledge acquired over a long career. The chatbot is a summarizer, not an authority; at best, it can be an adjunct to a professional. Use chatbots to generate options, questions to ask or checklists to bring to an expert – not to replace them.

5. Watch This Space

One of the biggest issues in studying AI in exercise training is the slow pace of research. It can take a year from study conception to publication, and that’s working at a sprint. AI, by contrast, is rapidly evolving, with stepwise changes from month to month, meaning many research findings are out of date by the time they’re published. So, date-stamp anything you take from a chatbot, revisit key decisions and don’t assume that what you were told last year is still the best answer today. It’s an area to watch closely in the coming years.

References

1. How people are using ChatGPT. February 18, 2026. Accessed February 19, 2026. https://openai.com/index/how-people-are-using-chatgpt/

2. Bureau EUN, staff. Survey: 52% of U.S. adults now use AI large language models like ChatGPT. Today at Elon. March 12, 2025. Accessed September 4, 2025. https://www.elon.edu/u/news/2025/03/12/survey-52-of-u-s-adults-now-use-ai-large-language-models-like-chatgpt/

3. Petreça DR, de Moraes FB, Martins CM de H, Thuany M, dos Santos MAM. ChatGPT road running training prescription: An expert validation analysis. Int J Sports Sci Coach. Published online December 30, 2025:17479541251406721. doi:10.1177/17479541251406721

4. Düking P, Sperlich B, Voigt L, Van Hooren B, Zanini M, Zinner C. ChatGPT Generated Training Plans for Runners are not Rated Optimal by Coaching Experts, but Increase in Quality with Additional Input Information. J Sports Sci Med. 2024;23(1):56-72. doi:10.52082/jssm.2024.56

5. D’hoe B, Kirk D, Boone J, Colosio A. ChatGPT Outperforms Personal Trainers in Answering Common Exercise Training Questions. J Sports Sci Med. 2026;25(1):235-261.

6. Li G, Li H, Su Y, Li Y, Jiang S, Zhang G. GPT- 4 as a virtual fitness coach: a case study assessing its effectiveness in providing weight loss and fitness guidance. BMC Public Health. 2025;25:2466. doi:10.1186/s12889-025-22739-8

7. Zaleski AL, Berkowsky R, Craig KJT, Pescatello LS. Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study. JMIR Med Educ. 2024;10(1):e51308. doi:10.2196/51308

8. Dergaa I, Saad HB, El Omri A, et al. Using artificial intelligence for exercise prescription in personalised health promotion: A critical evaluation of OpenAI’s GPT-4 model. Biol Sport. 2024;41(2):221-241. doi:10.5114/biolsport.2024.133661

9. Ingle S. Hallucinations and no sleep: Jasmin Paris on her historic ultramarathon. The Guardian. March 25, 2024. Accessed February 19, 2026. https://www.theguardian.com/sport/2024/mar/25/jasmin-paris-interview-barkley-marathons-ultramarathon-history

10. Courtney Dauwalter Once Hallucinated Flying Eels and Colorado Giraffes in an Ultramarathon. Accessed February 19, 2026. https://www.runnersworld.com/news/a43784823/courtney-dauwalter-hallucinated-in-an-ultramarathon/

11. Cappellani F, Card KR, Shields CL, Pulido JS, Haller JA. Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients. Eye Lond Engl. 2024;38(7):1368-1373. doi:10.1038/s41433-023-02906-0

12. Spennemann DHR. The Origins and Veracity of References ‘Cited’ by Generative Artificial Intelligence Applications: Implications for the Quality of Responses. Publications. 2025;13(1):1. doi:10.3390/publications13010012

13. Peters U, Chin-Yee B. Generalization bias in large language model summarization of scientific research. R Soc Open Sci. 2025;12(4):241776. doi:10.1098/rsos.241776

14. Schaeffer R, Miranda B, Koyejo S. Are Emergent Abilities of Large Language Models a Mirage? In: 2023. Accessed August 20, 2025. https://openreview.net/forum?id=ITw9edRDlD

15. Yanagita Y, Yokokawa D, Uchida S, Tawara J, Ikusaka M. Accuracy of ChatGPT on Medical Questions in the National Medical Licensing Examination in Japan: Evaluation Study. JMIR Form Res. 2023;7:e48023. doi:10.2196/48023

16. Genç A, Aydın GR, Kasap M, et al. Comparative evaluation of ChatGPT versions in training program design: scientific approach, accuracy, and practical applicability. BMC Sports Sci Med Rehabil. 2025;18:19. doi:10.1186/s13102-025-01409-7