10 Three Paths to AI Integration: Navigating Knowledge, Language, and Reflection Support in Business Education
Sean Mitchell
How to cite this chapter:
Mitchell, S. (2025) Three paths to AI integration: Navigating knowledge, language, and reflection support in business education. In R. Fitzgerald (Ed.), Inquiry in Action: Using AI to Reimagine Learning and Teaching. The University of Queensland. https://doi.org/10.14264/a053a7a
Abstract
Most educational uses of generative AI assume its value lies in producing answers. This chapter reframes AI as verifier, coach, and facilitator across three implementations at a research-intensive business school. BBOP verifies assessment choices against course rules without generating analysis, reducing cognitive offloading while preserving student thinking. NotABot embeds tiered, just-in-time language feedback inside online modules to improve pragmatic writing at scale. The Lighthouse prototypes a Socratic reflection platform that elicits metacognitive insight rather than content. Together, these cases show that guardrails and placement matter more than model sophistication; that course-specific context consistently outperforms generic tools; and that institutional infrastructure and timing strongly shape engagement. The chapter offers design principles and implementation lessons for teams seeking responsible, learning-aligned AI that amplifies cognition instead of replacing it.
Practitioner Notes
- Reframe AI as a pedagogical partner that supports inquiry and reflection rather than automating student or educator tasks.
- Multi-layered constraints preserve cognitive engagement and ethical practice, turning AI from answer-giver into thinking catalyst.
- Course-specific AI aligned with curriculum and assessment produces greater learning gains than generic tools.
- Embedding AI activities directly within learning sequences ensures relevance and timely feedback.
- Rapid prototypes can drive engagement, but sustainable impact requires SoTL-aligned evaluation, reflection, and evidence gathering.
Introduction
Three years into the generative AI (GenAI) revolution, educational implementations remain constrained by a fundamental misconception: that AI’s value lies in providing answers. Step into any university classroom discussing AI tools, and you will hear the familiar concerns—students are using ChatGPT to write essays, solve problems, or complete assignments. The discourse assumes AI’s function is to generate content.
By moving beyond the chatbot paradigm, we can begin to see the deeper pedagogical potential of the technology. GenAI need not be limited to answering questions or providing information. It can verify understanding without giving answers, coach specific skills through structured practice, and facilitate self-discovery through guided reflection. The same conversational interfaces can therefore serve fundamentally different educational purposes when we reimagine their role in the learning process.
Background
In 2025, as a Principal Learning Designer at the University of Queensland Business School, I was asked to contribute my expertise in digital pedagogy and AI-enhanced learning to explore how GenAI might support student learning in new and responsible ways. This work led to the design and implementation of three initiatives that deliberately subverted the instant-answer model of GenAI. These include:
BBOP – an AI-powered course tutor, embedding course-specific content to verify student understanding of assignment requirements.
NotABot – an interactive language practice tool designed for courses with high culturally and linguistically diverse (CALD) student populations that leads students through structured writing activities, providing real-time, nuanced feedback tailored to business communication contexts.
The Lighthouse – a reflective learning assistant that inverts typical GenAI interaction, guiding students through structured reflection, extracting insights from their own experiences.
This chapter documents these three implementations, examining how content knowledge, pedagogical approaches, and technological capabilities intersect. When we reimagine AI’s role beyond information provision, as verifier, coach, and facilitator, we transform it from a perceived threat into a catalyst that amplifies the very cognitive and metacognitive processes that higher education seeks to develop.
Case 1: BBOP – Knowledge Support Without Answer Generation
The Problem
In the course Leading and Managing People, students analyse a current leader through the lens of approved leadership theories. On paper, the task appears straightforward: the chosen leader must be alive, currently active in their field, and publicly documented, and the theories applied must be drawn from a prescribed list covered in the course. Yet every semester, a familiar pattern emerged: a significant number of students submitted work analysing deceased or retired leaders or drew upon theories not included in the course list.
The Course Coordinator, Dr Ree Jordan, had provided assessment guidelines, comprehensive FAQs, and clearly defined rubrics; however, students continued to misinterpret or overlook foundational requirements. Many appeared confident in their understanding but only discovered their misalignment when they received feedback, too late to make meaningful corrections. The issue was not the lack of information but its accessibility and timing. What students needed was immediate, accessible guidance embedded at the point of assessment decision making.
Building BBOP (Blackboard-Based Online Pedagogy)
I developed BBOP using Zapier's ChatGPT-based platform; the prototype was built in a few hours. With a carefully focused prompt, BBOP could verify whether a student's chosen leader met the required criteria and whether the selected theory was on the approved list. Encouraged by these early results, I tested broader questions on assessment, content, and course logistics.
The initial results revealed BBOP worked brilliantly, too brilliantly! When prompted to write a sample assessment, it produced a sophisticated analysis that would have received a strong grade. This revealed an unintended consequence: we had inadvertently created the perfect course-specific cheating tool. Unlike general-purpose platforms such as Microsoft's Copilot or generic ChatGPT, BBOP knew the assessment brief, the approved theories, and even the specific expectations embedded in the rubric. It was a perfect tool for the course and therefore a perfect risk. To prevent this, we needed to implement guardrails and transform BBOP from a content generator into a learning partner: a system that verified understanding without performing the task for students. This became the foundation of what we call the Verification-Not-Generation Model.
The Verification-Not-Generation Model
Developing effective guardrails proved less of a technical exercise and more an inquiry into how design shapes learning. The first guardrail prompt, "do not write essays for students", failed. Through iterative design cycles, I reframed the challenge as a pedagogical problem rather than a programming one. I recognised that structured boundaries, when clearly defined, support rather than restrict student autonomy.
It took multiple rounds of testing to find the balance between guidance and independence. The winning formula combined three core elements (a minimal code sketch of how they might fit together follows below):
Explicit instruction: A detailed prompt with seventeen explicit refusal policies targeting trigger phrases like “write me” or “draft my”.
Calibrated creativity: Zapier’s creativity setting was reduced to zero to ensure consistent, rule-based responses.
Prompt efficiency: A deliberately shortened system prompt improved adherence to the core teaching intentions.
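As a minimal illustration, the sketch below shows one way these three elements might be combined: a small set of trigger phrases, a short verification-only system prompt, and a zero-temperature call to the underlying model. It is not the production BBOP configuration; the trigger phrases, prompt wording, and function names are illustrative assumptions only.

// Minimal sketch (not the production BBOP prompt): trigger-phrase refusal,
// a short verification-only system prompt, and zero creativity (temperature).
const REFUSAL_TRIGGERS = ["write me", "draft my", "write my essay"]; // illustrative subset

const SYSTEM_PROMPT = `
You are BBOP, a course tutor for Leading and Managing People.
You may VERIFY a student's choices against course rules: leader criteria,
the approved theory list, and assessment logistics.
You must NEVER write, draft, or outline analysis for the student.
If asked to generate assessable work, decline and point to the relevant module.
`.trim();

// Layer 1: explicit refusal check on trigger phrases.
function requestsGeneration(studentMessage: string): boolean {
  const text = studentMessage.toLowerCase();
  return REFUSAL_TRIGGERS.some((phrase) => text.includes(phrase));
}

// Layers 2 and 3: a deliberately short system prompt and zero "creativity"
// when calling the model. The `llm` parameter stands in for whichever chat
// API the hosting platform exposes.
async function answer(
  studentMessage: string,
  llm: (args: { system: string; user: string; temperature: number }) => Promise<string>
): Promise<string> {
  if (requestsGeneration(studentMessage)) {
    return "I can check your chosen leader or theory against the course requirements, but I can't write the analysis for you.";
  }
  return llm({ system: SYSTEM_PROMPT, user: studentMessage, temperature: 0 });
}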
The result was an AI that could verify without generating. For instance, when a student asked
“Can I write about Elon Musk for Assessment 2?”
BBOP would confirm he met the requirements as a current, active leader. However, when the inevitable follow-up came:
“Can you explain how transformational leadership applies to him?”
BBOP would respond firmly yet supportively:
“I cannot write analysis for you. However, I can confirm that transformational leadership is one of the approved theories in Module 6. You’ll need to apply the four components yourself.”
This shift transformed BBOP from a potential source of academic misconduct into a pedagogical tool that maintained the essential cognitive challenge for students whilst still offering support for decision making, reinforcing principles of integrity, independence, and inquiry the course sought to instil.
This approach aligns with emerging evidence on the design of educational AI tools. Bastani et al. (2025), in a randomised controlled trial with high school mathematics students, found that students with unrestricted access to GPT-4 performed 17% worse on final exams than peers without AI support. However, when the researchers introduced "GPT Tutor", a version of the tool with structured prompts and guided interactions, the negative effects were largely mitigated. Their findings reinforce that constraint design matters more than technological sophistication and determines whether AI enhances or undermines learning. In this sense, BBOP's Verification-Not-Generation model reflects a growing recognition that effective GenAI integration depends on pedagogical framing rather than technical capability.
The Blank Page Problem and Template Solutions
Early usage data revealed students would log in to BBOP but rarely ask a question. To overcome this, we added six template prompts to the landing page:
- “Is [leader name] appropriate for Assessment 2?”
- “What theories are on the approved list?”
- “Can you explain the difference between Theory X and Theory Y?”
- “What are the assessment criteria for the case study?”
- “When is Assessment 2 due?”
- “How should I structure my analysis?”
Engagement increased immediately. Transcript logs showed the majority of conversations began with these templates, which modelled appropriate use whilst subtly reinforcing the tool’s boundaries.
BBOP2: The Infrastructure Challenge
In Semester 2, 2025, following ethics approval and a small grant for Claude API access, we sought to replicate this success in a postgraduate Human Resources course. We rebuilt BBOP from scratch as a custom full-stack React/Node/PostgreSQL application. BBOP2 was a significantly more complex development and revealed the hidden infrastructure challenges of educational AI.
The technical evolution from a Zapier chatbot to a custom platform introduced essential capabilities such as authenticated access, conversation history, research data collection, and crisis detection protocols. But this additional capability also introduced friction. Despite its superior capabilities, BBOP2 saw lower engagement. Institutional approval for the Claude API was not secured until mid-semester, by which time students had already established study routines without the tool.
The contrast reveals a fundamental tension in AI learning design. Simple low-code tools allow educators to move quickly, testing ideas, gathering early feedback, and engaging students while motivation is high. This immediacy makes them ideal for innovation, experimentation, and agile teaching contexts. However, these lightweight solutions often lack the data governance, reliability, and research capability required for large-scale or long-term implementation. In contrast, comprehensive platforms built within institutional systems can support accountability, security, and rigorous evaluation, but their development and approval processes are slow. By the time they are operational, the opportunity for authentic use may have passed. Meaningful progress in AI-enabled education depends on uniting pedagogy, technology, and system change, a challenge for universities (Luckin & Holmes, 2016).
Lessons from BBOP
- Guardrails require systems, not suggestions: Single-layer restrictions fail. Multiple reinforcing mechanisms create robust boundaries that shape student behaviour.
- Course-specific beats generic: BBOP’s knowledge of our exact theories and requirements made it valuable. Generic AI tools (currently) cannot provide this targeted support.
- Deployment timing matters more than features: BBOP1’s early availability drove engagement, whereas BBOP2’s delayed launch, despite superior capabilities, missed the critical window for student adoption.
- Infrastructure gaps limit innovation: Without institutional support for LTI integration, API approval processes, or secure hosting, every implementation becomes a complex workaround rather than a scalable solution.
- The blank page problem is real: Students often need scaffolded entry points, not open-ended interfaces. Structured prompts guide appropriate use and reduce anxiety around engagement.
BBOP revealed that success depends less on technical sophistication than on timely deployment with appropriate constraints. For universities, this highlights the need for intermediate infrastructure, systems that support rapid experimentation and iteration without compromising governance, ethics or data security. Only then can educational AI realise its potential to enhance learning at scale.
Case 2: NotABot – Embedded Language Support
NotABot delivers real-time feedback on complex language tasks for Management Communication (MGTS7610), a postgraduate course where English language proficiency challenges were undermining student success. With Andrew Sankey, the course’s Embedded Language Support Officer (ELSO), we used AI to transform static online modules into interactive experiences that provide real-time, personalised feedback.
The Problem
The Master of Business cohort at UQ faces a significant challenge: whilst course content is delivered in English, many students struggle to meet the linguistic demands of professional business communication. The rise of machine translation and GenAI tools has exacerbated this issue, enabling students to bypass the hard work of language learning rather than develop genuine proficiency. MGTS7610 formed part of a wider university initiative to address this issue through enhanced language support and resources. Andrew had created a suite of H5P modules covering business communication principles that tested basic knowledge through multiple-choice and drag-and-drop activities. While effective revision tools in their own right, these modules could not replicate the real-time, sophisticated, context-aware feedback that human tutors provide on complex writing tasks requiring nuanced word choices and appropriate tone for professional audiences. With more than 500 students enrolled, providing individualised feedback on written work was unfeasible. The pedagogical challenge was to provide meaningful, discipline-specific language feedback at scale without sacrificing quality or authenticity.
Building NotABot
Andrew and I integrated GenAI-powered activities as the final element of each H5P module. Students first completed the traditional knowledge checks, then encountered NotABot as an embedded final task in the learning sequence.
The technical setup involved three components working together (a simplified sketch of the hand-off follows the list):
- Activity definitions in GitHub: Each activity type was defined in an HTML file with specific parameters. For example, the email subject line activity was defined as:
'email-subjects': { title: 'Email Subject Lines', prompt: 'Practice writing effective email subject lines…Write the subject line:' }
Figure 1 Activity Definitions
- URL embeddings in H5P: Each H5P module included iframes that passed these parameters to Zapier:
<iframe src="https://teach.business.uq.edu.au/activity.html?activity=email-subjects"></iframe>
Figure 2 URL embeddings
- Structured prompting in Zapier: NotABot received detailed, activity-specific instructions covering ten different activity types, ranging from concise email subject lines to structured persuasive proposals and critical reflections. Each activity began with a tailored initiation such as:
“Let’s practise making writing more concise. Here’s a sentence from a typical business email: ‘I am writing to you in order to request…’ How could you make this more direct?”
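To make the hand-off concrete, the sketch below shows, in simplified TypeScript, how an activity page might resolve the activity query parameter from the iframe URL into its definition before seeding the embedded chatbot. The lookup mirrors Figure 1 and the URL shape in Figure 2; the function names and the final hand-off step are assumptions rather than the production code.

type ActivityDefinition = { title: string; prompt: string };

// Definitions mirror the shape shown in Figure 1.
const ACTIVITIES: Record<string, ActivityDefinition> = {
  'email-subjects': {
    title: 'Email Subject Lines',
    prompt: 'Practice writing effective email subject lines…Write the subject line:',
  },
  // …nine further activity types defined in the same way
};

// Read ?activity=… from the page URL (see Figure 2) and look up its definition.
function resolveActivity(pageUrl: string): ActivityDefinition | undefined {
  const key = new URL(pageUrl).searchParams.get('activity');
  return key ? ACTIVITIES[key] : undefined;
}

const current = resolveActivity(window.location.href);
if (current) {
  // Hypothetical hand-off: the title and tailored prompt seed the Zapier
  // chatbot embedded in the page.
  console.log(`Starting "${current.title}": ${current.prompt}`);
}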
The Feedback Balance
Designing appropriate feedback proved unexpectedly challenging. Initial versions were overly pedantic; the AI would critique technically correct but stylistically imperfect sentences in ways a human instructor would not. We refined the model extensively, tempering the language with phrases like “that’s moving in the right direction” and “you’re on track” to create feedback that was encouraging and constructive.
A three-tier scaffolding system underpinned the final design:
Tier 1: General hints pointing students toward relevant communication principles
Tier 2: Specific questions highlighting problematic elements
Tier 3: Sentence frames that offered structure without supplying content
Even with maximum support, students were required to construct their own responses. This was deliberate given the cohort’s tendency to rely on translation tools. Our goal was to make learning visible, not make writing easier: authentic language competency without shortcuts.
Evidence Informed Design
Our approach aligns with evidence on GenAI-supported language learning. Mahapatra’s (2024) mixed-methods study found that using ChatGPT as a feedback coach led to significant gains in ESL students’ writing complexity and organisation. Similarly, Deep et al. (2025) reported that AI-enhanced language learning improved proficiency and engagement when properly scaffolded. The three-tier scaffolding design used in NotABot reflects findings that structured support maintains learning outcomes whilst unfettered access creates dependency (Bastani et al., 2025). By initiating conversations at the point of learning, NotABot addresses help-seeking barriers identified by Seo et al. (2021), who found students valued just-in-time support regardless of who initiated it.
Implementation and Integration
NotABot activities appeared as the final component of each module, a natural continuation after concept introduction and knowledge checks. Students did not need to navigate elsewhere; the activities were immediately available when concepts were most salient.
Each AI-supported activity included three to four practice tasks with gradually escalating complexity, typically requiring eight to ten minutes to complete. This integration ensured that language development was woven into the course flow rather than treated as an optional task, reinforcing accessibility and sustained engagement.
Student Response
Informal feedback from 118 respondents (a 23% response rate) showed that 89% found NotABot useful. Students particularly valued understanding why their writing needed improvement, not just what was wrong. Several commented that receiving immediate in-context feedback at the point of learning was far more valuable than waiting for comments on formal assessments.
However, some students found the feedback was occasionally inconsistent, reflecting the ongoing challenge of calibrating prompt design to balance encouragement with appropriate correction. Others requested more workplace-contextualised scenarios rather than generic business examples, suggesting that greater contextualisation would further enhance authenticity and relevance.
Limitations and Lessons
While NotABot’s anonymous architecture protected privacy, it also prevented measurement of learning outcomes. Additionally, Zapier’s text-only environment imposed notable constraints. Ideally, future iterations would include spoken interaction for conversational practice, particularly valuable for students from culturally and linguistically diverse (CALD) backgrounds. Similarly, visual communication activities such as slide layouts or infographics remain on our development wishlist. A full-stack API environment, similar to BBOP2, could enable these multimodal capabilities but would again raise institutional challenges and require the same complex approval processes that risk delaying implementation.
NotABot’s effectiveness emerged through iterative refinement of its feedback approach. As we had transcripts of all student interactions, we were able to analyse these and identify when feedback was not working well. This analysis revealed patterns in both successful and problematic interactions, leading Andrew to codify six core pedagogical principles that governed how the AI should respond. The first and most fundamental principle established that the AI must never provide complete, corrected answers, as doing so prevents genuine learning. Instead, the system employs a scaffolding method that progresses through three levels of support: general conceptual hints, targeted leading questions, and only as a final step, sentence frames that guide structure without providing content.
The remaining principles address specific interaction challenges that emerged from the transcripts. When students provided minimal or off-topic responses, the AI follows an engagement protocol that gently redirects on the first instance, applies more direct scaffolding on the second, and gracefully concludes if no genuine effort materialises. To prevent superficial learning, the system recognises and discourages mimicry, prompting students to produce original attempts when they simply copy provided examples. A relevant tangent protocol allows the AI to briefly address student questions related to the broader topic before transitioning back to the task, validating curiosity without derailing learning objectives. Finally, a proficiency protocol ensures the AI recognises professionally acceptable responses and concludes successfully rather than over-correcting strong work with minor stylistic suggestions. Together, these principles transformed NotABot from a generic chatbot into a pedagogically responsive tutor that balances support with challenge.
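A condensed sketch of how such principles might be expressed in a system prompt is shown below. The rule wording and function names are illustrative assumptions only; the actual NotABot prompt is longer and was refined iteratively from the transcript analysis described above.

// Illustrative encoding of the six feedback principles as named rules that
// are concatenated into a system prompt. A sketch only, not the MGTS7610 prompt.
const FEEDBACK_PRINCIPLES: Record<string, string> = {
  noCompleteAnswers:
    'Never provide a complete, corrected answer; guide the student to produce it themselves.',
  tieredScaffolding:
    'Escalate support in order: general hint, then a targeted leading question, then a sentence frame without content.',
  engagementProtocol:
    'On a minimal or off-topic reply, redirect gently; on the second, scaffold more directly; if no genuine effort follows, conclude gracefully.',
  mimicryCheck:
    'If the student copies a provided example, ask for an original attempt.',
  relevantTangents:
    'Briefly address on-topic side questions, then transition back to the task.',
  proficiencyRecognition:
    'If a response is professionally acceptable, conclude successfully rather than over-correcting minor style.',
};

// Combine the rules with the activity-specific prompt from Figure 1.
function buildSystemPrompt(activityPrompt: string): string {
  const rules = Object.values(FEEDBACK_PRINCIPLES)
    .map((rule, i) => `${i + 1}. ${rule}`)
    .join('\n');
  return `You are NotABot, a business communication writing coach.\nActivity: ${activityPrompt}\nFollow these rules:\n${rules}`;
}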
Ultimately, NotABot demonstrated that conversational AI can deliver meaningful, sophisticated feedback for complex language tasks at scale. Its success rested on collaboration between learning design and language support expertise; neither could have created this alone. The fundamental challenge for higher education lies in balancing rapid, responsive deployment to meet immediate learning needs with the infrastructure required for rigorous evaluation and sustainable growth.
Lessons from NotABot
- Feedback tone matters: Overly critical or mechanical AI feedback discourages engagement; human-like encouragement and formative phrasing sustain motivation.
- Scaffolding sustains learning: Structured, tiered feedback prevents dependency and promotes genuine language development.
- Context drives authenticity: Discipline-specific examples and scenarios make AI feedback more relevant and transferable to real workplace communication.
- Integration beats add-ons: Embedding AI activities within existing modules ensures consistent engagement and lowers the barrier to use.
- Collaboration is essential: Effective AI-supported learning draws on expertise from multiple stakeholders; no single role can achieve meaningful outcomes in isolation.
Together, BBOP and NotABot illustrate that the success of educational AI depends on pedagogical agility rather than technological sophistication: the ability to design, deploy, and adapt tools rapidly in response to evolving learner needs and institutional contexts.
Case 3: The Lighthouse – Reflective Learning Platform
The Lighthouse is the most ambitious, and as yet untested, of our three implementations: an AI-powered reflection tool designed to facilitate self-knowledge. The Lighthouse remains in pre-pilot development and is currently being built by an in-house developer. Its inclusion here documents the design rationale for AI role inversion before evidence can validate or challenge its assumptions.
The Problem
Cameron Turner, an entrepreneurship course coordinator, faced a recurring pattern with MBA students tackling wicked problems like homelessness and climate change. Students often approached these challenges with confidence but little depth, proposing solutions after minimal investigation. They assumed understanding of problems they had never personally experienced, and their proposals were predictably superficial. To address this, Cameron embedded two mechanisms intended to cultivate humility and insight, drawing from Owens et al.’s (2013) expressed humility framework: structured interviews and reflective writing. Rather than focusing solely on entrepreneurial skills, the approach aims to foster humility: a willingness to view oneself accurately, value others’ strengths and contributions, and remain open to learning.
Students interviewed 100 people with genuine insight into, or expertise in, the chosen problem. Some interviews were formal, while others were spontaneous, the unplanned conversations often yielding the richest insights. Through these encounters, students confronted the gap between their initial assumptions and the complex realities they sought to address. Weekly written reflections throughout the semester deepened the process, with brief and targeted feedback from Cameron.
The combination proved transformative. Exposure to lived experience, structured reflection, and instructor feedback instilled genuine humility and deeper understanding. Students recognised the gap between their initial assumptions and the complex reality revealed through deep listening. Better yet, these insights led to more nuanced, viable solutions to niche problems that could not have been reached without this reflective process. For many MBA students from corporate backgrounds, this was their first encounter with reflective practice. The learning curve was steep, but the growth was tangible.
Yet the model came at a cost. Providing even brief feedback on weekly reflections for a large cohort required substantial time and energy. Cameron recognised that the power came from the structured reflection itself and from the probing questions that forced students to examine assumptions and biases. If such guided reflection could be systematised without sacrificing depth, it could become accessible at scale without requiring intensive instructor feedback on every submission.
Building The Lighthouse
The core hypothesis behind The Lighthouse emerged from observation: the most powerful learning comes from questioning, not answering. This questions-over-answers approach draws support from multiple research domains. Favero et al. (2024) found that a Socratic chatbot significantly improved learning outcomes compared to direct-help baselines, with engagement increasing over successive conversation turns. Similarly, Kestin et al. (2025), in a Harvard randomised controlled trial, reported that carefully prompted AI tutoring outperformed in-class active learning, provided the prompts were expertly crafted and structured scaffolding was explicit. In contrast, Fakour and Imani’s (2025) comparative study observed a bimodal response pattern: while some students thrived with AI Socratic tutoring, others found it alienating, highlighting important questions about how to scale intimate reflective learning through AI facilitation.
The Lighthouse inverts the typical AI interaction entirely. Rather than providing information or teaching skills, it asks questions, extracting insights from students rather than providing answers. The platform architecture is designed around multiple reflection modes, each serving a different pedagogical purpose (a minimal data-model sketch follows the list):
- Baseline measure: A comprehensive 20-minute guided reflection exploring values, assumptions, and readiness before students begin their interviews. This establishes a foundation for measuring growth over time.
- Micro-reflections: After each interview, students upload artefacts (recordings, notes, photos) and immediately respond to brief prompts capturing takeaways before the memory fades or rationalisation sets in. Questions such as “What surprised you?” or “What assumption did this person challenge?” capture authentic reactions.
- Scheduled check-ins: Periodic reflections throughout the semester prompt students to revisit earlier thinking. The system references prior responses, “Three weeks ago you said X. How has your thinking changed?” to scaffold metacognitive awareness.
- End-of-semester synthesis: A longer reflection draws together the full arc of learning; with 100+ micro-reflections and multiple check-ins to draw on, students trace specific moments where understanding shifted.
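The sketch below sets out one possible data model for these four modes, expressed as TypeScript interfaces. The type and field names are assumptions for illustration; The Lighthouse’s actual schema is still being built.

// Illustrative data model for the four reflection modes; names are assumptions.
interface ReflectionEntry {
  studentId: string;
  createdAt: Date;
  prompts: string[];   // questions asked by Keeper Kim
  responses: string[]; // the student's own words
}

interface BaselineReflection extends ReflectionEntry {
  kind: 'baseline'; // ~20-minute guided conversation before interviews begin
}

interface MicroReflection extends ReflectionEntry {
  kind: 'micro'; // captured immediately after an interview
  artefacts: { type: 'recording' | 'notes' | 'photo'; uri: string }[];
}

interface ScheduledCheckIn extends ReflectionEntry {
  kind: 'check-in';
  referencedEntryIds: string[]; // earlier responses the system quotes back
}

interface Synthesis extends ReflectionEntry {
  kind: 'synthesis'; // end-of-semester arc across all prior entries
  sourceEntryIds: string[];
}

type Reflection = BaselineReflection | MicroReflection | ScheduledCheckIn | Synthesis;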
This design is supported by Xu et al. (2025), whose study on metacognitive prompts in GenAI found significant improvements in both self-regulated learning and student experience versus control groups. Prompts encourage learners to assess comprehension, identify confusion, and seek clarification, fostering awareness of their own thinking processes, precisely the capability that The Lighthouse’s multi-modal approach aims to cultivate.
The Facilitation Approach
This case investigates how AI can act as a facilitator of reflective learning. Initial testing revealed a persistent AI problem: verbosity. Early responses to student reflections generated 200-word sycophantic paragraphs, creating what testers described as “being lectured at” rather than listened to. The point of the tool was to listen and help probe for insights, not to provide a running commentary. The system prompt now mandates 60-word maximum responses following a strict format: warmly acknowledge what the student shared, create a bridge to the next question, ask the question, nothing more. This constraint required careful calibration. Research on Socratic questioning emphasises validation before challenge: students will not engage with probing questions unless they feel heard first (Fakour & Imani, 2025; Owens et al., 2013). The warm acknowledgement serves this purpose, mirroring the humble leadership principle of appreciating others’ contributions before guiding deeper examination, but it must remain brief enough to keep the focus on student thinking rather than AI observations.
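The constraint itself can be expressed quite simply. The sketch below shows the kind of format rule and word-count guard involved; the wording is illustrative, not the production Keeper Kim prompt.

// Sketch of the acknowledge-bridge-question format rule and a word-count guard.
// Wording is illustrative, not the deployed Keeper Kim system prompt.
const RESPONSE_FORMAT_RULES = `
Respond in at most 60 words.
Structure every reply as:
1. One sentence warmly acknowledging what the student shared.
2. One short bridge to the next question.
3. The question itself.
Add nothing more: no commentary, analysis, or advice.
`.trim();

// Post-hoc guard: if a reply overruns the limit, it can be regenerated or trimmed.
function withinWordLimit(reply: string, max = 60): boolean {
  return reply.trim().split(/\s+/).length <= max;
}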
The baseline reflection tracks students through nine thematic stages, deliberately sequenced from comfortable topics (warmup questions about recent positive experiences) to more challenging territory (examining assumptions and uncertainties) and finally to integrative reflection (connecting values to decisions) (Owens et al., 2013). This mirrors the approach of skilled facilitators who first build psychological safety before guiding learners into more complex and potentially uncomfortable territory.
The micro-reflections employ graduated probing similar to NotABot’s scaffolding but applied to reflective thinking rather than skills practice. When students give surface-level responses, the persona they interact with, Keeper Kim, asks progressively more specific questions: “You mentioned surprise—what specifically surprised you? What were you expecting instead?” This iterative exchange continues until genuine examination emerges, or it becomes clear the student is not yet ready to engage deeply.
Safety Parameters and Wellbeing Concerns
Facilitating deep personal reflection introduces risks absent from knowledge verification or language coaching. When students examine assumptions about wicked problems involving marginalised communities, they may confront uncomfortable realisations about privilege or ignorance. Some may also disclose mental health concerns, trauma, or crisis situations. Current protocols include crisis detection keywords triggering referral to counselling services, explicit disclaimers that Keeper Kim is not a therapist, boundaries around engagement topics, and opt-out mechanisms. The key challenge is distinguishing between productive discomfort that is necessary for growth and psychological distress, which requires intervention. Keeper Kim must ask challenging questions about assumptions whilst recognising when students cross from reflection into crisis.
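As a simple illustration of the first protocol, a keyword-based check might look like the sketch below. The keyword list and referral wording are placeholder assumptions, not the deployed protocol; a production system would be considerably more conservative.

// Placeholder sketch of a keyword-based crisis check. The keywords and the
// referral message are illustrative assumptions, not the deployed protocol.
const CRISIS_KEYWORDS = ['self-harm', 'suicide', "can't go on"]; // illustrative subset

function detectPossibleCrisis(studentText: string): boolean {
  const text = studentText.toLowerCase();
  return CRISIS_KEYWORDS.some((keyword) => text.includes(keyword));
}

function crisisReferralMessage(): string {
  return (
    'It sounds like you may be going through something difficult. ' +
    "Keeper Kim isn't a counsellor, and this is a good moment to reach out " +
    'to the university counselling service.'
  );
}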
Fakour and Imani’s (2025) comparative study found a bimodal distribution when comparing ChatGPT to human tutors for Socratic teaching, working well for some students whilst alienating others. This uncertainty about how students will respond to AI-facilitated self-examination informs our staged pilot approach, which functions as an ethical safeguard and a systematic inquiry into student experience.
Pre-Pilot Status and Next Steps
The Lighthouse is still in development. A small pilot version will be released to students for feedback late in Semester 2, 2025, with plans for a full release in Semester 1, 2026. Initial prototyping confirmed the concept’s technical feasibility, but institutional approvals delayed full platform development, which is underway at time of writing.
The Lighthouse will be deployed in two postgraduate entrepreneurship courses. The staged rollout reflects both technical complexity and the lack of direct precedent. Whilst research validates Socratic AI for academic content, The Lighthouse applies this approach to personal development and self-examination, territory with less empirical grounding. OpenAI’s Study Mode, launched in July 2025, employs similar Socratic principles but focuses on academic content rather than intimate personal reflection (OpenAI, 2025). The pilot will test only the baseline reflection component, the twenty-minute guided conversation with Keeper Kim. Micro-reflections, scheduled check-ins, and synthesis components remain in development. The phased approach allows us to validate the core facilitation model before building the full platform infrastructure. We are piloting with volunteers rather than requiring participation, which introduces selection bias but aligns with ethical principles around sensitive personal data.
Key questions remain unanswered: Can AI distinguish genuine from performative reflection? Will students engage honestly with AI facilitation, or find it clinical and alienating? What is the optimal balance between structured question sequences and flexibility for tangential exploration? What is lost versus gained in AI mediation compared to human facilitation?
Perhaps most significantly: we are testing whether systematised reflection can achieve outcomes that are demonstrably valuable but difficult to scale. While questioning frameworks have been validated for academic content, applying them to intimate self-examination enters less-charted territory. The pilot will reveal whether The Lighthouse represents pedagogical innovation or technological overreach. This Lighthouse pilot reflects the call by Fitzgerald et al. (2025) for AI research that is pedagogically grounded, ethically aware, and focused on human-AI collaboration. Alongside BBOP and NotABot, it tests how design and ethics intersect in shaping meaningful learning.
Conclusion
These three implementations trace a clear progression, from AI verifying knowledge (BBOP), to coaching skills (NotABot), to facilitating reflection (The Lighthouse). Each moved further from the instant-answer paradigm that defines most educational AI use and with that came increasing technical, pedagogical, and institutional complexity.
BBOP’s journey from Zapier prototype to custom architecture demonstrates a fundamental tension in educational AI development. Simple tools enable quick rollout and early adoption when engagement matters most. More comprehensive systems offer research infrastructure and enhanced capabilities but risk missing the critical engagement windows through lengthy approval processes. Universities need intermediate solutions, infrastructure and processes that enable us to move fast while staying rigorous and compliant.
Across all three cases, the guardrails proved more critical than the underlying technology. BBOP’s 17-policy framework to prevent cognitive offloading, NotABot’s three-tier scaffolding that maintained productive struggle, and The Lighthouse’s safety protocols to distinguish growth from distress: these design choices shaped learning far more than model selection. The same technology, applied differently, produced very different outcomes. Building in these constraints is hard work; it takes iteration, pedagogical judgement, and a willingness to let go of some functionality for the sake of learning integrity.
NotABot also showed how timing and placement shape engagement. Embedding practice immediately after concept introduction, initiated by the AI rather than requiring student navigation, created engagement patterns that exceeded typical chatbot interactions. The architecture is not especially complex: it is built from GitHub definitions, iframe parameters, and Zapier prompts that require only moderate technical expertise. The key element is the pedagogy, not the technology. This challenges the assumption that educational AI requires complex infrastructure. Sometimes the innovation lies in when and where the tool appears, not in how it works.
Several patterns emerge across all three cases. Course-specific implementations consistently outperform generic tools: BBOP was grounded in the course’s approved leadership theories, NotABot was embedded within business communication modules, and The Lighthouse was designed for entrepreneurship reflection. Collaboration between learning designers and subject experts was essential every time. Anonymous architectures such as BBOP1 and NotABot allowed rapid deployment but limited outcome measurement, while authenticated systems like BBOP2 supported research capability but introduced engagement friction.
Most significantly, these implementations depended on a level of institutional readiness that many universities still lack. IT approval processes, ethics protocols, LTI integration capabilities, API access policies: the infrastructure that supports sophisticated educational AI is still catching up. The technical complexity we navigated is not unique to our context. Until universities develop intermediate infrastructure that allows rapid iteration under clear governance, educational AI innovation will remain caught between two options: limited-but-quick tools or comprehensive-but-slow platforms.
The chatbot paradigm persists because it feels intuitive: ask a question, get an answer. Moving beyond that requires reimagining AI’s role entirely: as a verifier rather than a generator, a coach rather than an answer provider, and a questioner rather than an expert. These three implementations demonstrate that such reimagination is possible. Collectively, these cases illustrate the shift that Fitzgerald et al. (2025) call for, toward pedagogically grounded, ethically aware AI research, and echo Costello et al. (2025) in reminding us that the future of AI in education depends not only on what we build, but on how we frame, question, and understand it.
AI Use Declaration
This chapter was prepared with assistance from Claude and ChatGPT, which supported aspects of writing and editing, including clarity, structure, and citation formatting. All ideas, findings, and interpretations are the author’s own, and the AI did not generate or analyse primary data or replace academic judgement.
References
Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122. https://doi.org/10.1073/pnas.2422633122
Costello, E., Ferreira, G., Hrastinski, S., McDonald, J.K., Tlili, A., Veletsianos, G., Marin, V.I., Huijser, H., & Altena, S. (2025). Artificial intelligence in education research and scholarship: Seven framings. Journal of University Teaching and Learning Practice, 22(3). https://doi.org/10.53761/xs5e3834
Deep, P. D., Martirosyan, N., Ghosh, N., & Rahaman, M. S. (2025). ChatGPT in ESL higher education: Enhancing writing, engagement, and learning outcomes. Information, 16(4), 316. https://doi.org/10.3390/info16040316
Fakour, H., & Imani, M. (2025). Socratic wisdom in the age of AI: A comparative study. Frontiers in Education, 10, 1528603. https://doi.org/10.3389/feduc.2025.1528603
Favero, L. A., Pérez-Ortiz, J. A., Käser, T., & Oliver, N. (2024). Enhancing critical thinking in education by means of a Socratic chatbot. AIEER Workshop at ECAI 2024. https://doi.org/10.48550/arXiv.2409.05511
Fitzgerald, R., Roe, J., Roehrer, E., Yang, J., & Kumar, J. A. (2025). What do we want? Meaningful AI research in higher education. Journal of University Teaching and Learning Practice, 22(4). https://doi.org/10.53761/jwt7ra63
Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI tutoring outperforms in-class active learning: An RCT. Scientific Reports, 15, 17458. https://doi.org/10.1038/s41598-025-97652-6
Luckin, R., & Holmes, W. (2016). Intelligence unleashed: An argument for AI in education. UCL Knowledge Lab.
Mahapatra, S. (2024). Impact of ChatGPT on ESL students’ academic writing skills: A mixed-methods intervention study. Smart Learning Environments, 11, 9. https://doi.org/10.1186/s40561-024-00295-9
OpenAI. (2025). Introducing study mode. OpenAI. https://openai.com/index/chatgpt-study-mode/
Owens, B. P., Johnson, M. D., & Mitchell, T. R. (2013). Expressed humility in organizations: Implications for performance, teams, and leadership. Organization Science, 24(5), 1517-1538. https://doi.org/10.1287/orsc.1120.0795
Seo, K., Tang, J., Roll, I., Fels, S., & Yoon, D. (2021). The impact of artificial intelligence on learner–instructor interaction in online learning. International Journal of Educational Technology in Higher Education, 18, 54. https://doi.org/10.1186/s41239-021-00292-9
Xu, X., Qiao, L., Cheng, N., Liu, H., & Zhao, W. (2025). Enhancing self-regulated learning and learning experience in generative AI environments: The critical role of metacognitive support. British Journal of Educational Technology, 56(1). https://doi.org/10.1111/bjet.13599