Continuous Evaluation Frameworks for AI in Educational Settings

Diane Gavin, Executive Director – Center for Academic Innovation, Texas A&M University-San Antonio

In an era when many colleges and universities have closed and others are making significant cuts due to budget deficits, does investing in AI make fiscal sense? As higher education leaders make difficult decisions about spending and resources, can implementing AI at our institutions be justified in terms of cost benefit and return on investment (ROI)?

Continuous evaluation frameworks for AI in educational technology have evolved considerably in 2025. The fundamental shift is moving from evaluation as a periodic activity to evaluation as an integrated, continuous process that directly feeds back into system improvement based on rigorous documentation of both positive and negative impact.

Compared to traditional continuous evaluation frameworks such as the Kirkpatrick Model for training evaluation, the COSO framework for internal controls, and the CDC Program Evaluation Framework for public health programs, AI-driven continuous evaluation frameworks use automated evaluation processes. These processes rely on data orchestration tools and pre-defined metrics to compare model predictions against input, output, and real-time, real-world (or ground truth) data. “Ground truth data” are data known to be real or true, gathered through direct observation and measurement rather than through interpretation or inference.

For AI systems, ground truth data are used to evaluate student performance by comparing the system’s outputs to the “correct answer,” that is, the data based on real-world observations. Educational technology professionals can explain that, with continuous evaluation frameworks rooted in AI, the AI compares student input against the correct answers it has observed and recorded, grounding its estimate of what the student’s ideal performance is or could be in data gathered from learner responses.
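
To make the ground-truth comparison concrete, here is a minimal sketch, not any particular vendor's implementation, of how a continuous evaluation job might score AI predictions against ground-truth outcomes gathered from observed learner responses. The record fields, labels, and function name are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class EvaluationRecord:
    """One logged interaction: the AI's prediction and the observed outcome."""
    student_id: str
    ai_prediction: str   # e.g., predicted mastery level ("proficient", "developing")
    ground_truth: str    # observed, verified outcome from a scored assessment

def accuracy_against_ground_truth(records: list[EvaluationRecord]) -> float:
    """Share of AI predictions that match the observed (ground-truth) outcome."""
    if not records:
        return 0.0
    correct = sum(1 for r in records if r.ai_prediction == r.ground_truth)
    return correct / len(records)

# Example: three logged interactions, two correct predictions -> accuracy 0.67
records = [
    EvaluationRecord("s1", "proficient", "proficient"),
    EvaluationRecord("s2", "developing", "proficient"),
    EvaluationRecord("s3", "developing", "developing"),
]
print(f"Ground-truth accuracy: {accuracy_against_ground_truth(records):.2f}")
```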

“AI-driven continuous evaluation in education transforms feedback into real-time insights, driving ongoing system improvement”

There are various AI-driven continuous evaluation frameworks that educational technologists can choose from to understand learner performance. The following areas – multi-dimensional assessment, stakeholder feedback loops, comparative benchmarking, cross-institutional data sharing (with privacy protections) and adaptation protocols – are current good practice in applying continuous evaluation frameworks for AI use across various education environments.

Multi-dimensional Assessment Approach

With a multi-dimensional assessment approach, contemporary evaluation frameworks no longer focus solely on academic metrics but incorporate multiple dimensions:

• Learning outcome measurements track subject mastery using adaptive assessments that adjust to user/learner progress.

• Engagement metrics measure not just time spent but the quality of interactions with AI systems.

• Equity indicators show whether AI systems are serving all student populations effectively.

• Well-being factors, inherent in some multi-dimensional approaches, check for potential negative impact on student mental health or social development.

Currently, multi-dimensional or multi-modal assessments incorporate various data inputs that integrate traditional assessment with digital learning behaviors such as biometrics and verbal/non-verbal cues. Process-based analytics that capture not only answers but also keystroke patterns, eye tracking, and time-on-task metrics likewise align with a multi-dimensional approach to AI systems. These assessment practices, along with collaboration assessment models that evaluate group dynamics and collaborative learning through interaction analysis, offer avenues to capture a holistic view of student learning.
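
As a rough illustration of the multi-dimensional idea, the sketch below combines learning-outcome, engagement, equity, and well-being signals into a single review record. The thresholds, weights, and field names are illustrative assumptions, not a validated scoring model.

```python
from dataclasses import dataclass

@dataclass
class MultiDimensionalSnapshot:
    """One evaluation window across the four dimensions discussed above."""
    outcome_mastery: float     # 0-1, from adaptive assessments
    engagement_quality: float  # 0-1, quality of AI interactions, not just time spent
    equity_gap: float          # 0-1, performance gap across student subgroups (lower is better)
    wellbeing_flag: bool       # True if a negative well-being signal was observed

def needs_review(s: MultiDimensionalSnapshot) -> bool:
    """Flag the system for human review when any dimension crosses an (illustrative) threshold."""
    return (
        s.outcome_mastery < 0.6
        or s.engagement_quality < 0.5
        or s.equity_gap > 0.15
        or s.wellbeing_flag
    )

snapshot = MultiDimensionalSnapshot(0.72, 0.64, 0.20, False)
print("Flag for review:", needs_review(snapshot))  # True: equity gap exceeds threshold
```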

Stakeholder Feedback Loops

With continuous evaluation frameworks that include AI, effective evaluation now includes structured input from all parties to provide 360° feedback for learning. In a K-12 environment, for instance, stakeholder feedback loops with AI often look like the following (a sketch of how such feedback might be rolled up appears after the list):

• Student feedback mechanisms for assignments built directly into AI interfaces.

• Teacher observation protocols to document classroom impact on learning.

• Parent portals that gather insights about home learning experiences based on parental use of the portal.

• Administrator review cycles that assess institutional integration and student performance.
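
One way to picture a 360° feedback loop in code: the hypothetical sketch below rolls up ratings from students, teachers, parents, and administrators into a per-tool summary so no single voice dominates. The stakeholder groups, tool names, and 1–5 rating scale are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Each feedback event: (stakeholder group, AI tool, 1-5 rating)
feedback_events = [
    ("student", "math_tutor", 4),
    ("teacher", "math_tutor", 3),
    ("parent", "math_tutor", 5),
    ("administrator", "math_tutor", 4),
    ("student", "writing_coach", 2),
    ("teacher", "writing_coach", 3),
]

def summarize_by_tool(events):
    """Average rating per tool, broken out by stakeholder group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for group, tool, rating in events:
        grouped[tool][group].append(rating)
    return {
        tool: {group: round(mean(ratings), 2) for group, ratings in by_group.items()}
        for tool, by_group in grouped.items()
    }

print(summarize_by_tool(feedback_events))
```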

Comparative Benchmarking

Similar to TIMSS, which benchmarks K-12 mathematics and science achievement, the Programme for International Student Assessment (PISA) uses strong comparative benchmarking frameworks that include contextual performance analysis to measure student performance over time and to determine whether education reforms and policies produce positive changes across three-year assessment cycles. PISA comparative benchmarking is considered a best-practice model to guide the formation of national or federal assessment practices.
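
As a hedged sketch of the longitudinal comparison this kind of benchmarking implies, the snippet below compares hypothetical national cohort scores against an international benchmark across successive three-year cycles. The cycle years and scores are made-up illustrative values, not PISA results.

```python
# Hypothetical cohort means vs. an international benchmark across three-year cycles
cycles = [2018, 2021, 2024]
national_scores = {2018: 487, 2021: 492, 2024: 499}        # illustrative values only
international_benchmark = {2018: 489, 2021: 488, 2024: 490}

for year in cycles:
    gap = national_scores[year] - international_benchmark[year]
    trend = "above" if gap >= 0 else "below"
    print(f"{year}: national {national_scores[year]}, benchmark "
          f"{international_benchmark[year]} ({gap:+d}, {trend} benchmark)")

# Longitudinal change across cycles, beyond a single year's gain
change = national_scores[cycles[-1]] - national_scores[cycles[0]]
print(f"Change {cycles[0]}-{cycles[-1]}: {change:+d} points")
```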

Comparative benchmarking has been found to:

• Validate national data against international benchmarks.

• Help in evaluating the effectiveness of educational reform.

• Set and monitor school system performance targets and indicators.

• Establish cross-institutional data sharing (with privacy protections).

• Construct control group comparisons where appropriate.

• Generate longitudinal tracking to show long-term impact beyond immediate school year gains when using AI tools.

Adaptation Protocols

Modern evaluation frameworks connected to AI-enabled learning use adaptation protocols to close the loop. Adaptive assessment protocols for AI-enabled learning range from knowledge graph navigation, which maps students’ conceptual understanding and generates targeted questions to expose weak areas in a student’s knowledge structure, to scenario-based adaptive assessment, which relies on dynamic simulations that branch based on students’ demonstrated competencies and decisions. Some adaptation protocols use continuous calibration systems that recalibrate questions in real time; the GMAT and GRE exams, for example, adjust question difficulty based on immediate test-taker performance.
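
To show what continuous calibration can look like in miniature, here is a simplified Elo-style item-difficulty update, one of several possible schemes and not how the GMAT or GRE actually implement their algorithms. Each response nudges the item’s estimated difficulty and the learner’s estimated ability, so the next question can be selected near the learner’s level. The learning-rate constant and starting values are illustrative assumptions.

```python
import math

def expected_correct(ability: float, difficulty: float) -> float:
    """Logistic probability that a learner of given ability answers this item correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def recalibrate(ability: float, difficulty: float, correct: bool, k: float = 0.1):
    """Elo-style update: ability and item difficulty move in opposite directions after each response."""
    p = expected_correct(ability, difficulty)
    delta = k * ((1.0 if correct else 0.0) - p)
    return ability + delta, difficulty - delta

ability, difficulty = 0.0, 0.0
for answered_correctly in [True, True, False, True]:
    ability, difficulty = recalibrate(ability, difficulty, answered_correctly)
    print(f"ability={ability:+.3f}, item difficulty={difficulty:+.3f}")
```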

Conclusion

Regardless of which continuous evaluation framework for AI is being reviewed, education technologists need to consider learner level when selecting a protocol. Be sure to:

• Select version control systems that track adjustments to AI models.

• Consider the impact on current assessment requirements before implementing significant changes.

• Develop transparency reporting to all stakeholders about system performance and changes.
