Measuring Creativity with Deep Learning Techniques: A Scoping Review of Automated Evaluation Methods

1. Introduction

Creativity is increasingly recognized as a crucial 21st-century skill, becoming a core component of educational policies and curricula worldwide (Plucker et al., 2023). This multifaceted concept has seen significant research advancements in understanding its various elements, including idea generation in collaborative creative processes (Sawyer, 2011, 2022). Furthermore, the critical role of creativity evaluation has emerged as a key area of study (Guo et al., 2023). Creativity evaluation, defined as the capacity to accurately identify creative ideas, solutions, or individual traits, is essential for understanding creative strengths and potential (Kim et al., 2019). In education, this evaluation is particularly vital for teachers and students, facilitating the monitoring, refinement, and implementation of innovative ideas, ultimately enhancing students’ creative performance throughout the creative process (Rominger et al., 2022).

However, measuring creativity presents a complex challenge in research. Creativity evaluation traditionally focuses on four core dimensions: fluency (the number of meaningful ideas), flexibility (the variety of idea categories), elaboration (the depth of detail in ideas), and novelty (the uniqueness of ideas) (Bozkurt Altan and Tan, 2021). Manual creativity evaluations, including paper-based tests and psychological assessments, have been widely employed (Rafner et al., 2022). Examples of these include the Torrance Tests of Creative Thinking (Torrance, 2008), the Creativity Assessment Packet (CAP) (Williams, 1980), and Divergent Production abilities (DP) tests (Guilford, 1967). Other manual methods encompass rating scales (Gong and Zhang, 2017; Birkey and Hausserman, 2019), surveys and questionnaires (De Stobbeleir et al., 2011; Gong et al., 2019), grading rubrics (Vo and Asojo, 2018), and subjective scoring of creativity dimensions (George and Wiley, 2020). Despite their prevalence, these manual methods are prone to errors due to subjectivity in expert ratings and are notably time-consuming (Said-Metwaly et al., 2017; Doboli et al., 2020).

To overcome these limitations, automated creativity evaluation leveraging Artificial Intelligence (AI) techniques offers a promising alternative. AI-driven approaches can also enrich co-creation processes by providing real-time feedback, guiding students toward the development of novel solutions (George and Wiley, 2020; Kenworthy et al., 2023). AI, in essence, empowers machines to execute tasks that typically require human intelligence. Within AI, Machine Learning (ML) algorithms enable systems to learn from data and make predictions. Specifically, computer vision is used for analyzing visual data, while Natural Language Processing (NLP) is employed for textual data analysis. Given the focus on textual ideas in creativity, NLP becomes instrumental in enabling machines to understand, interpret, analyze, and generate human language (Braun et al., 2017).

NLP encompasses a diverse array of approaches and techniques, including text similarity, text classification, topic modeling, information extraction, and text generation. These techniques utilize computational methods ranging from statistical analyses to sophisticated predictive and deep learning models. NLP provides various avenues for computing variables relevant to creativity dimensions. Within the vector space created by NLP, five key variables can be derived: (1) Contextual and semantic similarity, used to gauge idea uniqueness and originality (Hass, 2017; Doboli et al., 2020); (2) text clustering, capable of identifying different categories within text; (3) text classification, employed to compute novelty (Simpson et al., 2019); (4) keyword searching, primarily used for measuring elaboration (Dumas et al., 2021); and (5) information retrieval, applicable for scoring the level of idea elaboration (Vartanian et al., 2020). These applications of NLP in co-creative processes can automate creativity evaluation and enhance co-creation by providing valuable feedback (Bae et al., 2020; Kang et al., 2021; Kovalkov et al., 2021).

Current research is increasingly focused on exploring how various computational techniques, particularly deep learning, can be effectively used for measuring creativity dimensions (Doboli et al., 2020). This area of research has been highly productive, leading to the development of diverse computational techniques. For instance, (1) novelty is assessed using keyword similarity (Prasch et al., 2020), part-of-speech tagging (Karampiperis et al., 2014; Camburn et al., 2019), and various ML classifiers like Bayesian classifiers, random trees, and Support Vector Machines (SVM) (Manske and Hoppe, 2014; Simpson et al., 2019; Doboli et al., 2020); (2) originality is measured using Latent Semantic Analysis (LSA) (Dunbar and Forster, 2009), Global Vectors for Word Representation (GloVe) (Dumas et al., 2021), and part-of-speech tagging (Georgiev and Casakin, 2019); (3) fluency is evaluated with LSA (Dumas and Dunbar, 2014; LaVoie et al., 2020); (4) elaboration is measured via part-of-speech tagging (Dumas et al., 2021); and (5) the level of detail is assessed using text-mining methods (Camburn et al., 2019).

This study addresses four main challenges in the current research landscape of computational creativity assessment: (1) the wide range of computational techniques applied to evaluate diverse creativity dimensions; (2) the lack of consensus on specific techniques for measuring particular creativity dimensions; (3) the often-unexplained rationale behind using certain techniques for specific dimensions, such as using LSA for category switch evaluation (Dunbar and Forster, 2009); and (4) the necessity to consider the inherent limitations of computational techniques that may affect the accuracy of creativity dimension evaluation (Olivares-Rodríguez et al., 2017; Doboli et al., 2020). To the best of our knowledge, no existing literature review comprehensively addresses these challenges. This exploration leads us to two key research questions: (1) What NLP approaches and techniques are currently employed to automatically measure creativity? and (2) Which creativity dimensions are being automatically computed, and how? Answering these questions allows us to tackle the aforementioned challenges in automatic creativity evaluation, fostering a deeper understanding of NLP approaches and creativity dimensions and their applications in evaluating creativity, identifying research gaps and limitations, and proposing alternative solutions to advance the evaluation and promotion of creativity. Therefore, we adopted a scoping review methodology, which is effective for understanding key concepts and identifying knowledge gaps (Munn et al., 2018), ultimately aiming to inspire innovation and improve education through advanced technologies.

2. Research Objectives

This scoping review is structured around two primary objectives:

  1. To identify and categorize the various ML approaches utilized in automatic creativity evaluation. This includes highlighting their application scenarios and discussing the inherent limitations of different computational approaches and techniques. This categorization aims to provide a more profound understanding of the contributions of various ML approaches to the field of automated creativity assessment.

  2. To analyze the definitions and computational methods used for different creativity dimensions in automatic creativity evaluation research. This analysis seeks to foster a more unified understanding of creativity dimensions and their computation, paving the way for future advancements in automatic creativity evaluation methodologies and ensuring more robust measures of creativity.

3. Method

This section outlines the sampling method used to gather and synthesize state-of-the-art approaches in automatic creativity evaluation. Our methodological framework follows the PRISMA technique (Dickson and Yeung, 2022), employing a scoping review to identify relevant and significant research papers based on four core concepts:

  1. Creativity: Articles must be directly related to creativity, particularly the creative process (Sawyer, 2011).

  2. Measurement/Evaluation/Assessment of Creativity Dimensions: The studies must focus on methods for measuring, evaluating, or assessing creativity dimensions.

  3. Technology: We selected studies that utilize technology for assistance or evaluation. This concept aims to review technological support in creativity evaluation and explore future research in the creative process involving technology.

  4. Domain: We focused on creativity processes applicable within the educational sector, aiming to enhance student creativity. Fields like medicine, finance, and business were excluded from the search.

Considering these core concepts, we included peer-reviewed journal articles and conference papers in this mapping study, searching the Scopus database for publications between 2005 and 2021. Interestingly, despite the time span, the earliest study meeting our inclusion criteria dates back to 2009, with the majority published in recent years. This indicates that automatic creativity evaluation is a relatively recent area of research that is rapidly gaining attention and remains an active and evolving field.

We excluded articles focusing solely on individual or organizational creativity evaluation, domains outside of education (e.g., medicine and finance), articles not in English, those published before 2005, and studies lacking a technological component in creativity evaluation.

For this scoping review, we extracted articles from Scopus using the search query: [(creativ* OR “Creative Process” OR “Novelty” OR “Flexibility” OR “Fluency” OR “Elaboration” OR “Originality”) AND (Measur* OR Evaluat* OR Asses* OR Calcul* OR Analys* OR Scor* OR Qunat*) AND (Automat* OR Comput* OR Machin* OR Natural* OR Artificial* OR Deep learning OR Mathemat* OR Mining) AND (E-learning OR educa* OR Learn* OR School OR students*)].

This search yielded 364 research articles. Applying inclusion and exclusion criteria through title, abstract, keyword, and conclusion review narrowed the selection to 65 articles. Subsequently, the authors thoroughly read, checked, and discussed these selected articles, conducting all screening stages to address the two research questions. Discrepancies were resolved through consensus among the authors, a member-checking process to ensure “trustworthiness” in qualitative research (Toma, 2011). After this rigorous process, 26 articles were ultimately included in this scoping review. The complete article selection procedure, following the PRISMA technique, is illustrated in Figure 1.

FIGURE 1

Figure 1. Screening procedure of the articles using the PRISMA technique.

4. Results

4.1. Approaches and Techniques Used in Automatic Creativity Evaluation (RQ1)

The compilation of computational approaches and techniques employed in automatic creativity evaluation research, aimed at answering the first research question, revealed three key findings:

Firstly, research in creativity evaluation is distributed across three primary NLP approaches: (1) Text Similarity, which measures the relatedness and closeness of words, sentences, or paragraphs within a numerical space; (2) Text Classification, a supervised learning approach (requiring data training) that utilizes ML algorithms (such as K-Nearest Neighbor (KNN) and Random Forest) to automatically analyze text and assign predefined tags or categories; and (3) Text Mining, which uses NLP to examine and transform large volumes of unstructured text data to uncover new information and patterns. These three NLP approaches and their associated computational techniques, as identified in the reviewed studies, are depicted in Figure 2.

FIGURE 2

Figure 2. Different NLP approaches in creativity evaluation.

Secondly, the scoping review indicated that text similarity is the most frequently used approach (in 69% of the reviewed studies), followed by text classification (27%), with text mining being the least common (only 4% of studies), as illustrated in Figure 2.

Thirdly, our review identified and categorized the specific computational techniques used within these three NLP approaches (text similarity, text classification, and text mining), and the creativity dimensions they were used to automatically evaluate. The following sections present the mapping constructed from a detailed analysis of all studies included in this scoping review, providing a comprehensive overview of techniques for measuring creativity with deep learning techniques and other NLP methods.

Within the text similarity approach, NLP transforms textual ideas into a numerical vector space. This conversion in the reviewed studies employed a wide range of techniques, categorized into three main types: string-based similarity, corpus-based similarity, and knowledge-based similarity. These categories and their computational techniques are shown in Figure 3, with Table 1 mapping the automatic creativity evaluation studies to these categories and specific techniques.

FIGURE 3

Figure 3. Text similarity approaches, categories, sub-categories, and their computational techniques.

TABLE 1

Table 1. Categorization of the reviewed studies by text similarity category, with the percentage of reviewed studies using each approach.

In the first category, string-based similarity (6% of text similarity approaches in reviewed studies) focuses on matching exact keywords or alphabet strings, using techniques like Longest Common Substring (LCS) or N-gram (a subsequence of n items from a given text sequence). Keyword matching is used to compute the string similarity of ideas with existing ideas in a database (Prasch et al., 2020).
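As a simple illustration of string-based matching (not drawn from any reviewed study), the following sketch computes two such measures, a longest-common-substring ratio and a character N-gram overlap, for a pair of hypothetical ideas; the function names are our own.

```python
# Minimal sketch of string-based similarity (hypothetical helper names).
from difflib import SequenceMatcher

def lcs_ratio(idea_a: str, idea_b: str) -> float:
    """Length of the longest common substring, normalized by the shorter idea."""
    matcher = SequenceMatcher(None, idea_a.lower(), idea_b.lower())
    match = matcher.find_longest_match(0, len(idea_a), 0, len(idea_b))
    return match.size / max(1, min(len(idea_a), len(idea_b)))

def ngram_overlap(idea_a: str, idea_b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams (a simple N-gram similarity)."""
    def grams(s):
        return {s[i:i + n] for i in range(max(0, len(s) - n + 1))}
    a, b = grams(idea_a.lower()), grams(idea_b.lower())
    return len(a & b) / max(1, len(a | b))

print(lcs_ratio("solar panel roof", "solar panel car"))   # exact-substring overlap
print(ngram_overlap("solar panel roof", "solar panel car"))
```

As the output suggests, two ideas score as highly similar whenever they share surface strings, regardless of whether their meanings differ.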

The second category, corpus-based similarity, is the most prevalent (72% of textual similarity approaches), as detailed in Table 1. This category is further divided into two sub-categories: statistical-based models and deep learning-based models. Statistical-based models, such as LSA, represent a corpus in a word-document matrix with words as row vectors and documents as column vectors, applying weighting and dimension reduction schemes before calculating cosine similarity between word vectors (Martin and Berry, 2007; Wagire et al., 2020). Deep learning-based models (both word and sentence embeddings) use supervised, semi-supervised, or unsupervised (requiring no labeled data) methods and are trained on large corpora such as Wikipedia and the Common Crawl dataset. Deep learning models such as Word2Vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014) leverage knowledge from extensive datasets, encode data, and identify word or sentence similarities. The GloVe model has demonstrated reliable results, particularly for single-word creativity tasks, showing comparability with expert scores (Beaty and Johnson, 2021; Johnson and Hass, 2022). This highlights the increasing role of deep learning techniques in measuring creativity.
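To make the statistical, corpus-based workflow concrete, the following is a minimal LSA-style sketch, assuming Python with scikit-learn; the idea texts are illustrative placeholders rather than data from the reviewed studies.

```python
# Minimal sketch of statistical corpus-based similarity (LSA-style), assuming scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

ideas = [
    "use old tires as garden planters",
    "recycle tires into playground flooring",
    "a solar powered water heater for homes",
]

# 1) Weighted word-document matrix (TF-IDF weighting instead of raw counts).
tfidf = TfidfVectorizer(stop_words="english").fit_transform(ideas)

# 2) Dimension reduction, the step that gives LSA its latent semantic space.
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# 3) Cosine similarity between idea vectors; low similarity suggests originality.
print(cosine_similarity(lsa))
```

In this toy example, the two tire-related ideas end up closer to each other in the reduced space than either is to the solar-heater idea.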

The third category is knowledge-based similarity (22% of text similarity approaches in reviewed studies, Table 1), which uses ontologies to represent textual data as semantic network graphs of nodes (semantic memory) and lines. Ontologies are extensive dictionaries of millions of lexically associated words, such as WordNet, Wikipedia, and DBpedia.

Text classification, the second NLP approach, was used in 27% of the reviewed studies for automatic creativity evaluation (Figure 2). Classification is an ML technique categorizing text into predefined categories, involving four main steps: (1) data collection, pre-processing (data acquisition, cleaning, and labeling), and data representation (feature selection, training/testing dataset division); (2) application of classifier models; (3) classifier evaluation; and (4) prediction (testing data output). These steps are crucial when using text classification for measuring creativity. Table 2 provides an overview of the classification approach, datasets, classifiers, evaluations, and creativity dimensions in creativity evaluation research.

TABLE 2

Table 2. Text classification-based creativity evaluation studies.
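The four classification steps outlined above can be sketched as follows, assuming scikit-learn and a small, hypothetical labeled set of ideas; the studies in Table 2 used far larger datasets and various classifiers.

```python
# Hedged sketch of the four text classification steps, with placeholder ideas and tags.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# (1) Data collection and labeling: hypothetical ideas with hypothetical tags.
texts = [
    "use a brick as a doorstop", "use a brick to build a wall",
    "grind a brick into pigment for painting", "use a brick as a paperweight",
    "carve a brick into a chess piece", "use a brick to hold a door open",
]
labels = ["common", "common", "novel", "common", "novel", "common"]

# Data representation: TF-IDF features and a train/test split (here 70/30).
X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels)

# (2) Apply a classifier model (SVM here; KNN or random forest are alternatives).
clf = LinearSVC().fit(X_train, y_train)

# (3) Evaluate the classifier and (4) predict categories for unseen ideas.
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```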

Text mining, the third approach in automatic creativity evaluation, involves analyzing large textual data collections to identify key concepts, trends, patterns, and hidden relationships. In this review, text mining was used in studies like Dumas et al. (2021), employing techniques such as counting all words, applying a stop list (removing non-meaningful terms), part-of-speech counting, and inverse document frequency weighting (giving greater weight to rare, informative words).

4.2. Creativity Dimensions Computed Automatically (RQ2)

Across the studies included in this review of automatic creativity evaluation, we identified 25 distinct creativity dimensions. These are listed in the second column (Manifestation) of Table 3. Analyzing the conceptual definitions and computational approaches used in studies assessing different creativity dimensions allowed us to categorize these 25 manifestations into seven core creativity dimensions: novelty, value, flexibility, elaboration, fluency, feasibility, and others related to playful creativity aspects like humor. These core dimensions are presented in the first column of Table 3 (Core Dimension).

TABLE 3

Table 3. Characterization of the 25 manifested creativity dimensions (second column) into seven core creativity dimensions (first column), based on similarities in their definitions (third column) and computation (fourth column).

Furthermore, the results answering research question two are visualized in Figure 4, showing the percentage distribution of the seven core creativity dimensions identified. Novelty emerges as the most frequently evaluated dimension in the reviewed studies.

FIGURE 4

Figure 4. Percentage distribution of each core creativity dimension in the reviewed studies.

5. Discussion

5.1. Approaches and Techniques Used in Automatic Creativity Evaluation

This scoping review identified three primary NLP approaches used in automatic creativity evaluation: (1) text similarity, (2) text classification, and (3) text mining. The following sections discuss each approach’s contribution, applications, limitations, research gaps, and recommendations for future automatic creativity evaluations, particularly focusing on the role of deep learning techniques.

The prevalence of the text similarity approach, used in 69% of studies, highlights its importance in understanding creative thinking (Li et al., 2023). Its widespread use in automatic creativity evaluation stems from the focus on evaluating originality, novelty, similarity, and diversity – dimensions that inherently involve assessing the similarity of ideas to existing ones. The text similarity approach offers a variety of computational techniques for this, as shown in Figure 3.

Comparing the three categories of text similarity – string-based similarity, corpus-based similarity, and knowledge-based similarity – as detailed in Table 1, reveals differences in their similarity computation processes and applicability. String-based and knowledge-based similarities have limited application in automatic creativity evaluation. String-based similarity is restricted by its focus on syntactic similarity rather than semantic meaning, while knowledge-based similarity often extracts specific entities rather than capturing the nuanced technical or scientific jargon used in complex ideas (Camburn et al., 2019). For example, in brainstorming renewable energy solutions, a knowledge-based approach might miss specific terms like “photovoltaics” or “wind turbines.” Corpus-based techniques are more widely used, and we will elaborate on these further, particularly focusing on deep learning.

Corpus-based similarity is commonly used in automatic evaluation due to its range of techniques, from statistical to deep learning models (Figure 3). Statistical models like LSA, applied to examine semantic similarity, memory, and creativity (Beaty and Johnson, 2021), have shown reliable originality scoring in divergent thinking tasks, sometimes outperforming human raters (Dunbar and Forster, 2009; Dumas and Dunbar, 2014; LaVoie et al., 2020; see Table 1). However, LSA and related statistical techniques, such as Probabilistic Latent Semantic Analysis (Hofmann, 1999), Latent Dirichlet Allocation (Blei et al., 2003), and Non-Negative Matrix Factorization (Lee and Seung, 1999), are limited because they primarily capture word statistics (e.g., word co-occurrence) rather than contextual and semantic meaning. Deep learning models address these limitations.

Recent advancements in NLP, particularly with deep learning models based on deep neural architectures, have revolutionized text modeling, allowing for greater nuance and complexity. This began with word embedding models like GloVe and Word2Vec, pre-trained on extensive datasets like Wikipedia and news articles. These predictive models utilize neural networks with hidden layers to learn word vector representations. GloVe has shown results comparable to human expert scores in single-word creativity tasks (Beaty and Johnson, 2021; Olson et al., 2021), demonstrating the potential of deep learning for measuring creativity. However, word embedding models don’t differentiate between keyword lists and meaningful sentences, limiting their ability to capture semantic and contextual sentence meaning.
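As an illustration of how such word embeddings can score a single-word response, the following sketch assumes the gensim package and its downloadable “glove-wiki-gigaword-50” vectors; the prompt-response pair is hypothetical and both words must be in the model’s vocabulary.

```python
# Sketch of scoring a single-word response with pre-trained GloVe word vectors,
# assuming gensim's downloader; the model name and word pair are illustrative.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # any pre-trained word embedding model

prompt, response = "brick", "painting"

# Semantic distance (1 - cosine similarity) between prompt and response words;
# larger distances are commonly interpreted as more original responses.
print(1 - glove.similarity(prompt, response))
```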

Vectorizing entire sentences marks a significant innovation in text modeling. Transformer architectures, leveraging the concept of attention (Vaswani et al., 2017), generally outperform word embedding models by significant margins in standard tasks (Wang et al., 2018, 2019). Attention makes it computationally feasible for transformer models to process long text sequences by focusing on the most important parts. This has led to two main categories: pre-trained sentence embedding models and text generation models.

Sentence embedding models vectorize entire sentences, preserving semantic and contextual meaning. Unsupervised techniques like Unsupervised Smooth Inverse Frequency (uSIF) (Ethayarajh, 2018) and Geometric Sentence Embedding (GEM) (Yang et al., 2018) require no task-specific training data. Pre-trained models such as BERT (Devlin et al., 2018), Sentence Transformer (Reimers and Gurevych, 2019), MPNet (Song et al., 2020), Skip-Thought (ST) (Kiros et al., 2015), InferSent (Conneau et al., 2017), and the Universal Sentence Encoder (USE) (Cer et al., 2018) can be fine-tuned or trained on specific datasets for improved performance. The USE model has been used in creativity research to evaluate idea novelty (Kenworthy et al., 2023), suggesting that further exploration of various sentence embedding models, or combinations of them, for evaluating creative ideas in open-ended co-creation is warranted.
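A minimal sketch of sentence-level similarity with a pre-trained sentence embedding model is shown below, assuming the sentence-transformers package; the model choice and example ideas are illustrative, not the specific setups used in the reviewed studies.

```python
# Sketch of sentence-level similarity with a pre-trained sentence embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any pre-trained sentence embedding model

ideas = [
    "cover bus stops with moss walls that filter the air",
    "plant moss panels at bus shelters to clean the air",
    "give every student a reusable water bottle",
]

# Whole sentences are encoded, so word order and context shape the vectors.
embeddings = model.encode(ideas, convert_to_tensor=True)

# Pairwise cosine similarities; the first two ideas should score much closer
# to each other than either does to the third.
print(util.cos_sim(embeddings, embeddings))
```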

Text generation models, such as the Generative Pre-trained Transformer (GPT-3) (Brown et al., 2020), the Text-to-Text Transfer Transformer (T5) (Raffel et al., 2020), and Long Short-Term Memory (LSTM) networks (Huang et al., 2022), generate new text similar to a given prompt. In creativity research, Generative Adversarial Networks (GANs) (Aggarwal et al., 2021), a generative modeling approach, were used by Franceschelli and Musolesi (2022) to evaluate novelty, surprise, and relevance, demonstrating the application of deep learning techniques in measuring creativity. However, there are criticisms regarding text generation models for evaluating open-ended ideas. Firstly, text generation is optimized for producing text from given prompts, which is useful for dialog generation, machine translation, chatbots, and prompt-based learning (Liu et al., 2023). Secondly, as models improve at text generation, they may become more likely to replicate input data rather than produce novel or creative outputs. Despite these concerns, text generation models have not been extensively tested in creativity research, suggesting that future investigations are needed to understand their limitations and potential in measuring creativity.

In conclusion, for single-word tasks in creativity research, word embedding models, particularly GloVe, are effective. For open-ended co-creation with sentence-structured ideas, sentence embedding models are more suitable because they (a) represent entire sentences in vector space, capturing semantic and contextual meaning; (b) outperform word embedding models in textual similarity tasks; and (c) can be applied to small datasets and open-ended problems due to pre-training on large corpora. We recommend validating sentence embedding models and further exploring text generation models within broader co-creation contexts to fully understand the potential of deep learning techniques for measuring creativity.

Sentence embedding models offer a powerful tool that can be used alongside statistical (Acar et al., 2021), word embedding models (Organisciak et al., 2023), and standard subjective scoring methods for evaluating the creative process and its outputs (Kenett, 2019).

The text classification approach automates the categorization of textual data into predefined classes using machine learning classifiers. This approach relies heavily on a large dataset, typically split into training (70%) and testing (30%) sets. The ML classifier learns from the training set and then categorizes the testing set. Integrating text classification into automatic creativity evaluation is contingent on four key factors: dataset quality and size, ML classifier selection, classifier accuracy, and the specific creativity dimensions being evaluated. These factors are highlighted in Table 2.

Dataset considerations are critical for text classification. Firstly, datasets require pre-processing and labeling, including noise removal and assigning class labels to each idea. Secondly, large datasets are essential for training ML classifiers effectively; classifier prediction capability improves with larger training datasets. Most studies reviewed in Table 2, except Stella and Kenett (2019), used over a thousand ideas for classification. Smaller datasets may require different or more balanced approaches. Thirdly, classifiers trained on one data type are not transferable to other data types. For example, classifiers trained on linguistic data cannot be directly applied to scientific data.

Classifier selection and accuracy are also crucial. Different ML classifiers operate differently and are suited to different dataset characteristics. For example, SVM performs well in multiclass classification, random forests excel with numerical and categorical features, logistic regression suits linearly separable problems, and K-Nearest Neighbor works well for text; Bayesian approaches are simple and fast algorithms. The reviewed studies often lacked justification for specific classifier choices. Classifier accuracy is also a concern, with potential for low accuracy. Model accuracy is evaluated using metrics like confusion matrices, entropy, and sensitivity (Table 2). It is advisable to test multiple classifiers and select the one with the highest accuracy for prediction within a similar domain, optimizing the measurement of creativity.
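The recommendation to compare several classifiers can be sketched as follows, assuming scikit-learn; the synthetic dataset stands in for any vectorized, labeled set of ideas.

```python
# Sketch of comparing several classifiers and keeping the most accurate one.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Placeholder features and labels; in practice these would be vectorized ideas and tags.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

candidates = {
    "SVM": LinearSVC(),
    "Random forest": RandomForestClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

# Cross-validated accuracy for each candidate; keep the best performer for prediction.
scores = {name: cross_val_score(clf, X, y, cv=5).mean() for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```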

While text classification can evaluate various creativity dimensions, its reliance on large, labeled datasets limits its application in creativity research. Dataset preparation and labeling can be expensive, potentially negating the advantages of automatic evaluation over manual methods in terms of accuracy, cost, and time. Furthermore, text classification problems are domain-dependent. While public datasets exist for tasks like object use and alternate use tasks, these may not be suitable for small, open-ended creative tasks that are domain-independent and lack sufficient data for classifier training. In summary, the extensive data preparation, labeling, and domain dependence can make text classification less reliable and more expensive than manual creativity evaluation for certain types of creativity assessment.

Text mining uses NLP statistical computations to discover new information and patterns, employing statistical indicators like word frequency, patterns, and correlations. Dumas et al. (2021) used text mining techniques to measure elaboration scores in Alternate Use Tasks (AUT), using methods like unweighted word count, stop list inclusion, part-of-speech counting, and inverse document frequency.

These text mining techniques represent basic statistical NLP operations. Text mining has the potential to process massive datasets to uncover new information, patterns, trends, and relationships relevant to creativity research. Applications include search engines, product suggestion analysis, social media analytics, and trend analysis, suggesting broader applications in measuring creativity.

5.2. Automatically Computed Creativity Dimensions

This scoping review identified 25 automatically computed creativity dimensions. However, our analysis indicates that these dimensions are not always well-grounded in prior creativity research and theory. This leads to theoretical and methodological inconsistencies that future research needs to address. Firstly, some dimensions are defined and computed based on specific challenges or creativity tasks designed for experiments, rather than on a robust theoretical framework. For example, “category switch” is defined as the similarity difference between successive responses in object use tasks (Dunbar and Forster, 2009). Similarly, “quality” (reusability) and “usefulness” (degree of completion) are defined within the context of programming problems (Manske and Hoppe, 2014). Secondly, inconsistency arises from variations in manifestations across studies. Dimensions like novelty (Prasch et al., 2020), similarity (LaVoie et al., 2020), and originality (Beaty and Johnson, 2021) are often similarly defined, focusing on idea or solution similarity, and measured using semantic textual similarity, albeit with different computational techniques.

To address these shortcomings and improve the measurement of creativity, this review analyzed the conceptual and computational frameworks of each study, contributing to the identification of seven core creativity dimensions that can be automatically evaluated more consistently: novelty, elaboration, flexibility, value, feasibility, fluency, and playful aspects (humor, recreational effort). We discuss each core dimension, highlighting conceptual definitions and computational approaches.

Novelty, the most evaluated core dimension (59% of reviewed studies), shows significant diversity in definitions and measures. Studies use different terms for novelty, including: (1) uniqueness (concept distinctiveness (Camburn et al., 2019)); (2) originality (difference from standard solutions, semantic distance between ideas (Georgiev and Casakin, 2019; Beaty and Johnson, 2021)); (3) similarity (meaning similarity between texts, distance between texts (LaVoie et al., 2020; Olson et al., 2021)); (4) diversity (user query diversity); (5) rarity (rare combinations, unique solutions (Karampiperis et al., 2014; Doboli et al., 2020)); (6) common use (difference between common and uncommon solutions); (7) surprise (artifact deviation from existing attributes (Shrivastava et al., 2017)); and (8) influence (artifact comparison with others (Shrivastava et al., 2017)).

Despite labeling diversity, six characteristics emerge as defining novelty and aiding automatic evaluation: (1) deviation from standard problem-solving (Manske and Hoppe, 2014); (2) semantic distance between ideas (Beaty and Johnson, 2021); (3) meaning similarity between texts (LaVoie et al., 2020); (4) semantic similarity of user queries to challenge concepts; (5) property combinations (Karampiperis et al., 2014); and (6) surprise and unexpected ideas (Shrivastava et al., 2017). These characteristics reflect the complexity of defining novelty and the challenges in developing automatic measures of creativity, especially novelty.

Despite these challenges, common computational approaches for measuring novelty as a core dimension include: (1) distance of new solution to existing solutions (Manske and Hoppe, 2014); (2) semantic distance between ideas (Beaty and Johnson, 2021; Olson et al., 2021); (3) semantic similarity of user queries to relevant Wikipedia concepts; (4) semantic distance between story clusters; and (5) semantic distance between consecutive story fragments (Karampiperis et al., 2014). Therefore, when automatically evaluating novelty, semantic distance from existing solutions is a key consideration for measuring creativity.
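A minimal, model-agnostic sketch of this shared computation is given below; the placeholder vectors stand in for embeddings produced by any of the techniques discussed above.

```python
# Sketch of novelty as semantic distance from existing solutions (placeholder vectors).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

existing = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])  # vectors of already-known solutions
candidate = np.array([0.1, 0.2, 0.9])                     # vector of the new idea

# Novelty as 1 - similarity to the nearest existing solution (larger = more distant).
novelty = 1 - max(cosine(candidate, e) for e in existing)
print(round(novelty, 3))
```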

Value, the second core dimension, is related to concepts like overall value (societal perception (Georgiev and Casakin, 2019)); quality (reliability, maintainability, extensibility, adaptability of programming solutions (Manske and Hoppe, 2014)); usefulness (correctness); and adaptiveness (problem-solving effectiveness (Jimenez-Mavillard and Suarez, 2022)). These concepts share a common meaning of usefulness and quality, constituting the value dimension of creativity. In computer science, value, quality, usefulness, adaptiveness, and style are non-functional quality attributes. Value, quality, and usefulness computation varies by task, e.g., programming solution quality is reusability and scalability (Manske and Hoppe, 2014), and usefulness is task completion degree (Prasch et al., 2020). The value dimension requires clearer definitions and computational metrics for accurate measurement of creativity.

Flexibility, the third core dimension, is a key executive function in creative thinking (Boot et al., 2017), driving individuals to explore diverse directions and pathways, increasing the likelihood of highly creative ideas (Zhang et al., 2020; Acar et al., 2021). Flexibility is defined in two ways: category switching (transitioning between semantic concepts (Dunbar and Forster, 2009; Acar et al., 2019; Mastria et al., 2021)); and the number of semantic categories, varieties, or topics generated (Dunbar and Forster, 2009). Different computational approaches are used due to these varying definitions. Flexibility as category switching is measured using semantic similarity approaches like LSA (Dunbar and Forster, 2009), network graphs (Cosgrove et al., 2021), and sentence embedding models. Flexibility as semantic categories is evaluated using text clustering (Sung et al., 2022) or topic modeling techniques (e.g., LDA (Chauhan and Shah, 2021)) to categorize or extract topics from textual ideas, providing different methods for measuring creativity. Category switch flexibility is computationally simpler, requiring text similarity measures rather than complex category identification.
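Both flexibility measures can be sketched as follows, assuming scikit-learn and placeholder embeddings; the switch threshold and the number of clusters are arbitrary, illustrative choices rather than values taken from the reviewed studies.

```python
# Sketch of two flexibility measures: category switching and number of categories.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
response_vectors = rng.normal(size=(6, 50))  # placeholder embeddings, in response order

# (a) Category switching: low similarity between consecutive responses suggests a switch.
consecutive_sim = [
    cosine_similarity(response_vectors[i:i + 1], response_vectors[i + 1:i + 2])[0, 0]
    for i in range(len(response_vectors) - 1)
]
switches = sum(sim < 0.2 for sim in consecutive_sim)  # 0.2 is an arbitrary threshold

# (b) Number of semantic categories: cluster the responses and count the clusters used.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(response_vectors)
n_categories = len(set(labels))

print(switches, n_categories)
```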

Elaboration, another core dimension, is defined as the degree to which participants detail their responses (Camburn et al., 2019; Dumas et al., 2021), adding reasoning or cause to their ideas. Automatic evaluation measures elaboration by counting the words in an idea (Camburn et al., 2019). Four methods for evaluating elaboration include: (1) counting all words (unweighted); (2) counting words excluding stop words; (3) counting nouns, verbs, and adverbs; and (4) counting adjectives and high-weight uncommon words (inverse frequency weighting). More words indicate higher elaboration. However, this computation may miss conjunctions (Tuzcu, 2021) or reasoning words (Sedova et al., 2019; Hennessy et al., 2020) that add explanation. Semantic search for reasoning-related words (e.g., because, therefore, since) could improve the measurement of elaboration.
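Three of these counting methods can be sketched in a few lines of plain Python (part-of-speech counting would additionally require a tagger such as NLTK’s or spaCy’s); the stop list and ideas below are illustrative.

```python
# Sketch of elaboration counts: (1) all words, (2) stop list applied, (4) inverse-frequency weighted.
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the", "to", "of", "and", "it", "as", "into", "for", "is"}

ideas = [
    "use a brick to build a small garden wall",
    "use it as a paperweight",
    "grind the brick into red pigment for painting because it is soft",
]

def tokenize(text):
    return text.lower().split()

# Document frequency of each word across all ideas, for inverse-frequency weights.
doc_freq = Counter(word for idea in ideas for word in set(tokenize(idea)))
n_docs = len(ideas)

for idea in ideas:
    tokens = tokenize(idea)
    unweighted = len(tokens)                                        # (1) all words
    content = [t for t in tokens if t not in STOP_WORDS]            # (2) stop list applied
    weighted = sum(math.log(n_docs / doc_freq[t]) for t in tokens)  # (4) rarer words weigh more
    print(unweighted, len(content), round(weighted, 2))
```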

Fluency is defined as the number of ideas generated. This dimension has a consensus on definition (number of ideas) and computation (counting ideas) (Dumas and Dunbar, 2014; Stella and Kenett, 2019). More ideas increase the chance of original outputs (Dumas and Dunbar, 2014). Fluency is easy to measure and independent of other dimensions, making it a straightforward measure of creativity.

Feasibility is defined as the extent to which a solution is achievable in practice (Georgiev and Casakin, 2019). Transcendence and realization are manifestations of feasibility (Jimenez-Mavillard and Suarez, 2022), referring to practical achievement. While important in creativity research, the automatic computation of feasibility, transcendence, and realization lacks a clear rationale grounded in creativity research. Feasibility is product-oriented and used in ideation, but transforming ideas into practice remains a challenge for automatic measurement of creativity. Further research is needed to automatically measure feasible, transcendent, and realistic ideas.

Other dimensions related to playful creativity aspects include humor (Simpson et al., 2019) and recreational effort (Karampiperis et al., 2014). Humor, the funniness of ideas, is measured by pairwise text comparison. Recreational effort, solution difficulty, is measured using clustering. These dimensions contribute to playful creativity and require clear definitions and computational approaches from both psychology and computer science perspectives for effective measurement of creativity.

6. Conclusion

This scoping review aimed to analyze automatic creativity evaluation from computer science and education perspectives. We addressed two research questions: identifying NLP approaches and techniques used, and analyzing which creativity dimensions are computed and how.

The first research question’s contributions include: (1) identifying ML approaches and techniques in automatic creativity evaluation; (2) categorizing approaches (text similarity, classification, mining), highlighting text similarity as most common; (3) classifying studies by techniques within these approaches (string, corpus, knowledge-based similarity), showing corpus-based methods as widely used; (4) identifying limitations and alternative techniques (e.g., statistical and word embedding limitations, sentence embedding potential); and (5) providing a broad overview of automatic creativity evaluation. We concluded that word embedding models (GloVe) are effective for single-word tasks, while sentence embedding models are promising for open-ended, sentence-structured ideas, especially when measuring creativity with deep learning techniques.

The second research question’s contributions include: examining automatically evaluated creativity dimensions; noting 25 dimensions in automatic creativity evaluation, compared to standardized tests’ four dimensions; analyzing dimension definitions and measures; identifying similarities in definitions and computations; and categorizing 25 dimensions into seven core dimensions (novelty, elaboration, flexibility, value, feasibility, fluency, playful aspects). This analysis provides a coherent framework for core creativity dimensions and their computation, improving the measurement of creativity.

This review bridges computer science and education. For computer scientists, it offers insights to refine NLP approaches and develop novel methods for evaluating and promoting creativity, particularly using deep learning techniques. Educators can use automatic evaluations as pedagogical tools in classrooms. Automatic creativity evaluation can assess and nurture creativity, aligning with educational policy initiatives. Ultimately, AI serves as a valuable tool for evaluating and enhancing creativity, equipping future citizens to innovate solutions for global challenges through improved measures of creativity.

6.1. Limitations and Future Work

This scoping review has two limitations: the search keyword strategy may have missed key articles, and the inclusion/exclusion criteria may have omitted relevant studies. We mitigated these risks by using an inclusive search string and explicit criteria agreed through co-author consensus.

Future work will experimentally evaluate the reliability of deep learning models like sentence embedding models for measuring novelty in open-ended co-creative processes. We also suggest using text generation models to recommend hints for divergent thinking. Addressing the research gap in fully automating core creativity dimensions, we plan to simultaneously measure different core dimensions using ML techniques. Developing reliable automatic evaluation of creativity dimensions can enable real-time recommendations during the creative process, fostering student creativity and improving methods for measuring creativity with deep learning techniques.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

IU contributed to the conceptualization of the paper, the methodology, and the investigation, and participated in writing, revising, and editing the original manuscript. MP is the principal investigator of the research project and designed the project; she also contributed to the conceptualization of the paper, the methodology, and the investigation, and participated in writing, revising, and editing the manuscript. Both authors contributed to the article and approved the submitted version.

Funding

This research has been funded by the Ministry of Science and Innovation of the Government of Spain under Grants EDU2019-107399RB-I00 and PID2022-139060OB-I00.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Acar, S., Berthiaume, K., Grajzel, K., Dumas, D., Flemister, C., and Organisciak, P. (2021). Applying automated originality scoring to the verbal form of torrance tests of creative thinking. Gifted Child Quart. 67, 3–17. doi: 10.1177/00169862211061874

Acar, S., Runco, M. A., and Ogurlu, U. (2019). The moderating influence of idea sequence: A re-analysis of the relationship between category switch and latency. Person. Indiv. Differ. 142, 214–217. doi: 10.1016/j.paid.2018.06.013

Aggarwal, A., Mittal, M., and Battineni, G. (2021). Generative adversarial network: An overview of theory and applications. Int. J. Inform. Manage. Data Insights 1, 100004. doi: 10.1016/j.jjimei.2020.100004

Bae, S. S., Kwon, O.-H., Chandrasegaran, S., and Ma, K.-L. (2020). “Spinneret: aiding creative ideation through non-obvious concept associations,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems 1–13. doi: 10.1145/3313831.3376746

Beaty, R. E., and Johnson, D. R. (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. Behav. Res. Methods 53, 757–780. doi: 10.3758/s13428-020-01453-w

Birkey, R., and Hausserman, C. (2019). “Inducing creativity in accountants’ task performance: The effects of background, environment, and feedback,” in Advances in Accounting Education: Teaching and Curriculum Innovations (Emerald Publishing Limited) 109–133. doi: 10.1108/S1085-462220190000022006

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022. doi: 10.5555/944919.944937

Boot, N., Baas, M., Mühlfeld, E., de Dreu, C. K., and van Gaal, S. (2017). Widespread neural oscillations in the delta band dissociate rule convergence from rule divergence during creative idea generation. Neuropsychologia 104, 8–17. doi: 10.1016/j.neuropsychologia.2017.07.033

Bozkurt Altan, E., and Tan, S. (2021). Concepts of creativity in design-based learning in STEM education. Int. J. Technol. Design Educ. 31, 503–529. doi: 10.1007/s10798-020-09569-y

Braun, D., Hernandez Mendez, A., Matthes, F., and Langen, M. (2017). “Evaluating natural language understanding services for conversational question answering systems,” in Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue (Saarbrucken, Germany: Association for Computational Linguistics) 174–185. doi: 10.18653/v1/W17-5522

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Adv. Neural Inf. Proc. Syst. 33, 1877–1901. doi: 10.48550/arXiv.2005.14165

Camburn, B., He, Y., Raviselvam, S., Luo, J., and Wood, K. (2019). “Evaluating crowdsourced design concepts with machine learning,” in International Design Engineering Technical Conferences and Computers and Information in Engineering Conference 7. doi: 10.1115/DETC2019-97285

Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., John, R. S., et al. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.

Chauhan, U., and Shah, A. (2021). Topic modeling using latent dirichlet allocation: A survey. ACM Comput. Surv. 54, 1–35. doi: 10.1145/3462478

Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364.

Cosgrove, A. L., Kenett, Y. N., Beaty, R. E., and Diaz, M. T. (2021). Quantifying flexibility in thought: The resiliency of semantic networks differs across the lifespan. Cognition 211, 104631. doi: 10.1016/j.cognition.2021.104631

De Stobbeleir, K. E., Ashford, S. J., and Buyens, D. (2011). Self-regulation of creativity at work: The role of feedback-seeking behavior in creative performance. Acad. Manage. J. 54, 811–831. doi: 10.5465/amj.2011.64870144

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Dickson, K., and Yeung, C. A. (2022). PRISMA 2020 updated guideline. Br. Dental J. 232, 760–761. doi: 10.1038/s41415-022-4359-7

Doboli, S., Kenworthy, J., Paulus, P., Minai, A., and Doboli, A. (2020). “A cognitive inspired method for assessing novelty of short-text ideas,” in 2020 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–8. doi: 10.1109/IJCNN48605.2020.9206788

Dumas, D., and Dunbar, K. N. (2014). Understanding fluency and originality: A latent variable perspective. Think. Skills Creat. 14, 56–67. doi: 10.1016/j.tsc.2014.09.003

Dumas, D., Organisciak, P., Maio, S., and Doherty, M. (2021). Four text-mining methods for measuring elaboration. J. Creat. Behav. 55, 517–531. doi: 10.1002/jocb.471

Dunbar, K., and Forster, E. (2009). “Creativity evaluation through latent semantic analysis,” in Proceedings of the Annual Meeting of the Cognitive Science Society, 31.

Ethayarajh, K. (2018). “Unsupervised random walk sentence embeddings: A strong but simple baseline,” in Proceedings of The Third Workshop on Representation Learning for NLP 91–100. doi: 10.18653/v1/W18-3012

Franceschelli, G., and Musolesi, M. (2022). Deepcreativity: Measuring Creativity With Deep Learning Techniques. Intell. Artif. 16, 151–163. doi: 10.3233/IA-220136

George, T., and Wiley, J. (2020). Need something different? Here’s what’s been done: Effects of examples and task instructions on creative idea generation. Memory Cogn. 48, 226–243. doi: 10.3758/s13421-019-01005-4

Georgiev, G. V., and Casakin, H. (2019). “Semantic measures for enhancing creativity in design education,” in Proceedings of the Design Society: International Conference on Engineering Design (Cambridge: Cambridge University Press), 369–378. doi: 10.1017/dsi.2019.40

Gong, Z., Shan, C., and Yu, H. (2019). The relationship between the feedback environment and creativity: a self-motives perspective. Psychol. Res Behav. Manag. 12, 825–837. doi: 10.2147/PRBM.S221670

Gong, Z., and Zhang, N. (2017). Using a feedback environment to improve creative performance: a dynamic affect perspective. Front. Psychol. 8, 1398. doi: 10.3389/fpsyg.2017.01398

Guilford, J. P. (1967). Creativity: Yesterday, today and tomorrow. J. Creat. Behav. 1, 3–14. doi: 10.1002/j.2162-6057.1967.tb00002.x

Guo, Y., Lin, S., Williams, Z. J., Zeng, Y., and Clark, L. Q. C. (2023). Evaluative skill in the creative process: A cross-cultural study. Think. Skills Creativ. 47, 101240. doi: 10.1016/j.tsc.2023.101240

Hass, R. W. (2017). Tracking the dynamics of divergent thinking via semantic distance: Analytic methods and theoretical implications. Memory Cogn. 45, 233–244. doi: 10.3758/s13421-016-0659-y

Hennessy, S., Howe, C., Mercer, N., and Vrikki, M. (2020). Coding classroom dialogue: Methodological considerations for researchers. Learning, Cult. Soc. Interact. 25, 100404. doi: 10.1016/j.lcsi.2020.100404

Hofmann, T. (1999). “Probabilistic latent semantic indexing,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval SIGIR’99 (New York, NY, USA: Association for Computing Machinery), 50–57. doi: 10.1145/312624.312649

Huang, R., Wei, C., Wang, B., Yang, J., Xu, X., Wu, S., et al. (2022). Well performance prediction based on long short-term memory (lstm) neural network. J. Petroleum Sci. Eng. 208, 109686. doi: 10.1016/j.petrol.2021.109686

Jimenez-Mavillard, A., and Suarez, J. L. (2022). A computational approach for creativity assessment of culinary products: the case of elbulli. AI Soc. 37, 331–353. doi: 10.1007/s00146-021-01183-3

Johnson, D. R., and Hass, R. W. (2022). Semantic context search in creative idea generation. J. Creat. Behav. 56, 362–381. doi: 10.1002/jocb.534

Kang, Y., Sun, Z., Wang, S., Huang, Z., Wu, Z., and Ma, X. (2021). “Metamap: Supporting visual metaphor ideation through multi-dimensional example-based exploration,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems 1–15. doi: 10.1145/3411764.3445325

Karampiperis, P., Koukourikos, A., and Koliopoulou, E. (2014). “Towards machines for measuring creativity: The use of computational tools in storytelling activities,” in 2014 IEEE 14th International Conference on Advanced Learning Technologies 508–512. doi: 10.1109/ICALT.2014.150

Kenett, Y. N. (2019). What can quantitative measures of semantic distance tell us about creativity? Curr. Opin. Behav. Sci. 27, 11–16. doi: 10.1016/j.cobeha.2018.08.010

Kenworthy, J. B., Doboli, S., Alsayed, O., Choudhary, R., Jaed, A., Minai, A. A., et al. (2023). Toward the development of a computer-assisted, real-time assessment of ideational dynamics in collaborative creative groups. Creativ. Res. J. 35, 396–411. doi: 10.1080/10400419.2022.2157589

Kim, S., Choe, I., and Kaufman, J. C. (2019). The development and evaluation of the effect of creative problem-solving program on young children’s creativity and character. Think. Skills Creativ. 33, 100590. doi: 10.1016/j.tsc.2019.100590

Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., et al. (2015). “Skip-thought vectors,” in Advances in Neural Information Processing Systems 28.

Kovalkov, A., Paaßen, B., Segal, A., Pinkwart, N., and Gal, K. (2021). Automatic creativity measurement in scratch programs across modalities. IEEE Trans. Learn. Technol. 14, 740–753. doi: 10.1109/TLT.2022.3144442

LaVoie, N., Parker, J., Legree, P. J., Ardison, S., and Kilcullen, R. N. (2020). Using latent semantic analysis to score short answer constructed responses: Automated scoring of the consequences test. Educ. Psychol. Measur. 80, 399–414. doi: 10.1177/0013164419860575

Lee, D. D., and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791. doi: 10.1038/44565

Li, Y., Du Ying, X. I. E., Liu, C., Yang, Y., Li, Y., and Qiu, J. (2023). A meta-analysis of the relationship between semantic distance and creative thinking. Adv. Psychol. Sci. 31, 519. doi: 10.3724/SP.J.1042.2023.00519

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55, 1–35. doi: 10.1145/3560815

Manske, S., and Hoppe, H. U. (2014). “Automated indicators to assess the creativity of solutions to programming exercises,” in 2014 IEEE 14th International Conference on Advanced Learning Technologies 497–501. doi: 10.1109/ICALT.2014.147

Marrone, R., Cropley, D. H., and Wang, Z. (2022). Automatic assessment of mathematical creativity using natural language processing. Creat. Res. J. 2022, 1–16. doi: 10.1080/10400419.2022.2131209

Martin, D. I., and Berry, M. W. (2007). “Mathematical foundations behind latent semantic analysis,” in Handbook of Latent Semantic Analysis 35–56.

Mastria, S., Agnoli, S., Zanon, M., Acar, S., Runco, M. A., and Corazza, G. E. (2021). Clustering and switching in divergent thinking: Neurophysiological correlates underlying flexibility during idea generation. Neuropsychologia 158, 107890. doi: 10.1016/j.neuropsychologia.2021.107890

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Munn, Z., Peters, M. D., Stern, C., Tufanaru, C., McArthur, A., and Aromataris, E. (2018). Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med. Res. Methodol. 18, 1–7. doi: 10.1186/s12874-018-0611-x

Olivares-Rodríguez, C., Guenaga, M., and Garaizar, P. (2017). Automatic assessment of creativity in heuristic problem-solving based on query diversity. DYNA 92, 449–455. doi: 10.6036/8243

Olson, J. A., Nahas, J., Chmoulevitch, D., Cropper, S. J., and Webb, M. E. (2021). Naming unrelated words predicts creativity. Proc. Nat. Acad. Sci. 118, e2022340118. doi: 10.1073/pnas.2022340118

Organisciak, P., Newman, M., Eby, D., Acar, S., and Dumas, D. (2023). How do the kids speak? Improving educational use of text mining with child-directed language models. Inf. Learn. Sci. 124, 25–47. doi: 10.1108/ILS-06-2022-0082

Pennington, J., Socher, R., and Manning, C. D. (2014). “Glove: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. doi: 10.3115/v1/D14-1162

Plucker, J. A., Meyer, M. S., Karami, S., and Ghahremani, M. (2023). “Room to run: Using technology to move creativity into the classroom,” in Creative Provocations: Speculations on the Future of Creativity, Technology and Learning (Springer) 65–80. doi: 10.1007/978-3-031-14549-0_5

Prasch, L., Maruhn, P., Brünn, M., and Bengler, K. (2020). “Creativity assessment via novelty and usefulness (CANU) – approach to an easy to use objective test tool,” in Proceedings of the Sixth International Conference on Design Creativity (ICDC) 019–026. doi: 10.35199/ICDC.2020.03

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., et al. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 5485–5551. doi: 10.48550/arXiv.1910.10683

Rafner, J., Biskjær, M. M., Zana, B., Langsford, S., Bergenholtz, C., Rahimi, S., et al. (2022). Digital games for creativity assessment: strengths, weaknesses and opportunities. Creat. Res. J. 34, 28–54. doi: 10.1080/10400419.2021.1971447

Reimers, N., and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.

Rominger, C., Benedek, M., Lebuda, I., Perchtold-Stefan, C. M., Schwerdtfeger, A. R., Papousek, I., et al. (2022). Functional brain activation patterns of creative metacognitive monitoring. Neuropsychologia 177, 108416. doi: 10.1016/j.neuropsychologia.2022.108416

Said-Metwaly, S., Van den Noortgate, W., and Kyndt, E. (2017). Approaches to measuring creativity: A systematic literature review. Creativity: Theories-Research-Applications 4, 238–275. doi: 10.1515/ctra-2017-0013

Sawyer, R. K. (2011). Explaining Creativity: The Science of Human Innovation. Oxford University Press.

Sawyer, R. K. (2021). The iterative and improvisational nature of the creative process. J. Creat. 31, 100002. doi: 10.1016/j.yjoc.2021.100002

Sawyer, R. K. (2022). The dialogue of creativity: Teaching the creative process by animating student work as a collaborating creative agent. Cogn. Instruct. 40, 459–487. doi: 10.1080/07370008.2021.1958219

Sedova, K., Sedlacek, M., Svaricek, R., Majcik, M., Navratilova, J., Drexlerova, A., et al. (2019). Do those who talk more learn more? The relationship between student classroom talk and student achievement. Learn. Instruct. 63, 101217. doi: 10.1016/j.learninstruc.2019.101217

Shrivastava, D., Ahmed, C. G. S., Laha, A., and Sankaranarayanan, K. (2017). A machine learning approach for evaluating creative artifacts. arXiv preprint arXiv:1707.05499.

Simpson, E., Do Dinh, E.-L., Miller, T., and Gurevych, I. (2019). “Predicting humorousness and metaphor novelty with gaussian process preference learning,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 5716–5728. doi: 10.18653/v1/P19-1572

Song, K., Tan, X., Qin, T., Lu, J., and Liu, T.-Y. (2020). MPNet: Masked and permuted pre-training for language understanding. Adv. Neural Inf. Proc. Syst. 33, 16857–16867. doi: 10.48550/arXiv.2004.09297

Stella, M., and Kenett, Y. N. (2019). Viability in multiplex lexical networks and machine learning characterizes human creativity. Big Data Cogn. Comput. 3, 45. doi: 10.3390/bdcc3030045

Sung, Y.-T., Cheng, H.-H., Tseng, H.-C., Chang, K.-E., and Lin, S.-Y. (2022). Construction and validation of a computerized creativity assessment tool with automated scoring based on deep-learning techniques. Psychol. Aesthet. Creat. Arts. doi: 10.1037/aca0000450

Toma, J. D. (2011). “Approaching rigor in applied qualitative research,” in The SAGE Handbook for Research in Education: Pursuing Ideas as the Keystone of Exemplary Inquiry 263–281. doi: 10.4135/9781483351377.n17

Torrance, E. P. (2008). The Torrance Tests of Creative Thinking: Norms-Technical Manual, Figural (Streamlined) Forms A and B. Bensenville, IL: Scholastic Testing Service.

Tuzcu, A. (2021). The impact of Google Translate on creativity in writing activities. Lang. Educ. Technol. 1, 40–52.

Vartanian, O., Smith, I., Lam, T. K., King, K., Lam, Q., and Beatty, E. L. (2020). The relationship between methods of scoring the alternate uses task and the neural correlates of divergent thinking: Evidence from voxel-based morphometry. NeuroImage 223, 117325. doi: 10.1016/j.neuroimage.2020.117325

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in Neural Information Processing Systems 30.

Vo, H., and Asojo, A. (2018). Feedback responsiveness and students’ creativity. Acad. Exch. Quart. 1, 53–57.

Wagire, A. A., Rathore, A., and Jain, R. (2020). Analysis and synthesis of Industry 4.0 research landscape: Using latent semantic analysis approach. J. Manuf. Technol. Manag. 31, 31–51. doi: 10.1108/JMTM-10-2018-0349

Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., et al. (2019). “SuperGLUE: A stickier benchmark for general-purpose language understanding systems,” in Advances in Neural Information Processing Systems 32.

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

Williams, F. (1980). Creativity Assessment Packet (CAP). Buffalo, NY: D. O. K. Publishers Inc.

Yang, Z., Zhu, C., and Chen, W. (2018). Parameter-free sentence embedding via orthogonal basis. arXiv preprint arXiv:1810.00438.

Zhang, W., Sjoerds, Z., and Hommel, B. (2020). Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking. NeuroImage 210, 116572. doi: 10.1016/j.neuroimage.2020.116572

Zuñiga, D., Amido, T., and Camargo, J. (2017). “Communications in Computer and Information Science,” in Colombian Conference on Computing (Cham: Springer).

Keywords: review, creativity process, ideation, evaluation, artificial intelligence

Citation: Ul Haq I and Pifarré M (2023) Dynamics of automatized measures of creativity: mapping the landscape to quantify creative ideation. Front. Educ. 8:1240962. doi: 10.3389/feduc.2023.1240962

Received: 15 June 2023; Accepted: 18 September 2023; Published: 12 October 2023.

Edited by:

Mohammad Khalil, University of Bergen, Norway

Reviewed by:

Chen-Yao Kao, National University of Tainan, Taiwan

Faisal Saeed, Kyungpook National University, Republic of Korea

Copyright © 2023 Ul Haq and Pifarré. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Manoli Pifarré, manoli.pifarre@udl.cat
