Yulia Chekhovska

BERT-Based Semantic Analysis of Japanese Student Curriculum Feedback

Image: Japanese NLP Sentiment Analysis from 2023

Project Overview

This project applies a deep learning–based natural language processing approach to analyze Japanese student feedback collected via Google Forms from project-based English classes, using a pretrained Japanese BERT model. The analysis focuses on identifying students' favorite and least favorite projects, reasons for their preferences, and suggestions for improvement. The goal is to extract actionable insights to enhance curriculum design and student engagement.


An earlier exploratory version of this project (2023) focused on morphological analysis and keyword extraction. In 2025, the project was reviewed and redesigned to apply a Japanese Sentence-BERT model to the improvement responses, using unsupervised clustering to identify latent, meaning-based themes in student feedback.


Feedback Analysis Data

Key Findings

  • Project Popularity Likelihood: Results indicate that the "Country Presentation" and "Skit Festival" projects are most frequently favored by students, while "Demonstration Dialogue" is often least favored. Notably, the "Skit Festival" project appears in both the favorite and least favorite categories at high rates, suggesting polarized student experiences.
  • Reason for Most Favorite Project: Common themes for favorite projects include cultural learning, creativity, and autonomous work. Students appreciate projects that allow them to explore new topics and express themselves.
  • Areas of Improvement as Specified by the Students: Frequent suggestions for improvement include time management, project difficulty, and clearer instructions. Students often request more time to complete projects and express challenges with group collaboration.

  • Percentage Distribution of Favorite and Least Favorite Projects
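A percentage distribution of this kind can be computed directly from the two answer columns. The following is a minimal sketch, not the project's actual code: the answer lists are made-up toy data (not the real survey responses), and the 40% threshold used to flag a "polarized" project is an arbitrary assumption chosen for illustration.

```python
from collections import Counter


def percentage_distribution(answers):
    """Return {project: percent of answers}, most frequent first."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {
        project: round(100.0 * n / total, 1)
        for project, n in counts.most_common()
    }


# Made-up toy answers for illustration only, NOT the real survey data.
favorite = [
    "Skit Festival", "Country Presentation",
    "Skit Festival", "Interview Speech",
]
least_favorite = [
    "Demonstration Dialogue", "Skit Festival",
    "Demonstration Dialogue", "Skit Festival",
]

fav_pct = percentage_distribution(favorite)
least_pct = percentage_distribution(least_favorite)

# A project is flagged as polarized when it is frequent in BOTH lists.
# The 40% cutoff is a hypothetical threshold, not taken from the project.
THRESHOLD = 40.0
polarized = sorted(
    p for p in fav_pct
    if fav_pct[p] >= THRESHOLD and least_pct.get(p, 0.0) >= THRESHOLD
)

print(fav_pct)
print(least_pct)
print(polarized)
```

With real data, the same comparison would show whether a project such as "Skit Festival" ranks high on both sides.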


    BERT-Based Semantic Clustering of Improvement Feedback


    Identified Clusters and Themes


    Cluster | Semantic Theme | Description
    Cluster 1 | Practice Time & Preparation | Requests for longer rehearsal periods, increased practice time, and more opportunities to prepare or collaborate before presentations.
    Cluster 0 | Task Structure & Instruction Clarity | Feedback indicating unclear instructions, overly broad task scopes, or a need for clearer examples, constraints, or scaffolding.
    Cluster -1 | Miscellaneous / Individual Feedback | Diverse, low-frequency suggestions that did not form a stable semantic cluster, including scheduling concerns and project-specific difficulties.
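The label -1 for unclustered feedback matches the convention of density-based clustering algorithms such as DBSCAN and HDBSCAN, which assign -1 to points that do not belong to any dense group. The page does not state which algorithm was used, so the following is only a sketch of the general idea: a small pure-Python DBSCAN over cosine distance, run on hypothetical 2-D vectors that stand in for sentence embeddings. The `eps` and `min_samples` values are assumptions chosen for the toy data.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_samples):
    """Minimal DBSCAN; returns one label per vector, -1 meaning noise."""
    n = len(vectors)
    # Neighborhoods include the point itself.
    neighbors = [
        [j for j in range(n)
         if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical toy vectors standing in for sentence embeddings.
toy_vectors = [
    [1.0, 0.1], [1.0, 0.15], [0.95, 0.1],   # one tight group
    [0.1, 1.0], [0.12, 1.0], [0.1, 0.9],    # a second tight group
    [1.0, 1.0], [-1.0, 0.2],                # isolated points
]

labels = dbscan(toy_vectors, eps=0.01, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1, -1]
```

The two isolated vectors have too few neighbors within `eps` to join a cluster, so they receive -1, in the same way that low-frequency suggestions end up in the miscellaneous group.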

    Cluster Distribution by Project


    Favorite Project | Cluster -1 (Misc.) | Cluster 0 (Structure) | Cluster 1 (Practice)
    Demonstration Dialogue | 4 | 0 | 0
    Country Presentation | 36 | 11 | 3
    Skit Festival | 37 | 8 | 5
    Interview Speech | 4 | 7 | 1
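Because the rows differ greatly in size, the raw counts are easier to compare as row percentages. The sketch below uses the counts from the table above; the variable and function names are illustrative and not taken from the project's source code.

```python
# Counts copied from the table above: rows are favorite projects,
# keys are cluster labels (-1 = Misc., 0 = Structure, 1 = Practice).
counts = {
    "Demonstration Dialogue": {-1: 4, 0: 0, 1: 0},
    "Country Presentation": {-1: 36, 0: 11, 1: 3},
    "Skit Festival": {-1: 37, 0: 8, 1: 5},
    "Interview Speech": {-1: 4, 0: 7, 1: 1},
}


def row_shares(row):
    """Convert one row of counts into percentages of that row's total."""
    total = sum(row.values())
    return {label: 100.0 * n / total for label, n in row.items()}


shares = {project: row_shares(row) for project, row in counts.items()}

# Column totals show how many responses landed in each cluster overall.
column_totals = {
    label: sum(row[label] for row in counts.values())
    for label in (-1, 0, 1)
}

for project, row in shares.items():
    formatted = ", ".join(f"{label}: {pct:.1f}%" for label, pct in row.items())
    print(f"{project} (n={sum(counts[project].values())}): {formatted}")
print(column_totals)
```

In these counts, 81 of the 116 responses fall in Cluster -1, and 7 of the 12 Interview Speech responses fall in Cluster 0. Rows this small should be read with caution.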

    Favorite Project Reason Word Cloud



    Improvement Suggestion Word Cloud



    Technical Summary

    Data Science Skills

    • Natural Language Processing (NLP)
    • Data Cleaning & Analysis
    • Data Visualization

    Programming & Frameworks

    • Python
    • Jupyter Notebook
    • Google Colab

    Libraries & Tools

    • Pandas
    • Matplotlib & Seaborn
    • Transformers (Hugging Face)

    View the Project

    View Source Code

    Data Privacy Notice
    All student responses were anonymized and analyzed in aggregate. Data collection followed institutional consent and privacy guidelines.