Amane TTS · Japanese-Optimized Voice Synthesis System
A voice synthesis system trained on 400,000 hours of Japanese-specific data, powered by the Dual-AR × GFSQ × FF-GAN architecture.
Achieves rapid, high-fidelity voice and emotion cloning from just 8–15 seconds of reference audio.
All comparison samples are generated with the same voice cloning workflow to ensure a fair and objective comparison.
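The cloning workflow described above starts from a short reference clip. The Amane TTS programming interface is not documented here, so every name in the sketch below (`amane_tts`, `AmaneTTS.from_pretrained`, `clone_voice`, `synthesize`) is a hypothetical placeholder, shown only to illustrate how an 8–15 second reference-audio workflow typically looks.

```python
# Hypothetical usage sketch only: the package, class, and method names below
# are illustrative placeholders, not a documented Amane TTS API.
from amane_tts import AmaneTTS  # hypothetical package name

tts = AmaneTTS.from_pretrained("amane-tts-ja")  # hypothetical checkpoint id

# Build a speaker/emotion profile from an 8-15 second reference clip.
speaker = tts.clone_voice(reference_audio="reference_clip.wav")

# Synthesize new speech in the cloned voice; since no G2P frontend is needed,
# raw Japanese text is passed directly.
audio = tts.synthesize("今日はいい天気ですね。", speaker=speaker)
audio.save("output.wav")
```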
System Features Overview
- Slow & Fast Transformer serial architecture ensures semantic stability and acoustic finesse
- Grouped Finite Scalar Vector Quantization with codebook utilization ≈ 100%
- FF-GAN vocoder combined with ParallelBlock provides high-fidelity output
- LLM-driven language feature extraction, supporting multilingual without G2P frontend
- Voice cloning and emotion rendering with just 8–15 seconds of reference speech
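The near-100% codebook utilization follows from how finite scalar quantization works: each latent dimension is bounded and rounded to a small, fixed grid, so every code stays reachable. Below is a minimal sketch of grouped FSQ as the general technique; the group count, per-dimension levels, and channel size are illustrative assumptions, not Amane TTS's actual configuration.

```python
# Minimal sketch of grouped finite scalar quantization (GFSQ).
# Group count, levels, and latent size are illustrative assumptions.
import torch

def fsq(z: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """Finite scalar quantization: bound each channel, then round it to one of
    `levels[i]` evenly spaced values. Because rounding keeps every grid point
    reachable, FSQ-style quantizers report ~100% codebook utilization."""
    half = (levels - 1) / 2.0          # half-width of the integer grid per dim
    bounded = torch.tanh(z) * half     # squash activations into [-half, half]
    quantized = torch.round(bounded)   # snap to the nearest grid point
    # straight-through estimator so gradients flow through the rounding step
    return bounded + (quantized - bounded).detach()

def gfsq(z: torch.Tensor, groups: int, levels: torch.Tensor) -> torch.Tensor:
    """Split the channel dimension into independent groups and apply FSQ to
    each, so the effective codebook size multiplies across groups."""
    chunks = z.chunk(groups, dim=-1)
    return torch.cat([fsq(c, levels) for c in chunks], dim=-1)

# Toy usage: a (batch, time, channels) latent with 4 groups of 4 dims each,
# each dim quantized to 5 levels -> 5**4 codes per group.
latent = torch.randn(2, 50, 16)
codes = gfsq(latent, groups=4, levels=torch.tensor([5.0, 5.0, 5.0, 5.0]))
```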
Audio Comparison · Natural Conversation Scenarios
The following comparison showcases 8 natural conversation scenarios, highlighting the synthesis quality differences between Amane TTS and a commercial TTS model in real-world daily dialogues. Both models employ identical voice cloning workflows to ensure objective and fair evaluation.
Note: Amane TTS supports rapid voice cloning with 8–15 seconds of reference audio.
Diet Plan · Dialogue Interaction
Hair Consultation · Hesitation
Relationship Troubles · Complex Emotions
Travel Planning · Excitement & Anticipation
Gossip Sharing · Surprise & Confusion
Shopping Decision · Conflict & Impulse
Nail Consultation · Choice & Decision
Evaluation Summary
In controlled comparisons with a commercial TTS model (Speech-2.6-HD) under identical conditions, Amane TTS demonstrates exceptional emotional expressiveness and conversational dynamics in natural dialogue scenarios, accurately capturing and rendering subtle emotional nuances found in everyday conversations.
Core Advantages
Amane TTS is a high-performance voice synthesis system optimized specifically for Japanese, excelling in real-world conversational scenarios. Powered by 400,000 hours of Japanese-specific training data and the Dual-AR × GFSQ × FF-GAN architecture, it accurately reproduces complex emotional dynamics in everyday dialogue, covering diverse emotional states including excitement, hesitation, conflict, anger, and surprise. Voice cloning requires only 8–15 seconds of reference audio, representing industry-leading technical capability in Japanese voice synthesis.
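The Dual-AR component referenced above is, in published slow-fast designs, a cascade of a frame-level autoregressive transformer and an intra-frame one. The sketch below shows how such a slow-fast decode loop is commonly structured; the layer sizes, vocabulary, group count, and the way frame embeddings are pooled are illustrative assumptions, not Amane TTS's actual implementation.

```python
# Minimal sketch of a slow-fast dual autoregressive ("Dual-AR") decode loop.
# All sizes and the pooling scheme are illustrative assumptions.
import torch
import torch.nn as nn

D_MODEL, N_CODES, N_GROUPS, N_FRAMES = 256, 1024, 4, 5

def causal_block(n_layers: int) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

slow = causal_block(2)                  # frame-level (semantic) transformer
fast = causal_block(2)                  # intra-frame (acoustic) transformer
embed = nn.Embedding(N_CODES, D_MODEL)  # shared codec-token embedding
head = nn.Linear(D_MODEL, N_CODES)      # predicts the next codec token

def causal_mask(n: int) -> torch.Tensor:
    return torch.triu(torch.full((n, n), float("-inf")), diagonal=1)

@torch.no_grad()
def decode(prompt_frames: torch.Tensor) -> torch.Tensor:
    """prompt_frames: (1, T, N_GROUPS) codec tokens from the reference audio."""
    frames = [prompt_frames[:, t] for t in range(prompt_frames.size(1))]
    for _ in range(N_FRAMES):
        # Slow pass: one hidden state per already-generated frame.
        frame_emb = torch.stack([embed(f).mean(dim=1) for f in frames], dim=1)
        h = slow(frame_emb, mask=causal_mask(frame_emb.size(1)))[:, -1:]
        # Fast pass: autoregressively emit the N_GROUPS tokens of the new
        # frame, conditioned on the slow hidden state placed at position 0.
        tokens, ctx = [], h
        for _ in range(N_GROUPS):
            out = fast(ctx, mask=causal_mask(ctx.size(1)))[:, -1]
            nxt = head(out).argmax(dim=-1)
            tokens.append(nxt)
            ctx = torch.cat([ctx, embed(nxt).unsqueeze(1)], dim=1)
        frames.append(torch.stack(tokens, dim=1))
    return torch.stack(frames, dim=1)   # (1, T + N_FRAMES, N_GROUPS)

new_frames = decode(torch.randint(0, N_CODES, (1, 3, N_GROUPS)))
```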