The Problem: Accurate But Boring
I built a Korean saju (四柱, Four Pillars of Destiny) web app. The engine was solid: lunar-typescript for precise calculations, proper 용신 (favorable elements) analysis, 십신 (Ten Gods) with all 10 distinctions, 합충형 (combinations/clashes/punishments), 공망 (void branches), and 납음오행 (Nayin Five Elements).
Technically correct. Algorithmically sound.
And the AI interpretations were... fine.
$ curl -s /api/baby | jq .interpretation.temperament
"주체성과 독립성이 강한 아이로, 주변 사람들과의 관계에서 동료애를 중시합니다.
자신의 의견을 잘 표현하며, 추진력이 뛰어나고 실행력이 높습니다."
# (A child with strong self-direction and independence, who values camaraderie
#  with the people around them. Expresses their own opinions well, with
#  outstanding drive and execution.)
# Technically correct. Could apply to literally anyone.

Users would read it, nod, and close the tab. No screenshots. No sharing. No "oh my god, how did it know?"
The competitors, apps like 점신 (Jeomsin) and 포스텔러 (Forceteller), had users posting screenshots saying "소름 돋았다" (it gave me chills). Their data wasn't better. Their algorithms weren't superior.
Their prompts were.
What "Feeling Accurate" Actually Means
I studied the psychology. When users say a fortune reading "feels accurate," they're responding to:
- Specificity of timing: "Around age 7" beats "during childhood"
- Vivid metaphors: "Classroom leader" beats "strong personality"
- Honest negatives with solutions: "Stubborn peak at 7 → give 2 choices" beats "may have some challenges"
- Concrete action items: "Wear red today" beats "be positive"
This is essentially the Barnum effect done right β but with actual data backing it up. The saju chart provides real constraints (elements, gods, interactions). The prompt just needs to force the model to use them specifically instead of falling back to generic advice.
The Fix: 5 Prompt Rules
I rewrote two system prompts. Here are the exact rules I added:
Rule 1: Ban Generic Advice
❌ "누구에게나 맞는 말" 절대 금지 (statements that fit anyone are absolutely forbidden)
- "다양한 경험을 제공하세요" (Provide diverse experiences) ❌
- "만 3~4세경 또래와 주도권 다툼이 잦아질 수 있어요" (Around ages 3-4, power struggles with peers may become more frequent) ✅
This single rule did 60% of the work. LLMs love generic advice because it's safe. You have to explicitly ban it.
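For illustration, here is a minimal TypeScript sketch of how a ban list like this can be spliced into the system prompt at request time. `BANNED_GENERIC` and `banGenericRule` are hypothetical names, not the app's actual code; the banned phrases are the ones from the rule above.

```typescript
// Hypothetical sketch: splice Rule 1's ban list into the system prompt.
// BANNED_GENERIC / banGenericRule are illustrative names only.
const BANNED_GENERIC: string[] = [
  "다양한 경험을 제공하세요", // "provide diverse experiences"
  "좋은 일이 생길 수 있어요", // "good things may happen"
];

function banGenericRule(): string {
  return [
    '"누구에게나 맞는 말" 절대 금지 (never say things that fit anyone).',
    "Forbidden phrases include:",
    ...BANNED_GENERIC.map((p) => `- "${p}"`),
  ].join("\n");
}
```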
Rule 2: Force Time-Specific Predictions
- 구체적 시기/상황 언급 필수 (mentioning a specific time/situation is mandatory)
- "오후 2~5시 사이 예상치 못한 연락" (An unexpected contact between 2 and 5 PM) ✅
- "좋은 일이 생길 수 있어요" (Good things may happen) ❌
When the model has to commit to a time window, it anchors the reading. Even if it's somewhat arbitrary, the specificity creates an illusion of precision that users find compelling.
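This commitment can also be checked after the fact. A sketch of one possible guard, where `hasConcreteTiming` is a hypothetical helper and the patterns are only a starting point: reject any reading that contains no time window, then re-prompt.

```typescript
// Illustrative post-check (not from the app): did the model commit to a
// concrete time window (clock hours, part of day, or an age range)?
const TIME_PATTERNS: RegExp[] = [
  /\d{1,2}\s*~\s*\d{1,2}시/, // clock range, e.g. "14~17시"
  /오전|오후|저녁|새벽/, // morning / afternoon / evening / dawn
  /만?\s*\d{1,2}(~\d{1,2})?세/, // age range, e.g. "만 3~4세"
];

function hasConcreteTiming(reading: string): boolean {
  return TIME_PATTERNS.some((re) => re.test(reading));
}
// If this returns false, retry with Rule 2 restated instead of shipping generic text.
```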
Rule 3: Demand Vivid Metaphors
- 성격을 생생한 비유로 표현 (express personality through vivid metaphors)
- "교실의 작은 리더" (little leader of the classroom), "호기심 폭주 탐험가" (explorer running on overdrive curiosity)
Metaphors are memorable. Generic descriptions are forgettable. Every competitor that gets shared on social media uses metaphors.
Rule 4: Honest Negatives + Solutions
- 부정적 면도 솔직하게 + 대처법 (negatives stated honestly, paired with coping strategies)
- "7세경 고집이 최고조 → 선택지 2개를 주면 효과적" (Stubbornness peaks around age 7 → offering 2 choices works well)
Readings that are 100% positive feel fake. Readings with specific warnings feel real. The key is pairing every warning with an actionable solution.
Rule 5: Give Them a Character
"nickname": "μ΄ μμ΄λ₯Ό νλ§λλ‘ νννλ©΄?"
// β "π μμ λ³λΉ"
// β "π¦ μμ μ¬μμ"
A nickname makes the reading feel personal. It's the first thing users screenshot and share.
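Rules 4 and 5 are easiest to enforce through the response schema itself. Here is a sketch of what that structure might look like, using the field names mentioned later in this post (nickname, hiddenTalent, caution, growthTimeline); the exact shape is my assumption, not the app's real schema.

```typescript
// Illustrative response shape; the production schema may differ.
interface BabyReading {
  nickname: string; // Rule 5: e.g. 🌟 "작은 별빛" (little starlight)
  temperament: string; // Rule 3: vivid metaphor, not bare adjectives
  hiddenTalent: string;
  caution: {
    warning: string; // Rule 4: an honest negative
    solution: string; // ...always paired with an action item
  };
  growthTimeline: Array<{
    ages: string; // Rule 2: forces age-specific predictions ("0-3", "3-7", ...)
    theme: string;
  }>;
}
```

Making `caution.solution` a required field means the model cannot emit a warning without also emitting a fix.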
The System Prompt Upgrade
Before
당신은 전문 사주 명리학자입니다.
부모가 이해하기 쉬운 따뜻한 조언을 제공합니다.

(You are a professional saju scholar. You provide warm advice that parents can easily understand.)
After
당신은 30년 경력의 사주 명리학 전문가입니다.
부모 상담 시 "소름 돋을 정도로 정확하다"는 평을 받습니다.
추상적 조언 대신 구체적 시기/상황/행동을 짚어주는 스타일입니다.
사주 데이터에 기반한 근거 있는 해석만 하세요.

(You are a saju expert with 30 years of experience. Parents say your consultations are "accurate enough to give chills." Instead of abstract advice, you pinpoint specific timing, situations, and actions. Give only interpretations grounded in the saju data.)
The phrase "소름 돋을 정도로 정확하다는 평을 받습니다" (known for eerily accurate readings) is doing the heavy lifting. It primes the model to produce specific, surprising outputs rather than safe, generic ones.
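Putting it together, the persona is only useful if the engine's facts ride along with it. A rough sketch of that assembly, where `ChartFacts` and `buildSystemPrompt` are hypothetical names for illustration:

```typescript
// Illustrative only: combine the persona (the "After" prompt above) with
// concrete chart facts so the model has something specific to anchor on.
interface ChartFacts {
  tenGods: string[]; // 십신, e.g. ["비견", "상관"]
  interactions: string[]; // 합/충/형 hits found in the chart
  voidBranches: string[]; // 공망
  favorableElement: string; // 용신
}

function buildSystemPrompt(facts: ChartFacts): string {
  return [
    "당신은 30년 경력의 사주 명리학 전문가입니다.",
    '부모 상담 시 "소름 돋을 정도로 정확하다"는 평을 받습니다.',
    "추상적 조언 대신 구체적 시기/상황/행동을 짚어주는 스타일입니다.",
    "사주 데이터에 기반한 근거 있는 해석만 하세요.",
    "",
    `십신: ${facts.tenGods.join(", ")}`,
    `합충형: ${facts.interactions.join(", ")}`,
    `공망: ${facts.voidBranches.join(", ")}`,
    `용신: ${facts.favorableElement}`,
  ].join("\n");
}
```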
Before vs After: Real Output
Baby Saju Reading
| Aspect | Before | After |
|---|---|---|
| Nickname | (none) | 🌟 "작은 별빛" (little starlight) |
| Temperament | "주체성과 독립성이 강한 아이" (a child with strong independence) | "교실의 작은 리더, 만 3~4세경 또래와 주도권 다툼" (classroom's little leader; power struggles with peers around ages 3-4) |
| Warnings | (none) | ⚠️ "만 7세경 고집 최고조 → 선택지 2개 주기" (stubbornness peaks around age 7 → offer 2 choices) |
| Hidden talent | (none) | "도화살 기반 매력적 인성, 화술 강점" (magnetic charm rooted in the Peach Blossom star; a gift for speaking) |
| Growth stages | 2 (0-10, 10-20) | 4 (0-3, 3-7, 7-13, 13-20) |
| Tone | Formal, detached | Warm, specific, vivid |
Daily Fortune
| Aspect | Before | After |
|---|---|---|
| General | "좋은 일이 있을 수 있습니다" (Good things may happen) | "오전은 평탄, 14~17시 예상치 못한 연락에 기회가 숨어있어요" (Uneventful morning; an opportunity hides in an unexpected contact between 14:00 and 17:00) |
| Money | "재물운이 좋습니다" (Money luck is good) | "카드보다 현금 추천, 소소한 지출이 쌓이는 날" (Cash over card today; small expenses will pile up) |
| Lucky item | (none) | "민트색 손수건" (a mint-colored handkerchief) |
| Score | (none) | 7/10 |
The Technical Stack Behind It
The prompt changes were just the tip of the iceberg. The engine underneath had to provide real data for the prompts to reference:
# Saju Engine v3 - accuracy upgrades
✅ 10-God classification (비견/겁재/식신/상관/편재/정재/편관/정관/편인/정인)
✅ Interactions (천간합/지지삼합/육합/충/형)
✅ 통근/투간 (root & transparency analysis)
✅ 공망 (void branches)
✅ 납음오행 (Nayin Five Elements) in Korean
✅ Detailed 용신 with seasonal context
# All powered by lunar-typescript v1.8.6
# 절기-based year pillar, 자시 boundary, lunar conversion

The richer the input data, the more specific the model can be. You can't ask for "specific timing based on 대운" if your engine doesn't calculate 대운 (major luck cycles) properly.
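For reference, the core pillar calculation with lunar-typescript looks roughly like this. The `setSect`/`getYun` calls follow my reading of the library's EightChar API (it mirrors lunar-javascript); verify the exact semantics against the library docs before relying on them.

```typescript
import { Solar } from "lunar-typescript";

// A 23:30 birth: exactly the 자시 (Zi-hour) boundary case mentioned above.
const solar = Solar.fromYmdHms(2024, 3, 15, 23, 30, 0);
const eightChar = solar.getLunar().getEightChar();
eightChar.setSect(2); // late Zi hour: keep the day pillar on the birth date

// The four pillars (四柱) as gan-zhi strings
console.log(
  eightChar.getYear(), // year pillar (절기-based)
  eightChar.getMonth(),
  eightChar.getDay(),
  eightChar.getTime()
);

// 대운 (major luck cycles): gender flag is 1 for male, 0 for female
const daYun = eightChar.getYun(1).getDaYun();
console.log(daYun[1].getStartAge(), daYun[1].getGanZhi());
```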
What I Learned
1. Prompt engineering IS product design. The same model, same data, same temperature, yet a completely different user experience based on 200 words of system prompt changes.
2. "Ban generic" is the most powerful prompt instruction. Models default to safe, generic outputs. Explicitly banning common generic phrases forces specificity.
3. Persona priming works. "30년 경력, 소름 돋을 정도로 정확하다는 평" (30 years of experience, known for chillingly accurate readings) isn't just flavor text. It measurably changes output style.
4. Structure drives quality. Adding nickname, hiddenTalent, caution, growthTimeline fields forced the model to think about aspects it would have skipped with a free-form prompt.
5. The engine matters too. You can't prompt-engineer your way out of bad input data. The 10-god, interaction, and void branch calculations gave the model concrete facts to anchor its interpretations.
Try It Yourself
The app is live at saju.smeuse.org. Create a profile, try the baby saju feature, and see if you get chills.
The saju engine is built with lunar-typescript, an excellent library for anyone building East Asian calendar or fortune-telling applications. The entire calculation (四柱, 오행, 십신, 대운, 신살, 12운성, 합충형, 공망, 납음) runs in under 50ms.
The gap between "technically correct" and "feels magical" is often just 200 words of prompt engineering. But those 200 words need to be backed by solid data. Engine accuracy × prompt specificity = perceived accuracy.