Aku Kotkavuo

@eagleflo

I am a software generalist from Helsinki, Finland. I’ve been working with software for most of my life. I practise writing about related topics here.

My open source projects include mpyq and jisho.


Hearthstone Card List

19 July 2014

Ever since the Hearthstone beta was opened to everyone, I've been playing this latest game from Blizzard. At first I was pretty skeptical: as an old Magic: The Gathering fan, I had my doubts about the dumbed-down format and the shallower gameplay experience. Now, a few months after the release, I have come around: Hearthstone's accessibility, pretty graphics and fun casual arena runs have won me over.

As a budding hobbyist game developer, I was interested in how Hearthstone was made. It turns out Hearthstone has been created with Unity! There is even an impressive case story about it here. I'm not sure why this surprised me, since studios of all sizes have been flocking towards Unity in the last few years. This also explains how they were so fast with the iOS client.

If I were to work on any Hearthstone-related side projects in my spare time, the first thing I would need is a list of cards and their data. This list is exactly what we are going to create in this post.

The list could be created manually, but it would take a long time and errors would inevitably creep in. Even worse, a manually compiled list would get stale as new expansions and cards are released. In fact, as I'm writing this, the first expansion for the game, Curse of Naxxramas, is just around the corner. It will contain 30 new cards, so all the various lists found around the web will require updates in a few days.

It would be a better approach to take a look at the game's data files and see how the cards are represented there. This will also give us clues as to what Hearthstone's internal data structures look like.

Let's take a little dive into the installation to see what's there. Here's the top level folder:

$ du -cks /Applications/Hearthstone/* | sort -rn
1060124  total
525920   /Applications/Hearthstone/Data
348596   /Applications/Hearthstone/base-OSX.MPQ
115676   /Applications/Hearthstone/Hearthstone.app
53268    /Applications/Hearthstone/Updates
9992     /Applications/Hearthstone/Hearthstone Beta Launcher.app
3312     /Applications/Hearthstone/SetupOSX.mpq
2900     /Applications/Hearthstone/Strings
232      /Applications/Hearthstone/DBF
172      /Applications/Hearthstone/Hearthstone.tfil
24       /Applications/Hearthstone/Logs
16       /Applications/Hearthstone/manifest-cards.csv
4        /Applications/Hearthstone/client.config
4        /Applications/Hearthstone/Launcher.db
4        /Applications/Hearthstone/Hearthstone.mfil
4        /Applications/Hearthstone/ConnectLog.txt
0        /Applications/Hearthstone/bnet_client.log

So what are we looking at here? First of all, the total size of the Hearthstone installation is around one gigabyte. That's surprisingly small for a modern game! (Of course, it is a card game.)

Years ago I wrote mpyq to help me in my data mining quests, but it's not needed this time around: Blizzard has opted to simply extract the MPQs straight to disk when installing the game. base-OSX.MPQ contains most of the other files inside this directory, including the actual OS X .app bundle. Curiously, the archive files are kept around after extraction.

So what's inside the largest subdirectory, Data/?

$ du -cks /Applications/Hearthstone/Data/OSX/* | sort -rn
525920  total
55528   /Applications/Hearthstone/Data/OSX/cardtextures0.unity3d
51996   /Applications/Hearthstone/Data/OSX/shared1.unity3d
45448   /Applications/Hearthstone/Data/OSX/cardtextures1.unity3d
44232   /Applications/Hearthstone/Data/OSX/movies0.unity3d
43820   /Applications/Hearthstone/Data/OSX/shared0.unity3d
39668   /Applications/Hearthstone/Data/OSX/gameobjects0.unity3d
32280   /Applications/Hearthstone/Data/OSX/spells0.unity3d
32124   /Applications/Hearthstone/Data/OSX/premiummaterials0.unity3d
30992   /Applications/Hearthstone/Data/OSX/uiscreens0.unity3d
30272   /Applications/Hearthstone/Data/OSX/spells1.unity3d
24020   /Applications/Hearthstone/Data/OSX/boards0.unity3d
23600   /Applications/Hearthstone/Data/OSX/actors0.unity3d
18516   /Applications/Hearthstone/Data/OSX/sounds1.unity3d
18312   /Applications/Hearthstone/Data/OSX/sounds0.unity3d
15424   /Applications/Hearthstone/Data/OSX/actors1.unity3d
13284   /Applications/Hearthstone/Data/OSX/textures0.unity3d
3288    /Applications/Hearthstone/Data/OSX/cardbacks0.unity3d
2160    /Applications/Hearthstone/Data/OSX/cardxml0.unity3d
320     /Applications/Hearthstone/Data/OSX/cards1.unity3d
316     /Applications/Hearthstone/Data/OSX/cards2.unity3d
312     /Applications/Hearthstone/Data/OSX/cards0.unity3d
8       /Applications/Hearthstone/Data/OSX/fonts0.unity3d

Lots of .unity3d files! I'm assuming these are pretty much all of the game's assets. There is an especially interesting file named cardxml0.unity3d here. The name alone warrants a closer look at this file.

I haven't worked with Unity3D files before, but based on the preamble below it seems like these files are in a custom binary format. I'm guessing the numbers after UnityRaw are version numbers.

$ hexdump -Cn 64 cardxml0.unity3d | cut -c 11-
55 6e 69 74 79 52 61 77  00 00 00 00 03 33 2e 78  |UnityRaw.....3.x|
2e 78 00 34 2e 32 2e 32  66 31 00 00 21 b9 e8 00  |.x.4.2.2f1..!...|
00 00 3c 00 00 00 01 00  00 00 01 00 21 b9 ac 00  |..<.........!...|
21 b9 ac 00 21 b9 e8 00  00 00 1c 00 00 00 00 01  |!...!...........|
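To make that guess a little more concrete, here is a minimal Python sketch that reads the first few fields of the preamble. The layout it assumes (a null-terminated signature, a 4-byte big-endian integer, then two null-terminated version strings) is only inferred from the hexdump above, not from any format documentation, and the field names are placeholders of my own.

```python
import struct

def read_cstring(buf, pos):
    """Read a null-terminated ASCII string starting at pos; return (text, next_pos)."""
    end = buf.index(b'\x00', pos)
    return buf[pos:end].decode('ascii'), end + 1

def parse_preamble(buf):
    """Parse the start of a .unity3d file. The field layout is an assumption."""
    signature, pos = read_cstring(buf, 0)
    (number,) = struct.unpack_from('>I', buf, pos)  # assumed big-endian uint32
    pos += 4
    first_version, pos = read_cstring(buf, pos)
    second_version, pos = read_cstring(buf, pos)
    return signature, number, first_version, second_version

# The first 32 bytes, copied from the hexdump above.
sample = bytes.fromhex(
    '55 6e 69 74 79 52 61 77 00 00 00 00 03 33 2e 78 '
    '2e 78 00 34 2e 32 2e 32 66 31 00 00 21 b9 e8 00'
)
print(parse_preamble(sample))  # ('UnityRaw', 3, '3.x.x', '4.2.2f1')
```

What the remaining bytes of the preamble mean is left open here.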

Taking a longer hexdump reveals that the file consists mostly of XML fragments. I assume that the file begins with Unity3D metadata and some other information before listing each card in somewhat garbled XML format. Each card seems to be its own XML document. Here is an example from the end of the file:

<?xml version="1.0" encoding="UTF-8"?>
<Entity version="2" CardID="DS1_188">
  <Tag name="CardName" enumID="185" type="String">
    <enUS>Gladiator's Longbow</enUS>
    <frFR>Arc long du gladiateur</frFR>
    <zhTW>鬥士長弓</zhTW>
    <zhCN>角斗士的长弓</zhCN>
    <ruRU>Длинный лук</ruRU>
    <ptBR>Arco Longo do Gladiador</ptBR>
    <plPL>Długi Łuk Gladiatora</plPL>
    <koKR>검투사의 장궁</koKR>
    <itIT>Arco del Gladiatore</itIT>
    <esMX>Arco largo de Gladiador</esMX>
    <esES>Arco largo de Gladiador</esES>
    <deDE>Langbogen des Gladiators</deDE>
  </Tag>
  <Tag name="CardSet" enumID="183" type="CardSet" value="3" />
  <Tag name="CardType" enumID="202" type="CardType" value="7" />
  <Tag name="Faction" enumID="201" type="Faction" value="3" />
  <Tag name="Class" enumID="199" type="Class" value="3" />
  <Tag name="Rarity" enumID="203" type="Rarity" value="4" />
  <Tag name="Cost" enumID="48" type="Number" value="7" />
  <Tag name="Atk" enumID="47" type="Number" value="5" />
  <Tag name="Durability" enumID="187" type="Number" value="2" />
  <Tag name="AttackVisualType" enumID="251" type="AttackVisualType" value="8" />
  <Tag name="Collectible" enumID="321" type="Bool" value="1" />
  <Tag name="CardTextInHand" enumID="184" type="String">
    <enUS>Your hero is &lt;b&gt;Immune&lt;/b&gt; while attacking.</enUS>
    <frFR>Votre héros est &lt;b&gt;Insensible&lt;/b&gt; quand il attaque.</frFR>
    <zhTW>你的英雄在攻擊時&lt;b&gt;免疫&lt;/b&gt;</zhTW>
    <zhCN>你的英雄在攻击时具有&lt;b&gt;免疫&lt;/b&gt;。</zhCN>
    <ruRU>Атакуя, ваш герой не получает урона.</ruRU>
    <ptBR>Seu herói fica &lt;b&gt;Imune&lt;/b&gt; enquanto ataca.</ptBR>
    <plPL>Twój bohater posiada &lt;b&gt;Niewrażliwość&lt;/b&gt; podczas ataku.</plPL>
    <koKR>내 영웅이 공격할 때 &lt;b&gt;피해 면역&lt;/b&gt; 상태가 됩니다.</koKR>
    <itIT>Il tuo eroe è &lt;b&gt;Immune&lt;/b&gt; quando attacca.</itIT>
    <esMX>Tu héroe es &lt;b&gt;Inmune&lt;/b&gt; mientras ataca.</esMX>
    <esES>Tu héroe es &lt;b&gt;inmune&lt;/b&gt; al atacar.</esES>
    <deDE>Euer Held ist &lt;b&gt;immun&lt;/b&gt;, während er angreift.</deDE>
  </Tag>
  <Tag name="EnchantmentBirthVisual" enumID="330" type="EnchantmentVisualType" value="0" />
  <Tag name="EnchantmentIdleVisual" enumID="331" type="EnchantmentVisualType" value="0" />
  <Tag name="ArtistName" enumID="342" type="String">
    <enUS>Peter C. Lee</enUS>
  </Tag>
  <Tag name="FlavorText" enumID="351" type="String">
    <enUS>The longbow allows shots to be fired from farther away and is useful for firing on particularly odorous targets.</enUS>
    <zhTW>長弓能從更遠的地方發射,對於特別臭的目標格外好用。</zhTW>
    <zhCN>弓弦很长,这使得射手能够射得更远,对付那些难闻的目标尤其有效。</zhCN>
    <ruRU>Длинный лук позволяет стрелять намного дальше, что особенно полезно при стрельбе по дурно пахнущим целям.</ruRU>
    <ptBR>O arco longo permite atirar flechas mais de longe e é útil para disparar contra alvos especialmente fedorentos.</ptBR>
    <plPL>Ten długi łuk pozwala na ostrzał z naprawdę daleka i przydaje się do likwidowania szczególnie cuchnących celów.</plPL>
    <koKR>멀리서 쏠 수 있기 때문에 냄새가 지독한 적을 상대할 때 유용합니다.</koKR>
    <itIT>Gli archi lunghi possono colpire da molto lontano, il che è utile quando bisogna colpire bersagli particolarmente maleodoranti.</itIT>
    <frFR>Cet arc long permet de tirer depuis une certaine distance, ce qui est particulièrement utile contre les cibles qui... sentent.</frFR>
    <esMX>El arco largo te permite disparar desde más lejos y es muy útil cuando los enemigos son especialmente apestosos.</esMX>
    <esES>El arco largo permite disparar desde mayores distancias y es muy útil para eliminar a objetivos particularmente apestosos.</esES>
    <deDE>Da dieser Langbogen sich hervorragend dazu eignet, Feinde aus großer Entfernung zu erledigen, wird er für die Jagd auf stark riechende Ziele empfohlen.</deDE>
  </Tag>
  <ReferencedTag name="Cant Be Damaged" enumID="240" type="Bool" value="1" />
  <Power definition="d96807ae-a9fb-42f9-9c53-36e0f4da9446" />
</Entity>

That's a whole lot of information about the card. Each separate card appears to be demarcated by <Entity> tags. Each property of the card is inside a <Tag>. Generating a full cardlist based on this file should not be too difficult. Here's a Python script I came up with:

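#!/usr/bin/env python3
import re

# The asset bundle is a binary file, so read bytes rather than text.
with open('cardxml0.unity3d', 'rb') as f:
    cardxml = f.read()

# Each card is a self-contained <Entity>...</Entity> fragment.
cards = re.findall(rb'<Entity.*?</Entity>', cardxml, re.DOTALL)

with open('cards.xml', 'wb') as f:
    f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
    # Wrap the fragments in one root element so the output is well-formed XML.
    f.write(b'<Cards>\n')
    for card in cards:
        f.write(card)
        f.write(b'\n')
    f.write(b'</Cards>\n')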

Now we have the raw data in our hands in an easy-to-access format. A good next step would be to clean up the data by removing all the duplicate cards and cards that are not obtainable in the game. We should also prune all the properties we don't care about.
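That cleanup step could look roughly like the sketch below. It makes several assumptions: the <Entity> fragments are available as a list of strings (for example the regex matches, decoded as UTF-8), a card counts as obtainable when its Collectible tag has the value 1, and duplicates share the same CardID. The WANTED list and the function names are my own choices for illustration.

```python
import xml.etree.ElementTree as ET

# Properties to keep; the names come from the example entity shown earlier.
WANTED = ('CardName', 'CardSet', 'CardType', 'Class', 'Rarity', 'Cost')

def tag_value(tag, locale='enUS'):
    """Return localized text for String tags, otherwise the value attribute."""
    if tag.get('type') == 'String':
        return tag.findtext(locale)
    return tag.get('value')

def clean_cards(fragments):
    """Keep collectible cards, drop repeated CardIDs, prune unwanted tags."""
    cards = {}
    for fragment in fragments:
        entity = ET.fromstring(fragment)
        tags = {t.get('name'): t for t in entity.findall('Tag')}
        collectible = tags.get('Collectible')
        # Assumption: cards without Collectible="1" are not obtainable.
        if collectible is None or collectible.get('value') != '1':
            continue
        card_id = entity.get('CardID')
        if card_id in cards:
            continue  # assumption: same CardID means duplicate
        cards[card_id] = {
            name: tag_value(tags[name]) for name in WANTED if name in tags
        }
    return cards
```

The result is a dictionary keyed by CardID, which is easy to dump as JSON or CSV later. Numeric properties are left as strings at this point.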

Some of the properties are pretty self-explanatory, but there are some where the numeric values don't mean much by themselves. Here's a table for some of the properties.

Value  Class    Rarity     Card Type   Card Set   Race
1      -        Common     -           -          -
2      Druid    Free       -           Basic      -
3      Hunter   Rare       Hero        Expert     -
4      Mage     Epic       Minion      Reward     -
5      Paladin  Legendary  Spell       Missions   -
6      Priest   -          -           -          -
7      Rogue    -          Equipment   -          -
8      Shaman   -          -           Debug      -
9      Warlock  -          -           -          -
10     Warrior  -          Hero Power  -          -
11     -        -          -           Promo      -
14     -        -          -           -          Murloc
15     -        -          -           -          Demon
16     -        -          -           Credit     -
20     -        -          -           -          Beast
21     -        -          -           -          Totem
23     -        -          -           -          Pirate
24     -        -          -           -          Dragon
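The table translates directly into lookup dictionaries. The sketch below is one way to decode a property. The tag names Class, Rarity, CardType and CardSet appear in the example entity, but the Race tag name is an assumption, since the example card has no race.

```python
CLASSES = {2: 'Druid', 3: 'Hunter', 4: 'Mage', 5: 'Paladin', 6: 'Priest',
           7: 'Rogue', 8: 'Shaman', 9: 'Warlock', 10: 'Warrior'}
RARITIES = {1: 'Common', 2: 'Free', 3: 'Rare', 4: 'Epic', 5: 'Legendary'}
CARD_TYPES = {3: 'Hero', 4: 'Minion', 5: 'Spell', 7: 'Equipment',
              10: 'Hero Power'}
CARD_SETS = {2: 'Basic', 3: 'Expert', 4: 'Reward', 5: 'Missions',
             8: 'Debug', 11: 'Promo', 16: 'Credit'}
RACES = {14: 'Murloc', 15: 'Demon', 20: 'Beast', 21: 'Totem',
         23: 'Pirate', 24: 'Dragon'}

# Maps a tag name to its lookup table ('Race' is an assumed tag name).
LOOKUPS = {'Class': CLASSES, 'Rarity': RARITIES, 'CardType': CARD_TYPES,
           'CardSet': CARD_SETS, 'Race': RACES}

def decode(name, value):
    """Translate a numeric tag value to its label; pass unknown values through."""
    table = LOOKUPS.get(name)
    if table is None:
        return value
    return table.get(int(value), value)

# Gladiator's Longbow from the example: Class 3, Rarity 4, CardType 7.
print(decode('Class', '3'), decode('Rarity', '4'), decode('CardType', '7'))
# Hunter Epic Equipment
```

Values missing from the table are returned unchanged, so gaps in the mapping stay visible instead of raising errors.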

There are also certain boolean properties that only show up if the card has them, like Battlecry, Charge, Deathrattle, Divine Shield, Stealth and Taunt. There is also a property called "Poisonous" for minions like Emperor Cobra that kill everything they touch.
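Because these properties are simply absent when a card lacks them, collecting them comes down to listing the tags that are present. The sketch below assumes they are encoded the same way as the Collectible tag in the example entity (type="Bool" with value="1"). The function name is mine, and the minion in the usage example is made up for illustration.

```python
import xml.etree.ElementTree as ET

def boolean_flags(entity_xml):
    """Return the sorted names of all Bool tags set to 1 on one <Entity>."""
    entity = ET.fromstring(entity_xml)
    return sorted(
        tag.get('name')
        for tag in entity.findall('Tag')
        if tag.get('type') == 'Bool' and tag.get('value') == '1'
    )

# Hypothetical entity; the CardID, enumIDs and Taunt tag are not from the game files.
example = (
    '<Entity version="2" CardID="EXAMPLE_001">'
    '<Tag name="Cost" enumID="48" type="Number" value="1" />'
    '<Tag name="Taunt" enumID="0" type="Bool" value="1" />'
    '<Tag name="Collectible" enumID="321" type="Bool" value="1" />'
    '</Entity>'
)
print(boolean_flags(example))  # ['Collectible', 'Taunt']
```

Note that Collectible shows up in the result as well, since it is just another boolean tag.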

That's all I had time for today. Drop me a line on Twitter if you enjoyed reading this.