Foreword#
In our current work, configuration and testing are significant pain points for game voice. In my team, taking a voice line from resource production to final integration requires coordination among three roles (planning, audio design, and programming): planning provides requirements → audio design produces the resources → audio design configures Wwise → programming integrates them into the game → planning verifies the result.
This process is quite lengthy.
Therefore, we developed our own voice pipeline for Unreal Engine, called "VGT". At its core it relies on Wwise for decoding and playback, while the upper layer implements custom resource management, playback logic, and several supporting systems (Blueprint support, a callback system, a sequence system, an audio-visual synchronization system, and so on). The pipeline addresses the pain points above, avoids the cumbersome Wwise configuration process, and makes resource management and loading more efficient.
Core System Modules#
UI Module#
VGT provides a UI that lets users quickly import resources and test playback. Configuration is data-driven through Excel. Imported wav voice files are converted to wem format with Wwise's transcoding tools.
Resource Management Module#
In large games, the number of voice lines can become enormous, which puts significant pressure on both resource storage and memory at load time.
Therefore, VGT adopts a "Module" architecture, similar to SoundBank in Wwise, where each module is completely independent and can be loaded and unloaded on demand in the game. Additionally, runtime data loading, sound entity management, and event playback status are also managed on a module basis. The simplified data structure for the module is defined as follows:
struct VoiceModuleData
{
    FString moduleName;
    FString modulePath;
    int32 maxPlayedEventInstanceNum;
    int32 maxSavedEventInfoNum;
    // Playable voices in this module
    TMap<FString, TArray<TSharedPtr<EventData>>> eventMap;
    // Container type of each voice in this module (random container, sequential container...)
    TMap<FString, int32> eventTypeMap;
};
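To illustrate the on-demand loading described above, here is a minimal sketch in standard C++ rather than UE types. `ModuleSketch`, `ModuleRegistry`, and their methods are hypothetical names chosen for this example, not VGT's actual API, and the step that would read the event table from disk is left as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for VoiceModuleData, using standard types instead of UE ones.
struct ModuleSketch {
    std::string name;
    std::string path;
    std::unordered_map<std::string, std::vector<int>> events;  // event name -> resource IDs
};

// Hypothetical registry that owns only the modules that are currently loaded.
class ModuleRegistry {
public:
    // Loads a module if it is not resident yet; returns false if it already was.
    bool Load(const std::string& name, const std::string& path) {
        if (loaded_.count(name) != 0) return false;
        auto module = std::make_unique<ModuleSketch>();
        module->name = name;
        module->path = path;
        // A real implementation would read the module's event table from `path` here.
        loaded_.emplace(name, std::move(module));
        return true;
    }

    // Unloads a module and frees everything it owns; returns false if it was not loaded.
    bool Unload(const std::string& name) { return loaded_.erase(name) > 0; }

    // Returns the module if it is resident, otherwise nullptr.
    ModuleSketch* Find(const std::string& name) {
        auto it = loaded_.find(name);
        return it == loaded_.end() ? nullptr : it->second.get();
    }

    std::size_t LoadedCount() const { return loaded_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<ModuleSketch>> loaded_;
};
```

Because each module owns its own event data, unloading it releases that data in one step, which matches the idea that modules are fully independent of each other.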
Each locally transcoded wem voice resource is assigned an ID as its unique identifier, which avoids the performance cost of handling large numbers of strings. IDs also simplify file path management: VGT builds an AVL tree from a mapping between paths and ID ranges, with each path maintaining multiple ID ranges, so the complete file path can be retrieved quickly from an ID during streaming playback.
The above diagram shows a simple directory structure example, where each directory node maintains several ID ranges. When searching for the corresponding file based on the voice ID, it can quickly locate the corresponding directory path, achieving O(log n) lookup efficiency.
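The ID-range lookup can be sketched as follows. This is only an illustration under stated assumptions: it uses `std::map` (typically a red-black tree, not an AVL tree) as the balanced tree, assumes ranges do not overlap, and assumes files are named `<id>.wem`. `IdRangeIndex`, `AddRange`, and `Resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// One inclusive ID range and the directory that stores its files.
struct IdRange {
    std::int32_t last;      // inclusive upper bound of the range
    std::string directory;  // directory holding the .wem files of this range
};

// Hypothetical index: key = first ID of a range, value = last ID and directory.
class IdRangeIndex {
public:
    // Registers the inclusive range [first, last]; ranges are assumed not to overlap.
    void AddRange(std::int32_t first, std::int32_t last, std::string directory) {
        ranges_[first] = IdRange{last, std::move(directory)};
    }

    // Writes "<directory>/<id>.wem" to outPath; returns false if no range contains the ID.
    bool Resolve(std::int32_t id, std::string& outPath) const {
        auto it = ranges_.upper_bound(id);  // first range that starts after id
        if (it == ranges_.begin()) return false;
        --it;                               // last range that starts at or before id
        if (id > it->second.last) return false;
        outPath = it->second.directory + "/" + std::to_string(id) + ".wem";
        return true;
    }

private:
    std::map<std::int32_t, IdRange> ranges_;  // balanced tree, O(log n) lookup
};
```

A directory may appear under several keys, which corresponds to one path maintaining multiple ID ranges.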
For event information management, the data structure definition for events in the VGT system is roughly as follows:
struct EventInfo
{
    int32 ResourceID;          // Unique ID allocated for the resource
    int32 Order;               // Whether it is played in order
    int32 Probability;         // Whether it is played randomly
    int32 AdditionalFieldsNum; // Additional filtering conditions
};
The event information table is stored in the corresponding module directory and is loaded or unloaded as needed.
Resource Playback Module#
To reduce memory pressure, VGT plays voices by streaming rather than loading them fully into memory.
Below is a simplified version of the underlying playback logic:
// Construct externalSource based on audio path
TArray<AkOSChar> FullPath;
FullPath.Reserve(fullPath.Len() + 1);
FullPath.Append(TCHAR_TO_AK(*fullPath), fullPath.Len() + 1);
AkExternalSourceInfo externalSourceInfo(FullPath.GetData(), externalSrcCookie, AKCODECID_VORBIS);
TArray<AkExternalSourceInfo> externalSources;
externalSources.Add(externalSourceInfo);
// Play
playingID = AkAudioDevice->PostEventOnGameObjectID(eventID, akGameObjectID, AK_EndOfEvent | AK_EnableGetSourcePlayPosition, &FVoicePlayer::OnEventCallback, Data.Get(), externalSources);
One layer above this, we built two kinds of playback logic, random and sequential, and choose between them based on the EventType table stored in the module. Below is a simplified example that mainly demonstrates the random playback logic.
Random Playback:
// Sum the configured probabilities
float TotalPercentage = 0.0f;
for (const TSharedPtr<RandomEventConfig>& Config : EventConfigs)
{
    TotalPercentage += Config->Probability;
}
// If the sum of all probabilities is not 100, normalize them
if (!FMath::IsNearlyEqual(TotalPercentage, 100.0f))
{
    if (FMath::IsNearlyZero(TotalPercentage))
    {
        // All probabilities are zero: fall back to a uniform distribution
        for (const TSharedPtr<RandomEventConfig>& Config : EventConfigs)
        {
            Config->Probability = 100.0f / EventConfigs.Num();
        }
    }
    else
    {
        for (const TSharedPtr<RandomEventConfig>& Config : EventConfigs)
        {
            Config->Probability = (Config->Probability / TotalPercentage) * 100.0f;
        }
    }
}
// Skip entries flagged bShouldReplaced and sum the remaining probabilities
float AvailableTotalPercentage = 0.0f;
TArray<TSharedPtr<RandomEventConfig>> AvailableConfigs;
for (const TSharedPtr<RandomEventConfig>& Config : EventConfigs)
{
    if (Config->bShouldReplaced)
    {
        continue;
    }
    AvailableTotalPercentage += Config->Probability;
    AvailableConfigs.Add(Config);
}
// Nothing left to pick from: refresh the play set and retry
if (AvailableTotalPercentage <= 0.0f)
{
    ForceRefreshPlaySet();
    return PlayVoice(akGameObjectID, trackID, percent, callback, pParam);
}
// Weighted pick among the available entries
const float RandomNumber = FMath::RandRange(0.0f, AvailableTotalPercentage);
float AccumulatedPercentage = 0.0f;
for (int32 i = 0; i < AvailableConfigs.Num(); ++i)
{
    AccumulatedPercentage += AvailableConfigs[i]->Probability;
    // The last entry also catches RandomNumber == AvailableTotalPercentage and rounding error
    if (RandomNumber < AccumulatedPercentage || i == AvailableConfigs.Num() - 1)
    {
        PlayVoice(); // Simplified: plays the voice described by AvailableConfigs[i]
        break;
    }
}
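The sequential path is not shown in the source. As a rough counterpart to the random logic, the sketch below plays entries in ascending `order` and wraps around after the last one. The wrap-around behavior and the names `SequentialEntry`, `SequentialContainer`, and `Next` are assumptions made for this illustration, not VGT's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical entry of a sequential container.
struct SequentialEntry {
    int resourceId;  // ID of the voice resource to play
    int order;       // position within the sequence
};

// Hypothetical sequential container: returns entries in ascending order, then wraps.
class SequentialContainer {
public:
    explicit SequentialContainer(std::vector<SequentialEntry> entries)
        : entries_(std::move(entries)) {
        // Sort once so that Next() only has to advance a cursor.
        std::stable_sort(entries_.begin(), entries_.end(),
                         [](const SequentialEntry& a, const SequentialEntry& b) {
                             return a.order < b.order;
                         });
    }

    // Returns the next resource ID, or -1 if the container is empty.
    int Next() {
        if (entries_.empty()) return -1;
        const int id = entries_[cursor_].resourceId;
        cursor_ = (cursor_ + 1) % entries_.size();  // assumed: restart after the last entry
        return id;
    }

private:
    std::vector<SequentialEntry> entries_;
    std::size_t cursor_ = 0;
};
```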
With the random and sequential playback logic in place, we also need a callback system that is convenient for programmers to use. The VGT callback system handles Wwise callback events in one central place and triggers our predefined callbacks based on context. It is set up in a similar way in both C++ and Blueprints.
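One way to picture centralized callback handling is a single dispatch function that looks up a per-playback handler. The sketch below is a simplified, single-threaded illustration in standard C++. `CallbackHub`, `VoiceEvent`, `Register`, and `Dispatch` are hypothetical names; a real implementation would translate the Wwise callback type and cookie into these arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical event kinds, standing in for the callback types reported by Wwise.
enum class VoiceEvent { Started, Finished };

// Hypothetical hub: every engine callback enters through Dispatch().
class CallbackHub {
public:
    using Handler = std::function<void(std::uint32_t playingId, VoiceEvent event)>;

    // Associates a user callback with one playback instance.
    void Register(std::uint32_t playingId, Handler handler) {
        handlers_[playingId] = std::move(handler);
    }

    // Single entry point: finds the handler for this playback and invokes it.
    void Dispatch(std::uint32_t playingId, VoiceEvent event) {
        auto it = handlers_.find(playingId);
        if (it == handlers_.end()) return;  // unknown or already finished playback
        Handler handler = it->second;       // copy, so erasing below is safe
        if (event == VoiceEvent::Finished) handlers_.erase(it);
        handler(playingId, event);
    }

    std::size_t PendingCount() const { return handlers_.size(); }

private:
    std::unordered_map<std::uint32_t, Handler> handlers_;
};
```

Keeping the lookup in one place means C++ and Blueprint callers can share the same registration path and differ only in the handler they supply.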
Performance Optimization and Thread Safety#
For better performance, VGT does not load all module information tables into memory at once; it loads them on demand. Each module also holds a large amount of event information, and deserializing it from JSON can itself become a performance pain point. Therefore, during packaging we convert the JSON to a binary format. When a module is loaded, we do not parse or deserialize the whole file. Instead, we map it directly into virtual memory and parse only the offset of each event within the data, which gives direct access at a much lower cost. Below is the pseudocode:
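// Call chain: GetEvent(eventName) -> LoadEventOffsetMap() -> LoadBinaryEventConfigMap()
TUniquePtr<FArchive> DataFileReader(IFileManager::Get().CreateFileReader(*EventConfigPath));
if (!DataFileReader)
{
    return nullptr; // The file could not be opened
}
// Jump straight to this event's record instead of parsing the whole file
DataFileReader->Seek(EventDataOffsetMap[eventName]);
ResultConfig->LoadBinary(DataFileReader);
return ResultConfig;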
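The offset-table idea can be tried out in standard C++ with an in-memory stream. The following is a minimal sketch, not VGT's file format: `EventRecord`, `WriteEvents`, and `ReadEvent` are hypothetical, records are fixed-size, and the blob is assumed to be written and read on the same platform (same endianness and struct layout).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fixed-size record, loosely modeled on EventInfo.
struct EventRecord {
    std::int32_t resourceId;
    std::int32_t order;
    std::int32_t probability;
};

// Packaging step: writes records back to back and returns event name -> byte offset.
// Assumes `out` is empty when the function is called.
std::map<std::string, std::streamoff> WriteEvents(
    std::ostream& out,
    const std::vector<std::pair<std::string, EventRecord>>& events) {
    std::map<std::string, std::streamoff> offsets;
    std::streamoff cursor = 0;
    for (const auto& entry : events) {
        offsets[entry.first] = cursor;
        out.write(reinterpret_cast<const char*>(&entry.second), sizeof(EventRecord));
        cursor += static_cast<std::streamoff>(sizeof(EventRecord));
    }
    return offsets;
}

// Runtime step: seeks to one record and reads only that record.
bool ReadEvent(std::istream& in,
               const std::map<std::string, std::streamoff>& offsets,
               const std::string& name,
               EventRecord& outRecord) {
    auto it = offsets.find(name);
    if (it == offsets.end()) return false;  // unknown event name
    in.clear();                             // reset any earlier EOF or fail state
    in.seekg(it->second);
    in.read(reinterpret_cast<char*>(&outRecord), sizeof(EventRecord));
    return static_cast<bool>(in);
}
```

Only the offset map has to be resident; each event record is fetched with one seek and one fixed-size read when it is first requested.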
Regarding thread safety, the structural diagram is as follows:
Summary#
The above introduces the core modules of the VGT project. Many other modules, such as the audio-visual synchronization and sequence modules, are not covered here. If time permits, I will expand this introduction in the future.