DTS Mind Upload Service

Introduction:

This document describes the demo part of the mind upload service. The demo consists of the two modules described below:

  1. Mind Recording Module
  2. Mind Retrieval Module

Other modules can be added as required, and the functionality of the existing modules can be changed as needed.

Mind Recording Module:

In this module we will record the mind of each individual and upload it to the brain engine. By recording the mind we mean that we will record different speeches of the person whose mind we want to upload. Once different recordings of the person have been collected, we can use a speech-to-text library to convert each recording into text. We can then categorize each recording, based on the text we have obtained, into a personal or a general category. That is, each recording of every individual will be labelled either personal or general.

In summary this module will perform the following steps:

  • Speech recording of the person
  • Speech-to-text conversion
  • Categorization of the speech, based on the text, into a personal or general category
  • Uploading of the speech audio into the relevant category on the brain engine server

Here we provide more details. First we will record the audio of the user using the microphone service and then upload this audio to a third-party server for further processing. The Android MediaRecorder or AudioRecord APIs can be used for recording the voice on the target phone. On the desktop, Java provides the sound package for recording audio from the microphone. On the server we will pass this audio to a speech recognition service to get the text for this audio. Different implementations are available for performing speech recognition, such as CMU Sphinx, iSpeech and Google Speech. CMU Sphinx is an open-source speech recognition library written in Java; it comes with a set of language and acoustic models that can be selected according to the accent of the speaker, for example multiple acoustic models for English. The Google Speech API is a cloud-based speech recognition service. iSpeech also provides a paid Text to Speech (TTS) and Automatic Speech Recognition (ASR) SDK for Java desktop clients.
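As a concrete illustration, the sketch below shows how the desktop recording step could look using the Java Sound package (javax.sound.sampled). The 16 kHz mono format, the output file name and the fixed ten-second recording duration are assumptions made for this sketch, not requirements of the service.

    import javax.sound.sampled.*;
    import java.io.File;

    public class MicRecorder {

        // 16 kHz, 16-bit, mono PCM is a common input format for speech
        // recognizers such as CMU Sphinx.
        private static final AudioFormat FORMAT =
                new AudioFormat(16000.0f, 16, 1, true, false);

        public static void main(String[] args) throws Exception {
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, FORMAT);
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(FORMAT);
            line.start();

            // Stop the capture after ten seconds from a second thread;
            // AudioSystem.write below blocks until the line is closed.
            new Thread(() -> {
                try { Thread.sleep(10_000); } catch (InterruptedException ignored) {}
                line.stop();
                line.close();
            }).start();

            // Write everything captured from the microphone into a WAV file
            // that can then be uploaded to the server for recognition.
            File out = new File("recording.wav");
            AudioSystem.write(new AudioInputStream(line),
                    AudioFileFormat.Type.WAVE, out);
            System.out.println("Saved " + out.getAbsolutePath());
        }
    }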

Some speech recognition libraries return more than one result for the provided audio. Here our filter 1 will select the best transcription for the input audio. The output of filter 1 will be passed to filter 2, which will optimize the transcription further in order to produce a more accurate result. We will also create and use a training dataset to add more accuracy to the text we receive.
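A minimal sketch of the two filters, under the assumption that the recognition service returns a list of candidate transcriptions with confidence scores, is shown below. The Hypothesis record and the correction map standing in for the training dataset are hypothetical.

    import java.util.*;

    // Minimal sketch of the two filters described above. Hypothesis is a
    // hypothetical wrapper for one candidate transcription and the
    // confidence score returned by the recognition service; the correction
    // map stands in for the training dataset.
    public class TranscriptionFilters {

        public record Hypothesis(String text, double confidence) {}

        // Filter 1: keep the hypothesis the recognizer scored highest.
        static Hypothesis filter1(List<Hypothesis> nBest) {
            return nBest.stream()
                    .max(Comparator.comparingDouble(Hypothesis::confidence))
                    .orElseThrow();
        }

        // Filter 2: normalize the text and apply corrections learned from
        // the training dataset (hard-coded here for illustration).
        static String filter2(Hypothesis best, Map<String, String> corrections) {
            String text = best.text().toLowerCase(Locale.ROOT).trim();
            for (Map.Entry<String, String> e : corrections.entrySet()) {
                text = text.replace(e.getKey(), e.getValue());
            }
            return text;
        }

        public static void main(String[] args) {
            List<Hypothesis> nBest = List.of(
                    new Hypothesis("he talked about the pole results", 0.54),
                    new Hypothesis("he talked about the latest pole results", 0.81));
            String cleaned = filter2(filter1(nBest), Map.of("pole", "poll"));
            System.out.println(cleaned); // he talked about the latest poll results
        }
    }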

Finally, the output of filter 2 will be provided to the NLP module, which will tokenize the speech text after removing stop words. At the end we will get tokens for different places, organizations, persons and other entities. Different NLP implementations are available, such as Stanford NLP, General Architecture for Text Engineering (GATE) and OpenNLP.
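The plain-Java sketch below illustrates the tokenization and stop-word removal step. In the demo this work would be done by one of the NLP toolkits named above, which also label persons, places and organizations; the small stop-word list here is illustrative only.

    import java.util.*;
    import java.util.stream.Collectors;

    // Plain-Java sketch of the NLP step: split the cleaned transcription
    // into tokens and drop stop words.
    public class Tokenizer {

        private static final Set<String> STOP_WORDS =
                Set.of("a", "an", "the", "is", "was", "in", "on", "of", "and", "to");

        static List<String> tokenize(String text) {
            return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                    .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            System.out.println(tokenize("He talked about the latest poll results in Tokyo"));
            // [he, talked, about, latest, poll, results, tokyo]
        }
    }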

Using these tokens we will categorize the input audio into the personal or the general category.
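As an example, a simple keyword-based categorizer could look like the following. The topic list and the single-keyword threshold are placeholder assumptions; a classifier trained on labelled recordings could replace them.

    import java.util.*;

    // Sketch of the categorization step: if the tokens of a recording
    // contain a "general topic" keyword (politics, sports, weather and so
    // on) the recording is tagged GENERAL, otherwise PERSONAL.
    public class Categorizer {

        public enum Category { PERSONAL, GENERAL }

        private static final Set<String> GENERAL_TOPICS =
                Set.of("politics", "election", "poll", "government", "sports",
                       "weather", "economy", "news");

        static Category categorize(List<String> tokens) {
            boolean general = tokens.stream().anyMatch(GENERAL_TOPICS::contains);
            return general ? Category.GENERAL : Category.PERSONAL;
        }

        public static void main(String[] args) {
            System.out.println(categorize(List.of("he", "talked", "about",
                    "latest", "poll", "results", "tokyo")));   // GENERAL
            System.out.println(categorize(List.of("my", "daughter", "started",
                    "school", "today")));                       // PERSONAL
        }
    }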

Figure 1: Mind Upload System Overview

Mind Retrieval Module:

This module will retrieve the voice of the person whose mind has been uploaded. Suppose the mind of person 'X' has already been uploaded to the brain engine, and person 'Y' now wants to hear what person 'X' has said about, for example, politics. The following steps will be performed at this point. First, the question of person 'Y' will be analyzed. In this example the question mentions politics. We know that politics is not a personal topic but a general one, so we will retrieve all the audios of person 'X' that the first module categorized into the general category.

Figure 2: Mind Upload System Technical Flow

Now for each audio we will calculate the semantic similarity of the audio with politics, and the audio with the maximum similarity will be returned. In this way person 'Y' will be able to hear the comments of person 'X' regarding politics even in his absence. In other words, person 'Y' will be able to listen to person 'X' anywhere and anytime.
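One simple way to approximate this step is to compare term-frequency vectors of the question and of each stored transcription using cosine similarity, as sketched below. This is only a stand-in for a richer semantic similarity measure, and the audio identifiers and token lists are illustrative.

    import java.util.*;

    // Stand-in for the semantic similarity step: represent the question and
    // each stored transcription as term-frequency vectors and rank the
    // audios by cosine similarity, returning the identifier of the closest
    // match.
    public class SimilarityRanker {

        static Map<String, Integer> termFrequencies(List<String> tokens) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokens) tf.merge(t, 1, Integer::sum);
            return tf;
        }

        static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0, normA = 0, normB = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
                normA += e.getValue() * e.getValue();
            }
            for (int v : b.values()) normB += v * v;
            return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
        }

        // Returns the id of the stored audio whose transcription is closest
        // to the question tokens.
        static String bestMatch(List<String> question,
                                Map<String, List<String>> transcriptions) {
            Map<String, Integer> q = termFrequencies(question);
            return transcriptions.entrySet().stream()
                    .max(Comparator.comparingDouble(
                            (Map.Entry<String, List<String>> e) ->
                                    cosine(q, termFrequencies(e.getValue()))))
                    .map(Map.Entry::getKey)
                    .orElseThrow();
        }

        public static void main(String[] args) {
            Map<String, List<String>> generalAudios = Map.of(
                    "audio-17", List.of("election", "poll", "results", "government"),
                    "audio-42", List.of("weather", "rain", "tokyo"));
            System.out.println(bestMatch(List.of("politics", "election"), generalAudios));
            // audio-17
        }
    }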

In summary this module will perform the following steps:

  • Analysis of the question of person 'Y', who wants to hear person 'X' in his absence
  • Categorization of the question into the personal or general category
  • Retrieval of all audios of the personal or general category, based on the question category
  • Calculation of the semantic similarity between each audio and the question of person 'Y'
  • Delivery of the most semantically similar audio of person 'X' to person 'Y'

Diagrammatically, we can represent the above idea as shown in figure 1. Both modules have been combined in the diagram.

The technical flow of our idea is provided in figure 2.

Demonstration Videos:

In order to demonstrate the mind upload technology to our valued clients, we have created the following two demonstration videos. Please watch both of them and provide us your feedback; your feedback is a valuable input for us.
