I'm trying to implement simple speech recognition with the Deep Learning Toolbox.
The network receives a piece of a spectrogram, and at each time step it outputs the probability of each phone (ɔi, au, əu, etc.).
I have some recorded speech sentences and their corresponding phonetic transcriptions, but I must use the CTC (Connectionist Temporal Classification) algorithm to match the phonetic sequence to the probability sequence output by the CNN. The problem is:
The CTC algorithm contains several nested loops, making it extremely unfriendly to automatic differentiation. A typical CTC call takes less than 0.1 s, but it takes 20 seconds to finish under dlfeval().
The CTC algorithm produces a relation index, e.g. (outputs #1–#5 of the NN -> first symbol of the supervision sequence, #6–#7 -> second symbol, etc.).
In other words, Loss = sum((Neural_Network_Output_vector - Supervise(CTC_relation_index)).^2, 'all'), where CTC_relation_index is a constant indexing vector.
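Written as a model-loss function for dlfeval(), I mean something roughly like this (modelLoss, dlX, superviseSeq and ctcIndex are my own placeholder names, not toolbox functions; ctcIndex is the precomputed, constant relation index):

function [loss, gradients] = modelLoss(net, dlX, superviseSeq, ctcIndex)
    dlY = forward(net, dlX);                  % phone probabilities at each time step
    target = superviseSeq(:, ctcIndex);       % expand the supervision via the constant index
    loss = sum((dlY - target).^2, 'all');     % squared-error loss as written above
    gradients = dlgradient(loss, net.Learnables);
end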
So, how can I bypass automatic differentiation here, so that the CTC call produces the relation index quickly?
My current workaround: call the network's forward() and CTC outside dlfeval() to get the relation index, then pass both the input and this index into dlfeval() and call forward() again there to get the gradient, as sketched below.
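Roughly, one training step currently looks like this (myCTCAlign is my own alignment routine, modelLoss is the function sketched above, and the plain SGD update is just an example):

dlY0 = forward(net, dlX);                               % 1st forward pass, outside dlfeval()
ctcIndex = myCTCAlign(extractdata(dlY0), phoneLabels);  % fast, no dlarray tracing involved
[loss, gradients] = dlfeval(@modelLoss, net, dlX, superviseSeq, ctcIndex);  % 2nd forward pass inside
net = dlupdate(@(w,g) w - learnRate*g, net, gradients); % e.g. a plain SGD step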
Is there a way to call forward() only once?