Updated Automatic Speech Recognition using CTC example for Keras v3 #1768

Open

wants to merge 4 commits into base: master

Conversation

lpizzinidev (Contributor)

Updates the "Automatic Speech Recognition using CTC" example to support Keras v3.

@fchollet (Member) left a comment

Thanks for the PR!

@@ -244,16 +249,74 @@ def encode_single_sample(wav_file, label):
"""


# Reference: https://github.com/keras-team/keras/blob/ec67b760ba25e1ccc392d288f7d8c6e9e153eea2/keras/legacy/backend.py#L674-L711
def ctc_label_dense_to_sparse(labels, label_lengths):
fchollet (Member)

Rather than rewriting this code, you can just use the built-in Keras 3 loss function keras.losses.CTC. I expect it will also enable the code example to run with all backends.
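For reference, a minimal sketch of what the built-in loss accepts (the shapes and vocabulary size below are made up for illustration, not the example's actual values): zero-padded integer transcriptions and per-timestep class probabilities go straight into keras.losses.CTC, with no dense-to-sparse conversion step.

```python
import numpy as np
import keras

# Illustrative shapes only: batch of 2, 50 spectrogram frames,
# 32 output classes (characters plus the CTC blank at index 0).
y_pred = np.random.uniform(size=(2, 50, 32)).astype("float32")
# Integer-encoded labels; 0 is reserved for padding/blank.
y_true = np.random.randint(1, 31, size=(2, 10)).astype("int32")

loss_fn = keras.losses.CTC()
print(float(loss_fn(y_true, y_pred)))
```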

lpizzinidev (Contributor, Author)

Thanks for the feedback 👍
After removing the legacy code, we still have some references to tf in the example, and I'm not sure this can be made backend-agnostic.
Please let me know if I should substitute the remaining tf references.

@fchollet (Member) left a comment

LGTM, thank you! You can add the generated files.

@@ -320,7 +307,7 @@ def build_model(input_dim, output_dim, rnn_layers=5, rnn_units=128):
     # Optimizer
     opt = keras.optimizers.Adam(learning_rate=1e-4)
     # Compile the model and return
-    model.compile(optimizer=opt, loss=CTCLoss)
+    model.compile(optimizer=opt, loss=keras.losses.ctc)
fchollet (Member)

Prefer using CTC() (ends up running the same thing but it's more idiomatic)
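A sketch of the suggested form, using a toy stand-in for the example's build_model (the input width and output size below are placeholders, not the example's real dimensions):

```python
import keras

# Placeholder model: Dense over each time step, softmax over the character set.
inputs = keras.Input(shape=(None, 193), name="spectrogram")
outputs = keras.layers.Dense(32, activation="softmax")(inputs)
model = keras.Model(inputs, outputs)

opt = keras.optimizers.Adam(learning_rate=1e-4)
# Loss instance rather than the bare keras.losses.ctc function.
model.compile(optimizer=opt, loss=keras.losses.CTC())
```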

    input_length = tf.cast(input_length, tf.int32)

    if greedy:
        (decoded, log_prob) = tf.nn.ctc_greedy_decoder(
fchollet (Member)

So, we're going to have to use TF for this and ctc_beam_search_decoder I guess, unless we implement them as new backend ops.
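For context, a rough sketch of the TF-specific decoding path the comment refers to (shapes are illustrative, not the example's real dimensions):

```python
import numpy as np
import tensorflow as tf

# Illustrative shapes: batch of 2, 50 time steps, 32 classes.
y_pred = np.random.uniform(size=(2, 50, 32)).astype("float32")
batch_size, time_steps = y_pred.shape[0], y_pred.shape[1]

# tf.nn.ctc_greedy_decoder expects time-major log-probabilities: (time, batch, classes).
inputs = tf.math.log(tf.transpose(tf.convert_to_tensor(y_pred), perm=[1, 0, 2]) + 1e-7)
sequence_length = tf.constant([time_steps] * batch_size, dtype=tf.int32)

(decoded, log_prob) = tf.nn.ctc_greedy_decoder(inputs, sequence_length)
# decoded[0] is a SparseTensor of label indices; densify for readability.
print(tf.sparse.to_dense(decoded[0]).numpy())
```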

lpizzinidev (Contributor, Author)

Again, thanks for the feedback 👍
I created an issue to address this.
Please let me know if I should change the description or add/remove details.
Thanks!
