Adding subcards from NeMo model cards
Signed-off-by: taejinp <tango4j@gmail.com>
- bias.md +5 -0
- explainability.md +14 -0
- privacy.md +15 -0
- safety.md +6 -0
bias.md
ADDED
@@ -0,0 +1,5 @@
+Field | Response
+:---------------------------------------------------------------------------------------------------|:---------------
+Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None
+Measures taken to mitigate against unwanted bias: | None
+Bias Metric (If Measured): | None
explainability.md
ADDED
@@ -0,0 +1,14 @@
+Field | Response
+:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
+Intended Task/Domain: | Speaker Diarization (Speaker Tagging in Speech Recognition)
+Model Type: | FastConformer Encoder, Transformer Encoder, and RNNT Decoder
+Intended Users: | People working with conversational AI models that transcribe speech-to-text for multiple users.
+Output: | Text with speaker tags
+Describe how the model works: | The model incorporates a novel mechanism, the Arrival-Order Speaker Cache (AOSC). This cache management technique dynamically adjusts each speaker’s cache size, prioritizing the speech frames most valuable to cache. The model is fine-tuned with increased weighting on far-field datasets to perform better for meeting-style speech.
+Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
+Technical Limitations & Mitigation: | This model can detect up to four speakers; performance degrades in recordings with five or more speakers. The model was trained on publicly available English speech datasets. As a result, it is not suitable for non-English audio. Performance may also degrade on out-of-domain data, such as recordings in noisy conditions.
+Verified to have met prescribed NVIDIA quality standards: | Yes
+Performance Metrics: | Concatenated minimum-permutation word error rate (cpWER) and time-constrained minimum-permutation word error rate (tcpWER)
+Potential Known Risks: | Transcripts may not be 100% accurate in instances with background noise. Punctuation/capitalization may not be 100% accurate.
+Licensing: | GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement (found [here](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)).
+
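The Performance Metrics row above names cpWER without spelling it out. As a rough illustration only, the sketch below computes it the way the metric is commonly defined: concatenate each speaker's words, try every mapping between reference and hypothesis speakers, and keep the mapping with the fewest word errors. The function names `edit_distance` and `cp_wer` are hypothetical and are not taken from NeMo or from this commit. The brute-force permutation search is an assumption chosen for readability, which is affordable here because the model detects at most four speakers. tcpWER, which additionally constrains matches in time, is not covered.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference word missed)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Illustrative cpWER: minimum total word errors over all speaker
    permutations, divided by the number of reference words.

    Each argument is a list with one concatenated transcript per speaker.
    """
    refs = [s.split() for s in ref_by_speaker]
    hyps = [s.split() for s in hyp_by_speaker]

    # Pad the shorter side with empty transcripts so that missed or extra
    # speakers are counted as pure deletions or insertions.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Brute force over speaker assignments (n! candidates).
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    reference = ["hello how are you", "i am fine thanks"]
    hypothesis = ["i am fine", "hello how are you"]  # speaker labels swapped
    print(cp_wer(reference, hypothesis))
```

Because the score is taken over the best speaker mapping, swapped speaker labels alone do not add errors; only wrong, missing, or misattributed words do. For larger speaker counts an assignment solver would be the usual replacement for the permutation loop.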
privacy.md
ADDED
@@ -0,0 +1,15 @@
+Field | Response
+:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
+Generatable or reverse engineerable personal data? | No
+Personal data used to create this model? | Yes - Voice
+Was consent obtained for any personal data used? | Yes
+How often is dataset reviewed? | Before Release
+Is a mechanism in place to honor data subject right of access or deletion of personal data? | Yes
+If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Yes
+If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Yes
+If personal data was collected for the development of this AI model, was it minimized to only what was required? | Yes
+Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No
+Is there provenance for all datasets used in training? | Yes
+Does data labeling (annotation, metadata) comply with privacy laws? | Yes
+Is data compliant with data subject requests for data correction or removal, if such a request was made? | The data is compliant where applicable, but is not applicable for all data.
+Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
safety.md
ADDED
@@ -0,0 +1,6 @@
+Field | Response
+:---------------------------------------------------|:----------------------------------
+Model Application Field(s): | Speaker Tagging in Speech Recognition Systems
+Describe the life critical impact (if present). | Not Applicable
+Use Case Restrictions: | Abide by [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.