Audio Characteristics Visualization

This tool generates intuitive visualizations such as chromagrams, waveforms, and spectrograms from your audio files, giving you a visual way to understand and analyze your audio content.

To use this feature, select 'Choose File' and pick the audio file you wish to analyze. Then press 'Upload' to start processing. The system analyzes your audio and produces a diagram of the visualization type you selected.

Upload File

.mp3 or .wav
Max file size: 100MB

Select the kind of visualization:

Behind the Scenes

A waveform is a graphical representation of the variation in an audio signal over time. It displays the amplitude (or volume) of the sound wave, represented vertically, and time, represented horizontally.

In a waveform graph, loud sounds appear as taller waves, and soft sounds appear as shorter waves. Similarly, silence is represented by a flat line.

Waveforms are commonly used in audio editing and production as they provide an easy-to-understand visual cue about the loudness, silence, and duration of the audio file. They are especially useful in identifying and editing specific segments within an audio file.
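As a rough illustration of why loud passages look tall and silence looks flat, the sketch below reduces a list of samples to one peak value per window, which is roughly what a zoomed-out waveform display draws. The function name `peak_envelope`, the window size, and the sample values are assumptions made for this example; they do not describe how this tool renders its waveforms.

```python
def peak_envelope(samples, window_size):
    """Return the peak absolute amplitude of each consecutive window."""
    peaks = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        peaks.append(max(abs(s) for s in window))
    return peaks


# Hypothetical signal: a loud burst, a stretch of silence, then a quiet burst.
samples = [0.9, -0.8, 0.7, -0.9] + [0.0] * 4 + [0.1, -0.2, 0.1, -0.1]
envelope = peak_envelope(samples, window_size=4)
print(envelope)  # [0.9, 0.0, 0.2]
```

The loud window yields a tall peak, the silent window yields zero (a flat line), and the quiet window yields a short peak.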

A spectrogram is a two-dimensional representation of an audio signal. It displays how the frequencies of a signal are distributed with respect to time. On a spectrogram, the X-axis represents time, the Y-axis represents frequency, and the intensity of colors represents the amplitude (or energy) of the frequencies at any given time.

Spectrograms provide a detailed view of how different frequencies interact within an audio file. They are used to identify different sounds within a complex audio scene, to detect noise in a signal, and in speech processing and recognition. They can reveal patterns like harmonic structures, transient noises, or frequency modulations that are not easily identifiable in waveforms.
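One common way to obtain the time/frequency grid described above is a short-time Fourier transform: the signal is cut into overlapping frames and the magnitude spectrum of each frame becomes one column of the image. The sketch below is a minimal, unoptimized version using only the standard library. The frame size, hop size, Hann window, and test tone are assumptions chosen for the example, not settings taken from this tool.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size, hop):
    """Return one magnitude spectrum per frame (outer index = time, inner = frequency)."""
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size))
            for i, s in enumerate(frame)
        ]
        columns.append(dft_magnitudes(windowed))
    return columns


# Hypothetical input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(samples, frame_size=64, hop=32)

# The strongest bin of the first frame should sit at the tone's frequency.
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(loudest_bin * sample_rate / 64)  # 1000.0
```

Plotting `spec` as an image, with color mapped to magnitude, would show a single horizontal line at 1000 Hz. Real implementations use a fast Fourier transform rather than this direct sum.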

A chromagram is a musical representation that displays how the intensity of different pitches (or notes) varies over time in an audio signal. It is often visualized as a heat map, with the X-axis representing time, the Y-axis representing the 12 different pitch classes (from C to B), and the color intensity representing the energy of each pitch class.

Chromagrams are particularly useful in music analysis, as they provide insights into the harmonic and melodic structures of a piece of music. They can help identify chords, key changes, and other musical features. In the context of music information retrieval, chromagrams can be used for tasks like chord recognition, key detection, and song identification.
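The folding of frequencies into 12 pitch classes can be sketched as follows: each frequency is mapped to its nearest equal-tempered note, and the octave is discarded. This is a simplified illustration that assumes A4 = 440 Hz tuning. The helper names `pitch_class` and `chroma_vector`, and the example frequencies and energies, are hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Index (0 = C ... 11 = B) of the equal-tempered note nearest to freq_hz."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))  # A4 is MIDI note 69
    return midi_note % 12


def chroma_vector(bin_freqs, bin_energies):
    """Sum spectral energy into 12 pitch-class bins, ignoring the octave."""
    chroma = [0.0] * 12
    for freq, energy in zip(bin_freqs, bin_energies):
        if freq > 0:  # skip the DC bin, which has no pitch
            chroma[pitch_class(freq)] += energy
    return chroma


# Hypothetical spectrum for one frame: C4, C5 and A4 with made-up energies.
chroma = chroma_vector([261.63, 523.25, 440.0], [1.0, 0.5, 2.0])
print(PITCH_CLASSES[pitch_class(440.0)])  # A
print(chroma[0], chroma[9])  # 1.5 2.0
```

C4 and C5 land in the same row because they share a pitch class, which is why a chromagram shows harmony rather than register. Computing one such vector per spectrogram frame gives the full chromagram.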

Mel-frequency cepstral coefficients (MFCCs) are spectral features widely used in speech and audio processing. They provide a compact representation of the spectral shape of a sound signal, computed on the mel frequency scale, which is intended to approximate human auditory perception. In music information retrieval, MFCCs are used for genre classification, instrument recognition, and similar tasks.
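The "mel" part of the name refers to a perceptual frequency scale. The sketch below shows one widely used Hz-to-mel conversion formula and how equal steps in mel translate into progressively wider steps in Hz. It covers only the frequency-warping step. A full MFCC computation typically also takes the logarithm of the band energies and applies a discrete cosine transform, which is omitted here. Other mel formulas exist, and nothing here is taken from this tool's implementation.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using a common logarithmic formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz, high_hz, n_bands):
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_bands + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_bands)]


# Hypothetical setup: 10 bands between 0 Hz and 8000 Hz.
centers = mel_band_centers(0.0, 8000.0, n_bands=10)
gaps = [b - a for a, b in zip(centers, centers[1:])]
print([round(c) for c in centers])
print([round(g) for g in gaps])  # spacing in Hz grows toward high frequencies
```

The bands are narrow at low frequencies and wide at high frequencies, reflecting the finer pitch resolution of hearing in the low range.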

The pitch contour of a sound signal shows how its pitch (fundamental frequency) changes over time. It is useful for analyzing music or speech, where pitch carries important information, for example in melody extraction, singer identification, or prosody analysis in speech.
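One simple way to estimate a pitch contour is to split the signal into frames and, for each frame, find the lag at which the frame best matches a shifted copy of itself (its autocorrelation peak). The sketch below does this with plain Python. The search range of 50 to 500 Hz, the frame size, and the test tones are assumptions for the example, and production pitch trackers are considerably more robust than this.

```python
import math


def estimate_pitch(frame, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate one frame's pitch in Hz from its autocorrelation peak (0.0 if none)."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def pitch_contour(samples, sample_rate, frame_size):
    """Pitch estimate for each consecutive, non-overlapping frame."""
    return [
        estimate_pitch(samples[start:start + frame_size], sample_rate)
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def tone(freq_hz, n_samples, sample_rate):
    """Hypothetical helper: a pure sine tone starting at phase zero."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]


# Hypothetical signal: 200 Hz, then 100 Hz, then silence.
sample_rate = 8000
samples = tone(200.0, 400, sample_rate) + tone(100.0, 400, sample_rate) + [0.0] * 400
contour = pitch_contour(samples, sample_rate, frame_size=400)
print(contour)  # [200.0, 100.0, 0.0]
```

Silent frames have no positive autocorrelation peak, so the sketch reports 0.0 for them. A real tool would usually leave a gap in the contour instead.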

Spectral contrast is a measure of the difference in amplitude between peaks and valleys in a sound spectrum. This feature can give you an idea of the richness of the sound, where a larger contrast might indicate a richer, more complex sound. It can be particularly useful for tasks like instrument recognition or music genre classification.
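The peak-versus-valley idea can be sketched for a single frequency band as follows: average the strongest and the weakest magnitudes in the band and express their ratio in decibels. The fraction of bins treated as peaks and valleys, the small `eps` guard, and the example spectra are assumptions for this illustration. Published definitions of spectral contrast differ in detail and usually repeat the computation over several sub-bands.

```python
import math


def band_contrast_db(magnitudes, fraction=0.2, eps=1e-10):
    """Peak-to-valley ratio of one band's magnitudes, in decibels."""
    ordered = sorted(magnitudes)
    k = max(1, int(len(ordered) * fraction))  # number of bins in each group
    valley = sum(ordered[:k]) / k   # mean of the weakest bins
    peak = sum(ordered[-k:]) / k    # mean of the strongest bins
    return 20.0 * math.log10((peak + eps) / (valley + eps))


# Hypothetical band with strong, isolated harmonics.
peaky = [0.01, 1.0, 0.01, 0.01, 1.0, 0.01, 0.01, 1.0, 0.01, 0.01]
# Hypothetical band with evenly spread, noise-like energy.
flat = [0.5] * 10

print(round(band_contrast_db(peaky), 1))  # 40.0
print(round(band_contrast_db(flat), 1))   # 0.0
```

Under this sketch, a band dominated by clear harmonic peaks scores high, while a noise-like band with evenly spread energy scores near zero.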

The zero crossing rate is the rate at which a signal changes sign, from positive to negative or back. It has been used extensively in both speech recognition and music information retrieval, where it is a key feature for classifying percussive sounds. A higher zero crossing rate may indicate a noisier signal or one with more high-frequency content.
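The definition above translates almost directly into code: count the adjacent sample pairs whose signs differ and divide by the number of pairs. The sketch below is a minimal version. Treating a sample of exactly zero as non-negative is a choice made for this example, and the test tones are hypothetical.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# Hypothetical tones sampled at 8000 Hz: one low-pitched, one high-pitched.
sample_rate = 8000
low = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(800)]
high = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(800)]

print(zero_crossing_rate([1, -1, 1, -1]))  # 1.0
print(zero_crossing_rate(low) < zero_crossing_rate(high))  # True
```

The high-frequency tone changes sign far more often per sample than the low-frequency one, which matches the interpretation given above.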