|-- DT_{N}
|   |-- Raw
|   |   |-- S{M}
|   |   |   |-- The Big Bang_S0{M}{I}.json
|   |-- test.json
|   |-- train.json
|   `-- val.json

Here, N is the number of dialogue turns in that sub-dataset, M is the season of the series (from 1 to 5), and I is the episode number within that season (01, 02, and so on).
The episode-level extracted dialogues are in the Raw folder. The dialogues split into the train, validation, and test sets are in train.json, val.json, and test.json, respectively.
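As a minimal sketch of how this layout can be navigated programmatically, the Python helpers below build paths from the naming templates above and load a split file. The function names (`split_path`, `raw_episode_path`, `load_split`) and the dataset root are hypothetical. The sketch also assumes that the season folder is literally named `S1` to `S5` and that the episode number is zero-padded to two digits, as the `01, 02` examples suggest. It makes no assumption about the JSON structure inside the files.

```python
import json
from pathlib import Path

VALID_SPLITS = ("train", "val", "test")


def split_path(root, n_turns, split):
    """Path to train.json / val.json / test.json of the DT_{N} sub-dataset."""
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return Path(root) / f"DT_{n_turns}" / f"{split}.json"


def raw_episode_path(root, n_turns, season, episode):
    """Path to one episode's raw dialogues, following the
    'The Big Bang_S0{M}{I}.json' template.

    Assumption: the season folder is 'S{M}' without padding and the
    episode number I is zero-padded to two digits (01, 02, ...).
    """
    if not 1 <= season <= 5:
        raise ValueError("season must be between 1 and 5")
    name = f"The Big Bang_S0{season}{episode:02d}.json"
    return Path(root) / f"DT_{n_turns}" / "Raw" / f"S{season}" / name


def load_split(root, n_turns, split):
    """Load one split file as parsed JSON (schema left to the caller)."""
    with open(split_path(root, n_turns, split), encoding="utf-8") as f:
        return json.load(f)
```

For example, `raw_episode_path("data", 5, 1, 3)` resolves to `data/DT_5/Raw/S1/The Big Bang_S0103.json` (with the platform's path separator).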
| A t-SNE plot made by randomly selecting 1500 images from each of the Humorous and Non-Humorous sets, each image being the last frame of a visual dialog turn. Visual models could sometimes cheat by detecting patterns specific to humorous or non-humorous visual dialogs, such as particular camera angles. The above plot suggests that no such pattern is present. To make the plot easier to read, each image is also represented by a dot in the corresponding plot shown below (the current plot is slightly scaled up for visibility). | 
| A green dot represents a humorous sample and a red dot a non-humorous sample. The dots appear randomly distributed, suggesting the absence of any such bias. | 
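The bias check described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each last frame has already been converted into a fixed-length feature vector by some image encoder (the source does not say which), and it uses scikit-learn's `TSNE` for the 2-D embedding. The function names are hypothetical; the default `k=1500` and the green/red coloring follow the description above.

```python
import numpy as np
from sklearn.manifold import TSNE


def sample_rows(features, k, rng):
    """Randomly pick k rows (one last-frame feature vector each), without replacement."""
    idx = rng.choice(len(features), size=k, replace=False)
    return features[idx]


def tsne_humor_plot_data(humorous, non_humorous, k=1500, seed=0):
    """Embed k humorous and k non-humorous frame features into 2-D.

    humorous, non_humorous: 2-D NumPy arrays of shape (num_frames, feature_dim),
    assumed to come from some image encoder (hypothetical here).
    Returns (coords, colors): coords has shape (2 * k, 2); colors holds
    'green' for humorous samples followed by 'red' for non-humorous ones.
    """
    rng = np.random.default_rng(seed)
    hum = sample_rows(humorous, k, rng)
    non = sample_rows(non_humorous, k, rng)
    X = np.vstack([hum, non])
    colors = ["green"] * k + ["red"] * k
    coords = TSNE(n_components=2, init="pca", random_state=seed).fit_transform(X)
    return coords, colors
```

The returned coordinates and colors can be passed directly to a scatter plot, for example matplotlib's `plt.scatter(coords[:, 0], coords[:, 1], c=colors, s=4)`. If the two colors formed clearly separate clusters, a visual model could be exploiting such cues instead of recognizing humor.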
| Average time per turn in a dialog, across the dataset. | Average dialog duration, across the dataset. | Contribution of each speaker to generating humor, across the dataset. | 
| Text-based Fusion Model (TFM) | Video-based Fusion Model (VFM) | 
| Text-based Attention Model (TAM) | Video-based Attention Model (VAM) |