The project involves interpreting observed behavior in full motion video (FMV) and using vision-based techniques to recognize the actions and activities of the people in the videos. It also involves predicting activities from behavioral cues gleaned by observing an individual or a crowd in FMV. The behavior of a person or group is derived from features obtained by tracking object motion, among other cues, in the video. The features extracted from the image sequences form the image representation. After the motion streams are segmented, preferably into single-action instances, these features represent the various actions and activities and serve as the initial training sequences to be learnt by the developed models. This image representation supports a model that links behavioral cues to particular actions, activities, and emotions, and is used to train the vision-based learning machine. The characteristics of the developed model are compared against the gathered behavioral cues to recognize the action or activity that has taken place; the same model is then used to predict the next action or activity the computer vision system will observe.
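The pipeline described above (track motion, extract features, segment the stream into single-action instances, train a model, then recognize new clips) can be sketched in miniature. The sketch below is illustrative only, under simplifying assumptions not taken from the project: frames are stood in for by 1-D pixel arrays, the motion cue is plain frame differencing rather than real tracking, the action labels ("walking", "running") are hypothetical, and the learning machine is a simple nearest-centroid classifier rather than whatever model the project ultimately develops.

```python
import math
import random

random.seed(0)  # deterministic synthetic data for the demo

def motion_energy(prev, curr):
    """Mean absolute frame difference: a crude per-frame motion cue."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def features(frames, window=5):
    """Segment the motion stream into fixed-length windows (stand-ins for
    single-action instances) and summarise each by mean and peak energy."""
    energies = [motion_energy(frames[i - 1], frames[i])
                for i in range(1, len(frames))]
    feats = []
    for start in range(0, len(energies) - window + 1, window):
        seg = energies[start:start + window]
        feats.append((sum(seg) / window, max(seg)))
    return feats

def synth_frames(n, amplitude):
    """Hypothetical stand-in for FMV frames: 1-D 'images' whose per-frame
    pixel change scales with the activity's motion amplitude."""
    frames = [[0.0] * 16]
    for _ in range(n - 1):
        frames.append([p + random.uniform(-amplitude, amplitude)
                       for p in frames[-1]])
    return frames

# "Training": one feature centroid per labelled action sequence.
train = {"walking": synth_frames(26, 0.2), "running": synth_frames(26, 1.0)}
centroids = {}
for label, frames in train.items():
    fs = features(frames)
    centroids[label] = tuple(sum(v) / len(fs) for v in zip(*fs))

def recognize(frames):
    """Assign the action whose training centroid is nearest in feature space."""
    fs = features(frames)
    f = tuple(sum(v) / len(fs) for v in zip(*fs))
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

print(recognize(synth_frames(26, 0.9)))  # classify a high-motion test clip
```

In a real system the frame-differencing feature would be replaced by tracked trajectories or optical flow, and the centroid comparison by the trained learning machine, but the train/segment/compare structure mirrors the pipeline in the text.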