The goal of computer vision is the automatic labeling of images that contain multiple objects as well as noise and clutter. Recent work has focused on two main tasks. The first is classification among object classes in segmented images containing only one object, and the second is detection of a particular object class in a large image. Both tasks have been addressed primarily with discriminative learning. It is not clear, however, how these methods can be extended to recognize multiple object classes in images containing a number of objects in a wide range of configurations. I will present an approach that starts from simple statistical models for individual objects. With these models, the important notion of invariance can be clearly formulated. Furthermore, the individual object models can be composed to define models for object configurations. Decisions are likelihood-based and do not depend on pretrained decision boundaries. The model formulation also leads to a coarse-to-fine strategy for efficient computation of the optimal scene annotation. These ideas will be illustrated in several applications: reading handwritten zip codes, detecting faces, and tracking vesicles in video microscopy.