Patent Issued for Systems And Methods Regarding 2D Image And 3D Image Ensemble Prediction Models (USPTO 10,262,226)
2019 APR 29 (NewsRx) -- By a News Reporter-Staff News Editor -- United States Patent number 10,262,226 has been issued for systems and methods regarding 2D image and 3D image ensemble prediction models.
The following quote was obtained by the news editors from the background information supplied by the inventors: “Images and video taken from modern digital camera and video recording devices can be generated and stored in a variety of different formats and types. For example, digital cameras may capture two dimensional (2D) images and store them in a vast array of data formats, including, for example, JPEG (Joint Photographic Experts Group).
“These 2D formats are typically based on rasterized image data captured by the camera or recording device where the rasterized image data is typically generated and stored to produce a rectangular grid of pixels, or points of color, viewable via a computer screen, paper, or other display medium. Other 2D formats may also be based on, for example, vector graphics. Vector graphics may use polygons, control points or nodes to produce images on a computer screen, for example, where the points and nodes can define a position on x and y axes of a display screen. The images may be produced by drawing curves or paths from the positions and assigning various attributes, including such values as stroke color, shape, curve, thickness, and fill.
“Other file formats can store 3D data. For example, the PLY (Polygon File Format) can store data including a description of a 3D object as a list of nominally flat polygons, with related points or coordinates in 3D space, along with a variety of properties, including color and transparency, surface normals, texture coordinates, and data confidence values. A PLY file can include a large number of points to describe a 3D object. A complex 3D object can require thousands or tens of thousands of 3D points in a PLY file to describe the object.
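For illustration, a minimal ASCII PLY file describing a single colored triangle might look like the following. This toy example is the editors' sketch of the format's structure; it is not taken from the patent.

    ply
    format ascii 1.0
    element vertex 3
    property float x
    property float y
    property float z
    property uchar red
    property uchar green
    property uchar blue
    element face 1
    property list uchar int vertex_indices
    end_header
    0.0 0.0 0.0 255 0 0
    1.0 0.0 0.0 0 255 0
    0.0 1.0 0.0 0 0 255
    3 0 1 2

The header declares three vertices carrying position and color properties, plus one face listing its vertex indices; a real scanned object would declare thousands of such vertices.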
“A problem exists with the number of different file formats and image types. Specifically, while the use, functionality, and underlying data structures of the various image and video formats are typically transparent to a common consumer, the differences in the compatibility of the various formats and types create a problem for computer systems or other electronic devices that need to analyze or otherwise coordinate the various differences among the competing formats and types for specific applications. This issue is exacerbated because different manufacturers of camera and/or video devices use different types or formats of image and video files. This combination of available file formats and types, together with various manufacturers’ decisions to use differing file formats and types, creates a vast set of disparate image and video files and data that are incompatible and difficult to interoperate for specific applications.”
In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “Accordingly, there is a need for systems and methods to provide compatibility, uniformity, and interoperability among the various image file formats and types. For example, certain embodiments disclosed herein address issues that derive from the complexity and/or size of the data formats themselves. For example, a 3D file, such as a PLY file, can have tens of thousands of 3D points to describe a 3D image. Such a fine level of granularity may not be necessary to analyze the 3D image to determine, for example, items of interest within the 3D image, such as, for example, human features or behaviors identifiable in the 3D image.
“Moreover, certain embodiments herein further address that each 3D file, even files using the same format, e.g., a PLY file, can include sequences of 3D data points in different, unstructured orders, such that the sequencing of 3D points of one 3D file can be different from the sequencing of 3D points of another file. This unstructured nature can create an issue when analyzing 3D images, especially when analyzing a series of 3D images, for example, from frames of a 3D movie, because there is no uniform structure to comparatively analyze the 3D images against.
“For the foregoing reasons, systems and methods are disclosed herein for ‘Distification’ of 3D imagery. As further described herein, Distification can provide an improvement in the accuracy of predictive models, such as the prediction models disclosed herein, over known normalization methods. For example, the use of Distification on 3D image data can improve the predictive accuracy, classification ability, and operation of a predictive model, even when used in known or existing predictive models, neural networks or other predictive systems and methods.
“As described herein, a computing device may provide 3D image Distification by first obtaining a three dimensional (3D) image that includes rules defining a 3D point cloud. The computing device may then generate a two dimensional (2D) image matrix based upon the 3D image. The 2D image matrix may include 2D matrix point(s) mapped to the 3D image. Each 2D matrix point can be associated with a horizontal coordinate and a vertical coordinate. The computing device can generate an output feature vector that includes, for at least one of the 2D matrix points, the horizontal coordinate and the vertical coordinate of the 2D matrix point, and a depth coordinate of a 3D point in the 3D point cloud of the 3D image. The 3D point can have a nearest horizontal and vertical coordinate pair that corresponds to the horizontal and vertical coordinates of the at least one 2D matrix point.
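The patent text publishes no reference code, but the mapping it describes can be sketched compactly. The Python sketch below is the editors' illustration under stated assumptions: the point cloud arrives as an N x 3 array of (x, y, z) values, the 2D matrix is a fixed 32 x 32 grid spanning the cloud's horizontal and vertical extent, and the function name distify is hypothetical. The sketch also records the 2D-to-3D distance value, and uses fewer grid cells than 3D points, both described two paragraphs below.

    import numpy as np

    def distify(points, grid_w=32, grid_h=32):
        # For each cell of a fixed grid_w x grid_h matrix, find the 3D point
        # whose (x, y) pair lies nearest the cell's horizontal/vertical
        # coordinates, then emit those coordinates, the point's depth (z),
        # and the 2D-matrix-point-to-3D-point distance.
        xy = points[:, :2]
        xs = np.linspace(xy[:, 0].min(), xy[:, 0].max(), grid_w)
        ys = np.linspace(xy[:, 1].min(), xy[:, 1].max(), grid_h)
        features = []
        for v in ys:
            for u in xs:
                d2 = (xy[:, 0] - u) ** 2 + (xy[:, 1] - v) ** 2
                i = int(np.argmin(d2))        # nearest 3D point in x/y
                features.extend([u, v, points[i, 2], float(np.sqrt(d2[i]))])
        return np.asarray(features)           # the output feature vector

    # A 32 x 32 grid (1,024 cells, 4 values each) summarizes a cloud of
    # 20,000 points, regardless of the order the points were stored in.
    cloud = np.random.rand(20000, 3)
    vec = distify(cloud)                      # shape: (4096,)

Because every cloud is projected onto the same fixed grid, two Distified images become directly comparable even when their source files listed points in different, unstructured orders.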
“In some embodiments, the output feature vector may indicate one or more image feature values associated with the 3D point. The feature values can define one or more items of interest in the 3D image. The items of interest in the 3D image can include, for example, a person’s head, a person’s facial features, a person’s hand, or a person’s leg. In some aspects, the output feature vector is input into a predictive model for making predictions with respect to the items of interest.
“In some embodiments, the output feature vector can further include a distance value generated based on the distance from the at least one 2D matrix point to the 3D point. In other embodiments, a total quantity of the 2D matrix points mapped to the 3D image can be less (i.e., creating a coarser granularity) than a total quantity of horizontal and vertical coordinate pairs for all 3D points in the 3D point cloud of the 3D image.
“In other embodiments, the 3D imagery, and rules defining the 3D point cloud, are obtained from one or more respective PLY files or PCD files. The 3D imagery may be a frame from a 3D movie. The 3D images may be obtained from various computing devices, including, for example, any of a camera computing device, a sensor computing device, a scanner computing device, a smart phone computing device, or a tablet computing device.
“In other embodiments, Distification can be executed in parallel such that the computing device, or various networked computing devices, can Distify multiple 3D images at the same time.
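Since each image is Distified independently of every other, the step parallelizes naturally. A minimal sketch using Python's standard multiprocessing pool, reusing the hypothetical distify function above:

    from multiprocessing import Pool

    def distify_all(clouds, workers=4):
        # Each 3D point cloud is independent, so several images can be
        # Distified at the same time across worker processes.
        with Pool(processes=workers) as pool:
            return pool.map(distify, clouds)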
“Distification can be performed as a preprocessing technique for a variety of applications, for example, for use with 3D predictive models. To that end, systems and methods are disclosed herein for generating an image-based prediction model. As described, a computing device may obtain a set of one or more 3D images from a 3D image data source, where each of the 3D images is associated with 3D point cloud data. In some embodiments, the 3D image data source is a remote computing device (but it can also be collocated). The Distification process can be applied to the 3D point cloud data of each 3D image to generate output feature vector(s) associated with the 3D images. A prediction model may then be generated by training a model with the output feature vectors. For example, in certain embodiments, the prediction model may be trained using a neural network, such as a convolutional neural network.
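The patent names convolutional neural networks as one option but fixes no architecture. The PyTorch sketch below is one plausible arrangement, assuming each output feature vector from the hypothetical distify above is reshaped into a four-channel 32 x 32 grid; the layer sizes and behavior labels are illustrative, not the patent's.

    import torch
    import torch.nn as nn

    BEHAVIORS = ["calling", "texting", "eating", "drinking", "radio", "backseat"]

    class DistifiedNet(nn.Module):
        # Small CNN over Distified feature vectors laid out as a
        # 4-channel (u, v, depth, distance) 32 x 32 grid.
        def __init__(self, n_classes=len(BEHAVIORS)):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Flatten(),
                nn.Linear(32 * 8 * 8, n_classes),   # raw class scores
            )

        def forward(self, x):
            return self.net(x)

    # Toy stand-in for one batch of output feature vectors and labels.
    vecs = torch.randn(16, 32 * 32 * 4)
    labels = torch.randint(0, len(BEHAVIORS), (16,))

    model = DistifiedNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = vecs.view(-1, 32, 32, 4).permute(0, 3, 1, 2)  # cell-interleaved -> channels-first
    loss = nn.CrossEntropyLoss()(model(x), labels)
    loss.backward()
    opt.step()

Training would repeat the last four lines over many such batches, which is the batching arrangement the next paragraph describes.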
“In some embodiments, training the prediction model can include using one or more batches of output feature vectors, where batches of the output feature vectors correspond to one or more subsets of 3D images from originally obtained 3D images.
“In certain embodiments, the 3D images used to generate the prediction model may depict driver behaviors. The driver behaviors can include, for example, driver gestures such as: left hand calling, right hand calling, left hand texting, right hand texting, eating, drinking, adjusting the radio, or reaching for the backseat. The prediction model may determine a driver behavior classification and corresponding probability value for a 3D image, where the probability value can indicate the probability that the 3D image is associated with a driver behavior classification, e.g., ‘eating.’ The 3D image may then be associated with the driver behavior classification, such that the 3D image is said to identify or otherwise indicate the driver behavior for the driver.
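Continuing the sketch above, a classification and its probability value fall out of a softmax over the model's class scores; the 'eating' figure in the comment is illustrative only.

    # Classify one Distified 3D image with the model sketched above.
    vec = torch.randn(32 * 32 * 4)        # output feature vector for one image
    with torch.no_grad():
        scores = model(vec.view(1, 32, 32, 4).permute(0, 3, 1, 2))
        probs = torch.softmax(scores, dim=1)[0]
    k = int(probs.argmax())
    behavior, probability = BEHAVIORS[k], float(probs[k])
    # e.g., ('eating', 0.62): the 3D image is then associated with 'eating'.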
“In some embodiments, the driver behavior classification and the probability value can be transmitted to a different computing device, such as a remote computing device or a local, but separate computing device.
“Distification can also be used for interoperating 3D imagery with 2D imagery. For example, the differing file formats and types are especially problematic when comparing or attempting to interoperate 3D and 2D image types, which typically have vastly different file formats tailored to 3D and 2D imagery, respectively. For example, a 2D JPEG image uses a rasterized grid of pixels to form an image. 2D images are typically concerned with data compression (for file size purposes), color, and relative positioning (with respect to the other pixels) within the rasterized grid forming the image, and are typically not concerned with where the pixels or points of the 2D image lie within, for example, some larger space outside of the rasterized grid. 3D images, on the other hand, depend on 3D coordinates and positioning in 3D space in order to represent a 3D object built, for example, from numerous polygon shapes that each have their own vertices (e.g., x, y and z coordinate positions) that define the position of the polygons, and, ultimately, the object itself in 3D space. Other attributes of a 3D file format may be concerned with color, shape, texture, line size, etc., but such attributes are typically indicated in a 3D file in a completely different format from 2D file formats to accommodate the rendering of images in 3D space versus 2D rasterization.
“For the foregoing reasons, systems and methods are disclosed herein for generating an enhanced prediction from a 2D and 3D image-based ensemble model. As described herein, a computing device may be configured to obtain one or more sets of 2D and 3D images. Each of the 2D and 3D images may be standardized to allow for comparison and interoperability between the images. In one embodiment, the 3D images are standardized using Distification. In addition, corresponding 2D and 3D image pairs (i.e., a ‘2D3D image pair’) may be determined from the standardized 2D and 3D images where, for example, the 2D and 3D images correspond based on a common attribute, such as a similar timestamp or time value. The enhanced prediction may utilize separate underlying 2D and 3D prediction models, where, for example, the corresponding 2D and 3D images of a 2D3D pair are each input to the respective 2D and 3D prediction models to generate respective 2D and 3D predict actions.
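One simple reading of the pairing step, sketched by the editors under the assumption that each image record carries a time value in seconds and that 'similar' means nearest within a tolerance (the 0.05 s tolerance is hypothetical):

    def pair_2d3d(images_2d, images_3d, tol=0.05):
        # images_2d / images_3d: lists of (timestamp, image) records.
        # Pair each 2D image with the 3D image whose timestamp is nearest,
        # keeping only pairs whose time values are similar (within tol).
        pairs = []
        for t2, img2 in images_2d:
            t3, img3 = min(images_3d, key=lambda rec: abs(rec[0] - t2))
            if abs(t3 - t2) <= tol:
                pairs.append((img2, img3))
        return pairs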
“The predict actions can include classifications and related probability values for those classifications for each of the 2D and 3D images. For example, the 2D prediction model may generate a 20% value for a ‘texting’ class for a given 2D image and the 3D prediction model may generate a 50% value for the same ‘texting’ class for a given 3D image, such as a 3D image paired with the 2D image in the 2D3D image pair. The ensemble model may then generate an enhanced prediction for the 2D3D image pair, where the enhanced prediction can determine an overall 2D3D image pair classification for the 2D3D image pair based upon the 2D and 3D predict actions. Thus, for example, the 2D3D image pair may indicate that the driver was ‘texting.’ In some embodiments, the enhanced prediction determines the 2D3D image pair classification by summing one or more probability values associated with the 2D predict actions and the 3D predict actions to determine a maximum summed probability value, wherein the maximum summed probability value is determined from the sums of one or more classification probability values associated with each of the 2D predict actions and the 3D predict actions. Thus, for the example above, the 20% probability value and the 50% probability value from the 2D and 3D models, respectively, could be summed to compute an overall 70% value. If the 70% summed value were the maximum value when compared to other classifications, e.g., ‘eating,’ then the classification (e.g., ‘texting’) associated with the maximum summed probability can be identified as the 2D3D image pair classification for the 2D3D image pair.
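Using the numbers from that example, the summation scheme reduces to a few lines; the class dictionaries below are illustrative stand-ins for the two models' outputs.

    def ensemble_classify(probs_2d, probs_3d):
        # Sum per-class probability values from the 2D and 3D predict
        # actions and return the class with the maximum summed value.
        classes = set(probs_2d) | set(probs_3d)
        summed = {c: probs_2d.get(c, 0.0) + probs_3d.get(c, 0.0) for c in classes}
        best = max(summed, key=summed.get)
        return best, summed[best]

    probs_2d = {"texting": 0.20, "eating": 0.15}
    probs_3d = {"texting": 0.50, "eating": 0.10}
    print(ensemble_classify(probs_2d, probs_3d))   # ('texting', 0.70)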
“In some embodiments, the 2D and 3D images input into the ensemble model are sets of images defining a ‘chunk’ of images sharing a common timeframe, such as 2D and 3D images taken at the same time for a movie. In some embodiments, a chunk classification can be determined for the common timeframe, where the chunk classification is based on one or more 2D3D image pair classifications of the 2D3D image pairs that make up the chunk.
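The patent leaves the aggregation open ('based on one or more 2D3D image pair classifications'), so a majority vote over the pair classifications within the chunk's timeframe is one plausible, editor-supplied reading:

    from collections import Counter

    def chunk_classification(pair_classes):
        # Majority vote over the 2D3D image pair classifications that
        # fall within the chunk's common timeframe (assumed aggregation).
        return Counter(pair_classes).most_common(1)[0][0]

    print(chunk_classification(["texting", "texting", "eating"]))  # texting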
“In other embodiments, the ensemble model can generate a confusion matrix that includes one or more 2D3D image pair classifications. The confusion matrix can be used for further analysis or review of the ensemble model, for example, to compare the accuracy of the model with other prediction models.
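A confusion matrix over 2D3D image pair classifications can be tallied directly; this short sketch assumes labeled evaluation data is available, with rows as true classes and columns as predicted classes.

    import numpy as np

    def confusion_matrix(true_labels, pred_labels, classes):
        # m[i][j] counts pairs whose true class is classes[i] and whose
        # predicted 2D3D image pair classification is classes[j].
        idx = {c: i for i, c in enumerate(classes)}
        m = np.zeros((len(classes), len(classes)), dtype=int)
        for t, p in zip(true_labels, pred_labels):
            m[idx[t], idx[p]] += 1
        return m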
“In some embodiments, the ensemble model may be used to generate a data structure series that can indicate one or more driver behaviors as determined from one or more 2D3D image pair classifications. The driver behaviors can be used to determine or develop a risk factor for a given driver. As mentioned herein, the driver behaviors can include any of left hand calling, right hand calling, left hand texting, right hand texting, eating, drinking, adjusting the radio, or reaching for the backseat.
“Advantages will become more apparent to those of ordinary skill in the art from the following description of the preferred embodiments which have been shown and described by way of illustration. As will be realized, the present embodiments may be capable of other and different embodiments, and their details are capable of modification in various respects. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.”
The claims supplied by the inventors are:
“What is claimed is:
“1. A multi-dimensioning computing device configured to generate an enhanced prediction from a 2D and 3D image-based ensemble model, the multi-dimensioning computing device comprising: an image processor; a memory communicatively coupled to the image processor; an ensemble component implemented on the memory and executing on the image processor to: obtain one or more sets of two dimensional (2D) images; obtain one or more sets of three dimensional (3D) images; generate one or more sets of standardized 2D images from the one or more sets of 2D images, wherein each standardized 2D image represents a 2D image in the one or more sets of 2D images; generate one or more sets of standardized 3D images from the one or more sets of 3D images, wherein each standardized 3D image represents a 3D image in the one or more sets of 3D images; determine at least one 2D3D image pair, the at least one 2D3D image pair determined from a paired 2D image in the one or more sets of standardized 2D images that corresponds to a paired 3D image in the one or more sets of standardized 3D images, the 2D3D image pair configured in the memory; execute a 2D prediction model using the paired 2D image, wherein the 2D prediction model determines one or more 2D predict actions based on the paired 2D image; execute a 3D prediction model using the paired 3D image, wherein the 3D prediction model determines one or more 3D predict actions based on the paired 3D image; and generate an enhanced prediction for the at least one 2D3D image pair, wherein the enhanced prediction determines a 2D3D image pair classification for the at least one 2D3D image pair, the 2D3D image pair classification based upon the one or more 2D predict actions and the one or more 3D predict actions, and wherein the image processor classifies the one or more sets of 2D images or the one or more sets of 3D images with the 2D3D image pair classification to determine a real-world action.
“2. The multi-dimensioning computing device of claim 1, wherein the one or more sets of standardized 3D images are standardized using Distification.
“3. The multi-dimensioning computing device of claim 1, wherein the paired 2D image is associated with a first timestamp and the paired 3D image is associated with a second timestamp, wherein the first timestamp and the second timestamp each have a similar time value.
“4. The multi-dimensioning computing device of claim 1, wherein the enhanced prediction determines the 2D3D image pair classification by summing one or more probability values associated with the 2D predict actions and the 3D predict actions to identify a maximum summed probability value, wherein the maximum summed probability value is determined from one or more sums of one or more classification probability values associated with each of the 2D predict actions and the 3D predict actions.
“5. The multi-dimensioning computing device of claim 1, further configured to generate a confusion matrix, wherein the confusion matrix includes one or more 2D3D image pair classifications.
“6. A multi-dimensioning method of generating an enhanced prediction from a 2D and 3D image-based ensemble model, the multi-dimensioning method comprising: obtaining, via an image processor, one or more sets of two dimensional (2D) images; obtaining, via the image processor, one or more sets of three dimensional (3D) images; generating, with the image processor, one or more sets of standardized 2D images from the one or more sets of 2D images, wherein each standardized 2D image represents a 2D image in the one or more sets of 2D images; generating, with the image processor, one or more sets of standardized 3D images from the one or more sets of 3D images, wherein each standardized 3D image represents a 3D image in the one or more sets of 3D images; determining, with the image processor, at least one 2D3D image pair, the at least one 2D3D image pair determined from a paired 2D image in the one or more sets of standardized 2D images that corresponds to a paired 3D image in the one or more sets of standardized 3D images, the 2D3D image pair configured in a memory communicatively coupled to the image processor; executing a 2D prediction model using the paired 2D image, wherein the 2D prediction model determines one or more 2D predict actions based on the paired 2D image; executing a 3D prediction model using the paired 3D image, wherein the 3D prediction model determines one or more 3D predict actions based on the paired 3D image; and generating an enhanced prediction for the at least one 2D3D image pair, wherein the enhanced prediction determines a 2D3D image pair classification for the at least one 2D3D image pair, the 2D3D image pair classification based upon the one or more 2D predict actions and the one or more 3D predict actions, and wherein the image processor classifies the one or more sets of 2D images or the one or more sets of 3D images with the 2D3D image pair classification to determine a real-world action.
“7. The multi-dimensioning computing device of claim 1, wherein at least one of the sets of 2D images is a 2D image chunk and at least one of the sets of 3D images is a 3D image chunk, wherein the 2D image chunk and the 3D image chunk share a common timeframe.
“8. The multi-dimensioning computing device of claim 7, wherein a chunk classification is determined for the common timeframe of the 2D image chunk and the 3D image chunk, wherein the chunk classification is based on one or more 2D3D image pair classifications.
“9. The multi-dimensioning computing device of claim 1, further configured to generate a data structure series, the data structure series indicating one or more driver behaviors, the one or more driver behaviors determined from one or more 2D3D image pair classifications.
“10. The multi-dimensioning computing device of claim 9, further configured to determine a risk factor for a driver, wherein the driver is associated with the data structure series indicating the one or more driver behaviors.
“11. The multi-dimensioning computing device of claim 9, wherein the one or more driver behaviors include any one or more of the following: left hand calling, right hand calling, left hand texting, right hand texting, eating, drinking, adjusting a radio, or reaching for a backseat of a vehicle.
“12. The multi-dimensioning method of claim 6, wherein the enhanced prediction determines the 2D3D image pair classification by summing one or more probability values associated with the 2D predict actions and the 3D predict actions to identify a maximum summed probability value, wherein the maximum summed probability value is determined from one or more sums of one or more classification probability values associated with each of the 2D predict actions and the 3D predict actions.
“13. The multi-dimensioning method of claim 6, wherein the one or more sets of standardized 3D images are standardized using Distification.
“14. The multi-dimensioning method of claim 6, wherein the paired 2D image is associated with a first timestamp and the paired 3D image is associated with a second timestamp, wherein the first timestamp and the second timestamp each have a similar time value.
“15. The multi-dimensioning method of claim 6, further comprising generating a confusion matrix, wherein the confusion matrix includes one or more 2D3D image pair classifications.
“16. The multi-dimensioning method of claim 6, wherein at least one of the sets of 2D images is a 2D image chunk and at least one of the sets of 3D images is a 3D image chunk, wherein the 2D image chunk and the 3D image chunk share a common timeframe.
“17. The multi-dimensioning method of claim 16, wherein a chunk classification is determined for the common timeframe of the 2D image chunk and the 3D image chunk, wherein the chunk classification is based on one or more 2D3D image pair classifications.
“18. The multi-dimensioning method of claim 6, further comprising generating a data structure series, the data structure series indicating one or more driver behaviors, the one or more driver behaviors determined from one or more 2D3D image pair classifications.
“19. The multi-dimensioning method of claim 18, further comprising determining a risk factor for a driver, wherein the driver is associated with the data structure series indicating the one or more driver behaviors.
“20. The multi-dimensioning method of claim 18, wherein the one or more driver behaviors include any one or more of the following: left hand calling, right hand calling, left hand texting, right hand texting, eating, drinking, adjusting a radio, or reaching for a backseat of a vehicle.”
For the URL and more information on this patent, see: Flowers, et al. Systems And Methods Regarding 2D Image And 3D Image Ensemble Prediction Models. U.S. Patent Number 10,262,226.
(Our reports deliver fact-based news of research and discoveries from around the world.)


