Goodbye, patches: TAPe + ML architecture changes the rules of computer vision

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

Modern neural networks for computer vision deliver striking results, but developing and training them demands massive computational resources: enormous datasets, complex architectures, thousands of GPUs, and weeks or even months of continuous training. Such is the price of progress. Meanwhile, a significant share of those resources goes to destroying the original structure of the data, by cutting images into arbitrary fragments (patches), and then trying to recover that structure from the resulting "chaos." The new T+ML (TAPe + ML) architecture proposes a radically different approach based on the theory of active perception (TAPe), which promises to make training AI systems significantly faster and more cost-effective.

Context: The standard deep-learning approach to computer vision treats an image as a set of pixels or small, arbitrarily chosen patches. Convolutional neural networks (CNNs) and transformers, for all their success, work on exactly this principle: CNNs apply filters layer by layer to extract features from local regions, while transformers split the image into patches and use attention mechanisms to relate those patches to one another.
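To make the patch-based pipeline concrete, here is a minimal sketch of how a transformer-style model might cut an image into a fixed grid of patches. It assumes NumPy, a 224×224 RGB input, and 16×16 patches. These sizes and the helper name `split_into_patches` are illustrative choices, not details from the source.

```python
import numpy as np


def split_into_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C) -> (rows * cols, p * p * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Assumed example input: a synthetic 224x224 RGB image.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = split_into_patches(image, patch_size=16)
num_patches, patch_dim = patches.shape

# Full self-attention scores every patch against every patch (itself included).
attention_pairs = num_patches ** 2
```

With these assumed sizes the image becomes 196 patches of 768 values each, and an attention layer that relates every patch to every other scores 196 × 196 = 38,416 pairs. The grid is fixed in advance and ignores where objects begin or end, which is the "arbitrary" quality the article refers to.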

Both methods essentially try to "assemble" an understanding of the image from fragmented parts. TAPe proposes changing the paradigm itself: instead of working with "raw" data, the system operates on structured "building blocks" whose connections are established in advance. This idea, the foundation of the theory of active perception, lets the model recognize an object's structure directly instead of reconstructing it from chaotic data.

T+ML is the implementation of this theory, combining it with the power of machine learning.

Deep Dive: The T+ML architecture differs fundamentally from traditional CNNs and transformers. Instead of dividing an image into identically sized, often unrelated patches, T+ML uses TAPe elements: higher-level, semantically meaningful "building blocks" with a known internal structure and predefined connections between them.

For example, instead of considering the individual pixels or small groups of pixels that make up part of a car wheel, T+ML can operate on a complete "wheel block," with its shape, function, and typical location on a car already known. Machine learning (ML) is then used to teach the model how to use these structured blocks effectively and how to establish complex dependencies between them for a specific task. This approach lets the model form a holistic understanding of an object much faster, bypassing the stage of "assembly" from small details.
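The source does not describe how TAPe elements are actually represented, so the following is only a hypothetical sketch of the general idea: blocks that carry a name and a set of predefined relations to other blocks, fixed before any learning happens. The `Block` class, the relation labels, and the car example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical structured element: a named part with known relations."""
    name: str
    # Relation label -> names of connected blocks (predefined, not learned).
    relations: dict[str, list[str]] = field(default_factory=dict)


def build_car_blocks() -> dict[str, Block]:
    """Return a toy set of blocks for a car (illustrative, not from the source)."""
    return {
        "car": Block("car", {"has_part": ["body", "wheel"]}),
        "body": Block("body", {"part_of": ["car"], "above": ["wheel"]}),
        "wheel": Block("wheel", {"part_of": ["car"], "below": ["body"]}),
    }


def related(blocks: dict[str, Block], source: str, relation: str) -> list[str]:
    """Look up the blocks connected to `source` by `relation`."""
    return blocks[source].relations.get(relation, [])


blocks = build_car_blocks()
wheel_parents = related(blocks, "wheel", "part_of")
```

In a representation like this, the fact that a wheel belongs to a car is available as a lookup. A patch-based model would have to infer that relationship from training data.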

Implications: Initial tests and the underlying theory point to significant advantages for the T+ML architecture. A lower computational load would make model training substantially faster and feasible on less expensive hardware. This opens the door to wider use of advanced computer vision in resource-constrained settings such as mobile devices, embedded systems, and wearable electronics. More efficient use of data and compute could also contribute to more robust and energy-efficient AI systems, an important step toward "green" artificial intelligence. The versatility of the T+ML architecture further suggests that it may apply to a wide range of computer vision tasks, from object recognition and image segmentation to video stream analysis and 3D reconstruction.

Conclusion: The T+ML architecture, based on the theory of active perception, represents a promising direction in the development of computer vision. Moving away from processing arbitrary patches in favor of structured "building blocks" promises to revolutionize the process of training neural networks, making it faster, more cost-effective, and more accessible. If these initial results are confirmed in larger-scale research, we may witness a true breakthrough that will allow AI to "see" the world more meaningfully and efficiently than ever before.
