TWC #18

State-of-the-art papers with Github avatars of researchers who released code, models (in most cases), and demo apps (in a few cases) along with their paper. Image created from the papers described below

State-of-the-art (SOTA) updates for 28 Nov – 4 Dec 2022.

This weekly newsletter highlights the work of researchers who produced state-of-the-art results, breaking existing records on benchmarks. They also

  • authored their paper
  • released their code
  • released models in most cases
  • released notebooks/apps in a few cases

Nearly half of the released source code licenses allow commercial use with just attribution. Machine-learning-powered companies owe their existence at least in part to the work of these researchers. Please consider supporting open research by starring/sponsoring them on Github.

New records were set on the following tasks (in order of papers)

  • Image Super-Resolution
  • Anomaly detection
  • Semantic Segmentation
  • Anomaly detection
  • 3D Point Cloud Registration (3D point cloud registration is like taking multiple pictures of an object from different angles and then piecing them together to create a single, comprehensive image - a layman's explanation generated by ChatGPT that is clearer than most definitions found on the web).

This weekly is a consolidation of daily Twitter posts tracking SOTA researchers. Daily SOTA updates are also posted on @twc@sigmoid.social - "a twitter alternative by and for the AI community".

To date, 27.7% (92,934) of all published papers (335,599) have code released along with them (source).

The SOTA details below are snapshots of the SOTA models at the time of publishing this newsletter. The details at the links provided below the snapshots will most likely differ from these snapshots over time as new SOTA models emerge.



#1 in Image Super-Resolution on 5 datasets

Paper: Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation
SOTA details - 1 of 2
SOTA details - 2 of 2
Github code with pretrained models released by Seung Ho Park (first author in paper)

Model Name:  SROOE

Notes:  Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. This paper proposes a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. The generative model is trained over the proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory.
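
The released repository implements the full SROOE training pipeline; purely as a rough illustration of the two-model inference flow described above (a predictive model infers an objective map for the LR input, and a generative model conditioned on that map produces the SR output), here is a minimal PyTorch-style sketch. The module names and layer choices are placeholders, not the architectures used in the paper.

```python
# Minimal sketch of a two-model SROOE-style pipeline (hypothetical module
# names and layers; the released code defines its own architectures).
import torch
import torch.nn as nn

class ObjectiveMapPredictor(nn.Module):
    """Predicts a per-pixel objective map from the low-resolution input."""
    def __init__(self, in_ch=3, map_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, map_ch, 3, padding=1), nn.Sigmoid(),  # map values in [0, 1]
        )
    def forward(self, lr):
        return self.net(lr)

class ConditionalSRGenerator(nn.Module):
    """Upscales the LR image, conditioned on the target objective map."""
    def __init__(self, in_ch=3, map_ch=1, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + map_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearrange channels into a x4 upscaled image
        )
    def forward(self, lr, obj_map):
        return self.body(torch.cat([lr, obj_map], dim=1))

lr = torch.rand(1, 3, 48, 48)      # dummy low-resolution input
predictor = ObjectiveMapPredictor()
generator = ConditionalSRGenerator()
sr = generator(lr, predictor(lr))  # inference: the predicted map drives the SR output
print(sr.shape)                    # torch.Size([1, 3, 192, 192])
```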

Demo page: No demo page yet. However, the Github page has several examples like the one below

License:  Apache-2.0 license


#1 in Anomaly detection  on 2 datasets

Paper: Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
SOTA details
Github code with pretrained models released by Yuyuan Liu (first author in paper)

Model Name:  RPL+CoroCL

Notes:  Semantic segmentation models classify pixels into a set of known ("in-distribution") visual classes. When deployed in an open world, the reliability of these models depends on their ability not only to classify in-distribution pixels but also to detect out-of-distribution (OoD) pixels. Historically, the poor OoD detection performance of these models has motivated the design of methods based on model re-training using synthetic training images that include OoD visual objects. Although successful, these re-trained methods have two issues: 1) their in-distribution segmentation accuracy may drop during re-training, and 2) their OoD detection accuracy does not generalise well to new contexts (e.g., country surroundings) outside the training set (e.g., city surroundings). This paper mitigates these issues with: (i) a new residual pattern learning (RPL) module that assists the segmentation model to detect OoD pixels without affecting the inlier segmentation performance; and (ii) a novel context-robust contrastive learning (CoroCL) that enforces RPL to robustly detect OoD pixels among various contexts.
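
As a hedged sketch of the residual-branch idea (a frozen segmentation network keeps its inlier predictions, while a small external module supplies the signal used for per-pixel OoD scoring), the snippet below uses illustrative names and a simple max-logit-style score; it is not the authors' RPL/CoroCL implementation.

```python
# Illustrative residual OoD head: the frozen logits decide the inlier classes,
# and a residual correction is used only for the OoD score (assumed setup).
import torch
import torch.nn as nn

class ResidualOoDHead(nn.Module):
    def __init__(self, feat_ch=256, num_classes=19):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, num_classes, 1),
        )
    def forward(self, features, frozen_logits):
        # Inlier prediction comes from the frozen logits only, so segmentation
        # accuracy on known classes is untouched.
        inlier_pred = frozen_logits.argmax(dim=1)
        # OoD score: low confidence after adding the residual correction
        # (a max-logit-style score; one of several common choices).
        corrected = frozen_logits + self.residual(features)
        ood_score = -corrected.max(dim=1).values
        return inlier_pred, ood_score

features = torch.rand(1, 256, 64, 128)     # dummy backbone features
frozen_logits = torch.rand(1, 19, 64, 128) # dummy frozen segmentation logits
head = ResidualOoDHead()
pred, score = head(features, frozen_logits)
print(pred.shape, score.shape)             # per-pixel class map and OoD score map
```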

Demo page: No demo page yet.

License:  MIT license


#1 in Semantic Segmentation on S3DIS dataset

Paper: Meta Architecture for Point Cloud Analysis
SOTA details
Github code with pretrained models released by Haojia Lin (first author in paper)

Model Name:  PointMetaBase-XXL

Notes:  Recent advances in 3D point cloud analysis bring a diverse set of network architectures to the field. However, the lack of a unified framework to interpret those networks makes any systematic comparison, contrast, or analysis challenging, and practically limits healthy development of the field. This paper proposes a unified framework called PointMeta, into which the popular 3D point cloud analysis approaches can fit. This has three benefits. First, it allows one to compare different approaches in a fair manner, and to use quick experiments to verify any empirical observations or assumptions summarized from the comparison. Second, it enables one to think across different components, and to revisit common beliefs and key design decisions made by the popular approaches. Third, based on the learnings from the previous two analyses, by doing simple tweaks on the existing approaches, one can derive a basic building block that shows very strong performance in efficiency and effectiveness.
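
To make the idea of a shared building block concrete, here is a simplified sketch of the kind of "meta" point block such a framework abstracts over: group neighbours, embed relative positions, update per-neighbour features, and aggregate. Every function and layer choice here is an illustrative assumption, not the PointMetaBase code.

```python
# Rough sketch of a generic point-cloud building block: grouping, position
# embedding, per-neighbour update, aggregation. Not the paper's implementation.
import torch

def meta_point_block(xyz, feats, k=16):
    """xyz: (N, 3) point coordinates, feats: (N, C) per-point features."""
    # 1) Neighbour grouping via k-nearest neighbours (brute force for clarity).
    dists = torch.cdist(xyz, xyz)                  # (N, N) pairwise distances
    idx = dists.topk(k, largest=False).indices     # (N, k) neighbour indices
    neighbor_xyz = xyz[idx]                        # (N, k, 3)
    neighbor_feats = feats[idx]                    # (N, k, C)
    # 2) Explicit position embedding: relative coordinates of each neighbour.
    rel_pos = neighbor_xyz - xyz.unsqueeze(1)      # (N, k, 3)
    # 3) Per-neighbour feature update (a freshly initialized linear layer
    #    stands in for a learned shared MLP, just for illustration).
    fused = torch.cat([neighbor_feats, rel_pos], dim=-1)    # (N, k, C + 3)
    update = torch.nn.Linear(fused.shape[-1], feats.shape[-1])
    updated = torch.relu(update(fused))            # (N, k, C)
    # 4) Aggregation: max-pool over the neighbourhood.
    return updated.max(dim=1).values               # (N, C)

xyz = torch.rand(1024, 3)
feats = torch.rand(1024, 64)
out = meta_point_block(xyz, feats)
print(out.shape)                                   # torch.Size([1024, 64])
```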

Demo page: No demo page yet.

License:  MIT license


#1 in Anomaly detection on 3 datasets

Paper: Attribute-based Representations for Accurate and Interpretable Video Anomaly Detection
SOTA details
Github code released by Tal Reiss (first author in paper).

Model Name:  AI-VAD

Notes:  Video anomaly detection (VAD) is a challenging computer vision task with many practical applications. As anomalies are inherently ambiguous, it is essential for users to understand the reasoning behind a system's decision in order to determine if the rationale is sound. This paper proposes a method that improves VAD accuracy and interpretability using attribute-based representations. Every object is represented by its velocity and pose. The anomaly scores are computed using a density-based approach.
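
The attribute-and-density recipe can be illustrated with a small sketch: represent each object by an attribute vector (e.g., velocity magnitude and pose descriptors), then score test objects by how far they fall from the density of normal training attributes. The k-NN distance below is a stand-in assumption for the paper's density estimator, and all names are illustrative.

```python
# Toy attribute-based anomaly scoring: objects far from the density of normal
# training attributes get high scores. k-NN distance is used as a simple proxy.
import numpy as np

def anomaly_scores(train_attrs, test_attrs, k=5):
    """train_attrs, test_attrs: (N, D) arrays of per-object attribute vectors."""
    # Normalize attributes so velocity and pose dimensions are comparable.
    mean, std = train_attrs.mean(0), train_attrs.std(0) + 1e-8
    train = (train_attrs - mean) / std
    test = (test_attrs - mean) / std
    # Score = mean distance to the k nearest normal training samples.
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, :k]
    return knn.mean(axis=1)

rng = np.random.default_rng(0)
train_attrs = rng.normal(size=(500, 10))            # attributes of normal objects
test_attrs = np.vstack([rng.normal(size=(5, 10)),   # normal-looking objects
                        rng.normal(5, 1, size=(5, 10))])  # shifted = anomalous
print(anomaly_scores(train_attrs, test_attrs).round(2))   # higher = more anomalous
```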

Demo page: No demo page yet.

License:  Commercial use permitted only under a commercial license


#1 in 3D Point Cloud Registration on 2 datasets

Paper: Challenging the Universal Representation of Deep Models for 3D Point Cloud Registration
SOTA details
Github placeholder code released by David Bojanić (first author in paper). A benchmark dataset was also released.

Model Name:  Greedy Grid Search

Notes:  This paper experimentally tests several state-of-the-art learning-based methods for 3D point cloud registration against a non-learning baseline registration method. The baseline method either outperforms or achieves results comparable to the learning-based methods. The paper also introduces a dataset on which learning-based methods have a hard time generalizing.
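
As a toy illustration of a non-learning, grid-search-style baseline (not the paper's exact Greedy Grid Search procedure), the sketch below enumerates rotations on a coarse Euler-angle grid, aligns centroids, and keeps the rotation with the lowest nearest-neighbour error.

```python
# Toy grid-search registration baseline: try rotations on a coarse grid,
# align centroids, score by mean nearest-neighbour distance.
import numpy as np
from scipy.spatial.transform import Rotation

def grid_search_register(source, target, steps=8):
    """source, target: (N, 3) point clouds. Returns the best (R, t, error) found."""
    angles = np.linspace(0, 2 * np.pi, steps, endpoint=False)
    best = (None, None, np.inf)
    for ax in angles:
        for ay in angles:
            for az in angles:
                R = Rotation.from_euler("xyz", [ax, ay, az]).as_matrix()
                rotated = source @ R.T
                t = target.mean(0) - rotated.mean(0)        # centroid alignment
                aligned = rotated + t
                # Mean nearest-neighbour distance as the alignment score.
                d = np.linalg.norm(aligned[:, None] - target[None], axis=-1)
                err = d.min(axis=1).mean()
                if err < best[2]:
                    best = (R, t, err)
    return best

source = np.random.rand(200, 3)
true_R = Rotation.from_euler("z", 0.7).as_matrix()
target = source @ true_R.T + np.array([0.5, 0.0, 0.0])      # known transform
R, t, err = grid_search_register(source, target)
print(round(err, 3))   # small residual error for the best grid rotation
```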

Demo page: No demo page yet.

License:  MIT license