
A Collection of English Expressions That Might Help People Writing Papers in English

Published 2023/02/25

Think of the content as being at roughly v0.0.3.

Abstract

The language barrier is something every non-native speaker has to overcome when submitting papers to peer-reviewed international conferences and journals.
These days many tools support writing in English [DeepL, Grammarly, ChatGPT, Thesaurus.com], so the situation for people struggling with the language is arguably improving.
Even with such tools, however, simply knowing that an English expression exists makes a big difference. Ideally, you could pick the expressions you need directly from papers that were actually accepted at top venues.
This article therefore collects, topic by topic, potentially useful expressions together with actual usage examples picked from AI papers (in principle, ones that passed review at top conferences or journals). The motivation is 80% for my own benefit, with the remaining 20% being the hope that it saves a few other people. I plan to keep updating it by appending entries whenever the mood strikes.
If you use this and get rejected, that is not my proble

Caveats

An important premise: because this article lifts single passages out of papers while ignoring their context, fine nuances are not taken into account. Writing with fine nuances in mind is extremely difficult for amateurs like me in the first place, so do not skip having your manuscript proofread by a sufficiently trained expert.
Also, even native speakers will not know the expressions peculiar to your specialty, so in the end, have an expert in the field look it over.

A note to myself when updating this article

Tying each potentially useful expression to an actual usage example is what makes this valuable, so record the source alongside each entry whenever possible.

Introduction, Conclusion

What task does the research address?

We address

In this paper, we address the degradation problem by introducing a deep residual learning framework.

from "Deep Residual Learning for Image Recognition" [He+, CVPR16]

How difficult the task is

Seems daunting

the task of learning an open set of visual concepts from natural language seems daunting.

from "Learning Transferable Visual Models From Natural Language Supervision" [Radford+, ICML21]

Related Work

Opening sentences

Explore

Another concurrent work explores a similar line of thinking to build multi-resolution feature maps on Transformers.

from "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows" [Liu+, ICCV21]

A line of work (Tan & Bansal, 2019; Lu et al., 2019; Li et al., 2019; Chen et al., 2020b; Li et al., 2020; Su et al., 2020; Zhang et al., 2021) has explored vision-language pretraining (VLP) that learns a joint representation of both modalities to be finetuned on vision-language (VL) benchmarks, such as visual question answering (VQA) (Goyal et al., 2017).

from "SimVLM: Simple Visual Language Model Pretraining with Weak Supervision" [Wang+, ICLR22]

Line of work

In comparison, another line of work (Radford et al., 2021; Ramesh et al., 2021; Jia et al., 2021) utilizes weakly labeled/aligned data crawled from the web to perform pretraining, achieving good performance and certain zero-shot learning capability on image classification and image-text retrieval.

from "SimVLM: Simple Visual Language Model Pretraining with Weak Supervision" [Wang+, ICLR22]

Proposed Method

What you are proposing

We present

We present a combined scaling method - named BASIC - that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses best published similar models - CLIP and ALIGN - by 9.3%.

from "Combined scaling for open-vocabulary image classification" [Pham+, arXiv21]

We propose

We propose OTTER (Optimal TransporT distillation for Efficient zero-shot Recognition), which uses online entropic optimal transport to find a soft image-text match as labels for contrastive learning.

from "Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation" [Wu, ICLR22]

We demonstrate that

We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.

from "Learning Transferable Visual Models From Natural Language Supervision" [Radford+, ICML21]

Mathematical expressions

Let XXX be YYY. ZZZ <equation>

Let again $\mathrm{x}$ be the layer input, treated as a vector, and $\mathcal{X}$ be the set of these inputs over the training data set. The normalization can then be written as a transformation

$$\hat{\mathrm{x}} = \mathrm{Norm}(\mathrm{x}, \mathcal{X})$$

which depends not only on ...

from "Batch normalization: Accelerating deep network training by reducing internal covariate shift" [Ioffe, ICML15]

Denote

where $\hat{\mathrm{z}}^l$ and $\mathrm{z}^l$ denote the output features of the (S)W-MSA module and the MLP module for block $l$, respectively; W-MSA and SW-MSA denote window based multi-head self-attention using regular and shifted window partitioning configurations, respectively.

from "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows" [Liu+, ICCV21]

Experiments

Setup

Quantitative results

We evaluate

First, we evaluate the effectiveness of the proposed methods (i.e. image-text contrastive learning, contrastive hard negative mining, and momentum distillation)

from "Align before Fuse: Vision and Language Representation Learning with Momentum Distillation" [Li+, NeurIPS21]

We evaluate the representation learning capabilities of ResNet, Vision Transformer (ViT), and the hybrid.

from "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" [Dosovitskiy+, ICLR21]

We validate

We also validate the effect of hard negative mining in the last column.

from "Align before Fuse: Vision and Language Representation Learning with Momentum Distillation" [Li+, NeurIPS21]

We also validate our approach on a Franka Panda with a multi-task agent trained from scratch.

from "Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation" [Shridhar+, CoRL22]

Unveil

However, the results unveil that both models still largely fall behind human performances by a large margin (47.2% of the best model vs. 90.06% of amateur human testers).

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

Qualitative results

Ablation Study

To study the contributions from ..., we conduct ablation study on...

To study the contributions from each model component, we conduct ablation study on $\mathrm{SimVLM}_\mathrm{small}$ models with an embedding dimension of 512 and 8 layers.

from "SimVLM: Simple Visual Language Model Pretraining with Weak Supervision" [Wang+, ICLR22]

Ablate important design elements in the proposed XXX

In this section, we ablate important design elements in the proposed Swin Transformer, using ImageNet-1K image classification, ...

from "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows" [Liu+, ICCV21]

Misc. (Phrase, Vocab., Keywords, Synonyms)

We refer to A as B for brevity

Throughout the paper we refer to PerceiverIO [1] as Perceiver for brevity

from "Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation" [Shridhar+, CoRL22]

We refer to A as B, for (what B stands for)

We refer to this resulting model as FLAN, for Finetuned Language Net.

from "Finetuned Language Models are Zero-Shot Learners" [Wei+, ICLR22]

Meticulously

$\fallingdotseq$ very thoroughly

We meticulously curate the SQA3D to include diverse situations and interesting questions.

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

Albeit

$\fallingdotseq$ although

late Middle English: from the phrase all be it 'although it be (that)'.

Albeit these promising advances, their actual performances in real-world embodied environments could still fall short of human expectations, ...

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

Promising

$\fallingdotseq$ good, encouraging, favorable, positive, bright

Albeit these promising advances, their actual performances in real-world embodied environments could still fall short of human expectations, ...

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

Endeavor

$\fallingdotseq$ try, attempt, undertake

The benefits of increasing the number of parameters come from two factors: additional computations at training and inference time, and increased memorization of the training data. In this work, we endeavor to decouple these, by exploring efficient means of augmenting language models with a massive-scale memory without significantly increasing computations.

from "Improving Language Models by Retrieving from Trillions of Tokens" [Borgeaud+, ICML22]

In recent years, the endeavor of building intelligent embodied agents has delivered fruitful achievements.

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

Subsequently

Subsequently, we adopt the “exact match” as our evaluation metric, i.e., the accuracy of answer classification in the test set.

from "SQA3D: Situated Question Answering in 3D Scenes" [Ma+, ICLR23]

[synonyms] auxiliary

  • additional

[synonyms] akin to

  • similar to

[synonyms] intriguing

  • interesting

from "Transformers Learn In-Context by Gradient Descent" [Oswald+, ICML23]

[synonyms] solely

  • only

from "Transformers Learn In-Context by Gradient Descent" [Oswald+, ICML23]

[keyword] challenging

Some personal thoughts (lol)

I assume you already skim international conference papers and journals on a regular basis; personally, I basically never use DeepL or other tools to turn an entire paper into Japanese. If you read that way, then every once in a loooooong while something strange happens: when you go to write something, the first thing that pops into your head is the English phrasing. It may be a worse version of getting so used to loanwords that you forget the Japanese expression 😢

Of course, using summarization tools and translation tools (or really any tool you can get your hands on) to digest large numbers of papers quickly is a very good thing, and I think it would be foolish to deny that. On the other hand, if you read papers without converting each one into Japanese, knowledge of how English papers phrase things comes along as a free bonus, which may be a good deal.

When you compress and/or transform information into a form that is easier for you to take in, you never acquire the knowledge that was lost in the compression... that is the trade-off.

To sum up

Read in English, write in English! (thus defeating the purpose of this article)

