A PyTorch GPU Memory Leak Example

I ran into a GPU memory leak while building a PyTorch training pipeline. After spending quite some time on it, I finally reduced it to a minimal reproducible example. Once training starts, the allocated GPU memory keeps increasing. The “AverageMeter” involved is used in many popular repositories (e.g., https://github.com/facebookresearch/moco). By design, it tracks the average of […]
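The excerpt is cut off before it names the cause, so the following is only a sketch under an assumption: one common way a running-average helper ends up holding GPU memory is being handed a loss tensor that is still attached to the autograd graph, so that its running sum keeps every step's graph alive. The class below is a simplified, hypothetical stand-in for such a meter (not the code from the repository linked above); it converts each value to a plain Python float before accumulating, which also works for one-element tensors.

```python
class AverageMeter:
    """Tracks the latest value and the running average of a scalar metric.

    Simplified, hypothetical stand-in for the meter discussed in the post.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0    # most recent value
        self.sum = 0.0    # weighted sum of all values seen so far
        self.count = 0    # total weight (e.g. number of samples)
        self.avg = 0.0    # running average

    def update(self, val, n=1):
        # float() turns a number or a one-element tensor into a plain Python
        # float, so the meter never stores a reference to a tensor (or to the
        # computation graph behind it).
        val = float(val)
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


# Usage with plain numbers standing in for per-batch losses and batch sizes.
meter = AverageMeter()
for batch_loss, batch_size in [(0.9, 4), (0.5, 4), (0.1, 8)]:
    meter.update(batch_loss, batch_size)
print(f"{meter.avg:.3f}")
```

In a real training loop the same effect is usually obtained by passing `loss.item()` rather than `loss` to `update`.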

[Bug] g++ 4.6 Argument Order

I ran into a bug that looks like a g++-4.6 problem. Here is what happened. This source file uses OpenCV:

    //< file: test.cpp
    #include
    int main (int argc, char** argv) {
        cv::Mat image;
        return 0;
    }

Compiling it with this command:

    g++-4.6 `pkg-config --libs opencv` -o test.bin test.cpp

produces the following errors:

    /tmp/ccs2MlQz.o: In function `cv::Mat::~Mat()':
    test.cpp:(.text._ZN2cv3MatD2Ev[_ZN2cv3MatD5Ev]+0x39): undefined reference to `cv::fastFree(void*)'
    /tmp/ccs2MlQz.o: In function `cv::Mat::release()':
    test.cpp:(.text._ZN2cv3Mat7releaseEv[cv::Mat::release()]+0x47): undefined reference to `cv::Mat::deallocate()'
    collect2: ld returned 1 exit status

The cause appears to be that g++ did not link against the OpenCV libraries correctly. After various attempts, I found that simply swapping the positions of the arguments makes it compile -_-!! Compiling with the following command works without problems. […]

A Bug in libstdc++ 4.6 type_traits

If your build environment is the same as mine and you are using C++11, then including the <chrono> header, directly or indirectly, will most likely trigger this bug. The full error message is:

    || In file included from /usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/thread:37:
    /usr/include/c++/4.6/chrono|240 col 10| error: cannot cast from lvalue of type ‘const long’ to rvalue reference type ‘rep’ (aka ‘long &&’); types are not compatible
    || : __r(static_cast<rep>(__rep)) { }
    || ^~~~~~~~~~~~~~~~~~~~~~~
    /usr/include/c++/4.6/chrono|128 col 13| note: in instantiation of function template specialization ‘std::chrono::duration >::duration’ requested here
    || return […]