Adapted from: https://opensource.com/article/17/7/20-sysadmin-commands
This version is not identical to the original article; some of the commands have been replaced.

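In a world of cloud computing and virtualization, a Windows system administrator moving to Linux needs some basic command-line knowledge. The commands below apply not only to Linux hosts but also to containers, virtual machines, and bare metal.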

1. curl

curl fetches URLs. You can use it to check whether a service, such as an application or even a database, is responding.

In the example below, the application returns a 500 error, which here indicates that it cannot connect to the MongoDB database:

$ curl -I -s myapplication:5000

HTTP/1.0 500 INTERNAL SERVER ERROR

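-I sends a HEAD request and prints only the HTTP response headers.
-s (silent) hides curl's progress meter and error messages.

Checking the database port directly returns 200 OK, which suggests that the database itself is up and the problem lies in the application's connection to it:

$ curl -I -s database:27017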

HTTP/1.0 200 OK

A common use is checking whether a web page is up:

$ curl -I -s https://opensource.com
HTTP/1.1 200 OK
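When a health check runs from a script rather than by eye, it helps to reduce the curl -I -s output to just the status code. The sketch below is a minimal illustration; http_status and check_ok are hypothetical helper names, and the parser assumes the first line of the header dump is the HTTP status line.

```shell
# http_status: hypothetical helper. Reads a header dump (e.g. from
# `curl -I -s URL`) on stdin and prints the numeric status code.
http_status() {
  # Field 2 of "HTTP/1.0 500 INTERNAL SERVER ERROR" is the code.
  # Header lines end in CRLF, so strip a trailing carriage return.
  awk 'NR == 1 { sub(/\r$/, "", $2); print $2; exit }'
}

# check_ok URL: hypothetical helper; exit status 0 only for a 200 response.
check_ok() {
  code=$(curl -I -s "$1" | http_status)
  [ "$code" = "200" ]
}
```

For example, check_ok myapplication:5000 || echo "application unhealthy". curl can also print the code itself with -o /dev/null -w '%{http_code}', which avoids parsing headers at all.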

2. python -m json.tool / jq

JSON returned by an API call that you fetch with curl can be hard to read. In that case, Python's built-in JSON library can format it:

cat test.json | python -m json.tool

{
    "properties": {
        "age": {
            "description": "Age in years",
            "minimum": 0,
            "type": "integer"
        },
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        }
    },
    "required": [
        "firstName",
        "lastName"
    ],
    "title": "Person",
    "type": "object"
}

For more advanced JSON processing, install jq, which can also format the output:

$ cat test.json | jq .
{
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string"
    },
    "lastName": {
      "type": "string"
    },
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "required": [
    "firstName",
    "lastName"
  ]
}
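json.tool also works as a quick validator in scripts, because it exits with a non-zero status when its input is not valid JSON. A minimal sketch, assuming python3 is on the PATH (on older systems the interpreter may be called python, as in the command above); pretty_json and is_json are hypothetical names:

```shell
# pretty_json: hypothetical helper; pretty-print JSON from stdin, keys sorted.
pretty_json() {
  python3 -m json.tool --sort-keys
}

# is_json: hypothetical helper; succeed only if stdin parses as JSON.
# json.tool reports the parse error and exits non-zero on invalid input.
is_json() {
  python3 -m json.tool >/dev/null 2>&1
}
```

With jq, a filter extracts a single field instead: jq -r '.title' test.json prints Person for the file above.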

3. ls

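ls is arguably the simplest and most basic command, but to read its output you need to understand the rwx permission rules and the commands that change them (chmod, chown, chgrp), what permission modes such as 755 and 644 mean, and how umask works. A more advanced topic is the sticky bit.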

$ ./myapp
bash: ./myapp: Permission denied
$ ls -l myapp
-rw-r--r--. 1 root root 33 Jul 21 18:36 myapp

myapp has no execute (x) bit for any user, so the shell refuses to run it.
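In an octal mode each digit is the sum of r=4, w=2, and x=1 for owner, group, and others, so 755 is rwxr-xr-x and 644 is rw-r--r--. New files start from 666 (directories from 777) and the umask removes bits from that, which is why a umask of 022 yields 644. The sketch below checks this on scratch files; it assumes GNU stat, whose -c option differs from BSD and macOS stat.

```shell
# Show how umask shapes the default mode of new files (GNU stat syntax).
dir=$(mktemp -d)

# New files start from 666; umask 022 clears group/other write: 644
( umask 022; touch "$dir/shared.txt" )
# umask 077 clears every group/other bit: 600
( umask 077; touch "$dir/private.txt" )

# %a = octal mode, %A = the mode as ls -l shows it, %n = file name
stat -c '%a %A %n' "$dir/shared.txt" "$dir/private.txt"
```

It should print 644 -rw-r--r-- for shared.txt and 600 -rw------- for private.txt. The sticky bit has its own octal digit: /tmp is typically mode 1777 (drwxrwxrwt), so anyone may create files there, but only a file's owner, the directory's owner, or root may delete or rename them.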

4. tail: view the end of a file

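By default, tail prints the last 10 lines of a file. tail -f keeps the file open and prints new lines as they are written, which is very useful for watching log files.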

[Figure: example output of tail -f]

To view only the last two lines:

$ tail -n 2 /var/log/httpd/access_log
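tail combines well with other filters when you only care about recent activity. A minimal sketch, assuming Apache's common or combined log format, where the HTTP status code is the ninth whitespace-separated field; recent_5xx is a hypothetical name:

```shell
# recent_5xx FILE [N]: hypothetical helper that counts server errors
# (HTTP 5xx) in the last N lines of an access log (default 100).
recent_5xx() {
  tail -n "${2:-100}" "$1" |
    awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }'
}
```

For example, recent_5xx /var/log/httpd/access_log 500. When following a log that gets rotated, GNU tail -F (capital F) follows the file by name and reopens it, unlike tail -f.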

5. cat: print a file's contents

cat is actually not the best way to read a file, because it does not paginate: a long file scrolls straight past. I recommend more or less for viewing files.

$ less requirements.txt
flask
flask_pymongo

6. grep: search for text in files

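grep is the most common tool for finding strings in files. Combined with regular expressions it is a powerful weapon, and the -B and -A options print n lines of context before and after each match. grep -ir is a handy way to search recursively (-r) and case-insensitively (-i) under the current directory, but it is not recommended for very deep directory trees, where it can take a long time.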

cat tomcat.log | grep org.apache.catalina.startup.Catalina.start

01-Jul-2017 18:03:47.542 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 681 ms
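Two details make the search above more precise. The dots in org.apache.catalina.startup.Catalina.start are regular-expression wildcards that match any character, while -F treats the pattern as a fixed string. And grep can read the file directly, so cat is not needed. The sketch below builds a small sample log (its lines are made up for illustration) and prints the match with one line of context on each side:

```shell
# Sample log lines are made up for illustration.
log=$(mktemp)
printf '%s\n' \
  'INFO deploying web application' \
  'INFO org.apache.catalina.startup.Catalina.start Server startup in 681 ms' \
  'INFO ready' > "$log"

# -F: literal match (no regex); -n: line numbers;
# -B 1 / -A 1: one line of context before and after each match.
# Matching lines are numbered "2:", context lines "1-" and "3-".
grep -F -n -B 1 -A 1 'org.apache.catalina.startup.Catalina.start' "$log"
```

To search a directory tree but only inside log files, GNU grep also accepts --include, e.g. grep -rin --include='*.log' 'connection refused' /var/log.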

7. ps: view the process table

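The BSD-style syntax of ps does not need a dash before its options. The most common invocation is
ps auxww
(ww disables truncation of long command lines). The related kill and pkill commands terminate processes, so you also need to understand what the different signals mean.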

$ ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  2 18:55 ?        00:00:02 /docker-java-home/jre/bi
root        59     0  0 18:55 pts/0    00:00:00 /bin/sh
root        75    59  0 18:57 pts/0    00:00:00 ps -ef

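pgrep is a good alternative to the command below: pgrep -f tomcat matches against the full command line and prints only the matching PIDs.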

$ ps -ef | grep tomcat
root         1     0  1 18:55 ?        00:00:02 /docker-java-home/jre/bi
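Signals matter when stopping a process: SIGTERM (15, the default for kill) asks the process to shut down cleanly, while SIGKILL (9) cannot be caught or ignored and gives it no chance to clean up. A common pattern is to send SIGTERM, wait, and escalate only if needed. A minimal sketch; stop_gracefully and its timeout argument are hypothetical:

```shell
# stop_gracefully PID [SECONDS]: hypothetical helper that sends SIGTERM,
# waits up to SECONDS (default 10) for the process to exit, then sends SIGKILL.
stop_gracefully() {
  pid=$1
  remaining=${2:-10}
  # SIGTERM lets the process clean up; fail if the PID does not exist
  # or we are not allowed to signal it.
  kill -TERM "$pid" || return 1
  while [ "$remaining" -gt 0 ]; do
    # kill -0 sends no signal; it only tests whether the process still exists.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    remaining=$((remaining - 1))
  done
  echo "PID $pid still running after SIGTERM; sending SIGKILL" >&2
  kill -KILL "$pid"
}
```

For example, stop_gracefully "$(pgrep -o -f tomcat)" 30, where pgrep -o picks the oldest matching process. Keep SIGKILL as the last resort.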

8. env: set or print environment variables

env shows which environment variables are set. If you already know a variable's name, echo $HOME is enough to see the value of HOME.

$ env

PYTHON_PIP_VERSION=9.0.1
HOME=/root
DB_NAME=test
PATH=/usr/local/bin:/usr/local/sbin
LANG=C.UTF-8
PYTHON_VERSION=3.4.6
PWD=/
DB_URI=mongodb://database:27017/test
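A missing variable such as DB_URI often surfaces only as a confusing error deep inside the application, so it can help to check the environment before starting it. A minimal sketch; require_env is a hypothetical helper, and it relies on printenv exiting non-zero when a variable is not in the environment:

```shell
# require_env NAME...: hypothetical helper that reports every listed
# variable missing from the environment; exit status 1 if any is missing.
require_env() {
  missing=0
  for name in "$@"; do
    if ! printenv "$name" >/dev/null; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a container entrypoint, for example, require_env DB_URI DB_NAME || exit 1 stops the container early with a clear message.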

9. top: find the processes using the most memory and CPU

top has many interactive commands for adjusting the display; press h for help.

[Figure: example output of top]

10. netstat: check network connections

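netstat is very useful for checking network status. The following command lists all listening (-l) TCP (-t) and UDP (-u) ports together with the process that owns each one (-p), using numeric addresses (-n). On newer distributions netstat, part of the net-tools package, may not be installed; ss -tulpn from iproute2 accepts the same options.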

# netstat -tulpn

[Figure: example output of netstat -tulpn]
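When you only need the list of listening ports, for example to confirm that an application really bound port 5000, the netstat output can be reduced with awk. A minimal sketch, assuming the column layout of netstat -tulpn (the local address is the fourth column); listening_ports is a hypothetical name:

```shell
# listening_ports: hypothetical helper. Reads `netstat -tulpn` output on
# stdin and prints each local port number once, in numeric order.
listening_ports() {
  # Data lines start with tcp, tcp6, udp or udp6; header lines are skipped.
  # The port is whatever follows the last ":" in the local address,
  # which also works for IPv6 addresses such as ":::80".
  awk '$1 ~ /^(tcp|udp)/ { n = split($4, parts, ":"); print parts[n] }' |
    sort -n -u
}
```

For example, netstat -tulpn | listening_ports | grep -qx 5000 && echo "port 5000 is listening".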

11. ip address

If ip address does not work on your host, install the iproute2 package. ip address shows the interfaces and IP addresses of your application's host, so you can use it to verify your container's or host's IP address. For example, when your container is attached to two networks, ip address shows which interface connects to which network. For a simple check, you can always use the ip address command to get the IP address of the host. The example below shows that the web tier container has the IP address 172.17.0.2 on interface eth0.

[Figure: ip address output showing that the eth0 interface has the IP address 172.17.0.2]
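For scripts, the one-line output mode (-o) is easier to parse than the default layout: each address is printed on a single line, with the interface name in the second field and the address in CIDR notation in the fourth. A minimal sketch; ipv4_of is a hypothetical helper that reads ip -o -4 address output on stdin:

```shell
# ipv4_of IFACE: hypothetical helper. Prints the IPv4 address(es) of IFACE,
# without the prefix length, from `ip -o -4 address` output on stdin.
ipv4_of() {
  awk -v ifc="$1" '$2 == ifc && $3 == "inet" { split($4, a, "/"); print a[1] }'
}
```

For example, ip -o -4 address | ipv4_of eth0 would print 172.17.0.2 on the container described above.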

12. lsof

lsof lists the open files associated with your application. On some Linux machine images, you need to install lsof with the lsof package. In Linux, almost any interaction with the system is treated like a file. As a result, if your application writes to a file or opens a network connection, lsof will reflect that interaction as a file. Similar to netstat, you can use lsof to check for listening ports. For example, if you want to check if port 80 is in use, you use lsof to check which process is using it. Below, you can see that httpd (Apache) listens on port 80. You can also use lsof to check the process ID of httpd, examining where the web server’s binary resides (/usr/sbin/httpd).

[Figure: lsof output showing that httpd listens on port 80]

Examining httpd's process ID also shows all the files httpd needs in order to run, and the names of those open files help pinpoint the origin of the process, in this case Apache.
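lsof -t prints only process IDs, which makes it easy to chain with other commands. On Linux, /proc/PID/exe is a symbolic link to the binary a process is running, so you can confirm where the server's binary lives (e.g. /usr/sbin/httpd). A minimal sketch; binary_of is a hypothetical name, and reading another user's process usually requires root:

```shell
# binary_of PID: hypothetical helper that prints the absolute path of the
# executable PID is running, via the Linux /proc/PID/exe symlink.
binary_of() {
  readlink -f "/proc/$1/exe"
}

# Usage (needs lsof and, for other users' processes, root):
#   binary_of "$(lsof -t -i :80 | head -n 1)"
```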

13. df

You can use df (display free disk space) to troubleshoot disk space issues. When you run your application on a container orchestrator, you might receive an error message signaling a lack of free space on the container host. While disk space should be managed and optimized by a sysadmin, you can use df to see how much space is left on the filesystem that holds a directory and confirm whether you are indeed out of space.

[Figure: df -h output listing each filesystem with its size, used space, and available space]

The -h option prints sizes in human-readable units (K, M, G). The example above shows plenty of free disk space on this host.
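To turn the df check into an alert, read the usage percentage of the filesystem that holds a given path. The -P option forces the POSIX output format, which keeps each filesystem on one line so that the fifth column is always the use percentage. A minimal sketch; disk_usage_pct and the 90% threshold are assumptions for illustration:

```shell
# disk_usage_pct PATH: hypothetical helper that prints the used percentage
# (digits only) of the filesystem containing PATH.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Warn when the filesystem holding /var is more than 90% full
# (the threshold is an arbitrary example).
usage=$(disk_usage_pct /var)
if [ "$usage" -gt 90 ]; then
  echo "warning: /var is ${usage}% full" >&2
fi
```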

14. du

To find out in more detail which files use the disk space in a directory, use the du command. For example, to find out which log takes up the most space in the /var/log directory, run du with the -h (human-readable) option and the -s option, which prints one total per argument.

$ du -sh /var/log/*
1.8M  /var/log/anaconda
384K  /var/log/audit
4.0K  /var/log/boot.log
0     /var/log/chrony
4.0K  /var/log/cron
4.0K  /var/log/maillog
64K   /var/log/messages

The example above shows that the largest entry under /var/log is /var/log/anaconda (1.8M), followed by /var/log/audit (384K). You can use du together with df to determine what is using the disk space on your application's host.
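The mixed units in du -sh output are awkward to sort. Printing sizes in kibibytes (-k) and sorting numerically puts the biggest entries first. A minimal sketch; largest_entries is a hypothetical name:

```shell
# largest_entries DIR [N]: hypothetical helper that lists the N largest
# entries in DIR (default 5), biggest first, with sizes in KiB.
largest_entries() {
  du -sk "$1"/* | sort -rn | head -n "${2:-5}"
}
```

For example, largest_entries /var/log 3. With GNU coreutils, du -sh /var/log/* | sort -rh | head gives the same ranking in human-readable units.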

15. id

To check which user is running the application, use the id command, which prints the user identity. The example below uses Vagrant to test the application and isolate its development environment. After you log into the Vagrant box, if you try to install Apache HTTP Server (a dependency), the system states that you must be root to perform the command. To check your user and group, run id and notice that you are running as the "vagrant" user in the "vagrant" group.

$ yum -y install httpd
Loaded plugins: fastestmirror
You need to be root to perform this command.
$ id
uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

To correct this, run the command as a superuser, for example with sudo, which provides elevated privileges.
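A script that needs elevated privileges can check for them up front with id -u, which prints the numeric user ID (0 for root). A minimal sketch; require_root is a hypothetical name:

```shell
# require_root: hypothetical helper that fails with a clear message
# unless the current user ID is 0 (root).
require_root() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "this step must run as root (try sudo)" >&2
    return 1
  fi
}
```

For example, require_root && yum -y install httpd.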

16. chmod

When you run your application binary for the first time on your host, you may receive the error message “permission denied.” As seen in the example for ls, you can check the permissions of your application binary.

$ ls -l
total 4
-rw-rw-r--. 1 vagrant vagrant 34 Jul 11 02:17 test.sh

This shows that you don’t have execution rights (no “x”) to run the binary. chmod can correct the permissions to enable your user to run the binary.

$ chmod +x test.sh
$ ls -l
total 4
-rwxrwxr-x. 1 vagrant vagrant 34 Jul 11 02:17 test.sh

As the example shows, this adds execute rights. Now when you run the binary, the application no longer throws a permission-denied error. chmod can also be useful when you load a binary into a container, to make sure the container has the correct permissions to execute it.
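chmod accepts either symbolic modes (who, + or -, which bits) or octal modes. Plain +x adds execute for every class except bits masked by the umask, which is how test.sh ended up as rwxrwxr-x above; u+x grants it to the owner only. A minimal sketch on a scratch file, assuming GNU stat for printing the octal mode:

```shell
# Symbolic vs. octal chmod on a scratch file (GNU stat syntax).
f=$(mktemp)

chmod 664 "$f"      # rw-rw-r--, like test.sh above
stat -c '%a' "$f"   # 664

chmod u+x "$f"      # add execute for the owner only: rwxrw-r--
stat -c '%a' "$f"   # 764

chmod 755 "$f"      # octal: owner rwx, group and others r-x
stat -c '%a' "$f"   # 755
```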

17. dig / nslookup

DNS (the Domain Name System) resolves a hostname to the addresses of a set of application servers. However, you may find that a name does not resolve, which causes a connectivity issue for your application. For example, say you attempt to reach your database at the hostname mydatabase from your application's host, and instead you receive a "cannot resolve" error. To troubleshoot, you can use dig (DNS lookup utility) or nslookup (query Internet name servers) to figure out why the application can't resolve the database.

$ nslookup mydatabase
Server:   10.0.2.3
Address:  10.0.2.3#53

** server can't find mydatabase: NXDOMAIN

nslookup shows that mydatabase can't be resolved. Trying to resolve it with dig fails as well:

$ dig mydatabase

; <<>> DiG 9.9.4-RedHat-9.9.4-50.el7_3.1 <<>> mydatabase
;; global options: +cmd
;; connection timed out; no servers could be reached

These errors can have many different causes. If you can't find the root cause, reach out to your sysadmin for further investigation. For local testing, this issue may indicate that your host's nameservers aren't configured correctly. To use these commands, you need the BIND utilities package (bind-utils on RHEL and CentOS, dnsutils on Debian and Ubuntu).
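nslookup and dig query a DNS server directly, so they bypass /etc/hosts. Your application, however, normally resolves names through the system resolver configured in /etc/nsswitch.conf, which can consult /etc/hosts as well. getent hosts follows that same path, so comparing it with dig helps tell a DNS problem from a local configuration problem. A minimal sketch; resolves is a hypothetical helper:

```shell
# resolves NAME: hypothetical helper. Succeeds, printing the addresses found,
# if the system resolver (NSS: /etc/hosts, DNS, ...) can resolve NAME.
resolves() {
  getent hosts "$1"
}
```

If resolves mydatabase succeeds but dig +short mydatabase prints nothing, the name probably comes from /etc/hosts rather than DNS (or the other way round if only dig answers).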

18. iptables

iptables blocks or allows traffic on a Linux host, similar to a network firewall. This tool may prevent certain applications from receiving or transmitting requests. More specifically, if your application has difficulty reaching another endpoint, iptables may be denying traffic to the endpoint. For example, imagine your application’s host cannot reach Opensource.com. You use curl to test the connection.

$ curl -vvv opensource.com
* About to connect() to opensource.com port 80 (#0)
*   Trying 54.204.39.132...
* Connection timed out
* Failed connect to opensource.com:80; Connection timed out
* Closing connection 0
curl: (7) Failed connect to opensource.com:80; Connection timed out

The connection times out. You suspect that something might be blocking the traffic, so you show the iptables rules with the -S option.

$ iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -i eth0 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 22 -j ACCEPT
-A OUTPUT -o eth0 -p udp -m udp --dport 53 -j ACCEPT

The first three lines (-P) set the default policy of every chain to DROP, so traffic is dropped unless a rule explicitly accepts it. The remaining rules allow only SSH (port 22) and DNS (port 53) traffic, which is why the HTTP request to port 80 times out. In this case, follow up with your sysadmin if you require a rule to allow traffic to external endpoints. If this is a host you use for local development or testing, you can use the iptables command to allow the correct traffic. Use caution when adding rules that allow traffic to your host.
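With a DROP policy, every allowed flow needs an explicit rule. The sketch below has two hypothetical helpers: default_policy reads iptables -S output and reports a chain's default policy, and allow_outbound_web shows rules that would let a local test host open HTTP and HTTPS connections and accept the replies. The rules assume the interface is eth0 as in the listing above and must run as root; check with your sysadmin before using anything like them on a shared machine.

```shell
# default_policy CHAIN: hypothetical helper. Reads `iptables -S` output on
# stdin and prints the default policy (ACCEPT or DROP) of CHAIN.
default_policy() {
  awk -v chain="$1" '$1 == "-P" && $2 == chain { print $3 }'
}

# allow_outbound_web: hypothetical helper (requires root; not called here).
# Allows new outbound HTTP/HTTPS on eth0 and the matching replies.
allow_outbound_web() {
  iptables -A OUTPUT -o eth0 -p tcp -m multiport --dports 80,443 -j ACCEPT
  iptables -A INPUT -i eth0 -p tcp -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
}
```

For example, iptables -S | default_policy OUTPUT prints DROP on the host above.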

19. sestatus

You usually find SELinux (a Linux security module) enforced on application hosts managed by an enterprise. SELinux gives processes running on the host least-privilege access, preventing potentially malicious processes from accessing important files on the system. In some situations, an application needs to access a specific file but fails with an error. To check whether SELinux is blocking the application, use tail and grep to look for "denied" messages in the audit log under /var/log/audit. You can also check whether the box has SELinux enabled at all by using sestatus.

$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

The output above shows that SELinux is enabled on the application's host and running in enforcing mode. On your local development environment, you can make SELinux more permissive, for example with setenforce 0, which switches to permissive mode (denials are logged but not blocked) until the next reboot. If you need help with a remote host, your sysadmin can help you determine the best practice for allowing your application to access the file it needs.
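When SELinux blocks something, the audit log records an AVC denial that names the blocked command (comm=...). A minimal sketch that counts denials per command; selinux_denials is a hypothetical name, it assumes the default auditd log path /var/log/audit/audit.log, and it must run as root to read that file:

```shell
# selinux_denials [LOGFILE]: hypothetical helper that counts AVC denials
# per command name, most frequent first.
selinux_denials() {
  grep 'avc: *denied' "${1:-/var/log/audit/audit.log}" |
    grep -o 'comm="[^"]*"' | sort | uniq -c | sort -rn
}
```

If the audit package is installed, ausearch -m AVC -ts recent gives a similar view with full event details.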

20. history

When you issue so many commands for testing and debugging, you may forget the useful ones! Most interactive shells have some variant of the history command. It lists the commands you have issued; in bash this includes commands saved from earlier sessions in ~/.bash_history. You can use history to record which commands you used to troubleshoot your application. For example, if you run history after working through this article, it shows the various commands you experimented with and learned.

$ history
    1  clear
    2  df -h
    3  du

What if you want to run a command from your history again without retyping it? Type ! followed by the command number: with the history above, !2 re-runs df -h.

[Figure: re-running a command from history with !]
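If you rely on history to retrace a troubleshooting session, it helps to keep more of it and to record when each command ran. A minimal sketch for bash, to add to ~/.bashrc; the sizes are arbitrary examples:

```shell
# Keep more history, and append to ~/.bash_history instead of overwriting it
# when several shells are open (sizes are arbitrary examples).
HISTSIZE=10000
HISTFILESIZE=20000
shopt -s histappend
# Skip consecutive duplicate commands.
HISTCONTROL=ignoredups
# Show a date and time next to each entry in `history` output.
HISTTIMEFORMAT='%F %T '
```

After that, !! repeats the last command, !42 re-runs entry 42, and Ctrl-R searches the history interactively.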