Posts tagged ‘Solaris’

Solaris 8 Performance Tuning

Although this article was written for Solaris 8, it has kept its charm.

When talking about tuning the performance of a UNIX system, sys admins generally work through a number of concepts, such as the current perceived performance, the difference between this level and a desired level of performance, the optimal level claimed by the application vendor, how to measure performance differences, and finally, what to tune and test.

The current perceived level of performance is based on what your users tell you and what you observe in your regular system monitoring. In a networked environment, if you have some users on the end of a 100-Mbit full-duplex connection, and others who are still using 10-Mbit half-duplex, then the 10-Mbit users will always tell you that the system is slow. However, you might have to wait for a serious application overload or resource crunch before the 100-Mbit users complain. There are also copious logfiles and measurable kernel statistics to facilitate system-level monitoring. Some sites also make use of tools such as CA Unicenter TNG, HP OpenView, or Sun’s SunMC to monitor via SNMP MIBs. However, these are tools that must be used by competent systems and application administrators and should not be used alone.

The desired level of performance is somewhat different. Benchmarks are sometimes less than useful, but if you are using an application or system from a large vendor, they should have a specific benchmarking group that will replicate your environment and allow you to test realistic loads. You can trust a benchmark done in this manner because it is specifically tailored to your company’s requirements. If you don’t have the time or resources to organize a tailored benchmark, you’ll have to spend extra time doing it yourself. Before you get started, determine what measurements will be valid for your environment and how to obtain them. Also, work with your users in regard to when you can test tuning and monitoring. Keep the users informed of what you are doing so that they can recognize changes and appreciate the value that you provide to them.

How to Measure Before-and-After Performance Differences

To tune effectively, you must first decide what you want to achieve. Here are some tips to consider:

* Determine your criteria for success.
* Log everything, and run Sun Explorer data gatherer every time you make changes in order to keep a snapshot of your system configuration at each point.
* Analyze the logs carefully.
* Measure the changed performance against what you wish to achieve.
* Be methodical.

If you do not already keep a record of system and resource loads, and have no method of monitoring the applications on your system, start simply. You can measure CPU/memory/swap utilization, disk utilization (specifically your application filesystems), and network utilization using standard system tools. Solaris 8 allows you to use the kstat(1M) utility to view various system properties in a variety of output formats. df -k is also useful. You can write scripts to dump these statistics to a file on a periodic basis or use tools such as Apache and MRTG to give you a regularly refreshed view. A little bit of shell, awk, sed, or Perl will give you lots of numbers to graph.
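As a minimal sketch of the "dump these statistics to a file on a periodic basis" idea, the helper below appends a timestamped copy of any command's output to a log file. The name log_snapshot and the /var/tmp/perf paths are assumptions made for illustration, and the syntax assumes bash or another POSIX shell rather than the legacy Solaris /bin/sh.

```sh
#!/bin/bash
# Sketch only: log_snapshot is a hypothetical helper, not a Solaris utility.
# It appends a timestamp header plus the command's output to a log file.
log_snapshot() {
    logfile=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $*"
        "$@"
    } >> "$logfile" 2>&1
}

# Example calls, e.g. from cron every 10 minutes (paths are placeholders):
#   log_snapshot /var/tmp/perf/df.log df -k
#   log_snapshot /var/tmp/perf/dnlc.log kstat -m unix -n dnlcstats
```

Keeping the header on its own line makes it easy to split the log into samples later with awk or Perl before graphing.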

The most useful monitoring of performance requires frequent and regular snapshots of your system for a period of time that you, your application admins, and users decide is long enough. My rule of thumb is to accrue not less than eight days of monitoring data (preferably including a peak load period) so that you have obvious peaks and troughs to look at. Another aspect of measurement is to survey the users both before and after to determine how they believe performance has changed in the course of the monitoring period. If your performance tuning has resulted in differences that the users cannot perceive, then I would say the performance hasn’t really been tuned. Useful questions to ask include:

* How long did it take to perform a standard task (measured in seconds) at 9am, 12noon, 2pm, 5pm before you started tuning?
* How long does this task take now?
* Do you think that the performance has (improved, stayed the same, become worse) over the past (X) days? (At least a seven-point scale is recommended for this question).

You, the sys admin, also need to measure these standard tasks yourself so that you have a frame of reference for your measurements.

What to Tune and When to Tune It

I/O and Buffers (Including VxVM and VxFS)

The most common target of performance tuning is obviously I/O, and in Solaris there are several buffers and tunables that we can work on. You are probably aware of the relative speeds of disks versus RAM, and how this difference in speed has changed dramatically over the past 20 years. This is particularly important when configuring “virtual memory” for your server because of the way Solaris implements paging. A primary concern here is to avoid paging data to your slow disk (even a 10000-rpm FC-AL disk is still slow) and ensure that the data stays in RAM until it is sent down the SCSI or TCP stack to the next appropriate point.

How does one prevent paging? This question requires an appreciation of “good” and “bad” paging. So-called “good” paging happens when the system allocates or reclaims pages from a process, whereas “bad” paging is when this allocation relies upon a disk device and the system incurs a penalty for access. We don’t particularly care about the “good” paging, because the intelligence built into the kernel’s paging algorithms ensures that it happens as infrequently as possible. The priority_paging setting is necessary for Solaris 2.6 and Solaris 7. Neither Solaris 8 nor Solaris 9 requires it, because a modified version of priority_paging, called the cyclic page cache, was integrated into the new kernel. “Bad” paging is also known as swapping, and in extreme circumstances can result in such a high I/O load that your I/O subsystem is said to be thrashing.

You must specifically be aware of two tunables on the Solaris side: ncsize and lotsfree. If you are running Veritas Volume Manager (VxVM), then you need to be aware of vxio:vol_maxio and friends. If you are running Veritas FileSystem (VxFS), you must carefully tune vxfs:vxfs_ninode and vxfs:vx_bc_bufhwm. I will address each of these variables in turn.

ncsize

The “nc” in ncsize refers to the Name Cache, and is used to set the size of the Directory Name Lookup Cache, or dnlc for short. This is an optimization to give you better performance from your filesystems because inodes are cached in memory and only flushed out of the cache to disk if they have been idle for quite a while. The default setting for ncsize depends on two other variables, max_nprocs and maxusers with the following relationship:

ncsize = (4 * (max_nprocs+maxusers)) + 320
max_nprocs = 10 + (16 * maxusers)
maxusers = physmem - 2
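The relationships above can be worked through with a few lines of shell arithmetic. This is only a sketch: ncsize_default is a hypothetical helper name, physmem is taken to be in megabytes, and the syntax assumes bash or another POSIX shell. With maxusers = 2046 (2048 MB of memory minus 2) it reproduces the system-calculated default of 139488 quoted later for the file server.

```sh
#!/bin/bash
# Sketch: compute the default ncsize from maxusers using the three
# relationships quoted in the article. ncsize_default is a made-up name.
ncsize_default() {
    maxusers=$1
    max_nprocs=$((10 + 16 * maxusers))
    echo $((4 * (max_nprocs + maxusers) + 320))
}

ncsize_default 2046   # maxusers derived from 2048 MB of physical memory
ncsize_default 2048   # maxusers forced to its maximum value
```

The two calls print 139488 and 139624, so forcing maxusers to 2048 on a 2 GB machine barely changes ncsize. The larger effect is on machines with less memory.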

Setting maxusers to less than the maximum possible value of 2048 used to be a required configuration task going back to SunOS 4.x. However, Solaris 2.x does not require maxusers to be set, and you can actually decrease the performance of your system by doing so. If maxusers is set to the system default, you might not have a very large dnlc. The recommendation is first to set maxusers to 2048, reboot, and after about a week of average usage, examine the kernel statistics on the dnlc (scroll forward to the line starting with dnlcstat). Under both Solaris 8 and 9, you can use the kstat(1M) utility. When looking at these statistics, note the value of misses/hits and see how close that ratio is to 0. The closer it is to 0, the less you need to worry about tuning ncsize. For example, my workstation (an Ultra80) has been up for about 36.5 days, and has this output from kstat -m unix -n dnlcstats:

module: unix instance: 0
name: dnlcstats class: misc
crtime 63.090334184

hits 141141001
misses 1953220
negative_cache_hits 2070966
pick_free 802468
pick_heuristic 1102434
pick_last 257563

snaptime 3162130.15322328

In my case, this works out to be 0.01383, meaning that my dnlc efficiency is very good. On a major fileserver that I use, the values are considerably different:

module: unix instance: 0
name: dnlcstats class: misc
crtime 180.496237386

hits 313301011
misses 77458379
negative_cache_hits 14429400
pick_free 3664314
pick_heuristic 51820181
pick_last 24067557
snaptime 6143454.64953917

Here, the ratio works out to be 0.2472, indicating that there is only about a 75% efficiency of the dnlc for this server, so ncsize should be tuned upwards. For the system above, I would recommend that the dnlc size be doubled. It is currently set to the system-calculated default of 139488, which for this E420R running in 64-bit mode takes up slightly more than 8.7 MB of kernel memory. The system can easily afford double this amount of kernel memory for dnlc, given its workload.
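The misses/hits ratio used in both examples can be computed straight from the kstat text with awk. The sketch below assumes the two-column "name value" layout shown in the listings above. dnlc_miss_ratio is a hypothetical helper name, not part of Solaris.

```sh
#!/bin/bash
# Sketch: read dnlcstats text on stdin and print misses/hits to 4 places.
# Exact field matching keeps negative_cache_hits from being counted as hits.
dnlc_miss_ratio() {
    awk '$1 == "hits"   { hits = $2 }
         $1 == "misses" { misses = $2 }
         END {
             if (hits > 0) printf "%.4f\n", misses / hits
             else          print "n/a"
         }'
}

# On a Solaris host: kstat -m unix -n dnlcstats | dnlc_miss_ratio
# Here the file server's figures from the article are fed in directly:
printf 'hits 313301011\nmisses 77458379\n' | dnlc_miss_ratio
```

For the file server figures this prints 0.2472. Running it from cron alongside the other snapshots shows whether a change to ncsize actually moves the ratio toward 0.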

lotsfree

This variable is used as the boundary condition for when to invoke the page scanner and make it look for pages to free. Typically, this is set to 1/64th the number of physical pages in your system, which works reasonably well. However, as the Kernel Tunable Parameters Manual indicates, if your system load is such that it cannot cope with sudden sharp increases in demand for memory, then you should seriously consider increasing this value. A general rule I’ve seen recommended is to set lotsfree to be 1/16th of the physmem value, rather than the default 1/64th. This allows the page scanner to activate in a more timely manner. When combined with priority paging or the cyclic page cache of Solaris 8, you should see that the memory load curve of your system is much smoother than before.
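To make the 1/64th versus 1/16th rule concrete, here is a small sketch that turns a physmem value (in pages) into a lotsfree value and prints it as an /etc/system line. The 262144-page figure is a hypothetical 2 GB machine with 8 KB pages, and lotsfree_pages is a made-up helper name. Check the result against your own system before using it.

```sh
#!/bin/bash
# Sketch: lotsfree as a fraction of physmem, both measured in pages.
lotsfree_pages() {
    physmem_pages=$1
    divisor=$2
    echo $((physmem_pages / divisor))
}

# Hypothetical machine: 2 GB of RAM with 8 KB pages = 262144 pages.
echo "default (1/64):   $(lotsfree_pages 262144 64) pages"
echo "suggested (1/16): set lotsfree=$(lotsfree_pages 262144 16)"
```

On this example machine the default works out to 4096 pages and the 1/16th rule to 16384 pages, so the page scanner starts while four times as much memory is still free.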

Veritas FileSystem Variables — vxfs:vxfs_ninode and vxfs:vx_bc_bufhwm

These two variables are particularly important to tune. They are interdependent with ncsize through their use of kernel memory. When you use Veritas FileSystem (VxFS), you must tune ncsize to be between 50% and 80% of the value of vxfs:vxfs_ninode in order to achieve good performance. Good performance in this context means that the performance curve of CPU cycles and memory used stays relatively close to the performance curve algorithm that Veritas builds into the product.

The upper limit on the amount of kernel memory that VxFS will allocate for its cache is set with vxfs:vx_bc_bufhwm — the Buffer Cache’s Buffer High Water Mark. Once the allocated amount reaches this limit, VxFS inodes are flushed from the cache. The vxfs:vxfs_ninode variable is the limit on the number of VxFS inode structures held in memory. This number is usually determined to be 125% of the value of ncsize for reasons of CPU cycle efficiency, and if you have your dnlc set by way of setting maxusers to 2048, then ncsize/vxfs:vxfs_ninode is approximately 0.78. This is very close to the 80% figure mentioned in the VxFS installation guide.
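A quick sanity check for the 50% to 80% guideline might look like the sketch below. check_ncsize_vs_ninode is a hypothetical helper and the sample values are invented for illustration. The percentage uses integer arithmetic, so it is truncated rather than rounded.

```sh
#!/bin/bash
# Sketch: report whether ncsize lies between 50% and 80% of vxfs_ninode.
# Integer division truncates, so 80.9% is reported (and accepted) as 80%.
check_ncsize_vs_ninode() {
    ncsize=$1
    ninode=$2
    pct=$((ncsize * 100 / ninode))
    if [ "$pct" -ge 50 ] && [ "$pct" -le 80 ]; then
        echo "ok: ncsize is ${pct}% of vxfs_ninode"
    else
        echo "out of range: ncsize is ${pct}% of vxfs_ninode"
    fi
}

# Invented sample values, not taken from any real system:
check_ncsize_vs_ninode 150000 200000
check_ncsize_vs_ninode 150000 100000
```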

Veritas Volume Manager Variables

Many sites use Veritas Volume Manager (VxVM) to provide data protection and enhance their systems’ performance. Mirrored filesystems, striped data filesystems, and RAID-5 filesystems are all features that VxVM provides. If you use VxVM as well as VxFS, then you should look at vxio:vol_maxio. This variable controls the maximum size of io requests that are sent down the SCSI chain without breaking the request up. Veritas recommends that this tunable not exceed 20% of kernel memory or physical memory (whichever is smaller), and that you match this tunable to the size of your widest stripe. Apart from this tunable, there are no others that must be tuned, and you should only really look at tuning the variables specified in the Veritas Volume Manager Administrator Guide if you are directed to by an appropriate technical contact within Veritas or a Veritas partner. This is because Veritas, like Sun with Solaris, has spent a lot of time and effort in making sure that the self-tuning algorithms work well for the vast majority of systems. If you are fortunate enough to have the sort of large installation where serious VxVM tuning is necessary, then you should engage Veritas or Sun Professional Services to analyze and tune your configuration.

Shared memory, Semaphores, Message Queues

Applications such as rdbms engines (Oracle, Sybase, Informix, DB2, etc.), middleware like MQ-Series and Tuxedo, and some backup packages (Veritas NetBackup) make heavy use of shared memory, semaphore sets, and message queues in order to maximise their performance. For configuring these settings, you should always start with the application vendor’s recommendations, and find out from the vendor how they log a deficiency in these settings. Veritas NetBackup, for example, will dump messages in logfiles like:

waited for empty buffer X times or
waited for full buffer Y times

which in conjunction with the NetBackup Troubleshooting guide will indicate whether you need more semaphores or message queues.

Remember that the kernel will not allow more than 25% of the dedicated kernel space (segkp) to be allocated for the shared memory, message queue, and semaphore structures. So, if you experiment with large values on a machine without much physical memory, you may see different values when you check with the sysdef(1M) utility after rebooting. For my workstation (see the “System Specification File” sidebar), the kernel memory usage under Solaris 8 (64-bit mode) for the example semaphore settings is approximately 15898 Kb, for the message queues approximately 1038 Kb, and for the shared memory itself, approximately 18 Kb. You might be wondering why the overhead for shared memory (shmsys) settings is so small compared with the semaphore (semsys) and message queue (msgsys). That is because the only variable used for administrative purposes with shmsys is shminfo_shmmni. This is the maximum number of shmid_ds structures in the system, each of which is 88 bytes.

The rule of thumb for shared memory, semaphores, and message queues is to tune these in conjunction with your application vendor, because the vendor will have concrete customer data and can advise you appropriately.

SCSI tunables

sd_max_throttle and sd_io_time

Two very common tuning targets are those that relate to the SCSI disk (sd) driver module: sd_max_throttle and sd_io_time. You commonly see these set (or need to set them) if you are using EMC, IBM, or Hitachi storage attached to your Solaris system. The sd_io_time variable is the limiter on how long an I/O can be outstanding before an error condition is returned. The Solaris default is 60 seconds (0x3c), but this is often set to 31 seconds (0x1f). The variable sd_max_throttle provides the limit on how many outstanding I/Os the system can handle at any one time and is commonly referred to as the “queue depth.” A common setting for this (the default is 256, or 0x100) is 25, which is the mandated setting from JNI Corporation for use with their fcaw driver and EMC or Hitachi storage.
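If your storage vendor does require these settings, the /etc/system lines can be generated as in the sketch below, which also shows the decimal-to-hex conversion behind the 0x3c and 0x1f figures. sd_system_lines is a hypothetical helper. Use it only with values your vendor actually specifies.

```sh
#!/bin/bash
# Sketch: print /etc/system lines for the two sd tunables.
# Arguments: I/O timeout in seconds, queue depth.
sd_system_lines() {
    io_time_secs=$1
    throttle=$2
    printf 'set sd:sd_io_time=0x%x\n' "$io_time_secs"
    printf 'set sd:sd_max_throttle=%d\n' "$throttle"
}

# The commonly seen vendor values mentioned above:
sd_system_lines 31 25
```

Printing the lines rather than editing /etc/system directly leaves room to review them first, since a bad /etc/system entry only shows up at the next reboot.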

The general recommendation for both these variables, however, is that if you are not required to set them by your storage vendor, then leave them unset in your /etc/system file and let Solaris handle the settings for you. This is because these variables cannot be set on a per-LUN or per-instance basis: any change made to these two variables affects the entire SCSI subsystem. If you set them to values greater than or much less than your storage subsystem can handle, you run a very real risk of having a badly performing system for disk, tape, and memory operations. Another important aspect to tuning the sd driver is that Sun’s disk storage attaches using either the sd or, if attached using the Fibre Channel (FC) protocol, the ssd drivers. This allows for separate tuning of those stacks, but could well change in a later version of Solaris.

maxphys

The maxphys setting, often seen in conjunction with JNI and Emulex HBAs, is the upper limit on the largest chunk of data that can be sent down the SCSI path for any single request. There are no real issues with increasing the value of this variable to 8 MB (in /etc/system, set maxphys=8388608), as long as your I/O subsystem can handle it. All current Fibre Channel adapters are capable of supporting this, as are most ultra/wide SCSI HBAs, such as those from Sun, Adaptec, QLogic, and Tekram. It is possible (although I have not yet tested it) to set this variable in an (E)IDE-based system, such as a PC running Solaris for Intel, a Sun Ultra 5/10/Blade 100, or the lower end Netra systems. With the current range of (E)IDE disks and at least an ATA-66 interface, the system should be able to support this value for maxphys.

Networking Tunables

At a former employer, I worked closely with the DBAs on a particular system running a financial management application. After monitoring the system for several weeks, the first step in tuning was to get a 100-Mbit switched interface activated. When this 100FDX interface was connected, we noticed an immediate improvement in system performance: less swap in use due to less buffering of data, fewer “wasted” cpu cycles, and a much happier group of users. The change was so significant that we put off implementing the rest of the tuning while we analyzed whether our plan was still relevant. We did not need to do any specific configuration on our server because the switch handled it.

If you need to force your interface, there are several simple ways to do this. To begin, you must be absolutely certain that your Ethernet interface is cabled to plug into a 100-Mbit Ethernet switch, otherwise you will not get any response from your network connection. Let’s look at the two most common methods: an rc script and editing /etc/system.

The boot-time rc script allows you to specify which instance of the interface you want to change the settings for. In the example below, I am setting the properties for my qfe3 interface:

#!/bin/sh
#
# script to force the interface properties for qfe3
# script name is /etc/rc2.d/S50ndd_qfe3
#
ndd -set /dev/qfe instance 3
# force OFF 100Mb half duplex
ndd -set /dev/qfe adv_100hdx_cap 0
# force OFF 100Mb T4
ndd -set /dev/qfe adv_100T4_cap 0
# force ON 100Mb full duplex
ndd -set /dev/qfe adv_100fdx_cap 1
# force OFF autonegotiation (FORCE mode)
ndd -set /dev/qfe adv_autoneg_cap 0
# end of script

Here’s the /etc/system modification method:

set qfe:adv_100hdx_cap=0
set qfe:adv_100T4_cap=0
set qfe:adv_100fdx_cap=1
set qfe:adv_autoneg_cap=0
set qfe:adv_10hdx_cap=0

This method sets all of your qfe interfaces to operate at 100FDX, and if that’s what your switch is configured to do, then you are ready to reboot and enjoy the benefits.

Of the two methods, I recommend using a boot-time rc script. It’s easier to maintain if you write it correctly (because it uses the Bourne shell), and you don’t have to worry about testing by way of a reboot because a quick unplumb/plumb followed by ndd(1M) allows you to make changes while your system is up and running. The /etc/system method is also useful, but given that you cannot really tune each instance separately using this method, it may not be appropriate for every site.
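After forcing an interface it is worth confirming what the link actually came up as. The sketch below assumes the qfe driver exposes read-only link_speed and link_mode parameters through ndd, with 1 meaning 100 Mbit and full duplex respectively and 0 meaning 10 Mbit and half duplex. Treat those parameter names and encodings as assumptions to verify against your driver's documentation. describe_link and check_qfe are hypothetical helper names.

```sh
#!/bin/bash
# Sketch: translate assumed ndd link_speed/link_mode values into words.
describe_link() {
    speed=$1
    mode=$2
    case $speed in
        1) s="100 Mbit" ;;
        0) s="10 Mbit" ;;
        *) s="unknown speed" ;;
    esac
    case $mode in
        1) m="full duplex" ;;
        0) m="half duplex" ;;
        *) m="unknown duplex" ;;
    esac
    echo "$s, $m"
}

# Query one qfe instance (only works on a host with qfe and ndd):
check_qfe() {
    ndd -set /dev/qfe instance "$1"
    describe_link "$(ndd -get /dev/qfe link_speed)" \
                  "$(ndd -get /dev/qfe link_mode)"
}
# Example: check_qfe 3
```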

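Working with Time Zones on Linux and Solaris

Posted by 徐永久 on 2007-03-14 14:39.

Linux keeps its time zone database under /usr/share/zoneinfo.
The system reads /etc/localtime at boot, so pointing that file at the desired zone file with ln -s is an easy way to set the time zone.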

# ln -fs /usr/share/zoneinfo/PST8PDT /etc/localtime

# zdump -v /etc/localtime | grep 2007
/etc/localtime Sun Mar 11 09:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 PST isdst=0 gmtoff=-28800
/etc/localtime Sun Mar 11 10:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 PDT isdst=1 gmtoff=-25200
/etc/localtime Sun Nov 4 08:59:59 2007 UTC = Sun Nov 4 01:59:59 2007 PDT isdst=1 gmtoff=-25200
/etc/localtime Sun Nov 4 09:00:00 2007 UTC = Sun Nov 4 01:00:00 2007 PST isdst=0 gmtoff=-28800

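After setting the time zone on Linux, you also need to use hwclock to synchronize the BIOS time with the OS time; the change takes effect after a reboot.

On Solaris the database is under /usr/share/lib/zoneinfo, and the time zone file read at boot is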

/etc/TIMEZONE -> /etc/default/init

bash-3.00# zdump -v US/Pacific|grep 2007
US/Pacific Wed Mar 14 06:31:44 2007 UTC = Tue Mar 13 23:31:44 2007 PDT isdst=1
US/Pacific Sun Mar 11 09:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 PST isdst=0
US/Pacific Sun Mar 11 10:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 PDT isdst=1
US/Pacific Sun Nov 4 08:59:59 2007 UTC = Sun Nov 4 01:59:59 2007 PDT isdst=1
US/Pacific Sun Nov 4 09:00:00 2007 UTC = Sun Nov 4 01:00:00 2007 PST isdst=0
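On Solaris the zone itself is normally chosen by a TZ= line in /etc/default/init. That detail is not spelled out in the post, so take it as an assumption about the standard layout. A minimal sketch for changing it follows. set_tz is a hypothetical helper that takes the file path as an argument so it can be tried on a copy first.

```sh
#!/bin/bash
# Sketch: rewrite the TZ= line of an init defaults file.
# "|" is used as the sed delimiter because zone names contain "/".
# Note that mv replaces the file, so ownership and mode should be checked.
set_tz() {
    file=$1
    zone=$2
    sed "s|^TZ=.*|TZ=$zone|" "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example on a scratch copy (paths are placeholders):
#   cp /etc/default/init /tmp/init.copy
#   set_tz /tmp/init.copy US/Pacific
```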

The time zone patch for Solaris 8 is 109809-06
The time zone patch for Solaris 10/x86 is 122033

For the Linux time zone database, search for tzdata

Label many disks in Solaris

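Posted by 徐永久 on 2006-11-30 05:27.

After a SAN array is attached to a Solaris host, and before volumes are created with vxfs, the OS has to recognize the LUNs, which means each disk must first be labeled with the format command. This script labels a run of consecutively numbered disks without going through format's menu-driven disk selection.

Run it as a bash script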

labeldisk() {
    # Usage: labeldisk c2t42d 32 49
    # Labels disks c2t42d32 through c2t42d49 without the format menu.
    echo label > /tmp/label.cmd
    CT=$1
    i=$2
    END=$3

    echo "Labeling disks from $CT$i to $CT$END"
    until [ "$i" -gt "$END" ]
    do
        format -s -d "$CT$i" -f /tmp/label.cmd
        i=$((i + 1))
    done

    rm /tmp/label.cmd
    echo "Running devfsadm -C ..."
    devfsadm -C
    echo "Running vxdctl enable ..."
    vxdctl enable
    echo "Label Disk Done."
    return
}

Very Useful Unix/Linux One-Liners (Part 1)

Posted by 徐永久 on 2006-06-15 20:26.

Delete core files
# find ~ -name core -exec file {} \; -exec rm -i {} \;
See which processes are using a file
# fuser -u /usr/my_application/foo
Search for a string
# grep "hello world" `find ./ -name "*" -print -exec file {} \; | grep text | cut -d ':' -f 1`
List directories
# alias dir='ls -Lla | grep ^d'

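Starting ntpd on Solaris 10

Posted by 徐永久 on 2006-03-21 14:38.

Having the correct time on a site's servers is sometimes very important. This post describes how to start the xntpd process on Solaris 10.

The ntp configuration file is located at: /etc/inet/ntp.conf

Check the services or resources that xntp depends on:

# svcs -l svc:/network/ntp:default

If all the resources are present, run the following commands: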
# svcadm enable svc:/network/ntp
# svcadm refresh svc:/network/ntp
# svcadm restart svc:/network/ntp

xntp has now been restarted.

# svcs | grep ntp
online 14:31:24 svc:/network/ntp:default

# ps -ef|grep ntp
/usr/lib/inet/xntpd
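The post does not show the contents of /etc/inet/ntp.conf, so the following is only a sketch of a minimal client configuration written from the shell. The server name ntp.example.com is a placeholder, the driftfile path is an assumption based on the usual Solaris location, and write_ntp_conf is a hypothetical helper.

```sh
#!/bin/bash
# Sketch: write a minimal client ntp.conf to the given path.
# The driftfile location is an assumption; adjust it to your system.
write_ntp_conf() {
    conf=$1
    server=$2
    cat > "$conf" <<EOF
server $server
driftfile /var/ntp/ntp.drift
EOF
}

# Example (placeholder server name, scratch path):
#   write_ntp_conf /tmp/ntp.conf.test ntp.example.com
```

Writing to a scratch path first and copying the file into place afterwards avoids leaving a half-written configuration for the service to read.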

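A Roundup of Solaris Built-in Diagnostic Tools

Posted by 徐永久 on 2006-02-18 20:30.

Solaris 10 is out, and OpenSolaris looks set to push Linux back. This post collects some of the tools I need in my work; it may not be complete or entirely accurate.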


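Date Format Notation in Several Unix Programs

Posted by 徐永久 on 2005-02-25 14:29.

1. The find command
%Ck The file's status change time, in the format given by the specifier k.
%Tk The file's last modification time, in the format given by the specifier k.
%Ak The file's last access time, in the format given by the specifier k.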
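Assuming these are the directives of GNU find's -printf action (the post does not name the option), a minimal sketch of %Tk in use looks like this. Here k takes the values Y, m and d to build an ISO-style date. list_mtime is a hypothetical helper name, and %Ck or %Ak can be substituted in the same way.

```sh
#!/bin/bash
# Sketch: list regular files under a directory with their modification
# date, using GNU find's -printf (not available in every find).
list_mtime() {
    find "$1" -type f -printf '%TY-%Tm-%Td %f\n'
}

# Example: list_mtime /var/log
```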


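Is Sun Certification Right for You?

Posted by 徐永久 on 2002-01-28 14:55.

Since the early 1990s, Sun's Solaris has been the most popular Unix operating system on the market, and Sun has offered SCSA certification since Solaris 2.4. Solaris has gone through versions 2.4, 2.5, 2.6, 7, and now 8. SCSA is growing in popularity, and it is the foundation for SCNA.

For the Solaris 8 track, candidates must pass two exams to earn the SCSA certificate: 310-011 and 310-012, System Administration I and II respectively. You can take I and II in whichever order you like, but passing only one of them does not make you an SCSA.

The exams use multiple-choice, fill-in-the-blank, and drag-and-drop questions and can be taken at any Sylvan Prometric or VUE test center. The fee is 150 US dollars, or 1,250 RMB. Each exam lasts 90 minutes. System Administration I has 57 questions with a passing score of 66%; System Administration II has 61 questions with a passing score of 70%.

Once you have passed both exams, you can move on to SCNA certification. That exam is number 310-043; unlike the system administration exams, it has 58 questions, a passing score of 67%, and a time limit of 120 minutes.

What is the certification good for? Within China, probably not much. From a learning point of view, though, Solaris is a Unix system and has a great deal in common with other Unix systems, so you can get up to speed quickly when learning others such as AIX or other System V variants.

Amusingly, my own route into Unix was SCO Unix first, then Linux, and only then Solaris; once I had actually worked with Solaris, its mystique vanished completely.

System Administration I mainly covers: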

Solaris features
User administration
System security
Directories and files
Device configuration
Disk administration
The Solaris UFS filesystem
Filesystem administration
Process scheduling
Print administration
The boot PROM
System initialization
Software installation
Software patch administration
Backup and recovery

System Administration II covers:

Solaris Networking (TCP/IP, OSI Layers)
Syslog
Virtual disk management
Swap space
NFS
CacheFS
Automount
Name service
NIS
Solstice AdminSuite
JumpStart

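The most useful learning resource is of course Sun's own training, but it is very expensive (around 10,000 RMB per course, or 8,000 to 9,000), so if you already have some Unix background there is no need to spend that money. (I hope the lady at the Sun training center won't scold me!)

There is plenty of braindump material on the net, and Yahoo hosts a mailing list, solarisdigest@yahoogroups.com, whose content is quite good. The least effort, of course, is to study with man, which should be the most authoritative source.

There are few books about Solaris on the Chinese market, and where they exist I think the quality of the translation is predictable. If you are going to take the exam there is no need to read translated material, since the exam itself is in English.

In my view the SCSA I and II exams are quite easy. As with preparing for any technical certification, with first-hand working experience plus the braindump material, getting this certificate is not difficult, because the exam content does not go beyond the training materials, and the training materials can be passed around.

That is why I sell the training materials at such a high price: I can guarantee that you will pass.

The precondition is that you have more than a year of Unix/Linux experience! That is the point of this article.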

Why Did I Get Solaris Certification?

Posted by 徐永久 on 2001-11-25 13:41.

I first decided to pursue the MCSE in November 1998 and was certified in May 1999. Through it I learned a lot about TCP/IP in the Windows environment, including DHCP, WINS, DNS, and NetBEUI. This time I took on Solaris 8, and I now know NFS, NIS, CacheFS, AutoFS, pseudo filesystems, and JumpStart.

I am glad that my knowledge is accumulating day by day, and I am sure I will benefit a great deal from the ongoing Oracle training course. All of this knowledge helps me understand each of these systems more clearly and see Linux's disadvantages better. That does not mean I will give up Linux. On the contrary, I have always believed that Linux can move into the enterprise gradually and eventually dominate the enterprise server market (from low end to high end).