If poll times out it will return 0 (no data to read on the socket). In
165e5abdbf we changed tcp_read_wait from
returning -1 to returning -errno, which means we return 0 instead of -1
in this case.
This sends tcp_read() into an infinite loop: it repeatedly tries to
read from the socket and gets EAGAIN.
Fix by explicitly checking for a 0 return from poll(2) and returning
EAGAIN in that case.
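A minimal sketch of the shape of the fix (not the actual tcp_read_wait()
code; the helper and its arguments are stand-ins):

    #include <cerrno>
    #include <poll.h>

    // Sketch only: wait for readable data, mapping a poll(2) timeout to
    // -EAGAIN so a bare 0 can never be mistaken for "ready to read".
    static int wait_readable(int fd, int timeout_ms)
    {
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLIN;
      pfd.revents = 0;

      int r = poll(&pfd, 1, timeout_ms);
      if (r < 0)
        return -errno;   // real error: keep the -errno convention
      if (r == 0)
        return -EAGAIN;  // timed out: no data to read on the socket
      return 0;          // readable
    }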
Fixes: http://tracker.ceph.com/issues/18184
Signed-off-by: Sage Weil <sage@redhat.com>
The submit_log_entries machinery depends on the functor's destructor
cleaning up after itself to handle cancelation. I could have
introduced a local intrusive_ptr and captured that instead, but this is
slightly less magic.
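The pattern, in generic form (these names are made up, not real Ceph
types):

    // Cleanup lives in the functor's destructor, so it also runs on the
    // cancelation path, i.e. when the functor is destroyed without ever
    // having been invoked.
    struct OnLogApplied {
      bool invoked = false;

      void operator()() {
        invoked = true;
        // normal completion work goes here
      }

      ~OnLogApplied() {
        if (!invoked) {
          // cancelation: undo bookkeeping, wake waiters, drop refs
        }
      }
    };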
Fixes: http://tracker.ceph.com/issues/18180
Signed-off-by: Samuel Just <sjust@redhat.com>
The test still fails even after being enabled:
2016-12-07T18:00:44.337 INFO:teuthology.orchestra.run.mira105:Running: 'mpiexec -f /home/ubuntu/cephtest/mpi-hosts -wdir /home/ubuntu/cephtest/gmnt sudo /home/ubuntu/cephtest/fsx-mpi -o 1MB -N 50000 -p 10000 -l 1048576 /home/ubuntu/cephtest/gmnt/test'
2016-12-07T18:00:44.486 INFO:teuthology.orchestra.run.mira105.stderr:Warning: Permanently added '172.21.8.122' (ECDSA) to the list of known hosts.
2016-12-07T18:00:44.571 INFO:teuthology.orchestra.run.mira105.stdout:skipping zero size read
2016-12-07T18:00:44.591 INFO:teuthology.orchestra.run.mira105.stdout:truncating to largest ever: 0x7cccb
2016-12-07T18:00:44.606 INFO:teuthology.orchestra.run.mira083:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-12-07T18:00:44.611 INFO:teuthology.orchestra.run.mira100:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-12-07T18:00:44.614 INFO:teuthology.orchestra.run.mira105:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2016-12-07T18:00:44.887 INFO:teuthology.orchestra.run.mira105.stdout:skipping zero size read
2016-12-07T18:00:44.954 INFO:teuthology.orchestra.run.mira105.stdout:Size error: expected 0xa6f7c stat 0xd4000 seek 0xd5000
2016-12-07T18:00:44.954 INFO:teuthology.orchestra.run.mira105.stdout:LOG DUMP (2 total operations):
2016-12-07T18:00:44.954 INFO:teuthology.orchestra.run.mira105.stdout:1(1 mod 256): SKIPPED (no operation)
2016-12-07T18:00:44.954 INFO:teuthology.orchestra.run.mira105.stdout:2(2 mod 256): WRITE 0x1c748 thru 0xa6f7b (0x8a834 bytes) HOLE
2016-12-07T18:00:44.990 INFO:teuthology.orchestra.run.mira105.stdout:Correct content saved for comparison
2016-12-07T18:00:44.990 INFO:teuthology.orchestra.run.mira105.stdout:(maybe hexdump "/home/ubuntu/cephtest/gmnt/test" vs "/home/ubuntu/cephtest/gmnt/test.fsxgood")
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:===================================================================================
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:= EXIT CODE: 120
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:= CLEANING UP REMAINING PROCESSES
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stdout:===================================================================================
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stderr:[proxy:0:0@mira105] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
2016-12-07T18:00:45.000 INFO:teuthology.orchestra.run.mira105.stderr:[proxy:0:0@mira105] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
2016-12-07T18:00:45.001 INFO:teuthology.orchestra.run.mira105.stderr:[proxy:0:0@mira105] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
2016-12-07T18:00:45.002 INFO:teuthology.orchestra.run.mira105.stderr:[mpiexec@mira105] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
2016-12-07T18:00:45.002 INFO:teuthology.orchestra.run.mira105.stderr:[mpiexec@mira105] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
2016-12-07T18:00:45.002 INFO:teuthology.orchestra.run.mira105.stderr:[mpiexec@mira105] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
2016-12-07T18:00:45.002 INFO:teuthology.orchestra.run.mira105.stderr:[mpiexec@mira105] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
I am not sure what the cause is. I'm leaving the test disabled for now and merging this PR.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When we hold exclusive auth caps, the client is responsible for
handling changes to the mode. Make sure we remove any setuid/setgid
bits on an ownership change.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
If we get an ownership change, POSIX mandates that you clear the
setuid and setgid bits unless you are "appropriately privileged", in
which case the OS is allowed to leave them intact.
Linux, however, always clears those bits regardless of process
privileges, as that makes it simpler to close some potential races.
Have ceph do the same.
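A minimal sketch of that behaviour (the helper name is made up, not the
actual client code):

    #include <sys/stat.h>

    // After a chown, the cached mode loses its setuid/setgid bits no
    // matter what privileges the caller had, matching the Linux
    // behaviour described above.
    static mode_t mode_after_chown(mode_t mode)
    {
      return mode & ~(S_ISUID | S_ISGID);
    }

e.g. mode_after_chown(04755) returns 0755.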
Signed-off-by: Jeff Layton <jlayton@redhat.com>
The test case is not stable due to racing console output. This
results in spurious failures.
Fixes: http://tracker.ceph.com/issues/10773
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Make this easy: write a single yaml that does the hammer install,
some limited work, then upgrades to jewel. Copy it from the
parallel suite. Then symlink all of the rest from the jewel-x
stress-split suite.
Signed-off-by: Sage Weil <sage@redhat.com>
Previously this relied on the client being able to unmount
while the MDS was offline, which is not necessarily
possible. Use kill instead.
Signed-off-by: John Spray <john.spray@redhat.com>
As per Sam Just's advice, remove the EXPECT_DEATH tests to avoid
intermittent hangs, because they do not play well with threads.
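For context, an EXPECT_DEATH assertion has this shape (the function
under test below is hypothetical). gtest death tests fork() the process
to observe the abort, and forking a process that already has running
threads can hang intermittently.

    #include <cassert>
    #include <gtest/gtest.h>

    // Hypothetical function under test; EXPECT_DEATH expects the
    // statement to terminate the process (assumes NDEBUG is not set).
    static void must_not_be_null(const int *p)
    {
      assert(p != nullptr);
      (void)p;
    }

    TEST(ExampleDeathTest, AbortsOnNull)
    {
      // gtest runs the statement in a forked child to watch it die.
      EXPECT_DEATH(must_not_be_null(nullptr), "");
    }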
Fixes: http://tracker.ceph.com/issues/18030
Signed-off-by: Loic Dachary <loic@dachary.org>
* find_package(keyutils REQUIRED) if (WITH_LIBCEPHFS OR WITH_RBD)
Prior to this change, we detected keyutils if the building platform was
not FreeBSD. We should instead check the WITH_* options, let the
maintainer decide what is best for his/her platform, and error out if
the build host cannot fulfill the requirements.
* build krbd.cc if (WITH_RBD)
Signed-off-by: Kefu Chai <kchai@redhat.com>
"start" is used to calculate the global bluestore commit latency
and hence shall not be updated at each internal state enter/exit.
Otherwise the l_bluestore_commit_lat counter won't reflect the
real commit latency precisely.
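Roughly the idea (a hypothetical stand-in, not the actual BlueStore
code):

    #include <chrono>

    // The commit latency is measured against one timestamp taken when
    // the transaction enters the pipeline; it is never reset on internal
    // state transitions.
    struct TxnTiming {
      std::chrono::steady_clock::time_point start =
          std::chrono::steady_clock::now();   // set once

      double commit_latency_secs() const {
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
      }
    };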
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
The libcephfs tests are negatively affected by other mounts. This commit
adds a kclient disable in addition to the ceph-fuse one.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This commit synchronizes the multimds suite with the fs suite. The
basic/verify sub-suites now do the same tests except with different
cluster layouts (i.e. multiple actives). This is mostly accomplished by
symlinking parts of each sub-suite to its counterpart in the fs suite.
This commit also makes a few notable changes to the prior multimds suite:
o Turn on directory fragmentation.
o Add several tests from fs/basic/tasks to multimds/basic.
o Remove libcephfs, as fs/basic/tasks already contains
multimds/basic/tasks.
Prior implementation and discussion are in PR#1114: https://github.com/ceph/ceph-qa-suite/pull/1114
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
queue_op() checks the op's epoch against the current osdmap epoch and
then either pushes the op onto waiting_for_map or marks its queued
flag.
But when an op is popped from waiting_for_map, take_op_map_waiters
forgets to mark the queued flag before handling it.
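Roughly the intended shape of the fix (stand-in types; only queue_op,
waiting_for_map and take_op_map_waiters come from the code itself):

    #include <deque>
    #include <memory>
    #include <mutex>

    struct Op { bool queued = false; };
    using OpRef = std::shared_ptr<Op>;

    struct OpQueue {
      std::mutex lock;
      std::deque<OpRef> waiting_for_map;

      void handle_op(const OpRef &) { /* dispatch */ }

      // Ops drained from waiting_for_map get the same queued mark that
      // queue_op() applies on the fast path.
      void take_op_map_waiters() {
        std::lock_guard<std::mutex> l(lock);
        while (!waiting_for_map.empty()) {
          OpRef op = waiting_for_map.front();
          waiting_for_map.pop_front();
          op->queued = true;   // previously missed on this path
          handle_op(op);
        }
      }
    };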
Signed-off-by: Yunchuan Wen <yunchuan.wen@kylin-cloud.com>
Previously this assumed it was running with exactly two MDS
daemons. When there were more, it would fail to execute
"fs reset" because the extra daemons were active in
the map.
Signed-off-by: John Spray <john.spray@redhat.com>