Sage Weil
2e1edef3ff
os/bluestore/BlueFS: fix replay of unlink
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
3745afb4c8
os/bluestore: support second block.wal device
...
Use this device for the bluefs log.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
02605a6612
os/bluestore/BlueStore: fix zero gap bug
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
9f114ac24b
os/bluestore: disable overlay for now
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
b48798787d
os/bluestore/BlockDevice: restructure interface
...
Use atomics; do not track in-flight extents or magically cope
with racing I/Os (that is the user's responsibility).
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:57 -05:00
Sage Weil
1727cebdae
os/bluestore/BlueFS: fix overwrite
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
13655fbb4a
os/bluestore/BlueFS: fix writes spanning extents
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
ccce793f60
os/bluestore: reenable rocksdb recycling
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
ef06380b9a
os/bluestore/BlockDevice: lock device while open
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
e3fd2795d0
os/bluestore/BlockDevice: debug read result
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
f6f4ed3dfc
os/bluestore/BlockDevice: fix alignment check
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
db754e7df3
os/bluestore/BlockDevice: check aio return values
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:56 -05:00
Sage Weil
e7cce09c4d
os/bluestore/BlueFS: avoid lock during reads
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
05be4c6c11
os/bluestore/BlueFS: prevent read+write sharing
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
9785bc9866
vstart.sh: debug bluefs and rocksdb
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
73adec4c98
os/bluestore/BlueFS: periodically compact log
...
Rewrite only the current metadata in a fresh log
periodically to free log space.
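
A minimal sketch of the compaction idea described above, not the BlueFS code itself. The log is modeled as a list of hypothetical ("update" | "unlink", name, value) records; these record names and the in-memory dict are assumptions made for illustration only.

```python
def replay(log):
    """Apply every record in order and return the current metadata."""
    files = {}
    for op, name, value in log:
        if op == "update":
            files[name] = value
        elif op == "unlink":
            files.pop(name, None)
    return files


def compact(log):
    """Build a fresh log that holds only the current metadata."""
    current = replay(log)
    return [("update", name, value) for name, value in sorted(current.items())]


log = [
    ("update", "a", 1),
    ("update", "b", 2),
    ("update", "a", 3),   # supersedes the first record for "a"
    ("unlink", "b", None),
]
fresh = compact(log)
```

Replaying the fresh log yields the same state as replaying the original, while the superseded records no longer take up log space.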
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
dd901498c9
os/bluestore/BlueFS: simplify extent list
...
Merge contiguous extents.
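
A small sketch of merging contiguous extents, assuming an extent is a plain (offset, length) pair. The real BlueFS extent type may carry more fields; the function name here is hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length); an assumed representation


def append_extent(extents: List[Extent], offset: int, length: int) -> None:
    """Append an extent, growing the last one if the new one starts where it ends."""
    if extents:
        last_off, last_len = extents[-1]
        if last_off + last_len == offset:
            extents[-1] = (last_off, last_len + length)
            return
    extents.append((offset, length))


extents: List[Extent] = []
for off, ln in [(0, 4096), (4096, 4096), (16384, 4096)]:
    append_extent(extents, off, ln)
print(extents)  # [(0, 8192), (16384, 4096)]
```

Only the tail of the list is checked, so the logical order of the file's extents is preserved.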
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
b073028528
os/bluestore/BlueFS: fix read
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
ac05b4c1c5
ceph_test_objectstore: trivial init fix
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:55 -05:00
Sage Weil
9341eec54d
kv/RocksDBStore: rocksdb_separate_wal_dir option
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
3649a80a89
os/bluestore/BlueFS: ref count BlueFS::File *
...
There are FileWriters that exist when the file is
deleted.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
98485dee05
os/bluestore/BlueFS: readdir list dirs, too
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
b8630ee48c
ceph-bluefs-tool: simple tool to export bluefs content
...
Currently we just do a dump. We'll add more
functionality later.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
2d0537853a
os/bluestore/BlueFS: many fixes
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
e4f6148c9f
os/bluestore/BlueStore: share space with BlueFS
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
653882c446
os/bluestore/BlockDevice: move to simple mutex model
...
Just for now, while we get the rest of this working.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
dd04391706
os/bluestore/BlueFS: simple file system to back rocksdb
...
BlueFS is a simple file system that will back rocksdb.
BlueRocksEnv is the rocksdb::Env implementation that
glues them together.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
6f5ac50171
ceph_test_objectstore: less verbose
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
226b3476a3
ceph_test_objectstore: less verbose on hash collision test
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
1b8d5b6068
os/bluestore/BlueStore: fix _do_read
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
1ffd5e6963
os/bluestore/StupidAllocator: fix locking
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
14460484ff
os/bluestore/StupidAllocator: fix misc bugs
...
Can't use invalid iterator; fix init_rm_free.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
08a94d95e1
os/bluestore/Allocator: init_rm_free
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
65f720ae9d
kv/RocksDBStore: take custom Env
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
a869f92fac
os/bluestore: fix _do_read return value
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
d704628cab
os/bluestore/BlockDevice: fix read return value
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
9d01b8df9a
os/bluestore: separate Allocator from freelist storage
...
FreelistManager persists our freelist. Allocator is the policy that
allocates from it.
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
a62ffb0d03
newstore -> bluestore
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
3a4d583f85
os/newstore: always create db.wal
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:32 -05:00
Sage Weil
ad9f9fad01
os/newstore: create db dir
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:17 -05:00
Sage Weil
5658665ce7
os/newstore: consume a raw block device
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:17 -05:00
Sage Weil
32e768391f
os/newstore: make collection_list tolerate sloppy start position
...
Because of this change (#6076), the hobject_t will contain the pool id, hence
a ghobject_t holding this hobject_t will not be equal to ghobject_t().
In newstore, this causes an assertion failure:
FAILED assert(k >= start_key && k < end_key)
The fix is to stay compatible with the previous change by creating a
ghobject_t object with pool id and shard id in newstore.
Fixes: #13801
Reported-by: Zhi Zhang <zhangz.david@outlook.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
2dae3df8af
os/newstore: make key names more efficient
...
- pack u32 and u64 in binary (instead of in hex)
- avoid duplicating the object name while making things still
sort by (key,name). Use a prefix of < when key < name, = when
key == name, and > when key > name. In the = case (which is
basically always) include the name just once.
Note that this breaks on-disk compatibility.
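
To illustrate the first point, here is a sketch comparing hex and binary packing of integers in a key. Big-endian byte order is an assumption (the message does not state it); it is chosen because fixed-width big-endian bytes compare in the same order as the numbers they encode. The function names are hypothetical.

```python
import struct


def encode_u32_hex(v: int) -> bytes:
    """Hex form: 8 ASCII characters."""
    return ("%08x" % v).encode("ascii")


def encode_u32_bin(v: int) -> bytes:
    """Binary form: 4 bytes, big-endian (assumed)."""
    return struct.pack(">I", v)


def encode_u64_bin(v: int) -> bytes:
    """Binary form: 8 bytes, big-endian (assumed)."""
    return struct.pack(">Q", v)


values = [1, 255, 256, 70000, 2**32 - 1]
bin_keys = [encode_u32_bin(v) for v in values]
hex_keys = [encode_u32_hex(v) for v in values]

# Bytewise sort order matches numeric order for both encodings,
# but the binary form is half the size.
order_preserved = sorted(bin_keys) == bin_keys
```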
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
5e566dd7cb
os/newstore: fix collection_list vs max entries
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
84646ab1c2
os/newstore: do not set/change frag_size if there are overlays
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
9291e1682a
os/newstore: define a fid_backpointer_t type
...
Signed-off-by: Sage Weil <sage@redhat.com>
fix wal_oP_t
2016-01-01 13:05:17 -05:00
Sage Weil
b2db842e4d
os/newstore: add newstore types to ceph-dencoder
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
0af0dbdc14
os/newstore: set alloc hint on new frags
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
f0f815fb9e
os/newstore: dump onode contents
...
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
299350461b
os/newstore: fixed fragment size
...
Instead of a single, variable-length fragment for each object,
set a fixed size (newstore_min_frag_size = 1 MB) and stripe the
object over these. The last fragment will be smaller
than 1 MB if the object is not a multiple of 1 MB.
On write, this is basically free: we can write and fsync 4 inodes
created together just as cheaply as one. On
overwrite, it allows us to replace individual fragments and avoid
write-ahead in many cases.
On read it is a bit slower because of inode lookups and disk
seeks. In the common case (big object written sequentially) we
hope that fs prefetching will hide most of it (e.g., all inodes
will be loaded together in the same metadata btree node, and the
files' data is written sequentially on disk).
Allowing for a single large fragment in the case of a sequentially
written large object may save us something, but it complicates
the code significantly.
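
The striping arithmetic above can be sketched as follows. This is an illustration only, assuming a 1 MB fragment size as stated in the message; the function names are hypothetical and not taken from the newstore code.

```python
FRAG_SIZE = 1 << 20  # 1 MB, the newstore_min_frag_size value quoted above


def fragment_lengths(object_size, frag_size=FRAG_SIZE):
    """Return the length of each fragment for an object of the given size."""
    full, tail = divmod(object_size, frag_size)
    return [frag_size] * full + ([tail] if tail else [])


def map_range(offset, length, frag_size=FRAG_SIZE):
    """Split an object byte range into (fragment index, offset in fragment, length)."""
    pieces = []
    end = offset + length
    while offset < end:
        idx, frag_off = divmod(offset, frag_size)
        n = min(frag_size - frag_off, end - offset)
        pieces.append((idx, frag_off, n))
        offset += n
    return pieces


# A 2.5 MB object: two full fragments and a smaller last one.
sizes = fragment_lengths(5 * FRAG_SIZE // 2)

# A 300-byte write starting 100 bytes before the first fragment boundary
# touches fragments 0 and 1.
pieces = map_range(FRAG_SIZE - 100, 300)
```

An overwrite that covers a whole fragment maps to a single piece with offset 0 and length FRAG_SIZE, which is the case where the fragment can be replaced outright.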
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00