mikros/rust - rust - Gitea.pterpstra.com

Author	SHA1	Message	Date
bors	f7c4359a2c	auto merge of #8237 : blake2-ppc/rust/faster-utf8, r=brson Use unchecked vec indexing since the vector bounds are checked by the loop. Iterators are not easy to use in this case since we skip 1-4 bytes each lap. This part of the commit speeds up is_utf8 for ASCII input. Check codepoint ranges by checking the byte ranges manually instead of computing a full decoding for multibyte encodings. This is easy to read and corresponds to the UTF-8 syntax in the RFC. No changes to what we accept. A comment notes that surrogate halves are accepted. Before: test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3) test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5) After: test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1) test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3) An improvement upon the previous pull #8133	2013-08-04 07:10:56 -07:00
Daniel Micay	1008945528	remove obsolete `foreach` keyword this has been replaced by `for`	2013-08-03 22:48:02 -04:00
blake2-ppc	0504d7e57b	std: Speed up str::is_utf8 Use unchecked vec indexing since the vector bounds are checked by the loop. Iterators are not easy to use in this case since we skip 1-4 bytes each lap. This part of the commit speeds up is_utf8 for ASCII input. Check codepoint ranges by checking the byte ranges manually instead of computing a full decoding for multibyte encodings. This is easy to read and corresponds to the UTF-8 syntax in the RFC. No changes to what we accept. A comment notes that surrogate halves are accepted. Before: test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3) test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5) After: test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1) test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3)	2013-08-02 23:20:57 +02:00
Kevin Ballard	aa94dfa625	str: Add method .into_owned(self) -> ~str to Str The method .into_owned() is meant to be used as an optimization when you need to get a ~str from a Str, but don't want to unnecessarily copy it if it's already a ~str. This is meant to ease functions that look like fn foo<S: Str>(strs: &[S]) Previously they could work with the strings as slices using .as_slice(), but producing ~str required copying the string, even if the vector turned out to be a &[~str] already.	2013-08-01 15:54:58 -07:00
blake2-ppc	78cde5b9fb	std: Change `Times` trait to use `do` instead of `for` Change the former repetition:: for 5.times { } to:: do 5.times { } .times() cannot be broken with `break` or `return` anymore; for those cases, use a numerical range loop instead.	2013-08-01 16:54:22 +02:00
Daniel Micay	1fc4db2d08	migrate many `for` loops to `foreach`	2013-08-01 05:34:55 -04:00
Daniel Micay	dabd476203	make `in` and `foreach` get treated as keywords	2013-08-01 00:21:13 -04:00
blake2-ppc	8f9014c159	std: Mark the static constants in str.rs as private static variables are pub by default, which is not reflected in our code (we need to use priv).	2013-07-30 19:34:54 +02:00
blake2-ppc	aa89325cb0	std: Add from_bytes test for utf-8 using codepoints above 0xffff	2013-07-30 19:16:12 +02:00
blake2-ppc	b4ff95599a	std: Deny overlong encodings in UTF-8 An 'overlong encoding' is a codepoint encoded non-minimally using the utf-8 format. Denying these ensures that each codepoint has only one valid representation in utf-8. An example is byte sequence 0xE0 0x80 0x80 which could be interpreted as U+0, but it's an overlong encoding since the canonical form is just 0x00. Another example is 0xE0 0x80 0xAF which was previously accepted and is an overlong encoding of the solidus "/". Directory traversal characters like / and . form the most compelling argument for why this commit is security critical. Factor out common UTF-8 decoding expressions as macros. This commit will partly duplicate UTF-8 decoding, so it is now present in both fn is_utf8() and .char_range_at(); the latter using an assumption of a valid str.	2013-07-30 19:16:12 +02:00
blake2-ppc	6dd185930d	std: Disallow bytes 0xC0, 0xC1 (192, 193) in utf-8 Bytes 0xC0, 0xC1 can only be used to start 2-byte codepoint encodings, that are 'overlong encodings' of codepoints below 128. The reference given in a comment -- https://tools.ietf.org/html/rfc3629 -- does in fact already exclude these bytes, so no additional comment should be needed in the code.	2013-07-30 17:25:29 +02:00
bors	576f395ddf	auto merge of #8121 : thestinger/rust/offset, r=alexcrichton Closes #8118, #7136 ~~~rust extern mod extra; use std::vec; use std::ptr; fn bench_from_elem(b: &mut extra::test::BenchHarness) { do b.iter { let v: ~[u8] = vec::from_elem(1024, 0u8); } } fn bench_set_memory(b: &mut extra::test::BenchHarness) { do b.iter { let mut v: ~[u8] = vec::with_capacity(1024); unsafe { let vp = vec::raw::to_mut_ptr(v); ptr::set_memory(vp, 0, 1024); vec::raw::set_len(&mut v, 1024); } } } fn bench_vec_repeat(b: &mut extra::test::BenchHarness) { do b.iter { let v: ~[u8] = ~[0u8, ..1024]; } } ~~~ Before: test bench_from_elem ... bench: 415 ns/iter (+/- 17) test bench_set_memory ... bench: 85 ns/iter (+/- 4) test bench_vec_repeat ... bench: 83 ns/iter (+/- 3) After: test bench_from_elem ... bench: 84 ns/iter (+/- 2) test bench_set_memory ... bench: 84 ns/iter (+/- 5) test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)	2013-07-30 07:01:19 -07:00
Marvin Löbel	e33fca9ffe	Added str::char_offset_iter() and str::rev_char_offset_iter() Renamed bytes_iter to byte_iter to match other iterators Refactored str Iterators to use DoubleEnded Iterators and typedefs instead of wrapper structs Reordered the Iterator section Whitespace fixup Moved clunky `each_split_within` function to the one place in the tree where it's actually needed Replaced all block doccomments in str with line doccomments	2013-07-30 12:55:48 +02:00
Daniel Micay	ef870d37a5	implement pointer arithmetic with GEP Closes #8118, #7136 ~~~rust extern mod extra; use std::vec; use std::ptr; fn bench_from_elem(b: &mut extra::test::BenchHarness) { do b.iter { let v: ~[u8] = vec::from_elem(1024, 0u8); } } fn bench_set_memory(b: &mut extra::test::BenchHarness) { do b.iter { let mut v: ~[u8] = vec::with_capacity(1024); unsafe { let vp = vec::raw::to_mut_ptr(v); ptr::set_memory(vp, 0, 1024); vec::raw::set_len(&mut v, 1024); } } } fn bench_vec_repeat(b: &mut extra::test::BenchHarness) { do b.iter { let v: ~[u8] = ~[0u8, ..1024]; } } ~~~ Before: test bench_from_elem ... bench: 415 ns/iter (+/- 17) test bench_set_memory ... bench: 85 ns/iter (+/- 4) test bench_vec_repeat ... bench: 83 ns/iter (+/- 3) After: test bench_from_elem ... bench: 84 ns/iter (+/- 2) test bench_set_memory ... bench: 84 ns/iter (+/- 5) test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)	2013-07-30 02:50:31 -04:00
blake2-ppc	5307d3674e	std: Implement Extendable for hashmap, str and trie	2013-07-30 02:32:38 +02:00
blake2-ppc	4b45f47881	std: Rename Iterator adaptor types to drop the -Iterator suffix Drop the "Iterator" suffix for the structs in std::iterator. Filter, Zip, Chain etc. are shorter type names for when iterator pipelines need their types written out in full in return value types, so it's easier to read and write. The iterator module already forms enough namespace.	2013-07-29 04:20:56 +02:00
blake2-ppc	4849a42bf6	std: Implement FromIterator for ~str FromIterator initially only implemented for Iterator<char>, which is the type of the main iterator.	2013-07-29 02:40:28 +02:00
jmgrosen	a0f0f3012e	Refactored vec and str iterators to remove prefixes	2013-07-28 13:37:35 -07:00
bors	5157e05049	auto merge of #8036 : sfackler/rust/container-impls, r=msullivan A couple of implementations of Container::is_empty weren't exactly self.len() == 0 so I left them alone (e.g. Treemap).	2013-07-27 11:16:31 -07:00
Alex Crichton	5aaaca0c6a	Consolidate raw representations of rust values This moves the raw struct layout of closures, vectors, boxes, and strings into a new `unstable::raw` module. This is meant to be a centralized location to find information for the layout of these values. A safe method, `repr`, is provided to convert a rust value to its raw representation. Unsafe methods to convert back are not provided because they are rarely used and too numerous to write an implementation for each (not much of a common pattern).	2013-07-26 09:53:03 -07:00
Steven Fackler	feb18fe8da	Added default impls for container methods A couple of implementations of Container::is_empty weren't exactly self.len() == 0 so I left them alone (e.g. Treemap).	2013-07-25 15:17:30 -07:00
bors	330378d1a1	auto merge of #7996 : erickt/rust/cleanup-strs, r=erickt This is a cleanup pull request that does: * removes `os::as_c_charp` * moves `str::as_buf` and `str::as_c_str` into `StrSlice` * converts some functions from `StrSlice::as_buf` to `StrSlice::as_c_str` * renames `StrSlice::as_buf` to `StrSlice::as_imm_buf` (and adds `StrSlice::as_mut_buf` to match `vec.rs`) * renames `UniqueStr::as_bytes_with_null_consume` to `UniqueStr::to_bytes` * and other misc cleanups and minor optimizations	2013-07-24 13:25:36 -07:00
Birunthan Mohanathas	d047cf1ec6	Change 'print(fmt!(...))' to printf!/printfln! in src/lib*	2013-07-24 09:45:20 -04:00
Erick Tryzelaar	9c3679a9a2	std: make str::append move self This eliminates a copy and fixes a FIXME.	2013-07-23 16:57:00 -07:00
Erick Tryzelaar	bbedbc0450	std: inline str::with_capacity and vec::with_capacity	2013-07-23 16:57:00 -07:00
Erick Tryzelaar	cced3c9013	std: simplify str::as_imm_buf and vec::as_{imm,mut}_buf	2013-07-23 16:57:00 -07:00
Erick Tryzelaar	037a5b1af4	str: move as_mut_buf into OwnedStr, and make it `self`	2013-07-23 16:56:58 -07:00
Erick Tryzelaar	31b77aecfc	std: remove str::to_owned and str::raw::slice_bytes_owned	2013-07-23 16:56:23 -07:00
Erick Tryzelaar	cc9666f68f	std: rename str.as_buf to as_imm_buf, add str.as_mut_buf	2013-07-23 16:56:22 -07:00
Erick Tryzelaar	cf75330807	std: add test for str::as_c_str	2013-07-23 16:56:22 -07:00
Erick Tryzelaar	7af56bb921	std: move StrUtil::as_c_str into StrSlice	2013-07-23 16:56:22 -07:00
Erick Tryzelaar	9fdec67a67	std: move str::as_buf into StrSlice	2013-07-23 16:56:22 -07:00
Erick Tryzelaar	9ad815e063	std: rename str.as_bytes_with_null_consume to str.to_bytes_with_null	2013-07-23 16:56:17 -07:00
Graydon Hoare	978e5d94bc	std: wrap "long" utf8 lines.	2013-07-23 16:02:14 -07:00
Graydon Hoare	e5cbede103	std: add preliminary str benchmark.	2013-07-22 16:56:10 -07:00
Daniel Micay	ed67cdb73c	new snapshot	2013-07-22 01:09:48 -04:00
bors	fe3f75ff8e	auto merge of #7932 : blake2-ppc/rust/str-clear, r=huonw ~str and @str need separate implementations for use in generic functions, where it will not automatically use the impl on &str. fixes issue #7900	2013-07-21 15:28:38 -07:00
blake2-ppc	24b6901b26	std: Implement Clone for VecIterator and iterators using it The theory is simple: the immutable iterators simply hold state variables (indices or pointers) into frozen containers. We can freely clone these iterators, just like we can clone borrowed pointers. VecIterator needs a manual impl to handle the lifetime struct member.	2013-07-20 20:30:57 +02:00
blake2-ppc	3509f9d5ae	str: Implement Container for ~str, @str and Mutable for ~str ~str and @str need separate implementations for use in generic functions, where it will not automatically use the impl on &str.	2013-07-20 19:28:38 +02:00
Patrick Walton	99b33f7219	librustc: Remove all uses of "copy".	2013-07-17 14:57:51 -07:00
Daniel Micay	e118555ce6	remove headers from unique vectors	2013-07-15 23:57:27 -04:00
Gary Linscott	5aee5a11e3	Optimize is_utf8 Manually unroll the multibyte loops, and optimize for the single byte chars.	2013-07-11 14:23:15 -04:00
Gary Linscott	179637304a	char_range_at perf work Moves multibyte code to its own function to make char_range_at easier to inline, and faster for single and multibyte chars. Benchmarked reading example.json 100 times, 1.18s before, 1.08s after.	2013-07-11 14:23:14 -04:00
bors	30c8aac677	auto merge of #7612 : thestinger/rust/utf8, r=huonw	2013-07-08 16:10:53 -07:00
Daniel Micay	44770ae3a8	Merge pull request #7595 from thestinger/iterator remove some method resolve workarounds	2013-07-08 01:42:07 -07:00
Daniel Micay	641aec7407	remove some method resolve workarounds	2013-07-07 19:51:13 -04:00
Daniel Micay	01833de7ea	remove extra::rope It's broken/unmaintained and needs to be rewritten to avoid managed pointers and needless copies. A full rewrite is necessary and the API will need to be redone so it's not worth keeping this around. Closes #2236, #2744	2013-07-06 17:06:30 -04:00
Daniel Micay	51eb1e14d4	str: stop encoding invalid out-of-range `char`	2013-07-05 21:07:37 -04:00
Huon Wilson	cdea73cf5b	Convert vec::{as_imm_buf, as_mut_buf} to methods.	2013-07-04 00:46:50 +10:00
Huon Wilson	c437a16c5d	rustc: add a lint to enforce uppercase statics.	2013-07-01 17:52:57 +10:00

136 Commits
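
Several commits in this log (0504d7e57b, b4ff95599a, 6dd185930d) describe validating UTF-8 by checking byte ranges directly rather than fully decoding each codepoint. The sketch below illustrates that idea in present-day Rust; it is not the code from those commits. The function name `is_utf8_by_ranges` is hypothetical, and the ranges follow the RFC 3629 syntax strictly, so unlike the 2013 implementation (which, per the commit message, accepted surrogate halves) this version rejects them.

```rust
/// Hypothetical helper (not the historical std code): checks whether `v` is
/// well-formed UTF-8 by testing byte ranges, following the RFC 3629 grammar.
fn is_utf8_by_ranges(v: &[u8]) -> bool {
    let mut i = 0;
    while i < v.len() {
        // For a multibyte lead byte, pick the number of continuation bytes
        // and the allowed range of the FIRST continuation byte. Narrowing
        // that first range is what rules out overlong encodings, surrogate
        // halves, and codepoints above U+10FFFF.
        let (extra, lo, hi) = match v[i] {
            0x00..=0x7F => {
                // ASCII fast path: one byte, nothing else to check.
                i += 1;
                continue;
            }
            0xC2..=0xDF => (1, 0x80, 0xBF), // 0xC0 and 0xC1 are never valid
            0xE0 => (2, 0xA0, 0xBF),        // excludes overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (2, 0x80, 0xBF),
            0xED => (2, 0x80, 0x9F),        // excludes surrogate halves
            0xF0 => (3, 0x90, 0xBF),        // excludes overlong 4-byte forms
            0xF1..=0xF3 => (3, 0x80, 0xBF),
            0xF4 => (3, 0x80, 0x8F),        // excludes values above U+10FFFF
            _ => return false,              // stray continuation or invalid lead
        };
        // The whole sequence must fit inside the slice.
        if i + extra >= v.len() {
            return false;
        }
        let b1 = v[i + 1];
        if b1 < lo || b1 > hi {
            return false;
        }
        // Any remaining continuation bytes must look like 10xxxxxx.
        for k in 2..=extra {
            if v[i + k] & 0xC0 != 0x80 {
                return false;
            }
        }
        i += extra + 1;
    }
    true
}
```

Called on the overlong solidus from commit b4ff95599a, `is_utf8_by_ranges(&[0xE0, 0x80, 0xAF])` returns `false`, while the canonical one-byte form `is_utf8_by_ranges(b"/")` returns `true`. The standard library's `std::str::from_utf8` can serve as a cross-check for the same inputs.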